Re: [I18ndir] [I18n-discuss] Fwd: Security consideration: math symbols in an exotic IP address format in a phishing mail

John C Klensin <> Mon, 18 May 2020 02:52 UTC

Date: Sun, 17 May 2020 22:52:22 -0400
From: John C Klensin <>
To: Asmus Freytag <>

Long note warning; summary hint: The note that Asmus forwarded
(below) describes the use of a string of non-ASCII characters,
specifically ones from the Unicode "Mathematical Digits" group, in
URI contexts and their interpretation as an IP address.  That
raises issues of ambiguity and security (including an attack
vector).  Trying to explore the URI specifications in RFC 3986
identifies further problems, including an incompatibility with
other protocols that could lead to confusion and/or
interoperability problems and a provision for allowing IP
addresses other than IPv4 and IPv6 that lacks specifications for
documentation, approval, or a registry.  My message concludes
with some specific topics on which IETF considerations and
actions are in order, some of which may have impact in the
Internet Area as well as the ART one.

--On Sunday, May 17, 2020 12:30 -0700 Asmus Freytag
<> wrote:

> FYI.
> A./
> -------- Forwarded Message --------
> Subject: 	Security consideration: math symbols in an exotic IP
> address format in a phishing mail
> Date: 	Sun, 17 May 2020 01:43:17 +0200
> From: 	Marius Spix via Unicode <>
> Reply-To: 	Marius Spix <>
> To:
> Today I received an interesting phishing mail which had an URL
> containing mathematical bold numbers. Interestingly the address
> ppppppppppp was interpreted as an octal number 05671360302,
> which is another spelling for This worked for both Firefox and
> Chrome. I don't know why such an address is accepted in the
> authority part of a HTTPS URI of current browsers. Section 7.4
> in RFC 3986 states that additional IP address formats can
> become a security concern, but it also says that literals
> should be converted to numeric form.
> I wonder if this case should be added to UTR #36.
> Regards
> Marius
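
To make the reported behavior concrete, here is a minimal Python
sketch of the kind of lenient interpretation Marius describes.
It is an illustration only, not the code of any browser.  It
assumes the characters were from the Mathematical Bold digit
range starting at U+1D7CE (the originals did not survive
forwarding), and the helper names to_math_bold and
lenient_host_to_ipv4 are hypothetical.

```python
import ipaddress
import unicodedata

# U+1D7CE is MATHEMATICAL BOLD DIGIT ZERO; the nine code points after
# it are the bold digits one through nine.
MATH_BOLD_ZERO = 0x1D7CE


def to_math_bold(ascii_digits: str) -> str:
    """Hypothetical helper: rebuild a bold-digit string for testing."""
    return "".join(chr(MATH_BOLD_ZERO + int(d)) for d in ascii_digits)


def lenient_host_to_ipv4(host: str) -> str:
    """Mimic a lenient parser: fold with NFKC, then read as one number."""
    folded = unicodedata.normalize("NFKC", host)
    if not (folded.isascii() and folded.isdigit()):
        raise ValueError(f"not an all-digit host: {host!r}")
    # A leading zero selects octal, as in the phishing example.
    base = 8 if len(folded) > 1 and folded.startswith("0") else 10
    return str(ipaddress.IPv4Address(int(folded, base)))


host = to_math_bold("05671360302")
print(unicodedata.normalize("NFKC", host))  # the ASCII digits again
print(lenient_host_to_ipv4(host))           # a dotted-quad IPv4 address
```

The point of the sketch is that two silent conversions (a
compatibility fold, then a radix guess) are enough to turn a
string that looks like decoration into a routable address.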


Adding the I18n directorate list since the status of
i18n-discuss seems to be somewhat uncertain and the Internet
Area ADs for reasons that will quickly be obvious  (the ART ADs
are already on the directorate list).

Whatever might reasonably be done in UTR #36, it seems to me the
real problem(s)  here are in RFC 3986 (and maybe in browser
practices, which would seem to be a W3C and/or WHATWG problem).
As I dug into 3986 a bit, I found a can of worms.  In no
particular order:

(1) While often interpreted as normative (ordinarily the case
for an Internet Standard), RFC 3986 is written as a descriptive
document with some sections, notably including Section 7.4, being
especially descriptive.  Indeed, while we have recently had to
struggle in other contexts with the meaning of strongly
normative language and references to RFCs 2119 and 8174 (BCP 14)
in Informational documents, 3986 does not reference them at all.

(2) To me at least, Section 7.4 seems to say "despite what is
said in Section 3.2.2 about address ("Host") formats, some
implementations (present, former, or imaginary) allow some very
strange formats in things that might be IP addresses and
implementations interpreting URIs ought to figure out some way
to interpret them".  Turning an example from that text around,
it seems to me that there has been no excuse for interpreting an
IP address for an object (i.e., a "Host") in address Class terms
since we adopted CIDR many years ago and some years before RFC
3986.   From a security standpoint, trying to interpret
something strange (and that ought to be non-conforming) by
translation into some guess of what might have been intended is
not DWIM or an application of the robustness principle; it is
just an invitation to "confuse the users and get them to do
something stupid" attacks (of which phishing is merely a special
case).

(3) Given that most of Section 7.4 is about these strange
formats and how to interpret them, I'm not sure how to interpret
that last paragraph of that Section, which starts:

	"These additional IP address formats are not allowed in
	the URI syntax due to differences between platform
	implementations.  However, they can become a security
	concern if..." 

Now, if they are not allowed, then Firefox and Chrome, in
accepting the string described below (by the time it reached me,
I couldn't tell what characters were used in the original
although the point is clear), and doing it without even a
warning, are out of conformance with 3986.   Of course, there is
another problem: if those characters are non-ASCII, even if they
look like numbers and NFKC would turn them into ASCII numbers,
we are at the IRI-URI boundary and I think an application is
expected to convert the string to %-notation before trying to
interpret it as an IP address (at which point the conversion
would certainly fail).  In particular, there is no requirement
in RFC 3987 that NFKC be applied to a string before conversion
to a URI, nor, AFAICT, any expectation of it: Section 3.1, Step
1(c) of that document appears to me to prohibit such numeric
conversions, and Section 5.3 is only about comparison.
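
A small Python sketch of the IRI-to-URI step discussed in point
(3), under the assumption that the host is mapped the way I read
RFC 3987 Section 3.1: each non-ASCII character is UTF-8 encoded
and %-escaped, with no normalization first.  The function names
are hypothetical, and Python's ipaddress module stands in for an
IPv4 parser.

```python
import ipaddress
from urllib.parse import quote


def iri_host_to_uri_host(host: str) -> str:
    """Percent-encode non-ASCII characters as UTF-8; no NFKC is applied."""
    return "".join(c if ord(c) < 128 else quote(c) for c in host)


def parses_as_ipv4(host: str) -> bool:
    """True only if the ipaddress module accepts the string as IPv4."""
    try:
        ipaddress.IPv4Address(host)
    except ValueError:
        return False
    return True


bold_zero = "\U0001D7CE"  # MATHEMATICAL BOLD DIGIT ZERO
encoded = iri_host_to_uri_host(bold_zero)
print(encoded)                  # %F0%9D%9F%8E
print(parses_as_ipv4(encoded))  # False
```

Once the digits have become %-escapes, nothing that resembles a
number is left for an address parser to work on.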

(4) Another issue is that, if Section 3.2.2 of 3986 is read
carefully (including the "first-match-wins" principle), then:

(4.1) Since the string in question doesn't start with "[", it
must be either an IPv4address or a reg-name.  But the syntax for
IPv4address allows only ASCII digits with the layout of four
<dec-octet>s separated by periods.  So it must be a reg-name and
interpreting it as an address is just an error.  But it can't be
a reg-name, at least in an HTTP/HTTPS URI or IRI because, for a
URI, it would have to be either %-encoded or a dot-separated
sequence of IDNA A-labels and, for an IRI, well, URI/IRI syntax
aside, a TLD label (except an IDNA A-label, which a sequence of
Mathematical Digits clearly is not) cannot even contain a
digit, much less be entirely digits.
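
The first-match-wins reading of Section 3.2.2 in (4.1) can be
sketched in Python as follows.  The regular expressions are my
transcription of the IPv4address, dec-octet, and reg-name ABNF
in RFC 3986; classify_host is a hypothetical name, and the
contents of a bracketed literal are deliberately not validated
here.

```python
import re

# dec-octet: 0-9 / 10-99 / 100-199 / 200-249 / 250-255
DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])"
IPV4ADDRESS = re.compile(rf"{DEC_OCTET}(?:\.{DEC_OCTET}){{3}}")
# reg-name = *( unreserved / pct-encoded / sub-delims )
REG_NAME = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})*")


def classify_host(host: str) -> str:
    """Classify a URI host by trying the productions in ABNF order."""
    if host.startswith("[") and host.endswith("]"):
        return "IP-literal"
    if IPV4ADDRESS.fullmatch(host):
        return "IPv4address"
    if REG_NAME.fullmatch(host):
        return "reg-name"
    return "invalid"


print(classify_host("192.0.2.1"))    # IPv4address
print(classify_host("05671360302"))  # reg-name
print(classify_host("\U0001D7CE"))   # invalid
```

Even the ASCII form of the digit string lands in reg-name, never
in IPv4address, and the non-ASCII form matches nothing in the
URI grammar at all.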

(4.2) And, if it did start with a bracket, it is required to be
either an IPv6address (defined by RFC 3513 or its successors,
but 3513 is not a normative reference) or an IPvFuture string.
The latter is denoted (if I read the ABNF correctly) by starting
with a "v" followed by one or more hex digits as a version
number followed by a period followed by other stuff.   That
raises a couple of other issues.  First, the address literal for
email (going back to RFC 821 and earlier) is a left bracket, an
IPv4 address in dotted decimal notation, and a right bracket.
RFC 2821 expanded what could appear between the brackets to
allow IPv6 and other address forms (see below).  By requiring
that IPv4 addresses be "bare" and that brackets can surround
either IPv6 addresses or some future thing, 3986 creates an
unnecessary (and maybe surprising and error- or attack-prone)
ambiguity (Martin, I haven't thought through how this would
affect a MAILTO URL for a mail address like "user@[]"
but you or someone else should probably think about that, noting
that "user@" is flatly invalid for email on the public
Internet).  Second, there is no approval procedure or registry
established for those version number identification strings, so,
presumably, if M. Mouse or one of his collaborators decided to
launch HTTP/HTTPS or similar URIs for IPv8, nothing would
prevent their simply squatting on either "v1.<stuff>" or
"v8.<stuff>", competing with D. Duck's future addresses of the
same type.  That is the reason why, after extended discussion
with IANA and the Internet Area leadership at the time, RFC 2821
structured address literals as

   address-literal  = "[" ( IPv4-address-literal /
                    IPv6-address-literal /
                    General-address-literal ) "]"

   IPv6-address-literal  = "IPv6:" IPv6-addr

   General-address-literal  = Standardized-tag ":" 1*dcontent

and required that the "Standardized-tag"s be established by
Standards-track RFCs and registered with IANA.  Disallowing IPv4
addresses in brackets, and using version numbers without
documentation, approval, or IANA registry requirements is almost
certainly an invitation for conflicts or attacks to come.
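
The mismatch between the two bracketed syntaxes in (4.2) can be
shown with a rough Python sketch.  Assumptions: Python's
ipaddress module stands in for the IPv4 and IPv6 ABNF (it is not
an exact match for either RFC), the IPvFuture pattern is my
transcription of RFC 3986, and REGISTERED_GENERAL_TAGS is a
hypothetical stand-in for the IANA registry, left empty.  All
function names are illustrative.

```python
import ipaddress
import re

# RFC 3986: IPvFuture = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
IPVFUTURE = re.compile(r"v[0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+")

# Assumption for this sketch: tags other than "IPv6" that have been
# registered for General-address-literal use. Left empty here.
REGISTERED_GENERAL_TAGS = frozenset()


def _parses(address_class, text):
    """True if the given ipaddress class accepts the text."""
    try:
        address_class(text)
    except ValueError:
        return False
    return True


def _inside_brackets(literal):
    """Return the text between "[" and "]", or None if not bracketed."""
    if len(literal) > 2 and literal.startswith("[") and literal.endswith("]"):
        return literal[1:-1]
    return None


def uri_ip_literal(literal):
    """RFC 3986 style: "[" ( IPv6address / IPvFuture ) "]"."""
    inner = _inside_brackets(literal)
    if inner is None:
        return None
    if _parses(ipaddress.IPv6Address, inner):
        return "IPv6address"
    if IPVFUTURE.fullmatch(inner):
        return "IPvFuture"  # any version string passes; no registry check
    return None


def smtp_address_literal(literal):
    """RFC 2821 style: IPv4, "IPv6:" tagged, or a registered tag."""
    inner = _inside_brackets(literal)
    if inner is None:
        return None
    if _parses(ipaddress.IPv4Address, inner):
        return "IPv4-address-literal"
    tag, sep, content = inner.partition(":")
    if sep and tag.lower() == "ipv6":
        ok = _parses(ipaddress.IPv6Address, content)
        return "IPv6-address-literal" if ok else None
    if sep and content and tag.lower() in REGISTERED_GENERAL_TAGS:
        return "General-address-literal"
    return None


for literal in ("[192.0.2.1]", "[2001:db8::1]",
                "[IPv6:2001:db8::1]", "[v8.stuff]"):
    print(literal, uri_ip_literal(literal), smtp_address_literal(literal))
```

In this sketch no single bracketed string is accepted by both
functions, and "[v8.stuff]" passes the URI check without anyone
having registered "v8".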

(4.3) And, FWIW, there is no provision that I can find in 3986
for interpreting a string of digits (ASCII or otherwise) without
brackets and without coming in IPv4 dotted-decimal form as an
address.  FWIW, RFC 821 did provide for such numeric strings,
but they have a different introducer syntax, I assume precisely
to avoid ambiguities.  So such a string is just another type of
reg-name and, for URI schemes that expect domain names or
addresses, just plain invalid.

(5) Conclusions:

(5.1) A browser or other implementation that is interpreting a
string of Unicode Mathematical Digits in a "host" field is, at
best, performing a dangerous disservice to its users.  If anyone
cares, it is flagrantly disregarding requirements of RFCs 3986
and 3987.

(5.2) The partial incompatibilities between address literal
syntax in Internet mail and RFC 3986 are an invitation to
trouble (including security problems).

(5.3) Discussion of Class-based addressing without comment
about its appropriateness also seems questionable.

(5.4) One of the things we have learned, repeatedly and often
painfully, over the years is that putting syntax for open-ended
extensibility into a protocol without establishing review,
approval, documentation, and/or registration mechanisms for the
extension identifiers leads, sooner or later, to serious
problems.  It may be worth noting that, if the URI address
literal syntax established in RFC 3986 and the email address
literal syntax established in RFC 2821 (and affirmed in 5321)
were harmonized, the latter documents already specify
documentation requirements, approval mechanisms, and an IANA

So, it seems to me that, independent of what this example of an
attack might mean for UTR #36 (which, fwiw, RFC 3986 does not
reference), the IETF has some housekeeping (or housecleaning) to
do on RFC 3986, possibly 3987, and almost certainly on how we
handle IP addresses (IPv4, IPv6, and future (or localized)
extensions) in applications and applications protocols,
especially those that allow non-ASCII characters in the
addresses.  Different forms and different registries for
different applications or applications protocols (or clusters of
them) seem to me to be definitely not in the best interests of
the Internet.