Re: [I18ndir] [I18n-discuss] Fwd: Security consideration: math symbols in an exotic IP address format in a phishing mail

"Asmus Freytag (c)" <> Mon, 18 May 2020 06:17 UTC

To: John C Klensin <>
References: <20200517014230.329b11b5@spixxi> <> <2F2F0459414826E2B292C328@PSB>
From: "Asmus Freytag (c)" <>
Message-ID: <>
Date: Sun, 17 May 2020 23:17:06 -0700
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.8.0
MIME-Version: 1.0
In-Reply-To: <2F2F0459414826E2B292C328@PSB>
Content-Type: multipart/alternative; boundary="------------984C4960A1C8DF2ECC53EC6C"
Content-Language: en-US
Subject: Re: [I18ndir] [I18n-discuss] Fwd: Security consideration: math symbols in an exotic IP address format in a phishing mail


(You may have to forward this to some list(s) I am not on).

I note this as part of your reply:

    (5.1) A browser or other implementation that is interpreting a
    string of Unicode Mathematical Digits in a "host" field is, at
    best, performing a dangerous disservice to its users.  If anyone
    cares, it is flagrantly disregarding requirements of RFCs 3986
    and 3987.

I think what's happening here may be that at some early stage
an NF*K*C normalization is applied which would turn the
Mathematical Bold numbers into ASCII ones.

In other words, the part that interprets the string as host
address isn't necessarily aware that the original input was
Mathematical Bold digits from Unicode Plane 1.
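
As a rough illustration of that hypothesis (not a claim about what any
particular browser actually does), the following Python sketch shows that
NFKC folds Mathematical Bold digits to ASCII digits while NFC leaves them
untouched. The sample digit string is made up for the example.

```python
import unicodedata

# Ten digits from the Mathematical Bold range (U+1D7CE..U+1D7D7).
# The sample value is an assumption chosen for illustration.
bold = "".join(chr(0x1D7CE + int(d)) for d in "0123456789")

# Compatibility normalization maps each one to its ASCII counterpart.
folded = unicodedata.normalize("NFKC", bold)
print(folded)            # 0123456789
print(folded.isascii())  # True

# Canonical normalization does not touch them.
print(unicodedata.normalize("NFC", bold) == bold)  # True
```

Any code that runs after such a step sees only ASCII digits and cannot
tell what the user-visible string originally contained.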

Other than pointing out that I have nothing to add to your


On 5/17/2020 7:52 PM, John C Klensin wrote:
> Long note warning; summary hint: The note that Asmus forwarded
> (below) describes the use of a string of non-ASCII characters,
> specifically ones from the Unicode "Mathematical Digits" group, in
> URI contexts and their interpretation as an IP address.  That
> raises issues of ambiguity and security (including an attack
> vector).  Trying to explore the URI specifications in RFC 3986
> identifies further problems, including an incompatibility with
> other protocols that could lead to confusion and/or
> interoperability problems and a provision for allowing IP
> addresses other than IPv4 and IPv6 that lacks specifications for
> documentation, approval, or a registry.  My message concludes
> with some specific topics on which IETF considerations and
> actions are in order, some of which may have impact in the
> Internet Area as well as the ART one.
> --On Sunday, May 17, 2020 12:30 -0700 Asmus Freytag
> <> wrote:
>> FYI.
>> A./
>> -------- Forwarded Message --------
>> Subject: 	Security consideration: math symbols in an exotic IP
>> address format in a phishing mail
>> Date: 	Sun, 17 May 2020 01:43:17 +0200
>> From: 	Marius Spix via Unicode <>
>> Reply-To: 	Marius Spix <>
>> To:
>> Today I received an interesting phishing mail which had a URL
>> containing mathematical bold numbers. Interestingly the address
>> ppppppppppp was interpreted as an octal number 05671360302, which
>> is another spelling for This worked for both Firefox and Chrome.
>> I don't know why such an address is accepted in the authority part
>> of an HTTPS URI of current browsers. Section 7.4 in RFC 3986 states
>> that additional IP address formats can become a security concern,
>> but it also says that literals should be converted to numeric form.
>> I wonder if this case should be added to UTR #36.
>> Regards
>> Marius
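
For readers who want to check the arithmetic in the report above, here is
a minimal Python sketch of the legacy "single number" host reading, in
which a leading "0" selects octal. The helper name legacy_octal_to_ipv4
is hypothetical, and the sketch only mimics inet_aton-style parsing; it
is not taken from any browser's code.

```python
import ipaddress

def legacy_octal_to_ipv4(text: str) -> str:
    """Read a one-part numeric host the way inet_aton-style parsers do
    (leading "0" means octal) and return it in dotted-decimal form."""
    is_octal = len(text) > 1 and text.startswith("0")
    value = int(text, 8 if is_octal else 10)
    # IPv4Address accepts an integer in the range 0..2**32-1.
    return str(ipaddress.IPv4Address(value))

print(legacy_octal_to_ipv4("05671360302"))  # 46.229.224.194
```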
> Asmus,
> Adding the I18n directorate list since the status of
> i18n-discuss seems to be somewhat uncertain and the Internet
> Area ADs for reasons that will quickly be obvious  (the ART ADs
> are already on the directorate list).
> Whatever might reasonably be done in UTR #36, it seems to me the
> real problem(s)  here are in RFC 3986 (and maybe in browser
> practices, which would seem to be a W3C and/or WHATWG problem).
> As I dug into 3986 a bit, I found a can of worms.  In no
> particular order:
> (1) While often interpreted as normative (ordinarily the case
> for an Internet Standard), RFC 3986 is written as a descriptive
> document with some sections, notably including Section 7.4, being
> especially descriptive.  Indeed, while we have recently had to
> struggle in other contexts with the meaning of strongly
> normative language and references to RFCs 2119 and 8174 (BCP 14)
> in Informational documents, 3986 does not reference them at all.
> (2) To me at least, Section 7.4 seems to say "despite what is
> said in Section 3.2.2 about address ("Host") formats, some
> implementations (present, former, or imaginary) allow some very
> strange formats in things that might be IP addresses and
> implementations interpreting URIs ought to figure out some way
> to interpret them".  Turning an example from that text around,
> it seems to me that there has been no excuse for interpreting an
> IP address for an object (i.e., a "Host") in address Class terms
> since we adopted CIDR many years ago and some years before RFC
> 3986.   From a security standpoint, trying to interpret
> something strange (and that ought to be non-conforming) by
> translation into some guess of what might have been intended is
> not DWIM or an application of the robustness principle; it is
> just an invitation to "confuse the users and get them to do
> something stupid" attacks (of which phishing is merely a special
> case).
> (3) Given that most of Section 7.4 is about these strange
> formats and how to interpret them, I'm not sure how to interpret
> the last paragraph of that Section, which starts:
> 	"These additional IP address formats are not allowed in
> 	the URI syntax due to differences between platform
> 	implementations.  However, they can become a security
> 	concern if..."
> Now, if they are not allowed, then Firefox and Chrome, in
> accepting the string described below (by the time it reached me,
> I couldn't tell what characters were used in the original
> although the point is clear), and doing it without even a
> warning, are out of conformance with 3986.   Of course, there is
> another problem: if those characters are non-ASCII, even if they
> look like numbers and NFKC would turn them into ASCII numbers,
> we are at the IRI-URI boundary and I think an application is
> expected to convert the string to %-notation before trying to
> interpret it as an IP address (at which point the conversion
> would certainly fail).  In particular, there is no requirement
> in RFC 3987 (or, AFAICT, expectation; see Section 3.1, Step
> 1(c) of that document, which appears to me to prohibit such
> numeric conversions, and Section 5.3 is only about comparison)
> that NFKC be applied to a string before conversion to a URI.
> (4) Another issue is that, if Section 3.2.2 of 3986 is read
> carefully (including the "first-match-wins" principle), then
> (4.1) Since the string in question doesn't start with "[", it
> must be either an IPv4address or a reg-name.  But the syntax for
> IPv4address allows only ASCII digits with the layout of four
> <dec-octet>s separated by periods.  So it must be a reg-name and
> interpreting it as an address is just an error.  But it can't be
> a reg-name, at least in an HTTP/HTTPS URI or IRI because, for a
> URI, it would have to be either %-encoded or a dot-separated
> sequence of IDNA A-labels and, for an IRI, well, URI/IRI syntax
> aside, a TLD label (except an IDNA A-label, which a sequence of
> Mathematical Digits is clearly not) cannot even contain a
> digit, much less be entirely digits.
> (4.2) And, if it did start with a bracket, it is required to be
> either an IPv6address (defined by RFC 3513 or its successors,
> but 3513 is not a normative reference) or an IPvFuture string.
> The latter is denoted (if I read the ABNF correctly) by starting
> with a "v" followed by one or more hex digits as a version
> number followed by a period followed by other stuff.   That
> raises a couple of other issues.  First, the address literal for
> email (going back to RFC 821 and earlier) is a left bracket, an
> IPv4 address in dotted decimal notation, and a right bracket.
> RFC 2821 expanded what could appear between the brackets to
> allow IPv6 and other address forms (see below).  By requiring
> that IPv4 addresses be "bare" and that brackets can surround
> either IPv6 addresses or some future thing, 3986 creates an
> unnecessary (and maybe surprising and error- or attack-prone)
> ambiguity (Martin, I haven't thought through how this would
> affect a MAILTO URL for a mail address like "user@[]"
> but you or someone else should probably think about that, noting
> that "user@" is flatly invalid for email on the public
> Internet).  Second, there is no approval procedure or registry
> established for those version number identification strings, so,
> presumably, if M. Mouse or one of his collaborators decided to
> launch HTTP/HTTPS or similar URIs for IPv8, nothing would
> prevent their simply squatting on either "v1.<stuff>" or
> "v8.<stuff>", competing with D. Duck's future addresses of the
> same type.  That is the reason why, after extended discussion
> with IANA and the Internet Area leadership at the time, RFC 2821
> structured address literals as
>     address-literal  = "[" ( IPv4-address-literal /
>                      IPv6-address-literal /
>                      General-address-literal ) "]"
>     IPv6-address-literal  = "IPv6:" IPv6-addr
>     General-address-literal  = Standardized-tag ":" 1*dcontent
> and required that the "Standardized-tag"s be established by
> Standards-track RFCs and registered with IANA.  Disallowing IPv4
> addresses in brackets, and using version numbers without
> documentation, approval, or IANA registry requirements is almost
> certainly an invitation for conflicts or attacks to come.
> (4.3) And, FWIW, there is no provision that I can find in 3986
> for interpreting a string of digits (ASCII or otherwise) without
> brackets and without coming in IPv4 dotted-decimal form as an
> address.  FWIW, RFC 821 did provide for such numeric strings,
> but they have a different introducer syntax, I assume precisely
> to avoid ambiguities.  So such a string is just another type of
> reg-name and, for URI schemes that expect domain names or
> addresses, just plain invalid.
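
The reading of Section 3.2.2 in point (4.1) can be sketched in Python.
The regular expression below transcribes the dec-octet and IPv4address
rules of RFC 3986; the function name is_ipv4address is hypothetical, and
the sketch covers only the IPv4address production, not the rest of the
"host" grammar.

```python
import re

# dec-octet from RFC 3986, Section 3.2.2: 0-9, 10-99, 100-199,
# 200-249, 250-255, with no leading zeros. The character classes are
# spelled out as [0-9] so that only ASCII digits can match.
DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])"

# IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet
IPV4ADDRESS = re.compile(rf"{DEC_OCTET}(?:\.{DEC_OCTET}){{3}}")

def is_ipv4address(host: str) -> bool:
    """True only if the whole string matches the IPv4address rule."""
    return IPV4ADDRESS.fullmatch(host) is not None

print(is_ipv4address("192.0.2.1"))    # True
print(is_ipv4address("05671360302"))  # False: no dots, so not IPv4address
print(is_ipv4address("192.0.2.01"))   # False: leading zero in an octet
```

Under this grammar a bare run of digits, ASCII or otherwise, falls
through to reg-name rather than being an address.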
> (5) Conclusions:
> (5.1) A browser or other implementation that is interpreting a
> string of Unicode Mathematical Digits in a "host" field is, at
> best, performing a dangerous disservice to its users.  If anyone
> cares, it is flagrantly disregarding requirements of RFCs 3986
> and 3987.
> (5.2) The partial incompatibilities between address literal
> syntax in Internet mail and RFC 3986 are an invitation to
> trouble (including security problems).
> (5.3) Discussion of Class-based addressing without comments
> about its appropriateness also seems questionable.
> (5.4) One of the things we have learned, repeatedly and often
> painfully, over the years is that putting syntax for open-ended
> extensibility into a protocol without establishing review,
> approval, documentation, and/or registration mechanisms for the
> extension identifiers leads, sooner or later, to serious
> problems.  It may be worth noting that, if the URI address
> literal syntax established in RFC 3986 and the email address
> literal syntax established in RFC 2821 (and affirmed in 5321)
> were harmonized, the latter documents already specify
> documentation requirements, approval mechanisms, and an IANA
> registry.
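
To make the IRI-to-URI argument in point (3) concrete, here is a small
Python sketch of percent-encoding a host made of Mathematical Bold
digits as UTF-8 octets, without any prior NFKC step. It assumes the
original host was the bold-digit spelling of the octal number quoted in
the report; the archived text no longer shows the actual characters.

```python
from urllib.parse import quote

# Assumption: the phishing host spelled 05671360302 in Mathematical
# Bold digits (U+1D7CE..U+1D7D7).
host = "".join(chr(0x1D7CE + int(d)) for d in "05671360302")

# Encode each character as UTF-8 and percent-escape every octet.
encoded = quote(host, safe="")

print(encoded[:12])       # %F0%9D%9F%8E  (the first bold digit)
print(len(encoded))       # 132: 11 characters x 4 octets x 3 bytes each
print(encoded.isascii())  # True
```

The result is an ASCII string that bears no resemblance to an IPv4
address, so any later attempt to read it as one would fail.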
> So, it seems to me that, independent of what this example of an
> attack might mean for UTR #36 (which, fwiw, RFC 3986 does not
> reference), the IETF has some housekeeping (or housecleaning) to
> do on RFC 3986, possibly 3987, and almost certainly how we
> handle IP addresses (IPv4, IPv6, and future (or localized)
> extensions) in applications and applications protocols,
> especially those that allow non-ASCII characters in the
> addresses.  Different forms and different registries for
> different applications or applications protocols (or cluster of
> them) seem to me to be definitely not in the best interests of
> the Internet.
> best,
>     john