Re: [I18nrp] Mappings for IDNA2008 ?

"Patrik Fältström " <> Fri, 08 February 2019 20:06 UTC

From: "Patrik Fältström" <>
To: "Andrew Sullivan" <>
Date: Fri, 08 Feb 2019 21:06:25 +0100
List-Id: Internationalization Review Procedures <>

On 8 Feb 2019, at 20:54, Andrew Sullivan wrote:

> Yes.  And the way to mess with you is to (e.g.) send you something that has various foldings or mappings in _your_ locale that are not the same as the various foldings and mappings in the locale with which the string was generated.  These are blessedly rare, but they're not "never", which is John's point upthread about working irregularly vs working.  This gets even worse when the mappings vary according to application or whatever, which is _also_ a potential feature of IDNA.


> UTS#46 doesn't do anything to help this _either_ because it is no more able to know the locale-at-identifier-generation time than anything else is, because that information is simply not carried around with the identifier.
> This problem is, I should note, well understood among network protocol people, who spent a lot of time talking about it during PRECIS.  But it keeps coming back up because people keep forgetting that these protocols are locale-free even if you think you know the locale at lookup time.
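To make the failure mode in the quoted text concrete, here is a minimal Python sketch of the same label being folded under two locales. The `fold` function and the `TURKIC_LOWER` table are hypothetical and deliberately tiny; they are not taken from IDNA, UTS#46, or any library, and real locale tailoring covers far more cases.

```python
# Hypothetical, deliberately tiny tailoring table: in Turkic locales
# "I" lowercases to dotless "ı" (U+0131) and "İ" (U+0130) to "i".
TURKIC_LOWER = {"I": "\u0131", "\u0130": "i"}

def fold(label: str, locale: str) -> str:
    """Lowercase `label` the way a locale-aware application might."""
    if locale in ("tr", "az"):
        label = "".join(TURKIC_LOWER.get(ch, ch) for ch in label)
    return label.lower()

# The identifier carries no locale, so each side guesses its own.
generated = fold("TITLE", "tr")  # folding where the string was created
looked_up = fold("TITLE", "en")  # folding where the string is looked up

print(generated == looked_up)  # the two sides disagree on the result
```

Nothing in the string itself tells the receiver which of the two foldings was intended, which is the point being made about locale-free protocols.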

And this is why I believe more and more strongly that, in protocols and the like, we should only allow strings of characters that really are allowed in the protocols.

It is the mix between "displayed string" and "allowed in protocol string" which is problematic, specifically as we seem to allow a larger set of characters (and combinations of characters) in the "displayed string", EVEN IF we are really and intentionally exposing the protocol element (like a domain name) to the end user.

That is when things break down, I claim.

And this is why the mapping, if any, must be done as close as possible to the point where the string enters "the system". After that, the string stays as only the allowed characters in the allowed combinations in the protocol, and because of that it is also what gets displayed.
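A minimal Python sketch of that "map once at the boundary, then be strict" idea follows. Everything in it is an assumption made for illustration: `ALLOWED` is a hypothetical stand-in for the code points a protocol permits, and NFKC plus lowercasing is only a placeholder for whatever mapping is chosen, not the IDNA2008 rules.

```python
import unicodedata

# Hypothetical stand-in for the set of code points a protocol permits.
ALLOWED = set("abcdefghijklmnopqrstuvwxyz0123456789-åäö")

def is_protocol_string(label: str) -> bool:
    """True if every character is allowed in the protocol."""
    return bool(label) and all(ch in ALLOWED for ch in label)

def enter_system(user_input: str) -> str:
    """Map once, where the string enters, then insist on the strict form."""
    mapped = unicodedata.normalize("NFKC", user_input.strip()).lower()
    if not is_protocol_string(mapped):
        raise ValueError(f"cannot be mapped to a protocol string: {user_input!r}")
    return mapped

def lookup(label: str) -> str:
    """Inside the system: no mapping at all, only validation."""
    if not is_protocol_string(label):
        raise ValueError(f"unmapped string reached the protocol layer: {label!r}")
    return label

stored = enter_system("  FÄLTSTRÖM ")  # mapped at the boundary
print(lookup(stored))                  # accepted as-is further in
```

Passing `"Fältström"` directly to `lookup` raises `ValueError` instead of being quietly folded, so no later component has to guess which mapping an earlier one applied.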

Real protocols (whatever that is, but let's take email as an example) should separate display strings from protocol elements, and this is why I think the deal with the local part of email addresses is a so-called Big Mistake. The protocol element is ASCII only. The display part is different. But this is easy for me to say, as someone who uses Latin characters. I do understand the position of people using other scripts (specifically ones with different directionality).

I.e. my email address is:

  "Patrik H:son Fältström" <>

The whole thing! First the display string, and then the protocol element.
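The split between the two parts can be sketched with `email.utils.parseaddr` from the Python standard library. The address `user@example.com` is a hypothetical placeholder used only for illustration, not the real address.

```python
from email.utils import parseaddr

# Hypothetical address; only the display name comes from the message above.
header_value = '"Patrik H:son Fältström" <user@example.com>'

# parseaddr returns (display string, protocol element).
display, protocol_element = parseaddr(header_value)

print(display)                     # free-form, may contain non-ASCII
print(protocol_element.isascii())  # the part the protocol acts on
```

Only the second value is used for delivery; the first is purely for humans, which is also why the two can disagree.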

That is a security issue of course, just as in HTML:

<A HREF="security-error">display</A>

But that is at the same time the best we can get...

/me thinks