Re: [Idna-update] [I18nrp] Last Call: <draft-faltstrom-unicode11-05.txt> (IDNA2008 and Unicode 11.0.0) to Informational RFC

"Asmus Freytag (c)" <> Fri, 07 December 2018 08:41 UTC

On 12/6/2018 9:39 PM, John C Klensin wrote:
> Part of the problem with the IDNA RFCs is that the requirement
> for registry conservatism and responsibility is too scattered
> through several documents to come across as important as I think
> we all believe it is.   draft-klensin-idna-rfc5891bis (now
> expired, but waiting in the wings) was intended to address that
> issue.  I haven't looked at the text for a while but strongly
> suspect it does not identify and stress labels being
> "re-enterable" and "retypable" as you suggest.   It should.  If
> you don't mind taking a look at it, I'd appreciate recommended
> text.   Or, if that isn't convenient, I'll try to remember to
> say something when I next open the document.

To "typable", you might add "processable" and "renderable", not to 
mention "human-readable". (Where multiple languages/scripts are 
supported, "typable" should obviously mean "typable by a native user of 
that language/script".)

Allowing certain specialized or historical forms that are not part of 
the modern orthographic repertoire would allow characters that are 
unlikely to be supported by keyboards (or even fonts). (Some such 
characters are effectively not recognized by modern readers, or simply 
mistaken for more familiar characters).

Allowing "do not use" sequences renders labels unprocessable by 
processes that expect data to be free of effectively deprecated 
content.  (Such sequences are also structurally unsound, see next.)

Allowing structurally unsound sequences for any of the fifteen or so 
complex scripts risks data not being renderable, as neither layout 
systems nor fonts are (universally) prepared to deal with them (or they 
handle them differently, in unpredictable ways).

Allowing arbitrary sequences can lead to strings that human readers are 
not prepared to recognize because they violate deep-seated assumptions about 
what is possible. Linearly progressing, non-connecting scripts tend to 
not have that issue; ASCII certainly doesn't, but already with Latin 
combining marks you enter territory where fully random sequences can't 
be supported.
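
To make the point about combining marks concrete, here is a minimal 
sketch of a registry-side check. It assumes a hypothetical Latin 
repertoire (the sets BASES, PRECOMPOSED and SEQUENCES below are made up 
for illustration and are not taken from any actual LGR): a combining mark 
is accepted only as part of an explicitly listed sequence, never after an 
arbitrary base.

```python
import unicodedata

# Hypothetical repertoire, for illustration only (not from any real LGR).
BASES = set("abcdefghijklmnopqrstuvwxyz")
PRECOMPOSED = set("\u00e1\u00e9\u00ed\u00f3\u00fa\u00f1")  # a/e/i/o/u acute, n tilde
# Base+mark pairs with no precomposed form must be listed explicitly.
SEQUENCES = {"g\u0303"}  # g + COMBINING TILDE


def is_well_formed(label: str) -> bool:
    """True if label is in NFC and built only from listed units."""
    if not label or unicodedata.normalize("NFC", label) != label:
        return False
    i = 0
    while i < len(label):
        if label[i:i + 2] in SEQUENCES:
            i += 2  # an explicitly allowed base+mark sequence
        elif label[i] in BASES or label[i] in PRECOMPOSED:
            i += 1  # a single allowed code point
        else:
            return False  # stray mark or code point outside the repertoire
    return True
```

With these assumptions, "g" followed by U+0303 is accepted, while "x" 
followed by U+0303 is rejected even though both are valid Unicode.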

Then you get into the issue of substitutable code points/labels that 
should not be simultaneously allowed (aka "blocked" variants) -- the 
other type of variants ("allocatable") is more of a usability issue, 
which for some scripts can be critical but is not strictly required 
under the heading of "Conservatism".

Allowing two labels that even reasonably observant readers will 
substitute for one another, because different abstract text elements in 
Unicode may have identical (or very near identical) appearance when 
presented in typical user interfaces, adds no value, only security 
risks. (Yes this is a subset of the wider issue of labels that are 
merely similar, but that's not an excuse to take well-known and 
unambiguous cases off the table at the policy level).
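
As a rough illustration of how "blocked" variants work, the sketch below 
maps every label to a key in which each code point is replaced by a 
representative of its variant set, and refuses a registration whose key 
is already taken. The VARIANTS table and the Registry class are 
hypothetical (three Cyrillic/Latin look-alikes, not an actual variant 
table), and other checks a real registry needs, such as script mixing, 
are left out.

```python
# Hypothetical variant table: code point -> representative of its set.
VARIANTS = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
}


def variant_key(label: str) -> str:
    """Replace each code point by the representative of its variant set."""
    return "".join(VARIANTS.get(ch, ch) for ch in label)


class Registry:
    """Toy registry that treats all variants of a label as blocked."""

    def __init__(self) -> None:
        self._by_key: dict[str, str] = {}

    def register(self, label: str) -> bool:
        key = variant_key(label)
        if key in self._by_key:
            return False  # the label or one of its variants is already taken
        self._by_key[key] = label
        return True
```

Once "sale" is registered, a request for "s" + U+0430 + "le" yields the 
same key and is refused, whichever of the two labels was applied for 
first.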

Spelling out these considerations would go beyond what is currently 
scattered through the IDNA RFCs, but would be a first step to putting 
the Registries on notice as to what their responsibility actually entails.

Now that we have the Root Zone LGR giving practical examples (and the 
emerging reference LGRs doing the same for the second level), there's 
less excuse for registries not doing the right thing - good approximate 
solutions exist.