Re: [Idna-update] IDNA and combining sequences
"Asmus Freytag (c)" <asmusf@ix.netcom.com> Sat, 10 March 2018 18:08 UTC
To: John C Klensin <john-ietf@jck.com>, =?UTF-8?B?UGF0cmlrIEbDpGx0c3Ryw7Zt?=
<paf@frobbit.se>
Cc: idna-update@ietf.org
References: <C4FBCF12821031786F472AA2@PSB>
<02c29140-29f1-cc81-8c4f-8249d0f23b2c@ix.netcom.com>
<1E562CDE39B4224F227E765D@PSB>
<516E58F3-015D-4AD7-A3FD-0749A6890245@frobbit.se>
<D26CE952D968BBEC0AB96A76@PSB>
From: "Asmus Freytag (c)" <asmusf@ix.netcom.com>
Message-ID: <70445d5e-6294-26f5-50b8-cb9ae7345c56@ix.netcom.com>
Date: Sat, 10 Mar 2018 10:08:08 -0800
In-Reply-To: <D26CE952D968BBEC0AB96A76@PSB>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idna-update/2eDu2dj114IOYfsU8x4Vt4QxRCo>
Subject: Re: [Idna-update] IDNA and combining sequences

Patrik asked where we go from here. Let me propose something, but first some response to John.

On 3/10/2018 5:51 AM, John C Klensin wrote:
>
> --On Saturday, March 10, 2018 09:06 -0400 Patrik Fältström
> <paf@frobbit.se> wrote:
>
>> I think we should do some scenario planning here. Remember
>> that we do not have a world where IDNA2008 based on Unicode
>> 6.x is what people use. People use all different kind of mix
>> between IDNA2008, IDNA2003 and Unicode versions. I have myself
>> worked with the curl library (that uses libidn) and to be
>> honest, I do not think people KNOW what they use. Or they
>> know, and they know they violate the rules. For the contracted
>> parties, they have the LGR coming down the road anyways, so...

Correct, and it doesn't stop there: some parties are off the reservation doing emoji...

>>
>> And *if* we go down this path, is it "enough" to do in the LGR
>> (i.e. ICANN) or should IETF do some adoption (and W3C?), or
>> should IETF say "we can move forward in a more safe way AS WE
>> KNOW ICANN DO LGR"...
> As I said at far more length in another note, the LGR is (at
> least by its charter/ Procedure) designed for, and applicable
> only to, the root. While those code points would probably be
> safe to use in any zone, trying to apply them globally would be
> unduly restrictive, would violate the "administratively
> distributed hierarchy" principle, and, IMO, just wouldn't fly..
> The latter would probably quickly reach the point that attempts
> to apply the LGR to labels at the second level and beyond would
> encourage non-compliance and daring ICANN to do anything about
> it.

Totally agree. There's no benefit in "imposing" the RZ-LGR on other zones.

However, the various script LGRs can serve both as examples and as starting points. They do contain (or are about to contain) worked examples of repertoire context rules and/or variant rules applicable to modern users of each script.
They are also heavily documented, so that anybody can understand the "why" behind their design. That makes them useful as a starting point as well, to which you can add features as needed. One thing almost anyone will want to add outside the root zone is digits (and the hyphen). That's why I have consistently argued that the RZ-LGR is a useful example and starting point.

>
> As an example of the overly restrictive part, I assume that the
> LGR rules, like the ccTLD Fast Track Procedures, would prohibit
> the use of "ур" (U+0443 U+0440, or other permutations of those
> characters) in the root on the grounds of either intrinsic
> confusability or the potential conflict with the 3166-1 alpha-2
> list.

The LGR does not have to "prohibit" these. Instead, they are recognized as so-called cross-script variants of the corresponding Latin letters. That means: if you have a label containing Latin "yp" in the root, then you do not get to use Cyrillic "ур" in some other label, if the two would otherwise be (or look) the same. However, you remain free to use Cyrillic "ур" in any label that does not collide with some Latin label, either because no homograph label exists, or because yours has some specific Cyrillic code point in it that would distinguish it.

The ability to use the variant mechanism for such cases is the main difference between using LGRs (aka IDN tables) and using the protocol to enforce restrictions. The cross-script variant mechanism is a more appropriate method to address issues like that, because it doesn't blindly ban perfectly acceptable uses of code points, but instead resolves any conflicts in favor of the first mover.

This reflects linguistic reality - in most cases, cross-script labels that are homographs would look more than a bit "contrived", such as the license plate with the Russian expletive AXY HEXO that some Unicoder had for a while - it makes no sense in English, and passed the DMV.
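
As an illustration of the cross-script variant mechanism described above, here is a minimal Python sketch. It is not taken from any actual LGR: the variant table, `variant_key`, and the `Zone` class are hypothetical names made up for this example, and the table lists only a handful of Cyrillic/Latin look-alikes. It only aims to show first-come-first-served blocking on a shared variant key, as opposed to banning code points outright.

```python
# Hypothetical, tiny cross-script variant table (an assumption for
# illustration): Cyrillic code point -> Latin look-alike.
# A real LGR defines such sets per script, with many more entries.
CROSS_SCRIPT_VARIANTS = {
    "\u0443": "y",  # CYRILLIC SMALL LETTER U
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
}


def variant_key(label: str) -> str:
    """Replace each code point by the representative of its variant set."""
    return "".join(CROSS_SCRIPT_VARIANTS.get(ch, ch) for ch in label)


class Zone:
    """Toy registry: labels sharing a variant key block each other."""

    def __init__(self) -> None:
        self._holders = {}  # variant key -> label registered first

    def register(self, label: str) -> bool:
        key = variant_key(label)
        if key in self._holders:
            return False  # same label or a blocked variant already exists
        self._holders[key] = label
        return True


zone = Zone()
zone.register("yp")                  # Latin label, accepted (first mover)
zone.register("\u0443\u0440")        # Cyrillic homograph, blocked
zone.register("\u0443\u0440\u0431")  # contains U+0431, no collision, accepted
```

Note that no code point is banned here: the Cyrillic letters remain usable in any label whose variant key is still free, which is the property the variant mechanism is meant to preserve.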

At one point, we did some research in one of the scripts and found that defining even a substantial number of such variants had little impact on blocking legitimate words. Had a similar number of code points been banned, the utility of the IDN labels would have been seriously compromised. For that reason alone, simply banning code points is more restrictive than necessary.

> However, if we accept the logic that justifies IDN TLDs,
> that sequence would be acceptable in a subtree of a domain
> whose TLD label was in Cyrillic and that published and applied a
> policy that its subdomains were entirely Cyrillic.

There are other RZ restrictions (no digits, no hyphen) that aren't appropriate for zones lower down the tree. And there are a few cases where it wasn't possible to cater to all languages in a given script simultaneously, as some had conflicting conventions (for complex scripts). Such cases, while rare, are another good reason to use the Root Zone as a starting point (and not the end-point) of LGR design for other zones.

However, the audience of 90%+ of existing zones would have been served equally well had their IDN tables and policies been based on nothing more than the RZ-LGR plus digits and hyphen. The stuff that the Root Zone restricts is very often simply not in use.

>
>> Or...
> I think the above puts us well into the "or..." range.
>

I think we've come to that conclusion before - and we keep revisiting ground that I thought had somewhat settled.

We had been working on a two-prong approach, of which your ID was one of the prongs:

1) to reiterate that the "raw" protocol doesn't by itself design useful and secure LGRs (aka IDN tables/policies), but requires conscious selection (and other decisions).

2) to give guidance as to what the issues are and how to address them.

Now, we thought that the best approach for the second prong was to create a registry of "troublesome" code points. That turned out to be more challenging than anticipated.

Not because it is difficult to come up with a list. That's actually not as difficult as it looks - for modern scripts.

No, the issue is that the code points, like your example of "yp", are not problematic in isolation. What then should get listed? If we list just the Cyrillic code points, then people creating a Cyrillic zone will complain loudly that the entry is unjustified. If we list both the Latin and the Cyrillic code points, on the basis that their interaction is problematic, we get that reaction from two groups.

However, in the RZ effort, both groups were perfectly happy defining mutually blocked variants - in fact, we had to stop them from going overboard.

. . .

Where do we need to go?

We need some recommendation that goes beyond "know what you are doing" (while true, it's too vague to be helpful).

We need a set of recommendations that is more or less like the example I wrote down for the combining marks. Something that's specific enough that people can act on it, but that can also be adjusted to fit circumstances.

In writing recommendations for (more) "secure" IDNs, it's important to make a distinction between modern-use scripts and the vast number of historical scripts and technical (phonetic) notations. The latter are really not well suited to secure use in public zones. (The protocol must cater for non-public zones as well.)

Now, one option would be to create something like a "profile" on IDNA2008 for "(more) secure IDNs for public zones." Doing that would sidestep the issue of backwards compatibility because you could be more restrictive; however, any enumerated restrictions would still run into the same issues of changes in usage (spelling reforms) or changes in encoding.

However, qualitative descriptions wouldn't suffer as much from that issue; I think they would add value, whether as recommendation or as "profile".

A./