Re: [Idna-update] Expiration impending: <draft-klensin-idna-rfc5891bis-01.txt>
Asmus Freytag <asmusf@ix.netcom.com> Thu, 08 March 2018 20:02 UTC
To: idna-update@ietf.org
References: <C4FBCF12821031786F472AA2@PSB>
From: Asmus Freytag <asmusf@ix.netcom.com>
Message-ID: <02c29140-29f1-cc81-8c4f-8249d0f23b2c@ix.netcom.com>
Date: Thu, 8 Mar 2018 12:02:52 -0800
Archived-At: <https://mailarchive.ietf.org/arch/msg/idna-update/M1d0fBC7wGu5vHLoNMudUCQ8SOc>
On 3/8/2018 9:11 AM, John C Klensin wrote:
> One more thought about something it may be helpful to remind
> ourselves periodically.
>
> One of the notions floated when IDNs were first being discussed
> and raised a few times after that was to simply ban combining
> forms of all types. The theory was that there was no
> entitlement to write any character form (e.g., any "word") in
> DNS labels, that those labels were all about mnemonics and
> nothing but mnemonics, and that a "no combining sequences" rule
> would eliminate most issues about normalization and grapheme
> clusters and make everything a lot easier to explain to
> implementers and others who wanted to conform but were not
> willing to make the investment to actually understand all of the
> complex issues and rules with which we are now dealing.

It would also very nicely prevent IDNs on the entire Indian
Subcontinent.

Ironically, it is in Arabic that most (if not all) combining marks
can safely be excluded from IDNs. This is being done for the Root
Zone, for example, as you acknowledge below. The reason is that in
Arabic these marks are generally optional and/or used for specific
purposes in specialized text. Leaving them out is therefore not
detrimental to the usability of IDNs, and the Root Zone will not
allow them (extending the set of marks prohibited by RFC 5564).
Consequently, in the Root Zone the addition of U+08A1 in Unicode 7.0
would not be an issue.

>
> If that were the rule and someone really, really, wanted a
> grapheme that could only be formed in Unicode with a combining
> sequence, it would be up to them to convince the Unicode
> Consortium that their favorite character (grapheme) needed to be
> added to Unicode as a single code point. However hard that
> might be, it would not be our problem.

Well, we see what happens if someone does get Unicode to add a
precomposed form: the entire process is derailed.
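To make the trade-off concrete, here is a minimal sketch of a hypothetical "no combining sequences" filter. The function name `passes_no_combining_rule` is invented for illustration and does not come from any IDNA document; the sketch assumes the rule means "normalize the label to NFC, then reject it if any remaining code point is a combining mark (general category Mn, Mc or Me)". It uses only Python's standard `unicodedata` module, so results depend on the Unicode version bundled with the Python release.

```python
import unicodedata

# General categories for combining marks: nonspacing, spacing, enclosing.
MARK_CATEGORIES = {"Mn", "Mc", "Me"}


def passes_no_combining_rule(label: str) -> bool:
    """Hypothetical rule: after NFC, the label may contain no combining marks."""
    nfc = unicodedata.normalize("NFC", label)
    return all(unicodedata.category(ch) not in MARK_CATEGORIES for ch in nfc)


examples = {
    # "e" + COMBINING ACUTE ACCENT composes to U+00E9 under NFC.
    "cafe\u0301": "Latin, composable accent",
    # Hindi: vowel signs (Mc) and virama (Mn) have no precomposed forms.
    "\u0939\u093f\u0928\u094d\u0926\u0940": "Devanagari 'hindi'",
    # U+0628 BEH + U+0654 ARABIC HAMZA ABOVE: no composition exists.
    "\u0628\u0654": "Arabic BEH + combining HAMZA ABOVE",
    # U+08A1 ARABIC LETTER BEH WITH HAMZA ABOVE: a single code point.
    "\u08a1": "Arabic U+08A1",
}

for label, description in examples.items():
    print(f"{description}: {passes_no_combining_rule(label)}")
```

Under that assumed rule the composable Latin accent passes, the Hindi word is rejected because its vowel signs and virama have no precomposed forms, and U+08A1 is accepted while the similar-looking sequence U+0628 U+0654 is rejected.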
That's what happened with the addition of U+08A1, even though it is
NOT the case that this is a true "precomposed" form: while the same
graphical elements are involved, the result does not look identical.
There is a strong similarity, of course, but despite what people
read into the Unicode character name, this is not a case of an
exact homoglyph.

>
> FWIW, a "nothing but precombined characters" rule is essentially
> the recommendation for Arabic IDNs in RFC 5564 and, I
> understand, in the emerging Arabic script rules for the root
> zone.

That is because of the way these marks function in Arabic. Unicode
could not, in general, add precomposed forms of Latin letters (for
example, a new letter with a dot above), because of normalization
stability. In the Latin script, a base letter followed by a
combining dot above and the corresponding precomposed letter are
canonically equivalent. However, even there you have a number of
combining marks that never take part in canonical decompositions:
the code points for the various (stroke) overlays and some attached
extenders. Like the Arabic combining marks, they could (and should)
be disallowed from LGRs. (The Latin LGR for the Root Zone will not
allow combining marks other than in enumerated combinations; that
approach works for Latin, Greek, Cyrillic and practically all
scripts that are not South or South East Asian complex scripts.)

>
> We didn't go down that path, not only because of impassioned
> pleas for some of the character forms that might be excluded but
> because of precisely the reassurance that arguably led to the
> non-decomposing characters thread -- assurances that no new code
> points would be added to Unicode if there were already a
> combining sequence that could reasonably substitute for it in
> the same script except under very unusual circumstances and,
> when those circumstances occurred, the new code points would
> decompose to those sequences.

Other participants in that discussion remember this claim
differently.
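The difference between marks that participate in canonical decompositions and marks that do not can be checked directly. A minimal sketch, assuming canonical equivalence is tested by comparing NFD forms (the helper name `canonically_equivalent` is hypothetical): the Latin letter with dot above matches its combining sequence, while L WITH STROKE and U+08A1 have no decomposition mapping and so never match their similar-looking sequences.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent iff their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


pairs = [
    # U+0227 LATIN SMALL LETTER A WITH DOT ABOVE vs "a" + U+0307 COMBINING DOT ABOVE
    ("\u0227", "a\u0307"),
    # U+0142 LATIN SMALL LETTER L WITH STROKE vs "l" + U+0337 COMBINING SHORT SOLIDUS OVERLAY
    ("\u0142", "l\u0337"),
    # U+08A1 ARABIC LETTER BEH WITH HAMZA ABOVE vs U+0628 BEH + U+0654 HAMZA ABOVE
    ("\u08a1", "\u0628\u0654"),
]

for precomposed, sequence in pairs:
    name = unicodedata.name(precomposed)
    # decomposition() returns "" when a code point has no decomposition mapping.
    mapping = unicodedata.decomposition(precomposed) or "(none)"
    print(f"{name}: decomposition={mapping}, "
          f"equivalent={canonically_equivalent(precomposed, sequence)}")
```

Only the first pair is equivalent; the overlay and the Arabic case are distinct encodings as far as normalization is concerned.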
Unicode Normalization Forms C and D were never about "reasonable
substitutions" but about "exact equivalents", or "the same thing,
except for the encoding". As there is generally no benefit in
encoding another representation of "the same thing", Unicode does
not allow the addition of precomposed code points that can be
decomposed into an exact equivalent.

>
> I don't suggest that we try to reverse that decision at this
> point. I assume that, if nothing else, it would just be too
> disruptive. However, it is worth pointing out that a "no
> combining sequences" rule would eliminate the non-decomposing
> character problem and at least a few other potential spoofing
> and related cases. It might also be worth examining as a
> guideline or advice for registries who are interesting in
> raising the safety level of what they allow to be registered
> without having to understand the underlying issues more deeply.

A useful set of recommendations for handling combining marks safely
in LGRs would consist of:

1) In all non-complex scripts: allow only fixed enumerations of base
   code point and combining marks. (The number of required
   combinations is small, even for a sprawling script like Latin.)

2) In all complex scripts (where the number of combinations is too
   large): provide context rules that ensure combining marks are not
   placed in the wrong part of a syllable (such wrong contexts cannot
   be "read" by humans and are not rendered correctly by machines).
   The Root Zone LGR presents suitable examples of this for the South
   East Asian scripts; examples for the Indic scripts are in
   preparation.

3) In scripts where combining marks express optional elements
   (vowels, etc.): disallow all of them. (Arabic; see the Root Zone
   LGR for an example.)

4) In scripts where combining marks are used for historical or
   special purposes: disallow those (diacritics for classical Greek,
   stroke overlays and other marks for linguistics).

5) In LGRs supporting variants: consider mutually blocking labels
   that vary only in the presence or absence of some combining
   diacritic; some diacritics are not reliably distinguished from
   each other (comma below, cedilla) or from an undecorated base
   character (forms with/without Nukta).

What is a complex script? A good approximation is any script that
contains a code point with ccc=Virama, plus Thaana, which isn't
really complex but has mandatory vowels classed as combining marks.
Effectively, that means all South and South East Asian scripts that
are abugidas.

In addition, you will want recommendations that actually address
the homoglyph issues: there are a fair number of exact visual
duplicates among non-combining code points in Unicode. These are not
exact equivalents, because Unicode does not treat code points that
differ in script, case or digit/letter properties as equivalent.

A./

>
> best,
> john
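Several of the LGR recommendations above (1 and 5, plus the ccc=Virama approximation for complex scripts) can be expressed as simple checks. The sketch below is illustrative only and uses Python's standard `unicodedata` module. Its assumptions: the code point ranges stand in for the Unicode Script property (which `unicodedata` does not expose); the single enumerated sequence, the folding of comma below into cedilla, and the dropping of Nukta are hypothetical examples rather than rules from any actual LGR; and all helper names are invented.

```python
import unicodedata

# Hypothetical code point ranges standing in for whole scripts. A real LGR
# would use the Unicode Script property, which unicodedata does not expose.
SCRIPT_RANGES = {
    "Latin": range(0x0000, 0x0250),  # Basic Latin through Latin Extended-B only
    "Greek": range(0x0370, 0x0400),
    "Arabic": range(0x0600, 0x0700),
    "Thaana": range(0x0780, 0x07C0),
    "Devanagari": range(0x0900, 0x0980),
    "Thai": range(0x0E00, 0x0E80),
}
VIRAMA_CCC = 9  # Canonical_Combining_Class value named "Virama"


def is_complex_script(script: str) -> bool:
    """Approximation: the script has a code point with ccc=Virama, or is Thaana."""
    if script == "Thaana":
        return True  # no virama, but mandatory vowels are combining marks
    return any(unicodedata.combining(chr(cp)) == VIRAMA_CCC
               for cp in SCRIPT_RANGES[script])


# Recommendation 1 (hypothetical enumeration): base+mark pairs that a
# non-complex-script LGR explicitly permits. "g" + COMBINING TILDE has no
# precomposed form in Unicode.
ALLOWED_SEQUENCES = {"g\u0303"}


def marks_only_in_enumerated_sequences(label: str) -> bool:
    """After NFC, every combining mark must follow a base in an allowed pair.

    Only single-mark sequences are modeled; stacked marks are rejected.
    """
    nfc = unicodedata.normalize("NFC", label)
    for i, ch in enumerate(nfc):
        if unicodedata.category(ch).startswith("M"):
            if i == 0 or nfc[i - 1] + ch not in ALLOWED_SEQUENCES:
                return False
    return True


# Recommendation 5 (hypothetical folding): COMMA BELOW is treated as CEDILLA,
# and DEVANAGARI SIGN NUKTA is ignored, when computing a blocking key.
FOLD_MARKS = {"\u0326": "\u0327"}  # COMBINING COMMA BELOW -> COMBINING CEDILLA
DROP_MARKS = {"\u093C"}            # DEVANAGARI SIGN NUKTA


def blocking_key(label: str) -> str:
    """Labels that share a key are candidates for mutual blocking."""
    nfd = unicodedata.normalize("NFD", label)
    return "".join(FOLD_MARKS.get(ch, ch) for ch in nfd if ch not in DROP_MARKS)


print(is_complex_script("Devanagari"), is_complex_script("Latin"))
print(marks_only_in_enumerated_sequences("g\u0303"),
      marks_only_in_enumerated_sequences("l\u0337"))
print(blocking_key("\u0219") == blocking_key("\u015F"))  # s comma below vs s cedilla
print(blocking_key("\u0958") == blocking_key("\u0915"))  # QA vs KA
```

With these definitions, U+0219 (s with comma below) and U+015F (s with cedilla) share a blocking key, as do U+0958 DEVANAGARI LETTER QA and its base U+0915 KA. Thaana is flagged only through the explicit exception, since none of its marks carries ccc=Virama.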