Re: [I18nrp] Last Call: <draft-faltstrom-unicode11-05.txt> (IDNA2008 and Unicode 11.0.0) to Informational RFC
Asmus Freytag <asmusf@ix.netcom.com> Tue, 04 December 2018 01:33 UTC
To: i18nrp@ietf.org
References: <154385119878.18333.5085298134102919486.idtracker@ietfa.amsl.com> <FF6F9EB9-C73B-4EC0-AC4F-3E3BFBABA0AB@vpnc.org> <8E20D432-01B0-4B52-80BB-3348C5FE73AF@vpnc.org>
From: Asmus Freytag <asmusf@ix.netcom.com>
Message-ID: <ef327a06-fb46-811a-5225-5ba8ec7a88fe@ix.netcom.com>
Date: Mon, 03 Dec 2018 17:33:20 -0800
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18nrp/AQFoEADDFq6x6sDDP2QRDKlYzbA>
On 12/3/2018 4:02 PM, Paul Hoffman wrote:
> Before I go to the ietf@ietf.org mailing list with my concerns about
> this draft, I hope it is OK to bounce them off people here in case I'm
> wildly off track.
>
> =====
>
> In Section 1:
>    Specifically, the Internet Architecture Board did issue a statement
>    [IAB] which requested IETF to resolve the issues related to the code
>    point ARABIC LETTER BEH WITH HAMZA ABOVE (U+08A1), introduced in
>    Unicode 7.0.0 [Unicode-7.0.0]. This document resolves this issue and
>    suggests IDNA2008 standard is to follow the Unicode Standard and not
>    update RFC 5892 [RFC5892] or any other IDNA2008 RFCs.
>
> In Section 4.1:
>    The discussion in the IETF concluded that although it is possible to
>    create "the same" character in multiple ways, the issue with U+08A1
>    is not unique. In the case of U+08A1, it can be represented with the
>    sequence ARABIC LETTER BEH (U+0628) and ARABIC HAMZA ABOVE (U+0654).
>    Just like LATIN SMALL LETTER A WITH DIAERESIS (U+00E4) can be
>    represented via the sequence LATIN SMALL LETTER A (U+0061), and
>    COMBINING DIAERESIS (U+0308). One difference between these sequences
>    is how they are treated in the normalization forms specified by the
>    Unicode Consortium.
>
> This sounds like the IETF is saying that if the Unicode Consortium
> changes how a character appears in a normalization form other than for
> case folding (Section 2.2 of RFC 5892), that change does not affect
> the tables for IDNA2008. Is that correct?

It is actually not correct that these are necessarily the "same" characters, if by "same" you mean _identical_ glyph outlines in high-resolution, high-quality layout. While U+08A1 contains a graphical element that looks like the letter BEH and another that looks like a HAMZA, the relative placement of the two is not necessarily the same as when BEH and HAMZA are used independently.
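The normalization difference mentioned in the quoted Section 4.1 text can be checked directly with Python's standard `unicodedata` module. This is a minimal sketch, assuming a Python 3 build whose Unicode database is version 7.0 or later (so that U+08A1 is known); the `compare` helper and the constant names are our own, chosen for illustration.

```python
import unicodedata

# LATIN SMALL LETTER A WITH DIAERESIS, precomposed and as a sequence.
A_UMLAUT = "\u00E4"
A_PLUS_DIAERESIS = "\u0061\u0308"

# ARABIC LETTER BEH WITH HAMZA ABOVE, and the lookalike sequence
# ARABIC LETTER BEH + ARABIC HAMZA ABOVE.
BEH_HAMZA = "\u08A1"
BEH_PLUS_HAMZA = "\u0628\u0654"


def compare(single, sequence):
    """Return (decomposition of `single`, whether NFC(sequence) == single,
    whether the two are canonically equivalent)."""
    decomposition = unicodedata.decomposition(single)
    composes = unicodedata.normalize("NFC", sequence) == single
    # Canonical equivalence: both strings have the same NFD form.
    equivalent = (unicodedata.normalize("NFD", single)
                  == unicodedata.normalize("NFD", sequence))
    return decomposition, composes, equivalent


if __name__ == "__main__":
    print("U+00E4:", compare(A_UMLAUT, A_PLUS_DIAERESIS))
    print("U+08A1:", compare(BEH_HAMZA, BEH_PLUS_HAMZA))
```

The first line should report the decomposition `0061 0308` and show that the precomposed letter and the sequence are canonically equivalent; the second should report an empty decomposition and no equivalence. Normalization therefore treats Ä and a + U+0308 as alternate encodings of one letter, while U+08A1 and U+0628 U+0654 remain distinct strings.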
This effectively reflects that the combination is an independent letter, not an alternate encoding for the sequence. Latin Ä is different: there, high-resolution, high-quality rendering ideally results in exactly the same appearance, reflecting the fact that the precomposed code point and the sequence are alternate encodings of the same letter.

Unicode normalization has never been a full "glyph folding", and it was never intended to be one: its design point is to fold alternate encodings of the same abstract entity. This is not the same as preventing exact lookalikes, because some entities are quite obviously logically distinct, yet are lookalikes. The cases of Latin/Greek capitals (not so relevant for IDNs) and Latin/Cyrillic (both capitals and lowercase, and therefore more relevant for IDNs) are well known. Everybody accepts that once a letter is part of a different script, it is different and cannot be normalized.

Another distinction that cannot be normalized is that between letters and digits. Yet there are quite a few scripts where native digits sometimes have the same shape as a native letter. (In Latin we have 0 and O; in our typographic tradition most, but not all, fonts strive to make these distinct. That is not so in other scripts, where the shapes are often identical.) Again, there is a shared understanding that normalizing digits to letters causes more problems than it solves, and therefore the two are treated as distinct.

What I see Patrik's document reflecting is the inherent limitation of the normalization algorithm: it folds alternate encodings of the same abstract entity, but does not purport to be a universal glyph folding. Given that the set of PVALID code points has long included both code points and sequences that have identical glyphs, it was surprising that the process was stopped for something that was merely a set of close lookalikes.

Finally, there is nothing here about a "change" in normalization.
Whether a code point contemplated for addition should have a decomposition depends on the identity of the abstract entity it represents, not simply on its shape, and certainly not on an approximate shape or a shape-oriented character name. Moreover, once a decomposition is found to be appropriate, the stability rules actually stipulate that the code point should not be added, so the only permissible cases are code points that represent a distinct entity not otherwise encoded. To continue that line of thought, the fact that some entities share a shape (or that some sequences could result in a more or less identical shape) does not affect this determination. Unicode is fundamentally not a Lego set for creating glyph shapes; what is encoded is abstract characters (even if there is the historical baggage of precomposed characters, in the face of open-ended use of diacritics, that has led to normalization in the Latin, Greek, and Cyrillic scripts).

> =====
>
> In Section 4.1:
>    As U+08A1 is discussed in draft-freytag-troublesome-characters
>    [I-D.freytag-troublesome-characters] and elsewhere. Regardless of
>    whether those discussions ends in recommending including the code
>    point in the repertoire of characters permissable for registration or
>    not, it is acceptable to allow the code point to have a derived
>    property value of PVALID.
>
> This sounds like it is saying that even though
> draft-freytag-troublesome-characters is meant for standards track,
> because it is not yet finished, this document (which is informational)
> can ignore the other document and make changes to the IANA registry.
> If that's correct, it concerns me because it could make the IANA
> registry unstable for characters that we know about and are actively
> discussing. If I'm not correct, I'd like to hear why so that maybe
> this document can be reworded.
The set of PVALID code points is overbroad: it represents an outer envelope of what is allowed under the protocol for ANY given zone, but it is much wider than what should be RECOMMENDED for inclusion in ALL zones (or, setting aside the special restrictions for the Root, recommended for all public zones). The Internet-Draft cited in Section 4.1 is clear that it does not intend to change the set of PVALID code points; there are some zones where allowing any PVALID code point may well not cause issues. However, in a widely shared zone, particularly one with users from different languages and scripts, a well-chosen subset of the PVALID code points would reduce security issues and improve usability.

The details of the best approach in these cases are not always obvious. In Arabic, for example, it is the combining HAMZA that would become NOT RECOMMENDED, because (unlike the letter U+08A1 and similar letters) it is not needed to form "useful mnemonics" in Arabic or any of the other languages using the Arabic script. PVALID code points also include some that look like an exclamation mark (!) or an apostrophe ('); these, too, are not to be recommended for general use.

In the Indic scripts there are many examples of compound glyphs that can be achieved by more than one sequence of code points. In these instances, the Unicode Standard has gone on record identifying the sequences that MUST NOT be used. (As only one of the alternate sequences is legitimate, there is no normalization, because normalization would have implied that both sequences are equivalent.) Unfortunately, IDNA2008 lacks a scalable mechanism to DISALLOW sequences of code points. (The only available formalism, CONTEXTO rules, does not scale beyond a few cases, as it is not presented in a machine-readable format, unlike, for example, RFC 7940.) The Internet-Draft cited in Section 4.1 may well RECOMMEND that registries not allow such sequences (and that they use machine-readable specifications based on RFC 7940 to do so).
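To make the distinction between the protocol-level PVALID envelope and a per-zone policy concrete, here is a minimal Python sketch. Everything in it is hypothetical: the repertoires are tiny illustrative subsets assumed to lie within PVALID, `check_label` is our own helper, and the plain data structures merely stand in for what an RFC 7940 Label Generation Ruleset would express in its own XML format. The Arabic repertoire keeps U+08A1 but leaves out the combining HAMZA (U+0654). The Devanagari data lists one forbidden sequence, U+0905 U+093E, which to our understanding the Unicode Standard says not to use in place of U+0906 DEVANAGARI LETTER AA.

```python
import unicodedata

# Hypothetical zone repertoire: a subset assumed to lie inside PVALID.
ARABIC_ZONE_REPERTOIRE = {
    "\u0627",  # ARABIC LETTER ALEF
    "\u0628",  # ARABIC LETTER BEH
    "\u0644",  # ARABIC LETTER LAM
    "\u0645",  # ARABIC LETTER MEEM
    "\u08A1",  # ARABIC LETTER BEH WITH HAMZA ABOVE
    # U+0654 ARABIC HAMZA ABOVE is deliberately omitted (NOT RECOMMENDED).
}

# Rough illustrative Devanagari block (U+0905..U+094D), not a vetted repertoire.
DEVANAGARI_ZONE_REPERTOIRE = {chr(cp) for cp in range(0x0905, 0x094E)}

# Sequences this hypothetical zone refuses even though every code point in
# them is in the repertoire: LETTER A + VOWEL SIGN AA (use U+0906 instead).
DEVANAGARI_FORBIDDEN_SEQUENCES = ["\u0905\u093E"]


def check_label(label, repertoire, forbidden=()):
    """Return the reasons a label is refused (an empty list means accepted)."""
    problems = []
    for ch in label:
        if ch not in repertoire:
            problems.append("U+%04X %s not in zone repertoire"
                            % (ord(ch), unicodedata.name(ch, "?")))
    for seq in forbidden:
        if seq in label:
            problems.append("forbidden sequence "
                            + " ".join("U+%04X" % ord(c) for c in seq))
    return problems
```

For example, `check_label("\u0628\u0654", ARABIC_ZONE_REPERTOIRE)` refuses the combining HAMZA, while `check_label("\u0906", DEVANAGARI_ZONE_REPERTOIRE, DEVANAGARI_FORBIDDEN_SEQUENCES)` accepts the single vowel letter. Keeping the repertoire and the forbidden sequences as zone data, rather than as protocol tables, is the point of the design: different zones can apply different data without touching the set of PVALID code points.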
However, that entire discussion is unaffected by Patrik's document, and unaffected by IANA proceeding within the existing parameters for IDNA2008.

A deeper linguistic reason for keeping the domains of these IDs separate is that best practices for supporting complex scripts in public zones would implement policies with context rules that go beyond preventing specific alternate sequences. In many cases, the latter fall out from applying more general restrictions that reflect, for example, the syllable structure that is a common feature of complex scripts. Such general restrictions are sensitive not only to the script but, in some cases, to the language(s) to be supported. A one-size-fits-all approach (as would be required at the protocol level) is therefore inappropriate.

Perhaps the text of Section 4.1 could be edited to make clear that the "repertoire" mentioned is to be understood as the repertoire for "any particular zone", as opposed to something like the set of PVALID code points. With that clarification, I see no issue in that section.

A./