Re: [I18nrp] Last Call: <draft-faltstrom-unicode11-05.txt> (IDNA2008 and Unicode 11.0.0) to Informational RFC
Asmus Freytag <asmusf@ix.netcom.com> Tue, 04 December 2018 18:48 UTC
To: i18nrp@ietf.org
References: <154385119878.18333.5085298134102919486.idtracker@ietfa.amsl.com> <FF6F9EB9-C73B-4EC0-AC4F-3E3BFBABA0AB@vpnc.org> <8E20D432-01B0-4B52-80BB-3348C5FE73AF@vpnc.org> <A5B69D318689A6515CCB4883@PSB> <E5A7B829-FE59-4649-AE36-2918E910A667@frobbit.se> <95EB3D3321F6E91A22C9FE3B@PSB>
From: Asmus Freytag <asmusf@ix.netcom.com>
Message-ID: <db1db0c8-2b05-1ba9-8a15-71de3713fc14@ix.netcom.com>
Date: Tue, 04 Dec 2018 10:48:45 -0800
In-Reply-To: <95EB3D3321F6E91A22C9FE3B@PSB>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18nrp/e7f6WeG0HMN_uD1Tliz6NfFciLk>
On 12/4/2018 2:33 AM, John C Klensin wrote:
> Incidentally, if Asmus's comments about the insufficiency of the
> CONTEXTo model for a range of problems is correct, we should be
> looking at whether IDNA2008 is in need of changes/ updating.
> Those changes might turn out to be substantive or as minor as
> making it much more clear that what IDNA2008 permits is a
> superset of what should be allowed in a particular zone and who
> has responsibility (and accountability) for selecting a
> zone-specific subset (aided by whatever hints we can provide).
> It seems clear to me that we should not process this document
> until we have answers to, or at least much more clarity about,
> that issue. YMMD.

Complex scripts (such as Indic and related scripts) are conceptually written as a stream of syllables. However, what is encoded is the set of elements that make up syllables. This leads to a well-formedness problem: it is possible to type sequences that are not well-formed syllables.

There are two kinds: syllables that "can't happen" given the phonetics of a given language, and syllables that are structurally unsound (that violate the principles of the writing system). Both can be problematic. The latter are a problem because unsound sequences can look exactly like correct sequences or, alternatively, because unsound sequences are not supported by renderers and fonts and lead to unpredictable display. The "can't happen" sequences may cause issues for readers; the best analogy is that they are mentally processed like a nonsense letter (such as the ones invented by Dr. Seuss in "On Beyond Zebra"), instead of like a nonsense "word", as an arbitrary sequence of Latin letters would be.

Structurally unsound sequences should be prohibited by policy, while some or all of the phonetically impossible sequences may be worth prohibiting for added security, but the latter are language dependent.
(Some languages may differ in what constitutes a structurally sound syllable as well, an issue we encountered in the Thai script, where renderers designed for the Thai language will not handle some of the other languages.) Because of the (potential/actual) language dependency, many of these constraints cannot be embedded in a protocol layer like IDNA2008.

There is a subset of unsound sequences that Unicode calls out as having a "do not use" status independent of language (see chapter 12 of the standard for examples). These could be captured with CONTEXTO (or similar) constraints in an updated IDNA2008. However, they are always a subset of the "real" set of unsound sequences; any registry doing a good job for complex scripts would therefore find that implementing the CONTEXTO constraints becomes redundant. (Also, unlike rules expressed per RFC 7940, these rules are not machine readable and therefore not trivial to implement and verify.)

Fundamentally, IDNA2008 primarily focuses on repertoire, the few CONTEXTx rules notwithstanding. Changing that model would be a major undertaking, and it is questionable whether the results would be beneficial.

Perhaps some other tack could be taken: in your draft, it could be pointed out that Unicode assigns "do not use" to some sequences (which means that there is no text, identifier or not, that should contain them). Preventing their registration could be declared part of the responsibility of registries under RFC 5891. This approach is more flexible because it allows registries to adopt context rules that are a superset of the rules needed to exclude these particular sequences.

Which is precisely what we are doing in the Root Zone.
If you go to http://icann.org/idn and look for the set of RZ-LGR proposals, you will find that the context rules for the various Indic scripts do not explicitly call out these "do not use" sequences. Instead, they adopt context rules on some of the constituent code points that also disallow those sequences, while implementing a more general constraint appropriate to the writing system.

For an example, see section 7 of

https://www.icann.org/en/system/files/files/proposal-devanagari-lgr-27jul18-en.pdf

This shows the more generalized rules for Devanagari, but also where they had to be tweaked to allow for both Hindi and other languages, e.g. Santali.

For comparison, check, for example, Table 12-1 in

http://www.unicode.org/versions/latest/ch12.pdf

You will find that Table 12-1 is covered by the rule:

    3. M: must be preceded by C or CN

(Read: "Dependent vowel signs (matras) must follow a consonant with or without an optional Nukta".)

This rule prevents dependent vowel signs from being combined with independent vowel letters, which is what the list in Table 12-1 enumerates. However, the RZ-LGR rule also covers many other contexts that are inappropriate (unsound) for matras; some of the latter constraints may not be as universal and, unlike the prohibition of Table 12-1, would be out of scope for a protocol.

The Root Zone LGR uses RFC 7940 to translate this rule into a single line of XML, which can be machine processed.

A./
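As a rough illustration of rule 3 above ("M: must be preceded by C or CN"), here is a minimal Python sketch. It is not the RFC 7940 XML used by the Root Zone LGR; the function name `violates_matra_rule` is hypothetical, and the character classes are simplified assumptions (basic consonants U+0915..U+0939, nukta U+093C, vowel signs U+093E..U+094C) rather than the actual repertoire in the Devanagari proposal.

```python
# Simplified, assumed character classes; NOT the actual RZ-LGR repertoire.
CONSONANTS = {chr(cp) for cp in range(0x0915, 0x093A)}  # KA..HA
NUKTA = "\u093C"                                        # DEVANAGARI SIGN NUKTA
MATRAS = {chr(cp) for cp in range(0x093E, 0x094D)}      # vowel signs AA..AU


def violates_matra_rule(label: str) -> bool:
    """Return True if some matra (M) is not preceded by C or by C + N."""
    for i, ch in enumerate(label):
        if ch not in MATRAS:
            continue
        prev = label[i - 1] if i > 0 else ""
        if prev in CONSONANTS:
            continue  # context "C M" is allowed
        if prev == NUKTA and i > 1 and label[i - 2] in CONSONANTS:
            continue  # context "C N M" is allowed
        return True   # matra after a vowel letter, another matra, or nothing
    return False


# KA + vowel sign AA: allowed by the rule.
print(violates_matra_rule("\u0915\u093E"))
# Vowel letter A + vowel sign AA: the kind of sequence the rule excludes.
print(violates_matra_rule("\u0905\u093E"))
```

The second call uses an independent vowel letter followed by a dependent vowel sign, which is the pattern the text says Table 12-1 enumerates. The same check also rejects a matra after another matra or at the start of a label, which shows how one general context rule can cover more than the enumerated "do not use" list.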