[I18nrp] The evolutionary future of IDNA (was: Re: Last Call: <draft-faltstrom-unicode11-05.txt> (IDNA2008 and Unicode 11.0.0) to Informational RFC)
John C Klensin <john-ietf@jck.com> Thu, 06 December 2018 16:01 UTC
Date: Thu, 06 Dec 2018 11:01:08 -0500
From: John C Klensin <john-ietf@jck.com>
To: Paul Hoffman <paul.hoffman@vpnc.org>, i18nrp@ietf.org
Message-ID: <9F6A8117BA3220C4447B1D72@PSB>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18nrp/ArYd2SlLLTVfr4R62ojqcKp4EAQ>
Subject: [I18nrp] The evolutionary future of IDNA (was: Re: Last Call: <draft-faltstrom-unicode11-05.txt> (IDNA2008 and Unicode 11.0.0) to Informational RFC)

Hi.

Paul's note from Tuesday seems to raise some fundamental questions about where we are taking IDNA. They don't seem to me to have much to do with the debate about directorates or procedural correctness or even much to do with the argument for tuning draft-faltstrom-unicode11 and pushing it forward at this point unless the community concludes that advancing that document without understanding where we are headed is a bad idea. I happen to believe that, but I think the key issues Paul raises would still be key issues even if the community decides it is ok to advance draft-faltstrom-unicode11 and then use it as a new starting point. So I want to make a start at reviewing some context and then looking at those issues...

--On Tuesday, December 4, 2018 06:59 -0800 Paul Hoffman <paul.hoffman@vpnc.org> wrote:

>...
>> First of all, this document evaluates the individual changes
>> made up until and including Unicode 11. Sure, one could say
>> this has implications on the IETF view of existence of
>> normalization rules (or not) but that is not the intention
>> here. The result of this review should neither be
>> extrapolated to future versions of Unicode nor to future
>> evolutions of normalizations.
>
> This is good to know. In that case, could you either remove
> the "One difference between these sequences..." sentences, or
> add a sentence in the Introduction section that says "The
> result of this review should neither be extrapolated to future
> versions of Unicode." Either action would clear up this
> confusion.

I think such a sentence in the introduction is needed even if the "One difference" sentence is removed.

>>> =====
>> First of all, this document is (as it seems to me now) to be
>> Standards Track. So that issue is taken care of.
>
> Procedurally, it is not, I believe. A new draft needs to be
> issued, and the IETF Last Call has to start again.
> Fortunately, this IETF Last Call is only a few days old, so
> this should not delay anything much.

Noting that the Last Call has been withdrawn but other comments have disagreed with that decision, I concur about the procedural requirement.

Another reason the Directorate should be able to form an opinion and make a recommendation before the Last Call is restarted is that, for this particular case, if Patrik wants to reference other documents for details and explanations (which I agree is the right thing to do), we'd better be sure that those documents say what we want them to say and are at least reasonably stable. There is no assurance at all of that with expired documents that have had no meaningful discussion in the community. Were the Directorate to sort that out partway through the Last Call, we would have the risk (I'd guess high odds) of having to revise this document as well as some of those it references, meaning that the Last Call would be about an obsolete and replaced I-D, leaving the IESG to either try to guess at what the community would have said about the new drafts or to repeat the Last Call. Again. If we are looking for things to complain about procedurally, the IESG making a decision to advance a document when the Last Call was about a version that was quite different from the one being sent to the RFC Editor would be high on the list.

So, again, Directorate first, then requests from the Directorate to update and post relevant documents, then review by the Directorate, then posted recommendations from the Directorate, and only then an IETF Last Call on whatever document or documents the Directorate recommends.

However, the important part of this note starts with Paul's other (or perhaps main) concern.

>...
> This misses my concern. There is an active draft
> (draft-freytag-troublesome-characters) that seems to want to
> change the IANA registry. Your draft
> (draft-faltstrom-unicode11) also wants to change the registry,
> but in a different way. My question is whether we should be
> making the registry unstable in this way.

Possibly this has been adequately covered already but, if draft-freytag-troublesome-characters is not clear about what I'm about to say (which I think is consistent with Asmus's recent postings) it needs to be fixed. My understanding (as a co-author) is that we expect to end up with two separate registries (or one meta-registry with two or three subregistries -- a detail that requires discussion with IANA). Neither the code point registry nor the proposed new one is normative in the usual sense; the context rules registry is normative. They consist of:

(1) The registry of code points called for by IDNA2008 (including the Contextual Rules Registry) addressed by the present I-D. That registry should be stable except for additions unless some new discovery or change in Unicode requires reclassifying an existing code point or modifying or writing new Contextual rules. Changes due to the latter reasons (i.e., other than adding new code points) should be rare and made only after careful discussion, but are explicitly contemplated as possible by IDNA2008. It is perhaps worth noting that this is, by the design of IDNA2008, an inclusion registry: anything not explicitly permitted is forbidden.

(2) The registry of recommendations and advice for zone administrators to consider in deciding what code points, sequences, etc., to allow in their zones. draft-freytag-troublesome-characters is intended to establish and seed that registry. It is not expected to be stable but, instead, is expected to evolve to reflect new discoveries and evolving knowledge.

I hope this is clear from that draft, but there are three issues with that draft and one more general issue that are almost independent of its specific code point lists and tables and that almost certainly require community discussion. Personally, I not only don't know the answers but am torn about the tradeoffs I see so what follows is an attempt at a summary, not advice about what to do about them.

(2.1) Whether having this sort of list as an official or quasi-official IETF-provided list is a good idea at all. As clarified in draft-klensin-idna-rfc5891bis, IDNA2008 requires that zone administrators [1] register only labels that contain characters, and character sequences, drawn from scripts (and, where relevant, associated languages) they fully understand and which, of course, conform to the parts of IDNA2008 now under discussion. It has been widely observed that many or most ICANN-Accredited Registrars and the Registries they support ignore that rule. The IETF can, at this point, go down either of two paths (or can continue to ignore the issue, which I'm fairly sure would be irresponsible).

One is to reaffirm the rule (in which case a global troublesome characters list may not be needed or must be put in a different context). Unless we like making statements that are generally ignored, we would also decide to use whatever formal and informal mechanisms we have to convince ICANN (at least) to use whatever mechanisms they have to hold registrars and registries who are blatantly ignoring those rules accountable.

The other is to conclude that registration of strings that the registrar and registry may not understand or be willing to take responsibility for is today's new normal and that providing advice for those registrations who lack deep understanding is a good idea (in which case draft-freytag-troublesome-characters is extremely relevant and draft-klensin-idna-rfc5891bis should be altered to explain the new reality and to update IDNA2008 (not just 5891 - see discussion in the latter I-D) to adjust the "don't register what you don't understand" rule to match the new understanding).
If we go down that path, we probably need to understand that those SLD registries whose label evaluation model is essentially "we rely completely on the registrars to do the right thing and will accept whatever they send us" and those registrars whose model is close to "you pay for it and we register it" (including combinations of the two that advertise violations of the specs or good sense because they think they can sell them) are unlikely to be affected by better guidance.

A modification of those options would be to give advice along the lines of "troublesome characters" to ICANN-accredited Registrars and TLD Registries and/or Registrars and Registries for so-called public domains but try to continue with the "don't register what you don't understand" rule inside enterprise and equivalent domains (most of whom, AFAICT, are mostly following it just because the alternative makes little sense). That would probably require (or at least benefit from) modification of both I-Ds. It would also not be clear in that circumstance whether the appropriate responsible party for the troublesome character list and corresponding registry is the IETF, ICANN, or someone else. Probably unfortunately, there are candidates for "someone else".

(2.2) Expanding from the above, there is a question about the audience for any new efforts in this area, especially attempts to give advice such as the "troublesome character" list. If my observations and a certain amount of logic are correct, most administrators of domains that are used with relatively deep structure intra-enterprise are already fairly careful about what they register if only because reasonable rules fall out from the way the enterprise (or equivalent organization) is organized and from responsiveness to usability by people within that enterprise.
If additional guidance is not needed at that level, then the question becomes mostly about what can usefully be done about SLDs (or, more generally, the domains at the boundary between "public" and "private" domains). To take an extreme example, it seems to me that there is no chance that the behavior of a registry that is happily making money registering and delegating labels that are clearly invalid under the IDNA2008 rules (with emoji as a prominent current example) is going to change its behavior if the IETF says that they should only register strings that are conformant to IDNA2008 and that they understand or if the IETF gives them a list of troublesome characters that are PVALID under IDNA2008. A significant change in their behavior would almost certainly require forceful action by ICANN or by some effective government with appropriate jurisdiction. There are questions whether the IETF should be acting as an advocate for such actions. Probably those questions lie within the IAB's responsibility but, given potentially far-reaching implications, IMO it would be really unfortunate if they made such decisions without the consensus of an informed IETF community. This is not an easy problem.

(2.3) The IETF has traditionally been very reluctant to create an IANA registry that requires maintenance but whose maintenance depends entirely on a single person and his or her expertise. We even have traditional jokes about those situations, e.g., about the consequences of "truck fade". IANA, too, has been known to raise questions about the creation and management of such registries. At least IMO, the most recent published version of draft-freytag-troublesome-characters does not address the maintenance issue in a definitive way. I wouldn't expect the Directorate to have the bandwidth and skill set (other than with Asmus should he be appointed and want to sign up for that maintenance role long-term and on personal title) to actively maintain that registry.
AFAICT, the discussion during the IETF 102 BOF did not contemplate the Directorate taking on such a role.

(3) Finally, Asmus seems to be arguing quite strongly (and persuasively) that the contextual rules model of IDNA2008, especially the CONTEXTO collection, is completely inadequate and, in particular, that the rules specified in RFC 5892 do not adequately reflect an understanding of what he calls "complex scripts" (or don't reflect any understanding of complex script issues at all). Speaking as someone with considerable responsibility for the development of that model and those rules, he is almost certainly correct (although see below). However, if he is, it seems to me that we need to reexamine that part of IDNA2008 and decide whether to

(3.1) Drop the CONTEXTO category and rely on registries being responsible [2], whether we provide more advice along the lines of the troublesome character list or not.

(3.2) Review the CONTEXTO list and registry to be sure that the boundary between rules in the protocol and advice is right, modify the descriptive text as needed (that may affect other IDNA2008 documents), and add, modify, or drop code points and rules accordingly.

(3.3) Rethink the CONTEXTO rules and descriptions so that important rules for complex scripts are adequately specified at the protocol level. It is worth noting that going very far down that path may require reexamining the boundary between labels as "words" and strings that may make mnemonic sense even if they make no linguistic sense for a particular language or writing system at all. It was one interpretation of that boundary along with experience with IDNA2003 (not just ignorance) that resulted in the original decisions about what should and should not be included in CONTEXTO.

One way or another, I don't think it is appropriate to avoid asking whether Asmus's analysis makes a part of IDNA2008 technically defective and therefore obligates us to fix it.
If the answer is that it is and we are, that may be the strongest argument for not advancing draft-faltstrom-unicode11 at least until we have the underlying issues in hand because the update (or confirmation that no update is needed) it represents is required to address contextual rules as well as code points.

best,
   john

[1] RFC 1591 effectively equates "registrar" with "whoever decides what names to put in a zone", i.e., with "zone administrator". It is not clear whether those terms are accepted as equivalent today or whether we should, e.g., be using "zone administrator" to describe that function for all zones and to reserve "registrar" for so-called public zones for which they take money to register names.