Re: [Idna-update] Expiration impending: <draft-klensin-idna-rfc5891bis-01.txt>
John C Klensin <john-ietf@jck.com> Tue, 06 March 2018 23:06 UTC
Date: Tue, 06 Mar 2018 18:06:33 -0500
From: John C Klensin <john-ietf@jck.com>
To: idna-update@ietf.org
Message-ID: <3CE2979FE460587F4D1FC3EE@PSB>
In-Reply-To: <0FB6F961-41C2-42EB-8713-C5B2F2CA83FD@frobbit.se>
References: <091044BC-5FE8-4050-911F-DACB83A4DDD4@icann.org>
<0FB6F961-41C2-42EB-8713-C5B2F2CA83FD@frobbit.se>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idna-update/cKOzbsCVW8Mkr6v1pPsugFpNtPM>

Hi.

I have been mostly offline after sending my note a bit before 23:30 UTC yesterday and am going to try to address several issues in one response. I'm going to try to write this quickly, so apologies if I miss something important.

(1) First, I think there are a few major principles that separate IDNA2008 from IDNA2003. They are:

(i) Moving away from the design of tables of permitted and not permitted code points (including some transformations expressed in the same tables) to principles about code point inclusion and exclusion expressed primarily in rules about Unicode properties. The IDNA2003 model requires revisions with every version of Unicode to stay current. The IDNA2008 model prefers a review with each new version of Unicode but, in principle (and probably in practice if we got the rules right), does not require revisions.

(ii) Clarifying responsibility for the protocol to work correctly. IDNA2003 put almost all of the responsibility on the registration side of things by having only very weak checks on lookup -- in essence, if one could find the name, it was ok. That model, btw, exists in the UTR#46 and closely-related WHATWG recommendations. IDNA2008 requires lookup-time checks for at least some minimal level of validity, a change that not only makes the protocol work better but also provides something of a check on abusive and/or non-conforming registration processes at all levels of the DNS tree (see below for more about that). At the same time, IDNA2008 clarifies the responsibility of registries to know what they are doing in allowing a string to be registered. draft-klensin-idna-rfc5891bis-01 essentially reinforces and restates that requirement without changing anything. It may also be helpful to remember that requirements that registries understand what they are doing and act in a responsible fashion predate IDNs, and ICANN, and even the "trustee" language of RFC 1591. Enforcement is, of course, not the IETF's problem. See below.
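
To make points (i) and (ii) a little more concrete, here is a rough Python sketch of what "rules about Unicode properties" and a minimal lookup-time check could look like. It is a deliberate simplification and not the actual IDNA2008 derivation: it ignores the exception list, contextual rules, and several property classes, so it will disagree with the real rules for some code points. The function names (`derive_class`, `label_passes_minimal_check`) are hypothetical and chosen only for illustration.

```python
import unicodedata

# Assumption for this sketch: general categories treated as letters,
# combining marks, and digits. The real rule set has more classes.
LETTER_DIGIT_CATEGORIES = {"Ll", "Lu", "Lo", "Lm", "Mn", "Mc", "Nd"}
LDH = set("abcdefghijklmnopqrstuvwxyz0123456789-")


def is_stable(ch: str) -> bool:
    """True if the character is unchanged by NFKC, case folding, NFKC."""
    folded = unicodedata.normalize("NFKC", ch).casefold()
    return unicodedata.normalize("NFKC", folded) == ch


def derive_class(ch: str) -> str:
    """Classify one code point from its properties (simplified sketch)."""
    if ch in LDH:
        return "PVALID"
    if unicodedata.category(ch) == "Cn":
        return "UNASSIGNED"
    if not is_stable(ch):
        # Covers uppercase letters and compatibility characters.
        return "DISALLOWED"
    if unicodedata.category(ch) in LETTER_DIGIT_CATEGORIES:
        return "PVALID"
    return "DISALLOWED"


def label_passes_minimal_check(label: str) -> bool:
    """A lookup-side sanity check: NFC form and only PVALID code points."""
    if not label or unicodedata.normalize("NFC", label) != label:
        return False
    return all(derive_class(ch) == "PVALID" for ch in label)
```

Because the classification is computed from properties, running the same function against a newer Unicode database yields an updated answer without anyone editing a table by hand, which is the point of (i). The check in `label_passes_minimal_check` is the kind of thing (ii) asks of the lookup side; it says nothing about whether a registry should accept the label.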

(iii) What is normative and what is just helpful. It is important to note that it is the rules of IDNA2008 that are normative. The derived tables (including the ones in the IANA databases) are an important response to those who "just want to be told what to do" or who don't have the time and resources to do the calculations, but they are not part of the standard or, as far as that standard is concerned, particularly important. Of course, those tables do not remove the obligation to be informed and careful from anyone, but it is good to be realistic, especially in the current ICANN climate. Of course, users of the standard may treat tables more authoritatively. That is fine as long as the tables for one year do not treat previously disallowed characters as PVALID or vice versa without careful explanation and documentation (see below).

(iv) Permissiveness and conservatism. Depending on how one reads IDNA2003, this may not be a difference, but, while one of the IDNA2008 design principles is to extend the LDH "preferred syntax" rules to non-ASCII characters, it is ultimately very permissive, with few restrictions other than those motivated by having the protocol work. It is about specifying things that allow reasonable parties to create mnemonic labels using characters that they understand (see (ii) above and comments below) and not about trying to guard against bad behavior except in the most problematic circumstances. In particular, there are no rules requiring that labels make linguistic sense, no rules against archaic scripts, no rules prohibiting mixed-script labels or mixing, e.g., digit types within a label, any of which rules might be reasonable for a registry to impose.

(v) Finally, and driven strongly by user experience and reports, IDNA2008 attempts to be sure that the round trip between a label as expressed by the user and the label as stored in the DNS is an identity relationship.
The restrictions on mapping and case folding (especially the latter) have other motivations as well, but are primarily just corollaries of the U-label <-> A-label requirements. The principle implies other requirements that we haven't talked much about. One is that using a non-ASCII string in the domain context of a URI is a really bad idea (it wasn't a good idea under IDNA2003, but the issues are now more clear). Another is that names in reference or identifier contexts (including but not limited to URIs) should probably be final names (in more or less the same sense as the use of that term for MX DATA (targets)). What the user sees or can type are other matters, but implementations should be designed so that users do not enter or see one string and get an error (or other) message about some other one. IMO and in retrospect, we did not do nearly a good enough job of explaining those issues in the IDNA2008 documents; if there were energy to do the writing (there probably is) and to review (we have evidence that there is not), explanatory and clarifying materials would probably be helpful.

(2) Whether the issues that we uncovered when we reviewed the changes in Unicode 7.0 were "new" or not depends on definitions and that is, IMO, not particularly important. I've been told that there were significant debates fairly early in Unicode's history about whether normalization should be restricted to a very small number of methods or whether there should be options or variations for different circumstances, the latter partially to address some of these issues. What is clear is that we not only didn't know about those cases when the design of IDNA2008 was being sorted out but that we asked explicitly about relationships going forward and got answers that persuaded us that additional rules were unnecessary and that gave us no hint (at least that we understood) that we should look more deeply into the existing cases.
It is clear to me that, had we known about the situation in the 2006-2008 period, we would have attempted to include rules to deal with it in IDNA2008. That is not to claim that we would have been successful in developing such rules, but I believe the specs would have at least included stronger warnings.

(3) One key principle that is shared by IDNA2003 (and Stringprep more specifically) and IDNA2008 is that the IETF has strong consensus on avoiding getting into the business of going through Unicode one code point at a time and making our own classifications on an individual code point basis. We haven't wanted to delegate that job to an individual or small expert committee, especially ones we don't know how to sustain long-term. That possibility has been examined multiple times, in multiple IETF WGs and in the IAB, and the conclusion has always been, at least AFAIK, the same.

(4) ICANN's relationship to this situation requires some sort of shared understanding of their role, something I'm not sure we have even today. There is general agreement that they can and should make rules for labels that can appear in the root zone and nearly as general agreement that those rules should be very, very, conservative -- much more narrow than what is allowed by IDNA2008 with regard to what code points can be used. Whether they have, or should have, authority over second-level labels that conform to IDNA2008 (or even those that don't) has been hotly debated. Even ignoring the consequences of the "distributed administrative hierarchy" principle, the empirical evidence has been that they are powerless in practice and hence that debating applicability of rules developed for the root (or even modified versions of such rules) to second level labels and below is a waste of time. There have been discussions about the importance of different rules at the second and third level (and beyond) since the earliest attempts by ICANN to define IDN policy.
Those discussions (which occurred before anyone started talking seriously about IDNs in the root) included, for example, the clear understanding that labels using archaic scripts might be very useful and appropriate within the DNS trees of particular institutions or enterprises even if they were not appropriate at the second level. With the introduction of IDN TLDs and the general flattening of the DNS and the root, the same arguments may apply at the second level within an appropriately-defined TLD.

(5) As implied above, I think it is important to distinguish between efforts designed to help well-intentioned people, and registries who are willing to invest resources into doing the right thing, and efforts about preventing confusion, especially confusion created by those who either have evil intentions or don't care about the use of DNS names as identifiers. That distinction is particularly important when suggestions are made about rules to be defined on the basis of what "most fonts" do with particular code points. At least until there are changes in how generic presentation software (such as web browsers and IM programs) works -- changes to which there appears to be considerable resistance -- "most fonts" is not helpful because malicious actors can not only select their own preferred fonts but, if something like CSS is in use, they can even overlay graphemes that were never intended to be overlaid, making, among other things, concerns about normalization almost trivial. That does not mean we should give up, but it does imply, at least to me, that we need to be very careful about our expectations and promises, even if registries get smart enough to require the malicious actors to turn things up a notch.

(6) Now, a few observations on some of Asmus's comments, and then I will come back to ICANN and the IANA tables at the end of this section.

>...
> In many cases, using the variant mechanism (blocked variants)
> would be the appropriate remedy - but this is a tool not
> available to the IANA review.

It is also not a tool that is part of IDNA (either version), except insofar as it is part of the "registries must take responsibility for what they choose to register, including the scripts and characters involved" model. While I think identifying variants and blocking them is a useful tool (and that has been proven to be the case in some domains), the identification process itself requires considerable effort and knowledge. At least outside the root and especially if less well-known scripts are involved, it is not going to be 100% reliable except for dichotomous pairs of characters (e.g., for Simplified and Traditional Chinese) and the implications of missing a string that should be treated as a variant (particularly relative to invitations to bad actors) should be thoroughly understood and accepted (by lawyers as well as from a technical standpoint).

We also have significant empirical evidence that almost every effort to define variants and then block them has rapidly led to some party who claims that they want to/ need to/ have the right to/ are entitled to have some of those variants delegated. And they then often get their wishes. So, for the root, where the resources to do the analyses are available and we can at least hope that a "no delegated variants" rule can hold, I think blocked variants are a fine idea. I do note, however, that the so-called ccTLD Fast Track apparently allowed some variants to be delegated and that "it is unfair to allow those who got in line first to get something that is now barred to us" and "we are being put at a competitive disadvantage" have often been persuasive arguments to ICANN in the past.

Below the top level, where such a system has to depend on registries investing significant resources to identify and block names, I think it is just unrealistic.
Some registries may comply, but each registry faces a choice between registering and delegating a name (and thereby producing revenue) and investing significant effort to avoid delegating such names (and therefore not making the money), in an environment in which there are no effective sanctions if they register the name(s). Given experience with that choice, I am pessimistic about such a system being very effective. I do think that blocked variants are a useful tool, but their being effective requires registries to be very responsible, including seeking out and blocking questionable cases. To the extent that such behavior not only costs them money but, in the absence of clear rules that do not require subjective judgment, might also subject them to litigation, I think expectations should be kept low.

> In any case, there is little benefit in keeping the IANA
> tables stuck at Unicode 6.3.0 -- the number of pre-existing
> cases (going as far back as the earliest Unicode version
> covered by IDNA2008) definitely exceeds the expected
> incremental addition from pending versions of Unicode.

I'd love to know how you quantify the latter going forward, especially in an environment in which the Unicode Consortium has apparently decided to treat additional characters and code points (via the "adopt a character" program) as a profit center. If one were to follow the proposed rules for the root and exclude archaic characters and other characters that would be disallowed for other (non-IDNA2008) reasons (neither of which I think you have proposed), the size of the expected incremental additions gets even smaller.

> And a
> comprehensive solution lies outside the methodology of
> property-based inclusion; however, the property-based IANA
> tables would make a solid base on which to implement
> additional mitigation.
>
> Therefore, the rationale to maintain this process in a stalled
> state is tenuous at best.

Now let's return to my comment near the beginning. The IANA tables are not normative.
They are published as a guideline and convenience for those who want and appreciate such things. If ICANN, as the registrar for the root zone, decided to make its own calculation of a base code point repertoire from the IDNA2008 rule set and use that repertoire as input to the LGR process for the root zone, not only is there no one to stop them, but their doing that is perfectly conformant to IDNA2008. I don't think it would even violate the IAB statement as I read it.

But the purpose of that IANA table, as I have understood it since we created the registry, is, again, to help out the less wary, the less informed and knowledgeable, and, frankly, the more lazy. The mere existence of that population is probably a sign that the IDNA2008 requirement that registries not register strings that they do not thoroughly understand (even if those strings are otherwise IDNA2008-conformant and might be acceptable if registered in a different zone) is not working well. For them, I think there has been no case made that holding off on revising the tables is a cause of significant harm, and there is some argument that it just represents a very conservative list, albeit not as conservative as the list would be if similar-looking strings within the same script were identified and somehow excluded (and not as conservative as the lists that will presumably be the result of the LGR process).

That is an issue with the "troublesome character" list as well. If it is taken as one person's informative catalogue of code points that even the most careful and informed of registries should be extra-careful about, I think it provides a useful service. If, as has been the case with "whatever is allowed by IDNA2003", "whatever is allowed by IDNA2008", or even "whatever is allowed by UTR#46", regardless of what those documents actually say, it is interpreted as "the IETF and/or ICANN say that anything allowed by IDNA2008 and not on this list is safe to register", then it becomes a problem...
as well as colliding with the principle described in (3) above.

>...
> Secondary to that is finding a way to communicate the to
> consumers of these tables that simply allowing all PVALID code
> points isn't a robust solution for many writing systems and
> additional due diligence needs to be preformed - for example
> along the same lines as is being done now for the Root Zone
> (which is quickly defining the state of the art in that
> respect).

Well, IDNA2008 says that additional due diligence is required and that registries need to do it. draft-klensin-idna-rfc5891bis-01 says it more clearly and forcefully but can't seem to get processed in the IETF (which is where this thread started). ICANN has been asked, more than once, to make some clear statements about registry responsibility and to work on a plan to enforce that principle (including a plan to have the GAC work out a plan for ccTLDs if that is required). Those suggestions are not proceeding any faster than draft-klensin-idna-rfc5891bis. I think they are even more bogged down and have been for longer, in part because, unlike the I-D in the IETF, the idea meets active resistance.

best,
   john

p.s., especially to IAB members who might be reading this. ICANN study groups and review efforts have an extensive history, especially in relatively recent years, of being turned into efforts that can only produce conclusions that support whatever ICANN is doing, perhaps suggesting more of it, perhaps suggesting cosmetic fine-tuning, and sometimes both, but with actual criticisms prevented from being published and circulated. If the IAB is convinced that will not occur in this case, then, if someone reasonably qualified is needed and you don't have a better candidate, I will volunteer despite my reluctance to do further free consulting work for ICANN (and assuming the rules for the study group do not require that I go out of pocket).
On the other hand, if you do not have good reason to be convinced of that, I suggest that it is in the best interests of the IAB and the DNS technical and user communities that you decline the offer to supply someone for the position.

p.p.s. Just to respond, for context, to a few of the comments made and discussed earlier, I could read text in three scripts before I graduated from secondary school, with more than one language in two of them and one of the three scripts running right to left. I can also read the characters of at least one more script but have no claim on ever being able to read the associated language except, painfully, for a few sections of some classical materials. I have some claim on at least some understanding of how one other script, very different from those above, works. I also did some work on character coding, identifiers in programming languages (including non-ASCII ones), and multilingual thesauri, and worked a bit with a very well-known typography expert, most long before there was a Unicode.

Do those things, or the fact that I studied a bit with experts on the evolution of languages and writing systems, make me an expert in this area? Nope. But it does give me some basis for believing I've got a little bit more background for constructing reasonable intuitions and having some idea what questions to ask than someone who is familiar with only one language, or maybe two, and who has deduced that everything else follows the same rules (conclusions that are very common in the IETF and ICANN although, I hope, not on this list).
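
For readers who want the round-trip principle in (1)(v) in concrete terms, the following is a minimal Python sketch. It uses the standard library's `punycode` codec and treats "already in NFC and already lowercase" as a stand-in for "is a valid U-label", which is an assumption of this sketch and much weaker than the real validity rules. The helper names (`to_a_label`, `to_u_label`, `round_trips`) are hypothetical.

```python
import unicodedata

ACE_PREFIX = "xn--"


def to_a_label(u_label: str) -> str:
    """Encode a non-ASCII label in its ASCII-compatible form."""
    if u_label.isascii():
        return u_label  # plain ASCII labels are stored as they are
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Decode the ASCII-compatible form back to Unicode."""
    if not a_label.lower().startswith(ACE_PREFIX):
        return a_label
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


def round_trips(label: str) -> bool:
    """True if what the user wrote is exactly what comes back from storage.

    Assumption: any label that normalization or lowercasing would change
    is rejected up front, so no mapping can alter it on the way in.
    """
    if unicodedata.normalize("NFC", label) != label:
        return False
    if label.lower() != label:
        return False
    return to_u_label(to_a_label(label)) == label


if __name__ == "__main__":
    for candidate in ["b\u00fccher", "B\u00fccher", "bu\u0308cher"]:
        print(repr(candidate), round_trips(candidate))
```

The design choice illustrated here is to refuse strings that would need mapping rather than silently mapping them, so that the user never enters one string and sees messages about another.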