Re: [Idna-update] Expiration impending: <draft-klensin-idna-rfc5891bis-01.txt>
John C Klensin <john-ietf@jck.com> Thu, 15 March 2018 00:56 UTC
Date: Wed, 14 Mar 2018 20:56:21 -0400
From: John C Klensin <john-ietf@jck.com>
To: Asmus Freytag <asmusf@ix.netcom.com>
cc: idna-update@ietf.org
Message-ID: <3F9BFC15AA26613D6E364523@PSB>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idna-update/xV0US13TdyYXvcqV4ui2tSITXyA>
--On Saturday, 10 March, 2018 23:28 -0800 Asmus Freytag
<asmusf@ix.netcom.com> wrote:

>...
>> (2) Advice, guidelines, or policy requirements applicable to a
>> broad variety of domains.

> May not need to define precisely what the limits of that
> variety are, but should note that zones where all labels are
> owned by the same entity may already be restrictive in other
> ways, whereas zones that not only are "public" but also
> support users of more than one script (and by extension
> multiple languages) are perhaps most in need of these.

Yes, absolutely. But this is one of the places where my rant
about DNAME (see recent note in response to John Levine) comes
in.

Suppose we have a nice, orderly, TLD named "область" with rules
that allow basic Latin and Russian Cyrillic code points. Suppose
those tables are not only enforced by the registry but that they
are published using the mechanisms of RFC 7940 and that lookup
applications, knowing that those are the published rules, enforce
them at lookup time. The latter would not be necessary if all
registries followed the rules but would constitute powerful
protection for users if some did not (and would not have been
possible until RFC 7940 standardized the table format and made it
machine-processable).

Now, let's assume a second TLD, κτήση, and assume it has rules
that only allow the subset of Greek script used in modern Greek.
Assume that its registry operator is well-behaved enough to avoid
anything at the second level that includes a DNAME pointing out
of the zone, but it does register an SLD that, to make the
problem clear, we will call κακό. We now have κακό.κτήση, a
perfectly valid SLD (modulo some issues about the tonos, but
let's not get distracted). But now the owner/zone administrator
of κακό.κτήση, being as unconcerned about good and safe user
experience as its chosen name implies, creates a subdomain whose
DNS entry, in the usual presentation form, looks like

   κτήση.κακό.κτήση.  IN DNAME  область.
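A minimal sketch of what that record does at lookup time, assuming
a simplified form of the DNAME substitution defined in RFC 6672 (no
case folding, no A-label conversion) and a crude script guess taken
from Unicode character names. The label "пример" and both helper
functions are hypothetical, introduced only for illustration:

```python
import unicodedata


def apply_dname(qname, owner, target):
    """Simplified DNAME substitution: rewrite names strictly below `owner`."""
    q = qname.rstrip(".").split(".")
    o = owner.rstrip(".").split(".")
    t = target.rstrip(".").split(".")
    # The owner name itself is not redirected, only its descendants.
    if len(q) > len(o) and q[-len(o):] == o:
        return ".".join(q[:-len(o)] + t) + "."
    return qname


def label_scripts(label):
    """Crude script guess: first word of each character's Unicode name."""
    nfc = unicodedata.normalize("NFC", label)
    return {unicodedata.name(ch).split()[0] for ch in nfc}


OWNER, TARGET = "κτήση.κακό.κτήση.", "область."
shown = "пример." + OWNER  # hypothetical name a user might be shown
resolved = apply_dname(shown, OWNER, TARGET)
per_label = [label_scripts(l) for l in shown.rstrip(".").split(".")]
print(resolved, per_label)
```

Each label, taken alone, is in a single script, yet the name as
shown puts a Cyrillic label under an apparently Greek parent, which
is what a per-label, per-zone check cannot see.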
Now, all of the rules of both TLDs have been conformed to -- there
are no labels in subdomains of область that are anything but Latin
or Cyrillic and no labels in subdomains of κτήση that are anything
but Greek. But apparent subdomains of κτήση.κακό.κτήση are not in
Greek, rules that apply to individual labels in individual zones
(rather than FQDNs) are of no help, and, given the number of
graphemes that appear in both Greek and Cyrillic, all bets are off.

One could try to make rules against FQDNs that use different
scripts in different labels, but not only would that be
contradictory to the "distributed administration" principle, there
is also a large collection of reasonable examples for which it
would make no sense (consider, e.g., subdomains of the EU TLD).

Some far nastier examples are possible and are left as an
exercise. Many of those examples require an unscrupulous or
inattentive TLD registry, but we haven't had a big shortage of
those.

>...
>> For zones that are well
>> and carefully run, with name use and the best interests of
>> Internet users in mind, it is reasonable to expect that
>> well-designed guidelines will be followed, at least unless
>> the zone administrators understand things well enough to
>> carefully allow exceptions.

> These zones will take active steps to mitigate issues caused
> by the fact that IDNs are necessarily reflective of the
> writing systems involved - including their comparative
> complexity vis a vis the basic Latin alphabet.

Right.
I just believe that, absent much more serious enforcement
mechanisms than have proven possible in the past (for those who
believe such mechanisms are feasible, I suggest a careful study of
ICANN's history when a TLD has started with one set of rules,
signed appropriate contracts, and then come back and said
something equivalent to "well, that didn't work as a business
model so we want to make changes") or a sudden outbreak of a
disease that causes all TLD registries to suddenly start putting
the best interests of Internet users ahead of the interests of
their registrants and profits, we need to be quite clear about our
expectations.

>> The meta-rule that zones are required to
>> understand what they are doing is one such guideline and, at
>> least to a first approximation, I think most registries
>> (including zones below the second level that are using IDNs)
>> try.

> Not my impression at all. For anything other than very simple
> scripts, the IDN tables that can be easily inspected because
> they are available online as tables are not reflective of a
> deeper understanding of the writing system, beyond the most
> basic limitation of the repertoire to a given script.

Ok. Probably I was being too optimistic. But, if you are correct
(as you probably are), that is another answer to John Levine's
question.

It also raises what is probably the fundamental strategic question
in all of this. There are two possible extreme positions. There
are probably some middle grounds, but I have had no success in
finding them (perhaps you and others can do better).

At one extreme, we try to persuade registries to allow only those
labels whose implications they fully understand, not just in terms
of code point lists but in, to use your words, reflecting a deeper
understanding of the writing system.
Although it isn't spelled out clearly in the IDNA2008 specs, and
may not be clear enough even in the 5891bis draft, it means that a
TLD registry that does not have a deep understanding of a script
and _all_ of the writing systems with which it is used has no
business allowing SLD registrations in that script at all and
that, the DNAME issue notwithstanding, each (TLD) registry should
be contractually imposing similar requirements on each SLD it
creates and all of their subdomains (something RFC 1591 clearly
anticipated but that ICANN required only for a few "sponsored"
domains before effectively abandoning the requirements).

The alternative, stated as positively as I can manage, is that we
accept as a fact that we are never going to get that level of
responsible behavior from registries, even some (or many) TLD
registries, and take measures to enable those who don't have a
clue about particular scripts but are determined to register
labels that use them anyway to do a somewhat less bad job.

The thing that concerns me about going down the second path is
that it may encourage registries who have no idea about, e.g., how
the writing system for Lower Slobbovian works to say, "well, maybe
we can make some money selling Lower Slobbovian names and all we
have to do is follow ICANN's code point lists and rules and, even
if the result is dangerous in some way, no one will be able to
blame us".

>...
>> What has not panned out is the assumption that ICANN would
>> make at least some serious effort to either persuade or induce
>> registries, even second-level entities under contracted TLD
>> registries, to follow those guidelines, especially with
>> regard to taking active responsibility for what they are
>> doing.

> This is an empty letter as long as no guidelines exist that
> establish meaningful metrics allowing to tell apart successful
> mitigation from simply copying some basically insufficient IDN
> table from some other zone.
I'm not sure I agree with your assertion at all and doubt the
fundamental assumption behind it, but, assuming that I did, I
don't see how you establish credible metrics. The problem, as we
have discussed at length and agreed about in other contexts, is
that no set of tables and rules, no matter how detailed, is going
to be able to enumerate all of the possible cases that should be
blocked, not looked up, or otherwise avoided. So, you can count
the number of labels that have been blocked but cannot, as far as
I can tell, count those that should have been blocked and weren't
... unless you can persuade the browser vendors to instrument
their code (and give you the results) and I'm not sure even that
would be effective.

>> But that doesn't make establishing the guidelines a bad idea.
>> I'm not aware of it having happened yet with IDNs, but there
>> is precedent for people who are harmed by someone else's bad
>> behavior to cite violation of widely-accepted guidelines as
>> evidence that the behavior was bad or negligent.

> This would be a beneficial use of guidelines that actually
> cover meaningful best-practice.

It would also be, at least from my point of view, a beneficial use
of guidelines that clearly identified stupid behavior. If the
"meaningful best practice" comes down to "don't do anything stupid
or that you don't understand" then the two formulations are
probably equivalent.

>> (3) The rules that are used by the operator/administrator of
>> a particular registry to determine what can or will be
>> allowed in that zone.

> Note that for many zones what is allowed is not static, but
> depends on what is already delegated. The rules effectively
> describe what is allowed next, given a status quo.
> This approach is something that is useful (or should be
> understood as useful) for the majority of zones: in all zones
> registrants compete for available labels, but in some zones, a
> delegated label does not only prevent the same label, but also
> certain related labels from being registered.

!! I think I finally figured out why we agree on the facts but
keep struggling over the conclusions. Sorry for taking so long.
Let me see if I can explain.

The above is perfectly reasonable and rational. It is, indeed, the
way the DNS was run for many years before ICANN came along. Names
were delegated (or reserved, which did happen, although not often,
and some names were rejected when applied for independent of
formal reservations) on a first-come, first-served (FCFS) basis.
If someone came along later and wanted a reserved name, they
needed to be able to explain convincingly why the reason it was
reserved didn't apply any more. If they wanted one that was
already delegated, too bad for them. Disputes about "rights" to
names were largely avoided by repeating that the things were no
more than convenient mnemonics and not brands or marks (that might
not have been realistic, but it was generally believed and
accepted with a few very notable exceptions). The system, and the
reasons for it, were generally well enough understood that the
frequency with which someone needed to be told "no, you lose,
either propose a different name or go away" was exceedingly low.

That model worked well for all levels of the tree below the root
and under the distributed administration model. The root was
different because it had its own, fairly clear, rules (see RFC
1591 for an instantiation of those rules). What you are suggesting
above would have fit very well into that model.

Now, let me see if I can describe an alternate reality. In that
reality, people decide that there should be a competitive market
in names.
Suddenly, the childhood fantasy of one of my peers to own "3"
becomes plausible, especially after someone else actually does get
to buy "zero". Because a lot of people would like to own "3" (or
whatever example one might pick), rather complex policies are put
in place to manage dispute resolution, policies that, in general,
do not recognize either FCFS or technical considerations. As a
result of that and other factors, FCFS moved from being an
administrative convenience to being "early adopter advantage" and
unfair to those who came along later, and did so in an environment
in which unfairness was a powerful argument for reversing a policy
decision. If a registrant collected a series of names, perhaps to
minimize confusion or make typing more convenient when it was
important (for example, consider "isoc.org" and
"internetsociety.org"), it was possible that the dispute
resolution procedures (or legal action independent of them) could
apply a Solomon-like solution to a dispute and split the bundle
up.

Of course, the second is a reality sufficiently different that
none of us could imagine it in, say, 1996. I suggest that it bears
a much closer resemblance to what has occurred in practice in
recent years than the earlier model.

> Such related labels are called variants -- but the use of the
> term variants does not necessarily imply that more than one
> label gets to be allocated. On the contrary, our experience in
> Root Zone has shown that blocked variants are by far the more
> common and useful tool.

Useful to whom? If the question is "how best to prevent confusion"
or "how to avoid getting tangled up in the DNS's poor alias
facilities", certainly blocking anything that can be confused with
a known and delegated (or about to be delegated) label is the
right answer.
If the goal is to increase the odds that users will find the names
they are looking for no matter how they type them, then one wants
to delegate as many of the strings as possible and cope with the
aliasing problem (possibly by pushing it off to the registrant
with some sort of "same owner" rule and hoping that sticks). I
know it isn't how parts of ICANN want to think about the problem,
but I believe that, if a consistent user experience is the target,
then whatever rules are applied to blocking or delegation of
character variants ought to also apply to other types of
orthographic variants and, if the labels are words, same-language
synonyms.

>> Noting that the rules for the root (including the
>> LGR rules but not limited to them, at least yet) are just a
>> special case of this category, all that is necessary for
>> application and enforcement of the rules is that the registry
>> decide to do so... and that the rules that are chosen be
>> sufficiently acceptable to whatever community(ies) are in a
>> position to hold the registry accountable that they don't
>> either turn into a source of never-ending strife or change
>> often enough to cause perceived instability.

> Again, the Root Zone project is informative.
> So far, a single proposed script LGR has proven contentious --
> with the wider community objecting to it being too permissive,
> rather than the other way around. As this has not reached the
> stage where an LGR has been submitted, the community may yet
> come to some agreement.
> Generally, the drafting panels appear to have been well
> anchored in their respective communities and their submitted
> proposals perform well when tested against lists of putative
> labels while mitigating known issues.
I don't want to go much further down this path -- it is very much
separate from anything having to do with IDNA as a protocol -- but
I wonder if it is worth remembering that there have been questions
about whether some of the "communities" of which those panels are
representative are actually defined correctly and that the number
of new gTLDs actually allocated and delegated under LGR rules so
far is zero. It may be a little early to start drawing firm
conclusions.

>> I think some of our discussions have confused the first two
>> and last two categories and then gone on to confuse the third
>> (general guidelines for the second level and below) and fourth
>> (registry-specific rules), assuming, for example, that what is
>> suitable for one registry should be applicable, either as a
>> guideline or as a protocol change, to the others.
>
> Very cogent observation.
>
> I'm certainly keenly aware of the difference between the
> RZ-LGR project which effectively results in registry-specific
> rules (for the root) and general guidelines that can be used
> to define registry-specific rules on other levels.
> The RZ-LGR is purposefully limited to the modern-use subset of
> modern-use scripts. (It also includes hyphen and digits). It
> further makes the assumption that the zone is shared by
> users of multiple scripts - and users of "all" languages - at
> least those that are widely written for everyday purposes.

Good assumptions except that I thought digits and the hyphen were
banned from labels in the root by other decisions... decisions
that go back to RFCs 1123 and 1591. Or maybe I misunderstood you.

> The individual script LGRs are also accompanied by very
> detailed description and references to source materials.
> That allows them to serve as examples/starting point for
> defining registry-specific rules for other zones.
> The closer these zones match the other assumptions for the RZ
> project, the closer to the RZ-LGR their eventual rules would be
> expected to end up - unless the RZ-LGR erred in how to account
> for the risks inherent in multi-script/multi-zone domains.
> This simplifies the task of the guideline writer, without
> reducing the task to simply imposing the RZ-LGR unexamined
> everywhere.

Yes, certainly.

> If I were to be asked by a registry to develop policy for the
> second level, I might proceed as follows:
>
> (0) Start with the RZ-LGRs for all the scripts to be
>     supported in the new zone.
>
> (1) Add all digits and the hyphen
> (1a) Mitigate the issue of homoglyph digits (Arabic) by making
>      them blocked in-script variants
> (1b) Mitigate the issue of some digits being homoglyphs of
>      letters by making these blocked in-script variants
>
> (2) Retain cross-script variants from RZ-LGR for all scripts
>     in the new zone
>
> (3) Optional, if a feature (restriction) in the RZ-LGR is
>     documented as being motivated by the need to be
>     particularly restrictive for the root, investigate the
>     cost/benefit of removing the restriction. (If in doubt
>     keep the restriction).
>
> (4) If the zone is limited to certain languages, remove
>     features (restrictions) documented as being necessary in
>     a multilingual zone
> (4a) investigate the cost/benefit of adding language-specific
>      support for any language that isn't well-supported due to
>      RZ restrictions.
>
> (5) Make sure LGR follows guidelines on combining marks (like
>     the ones we discussed under separate cover).
>
> (6) Make sure LGR follows guidelines on repertoire (TBD).
>
> (7) Make sure WLE rules and context rules continue to work if
>     repertoire expanded
> (7a) It may be appropriate to relax / tighten some of these
>      rules if the mix of languages to be supported requires
>      that or benefits from it.
>
> and so on.
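As a rough illustration of steps (1) and (1a), and of rules that
"describe what is allowed next, given a status quo", here is a
minimal sketch of blocked variants. Everything in it is an
assumption made for illustration: the one-to-one pairing of
Extended Arabic-Indic digits with Arabic-Indic digits, the `Zone`
class, and the three status names. Real LGRs in the RFC 7940
format are considerably richer than a single mapping table.

```python
# Hypothetical variant table: treat Extended Arabic-Indic digits
# (U+06F0..U+06F9) as variants of Arabic-Indic digits (U+0660..U+0669).
VARIANTS = {chr(0x06F0 + i): chr(0x0660 + i) for i in range(10)}


def canonical(label):
    """Map every code point to its variant-set representative."""
    return "".join(VARIANTS.get(ch, ch) for ch in label)


class Zone:
    """Toy registry in which a delegation blocks all of its variants."""

    def __init__(self):
        self.delegated = {}  # canonical form -> label actually delegated

    def status(self, label):
        held = self.delegated.get(canonical(label))
        if held is None:
            return "available"
        return "delegated" if held == label else "blocked"

    def register(self, label):
        # What is allowed next depends on what is already delegated.
        if self.status(label) != "available":
            return False
        self.delegated[canonical(label)] = label
        return True


zone = Zone()
arabic_indic = "\u0661\u0662\u0663"
extended = "\u06f1\u06f2\u06f3"
before = zone.status(extended)
zone.register(arabic_indic)
after = zone.status(extended)
print(before, after)
```

The variant label is never delegated to anyone; it simply stops
being available once its counterpart has been registered.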
This seems to me like good guidance, especially for those cases
that use scripts and languages that are part of the root zone
process. I note that it might be perfectly sensible for a script
community to decide that the processes required by ICANN
(formation of a Generation Panel, making the request for one or
more TLDs that involves a more or less complex and expensive
application process, coming up with an operations model or
operator that would meet ICANN requirements, paying an application
fee, etc.) were not worth the effort and consequently to not try
to put names in that script in the root while nonetheless using
them at the second level or below in trees that they considered
appropriate. Presumably, any guidelines would not try to block
names that were part of that pattern, but that implies that either
they would not be applicable at all to such scripts or they would
need to make special provisions for them. Either way, it is not an
IETF problem.

> The guidelines would effectively focus on well-understood
> modern-use scripts, because that's what is addressed by the
> RZ-LGR. Because (most) historical scripts are supremely
> ill-understood and have not had the benefit of any deeper
> analysis for IDN purposes, a general guideline would strongly
> recommend against including them.
> Some lesser-used and/or emerging scripts (with modern user
> communities) would benefit from their communities following
> the RZ-LGR procedure with appropriate adaptations.

See above.

> Some very limited zones, e.g. Cyrillic only, or Polynesian
> only, might come under pressure to support code points for
> characters that look like punctuation marks; these code points
> were not ruled out in IDNA, but are considered deeply
> troublesome (not least by RFC 6912) -- actual guidelines would
> have to be written so as to settle the question whether
> "secure" LGRs should always eschew them, or whether limited
> exceptions are to be seen as justified.
I think that trying to make general statements or guidelines about
these situations puts you into territory where neither you nor
ICANN would be wise to go but, again, not an IETF problem.

> The issues facing the various scripts are diverse enough that
> trying to write fully general guidelines gets either
> meaningless (too vague) or bewilderingly complex in no time.
> That's the reason I keep coming back to the use of the RZ-LGR
> as a non-binding starting point.

Understood.

    john