Re: [I18ndir] Request for review: draft-faltstrom-unicode11-07
John C Klensin <john-ietf@jck.com> Wed, 06 February 2019 20:21 UTC
Date: Wed, 06 Feb 2019 15:21:43 -0500
From: John C Klensin <john-ietf@jck.com>
To: Patrik Fältström <paf=40frobbit.se@dmarc.ietf.org>, Alexey Melnikov <aamelnikov@fastmail.fm>
cc: i18ndir@ietf.org
Message-ID: <E04605C09F1D2EBF18CAAF86@PSB>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/vFddEKMIN0DR55QALal16H_1WYw>
I'd like to suggest, not a different view of the subject from Patrik's, but a different set of necessary priorities and interactions, at least hinted at by his (3) below. I've had many parts of this discussion in other forums with most of the people on this list, so will try to summarize and expand or answer questions if needed. This note has taken me close to a week to write and is still longer than I would like -- these are not easy problems, as the whole long history that has brought us to this directorate indicates.

We've got a rather large pile of documents on the table (or fallen on the floor next to it). At one point or another, one or more of the ART or former Apps ADs have been asked to process these documents or asked to create a context in which they could be processed (and then process them). The request for the BOF that led to this directorate was a next-to-last-ditch effort to establish a framework in which they could be evaluated, improved, and processed.

TL;DR summary of this note: It is extremely unwise for the IETF to process draft-faltstrom-unicode11-07 before a number of other issues (and associated documents) are carefully considered and resolved.

The main body of the rest of this note is divided into two parts: a summary of the issues and the more obvious of the associated documents and a discussion of why processing draft-faltstrom-unicode11 without first addressing those issues poses significant risks for IDNA, for the IETF, and for the Internet. Those parts are followed by a third which makes some summary recommendations based on them. I'd encourage people to read and try to understand both but those who have too little time may want to look at the second part first. The note ends with what I hope are two constructive suggestions as to how we might move forward.

------

I. The issues and the documents

As a group, those other documents raise fundamental questions about the design of IDNA2008, whether that design is actually practical, and whether aspects of it need to be explained better or tuned to a greater or lesser degree. The "can we trust UTC" question that Patrik mentions is just one aspect of that situation. Interestingly, it seems to me that there are three possible answers to the question, not just two:

* Yes, but... (aka "trust but verify"). That conclusion immediately leads to the question of what we do about the anomalies that are clearly present and the new ones that will appear in the future. More on that below.

* No. But then we need to figure out what to do next, something that will almost certainly require new rules in IDNA2008 (specifically 5891bis and/or 5892bis) and/or the long-discussed (and long-avoided for good reasons) IETF-specific normalization form.

* Yes, really. UTC, or at least various important actors in it, have long suggested that the IETF is incompetent to do this work and that we should simply turn IDNA over to them and drop out, just as, e.g., we handed off HTML to W3C (in both cases, the perception from the other body is that we had no business being in those areas in the first place). One of the ways to read some of the revisions of UTR#46 is that they constitute moves toward "Unicode domain names" that would replace any IETF IDN work.

Remember, because it is important, that IDNA2008, as now written, assumes we can construct a model of permissible labels based on a collection of property-based rules. By contrast, key Unicode people have stated that any such system is ultimately doomed by the edge cases and that only tables of properties of particular code points can ever be authoritative.
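To make the contrast between property-based rules and per-code-point tables concrete, here is a minimal Python sketch of how a derived property might be computed. It is a deliberately simplified subset of the RFC 5892 rule ordering, not the full algorithm: the function name `derived_property` is hypothetical, the exceptions table holds only a handful of the RFC 5892 entries, Python's `str.casefold` stands in for the case folding the RFC specifies, and results for recently added code points depend on the Unicode version shipped with the interpreter.

```python
import unicodedata

# A few entries from the RFC 5892 exceptions table; NOT the full list.
EXCEPTIONS = {
    0x00DF: "PVALID",      # LATIN SMALL LETTER SHARP S
    0x03C2: "PVALID",      # GREEK SMALL LETTER FINAL SIGMA
    0x00B7: "CONTEXTO",    # MIDDLE DOT
    0x0640: "DISALLOWED",  # ARABIC TATWEEL
}

# General categories accepted by the LetterDigits rule.
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def is_ldh(cp):
    """ASCII lowercase letters, digits, and hyphen."""
    return cp == 0x2D or 0x30 <= cp <= 0x39 or 0x61 <= cp <= 0x7A


def is_unstable(ch):
    """True if the code point changes under NFKC(casefold(NFKC(cp)))."""
    nfkc = unicodedata.normalize("NFKC", ch)
    return unicodedata.normalize("NFKC", nfkc.casefold()) != ch


def derived_property(ch):
    """Simplified derived property for a single code point (sketch only)."""
    cp = ord(ch)
    if cp in EXCEPTIONS:                      # the table overrides the rules
        return EXCEPTIONS[cp]
    if unicodedata.category(ch) == "Cn":      # rough stand-in for Unassigned
        return "UNASSIGNED"
    if is_ldh(cp):
        return "PVALID"
    if is_unstable(ch):
        return "DISALLOWED"
    if unicodedata.category(ch) in LETTER_DIGITS:
        return "PVALID"
    return "DISALLOWED"
```

In this sketch U+0640 would come out PVALID from the property rules alone (it is a stable modifier letter) and U+00DF would come out DISALLOWED (it case-folds to "ss"); only the exceptions table reverses those results. Each such entry is a place where the property model needed a per-code-point override.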
Note that suggestions have been made fairly recently on the IETF, IDNA, and I18nrp lists that we should either turn everything over to UTC or accept UTR#46 as definitive and either ignore conflicts with IDNA2008 or modify IDNA2008 to match. The desire for (or perceived necessity of) normative tables and code point by code point decisions may put some perspective on some of the demand for draft-faltstrom-unicode11 ... or it may not.

Some of the more technical issues associated with that part of the problem are discussed in draft-klensin-idna-5892upd-unicode70 (-05 is expired and I've been discouraged, including by some relevant ADs and the IAB Program leadership, from posting -06). However, that document includes only one type of example of those fundamental questions.

As an example of another, IDNA2008 calls for, and depends on, a very high degree of responsible behavior on the part of registries and registrars at all levels of the DNS as part of what, conceptually, is a layered system in which the rules of RFC 5892 provide an upper bound on what is permitted but additional processes further restrict that set for a particular zone, usually drastically (the number of groups who have ignored or misunderstood that layering concept may suggest another area in which IDNA2008 may need work). At least part of what is often called the registry restriction issue is discussed in draft-klensin-idna-rfc5891bis, but there is a more fundamental problem, which is that IDNA2008's advice to registries is to allow only labels written in scripts (or derived from languages) that they thoroughly understand.
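The layering described above can be sketched in a few lines of Python. Everything here is illustrative: `PROTOCOL_VALID` is a toy stand-in for the set of code points the protocol permits, `ZONE_REPERTOIRE` is a hypothetical policy for a single zone, and `permitted_in_zone` is a made-up helper, not part of any specification. The point is only that a zone policy can narrow the protocol's set and never widen it.

```python
# Toy stand-in for the protocol-level upper bound (NOT the real RFC 5892 set):
# ASCII lowercase letters, digits, hyphen, three Latin letters, Cyrillic lowercase.
PROTOCOL_VALID = (
    set(range(0x61, 0x7B))
    | set(range(0x30, 0x3A))
    | {0x2D, 0xE5, 0xE4, 0xF6}
    | set(range(0x0430, 0x0450))
)

# Hypothetical repertoire chosen by one zone's registry: Latin only.
ZONE_REPERTOIRE = (
    set(range(0x61, 0x7B))
    | set(range(0x30, 0x3A))
    | {0x2D, 0xE5, 0xE4, 0xF6}
)


def permitted_in_zone(label, protocol_valid, zone_repertoire):
    """A label passes only if every code point is allowed at BOTH layers."""
    allowed = protocol_valid & zone_repertoire  # the zone can only narrow
    return all(ord(ch) in allowed for ch in label)
```

For instance, `permitted_in_zone("smörgås", PROTOCOL_VALID, ZONE_REPERTOIRE)` is true, while a Cyrillic label such as "пример" passes the toy protocol layer but is rejected by this zone. Adding U+0041 to the zone repertoire would not make "Abc" acceptable, because the intersection still excludes it.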
IDNA2008 also calls for lookup applications to test strings for at least superficial validity before looking them up (IDNA2003, UTR#46, and common practice among browser implementers all assume that anything given to such an application can and should be looked up and that 100% of the responsibility for acceptable, non-problematic labels should rest on the registries).

Observations from the last eight years or so suggest that those IDNA2008 requirements are widely ignored. Most or all of the browser vendors are following the IDNA2003 and UTR#46 recommendations and assuming that pre-checking of putative labels is too much of a performance hit to be justified, at least under the reasoning they have heard and accepted. Conformance by registries varies widely, with some flouting even the validity rules imposed by RFC 5892, some trying to be very careful, and some registries (especially TLD registries with global scope) effectively taking the position that the requirement that they have in-depth understanding of the scripts (or even combinations of scripts) from which they are accepting labels is unrealistic from a business standpoint.

One way to cope with the latter problem is for the IETF to reach into the next layer of restrictions by providing advice about particular risks and traps to those registries who intend to register labels in scripts for which they do not have particular expertise. draft-freytag-troublesome-characters provides at least one form of that type of guidance but would represent a big step in several important ways.

While it is not clear to me whether they are part of what this note claims is the critical path, there have also been a number of efforts to build on RFC 3743 -- albeit in very different directions -- to determine which labels are appropriate in combination in a particular zone or to support registration of sets of labels so that the user will find "the right place" no matter which particular representation appears in a query.
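As a rough illustration of the kind of "superficial validity" test a lookup application could apply before issuing a query, here is a minimal Python sketch. It assumes three of the checks that RFC 5891 lists for lookup (NFC form, no hyphens in the third and fourth positions, no leading combining mark); the function name `lookup_precheck` is hypothetical, and the sketch leaves out the per-code-point property checks, contextual rules, and Bidi tests that a real implementation would also need.

```python
import unicodedata


def lookup_precheck(label):
    """Return a list of problems found in a putative U-label (sketch only)."""
    if not label:
        return ["empty label"]
    problems = []
    # The label must already be in Normalization Form C.
    if unicodedata.normalize("NFC", label) != label:
        problems.append("not in NFC")
    # "--" in the third and fourth positions is reserved (e.g. for "xn--").
    if label[2:4] == "--":
        problems.append("hyphens in third and fourth positions")
    # General categories Mn, Mc, and Me are combining marks.
    if unicodedata.category(label[0]).startswith("M"):
        problems.append("starts with a combining mark")
    return problems
```

Each check is a normalization pass or a constant-time inspection of a few characters, which may be useful context when weighing the performance argument mentioned above.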
Most of those efforts, including mechanisms for supporting so-called delegated variants and mechanisms for extending the DNS's aliasing capabilities, have occurred outside the ART/Apps area (e.g., work on DNAME variations in DNSEXT and then DNSOP), but that does not make them less part of the i18n/idn scene as observed from outside the IETF.

More generally, when the IETF publishes Proposed Standards and notices after five (or eight) years that provisions of them are not being followed, we usually consider it appropriate -- assuming we still believe in the "running code" bits -- to review those specifications and see if anything needs to be changed to adjust to reality or if there are more persuasive actions we should be taking to increase conformance. The answer might be "no" or "someone else's problem" to both, but it seems to me that we need to ask the question.

-------

II. Why not just process and approve draft-faltstrom-unicode11-07 and then come back to these issues as Patrik seems to suggest?

RFC 6452, the precedent for this document, made an affirmative assertion that there were no issues worth worrying about -- there were simply no concerns of any significance. That assertion was the basis of what Patrik lists as 2b below -- no changes were necessary because the then-new version of Unicode did not raise any issues that required special consideration or action. By proposing 2b now, this I-D essentially makes the same assertion, i.e., that we have discovered no issues, either in new Unicode versions or through experience, that require exception entries or other modifications to the IDNA2008 specifications. That just isn't the case.
We know there are issues: not only did the January 2015 IAB statement that the document cites say that, but so does draft-klensin-idna-5892upd-unicode70, whose most recent posted versions (and every version since the date of the IAB statement) contain a much better and more complete analysis of the issues than the IAB statement itself (which was written when we thought the issue was all about Hamza rather than a much larger set of code points that don't decompose the way some reasonable people think they would if they were not constrained by history or language distinctions). The discussion above points to other issues as well, and parts of the "troublesome characters" spec and the discussions leading up to it strongly suggest that the "ContextO" category in RFC 5892 is woefully inadequate and should be rethought, that it needs many more code points and context rules, or that it should be dropped in favor of some other model.

One way to view the difficulty is that a statement in Section 4.1 of draft-faltstrom-unicode11-07 is simply not correct. Another is that approving the document has implications far beyond its stated scope. That statement begins "The discussion in the IETF concluded...". Now, while I believe that particular conclusion is correct as stated, there has never been a discussion in the IETF -- much less a Last Call to determine consensus -- about that issue, at least unless one counts the LUCID BOF (and we don't even permit WG meetings during IETF to determine WG consensus, much less IETF consensus). More important, whatever we might have believed in the relatively short period in 2014 and very early 2015 (right after the discovery of the issues with U+08A1), we learned that code point was just an example of a much broader problem.
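The decomposition behavior at the heart of the U+08A1 discussion can be observed directly with Python's standard `unicodedata` module. The sketch below is only an illustration: the constant names and the helper `nfc_equivalent` are hypothetical, and the specific code point sequences are drawn from the Unicode character data rather than from this note. It contrasts U+08A1 with U+0623, an older precomposed letter that does have a canonical decomposition.

```python
import unicodedata

BEH_WITH_HAMZA = "\u08A1"         # ARABIC LETTER BEH WITH HAMZA ABOVE
BEH_PLUS_HAMZA = "\u0628\u0654"   # ARABIC LETTER BEH + ARABIC HAMZA ABOVE
ALEF_WITH_HAMZA = "\u0623"        # ARABIC LETTER ALEF WITH HAMZA ABOVE
ALEF_PLUS_HAMZA = "\u0627\u0654"  # ARABIC LETTER ALEF + ARABIC HAMZA ABOVE


def nfc_equivalent(a, b):
    """True if the two strings are identical after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The ALEF pair is unified by normalization; the BEH pair is not, because
# U+08A1 has no canonical decomposition.
alef_pair_matches = nfc_equivalent(ALEF_WITH_HAMZA, ALEF_PLUS_HAMZA)
beh_pair_matches = nfc_equivalent(BEH_WITH_HAMZA, BEH_PLUS_HAMZA)
```

Because normalization does not unify the BEH pair, two labels that may look the same to a reader remain distinct strings after NFC, so any rule that relies on normalization alone will not treat them as the same label.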
Because of its timing, the IAB Statement did not identify or explain that problem, but a number of mailing list discussions (including in the IAB I18n Project), some discussion during the LUCID BoF, and the evolving versions of draft-klensin-idna-5892upd-unicode70 attempt to do so.

So, unless the community does the work to understand and evaluate the issues raised by those various documents (and summarized, perhaps badly, in Part I above) and decide what to do about them, it is inappropriate to approve -- or, I suggest, even to process through IESG consideration -- the present document in its present form.

FWIW, I also think there are some issues about fairness to those who were asked to prepare the documents mentioned above, a number of other explanations of the issues, and materials for the BoF if that work is brushed off without any evaluation or explanation (or in some important cases, even references or specific acknowledgment) by publishing this document in its present form and moving on, but those are not directorate problems.

Independent of what should have been done in the past, publication of this document in its present form, especially with a statement like the one cited above, effectively puts the IETF on record as having decided that none of the issues raised by the other documents are relevant and that IDNA2008 is just fine without addressing those issues at all. Maybe that is the right answer, but it seems to me that it at least needs careful discussion -- more discussion than I can imagine being possible during a four-week last call -- before that conclusion is reached.

The documents other than draft-klensin-idna-5892upd-unicode70 are important in this regard too.
It might be reasonable to say "it is ok to ignore those issues as far as IDNA2008 tables are concerned iff registries exert special care as described in draft-klensin-idna-rfc5891bis" or "it is ok to ignore those issues as far as IDNA2008 tables are concerned iff registries (and perhaps lookup applications) pay extra-special attention to the characters called out in draft-freytag-troublesome-characters", or both. I think all three of those documents would need at least one more iteration to make such statements about them appropriate. The statements would also turn those documents into normative references and thereby prevent rushing the present I-D out the door, but at least we would not be discarding that other work without serious review and consideration.

If we are going to put this document out in a form that concludes that the issues raised in those other documents are unimportant, we should all be aware of another implication of that choice. As most of us know, there are large communities of people out there who believe the IETF should simply go along with whatever the Unicode Consortium decides without trying to do any additional review or apply any additional rules, especially reviews or rules that might result in inconsistencies with UTC recommendations. Some of those people are probably just ignorant of the issues but looking for simplicity; others are as informed as anyone on this list but have reached different conclusions, perhaps based on different assumptions (the availability of language information to DNS users has been a particularly important one in the past) or different priorities.
In practice, if this document is published with the conclusion that none of the issues discovered and suggestions made since RFC 6452 was published are relevant, it not only takes us a large step toward "just accept whatever Unicode does", but, if new issues are discovered in the future, it sets us up for a discussion along the lines of "this change and problem is much less severe than the ones you decided in RFC XXXX (ex- draft-faltstrom-unicode11) were unimportant, so why are you proposing some action now?" I would not have an answer to that question. I hope that those who believe this document should be advanced at the present time and without additional work do.

-------

III. Recommendations

(1) My preferred option is that we put draft-faltstrom-unicode11 aside, figure out how to consider (at least) draft-klensin-idna-5892upd-unicode70, draft-klensin-idna-rfc5891bis, and draft-freytag-troublesome-characters, figure out what should be done about the issues they raise (and any other issues raised during that discussion), then publish whatever is deemed appropriate and recast this I-D in the context of those conclusions. I don't know whether IDNA2008 is sufficiently inconsistent with general practice and "running code" that we need to reexamine the core documents in that context, but I think we need to ask ourselves that question too (and figure out how to ask the IETF more broadly if that is appropriate).

(2) Perhaps there is really great urgency for processing and publishing draft-faltstrom-unicode11 now, and we should either drop those other issues or try to come back to them at some point in the future. I've heard that suggested several times.
Alexey's "already been delayed by 2 months" comment appears to align with a sense of urgency even though consideration and processing of draft-klensin-idna-5892upd-unicode70 has arguably been delayed by four years (version 00 was posted in August 2014 and a successor version was requested to be processed not later than January 2015). If this is actually urgent, then let's revise the I-D to reflect that situation. I think that would include being explicit about the urgency, including where it is coming from and what the consequences of delay would be. Then let's explicitly note that there are issues outstanding and describe what cautions should be taken with code points that RFC 6452 does not allow but draft-faltstrom-unicode11-07 lists as PVALID. I don't have a list of such precautions to propose right now but believe we could fairly easily devise one, even if it required a return to some variation of the very early pre-IDNA2008 idea known then as "Probably-Yes".

Either way, let's not publish this I-D in a way that ignores the outstanding issues and documents and that does so in a way that can easily be read as an IETF conclusion that they are not relevant.

best,
john

--On Sunday, February 3, 2019 16:34 +0100 Patrik Fältström <paf=40frobbit.se@dmarc.ietf.org> wrote:

> As the editor of the document I want to point out the
> mechanisms of the document explicitly.
>
> 1. It identifies a few incompatibilities in the Unicode
> Standard that have been made between the versions. Including
> one already identified by IAB.
>
> 2. For each one of the incompatibilities two choices can be
> made:
>
> 2a. Add one or more exceptions to the IDNA2008 standard (i.e.
> updating it with new rules)
>
> 2b. Add zero exceptions and acknowledge that the
> incompatibilities have been made (i.e. updating it without new
> rules).
>
> [2b] is the choice IETF did last time, RFC 6452.
>
> [2b] is what this document suggests.
>
> 3. The IAB statement on code points
> <https://www.iab.org/documents/correspondence-reports-documents/2015-2/iab-statement-on-identifiers-and-unicode-7-0-0/>
> is a larger issue which can not easily be solved by "just" adding
> some exceptions. One can also see in for example
> <https://www.alvestrand.no/pipermail/idna-update/2015-February/007911.html>
> that the views on the overall problem (if it is
> a problem or not) is split. My view is that the over arching
> issue here to some degree have to do with "trust" between IETF
> and Unicode Consortium and whether IETF should continue to use
> various meta data in the Unicode Standard for calculating what
> code points are valid or not. Alternatives could be to just
> let Unicode Consortium do it, or for IETF to do code point
> selection code point by code point. The latter IETF tried in
> IETF 2003, and we decided then that was a stupid idea and came
> up with IDNA2008.
>
> To conclude, as an editor of IDNA2008 I suggest [2b], i.e. no
> changes to IDNA2008 even though there are incompatibilities,
> and I think [3] is a larger discussion that for example should
> be held here.
>
> IF, but only IF, that is the consensus of the IETF, then this
> draft is what should be an RFC. If any of the questions I list
> above is answered differently, then the draft is the wrong
> thing and should contain something else.
>
> I.e. I wrote it that way to trigger some discussion as I did
> not feel after getting some letters from IAB that ultimately
> "is my boss" in my role as expert reviewer of the IDNA2008
> tables at IANA I could "just" ask IETF what to do. I needed
> something that "is a proposal".
>
> Here it is!
>
> And I am happy to answer questions!
>
> Patrik
>
> On 3 Feb 2019, at 13:59, Alexey Melnikov wrote:
>
>> Dear I18N Directorate,
>>
>> I will be sending this document to IETF Last Call shortly, so
>> I would like to request a review. In particular, I would
>> appreciate:
>>
>> 1) speedy review of the document, due to Unicode Consortium
>> intent to publish Unicode 12 in March 2019 (*).
>> 2) comments on technical content of the document, separated
>> into major, minor and nit categories.
>> 3) comments on whether this document should be Proposed
>> Standard or Informational.
>> 4) comments on dependencies between this document and others.
>>
>> Thank you,
>> Alexey, as an ART AD.
>>
>> (*) - I appreciate that finishing discussions by March
>> doesn't give much time, but this document is already delayed
>> by 2 months.