[I18ndir] The Unicode version review model and reviewing Unicode 12.1 (was: Re: draft-faltstrom-unicode11-07)
John C Klensin <john-ietf@jck.com> Mon, 11 March 2019 00:01 UTC
Date: Sun, 10 Mar 2019 20:01:45 -0400
From: John C Klensin <john-ietf@jck.com>
To: Marc Blanchet <marc.blanchet@viagenie.ca>, Patrik Fältström <paf@netnod.se>
cc: i18ndir@ietf.org
Message-ID: <E9C9D5E709F93B632A380733@PSB>
In-Reply-To: <1E40A2B2-0890-459E-BF36-437E2DB73247@viagenie.ca>
References: <37939676-2D8A-4329-B6A0-A854F9530016@episteme.net> <8BC8E1D7-D760-44BE-997A-C39B770D66A7@viagenie.ca> <C2D2BB4F-9264-451B-8C72-0EADFDF4D303@netnod.se> <1E40A2B2-0890-459E-BF36-437E2DB73247@viagenie.ca>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/yUj5KAlyCseOkfHN10QaDopb4rM>
Subject: [I18ndir] The Unicode version review model and reviewing Unicode 12.1 (was: Re: draft-faltstrom-unicode11-07)

Another over-long note, unfortunately.

Executive summary: Thinking about several recent notes and some implications of draft-faltstrom-unicode11-07 has caused me to go back to the IDNA2008 specs to try to find the basis for these reviews. While I hope it does not come as news to any of us, we may need reminding of what those specs actually say. In particular:

(1) The IANA tables are not normative; the rules in RFCs 5890-5894 are. Normatively, IDNA2008 is on Unicode 12.0 and has been for about five days now.

(2) While a particular registry or other entity could make a different decision for its own purposes, there is nothing about the IANA tables that implies whether a particular Unicode version is supported.

(3) There is nothing in the IDNA2008 documents that explicitly specifies this type of review, what it should consist of, or even whether it should produce a report.

(4) If there is a bias in the documents between "follow Unicode" and "preserve backward compatibility", it is toward the latter.

(5) The tone, and probably some of the text, of draft-faltstrom is inconsistent with some of the above and should be adjusted to avoid future confusion.

(6) Substantially independent of the content of draft-faltstrom, we really need to get the review process and its role nailed down and explain to various communities what is normative and what isn't.

(7) Given the release of Unicode 12.0, we should quickly review and make a recommendation as to whether the I-D should be held until a report on the current version can be incorporated.

Inline below.

--On Sunday, March 10, 2019 11:34 -0400 Marc Blanchet <marc.blanchet@viagenie.ca> wrote:

>
> On 10 Mar 2019, at 11:20, Patrik Fältström wrote:
>
>> On 15 Feb 2019, at 5:31, Marc Blanchet wrote:
>>
>>> Hello,
>>> read it (did not check the content of Appendix A). comments:
>>> a) it says in the intro: « It further suggests a path
>>> forward for the IETF to ensure IDNA2008 follows the
>>> evolution of the Unicode Standard. ». Unless I skip
>>> something, it is far from clear to me that when a new
>>> version of Unicode is out (as one is expecting pretty soon
>>> and as Asmus wrote, will likely to happen before the RFC is
>>> published), what is the exact path forward?
>...
> so we are collectively saying that we are limiting the update
> of the IANA registries up to Unicode 11.0.0 and any new
> version of Unicode is not supported by IDNA2008, as it is
> unspecified for IANA. It may be that Unicode 12.X or other new
> ones, including minor versions, will not have more or less
> known issues than Unicode 11.0.0 but we are saying: Unicode 12
> not supported. Hence, the text I suggested to carry the cases
> where a new Unicode version does not harm more than the
> previous.
>
> Given the popularity of emojis and their new versions coming
> every year, implementors and vendors of OS, librairies and
> other tools are aggressively updating their systems/libraries
> to the new versions of Unicode. Therefore, to handle IDNA2008
> with a old version of Unicode is just way complicated for
> implementors. So freezing to a specific version of Unicode is
> just not helping IDNA2008, IMHO.
>
> And I'm not sure we have collectively cycles to issue a new
> internet-draft for every Unicode version that goes out,
> including minor versions.

Patrik and Marc,

So that the comments below can be read by all of us in the same context, Unicode 12.0 was officially announced last Tuesday, 5 March, with 554 new characters, four new scripts, and, for those who care, 61 new emoji characters:
http://blog.unicode.org/2019/03/announcing-unicode-standard-version-120.html

So, if any of the sense of urgency associated with draft-faltstrom-unicode11 was associated with "get this out before Unicode 12 appears", we've lost that window.

Going forward, while I want to recognize and deal with urgency if it is real, I think we all know that trying to do things quickly and under pressure often leads to errors and other types of poor-quality work, so we should be careful that the urgency is indeed real.

I downloaded some of the tables and text for 12.0 last week and my long note was based on them, even though I downgraded a reference or two to refer to Unicode 11.0 for consistency with the I-D. Based on a superficial inspection of the descriptions of changes and a spot check of the UnicodeData table, I have trouble convincing myself that anything about new code points in Unicode 12.0 justifies a sense of urgency about IDNA2008 calculations and tables, whether implementations upgrade to new versions of Unicode or not. What I see are newly-added historic scripts and abstract characters and code points (including the new emoji) for which there is unlikely to be a rush for inclusion in labels for legitimate identifier uses. The group includes many code points that the IDNA2008 algorithm (unchanged from 5892) would conclude were DISALLOWED. I have not tried to run a comparison for category changes for code points that were defined in 11.0 or earlier but, as noted in the long review, any such changes result in problematic incompatibilities between versions of the IANA tables whether we decide to treat them in some special way or not.

The release of 12.0 does raise the question of whether we should withdraw draft-faltstrom-unicode11, apply the same reasoning and explanations to 12.0, and then push the more current and comprehensive document through the system. Very little of the text, other than Section 4.3 (of version -07 of the I-D), is specific to Unicode version 11 and we are already focused on the I-D and underlying principles, so I'm guessing that going directly from 7.0 to 12 would not represent a lot of extra work... work that would be easier to do now than trying to draw an effort together a year hence. I'm not advocating for doing that, especially in the moments when I think the directorate will succeed sufficiently that doing a review for tables for Unicode 12 in some months will not be an ordeal. But I believe we, in conjunction with the ART ADs and other relevant parties, should give it a bit of thought rather than rejecting the possibility by inertia.

However, whether we view draft-faltstrom-unicode11 as a normal, if overdue, code point review and update or whether we see it as a perhaps-clumsy patch to catch us up, the question of what we are going to do about future versions of Unicode seems to me to be critical. My preference would have been to answer that question and document the results as part of, or in parallel with, draft-faltstrom-unicode11, but we don't seem to be having that discussion. My fallback is that draft-faltstrom-unicode11 should at least be very clear about which of the two it is. I note that, if it is "normal", the apparent failure to consider and resolve the new code point review for Unicode 7.0 [1] and the absence of such reviews for Unicode 8, 9, 10, and 11 (and maybe 12 if we decide to incorporate it) should be a showstopper.

I also have not ever interpreted any part of IDNA2008 as equating "no IDNA table" with "unsupported". "Unverified", maybe, but not "unsupported". To be sure I wasn't making that up, I've gone back today and done a careful rereading of relevant parts of RFCs 5892 and 5894 (to which 5892 refers for information about the IANA tables). That rereading was very illuminating. It is reflected below and I would encourage others to reread the specs too. Some other body might certainly decide that it was not interested in unverified versions of Unicode in IDNA, but that is not an IDNA2008 (or other IETF) requirement, and certainly nothing that 5892 has to say.
That relationship is a bit of a separate question, but another one that I think we need to sort out, be explicit about, and probably update IDNA2008 in some normative way to reflect.

Going forward (and _not_ as part of draft-faltstrom-unicode11), I agree with Marc that we need to come to grips with our plan for future review cycles (whether starting with Unicode 12 or Unicode 13). Because there seems (nearly a decade later and despite the comments below) to be some uncertainty about what IDNA2008 intended, I think whatever we agree on should be documented in an update, probably to 5892 but perhaps elsewhere in the set. I think it is very important that whatever we decide be realistic, i.e., if we cannot realistically expect to get the energy together to generate a new I-D in a timely way for each Unicode version, then we had best not specify that.

To be clear about the present status of things, there is no explicit requirement in RFC 5892 for periodic updates to the IANA registry. It creates the registry, populates it with Unicode 5.2 values, and while some statements in the text can be read as implying updates of the IANA registry for new versions of Unicode, it does not specify that or the procedure for getting there. It does refer to RFC 5894 for that information. Section 7.1.2 of RFC 5894 talks about the requirements for registries to update their own tables (see particularly the second paragraph, starting with "Under this model, registry tables..."). What it does not say is "that registries should look at the tables on file with IANA and believe what they find there". Similarly, Section 7.1.3, first bullet, says that lookup applications "Maintain IDNA and Unicode tables that are consistent with regard to versions,", i.e., the Unicode version they use is up to them (not some higher authority), but they must use IDNA tables that are consistent with it. Again, no statement about going off to IANA and fetching something authoritative there.

The IANA Considerations section of 5894 is explicitly descriptive rather than normative and does not specify any IANA actions (5894 is, after all, Informational) but it does say:

   "While not normative, an IANA registry of characters and scripts and their categories, updated for each new version of Unicode and the characters it contains, are convenient for programming and validation purposes. The details of this registry are specified in the Tables document."

So, good idea to update the IANA tables but, updated or not, the rules of IDNA2008 are the only normative stuff around and there is no notion of "unsupported" because the IANA tables have or have not been updated.

For completeness and to save others work, RFC 5890 and 5891 don't specify procedures for updating the IANA tables or say anything about validity or supported-ness in conjunction with them either; they just point to 5892 and 5894.

I've looked at the few notes from that period that I have readily available, tried to search my fading memory, and reread what seem to be the relevant sections of 5892 and 5894 and the I-Ds that preceded the latter. I believe our intent was that the IANA tables would be regularly updated with new Unicode versions and with modifications to IDNA (protocol, tables, or exception lists in the latter) as a convenience for anyone who wanted them but that they had no normative effect, much less an effect about which Unicode versions were "supported". RFC 5894 is quite explicit that a key reason for moving from IDNA2003 to IDNA2008 was to get away from the requirement to do a substantive and normative update to the standard for each version of Unicode. There is, at best, an implicit requirement for a regular review but the remedy for a Unicode action that is problematic for IDNA is an update to the relevant section of one of the IDNA documents when the problem is discovered, not the result of some IANA table-building process.

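For readers who want a picture of the comparison mentioned earlier (category changes for code points that were already defined in a previous version), a minimal sketch follows. It assumes two tables of derived properties have already been computed from the RFC 5892 rules; the function name and the tiny sample tables are hypothetical and purely illustrative, not excerpts to be relied on. The point is only that changes to existing code points (the backward-compatibility risk) are reported separately from newly assigned ones.

```python
def compare_derived_properties(
    old: dict[int, str], new: dict[int, str]
) -> tuple[dict[int, tuple[str, str]], dict[int, str]]:
    """Split the differences between two derived-property tables.

    Returns (changed, added):
      changed: code points present in both tables whose value differs,
               mapped to (old_value, new_value)
      added:   code points present only in the newer table
    """
    changed = {
        cp: (old[cp], new[cp])
        for cp in old.keys() & new.keys()
        if old[cp] != new[cp]
    }
    added = {cp: value for cp, value in new.items() if cp not in old}
    return changed, added


# Illustrative sample data only. U+19DA is used because RFC 6452 reports
# its derived property moving from PVALID to DISALLOWED with Unicode 6.0;
# the other entries are placeholders and should not be taken as authoritative.
older = {0x0061: "PVALID", 0x19DA: "PVALID"}
newer = {0x0061: "PVALID", 0x19DA: "DISALLOWED", 0x0620: "PVALID"}

changed, added = compare_derived_properties(older, newer)
for cp, (before, after) in sorted(changed.items()):
    print(f"U+{cp:04X}: {before} -> {after} (existing code point)")
for cp, value in sorted(added.items()):
    print(f"U+{cp:04X}: {value} (new in this version)")
```

Only the entries in `changed` raise the stability question; entries in `added` cannot break anything that was valid before.
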
It is interesting that RFC 6452 reinforces that view: it doesn't say "this review was done because the base IDNA2008 specifications require it". Instead, at the end of Section 2, it simply says "This RFC has been produced because 6.0 is the first version of Unicode to be released since IDNA2008 was published".

Another point of interest is that it seems clear from the discussion of backward compatibility in 5892 and 5894 that, if Unicode changed the properties of an existing code point, our bias was to preserve the older derived property unless it could be shown that no harm would result from the switch or at least that the advantages of staying with Unicode clearly outweighed the harm of an incompatible change. As reports, both 6452 and the current draft-faltstrom seem to ignore that preference without comment, simply asserting that it is better to stay with Unicode, a position for which there is little or no support in any consensus base IDNA2008-related document. On the other hand, if someone felt strongly about preserving backward compatibility for any of those code points, they would be free, at any time, to propose an update to IDNA2008 to make that correction. I assume (or at least hope) that such a proposal would cause a lively debate about the tradeoffs for that particular code point or set of code points between stability and backward compatibility on one hand and consistency with Unicode and, in many cases, correctness on the other (I assume that most of the property changes between Unicode versions are because UTC was persuaded, or concluded on their own, that the initial classification was incorrect).

Curiously, this probably also answers the document category question. Both RFC 6452 and draft-faltstrom-unicode11 are reports of analyses that are not normatively required by IDNA2008. They do not update or otherwise modify the IDNA2008 specs.
And, while they explain the IANA tables and changes that were (or were not) made (a very useful thing to do), neither the tables nor the explanations have any normative effect. Sounds to me about as close to a description of Informational as one can get... probably to the point that the IESG should be asked to reclassify 6452 to Informational for consistency.

For the future, more generally, and with the understanding that this entire "review new Unicode versions and update the IANA tables" business is oral tradition rather than anything that makes a particular version of Unicode more or less valid for IDNA, I think we are at a three-way fork in the road.

One path involves our doing a serious, mandated review for each new Unicode version, with the serious possibility of modifying IDNA2008 to include more code points in the exception list but with the consequence of actually tying IDNA2008 to a specific and normative version of Unicode.

A second would involve a review, and a report on that review, but with the review being no more than advisory about any modifications to the IDNA2008 tables, rules, or protocol. That is what the IDNA2008 documents seem to call for (and is very different from the way draft-faltstrom has been treated in recent discussions).

The questions of how bad a problem needs to be to justify a change to IDNA2008 to, e.g., preserve backward compatibility and whether to do new code point reviews almost certainly need to be addressed if we are going to take either of those two paths and, to the extent possible, we probably should get it written down this time.

The third path (actually the other extreme -- there may be intermediaries) is to decide that these reviews are a waste of time (and scarce resources), that we are never again going to decide to deviate from wherever the Unicode properties, or changes in property values for existing code points, lead us, and that we should just change IDNA2008 to eliminate any notion of a review and provide for uncritical generation of new IANA tables shortly after new Unicode versions appear. The latter still would not prevent proposals to change the standard in a way that would affect both the normative rules and the tables but I'd hope we would be confident that such proposals would be taken as seriously as they deserve and processed, not thrown in the back of a queue that never moves.

FWIW, I'm less pessimistic about our ability to more clearly define and then take the first or second path than I believe Marc to be after reading the comments above and in his earlier notes. First, I don't see much evidence that, as long as there are no terrible surprises with a new version of Unicode, doing the calculations and doing a quick pass through the new code points (especially those that would come out PVALID or that seem obviously controversial) should require a huge amount of effort. Producing an I-D to summarize the process and its conclusions should certainly not be a big deal. The thing that held this revision up for years was not the difficulty of running the tables or producing I-Ds. It was precisely that we encountered what appeared to be a showstopper problem and couldn't manage to engage with addressing it for years (and, based on draft-faltstrom-unicode11-07, we still can't).

On the other hand, if we can't get the energy together to do these reviews and produce timely updates, it would be highly questionable whether the IETF is capable of doing any serious i18n work that has sufficient informed involvement and participation to make any claims of IETF consensus about the result plausible.
I am still trying to avoid the conclusion that we cannot manage such work, but that is getting harder. If we cannot do any i18n work that leads to informed consensus, then what we can or cannot do with IDNA2008 reviews for new versions of Unicode is probably among the least of our problems.

best,
   john

[1] An appropriate way to look at draft-klensin-idna-5892upd-unicode70, especially its first few versions, is that it is a report on that new code point review. I don't imagine anyone wants to take the time, but probably the ideal way to proceed with draft-faltstrom-unicode11 (or 12) would be to recast and publish draft-klensin as a report on a problem discovered while making a review, publish it, remove most of the material about Unicode 7.0 from draft-faltstrom, and make the latter about 8.0-11 (or 12), and not about 7.0 at all. That would be especially advantageous because, unless Patrik has fixed it in -08, the treatment of the 7.0 issue in the draft-faltstrom I-D is really not very good. On the other hand, because Patrik is a co-author on draft-klensin, it wouldn't let him off the hook.