[I18ndir] The Unicode version review model and reviewing Unicode 12.1 (was: Re: draft-faltstrom-unicode11-07)

John C Klensin <john-ietf@jck.com> Mon, 11 March 2019 00:01 UTC

Date: Sun, 10 Mar 2019 20:01:45 -0400
From: John C Klensin <john-ietf@jck.com>
To: Marc Blanchet <marc.blanchet@viagenie.ca>, Patrik Fältström <paf@netnod.se>
cc: i18ndir@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/yUj5KAlyCseOkfHN10QaDopb4rM>

Another over-long note, unfortunately.   

Executive summary:  Thinking about several recent notes and some
implications of draft-faltstrom-unicode11-07 has caused me to
go back to the IDNA2008 specs to try to find the basis for
these reviews.  While I hope it does not come as news to any of
us, we may need reminding of what those specs actually say.   In
particular: (1) The IANA tables are not normative, the rules in
RFCs 5890-5894 are.  Normatively, IDNA2008 is on Unicode 12.0 and
has been for about five days now.  (2) While a particular
registry or other entity could make a different decision for its
own purposes, there is nothing about the IANA tables that implies
whether a particular Unicode version is supported. (3) There is
nothing in the IDNA2008 documents that explicitly specifies this
type of review, what it should consist of, or even whether it
should produce a report. (4) If there is a bias in the
documents between "follow Unicode" and "preserve backward
compatibility", it is toward the latter.  (5) The tone, and
probably some of the text, of draft-faltstrom is inconsistent
with some of the above and should be adjusted to avoid future
confusion. (6) Substantially independent of the content of
draft-faltstrom, we really need to get the review process and
its role nailed down and explain to various communities what is
normative and what isn't. (7) Given the release of Unicode 12.0,
we should quickly review and make a recommendation as to whether
the I-D should be held until a report on the current version can
be incorporated.

Inline below.


--On Sunday, March 10, 2019 11:34 -0400 Marc Blanchet
<marc.blanchet@viagenie.ca> wrote:

> 
> On 10 Mar 2019, at 11:20, Patrik Fältström wrote:
> 
>> On 15 Feb 2019, at 5:31, Marc Blanchet wrote:
>> 
>>> Hello,
>>>  read it (did not check the content of Appendix A). comments:
>>> a) it says in the intro:  « It further suggests a path
>>> forward for  the IETF to ensure IDNA2008 follows the
>>> evolution of the Unicode  Standard. ». Unless I skip
>>> something, it is far from clear to me  that when a new
>>> version of Unicode is out (as one is expecting pretty  soon
>>> and as Asmus wrote, will likely happen before the RFC is 
>>> published), what is the exact path forward?
>...
> so we are collectively saying that we are limiting the update
> of the IANA registries up to Unicode 11.0.0 and any new
> version of Unicode is not supported by IDNA2008, as it is
> unspecified for IANA. It may be that Unicode 12.X or other new
> ones, including minor versions, will not have more or less
> known issues than Unicode 11.0.0 but we are saying: Unicode 12
> not supported. Hence, the text I suggested to carry the cases
> where a new Unicode version does not harm more than the
> previous.
> 
> Given the popularity of emojis and their new versions coming
> every year, implementors and vendors of OS, libraries and
> other tools are aggressively updating their systems/libraries
> to the new versions of Unicode. Therefore, to handle IDNA2008
> with an old version of Unicode is just way complicated for
> implementors. So freezing to a specific version of Unicode is
> just not helping IDNA2008, IMHO.
> 
> And I'm not sure we have collectively cycles to issue a new
> internet-draft for every Unicode version that goes out,
> including minor versions.

Patrik and Marc,

So that the comments below can be read by all of us in the same
context, Unicode 12.0 was officially announced last Tuesday, 5
March, with
   554 new characters,
   four new scripts, and, for those who care,
   61 new emoji characters
http://blog.unicode.org/2019/03/announcing-unicode-standard-version-120.html

So, if any of the sense of urgency associated with
draft-faltstrom-unicode11 was associated with "get this out
before Unicode 12 appears", we've lost that window.   Going
forward, while I want to recognize and deal with urgency if it
is real, I think we all know that trying to do things quickly
and under pressure often leads to errors and other types of
poor-quality work, so we should be careful that the urgency is
indeed real.

I downloaded some of the tables and text for 12.0 last week
and my long note was based on it even though I downgraded a
reference or two to refer to Unicode 11.0 for consistency with
the I-D.   Based on a superficial inspection of the descriptions
of changes and a spot check of the UnicodeData table, I have
trouble convincing myself that anything about new code points in
Unicode 12.0 justifies a sense of urgency about IDNA2008
calculations and tables whether implementations upgrade to new
versions of Unicode or not.  What I see are newly-added historic
scripts and abstract characters and code points (including the
new emoji) for which there is unlikely to be a rush for
inclusion in labels for legitimate identifier uses.  The group
includes many code points that the IDNA2008 algorithm (unchanged
from 5892) would conclude were DISALLOWED.  I have not tried to
run a comparison for category changes for code points that were
defined in 11.0 or earlier but, as noted in the long review, any
such changes result in problematic incompatibilities between
versions of the IANA tables whether we decide to treat them in
some special way or not.
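
The comparison for category changes mentioned above could be run as a
first-pass filter along the following lines.  This is a minimal sketch,
not the RFC 5892 algorithm: it assumes input rows in the
semicolon-separated UnicodeData.txt layout (code point in field 0,
General_Category in field 2), ignores the First/Last range convention,
and looks only at General_Category, whereas the derived property also
depends on other Unicode properties.  The function names and the sample
rows are invented for illustration.

```python
def parse_general_categories(lines):
    """Map code point -> General_Category from UnicodeData.txt-style rows."""
    categories = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(";")
        categories[int(fields[0], 16)] = fields[2]
    return categories


def diff_versions(old_lines, new_lines):
    """Return (added code points, [(cp, old_gc, new_gc), ...]) between versions."""
    old = parse_general_categories(old_lines)
    new = parse_general_categories(new_lines)
    added = sorted(cp for cp in new if cp not in old)
    changed = sorted(
        (cp, old[cp], new[cp])
        for cp in old
        if cp in new and old[cp] != new[cp]
    )
    return added, changed


# Invented, abbreviated sample rows (not real Unicode data).
OLD_ROWS = ["0041;SAMPLE A;Lu", "19DA;SAMPLE B;Nd"]
NEW_ROWS = ["0041;SAMPLE A;Lu", "19DA;SAMPLE B;No", "1F9A5;SAMPLE C;So"]

added, changed = diff_versions(OLD_ROWS, NEW_ROWS)
for cp in added:
    print(f"new code point U+{cp:04X}")
for cp, old_gc, new_gc in changed:
    print(f"U+{cp:04X}: General_Category {old_gc} -> {new_gc}")
```

Entries in the "changed" list are the ones that would need a closer
look, since they are the candidates for an incompatible change in
derived property between table versions.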

The release of 12.0 does raise the question of whether we should
withdraw draft-faltstrom-unicode11, apply the same reasoning and
explanations to 12.0, and then push the more current and
comprehensive document through the system. Very little of the
text, other than Section 4.3 (of version -07 of the I-D), is
specific to Unicode version 11 and we are already focused on the
I-D and underlying principles, so I'm guessing that going
directly from 7.0 to 12 would not represent a lot of extra
work... work that would be easier to do now than trying to draw
an effort together a year hence.   I'm not advocating for doing
that, especially in the moments when I think the directorate
will succeed sufficiently that doing a review for tables for
Unicode 12 in some months will not be an ordeal.  But I believe
we, in conjunction with the ART ADs and other relevant parties,
should give it a bit of thought rather than rejecting the
possibility by inertia.

However, whether we view draft-faltstrom-unicode11 as a normal,
if overdue, code point review and update or whether we see it as
a perhaps-clumsy patch to catch us up, the question of what we
are going to do about future versions of Unicode seems to me to
be critical.   My preference would have been to answer that
question and document the results as part of, or in parallel
with, draft-faltstrom-unicode11, but we don't seem to be having
that discussion.  My fallback is that draft-faltstrom-unicode11
should at least be very clear about which of the two it is.  I
note that, if it is "normal", the apparent failure to consider
and resolve the new code point review for Unicode 7.0 [1] and
the absence of such reviews for Unicode 8, 9, 10, and 11 (and
maybe 12 if we decide to incorporate it) should be a showstopper.

I also have not ever interpreted any part of IDNA2008 as
equating "no IDNA table" with "unsupported".  "Unverified",
maybe, but not "unsupported".  To be sure I wasn't making that
up, I've gone back today and done a careful rereading of
relevant parts of RFCs 5892 and 5894 (to which 5892 refers for
information about the IANA tables).  That rereading was very
illuminating.  It is reflected below and I would encourage
others to reread the specs too.    Some other body might
certainly decide that it was not interested in unverified
versions of Unicode in IDNA, but that is not an IDNA2008 (or other
IETF) requirement, certainly nothing that 5892 has to say.  That
relationship is a bit of a separate question, but another one
that I think we need to sort out, be explicit about, and
probably update IDNA2008 in some normative way to reflect.   

Going forward (and _not_ as part of draft-faltstrom-unicode11),
I agree with Marc that we need to come to grips with our plan
for future review cycles (whether starting with Unicode 12 or
Unicode 13).   Because there seems (nearly a decade later and
despite the comments below) to be some uncertainty about what
IDNA2008 intended, I think whatever we agree on should be
documented in an update, probably to 5892 but perhaps elsewhere
in the set.  I think it is very important that whatever we
decide be realistic, i.e., if we cannot realistically expect to
get the energy together to generate a new I-D in a timely way
for each Unicode version, then we had best not specify that.

To be clear about the present status of things, there is no
explicit requirement in RFC 5892 for periodic updates to the
IANA registry.  It creates the registry, populates it with
Unicode 5.2 values, and while some statements in the text can be
read as implying updates of the IANA registry for new versions
of Unicode, it does not specify that or the procedure for
getting there.  It does refer to RFC 5894 for that information.
Section 7.1.2 of RFC 5894 talks about the requirements for
registries to update their own tables (see particularly the
second paragraph, starting with "Under this model, registry
tables...").  What it does not say is "that registries should
look at the tables on file with IANA and believe what they find
there".  Similarly Section 7.1.3, first bullet, says that lookup
applications "Maintain IDNA and Unicode tables that are
consistent with regard to versions,", i.e., the Unicode version
they use is up to them (not some higher authority), but they
must use IDNA tables that are consistent with it.  Again, no
statement about going off to IANA and fetching something
authoritative there.   
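
The consistency requirement quoted above could be checked locally along
these lines.  This is only a sketch under stated assumptions: the
function name is hypothetical, the version label for the IDNA table is
assumed to be a dotted string supplied by whoever generated that table,
and Python's unicodedata.unidata_version stands in for "the Unicode
version the application uses".

```python
import unicodedata


def same_unicode_version(idna_table_version,
                         unicode_version=unicodedata.unidata_version):
    """True if both dotted version strings name the same Unicode version."""
    def key(version):
        parts = [int(p) for p in version.split(".")]
        # Pad so that "12.0" and "12.0.0" compare equal.
        return tuple(parts + [0] * (3 - len(parts)))
    return key(idna_table_version) == key(unicode_version)


# Hypothetical label for a locally generated IDNA table.
if not same_unicode_version("12.0.0"):
    print("IDNA table and Unicode data are from different versions")
```

Nothing in the check consults IANA; it only compares the two versions
the application itself has chosen to use.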

The IANA Considerations section of 5894 is explicitly
descriptive rather than normative and does not specify any IANA
actions (5894 is, after all, Informational) but it does say:

	"While not normative, an IANA registry of characters and
	scripts and their categories, updated for each new version
	of Unicode and the characters it contains, are convenient
	for programming and validation purposes.  The details of
	this registry are specified in the Tables document."

So, good idea to update the IANA tables but, updated or not, the
rules of IDNA2008 are the only normative stuff around and there
is no notion of "unsupported" because the IANA tables have or
have not been updated.   For completeness and to save others
work, RFC 5890 and 5891 don't specify procedures for updating
the IANA tables or say anything about validity or supported-ness
in conjunction with them either; they just point to 5892 and
5894.

I've looked at the few notes from that period that I have
readily available, tried to search my fading memory, and reread
what seem to be the relevant sections of 5892 and 5894 and the
I-Ds that preceded the latter.  I believe our intent was that
the IANA tables would be regularly updated with new Unicode
versions and with modifications to IDNA (protocol, tables, or
exception lists in the latter) as a convenience for anyone who
wanted them but that they had no normative effect, much less an
effect about which Unicode versions were "supported".  RFC 5894
is quite explicit that a key reason for moving from IDNA2003 to
IDNA2008 was to get away from the requirement to do a
substantive and normative update to the standard for each
version of Unicode.    There is, at best, an implicit
requirement for a regular review but the remedy for a Unicode
action that is problematic for IDNA is an update to the relevant
section of one of the IDNA documents when the problem is
discovered, not the result of some IANA table-building process.
It is interesting that RFC 6452 reinforces that view: it doesn't
say "this review was done because the base IDNA2008
specifications require it".  Instead, at the end of Section 2,
it simply says  "This RFC has been produced because 6.0 is the
first version of Unicode to be released since IDNA2008 was
published".

Another point of interest is that it seems clear from the
discussion of backward compatibility in 5892 and 5894 that, if
Unicode changed the properties of an existing code point, our
bias was to preserve the older derived property unless it could
be shown that no harm would result from the switch or at least
that the advantages of staying with Unicode clearly outweighed
the harm of an incompatible change.  As reports, both 6452 and
the current draft-faltstrom seem to ignore that preference
without comment, simply asserting that it is better to stay with
Unicode, a position for which there is little or no support in
any consensus base IDNA2008-related document.  On the other
hand, if someone felt strongly about preserving backward
compatibility for any of those code points, they would be free,
at any time, to propose an update to IDNA2008 to make that
correction.  I assume (or at least hope) that such a proposal
would cause a lively debate about the tradeoffs for that
particular code point or set of code points between stability
and backward compatibility on one hand and consistency with
Unicode and, in many cases, correctness on the other (I assume
that most of the property changes between Unicode versions are
because UTC was persuaded, or concluded on their own, that the
initial classification was incorrect).

Curiously, this probably also answers the document category
question.  Both RFC 6452 and draft-faltstrom-unicode11 are
reports of analyses that are not normatively required by
IDNA2008.   They do not update or otherwise modify the IDNA2008
specs.  And, while they explain the IANA tables and changes that
were (or were not) made (a very useful thing to do), neither the
tables nor the explanations have any normative effect.   Sounds
to me about as close to a description of Informational as one
can get... probably to the point that the IESG should be asked
to reclassify 6452 to Informational for consistency.

For the future, more generally, and with the understanding that
this entire "review new Unicode versions and update the IANA
tables" business is oral tradition rather than anything that
makes a particular version of Unicode more or less valid for
IDNA, I think we are at a three-way fork in the road.   One path
involves our doing a serious, mandated, review for each new
Unicode version with the serious possibility of modifying
IDNA2008 to include more code points in the exception list but
the consequence of actually tying IDNA2008 to a specific and
normative version of Unicode.  A second would involve a review,
and a report on that review, but with the review being no more
than advisory about any modifications to the IDNA2008 tables of
rules or protocol.  That is what the IDNA2008 documents seem to
call for (and is very different from the way draft-faltstrom has
been treated in recent discussions).  The questions of how bad a
problem needs to be to justify a change to IDNA2008 to, e.g.,
preserve backward compatibility and whether to do new code point
reviews almost certainly need to be addressed if we are going to
take either of those two paths and, to the extent possible, we
probably should get it written down this time.   The third path
(actually the other extreme -- there may be intermediaries) is
to decide that these reviews are a waste of time (and scarce
resources), that we are never again going to decide to deviate
from wherever the Unicode properties, or changes in property
values for existing code points, lead us and we should just
change IDNA2008 to eliminate any notion of a review and provide
for uncritical generation of new IANA tables shortly after new
Unicode versions appear.  The latter still would not prevent
proposals to change the standard in a way that would affect both
the normative rules and the tables but I'd hope we would be
confident that such proposals would be taken as seriously as
they deserve and processed, not thrown in the back of a queue
that never moves.

FWIW, I'm less pessimistic about our ability to more clearly
define and then take the first or second path than I believe
Marc to be after reading the comments above and in his earlier
notes.  First, I don't see much evidence that, as long as there
are no terrible surprises with a new version of Unicode, doing
the calculations, doing a quick pass through the new code points
(especially those that would come out PVALID or that seem
obviously controversial) should require a huge amount of effort.
Producing an I-D to summarize the process and its conclusions
should certainly not be a big deal.  The thing that held this
revision up for years was not the difficulty of running the
tables or producing I-Ds.  It was precisely that we encountered
what appeared to be a showstopper problem and couldn't manage to
engage with addressing it for years (and, based on
draft-faltstrom-unicode11-07, we still can't).

On the other hand, if we can't get the energy together to do
these reviews and produce timely updates, it would be highly
questionable whether the IETF is capable of doing any serious
i18n work that has sufficient informed involvement and
participation to make any claims of IETF consensus about the
result plausible.   I am still trying to avoid the conclusion
that we cannot manage such work, but that is getting harder.  If
we cannot do any i18n work that leads to informed consensus, then
what we can or cannot do with IDNA2008 reviews for new versions
of Unicode is probably among the least of our problems.

best,
    john 

[1] An appropriate way to look at
draft-klensin-idna-5892upd-unicode70, especially its first few
versions, is that it is a report on that new code point review.
I don't imagine anyone wants to take the time, but probably the
ideal way to proceed with draft-faltstrom-unicode11 (or 12)
would be to recast and publish draft-klensin as a report on a
problem discovered while making a review, publish it, remove
most of the material about Unicode 7.0 from draft-faltstrom, and
make the latter about 8.0-11 (or 12), and not about 7.0 at all.
That would be especially advantageous because, unless Patrik has
fixed it in -08, the treatment of the 7.0 issue in the
draft-faltstrom I-D is really not very good.  On the other hand,
because Patrik is a co-author on draft-klensin, it wouldn't let
him off the hook.