Re: [I18ndir] Request for review: draft-faltstrom-unicode11-07

I'd like to suggest, not a different view of the subject from
Patrik's, but a different set of necessary priorities and
interactions, at least hinted at by his (3) below.  I've had
many parts of this discussion in other forums with most of the
people on this list, so will try to summarize and expand or
answer questions if needed.  This note has taken me close to a
week to write and is still longer than I would like -- these
are not easy problems as the whole long history that has
brought us to this directorate indicates.

We've got a rather large pile of documents on the table (or
fallen on the floor next to it).  At one point or another, one
or more of the ART or former Apps ADs have been asked to
process these documents or asked to create a context in which
they could be processed (and then process them).  The request
for the BOF that led to this directorate was a
next-to-last-ditch effort to establish a framework in which
they could be evaluated, improved, and processed.

TL;DR summary of this note: It is extremely unwise for the IETF
to process draft-faltstrom-unicode11-07 before a number of
other issues (and associated documents) are carefully
considered and resolved.  The main body of the rest of this
note is divided into two parts: a summary of the issues and the
more obvious of the associated documents and a discussion of
why processing draft-faltstrom-unicode11 without first
addressing those issues poses significant risks for IDNA, for
the IETF, and for the Internet. Those parts are followed by a
third which makes some summary recommendations based on them.
I'd encourage people to read and try to understand both but
those who have too little time may want to look at the second
part first.  The note ends with what I hope are two
constructive suggestions as to how we might move forward.

    ------

I. The issues and the documents

As a group, those other documents raise fundamental questions
about the design of IDNA2008, whether that design is actually
practical, and whether aspects of it need to be explained
better or tuned to a greater or lesser degree.  The "can we
trust UTC" question that Patrik mentions is just one aspect of
that situation.  Interestingly, it seems to me that there are
three possible answers to the question, not just two:

 * Yes, but... (aka "trust but verify").  That conclusion
	immediately leads to the question of what we do about
	the anomalies that are clearly present and the new ones
	that will appear in the future.  More on that below.

 * No.  But then we need to figure out what to do next,
	something that will almost certainly require new rules
	in IDNA2008 (specifically 5891bis and/or 5892bis) and/or
	the long-discussed (and long-avoided for good reasons)
	IETF-specific normalization form.

 * Yes, really.   UTC, or at least various important
	actors in it, have long suggested that the IETF is
	incompetent to do this work and that we should simply
	turn IDNA over to them and drop out, just as, e.g., we
	handed off HTML to W3C (in both cases, the perception
	from the other body is that we had no business being in
	those areas in the first place).   One of the ways to
	read some of the revisions of UTR#46 is that they
	constitute moves toward "Unicode domain names" that
	would replace any IETF IDN work.   Remember, because it is
	important, that IDNA2008, as now written, assumes we
	can construct a model of permissible labels based on a
	collection of property-based rules.  By contrast, key
	Unicode people have stated that any such system is
	ultimately doomed by the edge cases and that only tables of
	properties of particular code points can ever be
	authoritative.  Note that suggestions have been made
	fairly recently on the IETF, IDNA, and I18nrp lists that
	we should either turn everything over to UTC or accept
	UTR#46 as definitive and either ignore conflicts with
	IDNA2008 or modify IDNA2008 to match.  The desire for (or
	perceived necessity of) normative tables and code point by
	code point decisions may put some perspective on some of
	the demand for draft-faltstrom-unicode11 ... or it may not.

Some of the more technical issues associated with that part of
the problem are discussed in
draft-klensin-idna-5892upd-unicode70 (-05 is expired and I've
been discouraged, including by some relevant ADs and the IAB
Program leadership, from posting -06).  However, that document
includes only one type of example of those fundamental
questions.  As an example of another, IDNA2008 calls for, and
depends on, a very high degree of responsible behavior on the
part of registries and registrars at all levels of the DNS as
part of what, conceptually, is a layered system in which the
rules of RFC 5892 provide an upper bound on what is permitted
and additional processes further restrict that set for a
particular zone, usually drastically (the number of groups who
have ignored or misunderstood that layering concept may suggest
another area in which IDNA2008 may need work).  At least
part of what is often called the registry restriction issue is
discussed in draft-klensin-idna-rfc5891bis, but there is a more
fundamental problem, which is that IDNA2008's advice to
registries is to allow only labels written in scripts (or
derived from languages) that they thoroughly understand.
IDNA2008 also calls for lookup applications to test strings for
at least superficial validity before looking them up (by
contrast, IDNA2003, UTR#46, and common practice among browser
implementers all assume that anything that is given to such an
application can and should be looked up and that 100% of the
responsibility for acceptable, non-problematic, labels should
rest on the registries).

Observations from the last eight years or so suggest that those
IDNA2008 requirements are widely ignored.  Most or all of the
browser vendors are following the IDNA2003 and UTR#46
recommendations and assuming that pre-checking of putative
labels is too much of a performance hit to be justified, at
least under the reasoning they have heard and accepted.
Conformance by registries varies widely, with some flouting
even the validity rules imposed by RFC 5892, some trying to be
very careful, and some registries (especially TLD registries
with global scope) effectively taking the position that the
requirement that they have in-depth understanding of the
scripts (or even combinations of scripts) from which they are
accepting labels is unrealistic from a business standpoint.
One way to cope with the latter problem is for the IETF to
reach into the next layer of restrictions by providing advice
about particular risks and traps to those registries who intend
to register labels in scripts for which they do not have
particular expertise.  draft-freytag-troublesome-characters
provides at least one form of that type of guidance but would
represent a big step in several important ways.

While it is not clear to me whether they are part of what this
note claims is the critical path, there have been a number of
efforts to build on RFC 3743 --albeit in very different
directions-- to determine which labels are appropriate in
combination in a particular zone or to support registration of
sets of labels so that the user will find "the right place" no
matter which particular representation appears in a query.
Most of those efforts, including mechanisms for supporting
so-called delegated variants and mechanisms for extending the
DNS's aliasing capabilities, have occurred outside the ART/Apps
area (e.g., work on DNAME variations in DNSEXT and then DNSOP)
but that does not make them less part of the i18n/idn scene as
observed from outside the IETF.

More generally, when the IETF publishes Proposed Standards and
notices after five (or eight) years that provisions of them are
not being followed, we usually consider it appropriate
--assuming we still believe in the "running code" bits -- to
review those specifications and see if anything needs to be
changed to adjust to reality or if there are more persuasive
actions we should be taking to increase conformance.  The
answer might be "no" or "someone else's problem" to both, but
it seems to me that we need to ask the question.

      -------

II.  Why not just process and approve
draft-faltstrom-unicode11-07 and then come back to these issues
as Patrik seems to suggest?

RFC 6452, the precedent for this document, made an affirmative
assertion that there were no issues worth worrying about --
there were simply no concerns of any significance.  That
assertion was the basis of what Patrik lists as 2b below -- no
changes were necessary because the then-new version of Unicode
did not raise any issues that required special consideration or
action.  By proposing 2b now, this I-D essentially makes the
same assertion, i.e., that we have discovered no issues, either
in new Unicode versions or through experience, that require
exception entries or other modifications to the IDNA2008
specifications.  That just isn't the case.  We know there are
issues: not only did the January 2015 IAB statement that the
document cites say that, but so does
draft-klensin-idna-5892upd-unicode70, whose most recent posted
versions (and every version since the date of the IAB
statement) contain a much better and more complete analysis of
the issues than the IAB statement itself (which was written
when we thought the issue was all about Hamza rather than a
much larger set of code points that don't decompose the way
some reasonable people think they would if they were not
constrained by history or language distinctions).  The
discussion above points to other issues as well, and parts of
the "troublesome characters" spec and the discussions leading
up to it strongly suggest that the "ContextO" category in RFC
5892 is woefully inadequate and should be rethought, that it
needs many more code points and context rules, or that it
should be dropped in favor of some other model.
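
For those who have not looked at the decomposition question
directly, the following is a minimal sketch, not taken from any
of the documents, of how to observe it.  It assumes Python 3 and
whatever Unicode database that Python build bundles, and the
helper name composes_to is purely illustrative.  It compares two
"letter plus Hamza Above" cases: U+0623, which has a canonical
decomposition, and U+08A1, which does not.

```python
import unicodedata


def composes_to(base_and_mark: str, precomposed: str) -> bool:
    """Return True if NFC maps the combining sequence onto the
    single precomposed code point (illustrative helper only)."""
    return unicodedata.normalize("NFC", base_and_mark) == precomposed


# U+0623 ALEF WITH HAMZA ABOVE decomposes canonically to
# U+0627 U+0654, so NFC folds the sequence into U+0623.
alef_case = composes_to("\u0627\u0654", "\u0623")

# U+08A1 BEH WITH HAMZA ABOVE has no decomposition, so the
# sequence U+0628 U+0654 stays distinct from it under NFC.
beh_case = composes_to("\u0628\u0654", "\u08A1")

print(alef_case, beh_case)
```

Under those assumptions the two results differ: normalization
unifies the first pair of forms but leaves the second pair as
two distinct strings that a user may be unable to tell apart.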

One way to view the difficulty is that a statement in Section
4.1 of draft-faltstrom-unicode11-07 is simply not correct.
Another is that approving the document has implications far
beyond its stated scope.  That statement begins "The discussion
in the IETF concluded...".  Now, while I believe that
particular conclusion is correct as stated, there has never
been a discussion in the IETF --much less a Last Call to
determine consensus -- about that issue, at least unless one
counts the LUCID BOF (and we don't even permit WG meetings
during IETF to determine WG consensus, much less IETF
consensus).  More important, whatever we might have believed in
the relatively short period in 2014 and very early 2015 (right
after the discovery of the issues with U+08A1), we have since
learned that that code point was just an example of a much
broader problem.
Because of its timing, the IAB Statement did not identify or
explain that problem, but a number of mailing list discussions
(including in the IAB I18n Project), some discussion during the
LUCID BoF, and the evolving versions of
draft-klensin-idna-5892upd-unicode70 attempt to do so.  So,
unless the community does the work to understand and evaluate
the issues raised by those various documents (and summarized,
perhaps badly, in Part I above) and decide what to do about
them, it is inappropriate to approve --or, I suggest, even to
process through IESG consideration -- the present document in
its present form.  FWIW, I also think there are some issues
about fairness to those who were asked to prepare the documents
mentioned above, a number of other explanations of the issues,
and materials for the BoF if that work is brushed off without
any evaluation or explanation (or in some important cases, even
references or specific acknowledgment) by publishing this
document in its present form and moving on, but those are not
directorate problems.

Independent of what should have been done in the past,
publication of this document in its present form, especially
with a statement like the one cited above, effectively puts the
IETF on record as having decided that none of the issues raised
by the other documents are relevant and that IDNA2008 is just
fine without addressing those issues at all.  Maybe that is the
right answer, but it seems to me that it at least needs careful
discussion -- more discussion than I can imagine being possible
during a four-week last call -- before that conclusion is
reached.  The documents other than
draft-klensin-idna-5892upd-unicode70 are important in this
regard too.  It might be reasonable to say "it is ok to ignore
those issues as far as IDNA2008 tables are concerned iff
registries exert special care as described in
draft-klensin-idna-rfc5891bis" or "it is ok to ignore those
issues as far as IDNA2008 tables are concerned iff registries
(and perhaps lookup applications) pay extra-special attention
to the characters called out in
draft-freytag-troublesome-characters", or both.  I think all
three of those documents would need at least one more iteration
to make such statements about them appropriate.  The statements
would also turn those documents into normative references and
thereby prevent rushing the present I-D out the door, but at
least we would not be discarding that other work without
serious review and consideration.

If we are going to put this document out in a form that
concludes that the issues raised in those other documents are
unimportant, we should all be aware of another implication of
that choice.  As most of us know, there are large communities
of people out there who believe the IETF should simply go along
with whatever the Unicode Consortium decides without trying to
do any additional review or apply any additional rules,
especially reviews or rules that might result in
inconsistencies with UTC recommendations.  Some of those people
are probably just ignorant of the issues but looking for
simplicity; others are as informed as anyone on this list but
have reached different conclusions, perhaps based on different
assumptions (the availability of language information to DNS
users has been a particularly important one in the past) or
different priorities.  In practice, if this document is
published with the conclusion that none of the issues
discovered and suggestions made since RFC 6452 was published
are relevant, it not only takes us a large step toward "just
accept whatever Unicode does", but, if new issues are
discovered in the future, it sets us up for a discussion along
the lines of "this change and problem is much less severe than
the ones you decided in RFC XXXX (ex-
draft-faltstrom-unicode11) were unimportant, so why are you
proposing some action now?"  I would not have an answer to that
question.  I hope that those who believe this document should
be advanced at the present time and without additional work do.

      -------

III. Recommendations

(1) My preferred option is that we put
draft-faltstrom-unicode11 aside, figure out how to consider (at
least) draft-klensin-idna-5892upd-unicode70,
draft-klensin-idna-rfc5891bis, and
draft-freytag-troublesome-characters, figure out what should be
done about the issues they raise (and any other issues raised
during that discussion), then publish whatever is deemed
appropriate and recast this I-D in the context of those
conclusions.  I don't know whether IDNA2008 is sufficiently
inconsistent with general practice and "running code" that we
need to reexamine the core documents in that context, but I
think we need to ask ourselves that
question too (and figure out how to ask the IETF more broadly
if that is appropriate).

(2) Perhaps there is really great urgency for processing and
publishing draft-faltstrom-unicode11 now, and we should either
drop those other issues or try to come back to them at some
point in the future.  I've heard that suggested several times.
Alexey's
"already been delayed by 2 months" comment appears to align
with a sense of urgency even though consideration and
processing of draft-klensin-idna-5892upd-unicode70 has arguably
been delayed by four years (version 00 was posted in August
2014 and a successor version was requested to be processed not
later than January 2015).  If this is actually urgent, then
let's revise the I-D to reflect that situation.  I think that
would include being explicit about the urgency including where
it is coming from and what the consequences would be of delay.
Then let's explicitly note that there are issues outstanding
and describe what cautions should be taken with code points
that RFC 6452 does not allow but draft-faltstrom-unicode11-07
lists as PVALID.  I don't have a list of such precautions to
propose right now but believe we could fairly easily devise
one, even if it required a return to some variation of the very
early pre-IDNA2008 idea known then as "Probably-Yes". 

Either way, let's not publish this I-D in a way that ignores
the outstanding issues and documents and that does so in a way
that can easily be read as an IETF conclusion that they are not
relevant.

best,
   john

--On Sunday, February 3, 2019 16:34 +0100 Patrik Fältström
<paf=40frobbit.se@dmarc.ietf.org> wrote:

> As the editor of the document I want to point out the
> mechanisms of the document explicitly.
> 
> 1. It identifies a few incompatibilities in the Unicode
> Standard that have been made between the versions. Including
> one already identified by IAB.
> 
> 2. For each one of the incompatibilities two choices can be
> made:
> 
> 2a. Add one or more exceptions to the IDNA2008 standard (i.e.
> updating it with new rules)
> 
> 2b. Add zero exceptions and acknowledge that the
> incompatibilities have been made (i.e. updating it without new
> rules).
> 
> [2b] is the choice IETF did last time, RFC 6452.
> 
> [2b] is what this document suggests.
> 
> 3. The IAB statement on code points
> <https://www.iab.org/documents/correspondence-reports-document
> s/2015-2/iab-statement-on-identifiers-and-unicode-7-0-0/> is a
> larger issue which can not easily be solved by "just" adding
> some exceptions. One can also see in for example
> <https://www.alvestrand.no/pipermail/idna-update/2015-February
> /007911.html> that the views on the overall problem (if it is
> a problem or not) is split. My view is that the over arching
> issue here to some degree have to do with "trust" between IETF
> and Unicode Consortium and whether IETF should continue to use
> various meta data in the Unicode Standard for calculating what
> code points are valid or not. Alternatives could be to just
> let Unicode Consortium do it, or for IETF to do code point
> selection code point by code point. The latter IETF tried in
> IETF 2003, and we decided then that was a stupid idea and came
> up with IDNA2008.
> 
> To conclude, as an editor of IDNA2008 I suggest [2b], i.e. no
> changes to IDNA2008 even though there are incompatibilities,
> and I think [3] is a larger discussion that for example should
> be held here.
> 
> IF, but only IF, that is the consensus of the IETF, then this
> draft is what should be an RFC. If any of the questions I list
> above is answered differently, then the draft is the wrong
> thing and should contain something else.
> 
> I.e. I wrote it that way to trigger some discussion as I did
> not feel after getting some letters from IAB that ultimately
> "is my boss" in my role as expert reviewer of the IDNA2008
> tables at IANA I could "just" ask IETF what to do. I needed
> something that "is a proposal".
> 
> Here it is!
> 
> And I am happy to answer questions!
> 
>    Patrik
> 
> On 3 Feb 2019, at 13:59, Alexey Melnikov wrote:
> 
>> Dear I18N Directorate,
>> 
>> I will be sending this document to IETF Last Call shortly, so
>> I would like to request a review. In particular, I would
>> appreciate:
>> 
>> 1) speedy review of the document, due to Unicode Consortium
>> intent to publish Unicode 12 in March 2019 (*). 2) comments
>> on technical content of the document, separated into major,
>> minor and nit categories. 3) comments on whether this
>> document should be Proposed Standard or Informational. 4)
>> comments on dependencies between this document and others.
>> 
>> Thank you,
>> Alexey, as an ART AD.
>> 
>> (*) - I appreciate that finishing discussions by March
>> doesn't give much time, but this document is already delayed
>> by 2 months.