[I18nrp] The evolutionary future of IDNA (was: Re: Last Call: <draft-faltstrom-unicode11-05.txt> (IDNA2008 and Unicode 11.0.0) to Informational RFC)

John C Klensin <john-ietf@jck.com> Thu, 06 December 2018 16:01 UTC

Date: Thu, 06 Dec 2018 11:01:08 -0500
From: John C Klensin <john-ietf@jck.com>
To: Paul Hoffman <paul.hoffman@vpnc.org>, i18nrp@ietf.org
Message-ID: <9F6A8117BA3220C4447B1D72@PSB>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18nrp/ArYd2SlLLTVfr4R62ojqcKp4EAQ>
Subject: [I18nrp] The evolutionary future of IDNA (was: Re: Last Call: <draft-faltstrom-unicode11-05.txt> (IDNA2008 and Unicode 11.0.0) to Informational RFC)

Hi. 
Paul's note from Tuesday seems to raise some fundamental
questions about where we are taking IDNA.  They don't seem to me
to have much to do with the debate about directorates or
procedural correctness or even much to do with the argument for
tuning draft-faltstrom-unicode11 and pushing it forward at this
point unless the community concludes that advancing that
document without understanding where we are headed is a bad
idea.  I happen to believe that, but I think the key issues Paul
raises would still be key issues even if the community decides
it is ok to advance draft-faltstrom-unicode11 and then use it as
a new starting point.

So I want to make a start at reviewing some context and then
looking at those issues...

--On Tuesday, December 4, 2018 06:59 -0800 Paul Hoffman
<paul.hoffman@vpnc.org> wrote:

>...
>> First of all, this document evaluates the individual changes
>> made up  until and including Unicode 11. Sure, one could say
>> this has  implications on the IETF view of existence of
>> normalization rules (or  not) but that is not the intention
>> here. The result of this review  should neither be
>> extrapolated to future versions of Unicode nor to  future
>> evolutions of normalizations.
> 
> This is good to know. In that case, could you either remove
> the "One difference between these sequences..." sentences, or
> add a sentence in the Introduction section that says "The
> result of this review should neither be extrapolated to future
> versions of Unicode." Either action would clear up this
> confusion.

I think such a sentence in the introduction is needed even if
the "One difference..." sentences are removed.

>>> =====
>> First of all, this document is (as it seems to me now) to be
>> Standards  Track. So that issue is taken care of.
> 
> Procedurally, it is not, I believe. A new draft needs to be
> issued, and the IETF Last Call has to start again.
> Fortunately, this IETF Last Call is only a few days old, so
> this should not delay anything much.

Noting that the Last Call has been withdrawn but other comments
have disagreed with that decision, I concur about the procedural
requirement.    

Another reason the Directorate should be able to form an opinion
and make a recommendation before the Last Call is restarted is
that, for this particular case, if Patrik wants to reference
other documents for details and explanations (which I agree is
the right thing to do), we'd better be sure that those documents
say what we want them to say and are at least reasonably stable.
There is no assurance at all of that with expired documents that
have had no meaningful discussion in the community.  Were the
Directorate to sort that out partway through the Last Call, we
would have the risk (I'd guess high odds) of having to revise
this document as well as some of those it references, meaning
that the Last Call would be about an obsolete and replaced I-D,
leaving the IESG to either try to guess at what the community
would have said about the new drafts or to repeat the Last Call.
Again.  If we are looking for things to complain about
procedurally, the IESG making a decision to advance a document
when the Last Call was about a version that was quite different
from the one being sent to the RFC Editor would be high on the
list.

So, again, Directorate first, then requests from the Directorate
to update and post relevant documents, then review by the
Directorate, then posted recommendations from the Directorate,
and only then an IETF Last Call on whatever document or
documents the Directorate recommends.

However, the important part of this note starts with Paul's
other (or perhaps main) concern.

>...
> This misses my concern. There is an active draft
> (draft-freytag-troublesome-characters) that seems to want to
> change the IANA registry. Your draft
> (draft-faltstrom-unicode11) also wants to change the registry,
> but in a different way. My question is whether we should be
> making the registry unstable in this way.

Possibly this has been adequately covered already but, if
draft-freytag-troublesome-characters is not clear about what I'm
about to say (which I think is consistent with Asmus's recent
postings) it needs to be fixed.  My understanding (as a
co-author) is that we expect to end up with two separate
registries (or one meta-registry with two or three subregistries
-- a detail that requires discussion with IANA).   Neither the
code point registry nor the proposed new one is normative in the
usual sense; the context rules registry is normative.  They
consist of:

(1) The registry of code points called for by IDNA2008
(including the Contextual Rules Registry) addressed by the
present I-D.  That registry should be stable except for
additions unless some new discovery or change in Unicode
requires reclassifying an existing code point or modifying or
writing new Contextual rules.  Changes due to the latter reasons
(i.e., other than adding new code points) should be rare and
made only after careful discussion, but are explicitly
contemplated as possible by IDNA2008.  It is perhaps worth
noting that this is, by the design of IDNA2008, an inclusion
registry: anything not explicitly permitted is forbidden.

(2) The registry of recommendations and advice for zone
administrators to consider in deciding what code points,
sequences, etc., to allow in their zones.
draft-freytag-troublesome-characters is intended to establish
and seed that registry.  It is not expected to be stable but,
instead, is expected to evolve to reflect new discoveries and
evolving knowledge.  
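
To illustrate the inclusion model described in (1), here is a minimal
Python sketch. It assumes a tiny hand-picked excerpt of derived
property values; the table name, the helper name, and the selection of
code points are hypothetical and chosen only for illustration. The real
registry is far larger, and this sketch does not evaluate the
contextual rules that CONTEXTJ and CONTEXTO code points also require.

```python
# Hypothetical excerpt of IDNA2008 derived property values.
# The authoritative table is the IANA registry, not this dict.
DERIVED_PROPERTY = {
    0x0061: "PVALID",      # LATIN SMALL LETTER A
    0x00DF: "PVALID",      # LATIN SMALL LETTER SHARP S
    0x200D: "CONTEXTJ",    # ZERO WIDTH JOINER
    0x00B7: "CONTEXTO",    # MIDDLE DOT
    0x0041: "DISALLOWED",  # LATIN CAPITAL LETTER A
}

# Values under which a code point may appear in a label at all.
PERMITTED = {"PVALID", "CONTEXTJ", "CONTEXTO"}


def code_points_permitted(label: str) -> bool:
    """Return True only if every code point is explicitly permitted.

    A code point missing from the table is treated as forbidden,
    which is the "inclusion" behavior. Contextual rules are NOT
    checked here.
    """
    return all(DERIVED_PROPERTY.get(ord(ch)) in PERMITTED for ch in label)


if __name__ == "__main__":
    for candidate in ("a\u00dfa", "Aa", "a\U0001F600"):
        print(repr(candidate), code_points_permitted(candidate))
```

The point of the sketch is the default: the emoji in the last candidate
is rejected because it is absent from the (excerpted) table, not
because anything lists it as troublesome. Advice of the kind described
in (2) would sit on top of such a check rather than replace it.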

I hope this is clear from that draft, but there are three issues
with that draft and one more general issue that are almost
independent of its specific code point lists and tables and that
almost certainly require community discussion.  Personally, I
not only don't know the answers but am torn about the tradeoffs
I see, so what follows is an attempt at a summary, not advice
about what to do about them.

(2.1) Whether having this sort of list as an official or
quasi-official IETF-provided list is a good idea at all.  As
clarified in draft-klensin-idna-rfc5891bis, IDNA2008 requires
that zone administrators [1] register only labels that contain
characters, and character sequences, drawn from scripts (and,
where relevant, associated languages) they fully understand and
which, of course, conform to the parts of IDNA2008 now under
discussion.   It has been widely observed that many or most
ICANN-Accredited Registrars and the Registries they support
ignore that rule.  The IETF can, at this point, go down either
of two paths (or can continue  to ignore the issue, which I'm
fairly sure would be irresponsible).   One is to reaffirm the
rule (in which case a global troublesome characters list may not
be needed or must be put in a different context).  Unless we
like making statements that are generally ignored, we would also
decide to use whatever formal and informal mechanisms we have to
convince ICANN (at least) to use whatever mechanisms they have
to hold registrars and registries who are blatantly ignoring
those rules accountable.    

The other is to conclude that registration of strings that the
registrar and registry may not understand or be willing to take
responsibility for is today's new normal and that providing
advice for those registrars and registries who lack deep
understanding is a good idea (in which case
draft-freytag-troublesome-characters is extremely relevant and
draft-klensin-idna-rfc5891bis should be altered to explain the
new reality and to update IDNA2008 (not just 5891 - see
discussion in the latter I-D) to adjust the "don't register what
you don't understand" rule to match the new understanding).   If
we go down that path, we probably need to
understand that those SLD registries whose label evaluation
model is essentially "we rely completely on the registrars to do
the right thing and will accept whatever they send us" and those
registrars whose model is close to "you pay for it and we
register it" (including combinations of the two that advertise
violations of the specs or good sense because they think they
can sell them) are unlikely to be affected by better guidance. 

A modification of those options would be to give advice along
the lines of "troublesome characters" to ICANN-accredited
Registrars and TLD Registries and/or Registrars and Registries
for so-called public domains but try to continue with the "don't
register what you don't understand" rule inside enterprise and
equivalent domains (most of whom, AFAICT, are mostly following
it just because the alternative makes little sense).   That
would probably require (or at least benefit from) modification
of both I-Ds.  It would also not be clear in that circumstance
whether the appropriate responsible party for the troublesome
character list and corresponding registry is the IETF, ICANN, or
someone else.  Probably unfortunately, there are candidates for
"someone else".

(2.2) Expanding from the above, there is a question about the
audience for any new efforts in this area, especially attempts
to give advice such as the "troublesome character" list.   If my
observations and a certain amount of logic are correct, most
administrators of domains that are used with relatively deep
structure intra-enterprise are already fairly careful about what
they register if only because reasonable rules fall out from the
way the enterprise (or equivalent organization) is organized and
from responsiveness to usability by people within that
enterprise.    If additional guidance is not needed at that
level, then the question becomes mostly about what can usefully
be done about SLDs (or, more generally, the domains at the
boundary between "public" and "private" domains).   To take an
extreme example, it seems to me that there is no chance that a
registry that is happily making money registering and delegating
labels that are clearly invalid under the IDNA2008 rules (with
emoji as a prominent current example) is going to change its
behavior if the IETF says that they should only register strings
that are conformant to IDNA2008 and that they understand or if
the IETF gives them a list of troublesome characters that are
PVALID under IDNA2008.   A significant
change in their behavior would almost certainly require forceful
action by ICANN or by some effective government with appropriate
jurisdiction.  There are questions whether the IETF should be
acting as an advocate for such actions.  Probably those
questions lie within the IAB's responsibility but, given
potentially far-reaching implications, IMO it would be really
unfortunate if they made such decisions without the consensus of
an informed IETF community.   This is not an easy problem.

(2.3) The IETF has traditionally been very reluctant to create
an IANA registry that requires maintenance but whose maintenance
depends entirely on a single person and his or her expertise.
We even have traditional jokes about those situations, e.g.,
about the consequences of "truck fade". IANA, too, has been
known to raise questions about the creation and management of
such registries.   At least IMO, the most recent published
version of draft-freytag-troublesome-characters does not address
the maintenance issue in a definitive way.  I wouldn't expect the
Directorate to have the bandwidth and skill set (other than with
Asmus should he be appointed and want to sign up for that
maintenance role long-term and on personal title) to actively
maintain that registry.  AFAICT, the discussion during the IETF
102 BOF did not contemplate the Directorate taking on such a
role.

(3) Finally, Asmus seems to be arguing quite strongly (and
persuasively) that the contextual rules model of IDNA2008,
especially the CONTEXTO collection, is completely inadequate
and, in particular, that the rules specified in RFC 5892 do not
adequately reflect an understanding of what he calls "complex
scripts" (or don't reflect any understanding of complex script
issues at all).  Speaking as someone with considerable
responsibility for the development of that model and those
rules, I think he is almost certainly correct (although see below).
However, if he is, it seems to me that we need to reexamine that
part of IDNA2008 and decide whether to

(3.1) Drop the CONTEXTO category and rely on registries being
responsible [2], whether we provide more advice along the lines
of the troublesome character list or not.

(3.2) Review the CONTEXTO list and registry to be sure that the
boundary between rules in the protocol and advice is right,
modify the descriptive text as needed (that may affect other
IDNA2008 documents), and add, modify, or drop code points and
rules accordingly.

(3.3) Rethink the CONTEXTO rules and descriptions so that
important rules for complex scripts are adequately specified at
the protocol level.

It is worth noting that going very far down that path may
require reexamining the boundary between labels as "words" and
strings that may make mnemonic sense even if they make no
linguistic sense for a particular language or writing system at
all.   It was one interpretation of that boundary along with
experience with IDNA2003 (not just ignorance) that resulted in
the original decisions about what should and should not be
included in CONTEXTO.

One way or another, I don't think it is appropriate to avoid
asking whether Asmus's analysis makes a part of IDNA2008
technically defective and therefore obligates us to fix it.  If
the answer is that it is and we are, that may be the strongest
argument for not advancing draft-faltstrom-unicode11 at least
until we have the underlying issues in hand because the update
(or confirmation that no update is needed) it represents is
required to address contextual rules as well as code points.

best,
    john

[1] RFC 1591 effectively equates "registrar" with "whoever
decides what names to put in a zone", i.e., with "zone
administrator".  It is not clear whether those terms are
accepted as equivalent today or whether we should, e.g., be
using "zone administrator" to describe that function for all
zones and to reserve "registrar" for so-called public zones for
which they take money to register names.
