Re: [I18ndir] Finding a way to conclude the review of draft-faltstrom

John C Klensin <john-ietf@jck.com> Tue, 19 February 2019 21:46 UTC

Date: Tue, 19 Feb 2019 16:46:36 -0500
From: John C Klensin <john-ietf@jck.com>
To: Asmus Freytag <asmusf@ix.netcom.com>
cc: i18ndir@ietf.org
Message-ID: <D35AAAD52083CD14E6E5ACF1@PSB>
In-Reply-To: <bcf6de35-e1db-5022-5109-76764357c8b0@ix.netcom.com>
References: <d3b501b1-dfcf-debf-b256-a0642ff560e3@alvestrand.no> <3E8A24AA2DB42A56A429133E@PSB> <bcf6de35-e1db-5022-5109-76764357c8b0@ix.netcom.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/uYfw1RgmMWYxG6qPSSbOSOTII-I>
Subject: Re: [I18ndir] Finding a way to conclude the review of draft-faltstrom
Precedence: list

--On Tuesday, February 19, 2019 09:38 -0800 Asmus Freytag
<asmusf@ix.netcom.com> wrote:

>...
>> That position is that there
>> have been a rather long series of issues raised, in multiple
>> documents, about IDNA2008 and the appropriateness of
>> uncritically applying its procedures to new versions of
>> Unicode.
> 
> And some of the many positions were concerned with the fact
> that the other ones were focused on the wrong problem.

Perhaps.   Or perhaps that is part of the key problem.

> Given the design of the protocol, moving forward is the
> appropriate option.

Ok.  While some of the differences in positions and perspective
are, IMO, important and should be explored further, many, as I
suggested was possible early on, were just differences in
vocabulary and focus.   I think the difference of opinion
between the two of us comes down to two issues.  As an example
of the vocabulary issue, I have heard no arguments against
"moving forward".  The question is whether the best way to move
forward is to publish draft-faltstrom now, more or less as it
appears in -07, and then deal with the other stuff [a] or
whether there is other stuff that should be addressed and
possibly resolved before draft-faltstrom is published (with
draft-faltstrom reflecting that work if appropriate).

I think that choice comes down to two questions, the first tied
closely to "the design of the protocol".

Background: Many details aside, the basic design target for what
IDNA2008 was intended to allow was, quite consciously, the same
as the target for pre-DNS host names, the syntax allowed in the
host-part of domain-part by SMTP, and the "preferred syntax" of
RFC 1034, i.e., letters [b], digits, and as little punctuation
as feasible.  From that very high conceptual level, the only
difference between the target criteria of IDNA2008 and those
predecessors is expansion from the ASCII repertoire to the range
of abstract characters supported by Unicode.  The question of
how that conceptual definition is turned into rules, and
ultimately into lists of code points, is an implementation
detail.  All of the more specific issues, including whether
labels are conceivably words in some languages (or obey other
orthographic conventions), whether mixed-script labels are
allowed, are left to the registry and registries are expected to
behave responsibly (and have been since at least RFC 1591 --
that wasn't an IDN innovation either).
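
As an illustration of the ASCII-era starting point, here is a
minimal sketch of the RFC 1034 "preferred name syntax" for a
single label (a letter first, a letter or digit last, and only
letters, digits, and hyphens in between, with at most 63
octets).  The function name is hypothetical, and this is not a
full host name validator; RFC 1123 later relaxed the first
character to allow a digit, which this sketch does not do.

```python
import re

# RFC 1034, Section 3.5: <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
# Hypothetical helper name; checks one label, not a whole domain name.
_PREFERRED_LABEL = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_preferred_label(label: str) -> bool:
    """Return True if `label` fits the RFC 1034 preferred syntax."""
    # Labels are limited to 63 octets; ASCII-only, so len() is octets.
    if len(label) > 63:
        return False
    return _PREFERRED_LABEL.fullmatch(label) is not None


if __name__ == "__main__":
    for candidate in ("example", "a", "abc-", "-abc", "a_b"):
        print(candidate, is_preferred_label(candidate))
```

The conceptual step described above is replacing the ASCII
character classes in that pattern with Unicode-wide notions of
"letter" and "digit"; how to do that is where the rules and
code point lists come in.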

Now if, for example, some combining marks were never intended
for use as letters (or, if one prefers, parts of words) and were
assigned code points for other purposes or notations, then that
letter-digit model dictates that they should not be permitted in
labels.  If a particular version of the specification allows
them, it is what is called a "mistake", arguably an IDNA
protocol mistake due to Unicode's not making a distinction
between non-spacing marks used with letters (even sometimes,
which might have been a case for more CONTEXTO classifications)
and non-spacing marks used only in non-letter contexts.  Even
the non-decomposing character issue is arguably a mistake
because inferences were made about what normalization would
accomplish that turned out to be incorrect (and, incidentally,
the slash characters you mentioned were found in the subsequent
investigation and are called out in
draft-klensin-idna-5892upd-unicode70).
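
To make the non-spacing mark point concrete, here is a small
sketch using Python's standard unicodedata module.  It shows
that General_Category alone does not separate a mark commonly
used with letters (U+0301) from marks that look like they were
meant for symbol or overlay notation (U+20D7, U+0338): all
three report "Mn", a category that the LetterDigits rule of
RFC 5892 includes.  The choice of these three code points as
examples is mine, and the helper name is hypothetical; nothing
here says which marks are actually mistakes.

```python
import unicodedata


def is_nonspacing_mark(ch: str) -> bool:
    """Return True if `ch` has General_Category Mn (Nonspacing_Mark)."""
    return unicodedata.category(ch) == "Mn"


if __name__ == "__main__":
    # Illustrative code points only; not a list of disputed characters.
    samples = ("\u0301", "\u20d7", "\u0338")
    for ch in samples:
        name = unicodedata.name(ch)
        cat = unicodedata.category(ch)
        print(f"U+{ord(ch):04X} {name}: {cat}")
```

Since the category is the same in each case, any rule that
treats these marks differently would need some other property
or an explicit exception list.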

(1) The first question is tied critically to "the design of the
protocol" and what to do about mistakes.  The tradition in a
number of communities is to view them as irreversible and just
try to warn about them or work around them.   Unicode's view of
stability of code point assignments and a number of properties
is consistent with that model (maybe even an extreme version of
it) but they are by no means the only one.   The Internet
(including IETF) tradition has been to fix mistakes but to be
very pragmatic about it (see the second question).  I observe
that no one is running IPv1 these days.  For the specific case
of IDNA2008, the observation that the review and update
procedure specifically allows additions to the exception list
and context rules strongly suggests that the design of the
protocol contemplated fixing mistakes, not just digging in and
moving forward because nothing about the rules could be changed.
In any event, the question of whether and how to fix mistakes
gives us three general options:

(1.1) For anything that appears to be a mistake, create
exceptions or new rules for the "new" code points affected _and_
for all (or at least some) code points that were assigned
earlier that are shown to share the same problems.

(1.2) For anything that appears to be a mistake, create
exceptions or new rules for the "new" code points affected, but
let the old ones go in the interest of stability, not
invalidating existing labels, etc.  IMO, if we are smart and
don't want to invite confusion, we explain the situation --and
why categories for older code points are not changed--
somewhere, but that is a separate judgment call.

(1.3) We take the position that old decisions are immutable and
that the best thing to do with new code points that may
duplicate older mistakes (or omissions from the rules, or
whatever more kind terminology we choose) is to be
mistake-consistent.  Perhaps we say, somewhere, "well this is a
problem but it is better to make it a problem for the registries
and count on their good behavior rather than making a change to
the protocol".

Of those three choices, the first two seem to me to require
delaying processing and publication of draft-faltstrom until we
can devise the right fixes and language to describe them ... or
at least until we can devise a strategy and language that avoids
having draft-faltstrom say "this code point is PVALID" only to
have some subsequent, mistake-fixing, document come along in (I
would hope) a short time and say "no it isn't".  For the third,
I'd still avoid processing and publishing draft-faltstrom
until we can explain what the problem was and why we decided to
ignore it, push it off, or otherwise refrain from changing RFC
5892.  However, there are other ways to handle that and they
depend considerably on the second question.

(2) How critical is it that we get draft-faltstrom out in a
hurry?  Who needs it, for what, and why should the IETF make
whatever compromises are necessary to prioritize getting it out?
I've been asking those questions on and off for some time and
still don't think I've seen an answer.   However, assuming it
really is critical (in ways that the IETF recognizes), then I
think we should look for ways to get on with publication while
compromising our ability to make the decisions under (1) as
little as possible.  That does not imply rushing the document to
publication in a way that effectively preempts that discussion,
leaving us with either 1.3 or trying to deceive the community
about the timing associated with 1.2 (although those are
options) because it would be awkward, but not difficult, to
insert text into draft-faltstrom explaining the loose ends,
indicating that the IETF was going to take the other issues up
as soon as possible, and that people should treat all (or at
least some if we can describe them without normative references
[c]) of the newly-added code points with special care until
those issues are sorted out.   I assume Patrik could write such
a paragraph quickly; if not, I'll volunteer.   But, again, if
getting the document out is not critical enough to allow careful
consideration of what might actually be a mistake that should be
considered as input to a protocol update, I think prudence and
the design of the protocol require that we consider the above
issues (and, ideally, the others in the queue) carefully before
moving forward with publication.

best,
   john

[a] I am deliberately using obviously vague terminology because
I do not believe there is demonstrated consensus about what the
list of other stuff is (although there may be consensus about
some of the items on that list), nor demonstrated consensus
about what work is the IETF's responsibility and what should be
done elsewhere if at all, nor demonstrated consensus about what
conclusions the IETF should accept uncritically from other
bodies (and which bodies).  But, as long as the list of "other
stuff" is non-null, the immediate question about publication of
draft-faltstrom does not require getting agreement on any of
that.

[b] I hope we have general agreement on what that term means and
that it is closely bound to abstract characters used to write
the words (or other conceptual units) of human languages.  For
example, mathematical symbols are not "letters".

[c] That comment hides an interesting procedural issue.  Assume
some of the possible changes mentioned above result in text for
draft-faltstrom that includes normative forward references to
documents we aren't ready to process and approve yet.  That
would put draft-faltstrom into the RFC Editor queue with a
normative reference hold so that it would not actually be
published until the other document(s) were ready.  Under normal
circumstances, that would be more than adequate -- while it
would be unusual, draft-faltstrom could be pulled out of the
queue and revised if that later work actually required that.
However, it isn't clear, absent other instructions
--instructions that we have not discussed-- whether IANA would
act to update its tables at the time of IESG approval of the
document or whether they would wait for publication.  If the
former, we would have exactly the same issues with defining a
code point as PVALID and then changing that which are part of
the reason to delay processing and approval.
