[I18ndir] In-depth analysis of draft-faltstrom-unicode11-07

John C Klensin <john-ietf@jck.com> Sun, 10 March 2019 10:29 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/ljcLxwnSJpCQS6sOdC08PGA-ut0>


Hi.  I've avoided turning cryptic notes on paper into this note
so far because of guidance from Peter and Pete but, because I
seem to be seeing things in draft-faltstrom-unicode11-07 that
others don't and vice versa, and because Harald's draft of a few
hours ago doesn't begin to cover this span of issues, maybe it is
time to post a relatively careful analysis.  I've been writing
continuously for many hours and my eyes are fading to the point
that I cannot reliably see the screen, so I have decided to
post this now (early in Harald's 24 hours) rather than
carefully proofread, so please forgive typos and bad sentences.

This note is necessarily long (I think longer than the document
prior to references and tables); my apologies to those who find
long analyses hard to take.  For those who like their
conclusions at the beginning, it is that, for multiple reasons,
the document is not ready for publication (not even "Almost
ready, with caveats").  Those reasons include a few technical
errors; asserting IETF discussions and consensus that have not
occurred; tutorial material that is not really part of the
scope of the document as described in the Abstract or
Introduction (or borrowed from RFC 6452) including a discussion
of deployment that is inadequate and possibly incorrect; making
assertions about why things are being done and the possible
effects of some actions that are unsupported by observable
reality; and assuming that the IETF is shifting authority
boundaries and taking on (or endorsing) efforts that have not
even been seriously discussed.  

In addition, I believe that the
intent of the IDNAbis WG was that the review prior to moving
forward with a new version of Unicode was to consist of two
parts -- a check for older code points whose properties had
changed in ways that might affect IDNA and at least a
superficial examination of newly-added code points to see if
any stuck out as needing special treatment.   That two-part
review model was followed in the work leading to RFC 6452.  The
second element of review was what first identified the likely
issue with U+08A1 (aka the Hamza mess).   I'm not aware of that
type of review being conducted (or even seriously attempted)
for Unicode versions 8 through 11 nor any IETF discussion about
abandoning it; the I-D appears to assume it is no longer
relevant.

Where I could, I've suggested text to at least mitigate the
problems.   My personal preference continues to be that we at
least figure out how we are going to address the issues raised
by other i18n documents (and, if appropriate, another one
identified below) before coming back to review this I-D, but,
because others seem to be feeling more urgency (or are
convinced that we will never get to the other documents and
issues), I've tried to focus the comments below on getting a
revision of this document together and getting it published.

Issues are discussed below in the order in which the relevant
topics are covered in the I-D, not in order of importance.
Some (although few) of the comments are nit-picking but no one
else seemed to have done so prior to last week so they seemed
worth including.  I have not, however, identified some problems
that are so obvious that it is clear that the RFC Editor would
fix them (most of which were pointed out in Martin's review
anyway) such as "also recomments" in the Abstract.


    ---- Issues and Detailed Comments ----

(1) Meta-comment: This document jumps back and forth between
tutorial materials that explain the context and applicability
of IDNA (a few of which might be considered as updates to
5894), statements that essentially clarify the IDNA2008 base
documents (and arguably update 5892 or 5891), and the normative
statements about the disposition of new Unicode versions and
code points that the document is nominally about.  That is not
necessarily a problem, but we should be aware that it is
happening and should try to avoid any statements we'd be likely
to change on further review.

(2) Nit (but substantive, not editorial).  The first sentence
in Section 1 is simply wrong.  According to the tracker (as well as
my memory) the IDNAbis WG was not chartered until April 2008
and, despite the wildly optimistic schedule shown at the bottom
of https://datatracker.ietf.org/wg/idnabis/about/, the
documents did not really start to come together enough to make
a "largely completed" statement plausible until well into 2009.
According to my notes, IETF Last Call stretched into January
2010 and the documents, of course, bear August 2010 publication
dates.  Some readers of this note will recall that there was
even a fairly extended discussion as to whether "IDNA2008" was
appropriate for 2010 documents.  Suggestion:

Old: 
	...was largely completed in 2008, and is thus known within
New:
    ...was initiated in 2008 and, despite not being completed
	until 2010, is widely known

(3) Section 1, bullets.  The two bullets miss an important issue
that lies at the root of the non-decomposing character problem and
other potential issues, including some speculation about how
the various emoji issues might eventually play out (details on
request, but not critical to this analysis).   Suggest adding a
third bullet, to read something like

  o Problems can also be created if the properties assigned to
    those code points are inconsistent with IDNA2008 assumptions
    about how properties are assigned and/or about how code
    points with those properties are used or behave.

In the last sentence of the second bullet the phrase "a code
point that was not allowed (and thus is blocked in some
situations" appears.  Noting that, under IDNA2008 a code point
that is DISALLOWED MUST NOT be stored in a label and that
lookup applications MUST reject a label containing such a code
point (and not look it up), I'm not sure where "some
situations" comes from and what it is expected to imply.  If
the intent was to say that implementations that had not been
updated to the latest version of Unicode (and set of tables)
would reject the code point while more recently updated ones
would allow it, there are probably ways to say that which are
more clear.  I also note that the WG explicitly discussed the
implications of these sorts of transitions and accepted them,
despite arguments both that no changes should be allowed and
that DISALLOWED -> PVALID changes should be allowed but not
PVALID -> DISALLOWED.
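
A minimal sketch of the transition described above, assuming two
hypothetical snapshots of the derived-property tables (hand-written
here, not the real IANA tables) and a simplified lookup check that
ignores the contextual rules.  U+111C9 is used only because it comes
up later in this note; the status values assigned to it are
assumptions made for illustration.

```python
# Illustrative sketch only: OLD_TABLE and NEW_TABLE are hypothetical,
# hand-written snapshots, not the real IANA derived-property tables.
OLD_TABLE = {0x0061: "PVALID", 0x111C9: "DISALLOWED"}
NEW_TABLE = {0x0061: "PVALID", 0x111C9: "PVALID"}


def lookup_allowed(label, table):
    """Return True if a lookup application using `table` would look up `label`.

    Code points missing from the table are treated as UNASSIGNED and
    rejected.  CONTEXTJ/CONTEXTO rules are deliberately left out.
    """
    return all(table.get(ord(ch), "UNASSIGNED") == "PVALID" for ch in label)


label = "a\U000111C9"
print(lookup_allowed(label, OLD_TABLE))  # False: older tables reject the label
print(lookup_allowed(label, NEW_TABLE))  # True: updated tables accept it
```

The same label is therefore resolvable or not depending only on which
table version the lookup application happens to ship with.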

(4) Section 1, Introduction, paragraph 3 (starting
"Historically...").

IMO, this is very misleading.  As far as I
can tell, "accepted all implications of changes in the Unicode
Standard" encompasses exactly the three code points called out
in RFC 6452 and described in the last bullet of Section 3.1 of
the present I-D (more on that below).  Those three code points
were examined very carefully and broadly enough to make a claim
of IETF rough consensus plausible (even then, the second
paragraph of its Acknowledgements section may be important).
While not clearly called out by the RFC, there was also at
least a superficial review of all of the new code points in the
hope of catching any of those that might pose special problems.
U+19DA got special attention in that regard because any labels
that incorporated it would be rendered invalid; the discussion,
IIRC, included the likelihood that such labels might actually
exist.

There is little evidence that, with a very small number of
exceptions, the individual code points and changes identified
in the current I-D have received a nearly equivalent level of
review.  Patrik can speak for himself but, at least for me, one
of the reasons we asked for the I18NRP BOF, as well as having
had several other discussions, was to fix our apparent
inability to review documents like this the way the IDNA2008
specs intended.

That intent is another core issue both generally with IDNA2008
and with this document.   My understanding in the 2009-2010
period was that the running of the tables and checking for, and
then carefully examining, the code points for which the derived
properties had changed was only part of the process.  The other
part was to do at least a sanity check on all of the
new code points in the hope of detecting anything weird going
on.   It was precisely the latter type of review on Unicode 7.0
that turned up the issue with U+08A1 and that led to the
improved understanding that followed: it was not a change in
the derived property of an existing code point; it was an
entirely newly-assigned code point in 7.0.0.   Had the intent
of IDNA2008 been to accept new code points without review or
possible criticism, the 2015 IAB "hold" request would have been
unjustified and completely unreasonable.

I believe we also have criteria for when a newly-assigned code
point (or one whose properties have changed) is problematic
enough that special treatment is justified.  That is the list
of code points that the IETF singled out for special,
exceptional, treatment in Section 2.6 of RFC 5892.  The mere
existence of that list contradicts what I think many people
will read into "staying with the Unicode Standard has been
viewed as important", i.e., that we have always accepted
decisions of the Unicode Standard. 

I believe (and believed when 6452 was under development and may
have said so explicitly then) that the list in Section 2.6 was
not a "one time and never again" list but rather that it set a
reasonable criterion for evaluating new code points and changes:
whether the issues with those code points are less serious
than, equivalent to, or more serious than those with the code
points already in Section 2.6.  If less, we accept them and, at
most, grumble
a bit.  If more, we talk about them very carefully and with the
full understanding that letting something go by that would move
the "minimum problem level to justify an exception" needle
upward sets a precedent that will be difficult to avoid in the
future when Unicode does something equally or slightly less
egregious.

There is also an issue with the last sentence of that
paragraph.   There is zero evidence that this is the primary
reason for the choice.  There was, by definition, not a
"diversity of implementations" when the IDNA2008 specs were
published.  I don't even know if that statement would have
been true when 6452 was published.  If it represents a change
of policy with this draft, that policy implicitly updates the
IDNA2008 specs (again, noting that those specs call out a list of
differences from the set of code points that would have been
determined by Unicode properties alone) and that should be
noted and a serious discussion held about it.  Certainly there
is a preference for staying as close to the Unicode standard as
possible consistent with IDNA2008 principles, but it has little
or nothing to do with the number of different things claiming
to be IDNA that are out there (not really a diversity of
implementations of IDNA2008 either).  Suggestion:

Old: 
   Historically, the IETF has accepted all implications of
   changes in the Unicode Standard even though the changes have
   resulted in problematic changes in the derived property
   value.  The primary reason for that choice is that staying
   with the Unicode Standard has been viewed as important
   because of the diversity of implementations already existing
   in the wild.

New: 
   <<>>>

(5) Section 1, paragraph 4 (starting "As described in Section
4")

Halfway through this paragraph, the sentence "If a change occurs,
and it is between any of the derived property values except
DISALLOWED, there is not a problem." appears.  I am supposed to be an
expert on IDNA2008 and I have no idea what that sentence means.
I'm not completely sure about the first sentence of the
paragraph either, but at least there I can guess the intent.

(6) Section 1, paragraph 6 (starting "In 2015, the Internet")

The last sentence claims that the issue is resolved by the
current document.  First of all, while the IAB statement was
never updated to reflect discussions and a more general
understanding of what became known as the "non-decomposing
character problem", including the attempted LUCID BOF
discussions, we quickly learned that the issue was not just
about U+08A1 or the closely-related problem of
language-dependent code points within a script (the latter was
called out even before the IAB statement).  So while the
document may resolve the BEH WITH HAMZA ABOVE issue (I contend
that it does not, see below), concentrating on the IAB
statement exclusively is deeply misleading relative to
discussions in the IETF (and the late IAB Program) about the
issue, essentially blowing them off.

(7) Section 3.1 (Editorial)

This section would be far easier to read and understand if
document titles, or at least abbreviated ones, were given.  For
example:

Old:
  o  RFC 5890 [RFC5890], informally
New:
  o  Internationalized Domain Names for Applications (IDNA):
     Definitions and Document Framework [RFC5890].  This
	 document is informally

The RFC Editor may have better suggestions and may prefer that
the title be in quotes.  Similarly for the other documents
listed.  Note that, because the sentence structure for the
final document in the set (RFC 6452) is different from the
others and it does not have an informal name, it will need a
slightly different change.

(8) Section 3.2 Deployment

While I think this is extremely useful information (at least
after some small corrections), it does not appear to be part of
the mandate of this document and is certainly not reflected in
the abstract.  If one were to follow the structure of the
existing IDNA2008 documents, this Section is really an update
to RFC 5894.  Consideration should be given to removing it
entirely and putting the information somewhere else, perhaps in
an expanded discussion of those variations on the IDNA
approach.  The following assumes that the section is,
nonetheless, retained.

New (add a new sentence to the Abstract, reading something
like):
	To improve understanding, this document describes systems
	that are being used as alternatives to those that conform
	to IDNA2008.

Note that the Abstract is getting long for RFC Editor (and good
sense) preferences and might be in need of rethinking.

The list itself is not complete and is arguably a bit
misleading.  In particular, a fully-conforming implementation
of IDNA2003 is bound to Stringprep as described in RFC 3454 and
Nameprep as described in RFC 3491 (and hence is tied to Unicode
3.2).  Because each of the documents that describe IDNA2003
(including Stringprep, on which Nameprep is normatively
dependent) has been obsoleted by later work, the status of
IDNA2003 is a little weird, but I don't think this document
needs to get into that.  However, there are a number of things
out there that effectively claim to be IDNA2003 implementations
but that use extrapolations (by their developers) of what
Stringprep and Nameprep would look like if the IETF had updated
them to reflect various versions of Unicode.

That this distinction is important is illustrated by a recent
on-list discussion.   If "IDNA2003" really means "IDNA2003" as
specified in the cited standards, then all of the code points
introduced since those specifications were published are
invalid for use in stored labels.   If it means "the rules of
IDNA2003 applied to whatever tables an implementer thinks are
appropriate but informed mostly by toCaseFold and NFKC", then,
well, who knows but probably, e.g., emoji are valid in domain
name labels.
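
The strict reading can be illustrated with Python's standard-library
`stringprep` module, whose tables are fixed to Unicode 3.2 as in
RFC 3454.  This is only a sketch: the helper name and the three
sample code points are choices made for illustration, and the check
covers only Stringprep table A.1 (unassigned code points), not a
complete Nameprep profile.

```python
import stringprep  # standard library; tables follow RFC 3454 (Unicode 3.2)


def unassigned_in_unicode_3_2(ch):
    """True if `ch` is in Stringprep table A.1 (unassigned in Unicode 3.2)."""
    return stringprep.in_table_a1(ch)


# LATIN SMALL LETTER A, ARABIC LETTER BEH WITH HAMZA ABOVE, and an emoji.
for ch in ("a", "\u08A1", "\U0001F600"):
    print(f"U+{ord(ch):04X}", unassigned_in_unicode_3_2(ch))
```

Under the first interpretation, anything reported as unassigned here
cannot appear in a stored label; an implementation following the
second interpretation would consult its own, later tables instead.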

Old:

  o  IDNA2003 as specified in RFC 3490 [RFC3490] and RFC 3491
	 [RFC3491] which implies using a table within which it is
	 said whether code points are allowed to be used or not,
	 after doing the normalization specified in IDNA2003.

New: 

  o  IDNA2003 as specified in RFC 3490 [RFC3490] and RFC 3491
	 [RFC3491].  Those specifications are dependent on case
	 folding and NFKC normalization and on tables that specify
	 for each code point whether it is allowed to be used or
	 not, with a distinction made between use for "stored
	 strings" and "query strings".  The tables themselves are
	 dependent on version 3.2 of Unicode [Unicode-3.2.0].

  o  A number of variations on IDNA2003, sometimes presented as
     "updated IDNA2003" or the like, which follow the
	 principles of IDNA2003 as understood by the implementers
	 but that use tables that represent how the implementers
	 believe Stringprep [RFC3454] and Nameprep [RFC3491] would
	 have evolved had the IETF not moved in the direction of
	 IDNA2008 instead. 

Note that the second bullet in that section in
draft-faltstrom-unicode11-07 is a specific IDNA2003
non-conforming variation in which validity of post-Unicode 3.2
code points is calculated according to IDNA2008 rules but
code points actually shown in Stringprep have their IDNA2003
validity values.  Such an implementation would be wildly
inconsistent internally, enough so that user confusion and
complaints would be nearly certain.   For example, it would
treat symbols that appear in Unicode 3.2 as valid but those
that were added later as invalid and would, I think, treat
characters removed by NFKC but that appeared in Unicode 3.2 as
invalid while such characters added later than Unicode 3.2 (but
unaffected by case folding) would be valid.

I don't believe I've seen such an implementation in the wild
but will take Patrik's word for their existence.

(9) Section 3.2, first paragraph.

As a general observation about terminology, the I-D (and our
other discussions) should probably use "implementation" much
more carefully.    As an example, the first sentence of this
paragraph talks about the "level of deployment of IDNA2008",
while the second talks about "existing implementations" without
specifying of what, hence implying that they are
implementations of IDNA2008.   That is fairly close to deadly
in some contexts, e.g., if statements were made requiring use
of IDNA2008 in some contexts, having an IETF document that
seems to claim that all of these variations are
"implementations" of IDNA2008 would almost certainly be
exploited by assorted bad actors.  Those who are at the ICANN
meeting will almost certainly know what I'm talking about;
others should consider themselves lucky.

Suggestion:
Old: 
	that existing implementations are known

New:
	that implementations that claim to be IDNA or variations on
	IDNA are known

(10) Section 3.2, Last bullet (starting "A mix between
IDNA2003 and IDNA2008 according")

This mischaracterizes the present state of UTS#46.  Yes, there
is flexibility based on how one selects so-called transition
options that can produce many different outcomes for assorted
edge cases, but its core in all recent versions (probably going
all the way back, but I haven't checked) is a normative table
that is much more Stringprep-like than anything having to do
with IDNA2008.  See comment 17 below.

(11) Section 3.2, Second paragraph (starting "The issue is
further...")

This paragraph seems a bit out of place.  RFC 5894 is
informational and does not, formally, have requirements.  It is
not clear what DNS registry operators have to do with this
document, which is about new and changed code points, not
IDNA2008 operations.  If the intention is to slide in the subject
matter of draft-klensin-idna-rfc5891bis, that document should
probably be referenced explicitly and the paragraph rewritten.

(12) Section 3.2, third paragraph (starting "In practice, the
Unicode")

I find this paragraph a little bit confusing as written, in
part because it isn't clear whether unassigned code points are
part of that Unicode Consortium maximum.   However IDNA2008
actually creates several subsets, each associated with a
derived property value (remembering that there are four, not
two).  

Suggestion:

Old:

	The IDNA2008 rules based on the Unicode Standard create a
	subset of these by assigning the PVALID derived property
	value to them.  

New:

	The IDNA2008 rules use the Unicode Standard to create a
	further subset of code points and contexts that are
	permitted in DNS labels associated with its PVALID,
	CONTEXTJ, and CONTEXTO derived property values. 

I think that can be said better, but I'm out of ideas at the
moment.


(13) Section 4.1, paragraph 1.

As noted above, that extensive discussion in the IETF rapidly
moved beyond U+08A1 and to the more general issues it raised.
Even where the IAB statement was concerned, I believe it would
have been completely unreasonable for the IAB to issue that
precautionary "just wait" statement over a single code point
that our research suggested was part of a writing system that
was not in significant contemporary use, especially on the
Internet (even though "contemporary use" was never a criterion
for IDNA2008 acceptability).  I don't believe that was the
IAB's motivation.  Instead, the "Hamza" discussion was almost
entirely about the two issues that it raised and how they might
have led to misunderstandings in the design of IDNA2008.  Those
two issues were that some code points, including newly-added
ones, appeared to be composable by combining sequences of
earlier code points without NFC decomposing to those sequences
(something the IDNAbis design team believed we were promised
would never occur and designed IDNA2008 accordingly) and the
presence of different ways to code an abstract character or
character sequence within the same script, essentially
asserting that they were different abstract characters because
they were used differently in different languages.
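
The first of those two issues can be checked directly with Python's
standard `unicodedata` module.  A minimal sketch, assuming the
interpreter's Unicode database is version 7.0 or later (so that
U+08A1 is assigned); the constant names are illustrative only.

```python
import unicodedata

BEH_WITH_HAMZA_ABOVE = "\u08A1"        # single precomposed code point
BEH_PLUS_HAMZA_ABOVE = "\u0628\u0654"  # BEH followed by combining HAMZA ABOVE

# U+08A1 has no canonical decomposition, so NFC never relates the two forms.
print(unicodedata.decomposition(BEH_WITH_HAMZA_ABOVE) == "")  # True
print(unicodedata.normalize("NFC", BEH_PLUS_HAMZA_ABOVE)
      == BEH_WITH_HAMZA_ABOVE)                                # False
print(unicodedata.normalize("NFD", BEH_WITH_HAMZA_ABOVE)
      == BEH_PLUS_HAMZA_ABOVE)                                # False

# Contrast: ALEF + HAMZA ABOVE does compose to U+0623 under NFC.
print(unicodedata.normalize("NFC", "\u0627\u0654") == "\u0623")  # True
```

Two labels that differ only in which of the first two forms they use
therefore remain distinct after NFC, which is the behavior the
IDNAbis design did not anticipate.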

The real discussion of the issues that the U+08A1 discovery
opened up is in draft-klensin-idna-5892upd-unicode70, while
draft-freytag-troublesome-characters just identifies that code
point and a few others (and points to
draft-klensin-idna-5892upd-unicode70).  While I hope no one
could claim consensus for either, the draft-klensin document
was on the agenda of a BOF and therefore has received some, I
hope serious, discussion in the IETF as well as in the IAB
Program and elsewhere (my recollection is that it even came up
in PRECIS discussions, but Peter can confirm).  The
draft-freytag document has, if I recall, been discussed only
among a small circle of friends (most of whom are now on the
directorate) and has never been on an IETF or WG agenda for
substantive discussion.  

The final sentence of the paragraph has no claim whatsoever on
IETF Consensus although, if the document is approved, it is
certain that claims will be made that it represents such
consensus.  See the comments above about the bar for actually
making an addition to the exception list in the IDNA2008
specifications.  Saying here that "it is still acceptable to
allow the code point to have a derived property value of
PVALID", without further explanation, is equivalent to
concluding that all (present, past, and future) code points and
code point sequences within a script that are distinguished
only by language and/or do not decompose to reasonable and
plausible combining sequences should be treated as PVALID if
Unicode considers them letters and that either disallowing them
or classifying them as CONTEXTO by exception will not be
considered.   The alternative is to walk us into either
significant inconsistency or retroactive incompatible changes.
I believe that Asmus's recent observations about combining
characters that are not intended for use with letters further
complicates these problems.

For whatever it is worth, we are talking about IDNA2008 here.
"recommendations to include the code point in the repertoire of
characters permissible for registration or not" are not part of
the IDNA2008 vocabulary and have no place in this document
without significant IETF protocol specification work.  IDNA2008
itself does not anticipate such recommendations: a code point
is either PVALID (or CONTEXTx and meeting the contextual rules
as needed) or it isn't.  Much as I'm sympathetic to the
draft-freytag work (although I see significant problems with
it), we should not be using it to get the IETF into the
recommendation business as the result of a casual remark in
this I-D.  In the longer term, it poses another alternative,
which is to decide that all issues involving such types of
characters and maybe others should be dealt with simply by
calling the attention of registries to them, hoping (even in
the presence of evidence to the contrary) that all registries
will be responsible about their delegations, and then move on.
But the IETF has not agreed to any such thing and, if that is
what we intend, it should be explicit.  It would also be a
sufficiently important decision that it should not be slipped
into a document after IETF LC has nominally closed.  That is
especially true if the intent is to effectively eliminate the
importance of future reviews of this sort because we
are just going to leave all decisions about new code points or
changed properties to lists like the "troublesome" one and
registries rather than --even as a possibility-- adding
additional cases to the IDNA2008 exception list.  I've been
assured several times that the I-D does not intend to preempt
future actions by the IETF but, insofar as consistency with
prior actions is an important criterion and we continue to have
a high level of reluctance to make incompatible changes, if
this code point is going to be accepted, especially with the
sort of statement made in that section, solid explanation as to
how such precedent-setting is to be avoided is appropriate,
necessary, and certainly does not appear in the document.

It would be future work but, if the plan is really to shift
responsibilities that now belong at the IDNA2008 protocol level
to registries, we should examine whether the CONTEXTO, and
maybe even the CONTEXTJ, categories are needed or whether we
should reduce the complexity of IDNA2008 by eliminating them in
favor of more advice to (and responsibility for) registries.

I want to stress that I don't believe there is IETF consensus
for how to handle language-sensitive (within script) or
non-decomposing code points; saying that there is when there
has been little discussion, none of it conclusive, seems to me
to be inappropriate and risky.  I do see two ways out if
the directorate and community believe that there is
urgency about processing and publishing this I-D or its
successor.   One would be to update IDNA2008 to create a
protocol-level derived value we might call "tentative",
assigned only by exception (as CONTEXTO is now assigned) that
would be treated the same as PVALID except with the caution
that the classification is, well, tentative and that users of
labels containing such characters and registries delegating
labels containing them should be aware that, as knowledge
improves, they might be DISALLOWED without our paying much
attention to cries of agony about incompatible changes.
Advising them to seek additional advice from the "troublesome"
list and elsewhere would be entirely appropriate in that
context.

The other would be to simply exclude those non-decomposing code
points from the decisions and tables of this document, treating
them as if they were unassigned until (and unless) we can come
up with better and more definitive answers.

(14) Section 4.2

While I believe the statement that no changes have been made
that change the derived property value for code points that
existed earlier, as far as I know, no one has done a review of
the new code points similar to the reviews conducted for
versions 6.x and 7.0 of Unicode.  If anyone has done such a
review in an IDNA2008 context, they should identify themselves.

This omission presumably also applies to version 11.0 because all
the second paragraph, first bullet of Section 4.2 says is we
should go off and calculate, presumably without examining the
code points even superficially.

If the description in Section 4.3 includes mention of how many
code points were added (something that has nothing to do with
changes), why is that information not present for versions 7
through 9?

(15) Section 4.3, paragraph 2, bullet 3 (starting "The
U+111C9...")

After heated debate, especially with Mark Davis, who insisted
that changes from DISALLOWED to PVALID were OK but changes from
PVALID to DISALLOWED should never be allowed, the WG very
clearly concluded that both types of changes needed to be
examined and treated carefully.   Part of the reason was the
requirement that lookup applications that conform to IDNA2008
check labels for code point validity and reject any labels
containing DISALLOWED code points without actually getting near
the DNS (another requirement to which Mark objected strenuously
but was far out in the rough), with the result that a change
from DISALLOWED to PVALID could result in false negatives.

So, on what basis is this "an acceptable change"?  I think it
probably is (and that the deviation isn't worth the trouble)
but, unless some IETF process reversed that clear consensus in
the WG, stating it as acceptable as if we had adopted Mark's
model doesn't fly.  And, again, if there has been any IETF
discussion of this issue, or even a serious attempt at such a
discussion, I haven't been aware of it and would be happy to be
pointed to email messages, minutes, etc.


(16) Section 5, paragraph 1.   

I note that this paragraph talks about "the conclusion of this
document", which is entirely reasonable whether one agrees with
it or not, and avoids the language about IETF conclusions
which, as noted above, I do not believe can be substantiated.

(17) Section 5, paragraph 2 (starting "To increase overall
harmonization")

This gets back to the "implementation" question discussed
above, the issue of whether the IETF should be making
recommendations about how to make non-conforming systems work
better (or less badly), and the unproven hypothesis that there
are a non-trivial number of implementations out there that are
using IDNA2008 calculations for new values in local extensions
of Stringprep and Nameprep tables.  Certainly the most
important and widely-cited of the alternative IDNA protocols --
UTS#46 -- does not do so: it is based on a normative table of its
own,
https://unicode.org/Public/idna/11.0.0/IdnaMappingTable.txt,
and examination of that table and the text that describes it in
the UTS#46 base document give no hints of paying attention to
IDNA2008's calculations.

Then the text says "derived property values MUST be calculated as
specified in the documents listed in section Section 3.1"
(noting "section Section" as an editorial nit).  This is either
a new requirement, in which case it must update RFC 5892 or
potentially 5890, or (as I believe) it isn't, unless this
document is really slipping over into the "recommendations for
registries" business.  To me, IDNA2008 is perfectly clear: one
either calculates property values as the IDNA2008 specifications
provide or it isn't IDNA2008.

Recommendation: either drop or drastically rewrite this
paragraph.

(18) Section 5, paragraph 3 (starting "All DNS registries (and
other organizatios)" (sic)

If this recommendation is worth making or repeating here, what
are this document and the IANA tables for?  That is a
rhetorical question, but this paragraph should either be
rewritten to explain the relationship or should be dropped.