Re: [I18ndir] One more try

John C Klensin <john-ietf@jck.com> Thu, 21 February 2019 16:08 UTC

Date: Thu, 21 Feb 2019 11:08:11 -0500
From: John C Klensin <john-ietf@jck.com>
To: Patrik Fältström <paf@frobbit.se>
cc: i18ndir@ietf.org
Message-ID: <EE13B13992F8A37C44A0246A@PSB>
In-Reply-To: <6A6BEFCB-C318-40C8-A2CA-86CC5F11ED38@frobbit.se>
References: <2B91B60DE56B36DD5D667679@PSB> <6A6BEFCB-C318-40C8-A2CA-86CC5F11ED38@frobbit.se>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/99IU5AqqiKneqavBis3H4KAF584>
Subject: Re: [I18ndir] One more try
List-Id: Internationalization Directorate <i18ndir.ietf.org>

Patrik,

Thanks.  To clarify one thing that might help us move forward, I
have never advocated completing what you describe as (2) before
action is taken on (1).  In part because, from my point of view,
a very large fraction of the reason for proposing the BOF that
led to creation of this directorate was to start addressing and
moving forward on the issues that make up (2), I'd be a lot
happier if we were showing evidence of having (or even making) a
plan about (2).  If we don't --if all the directorate can do is
respond to IETF Last Calls and issues that one or more ADs
define as crises-- then I actually think there is an argument
for dropping draft-faltstrom and going to work on a statement
about IETF dropping out of I18n development and what that means
to the Internet.  

However, the less we actually do about (2) in the near term, the
more I think it is important that draft-faltstrom-unicode11 be
crystal-clear that the IETF may, in the future, update or
otherwise revise parts of the IDNA2008 specifications, that we
are not obligating ourselves to accept whatever comes over the
wall from Unicode in the future, that the I-D does not set
irreversible precedents, and perhaps even that the decision that
it was important to get the I-D through (after a long delay) by
accepting the Unicode categories may include classification of
some marginal cases as PVALID that might be changed as our work
continues.  

Some of that may require a bit of triage on (2), particularly in
the area of how we think about and describe the boundary between
the absolute and global requirements of IDNA2008 and what is
left to registries acting responsibly and thoughtfully.  IMO,
some of the discussion Asmus, Patrik, and I have been having is
about that even though it wasn't obviously our intention.
However, again, I'm talking about enough consideration of, and
agreement on, the general outlines of what it is we want to do
(and, ideally, how we are going to do it) that Patrik (and
whomever he can get to help) can adjust the text of the I-D.  

I'm happy to try to help write if Patrik would find that useful,
but for me, as for him, that first requires a decision (or a few
of them) about where we want to go.

best,
    john



--On Thursday, February 21, 2019 10:33 +0100 Patrik Fältström
<paf@frobbit.se> wrote:

> Let me say I have noted these things (thanks John), and I as
> editor will take them on as soon as we know which main path
> forward we choose between what are, for me, two work items:
> 
> 1. Make changes to draft-faltstrom-unicode11, INCLUDING
> potential warning signs etc. saying "this is an extrapolation
> of [possibly bad] choices made once upon a time by IETF", to
> bring IDNA2008 to Unicode 11 (or 12)
> 
> 2. Work on the issues with various things found in my
> document, John's, Asmus's, and many others'
> 
> I am completely of the view that [2] is the most important
> thing this group (and IETF) should do, because "just"
> extrapolating IDNA2008 forward is not possible "forever". We
> most certainly need something new, regardless of whether it is
> a tweak or something completely new.
> 
> The main question *now*, I think, is whether we should hold
> [1] until we have done [2], and on that I do not see a
> consensus yet.
> 
> I see arguments (including from me now and then) that
> adherence to IDNA2008 is so darn important that we must move
> forward, else people will continue to claim IETF is slow,
> mapping is done according to the list John points to below and
> even more ugly things.
> 
> I also see arguments (including from John and others) that if
> we do not do (2) now before (1), we will never do (2). And
> this I can *also* agree with.
> 
> 
> Harald, I do not envy you!
> 
> Once again, thanks John for a good note.
> 
>    Patrik
> 
> On 21 Feb 2019, at 9:31, John C Klensin wrote:
> 
>> Ok.  I've been accused of just trying to hold things up,
>> possibly by making them so complicated, or bundling so many
>> things together, to make progress impossible.  That is not my
>> intent.
>> 
>> So let me see if I can boil this down further at the risk of
>> oversimplifying a bit and leaving some possibly-important
>> nuances out (indented paragraphs, which are also set off by
>> "(( .... ))", are examples or speculative comments -- feel
>> free to skip over them if you are confident you understand
>> what I'm talking about and don't have time):
>> 
>> (1)  There are two ways to view the IDNA2008 specs.  One is
>> that we have made our choices, are stuck with them, and that
>> anything we discover or that Unicode does to us, even things
>> that with ordinary IETF specifications would be considered
>> known technical defects, just have to be accepted and dealt
>> with by giving advice to registries and hoping they follow it.
>> 
>> 	((The questions of who gives that advice, how it is
>> 	maintained, in what ways different sorts of registries
>> 	are differentiated, etc., are a separate issue from the
>> 	current I-D and can be handled whenever (or if) we get
>> 	around to it.  So can any clarifications to IDNA2008 to
>> 	better identify the role of registries and the
>> 	importance of their being careful, conservative, and
>> 	responsible as long as those clarifications don't
>> 	actually change the specifications or their intent.))
>> 
>> draft-faltstrom-unicode11-07 is aligned with that view.  It
>> doesn't say so in so many words (and I think it should if
>> that is what we are going to do), but it does say that some
>> things that have been identified as problems are best just
>> left with whatever categories are determined by the RFC 5891
>> and 5892 rules and then sorted out by registries, citing
>> draft-freytag-troublesome-characters non-normatively as a
>> specific example of such advice and that, whatever we do, we
>> just follow along with Unicode's properties and categories.
>> 
>> The other view is that these are IETF Proposed Standards like
>> all other IETF Proposed Standards.  If we discover technical
>> defects, we address them in a serious way.  Perhaps we fix
>> them. IDNA2008 (IMO) anticipated that possibility: code
>> points can be added to the exception list, new contextual
>> rules can be added or old ones modified, or new rule sets can
>> be added to section 2 of RFC 5892.   One can even imagine
>> Unicode making a set of additions that would justify adding
>> an entry to the block collection in Section 2.4 of RFC 5892.
>> Or possibly we can explain the issue in prose (possibly in
>> 5891 rather than 5892) and move on.  In some cases, that
>> explanation might be that after careful examination we
>> concluded that the pain and suffering that would be caused by
>> the incompatible changes needed to get things right so far
>> exceeded the problems that were likely to be caused by the
>> defect that the latter is better just left alone but, if so,
>> we should document that, not hand-wave around it. However, if
>> some of the things we discover (or changes or new code points
>> in a new version of Unicode) suggest global restrictions at a
>> level of importance equivalent to global restrictions we now
>> impose, the "just like any other standard" principle and the
>> structure of IDNA2008 strongly suggest that we should
>> consider modifying the standard and its rules, not ending up
>> in a situation in which the real distinction between what is
>> handled as a global rule (remembering that IDNA2008 has
>> expectations about global rules being enforced at lookup
>> time, something that cannot possibly be done with suggestions
>> that are applied by individual registries at all levels of
>> the DNS hierarchy) and what is left to registries becomes
>> arbitrary.
>> 
>> 	((Let me illustrate with a real example that Patrik's
>> 	search did not turn up this time, nor did our search
>> 	when RFC 5892 was finalized.  The example is chosen
>> 	partially because I'm just getting tired of talking
>> 	about non-decomposing code points and associated
>> 	combining sequences.  Section 2.4 of 5892 lists (and
>> 	disallows) two types of symbols for musical notation,
>> 	both essentially European.  Handling them by blocks is
>> 	necessary because both notation systems consist of a
>> 	combination of basic musical symbols and combining
>> 	symbols.   The former have General Category So and would
>> 	hence be DISALLOWED without needing a special rule.  But
>> 	the others have General Category Mn which, absent other
>> 	action, makes them PVALID.  Well, sometime around Unicode
>> 	5.x, coding was introduced for Balinese, including
>> 	Balinese musical notation.  It has the same property
>> 	relationships as the more traditional (in Unicode)
>> 	musical notations -- the base notational symbols have
>> 	General Category So and the combining marks are in Mn.
>> 	Other than "slipped through the cracks" there is no
>> 	rationale that makes sense in an IDNA2008 context for
>> 	DISALLOWing the Balinese base musical symbols (along
>> 	with the traditional western European and Greek ones)
>> 	while treating the Balinese musical combining symbols as
>> 	PVALID when the others are not.  There is, however, a
>> 	Unicode reason: while, in Blocks.txt, the musical
>> 	symbols listed in Section 2.4 have their own blocks,
>> 	the Balinese ones are folded into a Balinese block with
>> 	the letters of that writing system.  If we had noticed
>> 	this in 2010, we would have had an interesting
>> 	discussion about blocks that should be DISALLOWED for
>> 	consistency but that are not named as blocks in
>> 	Blocks.txt.   So, maybe we just live with this as an
>> 	error on the theory that a registry would be insane to
>> 	use those combining marks with anything but Balinese
>> 	musical symbols and those are DISALLOWED.  Maybe it
>> 	deserves a note somewhere; maybe not.   But now suppose
>> 	a proposal for another musical notation comes along that
>> 	uses a combination of base symbols and combining marks
>> 	and Unicode accepts a proposal to include it.   Suppose
>> 	too that the block is big enough that the musical code
>> 	points can't be folded into some associated script block
>> 	so we end up with a separately-named block for it.   Are
>> 	we going to follow the precedent of Section 2.4 and
>> 	DISALLOW the block or are we going to follow the
>> 	precedent of Balinese and let it go (and hope that
>> 	registries get the right advice and follow it)?  Would
>> 	it make a difference what typical graphemes for those
>> 	code points look like and how they combine?   If we accept
>> 	the "just follow Unicode" instructions of
>> 	draft-faltstrom-unicode11 the answer is clear but it is
>> 	less clear that it is the right one.))
>> 
>> There is another kind of mistake we might have made -- using
>> the example above, if the answer for Balinese is going to be
>> the answer for all musical notations we encounter in the
>> future (i.e., musical combining characters are PVALID even
>> though the base ones are DISALLOWED), maybe we should be
>> figuring out whether including two musical notations but not
>> the others in Section 2.4 was a mistake and we should take
>> those two out for consistency, thereby shifting several code
>> points from DISALLOWED to PVALID (and then presumably
>> advising registries to not use them).
>> 
>> 	((I can't imagine that this would be a big issue one way
>> 	or the other.  But think about what happens if the next
>> 	round of emoji excesses introduces some code points
>> 	classified as non-spacing marks.   Are we willing to
>> 	have them be PVALID and rely on registries, some
>> 	emoji-happy, to do the right thing?   That is a real
>> 	question -- I don't know the answer but think we should
>> 	be in a position to think about it should the situation
>> 	arise, and that makes "just follow Unicode" a bad idea.))
>> 
>> (2) The second choice above -- treating IDNA2008 as an
>> ordinary set of Proposed Standards that may have technical
>> defects that we should be looking to straighten out rather
>> than deciding that anything introduced by changes in Unicode
>> or discovered as our knowledge increases just has to be
>> accepted -- does not require delaying
>> draft-faltstrom-unicode11 until all of the issues that have
>> been raised are resolved (or until the end of time if that
>> comes first).    It does require that we go through
>> the text to remove or modify the "just accept whatever
>> Unicode throws over the wall" language to make it clear that
>> careful reviews are in order.  It also requires that we have
>> some sort of plan about such reviews (not easy given that,
>> even with this directorate, the number of people who have
>> participated substantively is very small, but lots easier
>> than solving the problems) and that we can describe the
>> issues that have been outstanding for a few years and either
>> identify them as suitable for leaving to registries or as
>> ones that should cause some choices in the I-D to be marked
>> as tentative.   I'd imagine that, if more of us can really
>> engage rather than sitting things out, those things can be
>> done fairly quickly, possibly without holding the document
>> up at all, because...
>> 
>> (3) All of these issues aside, I don't know how many people
>> in this directorate who are fluent native speakers of English
>> (either major variation) have read the document closely, but
>> it appears to me to need work.   There are some typos that we
>> can probably rely on the RFC Editor to fix (e.g.,
>> "recomments" in the Abstract), but there are also some
>> technical errors (e.g., IDNA2008 was not "largely completed"
>> in 2008; that is more or less the year that revision work got
>> underway with "largely completed" belonging to 2010) and
>> more complex editorial ones (e.g., changes have not been made
>> to Unicode "related to the algorithm IDNA2008 specifies"
>> which would imply that those changes were made because of
>> IDNA2008 or with IDNA2008 in mind. I don't believe either of
>> those is true and believe that Mark would deny it even if
>> they were.  It would be much better and more accurate to say
>> that some Unicode changes have consequences for IDNA2008 (or
>> for its algorithm)).
>> 
>> I also don't believe some of the more substantive statements
>> are accurate.  For example, the paragraph starting
>> "Historically" implies that Unicode 6.0 raised substantive
>> issues as serious as those identified in the 7.0 timeframe
>> and later and that we decided to just accept them.   I don't
>> recall there being any such issues.  From that standpoint,
>> the key statement in RFC 6452 appears under Security
>> Considerations: "The three code points are unlikely to occur
>> in internationalized domain names,..."   That is a very
>> specific rationale for accepting the change for those
>> particular code points.  It is quite different from "The
>> primary reason for that choice is that staying with the
>> Unicode Standard has been viewed as important because of the
>> diversity of implementations already existing in the wild.",
>> a statement for which I don't think there is much evidence of
>> informed IETF consensus.
>> 
>> There are also several places where the draft omits material
>> that would considerably strengthen it.   For example, under
>> 3.2, there are "interpretations" of UTS#46 that go well
>> beyond "a mix between IDNA2003 and IDNA2008".   As one
>> important group of examples,
>> https://www.unicode.org/Public/idna/11.0.0/IdnaMappingTable.txt
>> shows many or most symbols, including all of the emoji, as
>> valid.  Neither IDNA2008 nor any plausible interpretation of
>> IDNA2003 allows them, they raise issues that go beyond
>> "registrar beware", and they do not appear to be on the
>> "troublesome" list. As an aside, perhaps it is too paranoid,
>> but every time I see a piece describing emoji as a new
>> language, I wonder whether some future version of Unicode
>> will reclassify them from So and Sk to Lo and Mn.  If we
>> follow the doctrine of just accepting whatever Unicode throws
>> over the wall, that would take us straight to vomiting
>> cowboys and worse.  However, nothing to be fixed here that
>> should take much time; just a strong argument that we fix the
>> text to avoid promising we will accept whatever Unicode
>> throws at us.
>> 
>> I see these as editorial issues -- topics on which the I-D
>> can be changed to say a bit less and be a good deal less
>> controversial about the claims it makes and how those claims
>> might constrain us in the future without making any
>> significant substantive changes.  I think many of those
>> changes should still be made if we conclude that IDNA2008 is
>> frozen and it is all up to Unicode in the future, no matter
>> what that brings.   I think they can be done fairly quickly
>> if people pitch in; somewhat less quickly if some very small
>> number of us need to deal with both these editorial issues
>> and smoothing things over if we want to preserve the option
>> of making technical adjustments to deal with defects in
>> IDNA2008 or to better explain some of the issues.
>> 
>> best,
>>    john
>> 
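
Editorial illustration, not part of the original messages: the
Balinese musical-notation example and the emoji aside in the quoted
note can be checked with a short Python sketch.  It assumes a heavily
simplified reading of RFC 5892 that applies only the Section 2.4 block
list and the Section 2.1 LetterDigits categories, and it skips every
other step of the real derivation (exceptions, unassigned code points,
stability under normalization, contextual rules, and so on).  The
function name simplified_status and its category override are
hypothetical, introduced only for this sketch, and the categories
reported depend on the Unicode version bundled with the interpreter.

```python
import unicodedata

# Block ranges named in RFC 5892, Section 2.4 (IgnorableBlocks).
IGNORABLE_BLOCKS = (
    (0x20D0, 0x20FF),    # Combining Diacritical Marks for Symbols
    (0x1D100, 0x1D1FF),  # Musical Symbols
    (0x1D200, 0x1D24F),  # Ancient Greek Musical Notation
)

# General Categories accepted by the LetterDigits rule (Section 2.1).
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def simplified_status(cp, category=None):
    """Return 'PVALID' or 'DISALLOWED' for code point cp.

    Simplification (an assumption of this sketch): only the block
    list and the LetterDigits categories are consulted.  Passing
    `category` overrides the real General Category so that a
    hypothetical reclassification can be tried out.
    """
    if category is None:
        category = unicodedata.category(chr(cp))
    # Block-based exclusion is checked before the category rule.
    if any(lo <= cp <= hi for lo, hi in IGNORABLE_BLOCKS):
        return "DISALLOWED"
    if category in LETTER_DIGITS:
        return "PVALID"
    return "DISALLOWED"


if __name__ == "__main__":
    samples = (
        0x1D11E,  # western musical base symbol (Musical Symbols block)
        0x1D167,  # western musical combining mark (same block)
        0x1B61,   # Balinese musical base symbol (Balinese block)
        0x1B6B,   # Balinese musical combining mark (Balinese block)
        0x1F600,  # an emoji
    )
    for cp in samples:
        ch = chr(cp)
        name = unicodedata.name(ch, "<unnamed>")
        gc = unicodedata.category(ch)
        print(f"U+{cp:04X} {gc} {simplified_status(cp)} {name}")

    # Hypothetical: the same emoji if it were ever reclassified as Lo.
    print("U+1F600 as Lo:", simplified_status(0x1F600, category="Lo"))
```

Under these assumptions the western combining mark is DISALLOWED only
because its block is listed in Section 2.4, while the Balinese
combining mark, which has the same General Category but no listed
block, falls through to the category rule.  The final line shows how a
purely category-driven rule would follow a reclassification without
any further review.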