Re: [I18ndir] One more try

Patrik Fältström <paf@frobbit.se> Thu, 21 February 2019 09:33 UTC

From: Patrik Fältström <paf@frobbit.se>
To: John C Klensin <john-ietf@jck.com>
Cc: i18ndir@ietf.org
Date: Thu, 21 Feb 2019 10:33:43 +0100
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/Sdq_UHVDxA_J4wuAjtB1DDdmtko>

Let me say I have noted these things (thanks John), and I as editor will take them on as soon as we know which of the main paths forward we choose. For me there are these two work items:

1. Make changes to draft-faltstrom-unicode11, INCLUDING potential warning signs etc. saying "this is an extrapolation of [possibly bad] choices made once upon a time by IETF", to bring IDNA2008 to Unicode 11 (or 12)

2. Work on the issues with various things found in my document, John's, Asmus's and many others

I am completely of the view that [2] is the most important thing this group (and IETF) should do, because "just" extrapolating IDNA2008 forward is not possible "forever". We most certainly need something new, regardless of whether it is a tweak or something completely new.

The main question *now*, I think, is whether we should hold off on [1] until we have done [2], and on that I do not see a consensus yet.

I see arguments (including from me now and then) that adherence to IDNA2008 is so darn important that we must move forward, else people will continue to claim IETF is slow, mapping is done according to the list John points to below, and even uglier things.

I also see arguments (including from John and others) that if we do not do (2) now before (1), we will never do (2). And this I can *also* agree with.


Harald, I do not envy you!

Once again, thanks John for a good note.

   Patrik

On 21 Feb 2019, at 9:31, John C Klensin wrote:

> Ok.  I've been accused of just trying to hold things up,
> possibly by making them so complicated, or bundling so many
> things together, as to make progress impossible.  That is not my intent.
>
> So let me see if I can boil this down further at the risk of oversimplifying a bit and leaving some possibly-important
> nuances out (indented paragraphs, which are also set off by "((
> .... ))", are examples or speculative comments -- feel free to skip over them if you are confident you understand what I'm
> talking about and don't have time):
>
> (1)  There are two ways to view the IDNA2008 specs.  One is that we have made our choices, are stuck with them, and that anything we discover or that Unicode does to us, even things that with ordinary IETF specifications would be considered known technical defects, just have to be accepted and dealt with by giving
> advice to registries and hoping they follow it.
>
> 	((The questions of who gives that advice, how it is
> 	maintained, in what ways different sorts of registries
> 	are differentiated, etc., are a separate issue from the
> 	current I-D and can be handled whenever (or if) we get
> 	around to it.  So can any clarifications to IDNA2008 to
> 	better identify the role of registries and the
> 	importance of their being careful, conservative, and
> 	responsible as long as those clarifications don't
> 	actually change the specifications or their intent.))
>
> draft-faltstrom-unicode11-07 is aligned with that view.  It
> doesn't say so in so many words (and I think it should if that is what we are going to do), but it does say that some things that have been identified as problems are best just left with whatever categories are determined by the RFC 5891 and 5892
> rules and then sorted out by registries, citing
> draft-freytag-troublesome-characters non-normatively as a
> specific example of such advice and that, whatever we do, we just follow along with Unicode's properties and categories.
>
> The other view is that these are IETF Proposed Standards like all other IETF Proposed Standards.  If we discover technical defects, we address them in a serious way.  Perhaps we fix them.
> IDNA2008 (IMO) anticipated that possibility: code points can be added to the exception list, new contextual rules can be added or old ones modified, or new rule sets can be added to section 2 of RFC 5892.   One can even imagine Unicode making a set of
> additions that would justify adding an entry to the block
> collection in Section 2.4 of RFC 5892.   Or possibly we can
> explain the issue in prose (possibly in 5891 rather than 5892)
> and move on.  In some cases, that explanation might be that
> after careful examination we concluded that the pain and
> suffering that would be caused by the incompatible changes
> needed to get things right so far exceeded the problems that were likely to be caused by the defect that the latter is better just left alone but, if so, we should document that, not
> hand-wave around it. However, if some of the things we discover (or changes or new code points in a new version of Unicode)
> suggest global restrictions at a level of importance equivalent to global restrictions we now impose, the "just like any other standard" principle and the structure of IDNA2008 strongly
> suggest that we should consider modifying the standard and its rules, not ending up in a situation in which the real
> distinction between what is handled as a global rule
> (remembering that IDNA2008 has expectations about global rules being enforced at lookup time, something that cannot possibly be done with suggestions that are applied by individual registries at all levels of the DNS hierarchy) and what is left to registries is arbitrary.
>
> 	((Let me illustrate with a real example that Patrik's
> 	search did not turn up this time, nor did our search
> 	when RFC 5892 was finalized.  The example is chosen
> 	partially because I'm just getting tired of talking
> 	about non-decomposing code points and associated
> 	combining sequences.  Section 2.4 of 5892 lists (and
> 	disallows) two types of symbols for musical notation,
> 	both essentially European.  Handling them by blocks is
> 	necessary because both notation systems consist of a
> 	combination of basic musical symbols and combining
> 	symbols.   The former have General Category So and would
> 	hence be DISALLOWED without needing a special rule.  But
> 	the others have General Category Mn which, absent other
> 	action, makes them PVALID.  Well, sometime around Unicode
> 	5.x, coding was introduced for Balinese, including
> 	Balinese musical notation.  It has the same property
> 	relationships as the more traditional (in Unicode)
> 	musical notations -- the base notational symbols have
> 	General Category So and the combining marks are in Mn.
> 	Other than "slipped through the cracks" there is no
> 	rationale that makes sense in an IDNA2008 context for
> 	DISALLOWing the Balinese base musical symbols (along
> 	with the traditional western European and Greek ones)
> 	while treating the Balinese musical combining symbols as
> 	PVALID when the others are not.  There is, however, a
> 	Unicode reason: while, in Blocks.txt, the musical
> 	symbols listed in Section 2.4 have their own blocks,
> 	the Balinese ones are folded into a Balinese block with
> 	the letters of that writing system.  If we had noticed
> 	this in 2010, we would have had an interesting
> 	discussion about blocks that should be DISALLOWED for
> 	consistency but that are not named as blocks in
> 	Blocks.txt.   So, maybe we just live with this as an
> 	error on the theory that a registry would be insane to
> 	use those combining marks with anything but Balinese
> 	musical symbols and those are DISALLOWED.  Maybe it
> 	deserves a note somewhere; maybe not.   But now suppose
> 	a proposal for another musical notation comes along that
> 	uses a combination of base symbols and combining marks
> 	and Unicode accepts a proposal to include it.   Suppose
> 	too that the block is big enough that the musical code
> 	points can't be folded into some associated script block
> 	so we end up with a separately-named block for it.   Are
> 	we going to follow the precedent of Section 2.4 and
> 	DISALLOW the block or are we going to follow the
> 	precedent of Balinese and let it go (and hope that
> 	registries get the right advice and follow it)?  Would
> 	what typical graphemes for those code points look like
> 	and how they combine make a difference?   If we accept
> 	the "just follow Unicode" instructions of
> 	draft-faltstrom-unicode11 the answer is clear but it is
> 	less clear that it is the right one.))
>
> There is another kind of mistake we might have made -- using the example above, if the answer for Balinese is going to be the answer for all musical notations we encounter in the future
> (i.e., musical combining characters are PVALID even though the base ones are DISALLOWED), maybe we should be figuring out
> whether including two musical notations but not the others in Section 2.4 was a mistake and we should take those two out for consistency, thereby shifting several code points from
> DISALLOWED to PVALID (and then presumably advising registries to not use them).
>
> 	((I can't imagine that this would be a big issue one way
> 	or the other.  But think about what happens if the next
> 	round of emoji excesses introduces some code points
> 	classified as non-spacing marks.   Are we willing to
> 	have them be PVALID and rely on registries, some
> 	emoji-happy, to do the right thing?   That is a real
> 	question -- I don't know the answer but think we should
> 	be in a position to think about it should the situation
> 	arise, and that makes "just follow Unicode" a bad idea.))
>
> (2) The second choice above -- treating IDNA2008 as an ordinary set of Proposed Standards that may have technical defects that we should be looking to straighten out, rather than deciding that anything introduced by changes in Unicode or discovered as our knowledge increases must simply be accepted -- does not require delaying
> draft-faltstrom-unicode11 until all of the issues that have been raised are resolved (or until the end of time if that comes
> first).    It does require that we go through the text to remove or modify the "just accept whatever Unicode throws over the
> wall" language to make it clear that careful reviews are in
> order.  It also requires that we have some sort of plan about such reviews (not easy given that, even with this directorate, the number of people who have participated substantively is very small, but lots easier than solving the problems) and that we can describe the issues that have been outstanding for a few years and either identify them as suitable for leaving to
> registries or as ones that should cause some choices in the I-D to be marked as tentative.   I'd imagine that, if more of us can really engage rather than sitting things out, those things can be done fairly quickly, possibly without holding the document up at all, because...
>
> (3) All of these issues aside, I don't know how many people in this directorate who are fluent native speakers of English
> (either major variation) have read the document closely, but it appears to me to need work.   There are some typos that we can probably rely on the RFC Editor to fix (e.g., "recomments" in the Abstract), but there are also some technical errors (e.g., IDNA2008 was not "largely completed" in 2008; that is more or less the year that revision work got underway, with "largely
> completed" belonging to 2010) and more complex editorial ones (e.g., changes have not been made to Unicode "related to the algorithm IDNA2008 specifies", which would imply that those
> changes were made because of IDNA2008 or with IDNA2008 in mind.
> I don't believe either of those is true and believe that Mark would deny it even if they were.  It would be much better and more accurate to say that some Unicode changes have consequences for IDNA2008 (or for its algorithm)).
>
> I also don't believe some of the more substantive statements are accurate.  For example, the paragraph starting "Historically"
> implies that Unicode 6.0 raised substantive issues as serious as those identified in the 7.0 timeframe and later and that we
> decided to just accept them.   I don't recall there being any such issues.  From that standpoint, the key statement in RFC 6452 appears under Security Considerations: "The three code
> points are unlikely to occur in internationalized domain
> names,..."   That is a very specific rationale for accepting the change for those particular code points.  It is quite different from "The primary reason for that choice is that staying with the Unicode Standard has been viewed as important because of the diversity of implementations already existing in the wild.", a statement for which I don't think there is much evidence of
> informed IETF consensus.
>
> There are also several places where the draft omits material that would considerably strengthen it.   For example, under 3.2, there are "interpretations" of UTS#46 that go well beyond "a mix between IDNA2003 and IDNA2008".   As one important group of
> examples,
> https://www.unicode.org/Public/idna/11.0.0/IdnaMappingTable.txt shows many or most symbols, including all of the emoji, as
> valid.  Neither IDNA2008 nor any plausible interpretation of
> IDNA2003 allows them, they raise issues that go beyond "registrar beware", and they do not appear to be on the "troublesome" list.
> As an aside, perhaps it is too paranoid, but every time I see a piece describing emoji as a new language, I wonder whether some future version of Unicode will reclassify them from So and Sk to Lo and Mn.  If we follow the doctrine of just accepting whatever Unicode throws over the wall, that would take us straight to vomiting cowboys and worse.  However, nothing to be fixed here that should take much time; just a strong argument that we fix the text to avoid promising we will accept whatever Unicode
> throws at us.
>
> I see these as editorial issues -- topics on which the I-D can be changed to say a bit less and be a good deal less
> controversial about the claims it makes and how those claims might constrain us in the future without making any significant substantive changes.  I think many of those changes should still be made if we conclude that IDNA2008 is frozen and it is all up to Unicode in the future, no matter what that brings.   I think they can be done fairly quickly if people pitch in; somewhat less quickly if some very small number of us need to deal with both these editorial issues and smoothing things over if we want to preserve the option of making technical adjustments to deal with defects in IDNA2008 or to better explain some of the issues.
>
> best,
>    john
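
The Balinese musical-notation example in the quoted note can be illustrated with a short Python sketch. It is a deliberately simplified approximation, not the RFC 5892 algorithm: it applies only the Section 2.4 block rule and the general-category test, and skips the exception list, stability checks, contextual rules and everything else. The function name `sketch_derived_property` is hypothetical, the block ranges are copied by hand from Blocks.txt, and the results depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata

# Blocks named in RFC 5892 Section 2.4, with ranges taken from Blocks.txt.
# Balinese musical symbols are NOT here: they sit inside the Balinese
# script block (U+1B00..U+1B7F), which is the gap the note describes.
IGNORABLE_BLOCKS = {
    "Combining Diacritical Marks for Symbols": (0x20D0, 0x20FF),
    "Musical Symbols": (0x1D100, 0x1D1FF),
    "Ancient Greek Musical Notation": (0x1D200, 0x1D24F),
}

# General categories treated as letters/digits/marks in this sketch.
LETTER_DIGIT_CATEGORIES = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def sketch_derived_property(cp: int) -> str:
    """Simplified stand-in for the IDNA2008 derived property (assumption:
    only the block rule and the general-category rule are modelled)."""
    for first, last in IGNORABLE_BLOCKS.values():
        if first <= cp <= last:
            return "DISALLOWED"  # whole block disallowed, marks included
    if unicodedata.category(chr(cp)) in LETTER_DIGIT_CATEGORIES:
        return "PVALID"
    return "DISALLOWED"


if __name__ == "__main__":
    samples = [
        0x1D100,  # Western musical base symbol
        0x1D167,  # Western musical combining mark
        0x1D242,  # Ancient Greek musical combining mark
        0x1B61,   # Balinese musical base symbol
        0x1B6B,   # Balinese musical combining mark
    ]
    for cp in samples:
        ch = chr(cp)
        print(f"U+{cp:04X} {unicodedata.name(ch, '?'):45} "
              f"{unicodedata.category(ch)} {sketch_derived_property(cp)}")
```

Under these assumptions the Western and Greek combining marks come out DISALLOWED because of their blocks, while the Balinese combining mark, with the same General Category Mn, comes out PVALID. That is the inconsistency the note points at.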