Re: [I18nrp] draft-faltstrom-unicode11-04.txt

John C Klensin <john-ietf@jck.com> Wed, 05 December 2018 22:54 UTC

Date: Wed, 05 Dec 2018 17:54:28 -0500
From: John C Klensin <john-ietf@jck.com>
To: Nico Williams <nico@cryptonector.com>, Patrik Fältström <paf=40frobbit.se@dmarc.ietf.org>
cc: i18nrp@ietf.org, iab@iab.org, idna-update@ietf.org
Message-ID: <62C0C9D453720E19EF649173@PSB>
In-Reply-To: <20181011211736.GA2486@localhost>
References: <A23EC543-5DEB-4044-96B7-C983A1BF1E23@frobbit.se> <20181011211736.GA2486@localhost>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18nrp/OlZbdqpSd4EM9SUibCx-6QKXIMw>
Subject: Re: [I18nrp] draft-faltstrom-unicode11-04.txt


--On Thursday, October 11, 2018 16:17 -0500 Nico Williams
<nico@cryptonector.com> wrote:

> On Sun, Oct 07, 2018 at 07:53:51AM +0200, Patrik Fältström
> wrote:
>...
>> In short the draft proposes (just like RFC 6452) IETF
>> continue to follow Unicode Standard without adding exceptions
>> to the IDNA2008 algorithm, EVEN IF changes are made that are
>> incompatible to earlier versions of Unicode.
> 
> +1
> 
> We're talking specifically about future incompatible changes in
> either case mappings or normalization.  Correct?

First of all, if you go back and check the IDNA2008 documents
(not just 5892), you will find there is no such limitation.
Let me invent an example while hoping we can avoid a lengthy
argument about how likely it is.  The Unicode Stability rules
require that, once a code point is assigned and properties
assigned to it, some of those properties (not limited to, e.g.,
normalization) are just not ever going to be changed.   Should
they discover they have made a really serious mistake (unless
one believes that UTC is infallible, that is always possible,
however unlikely), the only option would be to assign a new code
point and properties to the abstract character and deprecate the
old code point.  Now, largely because we understood that such a
situation is unlikely, IDNA2008 contains no provision for
dealing with that sort of situation, so we would need to detect
it in the review process and, presumably, do so even though the
algorithm, which knows nothing about Unicode code point
deprecation, would not detect the situation.   And, assuming we
didn't want to allow both code points (the ultimate in
confusable characters), we would need to decide which one to
treat as an exception and update the exception list in 5892
accordingly.

So, no, not just those two sets of cases.

> And to confirm the IDNA2008 / UTS#46 context:

>  - IDNA2008 is hear-no-evil-see-no-evil, says to do no case
>    mappings, relies on users entering domainnames in the
>    correct case whatever that is

I think that characterization is incorrect or at least
misleading.  For example, IDNA2008 doesn't only say "do no case
mappings", it explicitly DISALLOWS upper case characters, etc.  
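
To illustrate why upper case characters end up DISALLOWED, here is a
minimal sketch of the "Unstable" test from RFC 5892, which flags any
code point that changes under NFKC(CaseFold(NFKC(cp))).  The function
name is hypothetical, and Python's str.casefold() is assumed to be a
close enough stand-in for the Unicode toCaseFold operation; the full
IDNA2008 derivation also applies exceptions and other rules that are
not shown here.

```python
import unicodedata


def is_unstable(ch: str) -> bool:
    """Return True if ch changes under NFKC(casefold(NFKC(ch))).

    Sketch of the RFC 5892 "Unstable" category only; it ignores the
    exception list and every other rule in the derivation.
    """
    nfkc = unicodedata.normalize("NFKC", ch)
    folded = unicodedata.normalize("NFKC", nfkc.casefold())
    return folded != ch


if __name__ == "__main__":
    for ch in ("A", "a", "\u00c9", "\u00e9"):
        print(f"U+{ord(ch):04X} unstable={is_unstable(ch)}")
```

Under this test an upper case letter is unstable because case folding
changes it, while its lower case counterpart passes through unchanged.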

>  - UTS#46 says to lower-case first, then apply the rest of
> IDNA2008
> 
> Correct?

No, UTS#46 also does a number of other things.  As two examples
that have stirred up controversy, it disallows a few characters
that IDNA2008 treats as PVALID (I might advise a registry to
permit those characters only with great care, if at all, but
that is different from disallowing them -- see comments about
registry restrictions and behavior elsewhere in these threads).
And it allows a number of code points that IDNA2008 DISALLOWS,
primarily some of those in the "So" General Category, including
specifically emoji.
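
As a rough illustration of the "So" point, the sketch below checks a
code point's General Category against the LetterDigits set that
RFC 5892 uses as the main route to PVALID.  The names are
hypothetical, and the check covers only that one rule: a code point
that passes it can still be DISALLOWED by another rule, and the
exception list is not consulted.

```python
import unicodedata

# General Category values in the RFC 5892 LetterDigits rule.
LETTER_DIGITS = frozenset({"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"})


def in_letter_digits(ch: str) -> bool:
    """Return True if ch's General Category is in LetterDigits."""
    return unicodedata.category(ch) in LETTER_DIGITS


if __name__ == "__main__":
    for ch in ("a", "\U0001F600"):  # U+1F600 is GRINNING FACE
        print(f"U+{ord(ch):04X} {unicodedata.category(ch)} "
              f"letter_digits={in_letter_digits(ch)}")
```

An emoji such as U+1F600 has General Category "So", which is not in
that set, so nothing in the IDNA2008 rules makes it PVALID.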

>...
> The choices for IETF when things like this happens are:
>> 
>> 1. Keep IDNA2008 with no exceptions
>> 
>> 2. Keep IDNA2008 with exceptions
>> 
>> 3. Stop referring (directly) to Unicode as it is not stable
>> enough
> 
> (3) has got to be out as it amounts to hard-forking Unicode.
> A fork of Unicode would be momentous and extraordinary, and
> not to be undertaken lightly!  It might be useful to think
> about how we'd go about forking Unicode process-wise, but I
> think any analysis will convince us that we should not
> actually do it.

I am not quite sure what Patrik had in mind with (3) but I think
there are options that would make "hard-forking Unicode"
hyperbolic at best.  In particular, while I don't believe IETF
has the resources to competently do so (an issue tied up with
the "directorate" and "inability to process documents" topics),
one could conceive of an IETF-specific normalization technique
and associated tables that could be referenced in lieu of the
Unicode ones.  That has been discussed several times.  It may
well be a bad idea even independent of the resource issue, but
it would stop well short of a hard fork of Unicode, if only
because we would not be assigning new code points.

> We should, however, use our liason to communicate to the UC our
> displeasure as to backwards-incompatible changes, and that
> we'd like the UC to at least consider adding a stability
> attribute to every codepoint. If there were a stability
> attribute, then we could exclude codepoints whose stability
> the UC will not guarantee.

Yes.  Sure.  I hope I can ask whether you would like your
supplemental pony delivered by a flock of flying pigs without
getting myself in trouble.

> Future codepoint assignments could all begin life as unstable
> and gain stability with experience.

Curiously (and one reason for the above snarky comment), this
would take us back to a very early proposal for what became
IDNA2008.  In that proposal, rather than dividing code points
into "PVALID" and "DISALLOWED" (and a few special-case
categories), the main division would include "PROBABLY-YES" and
"PROBABLY-NO" categories that would be assigned to anything that
might be expected to change or that was uncertain for other
reasons (such as being from a script for which there were no
living active users) and impose additional restrictions on their
use.  That idea was pulled out of the spec largely at the urging
of people from the Unicode Consortium (who claimed it would be
confusing and was completely unnecessary).
 
> We could, perhaps, implement codepoint stability attributes at
> the IETF alone if the UC balks.  I would be OK with that
> provided it was simple and simple to automate -- I don't think
> we want to be in the business of inspecting every new
> codepoint assignment and guessing a stability level for it, so
> nothing much more complicated than time-since-assignment
> should be used.

See above.  Been there.  Of course we could revisit it, but that
would, at this point, be a very significant change in
IDNA2008.... and it would add another layer of difficulty to the
issue of getting documents reviewed.  I don't know whether it
would be more or less problematic than an IETF-specific
normalization technique or normalization profile.

>> Probably more choices than these...
>> 
>> My proposal is [1], together with a more forceful push to
>> strict IDNA2008 adoption. No IDNA2003, no UTS#46, no homebrew
>> mixes.
> 
> I would much prefer that we adopt UTS#46.

I await your I-D with great interest, including how you expect
to handle emoji string-matching and making sure that the rules
about which combining and modifier sequences are allowed are
clear to registrants and zone administrators.   Or perhaps you
are proposing a profile of UTS#46, but I'd assume that would
also raise either forking problems or a review procedure at
least as complicated as that now called for in IDNA2008.

>...
>> This very fast turn into a process issue where IAB have asked
>> for progress, and IETF is to deliver.
>> 
>> And this is where I take a step back, and want IESG and IAB
>> to make up their mind. This group was created for I18N
>> issues, we have a charter, but then what? Is this where
>> drafts like these should be discussed?
> 
> I believe we need IETF consensus on these matters, but the IAB
> must be involved.  Any notion of forking Unicode should
> require buy-in from the IAB.

Out of curiosity, why?  There is nothing in the selection
process for IAB members that guarantees a supply of expertise in
these matters, even sufficient expertise to evaluate the
external consequences of such a decision.  Even if the current
IAB had that expertise, there is nothing in the Nomcom process
that guarantees it for the future.  The IAB's shutting down of the
I18N Program came with an assertion (almost certainly true) that
the Program wasn't getting anything done and an implicit
assertion that the IAB didn't know how to fix that except by
creating new structures that would cover i18n issues (if any
progress has been made on that, it has not been discernable
outside the IAB).   So why do you think the IAB "must be
involved" or is even, anymore, relevant?

 john