Re: [Idna-update] [Ext] FWD: Expiration impending: <draft-klensin-idna-rfc5891bis-01.txt>

"Patrik Fältström " <paf@frobbit.se> Tue, 06 March 2018 10:31 UTC

From: Patrik Fältström <paf@frobbit.se>
To: Suzanne Woolf <suzworldwide@gmail.com>
Cc: Andrew Sullivan <ajs@anvilwalrusden.com>, John C Klensin <john-ietf@jck.com>, idna-update@ietf.org, Kim Davies <kim.davies@icann.org>
Date: Tue, 06 Mar 2018 11:30:58 +0100
X-Mailer: MailMate (2.0BETAr6104)
Message-ID: <692038A9-6BDE-43FF-BFAE-03C9A320731D@frobbit.se>
In-Reply-To: <7BE50D38-969D-422A-AF0F-C58B442472FE@gmail.com>
References: <0AAE384126E73857E6EEC32C@PSB> <20180305191527.GA99731@KIDA-6861.local> <822FD6FA-4FA5-449D-9491-01315DB57A9E@frobbit.se> <161f7c23760.2772.55b9c0b96417b0a70c4dcaded0d2e1c6@anvilwalrusden.com> <9A04CF8C-DF86-4562-8AC0-21EF0FF539FF@frobbit.se> <7BE50D38-969D-422A-AF0F-C58B442472FE@gmail.com>
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="=_MailMate_8451FF48-B164-4707-89E0-B222F52DFD17_="; micalg="pgp-sha1"; protocol="application/pgp-signature"
Archived-At: <https://mailarchive.ietf.org/arch/msg/idna-update/CaskuNvzCv1xBJq7jPAes-AMVvU>
Subject: Re: [Idna-update] [Ext] FWD: Expiration impending: <draft-klensin-idna-rfc5891bis-01.txt>

Suzanne, others,

In general I agree with the comments from John.

Here is the mail I sent (scroll down to see my comments from today):

Forwarded message:

> From: Patrik Fältström <paf@frobbit.se>
> To: Internet Architecture Board <iab@iab.org>, i18n@iab.org, John Klensin <klensin@jck.com>, Pete Resnick <presnick@qti.qualcomm.com>, Andrew Sullivan <ajs@crankycanuck.ca>
> Subject: Fwd: PRELIMINARY  APPEAL -- Re: [i18n] Exper Review - New Release of the Unicode Standard, Version 7.0.0
> Date: Wed, 9 Jul 2014 06:03:57 +0200
>
> As the appointed expert for this IANA registry, I reacted by pulling the hand brake when John sent his email.
>
> I have also clarified with IANA what that implies, i.e. nothing more than that I "revoked" my earlier decision and that they should wait for me to give them clearer instructions. See below.
>
> I have discussed the matter with John myself, and I do understand his point. I have recommended that he propose something concrete in the form of an I-D that explains the issue and the concrete actions (which should be to add a contextual rule, exceptions in the form of disallowing certain code points, or both) so that it can be discussed. Consensus on such an I-D will say what to do.
>
> Unfortunately we are in the quiet period for I-Ds due to the upcoming IETF meeting, so the I-D might take a few weeks (I am unclear on today's rules).
>
> If the IAB wants some other action, do not hesitate to contact me to discuss the issue.
>
> To summarize:
>
> - Unicode has released a new version, 7.0.0, of the Unicode Standard.
>
> - When applying the algorithm in the current IDNA RFCs to the new table, IANA and I reached the same result.
>
> - When inspecting the sets of valid and invalid code points, no backward compatibility problem was found.
>
> - Because of this, the derived property values could be released as calculated.
>
> - BUT, what was found is that (at least) one new code point (U+08A1) has been added that can also be represented by two other code points (U+0628 and U+0654), yet no normalization rule has been added for them. Equivalence rules exist for other such triples, for example U+0623, U+0627 and U+0654 (below from Unicode 7.0.0):
>
> 0621;ARABIC LETTER HAMZA;Lo;0;AL;;;;;N;ARABIC LETTER HAMZAH;;;;
> 0623;ARABIC LETTER ALEF WITH HAMZA ABOVE;Lo;0;AL;0627 0654;;;;N;ARABIC LETTER HAMZAH ON ALEF;;;;
> 0627;ARABIC LETTER ALEF;Lo;0;AL;;;;;N;;;;;
> 0654;ARABIC HAMZA ABOVE;Mn;230;NSM;;;;;N;;;;;
> 08A1;ARABIC LETTER BEH WITH HAMZA ABOVE;Lo;0;AL;;;;;N;;;;;
>
> - The Unicode Consortium refers to the explanation of Hamza in the "Combining Hamza Above" subsection, pp. 263-264 in <http://www.unicode.org/versions/Unicode6.2.0/ch08.pdf> (version 7.0.0 of this has not been released yet), and in particular Table 8-11. They say that U+08A1 will be added to that table explicitly in the Unicode 7.0 version of the core specification.
>
> - The question for the IETF is whether this addition of U+08A1 should result in:
>
> 1. No action
>
> 2. Addition of U+08A1 to the list of exceptions with derived property value DISALLOWED
>
> 3. A contextual rule implementing the so far non-existent table that explains the use of Hamza, which is in Table 8-11 in Unicode 6.2.0.
>
> I am waiting for a consensus call on this, and I only see that being possible if John Klensin proposes an I-D that explains the situation and suggests one outcome (possibly 2). I have requested to be a co-author of this I-D to ensure conformity with the set of RFCs that describe the algorithm for the derived property value.
>
> - In the meantime I have asked IANA to wait for further instructions from me. If the IAB believes some IANA action should happen already now, please let me know.
>
>    Patrik
>
> Begin forwarded message:
>
>> From: Patrik Fältström <paf@netnod.se>
>> Subject: Re: PRELIMINARY APPEAL -- Re: [i18n] Exper Review - New Release of the Unicode Standard, Version 7.0.0
>> Date: 8 juli 2014 22:54:33 CEST
>> To: Pearl Liang <pearl.liang@icann.org>
>> Cc: Michelle Cotton <michelle.cotton@icann.org>
>>
>> You only have to send it to me. What I do is inform the IAB and the i18n list. In this case they pulled the hand brake. Not your problem. They pulled, and I informed you to stop the process (I revoked my earlier decision). You should just wait for me (again).
>>
>> We might get an I-D with a proposal on what to do, but because of the upcoming IETF meeting we are in a silent period for new drafts. :-(
>>
>>   Patrik
>>
>> On 8 jul 2014, at 22:35, Pearl Liang <pearl.liang@icann.org> wrote:
>>
>>> Further, I wonder if this "additional" review will become part of the
>>> "expert review" from now on before we can post a new unicode version.  As
>>> now, I only need to send the request to Patrik for approval.
>>>
>>> Thanks,
>>> ~pl
>>>
>>>
>>> On 7/8/14 12:24 PM, "Michelle Cotton" <michelle.cotton@icann.org> wrote:
>>>
>>>> Patrik,
>>>> What do we need to do here?
>>>> We will wait to make any changes until this is sorted out.  Is that
>>>> correct? Let us know.
>>>> Thanks, Michelle
>>>>
>>>> Sent from my iPhone
>>>>
>>>>> On Jul 8, 2014, at 9:22 AM, "Pearl Liang" <pearl.liang@icann.org> wrote:
>>>>>
>>>>> Michelle,
>>>>>
>>>>> Cc: Patrik
>>>>>
>>>>> FYI, John escalated this.
>>>>>
>>>>> So, our main table is still with version 6.3.0.
>>>>>
>>>>> Thanks,
>>>>> ~pl
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> On 7/7/14 3:22 PM, "John C Klensin" <klensin@jck.com> wrote:
>>>>>>
>>>>>> Attention AppsADs and IANA: Please treat this as a preliminary
>>>>>> appeal of Patrik's decision to allow the IANA tables to be
>>>>>> updated.  I will formalize that with a note to Jari if necessary
>>>>>> after we have the discussion suggested below.
>>>>>>
>>>>>> --On Monday, July 07, 2014 20:54 +0200 Patrik Fältström
>>>>>> <paf@netnod.se> wrote:
>>>>>>
>>>>>>> From Ken:
>>>>>>>> Patrik,
>>>>>>>>
>>>>>>>> O.k. The hamza above is an explicit edge case, where a
>>>>>>>> determination needs to be made in each instance.
>>>>>>>>
>>>>>>>> See the "Combining Hamza Above" subsection, pp. 263-264 in:
>>>>>>>>
>>>>>>>> http://www.unicode.org/versions/Unicode6.2.0/ch08.pdf
>>>>>>>>
>>>>>>>> and in particular, Table 8-11. U+08A1 will be added to that
>>>>>>>> table explicitly in the Unicode 7.0 version of the core
>>>>>>>> specification.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> --Ken
>>>>>>>
>>>>>>> I read Ken's response as if we already have such situations
>>>>>>> for Hamza.
>>>>>>>
>>>>>>> If an exception is to be added, that requires RFC action.
>>>>>>
>>>>>> Ok.  In the absence of other information, I think Ken is
>>>>>> referring to something called the Normalization Stability Rule.
>>>>>> To review earlier notes that none of this contradicts, part of
>>>>>> that rule, as I understand it, requires that, if a new
>>>>>> precomposed abstract character is added that previously could
>>>>>> have been assembled with a base character and a combining one,
>>>>>> NFC of the new character is required to decompose back to the prior
>>>>>> sequence.  The other part of that rule is the one I think Mark
>>>>>> refers to: if the combining sequence normalized to itself in
>>>>>> version N, it cannot normalize to anything else (and, in
>>>>>> particular, the new precomposed character) in version N+1.
>>>>>>
>>>>>> Now, let's call that second part "normalization stability",
>>>>>> i.e., once a code point or sequence exists and has a
>>>>>> normalization the latter doesn't change.  That is elementary and
>>>>>> not at issue here.  And, of course, unassigned code points don't
>>>>>> have normalizations (or much of any other properties) -- that
>>>>>> was precisely the reason why IDNA2008 couldn't allow them to be
>>>>>> processed or looked up.
>>>>>>
>>>>>> The first part, however, is about "keeping normalization working
>>>>>> consistently".  The main purpose of normalization is to allow
>>>>>> different ways to represent "the same string" to compare equal
>>>>>> (and collate in similar ways, etc., but that is secondary).  If
>>>>>> a precomposed character and a combining sequence represent the
>>>>>> same abstract character and they don't both normalize to the
>>>>>> same thing (same thing under NFC, same thing under NFD, even
>>>>>> though those two sets of results may be different), then
>>>>>> normalization does not work consistently for its intended
>>>>>> purpose, i.e., fails in that case.
>>>>>>
>>>>>> And there are a whole series of late-added precomposed
>>>>>> characters in earlier versions of Unicode where precisely that
>>>>>> "decompose the new precomposed character back to the sequence"
>>>>>> rule was included in NFC, so this case is neither new nor
>>>>>> theoretical.
>>>>>>
>>>>>> Now, I don't know what gets Ken to "edge case".  No special
>>>>>> treatment would be needed if the [new] precomposed character
>>>>>> always looked different from the combining sequence.  The choice
>>>>>> of name for the precomposed character would be stupid, but that
>>>>>> would be the _only_ problem.  But I've checked with a couple of
>>>>>> users of Arabic script and they assure me that, as far as they
>>>>>> know/ can tell, the two forms will normally yield exactly the
>>>>>> same printed results.  We'd have an edge case if, for example,
>>>>>> the considerable rendering differences between Arabic Script as
>>>>>> used in Arabic language and Arabic Script as used in
>>>>>> Perso-Arabic languages produced a rendering difference.  Again,
>>>>>> I don't even pretend to be an Arabic expert, but that would
>>>>>> constitute an edge case of the "sometimes" variety and, to me,
>>>>>> it would be lots safer to apply the "decompose the new
>>>>>> precomposed character" rule than not.  More on that below.
>>>>>>
>>>>>>
>>>>>> --On Monday, July 07, 2014 21:21 +0200 Patrik Fältström
>>>>>> <paf@netnod.se> wrote:
>>>>>>
>>>>>>>> Well, yes, but it also reads as though U+0654 should have
>>>>>>>> required a contextual rule or something.  At least, it does
>>>>>>>> to me.
>>>>>>
>>>>>> Again, I am not an Arabic script expert and won't even pretend,
>>>>>> but this seems to me to have nothing to do with contextual
>>>>>> rules.  We use those rules to restrict the use of characters to
>>>>>> some specific script context when they might plausibly be
>>>>>> applied elsewhere or look like something used elsewhere.  I
>>>>>> suppose we could use a contextual rule to prevent the use of a
>>>>>> combining Hamza with Latin or Han script, but we tend to assume
>>>>>> that things like that would look so stupid and out of place that
>>>>>> no such precautions are needed.  In either event, it wouldn't
>>>>>> help with this case which is, as explained above, entirely about
>>>>>> normalization stability and predictability.
>>>>>>
>>>>>> There is a possible exception under which we could use a
>>>>>> combining rule, but I think it would be bad news and a horrible
>>>>>> kludge (see [1] below).
>>>>>>
>>>>>>> Yes, that was my point. What has happened in 7.0.0 is not a
>>>>>>> new situation. It might be that we missed the Hamza when
>>>>>>> discussing the contextual rule(s) and if that is the case, a
>>>>>>> contextual rule is what we should add (should have added). Not
>>>>>>> exception rule.
>>>>>>
>>>>>> It is not a new situation and it is not, as discussed above, a
>>>>>> contextual rule one either.  The only thing that is different is
>>>>>> that UTC has decided to not apply their existing/historical
>>>>>> "make normalization keep working right rule".  If we do nothing
>>>>>> and unless _every_ valid rendering of the new precomposed
>>>>>> character will look significantly different from _every_ valid
>>>>>> rendering of the combining sequence, their failure to apply that
>>>>>> rule means that we could have two perfectly valid and
>>>>>> identical-looking DNS labels, in the same script, etc., that do
>>>>>> not compare equal.  IDNA was designed to avoid that case and
>>>>>> categories F (Section 2.6 of RFC 5892) and G (Section 2.7) are
>>>>>> there precisely to deal with the "Unicode messed up and we need
>>>>>> to apply a different rule than their properties alone would give
>>>>>> us" cases.  In this case, the only rules that don't cause other
>>>>>> problems would be to either ban the use of the newly-added
>>>>>> precomposed character(s) (I can't remember from April whether
>>>>>> there is more than one) entirely or to
>>>>>> ban the combining sequences.  The disadvantage of the first is
>>>>>> that it will cause some inconvenience going forward; of the
>>>>>> second that it might make a now-valid label invalid, something
>>>>>> that we promised to not do again.
>>>>>>
>>>>>> There is a small procedural problem, which is that Category G,
>>>>>> where I think this really belongs, says "changes in property
>>>>>> values" rather than "changes in property values or normalization
>>>>>> screwups".  To make that a little more complicated, should we
>>>>>> want to pick the second option, I don't believe we have
>>>>>> anywhere that we ban a particular combining sequence without
>>>>>> completely banning the use of one or more of the characters that
>>>>>> make it up [1].  But nothing would prevent our writing a new
>>>>>> document that updates 5892 by adding this (these) new
>>>>>> precomposed characters to the Category F "Disallowed" list,
>>>>>> using the discussion above as a basis.
>>>>>>
>>>>>> If that is what we want to do, I'm willing to write the I-D.  If
>>>>>> not, I see no way to forestall this action other than to
>>>>>> generate a formal appeal, one that would necessarily call for
>>>>>> changes in the review process for these revisions and tables.
>>>>>>
>>>>>> regretfully, but I think the precedent here is really important.
>>>>>>
>>>>>> john
>>>>>>
>>>>>>
>>>>>>
>>
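
The normalization asymmetry described in the forwarded mail can be checked with a short script. This is only an illustrative sketch, assuming Python 3 and its standard unicodedata module; the constant and function names are made up for this example and come from neither Unicode nor the IDNA RFCs.

```python
import unicodedata

# Code points from the forwarded mail (constant names are illustrative only).
ALEF_WITH_HAMZA_ABOVE = "\u0623"  # precomposed; decomposes to 0627 0654
ALEF_SEQUENCE = "\u0627\u0654"    # ALEF + combining HAMZA ABOVE
BEH_WITH_HAMZA_ABOVE = "\u08A1"   # precomposed; added in Unicode 7.0.0
BEH_SEQUENCE = "\u0628\u0654"     # BEH + combining HAMZA ABOVE


def same_under(form: str, a: str, b: str) -> bool:
    """Return True if a and b are identical after normalizing both to `form`."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


def has_decomposition(ch: str) -> bool:
    """Return True if the character database lists a decomposition mapping for ch."""
    return unicodedata.decomposition(ch) != ""


if __name__ == "__main__":
    for form in ("NFC", "NFD"):
        print(form, "alef pair equal:",
              same_under(form, ALEF_WITH_HAMZA_ABOVE, ALEF_SEQUENCE))
        print(form, "beh pair equal: ",
              same_under(form, BEH_WITH_HAMZA_ABOVE, BEH_SEQUENCE))
```

The alef pair compares equal under both NFC and NFD, while the beh pair compares equal under neither, so two labels that differ only in that respect would not match after normalization.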

So, where are we and what to do?

I think:

1. A decision "JUST" must be made regarding 7.0.0 according to my email above. Of course updated to the situation today (as John wrote) and to the findings by others (as Asmus said), although those are just "different positions" and one of the reasons I see that "it is hard to find consensus". And THAT is ultimately what the IAB must make a very, very, very careful decision about.

2. The ICANN Board finally made a decision to be strict regarding IDNA2008, but the question then becomes "how to ensure it is implemented"? The ICANN Board has now reached out to the IAB (and a few others, SSAC included), and I saw that the IAB asked for people a few days ago. I will present the issue with ccTLDs (and emojis) that John explained, where the hidden question ultimately is who controls a ccTLD.

I think that, if the IAB *DOES* believe in IDNA2008 (at least until we have something better), they:

A. MUST make a recommendation/decision regarding how to move forward with IDNA2008/Unicode

B. MUST take the flag and advocate strict implementation of IDNA2008 with other players (W3C, ICANN, etc.)

And if the IAB does not want to take this decision "just like that" because of the risks John lays out between the lines (no new Kobe), then I think the IAB should discuss carefully, now with the help of ICANN, "how to move forward". Maybe the group the ICANN Board asked for might help? I do not know.

But the situation is pretty serious, for many reasons. And I must say (positively) that the last couple of messages in this thread lay out the issue in a relatively good way.

   Patrik

On 5 Mar 2018, at 22:23, Suzanne Woolf wrote:

> Patrik,
>
> That was before my time on the IAB, sorry.
>
> But as it happens, I’m looking at an update to the IAB statement, so have been pondering this very question. Do you think that <https://www.ietf.org/id/draft-klensin-idna-5892upd-unicode70-05.txt> covers the options reasonably well?
>
>
> Thanks,
> Suzanne
> ("member of, not speaking for, the IAB")
>
>> On Mar 5, 2018, at 3:22 PM, Patrik Fältström <paf@frobbit.se> wrote:
>>
>> I sent the question, with choices, to the IAB on July 9, 2014.
>>
>> Message-Id: <93768164-79F7-443D-8A7F-0411661B6EF9@frobbit.se>
>>
>> Since then the question has been stalled waiting for consensus on how to move forward.
>>
>>   Patrik
>>
>> On 5 Mar 2018, at 21:02, Andrew Sullivan wrote:
>>
>>> Does the IAB know that, and have an opinion?
>>>
>>> A
>>>
>>> -- 
>>> Please excuse my clumbsy thums
>>>
>>>
>>>
>>> ----------
>>> On March 5, 2018 14:38:48 Patrik Fältström <paf@frobbit.se> wrote:
>>>
>>>> On 5 Mar 2018, at 20:15, Kim Davies wrote:
>>>>
>>>>> Quoting John C Klensin on Monday March 05, 2018:
>>>>>
>>>>>> Given that there has been no discussion of this draft in the
>>>>>> last few months, that there seems to be no interest in it in the
>>>>>> IESG (although we have not pressed hard on it specifically), and
>>>>>> that the possibly-complementary
>>>>>> draft-freytag-troublesome-characters draft has already expired,
>>>>>> I'm just going to let this expire unless someone gives me a good
>>>>>> reason why not in the next few minutes (if a revision is not
>>>>>> posted today, the draft will expire because, without special
>>>>>> permission, nothing can be posted between tomorrow and the start
>>>>>> of IETF).
>>>>>>
>>>>>> I hope no one interprets this as complete lack of interest in
>>>>>> pursuing IDN issues, or even clarification of the IDNA
>>>>>> standards, in the IETF.
>>>>>
>>>>>
>>>>> With the expiry of these efforts, I am concerned more generally that the standing IAB guidance in
>>>>> https://www.iab.org/documents/correspondence-reports-documents/2015-2/iab-statement-on-identifiers-and-unicode-7-0-0/
>>>>> has kept the pre-computed IDNA tables published at IANA downgraded to Unicode 6.3.0 for a number of years. It seems appropriate to revisit publishing tables against contemporary Unicode editions if there is no active work on publishing clarifying RFCs in this regard.
>>>>
>>>> Speaking as the IAB-appointed expert who approves new tables, I simply cannot unless there is some guidance and agreement in the IETF on how to move forward.
>>>>
>>>> The small number of people involved who spend time on these issues makes it hard to draw conclusions on "the right path forward". This is given the situation we are in, as described by John.
>>>>
>>>>   Patrik
>>>>
>>>>
>>>>
>>>> ----------
>>>> _______________________________________________
>>>> IDNA-UPDATE mailing list
>>>> IDNA-UPDATE@ietf.org
>>>> https://www.ietf.org/mailman/listinfo/idna-update
>>>>