Re: [Idna-update] Genart telechat review of draft-faltstrom-unicode11-08

"Patrik Fältström " <paf@netnod.se> Tue, 19 March 2019 06:47 UTC

From: Patrik Fältström <paf@netnod.se>
To: John C Klensin <john-ietf@jck.com>
Cc: Dan Romascanu <dromasca@gmail.com>, gen-art@ietf.org, draft-faltstrom-unicode11.all@ietf.org, Alissa Cooper <alissa@cooperw.in>, Barry Leiba <barryleiba@computer.org>, idna-update@ietf.org, ietf@ietf.org
Date: Tue, 19 Mar 2019 07:47:01 +0100
Message-ID: <EA2B2A09-152C-4AF3-B0C8-0D352CCA6647@netnod.se>
In-Reply-To: <458987D953A5B3227D3A791F@PSB>
References: <155289429627.26188.2047331005281292889@ietfa.amsl.com> <458987D953A5B3227D3A791F@PSB>
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="=_MailMate_1EE76ECD-14DD-47D4-AD5D-791ADBD6DB6E_="; micalg="pgp-sha1"; protocol="application/pgp-signature"
Archived-At: <https://mailarchive.ietf.org/arch/msg/idna-update/ZPjHZZUTy2wyboHoGK90FNLl8Gk>
Subject: Re: [Idna-update] Genart telechat review of draft-faltstrom-unicode11-08
Precedence: list

On 18 Mar 2019, at 18:21, John C Klensin wrote:

> (2) In a more perfect world, the review called for by RFC 5892 and reflected in RFC 6452 and this I-D would not be necessary at all.  Unicode would announce a new version, assorted entities would calculate tables for their use with that version, and
> everyone would go merrily on their way.  The IDNA-update WG was painfully aware of the imperfection of the world and, for this particular case, was worried about two possibilities (both of which we hoped would be very, very rare).  One was that, for a new version of Unicode, the Unicode Consortium would make
> changes in the properties or category values they assigned to particular code points in a way that would change the derived properties calculated by the RFC 5892 algorithm for those code points.  This I-D primarily reflects a review designed to detect such changes.  As I read RFC 5892, if such cases are detected, the normal response is to introduce additional backward
> compatibility exceptions into the list in RFC 5892 to preserve stability (Patrik reads that a bit differently than I do).

Correct, and let me try to explain this in slightly different words. And yes, we do read 5892 slightly differently, but I claim the outcome is the same. Let me come back to that.

1. We MUST remember that it is the algorithm in 5892 that is normative, not any table of derived property values. As John wrote, and I agree, having IANA host a table with non-normative data was probably a very bad idea. Maybe that registry should simply be removed. It just confuses people. If it confuses people on the IESG, of course it confuses other people as well. Maybe the description of the table at IANA should be changed? Let me think and discuss with IANA.

2. The whole idea with the algorithm was that the property values the algorithm uses would never ever change, as the Unicode Consortium said those would be stable. At least the property values the IETF did use in the algorithm. So having the algorithm normative, and "just" letting the Unicode Consortium release new versions of Unicode, was EXACTLY one design criterion for IDNA2008 that is vastly different from the "pick code points" approach of IDNA2003.

3. RFC 5892 was based on Unicode 5.2.0, and since then Unicode has released versions 6.0.0, 6.1.0, 6.2.0, 6.3.0, 7.0.0, 8.0.0, 9.0.0, 10.0.0, 11.0.0, 12.0.0 and a pre-release of 12.1.0. Of all of these versions, two include incompatible changes: 6.0.0 and 11.0.0.

4. As John wrote, the simple solution for the IDNA2008 standard to move forward would be to change the algorithm so that it produces the same result when applied to any Unicode version, and indeed we did think about that when doing RFC 5892. See section 2.7 <https://tools.ietf.org/html/rfc5892#section-2.7>. By adding code points and explicit derived property values to rule set [G], we would have backward compatibility.

5. When 6.0.0 was released, I as appointed expert did flag that for the IETF (I do not remember exactly how, but I might be able to dig up those email messages), and after some discussions the conclusion was that rule set [G] should NOT be updated. I think I was convinced that some software libraries had already been changed at that time, and that it was more important to get IDNA2008 out the door. We had the risk of two different incompatibilities:

5.1. If we added a rule to [G], the incompatibility would be between software packages that used the original version of the 5892 algorithm without the added rule and the ones that had the updated rule.

5.2. If we did not add a rule to [G], the incompatibility would be between software packages that calculated the derived property value on Unicode 5.2.0 and the ones that did the calculation on 6.0.0.

Ultimately [5.2] above did win, and that was explained in RFC 6452. And as the appointed expert and editor of 5892, I feel good having the discussion about this incompatible change that *would never happen* recorded in 6452.
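
To make the difference between [5.1] and [5.2] concrete, here is a minimal Python sketch, assuming a much simplified view of the RFC 5892 algorithm. The function names, the toy rule function and the table contents are hypothetical and only for illustration; the one real data point is U+19DA, whose derived property value changed from PVALID to DISALLOWED with Unicode 6.0.0 as recorded in RFC 6452. The sketch only shows that the Exceptions [F] and BackwardCompatible [G] lists are consulted before anything is computed from Unicode properties.

```python
from typing import Callable, Mapping

# Hypothetical tables for illustration only.
EXCEPTIONS: Mapping[int, str] = {}           # rule [F], contents omitted in this sketch
BACKWARD_COMPATIBLE: Mapping[int, str] = {}  # rule [G], left empty as in option 5.2


def derived_property(
    cp: int,
    compute_from_properties: Callable[[int], str],
    exceptions: Mapping[int, str] = EXCEPTIONS,
    backward_compatible: Mapping[int, str] = BACKWARD_COMPATIBLE,
) -> str:
    """Return the derived property value for code point cp.

    Explicit entries in [F] and [G] override whatever the remaining,
    property-based rules would compute for the Unicode version in use.
    """
    if cp in exceptions:
        return exceptions[cp]
    if cp in backward_compatible:
        return backward_compatible[cp]
    return compute_from_properties(cp)


def toy_rules_unicode_6(cp: int) -> str:
    """Stand-in for the property-based rules run against Unicode 6.0.0.

    Assumption: only U+19DA is modelled; everything else is called PVALID.
    """
    return "DISALLOWED" if cp == 0x19DA else "PVALID"


# Option 5.2: [G] stays empty, so the result follows Unicode 6.0.0.
print(derived_property(0x19DA, toy_rules_unicode_6))

# Option 5.1: an entry in [G] pins the value calculated on Unicode 5.2.0.
print(derived_property(0x19DA, toy_rules_unicode_6,
                       backward_compatible={0x19DA: "PVALID"}))
```

With an empty [G] the answer depends on which Unicode version the software calculates on; with the entry added it depends on whether the software has picked up the updated rule. That is the trade-off between [5.2] and [5.1].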

6. The fact that it was an issue was of course also brought up to the Unicode Consortium, as I am (handily enough) also the liaison from the IETF to the Unicode Consortium. No response. They do push, as we know, TR#46, which is a different kind of animal than IDNA2008. Oh well. Things were fine, and we moved on. And the IETF could focus on what the IETF really should focus on, to ensure the algorithm itself was still "ok". I.e. without looking at individual code points, but more in general terms, was IDNA2008 still good enough? Lots of good feedback came from people like Asmus, who was fighting like mad (and still is) trying to a. convince registries they MUST come up with a subset of the IDNA2008 permissible code points when they decide what can be used in their zones and b. come up with processes and rules for how that work should be done. Specifically, work did go on for a long time regarding what code points can be used in the root zone, i.e. for TLDs. This is managed in the IAB document, and the Klensin draft about and more.

7. Now in version 11.0.0, the non-backward-compatible change happened again. Sigh. At that time I was on my way to just giving up and telling the IETF that what we do today, referencing something we thought was normative, stable etc. (i.e. the Unicode Standard), did not work. That the IETF should instead formalize some relationship with ISO and reference ISO instead of the Unicode Consortium, and that ISO should then have in its rules that backward compatibility is a requirement. If Unicode then shipped something to ISO for "approval", ISO would either say "no, try again" or commit a process violation that members could appeal within ISO. But after sleeping on this for a bit, I felt that since the issue with 11.0.0 had also happened with 6.0.0, one should maybe first have a discussion within the IETF again before the bolts are blown regarding the relationship with the Unicode Consortium. That just started when the I18N directorate was closed, and I ended up in a void. After a long time, pushing and what not (let's discuss in the bar), this group was finally created where this VERY SERIOUS ISSUE could be discussed, and as John explained, there are many things to discuss, which we can boil down to two:

A. Should the non-backward-compatible change in 11.0.0 result in a change in rule [G] in RFC 5892, or should we accept a non-backward-compatible change? To trigger the discussion, I proposed the same result as for 6.0.0: to NOT update [G], and simply accept it. This also because the next issue is the important one.

B. What should we do with IDNA2008? Obviously the Unicode Standard is not stable enough. Or is it? What should we do with review? Do we have to start doing what Martin just did with 12.0.0? Do we in the IETF have the expertise? Can we rely on individuals like Martin, Asmus, myself and John and very few more to be around? Can we rely on the Unicode Consortium? Is this the time to instead move to ISO? Like keeping the IDNA2008 algorithm BUT tying it to an ISO-approved charset, and then asking ISO to protect the backward compatibility? Or are we done, so that we simply freeze IDN at a specific Unicode version and ignore all code points added after that version? I.e. at 5.2.0 we knew some interesting code points were still to be added, but after 11.0.0?
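
As a rough illustration of the mechanical part of the review asked about in B, the following Python sketch compares two tables of derived property values, one per Unicode version, and reports only the incompatible differences. It assumes the tables have already been calculated with the RFC 5892 algorithm. The function name and the sample tables are hypothetical; the U+19DA entry follows RFC 6452, the others are placeholders. A code point that was UNASSIGNED in the old version and gets a value in the new one is the expected case and is not reported.

```python
from typing import Dict, List, Tuple

Change = Tuple[int, str, str]  # (code point, old value, new value)


def incompatible_changes(old: Dict[int, str], new: Dict[int, str]) -> List[Change]:
    """List code points whose derived property value changed between versions.

    Code points that were UNASSIGNED in the old table are skipped, because
    newly assigned code points are what a new Unicode version is for.
    A code point missing from the new table is treated as UNASSIGNED.
    """
    changes: List[Change] = []
    for cp, old_value in sorted(old.items()):
        if old_value == "UNASSIGNED":
            continue
        new_value = new.get(cp, "UNASSIGNED")
        if new_value != old_value:
            changes.append((cp, old_value, new_value))
    return changes


# Tiny illustrative tables, not real registry data.
table_5_2 = {0x0061: "PVALID", 0x19DA: "PVALID", 0x1F600: "UNASSIGNED"}
table_6_0 = {0x0061: "PVALID", 0x19DA: "DISALLOWED", 0x1F600: "DISALLOWED"}

for cp, before, after in incompatible_changes(table_5_2, table_6_0):
    print(f"U+{cp:04X}: {before} -> {after}")
```

The diff itself is the easy part. Deciding what each reported change means for IDNA2008, which is the expertise question above, is not something a script like this answers.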

What I now know is that the IESG has told me that we do have agreement in the IETF on draft-faltstrom-unicode11-08.txt, which implies that moving IDNA2008 to Unicode 11.0.0 without adding things to [G] is the path forward for now.

What to do with 12.0.0 and future versions is still up in the air.

See (some of) you in Prague.

See you in the bar...

   Patrik

Attachment: signature.asc
