Re: [Idna-update] Genart telechat review of draft-faltstrom-unicode11-08

Asmus Freytag <asmusf@ix.netcom.com> Tue, 19 March 2019 07:55 UTC

To: idna-update@ietf.org
References: <155289429627.26188.2047331005281292889@ietfa.amsl.com> <458987D953A5B3227D3A791F@PSB> <EA2B2A09-152C-4AF3-B0C8-0D352CCA6647@netnod.se>
From: Asmus Freytag <asmusf@ix.netcom.com>
Message-ID: <6b149a8d-9102-ea1a-5048-b83842fc66c0@ix.netcom.com>
Date: Tue, 19 Mar 2019 00:55:52 -0700
In-Reply-To: <EA2B2A09-152C-4AF3-B0C8-0D352CCA6647@netnod.se>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idna-update/KhXPF8GJi50bB_RdXGGEDsHJg5I>
Subject: Re: [Idna-update] Genart telechat review of draft-faltstrom-unicode11-08

On 3/18/2019 11:47 PM, Patrik Fältström wrote:
> 6. The fact it was an issue was of course also brought up to the Unicode Consortium as I am (handily enough) also the liaison from IETF to Unicode Consortium. No response. They do push, as we know, TR#46 which is a different kind of animal than IDNA2008. Oh well. Things were fine, and we moved on. And IETF could focus on what IETF really should focus on, to ensure the algorithm itself was still "ok". I.e. without looking at individual code points, but more in general terms, was IDNA2008 still good enough? Lots of good feedback from people like Asmus, who was fighting like mad (and still is) trying to a. convince registries they MUST come up with a subset of IDNA2008 permissible code points when they decide what can be used in their zones and b. come up with processes and rules for how that work should be done. Specifically work did go on for a long time regarding what code points can be used in the root zone, i.e. for TLDs. This is managed in the IAB document, and the Klensin draft about and more.
>
> 6. Now in version 11.0.0, the non-backward-compatible change happened again. Sigh.

I think it's worth injecting some pragmatism here.

We know that only about 35K code points out of 130K total are actually
used in someone's orthography, where that someone is a speaker of a
language used for daily activities and with good intergenerational
transmission (= not likely to die out soon).

All the other 100K code points are either symbols or punctuation or, 
more often, code points that are recognized only by scholars.

Very little activity in Unicode affects that first set of 35K code
points. Future changes are possible as a result of orthography reforms,
or of some population acquiring widespread literacy for the first
time, but other than that, there's not much going on that needs new code
points. Occasionally the properties of some of the lesser-used code
points in this set turn out to require corrections, as software
implementations for them become more mainstream and thus discover
issues. Generally, these affect languages where digital deployment
itself is still nascent, and only a small subset of such corrections
could affect a code point's IDNA2008 derived property.

The upshot is that the practical effect of such activity in the margins 
is effectively invisible.

The overwhelming majority of the additions and other changes fall within
the remaining set of 100K specialized characters. Where these code
points appear in registrations at all, they are either misused for
malicious purposes, relying on users' unfamiliarity with rare code
points, or of use only to specialized audiences.

The practical effect of any incompatibilities in that set is, again,
effectively minuscule, even given the realistic expectation that the
property data for these won't get rigorously tested in practical
applications and could therefore be found incorrect at some future point.


>   I was at that time on my way to just give up, and recommend to the IETF that what we do today, referencing something that we thought was normative, stable etc. (i.e. the Unicode Standard), did not work. That IETF instead would pick up and use some relationship with ISO that should be formalized, and instead of referencing the Unicode Consortium IETF should reference ISO, and then ISO should have in its rules that backward compatibility would be a requirement.


This is a complete nonstarter.

Nobody wins if we encourage the existence of two different encoding 
standards - and the current process of cooperation works precisely 
because it doesn't work the way it's sketched below.

> If then Unicode did ship something to ISO for "approval", then ISO would say "no, try again" or else ISO would make a process violation that members could appeal within ISO. But after sleeping on this for a bit, I felt the issue with 11.0.0 also happened with 7.0.0 and maybe one should first have a discussion within IETF again before the bolts are blown regarding relationship with Unicode Consortium. That just started when the I18N directorate was closed, and I ended up in a void. After a long time, pushing and what not (let's discuss in the bar) this group finally was created where this VERY SERIOUS ISSUE could be discussed, and as John explained, there are many things to discuss, where we can boil it down to two:
>
> A. Should the non-backward compatible change in 11.0.0 result in a change in rule [G] in RFC 5892, or should we accept a non-backward compatible change? To trigger the discussion, I proposed the same result as for 7.0.0, to NOT update [G], and simply accept it. This also because the next issue is the important one.
>
> B. What should we do with IDNA2008? Obviously Unicode Standard is not stable enough. Or is it? What should we do with review? Should we start doing what Martin just did with 12.0.0?

What Martin did with Unicode 12.0.0 is what I call a sanity check. I
think it's a reasonable activity (if resources exist), but I would not
expect to find any issues of enough practical importance to actually
matter. (See above).
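A sanity check of this kind boils down to comparing derived property
values across two Unicode versions and flagging any code point that was
already assigned in the older version but whose value changed, since
those are the non-backward-compatible cases. A minimal sketch, assuming
each version's derived property table has already been loaded into a
dict from code point to property string (the loading step is not
shown, and the toy tables below are hypothetical, keyed by arbitrary
integers rather than real Unicode data):

```python
from typing import Dict, List, Tuple

# Code point -> IDNA2008 derived property ("PVALID", "DISALLOWED", ...).
Table = Dict[int, str]


def incompatible_changes(old: Table, new: Table) -> List[Tuple[int, str, str]]:
    """List (code point, old value, new value) for every code point that was
    already assigned in the old version but has a different value in the new one.

    Newly assigned code points (old value UNASSIGNED) are expected to change
    and are skipped; a code point absent from `new` is treated as UNASSIGNED.
    """
    changes = []
    for cp, old_value in sorted(old.items()):
        if old_value == "UNASSIGNED":
            continue
        new_value = new.get(cp, "UNASSIGNED")
        if new_value != old_value:
            changes.append((cp, old_value, new_value))
    return changes


# Toy tables keyed by arbitrary integers (hypothetical, not real Unicode data).
old_table = {1: "PVALID", 2: "DISALLOWED", 3: "UNASSIGNED", 4: "PVALID"}
new_table = {1: "PVALID", 2: "DISALLOWED", 3: "PVALID", 4: "DISALLOWED"}

print(incompatible_changes(old_table, new_table))
# [(4, 'PVALID', 'DISALLOWED')]
```

Entry 3 is a newly assigned code point and is not reported; only
entry 4, which lost its PVALID status, would need a decision.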


>   Do we IETF have the expertise? Can we rely on individuals like Martin, Asmus, myself and John and very few more to be around? Can we rely on Unicode Consortium? Is this the time to instead move to ISO? Like keeping IDNA2008 algorithm BUT tie it to ISO approved charset, and then ask ISO to protect the backward compatibility? Or are we done, so that we simply freeze IDN to a specific Unicode version, and simply ignore all added code points after that version? I.e. at 5.2.0 we knew some interesting code points were still to be added, but after 11.0.0?
>
> What I now know is that IESG have told me that we do have agreement in IETF on draft-faltstrom-unicode11-08.txt which implies moving to IDNA 11.0.0 without adding things to [G] is the path forward for now.
>
> What to do with 12.0.0 and future versions is still up in the air.


I believe the correct answer is to have a very serious discussion on how
to help registries make their support for the widely needed 35K code
points more robust, and not to get sidetracked by the vast wasteland of
obsolescent, obsolete, ancient or otherwise specialized-use code points
(whether for liturgical, phonetic, or poetic usage).
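At its simplest, a registry-defined subset is a repertoire check applied
on top of IDNA2008: a label is accepted only if every code point falls
within the ranges the registry has chosen to support. A minimal sketch,
assuming a hypothetical repertoire of a few Latin ranges and a label
already in lowercase U-label form; real Label Generation Rulesets (for
example in the RFC 7940 XML format) add variant and whole-label rules
that this does not attempt:

```python
import unicodedata

# Hypothetical registry repertoire: inclusive code point ranges, for illustration.
REPERTOIRE = [
    (0x002D, 0x002D),  # HYPHEN-MINUS
    (0x0030, 0x0039),  # DIGIT ZERO..DIGIT NINE
    (0x0061, 0x007A),  # LATIN SMALL LETTER A..LATIN SMALL LETTER Z
    (0x00E0, 0x00F6),  # LATIN SMALL LETTER A WITH GRAVE..O WITH DIAERESIS
]


def in_repertoire(label: str) -> bool:
    """True if the label is non-empty, already in NFC, and every code point
    falls inside one of the registry's permitted ranges."""
    if not label or unicodedata.normalize("NFC", label) != label:
        return False
    return all(
        any(lo <= ord(ch) <= hi for lo, hi in REPERTOIRE) for ch in label
    )


print(in_repertoire("caf\u00e9"))   # True: precomposed e-acute is in range
print(in_repertoire("cafe\u0301"))  # False: not in NFC form
print(in_repertoire("Cafe"))        # False: uppercase C is not in the repertoire
```

Keeping the repertoire to code points in everyday use means that
property changes among rarely used characters never reach the zone.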

Continue to keep a watchful eye over the application of the algorithm
for future versions, but don't expect that any cases will rise to the
threshold where any exceptions need to be baked into the protocol, with
all the attendant issues for updating software libraries.
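Such exceptions would be entries in the BackwardCompatible category [G]
of RFC 5892. RFC 5892 derives a code point's property by testing its
categories in a fixed order, and [G] is consulted right after the
Exceptions category [F], ahead of every rule based on Unicode property
data, so an entry there pins a value regardless of what newer data says.
A minimal sketch of that precedence, assuming the individual category
memberships have already been computed from the Unicode data files (the
`CodePointFacts` record and its field names are hypothetical):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CodePointFacts:
    """Hypothetical record of RFC 5892 category memberships for one code point."""
    exception_value: Optional[str] = None        # listed in Exceptions (F)
    backward_compat_value: Optional[str] = None  # listed in BackwardCompatible (G)
    unassigned: bool = False            # Unassigned (J)
    ldh: bool = False                   # LDH (E)
    join_control: bool = False          # JoinControl (H)
    unstable: bool = False              # Unstable (B)
    ignorable_properties: bool = False  # IgnorableProperties (C)
    ignorable_blocks: bool = False      # IgnorableBlocks (D)
    old_hangul_jamo: bool = False       # OldHangulJamo (I)
    letter_digits: bool = False         # LetterDigits (A)


def derived_property(f: CodePointFacts) -> str:
    """Apply the categories in the order RFC 5892 tests them; first match wins."""
    if f.exception_value is not None:
        return f.exception_value
    if f.backward_compat_value is not None:
        return f.backward_compat_value
    if f.unassigned:
        return "UNASSIGNED"
    if f.ldh:
        return "PVALID"
    if f.join_control:
        return "CONTEXTJ"
    if f.unstable or f.ignorable_properties or f.ignorable_blocks or f.old_hangul_jamo:
        return "DISALLOWED"
    if f.letter_digits:
        return "PVALID"
    return "DISALLOWED"


# A letter whose new property data makes it Unstable loses PVALID...
print(derived_property(CodePointFacts(letter_digits=True, unstable=True)))  # DISALLOWED
# ...unless an entry in [G] pins its previous value.
print(derived_property(CodePointFacts(letter_digits=True, unstable=True,
                                      backward_compat_value="PVALID")))  # PVALID
```

Every [G] entry is one more hard-coded value that each IDNA library has
to carry and keep in sync, which is the update burden referred to above.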

But focus on solving issues that have at least some practical impact.

A./