Re: [I18ndir] Getting restarted and triage

"Asmus Freytag (c)" <> Fri, 14 June 2019 21:13 UTC

To: John C Klensin <>,

On 6/14/2019 1:03 PM, John C Klensin wrote:
> Asmus,
> I think we are in substantial agreement, but a few comments
> below to be sure (and with everything else elided)...
> --On Friday, June 14, 2019 09:29 -0700 Asmus Freytag
> <> wrote:
>> ...
>> In principle, I'd agree. However:
>> ...
>>> (2) The original IAB statement on what was described at the
>>> time as a Hamza problem but which we are now calling a
>>> non-decomposing character one
>> Which description still falls short of the problem as long as
>> you have several pairs of code points with identical (by
>> design) appearance in IDNA 2008 -- and not counting the
>> Latin/Cyrillic pairs.
> First of all, that issue, as described in the original IAB
> statement and in draft-klensin-idna-unicode-7-0-0 (even in
> version -00) is about conflicts within a script and hence does
> not interact with Latin/Cyrillic, Cyrillic/Greek, or Latin/Greek
> overlaps.  While we could have a long and probably amusing
> discussion about whether Greek, Latin, and Cyrillic should have
> been separated while the Arabic language variation of Arabic
> Script and the variations based on or derived from Persian were
> not, that discussion is of theoretical interest only (except
> possibly for those who feel some nostalgia for the original ISO
> DIS 10646 and its greater identifier focus rather than
> printing/display focus).
> Second, no one I know ever claimed it was a complete description
> of the problem or even that draft-klensin-idna-unicode-7-0-0 was
> at the most recent (public) -05.   I was just trying to map the
> landscape out in a very approximate way for this list.

I was just making the point that Unicode's own list of intentionally
identical characters contains several same-script pairs.

They also document a large number of sequences that would render
identically, except that one of each pair is considered "do not use"
by Unicode (and vendors obligingly tend to insert dotted circles into
the display - although that is not mandated).

These are edge cases that normalization does not cover, for various
good reasons, but we might need to work with Unicode to make that data
machine readable.
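To make the normalization gap concrete, here is a minimal Python sketch using only the standard unicodedata module. It contrasts ALEF WITH HAMZA ABOVE (U+0623), which has a canonical decomposition and is therefore unified with ALEF + COMBINING HAMZA ABOVE by NFC, with BEH WITH HAMZA ABOVE (U+08A1, added in Unicode 7.0 and the character behind the original IAB statement), which has no decomposition and so stays distinct from BEH + COMBINING HAMZA ABOVE even though the two are meant to look the same. The variable and function names are made up for illustration.

```python
import unicodedata

# U+0623 ALEF WITH HAMZA ABOVE canonically decomposes to
# U+0627 ALEF + U+0654 COMBINING HAMZA ABOVE, so NFC unifies the two spellings.
precomposed_alef = "\u0623"
sequence_alef = "\u0627\u0654"

# U+08A1 BEH WITH HAMZA ABOVE has no decomposition, so the look-alike
# sequence U+0628 BEH + U+0654 COMBINING HAMZA ABOVE is left as-is by NFC.
precomposed_beh = "\u08A1"
sequence_beh = "\u0628\u0654"


def nfc_equal(a: str, b: str) -> bool:
    """True if the two strings are identical after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    print("ALEF + HAMZA unified by NFC:", nfc_equal(precomposed_alef, sequence_alef))
    print("BEH + HAMZA unified by NFC:", nfc_equal(precomposed_beh, sequence_beh))
```

Run as a script, this reports True for the ALEF pair and False for the BEH pair; anything that relies on NFC alone will treat the two BEH spellings as different labels.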

Finally, there are some sequences that we discovered in the Root Zone
development process that are not normalizable (they are distinct
spellings, even though they look alike; the correct choice depends on
the word in question). Because they are sequences,
they are not yet covered by Unicode data.

And then the larger issue is that nobody knows enough about ancient,
obsolete or archaic scripts (or rare code points in general) to be able
to map the problem space *within* that repertoire; which is why it's
best to not support any of them in "public" zones.

We should talk about the problem this way: "In the general case,
if you want a reasonably secure zone, you need to develop
a set of Label Generation Rules for your zone
that is restricted to modern-use repertoire (to facilitate
recognition) and that uses context and other rules to restrict
labels further to stay within the structure of the script and prevent
duplicate or non-recognizable labels; or, that uses variants to
limit the number of labels that users may consider substitutes
for one another; or both. In addition, it may require the use of a
separate process to deal with certain more subtle forms of confusion
that fall short of full exchangeability."
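As a rough illustration of the moving parts in that description, here is a toy Python sketch of a zone-level rule set with a restricted repertoire, a context rule, and variant sets whose members are all blocked once one of them is registered. Everything in it (the ASCII repertoire, the "o"/"0" and "l"/"1" variant pairs, the hyphen rule, the Zone class) is hypothetical and chosen only to keep the example readable; real Label Generation Rules are written in the XML format of RFC 7940 and are far richer. It does not attempt the separate process for more subtle confusion.

```python
from itertools import product
from typing import Callable

# Hypothetical toy repertoire: lowercase ASCII letters, digits, hyphen.
REPERTOIRE = frozenset("abcdefghijklmnopqrstuvwxyz0123456789-")

# Hypothetical variant sets: code points users might take as substitutes.
VARIANTS = {
    "o": frozenset("o0"),
    "0": frozenset("o0"),
    "l": frozenset("l1"),
    "1": frozenset("l1"),
}


def no_edge_hyphen(label: str) -> bool:
    """Context rule: a hyphen may not start or end a label."""
    return not (label.startswith("-") or label.endswith("-"))


CONTEXT_RULES: list[Callable[[str], bool]] = [no_edge_hyphen]


def is_valid(label: str) -> bool:
    """Non-empty, every code point in the repertoire, every context rule holds."""
    return (
        bool(label)
        and all(ch in REPERTOIRE for ch in label)
        and all(rule(label) for rule in CONTEXT_RULES)
    )


def variant_set(label: str) -> set[str]:
    """All labels reachable by swapping each code point for one of its
    variants (the original label included)."""
    choices = [VARIANTS.get(ch, frozenset(ch)) for ch in label]
    return {"".join(combo) for combo in product(*choices)}


class Zone:
    """Toy registry: registering a label blocks its whole variant set."""

    def __init__(self) -> None:
        self.registered: set[str] = set()
        self.blocked: set[str] = set()

    def register(self, label: str) -> bool:
        if not is_valid(label) or label in self.blocked:
            return False
        self.registered.add(label)
        self.blocked |= variant_set(label)
        return True
```

In this sketch, registering "hello" blocks the label itself plus its seven variants (such as "he11o" and "hell0"), which is the "limit the number of labels that users may consider substitutes" effect; a real registry might instead allocate some variants to the same registrant rather than blocking them.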

The issue here is that not all scripts require all types of mitigation
to the same extent (or for the same reasons). Hence "in the
general case".

But I find that if we think of only a sub-problem, like non-normalizable
sequences with identical (or possibly identical) display, we fall short
of seeing the full issue and will try to pursue remedies that,
while well intentioned, will fall short. Or worse, will make life
more difficult for those who are doing the right thing.

For example, you could define some kind of inverse "CONTEXT"
rule for IDNA that invalidates certain code points and feed that
process all the "do not use" sequences from Unicode. It turns out that,
for the Indic scripts, where these are an issue, a rather simple
set of positive context rules, generally based on character classes
rather than individual code points, covers 90+% of all cases, including
100% of those listed by Unicode.

But you don't want to bake these into the protocol, because between
1 and 10% of the cases can be language dependent or can depend
on whether you allow certain other code points (or not); that is,
the details (and needed exceptions) can be repertoire dependent.
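Here is a minimal Python sketch of what a class-based positive context rule of that kind could look like for Devanagari. The character classes (consonants U+0915 to U+0939, dependent vowel signs U+093E to U+094C, nukta U+093C, virama U+094D) and the function name are illustrative assumptions, not taken from any published LGR, and the sketch omits marks such as candrabindu and anusvara that a real rule set would have to handle. It rejects, for example, U+0905 DEVANAGARI LETTER A followed by U+093E VOWEL SIGN AA, a sequence that renders like U+0906 DEVANAGARI LETTER AA and that Unicode lists as "do not use", without ever listing that sequence explicitly.

```python
# Illustrative (hypothetical) Devanagari character classes; simplified.
CONSONANTS = frozenset(chr(cp) for cp in range(0x0915, 0x093A))   # U+0915 KA .. U+0939 HA
VOWEL_SIGNS = frozenset(chr(cp) for cp in range(0x093E, 0x094D))  # U+093E AA .. U+094C AU
NUKTA = "\u093C"
VIRAMA = "\u094D"


def passes_positive_context(label: str) -> bool:
    """Class-based positive context rule:
    - a nukta must directly follow a consonant;
    - a vowel sign or virama must directly follow a consonant or a nukta.
    Anything else (e.g. a vowel sign after an independent vowel) fails."""
    prev = None
    for ch in label:
        if ch == NUKTA:
            if prev not in CONSONANTS:
                return False
        elif ch in VOWEL_SIGNS or ch == VIRAMA:
            if prev not in CONSONANTS and prev != NUKTA:
                return False
        prev = ch
    return True


if __name__ == "__main__":
    print(passes_positive_context("\u0915\u093E"))  # KA + vowel sign AA
    print(passes_positive_context("\u0905\u093E"))  # A + vowel sign AA ("do not use")
```

The classes, and any exceptions to them, depend on the repertoire a zone actually supports, which is why a rule like this fits in a zone's LGR rather than in the protocol.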

[We really should be having this discussion based on your updated

>>> Without interpreting the months it took to get it off the
>>> ground, the lag time between the discussions of the Unicode
>>> 11.0 and 12.0 tables and drafts and Pete's note and the month
>>> between Pete's note and my note as indicating anything
>>> (although it probably does), (1) - (6) above make an
>>> extremely strong case that getting critical mass together to
>>> initiate and sustain a WG, at least a conventional one that
>>> does not bend various rules, is implausible.
>> Seems like a reasonable reading of the evidence to me.
> Sadly, we agree.

Which is why a different beast (or some tweaking of the rules) may
be needed to obviate any "bending".

>>> FWIW, if we look over the i18n drafts posted in the last few
>>> years (since the last PRECIS ones to go to RFC), I believe
>>> that none of the authors other than Andrew are regular IETF
>>> meeting attendees.  That doesn't bode well for a WG, at least
>>> a conventional one either.
>> That's an issue.
> Given the above, just one more minor complication.  It either
> means that we need to get agreement to do something mildly
> unconventional or that we need to figure out how to do any work
> mostly remotely _and_ in a way that keeps people engaged.  In
> the latter regard, while I've pulled time out to try to write
> some drafts and look again at others in the hope of either
> getting things moving or proving that we cannot do so -- getting
> this off my plate one way or the other, the other issue is that
> I'm really busy and have zero support for this (even support I
> can borrow from other things or justify on the basis of other
> commitments), Patrik is really busy with other commitments,
> Andrew now has a more than full-time job, I assume I can make
> inferences about the degree to which Pete and Peter are sitting
> around contemplating what to do with their extensive free time
> from the rate at which they have been pushing the directorate
> forward, and I assume that no one has stepped forward to support
> you for as much time as you feel like putting into this and in
> the style to which you would like to become accustomed. That,
> IIRC, is the entire recent author list.
> IMO that is a fairly serious problem and the reason I took
> exception to "requires a WG" without a plan that addresses it.
>> i18n is special in the way it intersects technologies. It
>> isn't a standalone technology, despite the fact that some
>> technologies are i18n-specific.
> Yes.  I hope we all know and agree about that.  Certainly I do.

May need its own niche in terms of process.

>> In principle, the directorate model should cover the other
>> aspects well, except that IETF has too few people who can (or
>> want to) understand and review meaningfully those "generic"
>> technologies that nevertheless have i18n exposure. The W3C
>> make that model work, but only because their core participants
>> are funded directly for that work.
> Actually, there was a bit of a fuss when the directorate was
> created about confining its role to that of traditional
> directorates.  If those positions are accepted (some of which
> came from comments by people who were then ADs), our sole role
> is to advise the ART ADs on strategic issues.  Even reviewing
> out-of-Area documents is a little marginal and some Areas have,
> historically, had both directorates (for strategy and technical
> advice to the Area) and what are now called Review Teams (for
> out of Area document reviews) with different memberships.

Again, the way i18n cuts across technologies seems poorly understood
by the IETF in general, except by this group.

> From the perspective of someone who is part of W3C's core i18n
> WG and who has been on most of the weekly calls for the last few
> years --probably a higher percentage of calls over that time
> than anyone other than the assigned staff member and the chair
> (and who, by the way, is not funded either directly or
> indirectly for that work), there are at least two other reasons
> why that effort works.

I've watched the process from a bit further away, but I do monitor their
work and tend to put my oar in when issues intersect my particular 

>   One is that the core group, and the W3C
> generally, have been at liberty to say "not a web problem" or
> equivalent and walk away (and have done that, repeatedly).   I
> cannot imagine them spending much time on, e.g., non-ASCII
> identifiers in X.509 certificates or physical device
> identifiers.

But I'm sure the IETF also has a scope. I wouldn't expect this directorate to get
involved in defining issues for HTML, for example?

But some of the discussions here make me think that, for example, the IETF
isn't clear about keeping character encoding issues on the outside.

>   The second is that they are actually treated as a
> group of experts that is required, or even expected, to justify
> every internal decision to a pack of people with strong
> opinions, loud voices, and no expertise (in our case, whether
> within the IETF or to various ICANN and other industry groups).
> Even the public review process is different in that respect.

I can't parse that first sentence. Is there a "not" missing?

> So I agree with you, but the difference is much greater than
> what you say above.

Well, comparisons are always approximate.