Re: [I18ndir] Getting restarted and triage

"Asmus Freytag (c)" <asmusf@ix.netcom.com> Wed, 26 June 2019 18:59 UTC

To: John C Klensin <john-ietf@jck.com>, i18ndir@ietf.org
References: <F2B84580-7E5A-4B86-BF9C-0205D4E6121D@episteme.net> <843EAB4535391A494DA216CC@PSB> <13212579-9AEA-45F8-A205-18B4AD1B0BF1@viagenie.ca> <EC8189E3EA3488B8924DBBEB@PSB> <77e8acfd-811a-c5e9-6940-3b8ed2669a75@ix.netcom.com> <E596E8F5E430FAFAC84B17CF@PSB> <16a479d0-8a10-192f-a00c-11b1eae4abb1@ix.netcom.com> <B7A0B60B2E3DE13D5FAA651F@PSB>
From: "Asmus Freytag (c)" <asmusf@ix.netcom.com>
Message-ID: <b855da87-49db-fc34-8632-7ae204359421@ix.netcom.com>
Date: Wed, 26 Jun 2019 11:59:28 -0700
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.7.2
MIME-Version: 1.0
In-Reply-To: <B7A0B60B2E3DE13D5FAA651F@PSB>
Content-Type: multipart/alternative; boundary="------------9688934391C6042BAF9BCCFE"
Content-Language: en-US
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/dfnNBRs9dsLl-zYHviIJKZtozJE>
Subject: Re: [I18ndir] Getting restarted and triage

On 6/26/2019 9:24 AM, John C Klensin wrote:
> Asmus (and everyone else),
>
> Let's try to restate your (IMO very helpful) explanation in
> possibly-actionable terms...

OK, let's.

[Note added after finishing my reply: it felt like a large part of your 
message was continuing our conversation, and others may not necessarily 
have something to add to the details. However, the "meta question" 
really addresses a different audience, and people should be sure to read 
to the end. Esp. Pete&Peter.]

>
> --On Friday, June 14, 2019 14:13 -0700 "Asmus Freytag (c)"
> <asmusf@ix.netcom.com> wrote:
>
>> ...
>> I was just making the point that Unicode's own list of
>> intentionally identical characters contains several
>> same-script pairs.
>>   They also document a large number of sequences that would
>> render identically, except that one of each pair is considered
>> "do not use" by Unicode (and vendors obligingly tend to insert
>> dotted circles into the display - although that is not
>> mandated).
>> These cases are edge cases not covered by normalization for
>> various good reasons, but we might need to work with them to
>> make that data machine readable.
> First question cluster: Would you advocate modifying RFC 5892 to
> say something substantive about one or both members of that pair
> of characters?  If so, do we need to ask Unicode, not just for
> machine-readability but for a property or something similar that
> IDNA can use?  Do we need to make a plan about what to do if we
> ask and they say "no"?   Also see "Final, meta, questions" below.
Machine readable for Unicode means "property". Currently they document
this in tables in the text, but that's no place for such normative 
information.

(There is other information about these scripts that also, over time, has
turned into properties. By its nature, the other stuff is less "stable" as it
is concerned with rendering, and those things tend to have a feedback loop
with rendering technology (and are exposed to potential orthography reforms).
This "do not use" info, however, appears to be more fundamental and more on
the encoding level.)

I would simply ask and take it from there.

>   
>> Finally, there are some sequences that we discovered in the
>> Root Zone development process that are not normalizable (they
>> are distinct spellings, even though they look alike - the
>> correct choice depends on the word in question). Because they
>> are sequences, they are not yet covered by Unicode data.
> Second question cluster: Is there something that could usefully
> be done about these in IDNA and, if so, what would it be?

Not sure, unless you admit a discussion of variants (or the role of variants
in defining registration policies, aka LGRs). The omission of (blocked)
variants, when IDNA already has both repertoire and context rules, is one
that places IDNA2008 out of step with where domain names are going.

(Note, I think "allocatable" variants, despite sharing the name "variant",
are a different beast - while "blocked" variants directly improve security
and only that, "allocatable" variants are often motivated by the desire to
allow users of different communities to access the same resource, no matter
their local keyboard or input method settings. They also cut down on
spoofing, but that is secondary. There are very few "allocatable" variants
that are truly unavoidable: Tamil "sri" / "shri" comes to mind.)


> How
> do such sequences differ from the "rn" and "m" pair that, with
> the right choice of type styles and sizes, can look nearly
> identical (or at least "alike")?

Tamil "sri" / "shri" are two ways of spelling the same syllable; rendering
systems will make them come out identical, even though only the second
one is "formally" the correct one. For historic reasons (Unicode lacked a
dedicated SHA character) the "wrong" sequence was for a long time the
only way to go. However, the "correct" spelling isn't universal and both
exist side-by-side. Well, one is technically more correct than the other,
so you could add an additional normalization step that always forces the
correct one, but no software exists to do that.
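
To make that concrete, here is a minimal Python sketch, assuming the two
spellings are the SA-based and SHA-based sequences given in the comments. The
helper name fold_tamil_sri is hypothetical (as said, no existing software does
this); the sketch only shows that standard normalization keeps the two
spellings distinct and what such an extra step might look like.

```python
import unicodedata

# Two encodings of the same Tamil syllable.
SRI_WITH_SA = "\u0BB8\u0BCD\u0BB0\u0BC0"   # SA + VIRAMA + RA + VOWEL SIGN II
SRI_WITH_SHA = "\u0BB6\u0BCD\u0BB0\u0BC0"  # SHA + VIRAMA + RA + VOWEL SIGN II


def fold_tamil_sri(label: str) -> str:
    """Hypothetical extra step: rewrite the SA-based spelling to the SHA-based one."""
    return label.replace(SRI_WITH_SA, SRI_WITH_SHA)


# NFC does not unify the two spellings; they remain distinct labels.
nfc_sa = unicodedata.normalize("NFC", SRI_WITH_SA)
nfc_sha = unicodedata.normalize("NFC", SRI_WITH_SHA)
print(nfc_sa == nfc_sha)                                  # False
print(fold_tamil_sri(nfc_sa) == fold_tamil_sri(nfc_sha))  # True
```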

This is an example where blocking alone is insufficient; users will not know
why typing a perfectly good label doesn't work. Hence, this is one of the
very few allocatable variants the root zone has for Indic scripts.


> If there is a correct choice
> and it depends on spelling of actual words, how does that
> interact with the principle that DNS labels are not required to
> be words in any language (and that many historically and
> currently important ones are not)?

That question seems orthogonal to the issue I'm trying to address here.

However, let me elaborate on this: for complex scripts, you can have
code point sequences that are invalid, not just "not words". Neither
users nor rendering engines are prepared for doing something sensible
with structurally unsound ("impossible") syllables. Users read in syllables,
and software likewise renders syllables.


> Is the Root Zone development
> process imposing a "word" criterion for non-ASCII labels even
> though the rest of the DNS does not have that requirement
> (noting, e.g., "COM", "ORG", "BIZ", "MIL", most of the ISO
> 3166-1-based ccTLDs, etc., there is enough precedent for
> non-words in ASCII that proposing such a change would probably
> be a lost cause)?

Not really. In most Indic scripts you can put together any random string
of consonants, or consonants mixed with independent vowels. Those
subsets of strings are as close as you get to "random" letter sequences
possible in those scripts - they do exist in the wild as acronyms, at least
for some of these scripts.

Generally, we tended to push back against restrictions that seemed like
"spelling rules", preferring those that are "structural", and preferably
"deep structural" (if I may use this term).


> Considering your work with and comments
> about complex scripts, is it possible to construct and impose an
> "orthographically plausible in a given script" rule that would
> stop short of a "word" requirement and should we be exploring
> that for IDNA?  Again see "Final, meta, questions" below.

That's the way we squared that particular circle for the Root Zone (to the
best of our abilities). However, some of the rule systems represent a
compromise. If it was easier to model an approximation, we would do that,
allowing both under- and over-production of labels.

In some cases, languages using different ideas of structure share a given
script. In Thai, we were unable to accommodate one of the minority
languages, because so much of Thai computing infrastructure has the
rules for the Thai language baked in, that allowing something that violated
assumptions for that language would have resulted in unstable or
ambiguous rendering for the minority language labels. So, we went
conservative and had to limit things.

That's why the protocol would not be a good fit to shoehorn all these
rules as CONTEXTO. But IDNA2008 should be updated to mandate that
"if you support a complex script, you must provide context rules."

(And perhaps we can find a way to list those scripts and/or a minimal
set of code points for which a context rule should be mandatory, no
exceptions: it's pretty clear that, for example, all VIRAMA code points
need to have a context; after all, like joiners, they are invisible in many
situations; unlike joiners, their context rules aren't as universal).
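
As a rough illustration of what such a context rule might look like, here is
a deliberately simplified Python sketch. It assumes virama-type code points
can be recognized by canonical combining class 9, and it treats "preceded by
any letter" as the required context. Real rules are script-specific and
stricter (independent vowels are letters too, for instance), so take this
only as the general shape. The function name is made up for the example.

```python
import unicodedata

VIRAMA_CCC = 9  # canonical combining class assigned to virama-type code points


def virama_context_ok(label: str) -> bool:
    """Return False if any virama lacks a preceding letter (simplified rule)."""
    for i, ch in enumerate(label):
        if unicodedata.combining(ch) == VIRAMA_CCC:
            # A virama at the start, or after a non-letter, has no valid context.
            if i == 0 or not unicodedata.category(label[i - 1]).startswith("L"):
                return False
    return True


print(virama_context_ok("\u0915\u094D\u0937"))  # KA + VIRAMA + SSA -> True
print(virama_context_ok("\u094D\u0915"))        # VIRAMA at label start -> False
print(virama_context_ok("\u0915\u093E\u094D"))  # VIRAMA after a vowel sign -> False
```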

>   
>> And then the larger issue is that nobody knows enough about
>> ancient, obsolete or archaic scripts (or rare code points in
>> general) to be able to map the problem space *within* that
>> repertoire; which is why it's best to not support any of them
>> in "public" zones.
> While I think there is general agreement about that (some
> fussing about the meaning of "public zone" aside), if it is a
> public zone question alone, does it impact IDNA in any way?

Yes, in the sense that we should strongly imply that applying IDNA to
archaic scripts in public zones is unsafe. "Secure" is probably a bad choice
so maybe some different term, but we should define a designation for
these partially more secure registry policies that

(1) avoid unanalyzed scripts (ancient, obsolete etc)

(2) limit code points to modern/everyday use ones (better recognition)

(3) provide required contexts (complex scripts)

(4) provide blocked variants for "identical" code points/sequences
       and/or "allocatable" ones in limited circumstances

In other words, without changing base IDNA2008, we should provide
something that can be conformed to (not least so ICANN could
potentially require this of contracted TLDs)

>
>> We should talk about the problem this way: "In the general
>> case, if you want a reasonably secure zone, you need to
>> develop a set of Label Generation Rules for your zone, that is
>> restricted to modern-use repertoire (to facilitate
>> recognition) and that uses context and other rules to restrict
>> labels further to stay within the structure of the script and
>> prevent duplicate or non-recognizable labels; or, that uses
>> variants to limit the number of labels that users may consider
>> substitutes for one another; or both. In addition, it may
>> require the use of a separate process to deal with certain
>> more subtle forms of confusion that fall short of full
>> exchangeability."
> At one level, that seems entirely reasonable. I think --but am
> not sure and would learn from a correction and explanation --
> that it is actually less stringent as a requirement than the
> repeated variations on "as a registry, you should register only
> strings you understand in scripts that you understand
> thoroughly" in the existing IDNA specs and reinforced in
> draft-klensin-idna-rfc5891bis.

Compare this wishy-washy (sorry) description to my points (1) through (4)
above. Which one reads as more actionable?

Also, much of the discussion for IDNA2008 has focused on repertoire,
even RFC 6912 only talks about repertoire selection. A policy (aka LGR)
designed for a typical script needs ALL three legs: repertoire, context
and variants, because most scripts (not most languages, but most scripts)
require all three, and many require two out of the three.

We are up against the "unknown unknown" here: even well-meaning zone
administrators do not know what they don't understand. They will tend to
answer your question in the affirmative.

Part of it comes down to people not connecting the rendered string with
the encoded string. In Latin, the difference is of no practical concern.
In complex scripts, knowing the code points is meaningless, if you don't
know how they interact. But a string is just a sequence of code points.
Hence the disconnect.

>   Would it work any better as
> advice or in practice?   I think each of us --or at least many
> of us-- have hypotheses about why the existing rule has not been
> observed very much in zones that are operated for a profit or
> even by third party registries on a cost-recovery basis (not the
> same as "public domain", but with a good deal of overlap).
> Most of those hypotheses seem to amount to "not profitable",
> either because it would reject potentially-paying customers who
> want to buy particular names or because the costs of checking
> each string would be too high.

Mechanical verification with an LGR at registration time is inexpensive.
We have solved the issue of how to encode registration policies so that
machine verification against such policies is now trivial. That didn't use
to be the case.
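
To give a feel for how cheap the check is, here is a toy Python sketch. It is
not the RFC 7940 format or any real LGR tool; the ToyLGR class, its fields and
the sample rule are all invented for illustration. A label is checked against
a repertoire and a list of whole-label rules, nothing more.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List, Tuple

Rule = Tuple[str, Callable[[str], bool]]  # (rule name, predicate on the whole label)


@dataclass(frozen=True)
class ToyLGR:
    """Hypothetical, stripped-down stand-in for a registration policy."""
    repertoire: FrozenSet[str]
    rules: List[Rule] = field(default_factory=list)

    def check(self, label: str) -> List[str]:
        """Return a list of problems; an empty list means the label is valid."""
        problems = [f"U+{ord(c):04X} not in repertoire"
                    for c in label if c not in self.repertoire]
        problems += [f"rule failed: {name}"
                     for name, ok in self.rules if not ok(label)]
        return problems


lgr = ToyLGR(
    repertoire=frozenset("abcdefghijklmnopqrstuvwxyz-"),
    rules=[("no leading or trailing hyphen",
            lambda s: not (s.startswith("-") or s.endswith("-")))],
)
print(lgr.check("example"))         # []
print(lgr.check("-ex\u0259mple"))   # repertoire problem plus the hyphen rule
```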

I'm thinking that "advice" is nice, but you want to elevate it to something
that can be conformance tested. Only that way could some body like
ICANN write it into contracts, for example.

>
> Third question cluster:  Keeping in mind the above and that
> we've, so far, avoided (or at least tried to avoid) getting into
> issues of what humans can confuse, especially if others are
> deliberately trying to confuse them in IDNA, what would you
> suggest doing in this area?  Or is it just about reinforcing the
> mandate for registries being responsible about what they do and
> providing additional guidance and tutorial materials to help if
> they decide, for business reasons, to substitute "have a
> general, if vague, knowledge of and some helpful guidelines" for
> "understand thoroughly"?

If my morning coffee hasn't worn off and I can parse this correctly, I think
you are reaching for something similar to my points (1)-(4) above; those
could be hammered out to become a "profile" on IDNA2008 that would
add a layer of protection against cases where strings are not just similar
(like "rn" and "m") but effectively identical where even an observant user
cannot tell apart two labels that are not presented side-by-side in 22pt
fully serifed fonts.

(Many cases, like U+0259 and U+01DD, are 100% identical, even in 72pt
fonts and in *every* font, but we like to slightly relax the definition of
identical to cover "effectively identical" - code points or sequences that
users will unhesitatingly "substitute" for each other. This would include
some distinctions that can be detected with a good font, at high resolution
and by people that know the subtle things to look for.)

Unicode has a list of "intentionally identical" code points; many are cross-
script, some cross the letter/digit divide, but others are the result of
disunifying the encoding of the same shape based on things like bidi
properties or casing pairs. For that list, you could easily mandate "blocked"
variants (or optionally allow "allocatable" variants). The profile would
then demand that your policy deal with these, but also strongly
recommend that the same mechanism be used for "effectively" identical
code points.
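
A minimal Python sketch of the blocked-variant idea, using the U+0259 /
U+01DD pair from above. The variant table, the function names and the
registry set are hypothetical; a real policy would take its variant sets from
the LGR. The sketch also shows that neither NFC nor NFKC folds the pair.

```python
import unicodedata

# Hypothetical variant table: each code point maps to a representative.
VARIANT_REPRESENTATIVE = {"\u01DD": "\u0259"}  # TURNED E -> SCHWA


def variant_key(label: str) -> str:
    """Map every code point of the NFC-normalized label to its representative."""
    nfc = unicodedata.normalize("NFC", label)
    return "".join(VARIANT_REPRESENTATIVE.get(c, c) for c in nfc)


def is_blocked(candidate: str, registered: set) -> bool:
    """A candidate is blocked if a different registered label shares its key."""
    key = variant_key(candidate)
    return any(r != candidate and variant_key(r) == key for r in registered)


# Normalization alone does not help: the pair stays distinct under NFKC too.
print(unicodedata.normalize("NFKC", "\u0259")
      == unicodedata.normalize("NFKC", "\u01DD"))  # False

registered = {"b\u0259t"}
print(is_blocked("b\u01DDt", registered))  # True: variant of a registered label
print(is_blocked("bat", registered))       # False
```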

Does that sound like a start?

You can then further recommend that every zone have some scheme
of resolving issues around "confusables" of a wider range. Unicode has
some files on those, if you'd like being able to conformance test.


>   
>> The issue here is that not all scripts require all types of
>> mitigation to the same extent (or for the same reasons). Hence
>> the, "in the general case".
> Clearly.  And I think we understood that in the IDNA2008 design
> even though I now believe that we botched some of the details.

It reflects the fact that its authors were not familiar with all scripts. As
part of the RZ-LGR project, it's been my privilege to become familiar with
tons of details of all modern, every day use scripts; that perspective is
rare. Some rendering system designers may have that also, but they don't
look at it from the same angle as you would do for identifiers.

>   
>> But I find that if we think of only a sub-problem, like
>> non-normalizable sequences with identical (or possibly
>> identical) display, we fall short of seeing the full issue and
>> will try to pursue remedies that, while well intentioned, will
>> fall short. Or worse, will make the life of those more
>> difficult who are doing the right thing.
>   
>> For example, you could define some kind of inverse "CONTEXT"
>> rule for IDNA that invalidates certain code points and feed
>> that process all the "do not use" sequences from Unicode.
>> Turns out, for the Indic scripts, where these are an issue,
>> a rather simple set of positive context rules, generally based
>> on character classes, not individual code points, covers
>> 90+% of all cases including 100% of those listed by Unicode.
>> But you don't want to bake these into the protocol, because
>> between 1 and 10% of the cases can be language dependent, or
>> can depend on whether you allow certain other code points (or
>> not), that is the details (and needed exceptions) can be
>> repertoire dependent.
> Ok.  Whether we agree or not on the details (and you and I, at
> least, mostly agree) I think our understanding of what we should
> not do is improving, with your efforts and explanations being a
> significant component of that improvement.  What I'm not sure
> about is whether we are getting any closer to an understanding
> and agreement on what we should do.


Do you like the suggestion of a "restricted profile" on IDNA2008 (or whatever
name you'd like to give it) that focuses on the extra security issues for
"public" zones (however we want to define those)? That's something that
I believe we can make concrete and actionable (essentially something someone
could "conform" to).

>
>> [We really should be having this discussion based on your
>> updated draft.]
> Probably.  See "Final, meta, questions" below.
>
>>>> Without interpreting the months it took to get it off the
>>>>> ground, the lag time between the discussions of the Unicode
>>>>> 11.0 and 12.0 tables and drafts and Pete's note and the
>>>>> month between Pete's note and my note as indicating anything
>>>>> (although it probably does), (1) - (6) above make an
>>>>> extremely strong case that getting critical mass together to
>>>>> initiate and sustain a WG, at least a conventional one that
>>>>> does not bend various rules, is implausible.
>>>> Seems like a reasonable reading of the evidence to me.
>>> Sadly, we agree.
>   
>> Which is why a different beast (or some tweaking of the rules)
>> may be needed to obviate any "bending".
> Ok.  But what do you suggest given the constraints to which the
> above points?  In a way, this is at the core of the "Final,
> meta, questions" discussion below, but it states the problem at
> least a bit differently.

(I would let Pete&Peter worry about how to make things happen in the IETF
process arena - I would like to focus on getting the technical aspects right.)

>     
>> ...
>>> IMO that is a fairly serious problem and the reason I took
>>> exception to "requires a WG" without a plan that addresses it.
>>>
>>>> i18n is special in the way it intersects technologies. It
>>>> isn't a standalone technology, despite the fact that some
>>>> technologies are i18n-specific.
>>> Yes.  I hope we all know and agree about that.  Certainly I
>>> do.
>> May need its own niche in terms of process.
> Yes.  However, when the directorate was under discussion, we got
> back what I thought were fairly clear signals from the IESG that
> our role was going to be scoped even more narrowly to advising
> the ART ADs than some directorates have been in the past.  That
> does not make me optimistic about trying to define a special
> niche and get it accepted, but maybe things have changed.  As we
> wrote to each other earlier...
(Again, good thoughts, but should be on a separate track with different
people making the call.)
>
>>>> In principle, the directorate model should cover the other
>>>> aspects well, except that IETF has too few people who can (or
>>>> want to) understand and review meaningfully those "generic"
>>>> technologies that nevertheless have i18n exposure. The W3C
>>>> make that model work, but only because their core
>>>> participants are funded directly for that work.
>>> Actually, there was a bit of a fuss when the directorate was
>>> created about confining its role to that of traditional
>>> directorates.  If those positions are accepted (some of which
>>> came from comments by people who were then ADs), our sole
>>> role is to advise the ART ADs on strategic issues.  Even
>>> reviewing out-of-Area documents is a little marginal and some
>>> Areas have, historically, had both directorates (for strategy
>>> and technical advice to the Area) and what are now called
>>> Review Teams (for out of Area document reviews) with
>>> different memberships.
>> Again, the way i18n cuts across technologies seems poorly
>> understood by IETF in general  - excepting this group.
> I agree completely.  But I'm not seeing the mechanism or
> suggestions that get us unstuck from however one describes the
> current situation.  See "Final, meta, questions" below.
>
>>>>  From the perspective of someone who is part of W3C's core
>>>> i18n
>>> WG and who has been on most of the weekly calls for the last
>>> few years --probably a higher percentage of calls over that
>>> time than anyone other than the assigned staff member and the
>>> chair (and who, by the way, is not funded either directly or
>>> indirectly for that work), there are at least two other
>>> reasons why that effort works.
>> I've watched the process from a bit further remote, but I do
>> monitor their work and tend to put my oar in when issues
>> intersect my particular expertise.
> Yes.  But the claim, IIRC, was that the W3C effort was succeeding and
> we weren't because the core activity there was fully funded and
> staffed.  Staffed, yes, but, other than Richard, not much better
> funded than the IETF effort.  Certainly I'm not funded for that
> work and I assume you are not either.  I also note that the
> level of activity of the other people on this list who are
> supposedly part of the core group (and who may not be funded for
> it either) has been rather close to zero and that has not
> prevented work getting done either.


(W3C participants may be funded by their own employers, who understand
their companies' dependence on these specifications. Not the case here.)

>   
>>>    One is that the core group, and the W3C
>>> generally, have been at liberty to say "not a web problem" or
>>> equivalent and walk away (and have done that, repeatedly).   I
>>> cannot imagine them spending much time on, e.g., non-ASCII
>>> identifiers in X.509 certificates or physical device
>>> identifiers.
>> But I'm sure IETF also has a scope. Wouldn't expect this
>> directorate to get involved in defining issues for HTML for
>> example ?
> Of course.  But that isn't the point.   The point is that
> analysis of, e.g., implications of non-ASCII identifiers in
> X.509 certificates or storage identifiers is fairly far from the
> expertise of most of us on this list.  But we don't get to say
> "out of scope" and, instead, appear to be expected to prioritize
> them.
(W3C have the same problem: they are lucky to have some people with
wide-ranging abilities. I can usually give useful input if I can identify
character coding related issues, or if someone identifies for me where they
are.)
>   
>> But some of the discussions here make me think that, for
>> example, IETF isn't clear about keeping character encoding
>> issues on the outside.
> More explanation about that would be helpful.  I think we are
> fairly clear about it and that, when issues have arisen, they
> have been about whether other groups are following their own
> rules (or what they told the IDNA2008 development effort their
> rules are) or about areas in which they have done work or made
> decisions that are not, historically, about character encoding.

Any time someone talks about "discarding Unicode" is a good example of that.
Or, defining some non-Unicode normalization, etc. Shows an unwillingness
to accept that these things are out of scope and need to be taken as they
exist (and then mitigated where they don't come tailored for identifiers
out of the box).

>
>>>    The second is that they are actually treated as a
>>> group of experts that is required, or even expected, to
>>> justify every internal decision to a pack of people with
>>> strong opinions, loud voices, and no expertise (in our case,
>>> whether within the IETF or to various ICANN and other
>>> industry groups). Even the public review process is different
>>> in that respect.
>> I can't parse that first sentence. Is there a "not" missing?
> Yes.  Should have been "is not required..."  Sorry.
>
>> ...
> Final, meta, question(s):
>
> We are now six weeks past Pete's "get back to work" note and two
> weeks past my response to it.
> draft-klensin-idna-unicode-review, the direct result of
> discussions in this group, will have been posted for two weeks
> on Friday.
Hey, I at least read your draft. Was busy publishing RZ-LGR-3. (It's done,
but there's a two-week embargo because of the ICANN meeting.)
>
> Since then, we've had Marc's note suggesting the issues ought to
> have a WG and several comments that I believe add up to a
> conclusion that the only way we are going to get a WG is to
> redefine this list as one ... and that probably won't work
> either.  We've had a few exchanges between Asmus and myself that
> I'm learning from but that I don't think are moving us forward.

(If you haven't caught on, I'm using () to mark off things that are meta or
out-of-scope comments or an occasional aside. The next one is meta).

(We need to get Pete&Peter to come down unambiguously in support of
us finishing that series of "core" documents +/- whatever factoring of these
documents we end up with. I'd leave it to them to find suitable additions
to our process. I don't have the IETF expertise to be productive and it
frankly wastes my time, which I could put to use on technical contents.)


> We've had a review by John Levine that is encouraging because it
> means that someone other than Patrik and myself has looked at
> the document, but that doesn't show a path forward either (not
> his job).  And we've had a few notes from Pete about procedures
> and comment styles but, AFAICT, no direction that will move us
> forward.  My (I thought) fairly simple question as to whether
> I/we should expect the directorate to review that draft and
> advise the ADs on what to do with it or whether I should try to
> find an AD to sponsor it has not been addressed by Pete, Peter,
> or any of the ADs.

(I had a chat with Pete where he promised to be more supportive/proactive,
but he'll have to put that into action.)

>
> I hope my being unable to find this encouraging is no surprise.
> I am having a good deal of trouble prioritizing this effort,
> particularly working on the various documents, ahead of work
> that actually generates interest and, btw, income to pay the
> bills.  I can push back, as I have above, to try to find things
> that are within the IETF's scope and actionable, but if we can't
> figure out how to process even the two most obviously actionable
> documents we have on the table
> (draft-klensin-idna-unicode-review and
> draft-klensin-idna-rfc5891bis), then there doesn't appear to be
> a lot of point.

That latter one: can we move that towards proposing a profile like the one I
am suggesting? (And you can keep the advice that all zones should be limited
to well-understood stuff and add that they should also follow whatever items
(1)-(4) they find applicable for their zone, to make even that advice more
"actionable".)

Item (5), incidentally, ought to be that the relevant zone policies are
published in RFC 7940 format.

For the former, I like the idea of an "under review" classification; I am
otherwise inclined to follow Unicode even on property changes, as long as
they don't affect, directly or via related code points, the interpretation of
"high-use" modern code points. Obviously, moving towards DISALLOWED would be
an issue, but not everyone I spoke to agrees that keeping a code point PVALID
after we discover (strong) reasons why it shouldn't have been PVALID to begin
with is such a good idea. (That's in the details.)

> Unlike most of the rest of what should be in the
> directorate's queue, those two documents make no substantive
> changes to IDNA2008 that, e.g., change the derived properties of
> code points so they should not be very controversial.   I could
> make a fast pass through the "registry responsibility" document
> and encourage Asmus to join me to be sure that we haven't
> changed our minds about anything significant and then post a
> current (non-expired) version but, right now and given the
> underwhelming response to the draft posted a bit under two weeks
> ago, it is questionable whether even that is worth the trouble.

I'd like that (as far as it involves me :) ).

(We should get P&P explicitly involved in creating a roadmap of how this
stuff is supposed to move forward. I feel this process could use a bit of
management).

>
> So, is there a path forward or was this directorate idea a
> well-intentioned failure that leaves us back where we were a
> year ago?   Is there going to be a report on the directorate on
> the morning of 22 July and, if so, what do the five of you
> expect to say?

(Seems mostly addressed to the other members, so I'm not adding a response
for now.)

I found this rather constructive, esp. the parts that weren't meta (those are
largely for a different audience than me in my role of technical expert).

> best,
>     john
>
>
>