RE: [Ltru] Last call comments on LTRU registry and initialization documents

--On Wednesday, 07 September, 2005 12:19 -0700 Addison Phillips
<addison.phillips@quest.com> wrote:

>> Comments on draft-ietf-ltru-registry and
>> draft-ietf-ltru-initial and, secondarily, on
>> draft-ietf-ltru-matching...
> 
> I've thought a lot about the excellent analysis and comments
> in John Klensin's message. My perception is that we have a
> divergent view of the structure and significance of the LTRU
> draft(s). 

First, my thanks for the obviously careful reading and thought.
We may indeed have divergent views, although, after reading your
notes I believe that, in practical terms, we are pretty close
together.

> Although superficially the drafts are very different than the
> RFC 3066 that they seek to replace, in fact the structure is
> very similar. The drafts are attempting to fill certain gaps
> unaddressed by RFC 3066 for implementers or for tag choice
> and the requirements on "content authors" (people who choose
> tags or ranges).
> 
> Here are my basic thoughts in response to those comments:
> 
> 1. All tags valid under the drafts would have been valid or
> valid to register under RFC 3066. This is a key point. The tag
> grammar proposed is intended to be highly compatible not just
> with RFC 3066 but also with existing implementation. It is
> expressed as greater restriction on what may be registered.
> This provides more regularity in tags, although tags
> themselves are not greatly changed. A subtag registry is, in
> effect, a different way of expressing what is already in
> place. 

I understand this, and think I understood it before.  There is a
difference, however, and it expresses itself, I believe, in two
ways.  (i) The 3066 model requires some process for every tag
that is to be used.  That is very bad in some ways, as the
documents and your notes correctly point out.  On the other
hand, it tends to keep the number of tags that are in use down.
Given more general IETF experience -- which may not be applicable
to this particular situation -- a smaller number of tags in use
tends toward better and more widespread interoperability.  (ii)
The idea of using a registry of components (in this case
subtags) that can be mixed and matched at the implementer's
discretion, albeit according to specific rules, is somewhat
untested in the IETF and the Internet applications community.
The closest equivalents involve protocols with a small set of
options that are presumed to be orthogonal.  In those cases, we
typically apply rather strict rules to be sure that each of the
possible combinations is tested and shown to interoperate;
combinations that are not or cannot be demonstrated to
interoperate are dropped from the standard at higher maturity
levels.  Clearly, that level of testing is not possible or
appropriate here, but that level of innovation does justify
requiring some operational demonstration of impact on
interoperability, rather than stamping "BCP" on the document and
hoping that everything will work out.

> 2. The fact that it *always* narrows the potential subtags
> that could be registered *in the future*, but has no effect on
> any tags or subtags already extant means that (from an RFC
> 3066 implementation perspective) the range of tags actually
> seen in the wild will be more limited than it might have been.
> Commentators on this thread have implied that it is an
> entirely new protocol, but I think that goes too far: it is
> the same protocol with greater rigor on what may go where. 

While I completely agree with your final sentence, it is
possible to reach a different conclusion (or, more accurately,
pose a different hypothesis) about the first.    One could
equally claim that the discretion accorded the tag reviewer,
working with the IANA and under IESG supervision, would keep the
number of tags registered, and hence in the wild, lower in
practice than the number permitted as the crossproduct of subtag
registrations.  I am aware of the flaws in that argument,
including a tradition of "if you can't get it registered easily
enough, just use what you want" in other parts of the community.

As a trivial and silly counterexample to the "fewer actual tags
in use" hypothesis, I would expect any competent reviewer under
3066 to look askance at a request to register en-Hang or en-Hant
while, as I understand it, the fact that the three subtags "en",
"Hang", and "Hant" appear in the registry makes those
combinations valid under the LTRU model.  What would prevent
their appearance in the wild is that it would take someone who
was either stupid or perverse to want to use them (casual
readers, see the Aside below -- Addison clearly does not need
it).  But "stupid or perverse" is not a rarity around the
Internet.  Moreover, someone who was seriously security-paranoid
might wonder whether these perverse combinations could be a way
to code (not cypher, but code) secret/private messages.  It
seems to me that the risk there, while small, is greater than in
3066 and it is not called out in the security analysis (not that
I'm sure it should be: as Sam and Russ are painfully aware, I'm
very concerned about stopping rules in requirements for threat
analyses and presentations).

	Aside on the example above (LTRU participants can skip
	unless they want to check my logic): "en-Hang" and
	"en-Hant" would imply writing English in Korean Hangul
	or Traditional Chinese characters respectively.  In
	addition to those not exactly being common cases, it is
	not clear that they are feasible.  Since most Chinese
	characters cannot be used in an unambiguous phonetic
	way, one would presumably need a rather specific
	profile, presumably expressed as a variant or extension,
	to make things work (and even then, it would be
	strange).  Hangul is problematic in a different way.
	Unlike Chinese characters, it is definitely phonetic.
	But because it is rather carefully designed and
	structured around the needs of Korean, it is not clear
	to me, in my ignorance, that it could be used to
	represent the full range of English phonemes and
	syllables with reasonable accuracy.  Contrast these two
	examples with, e.g., en-Cyrl (English written in
	Cyrillic characters) or en-Arab (English written in
	Arabic characters).  Those might be strange or even
	perverse, and they might be used to conceal the content
	of text from a casual reader/observer, but they would
	"work" perfectly well if read out loud, using
	conventions no more extreme (and probably less so) than
	some alternate spelling systems that use Roman-based
	characters but whose advocates claim are more consistent
	and easier than the normal spelling patterns.
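
The "en-Hang" contrast above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the registry contents are hypothetical and heavily trimmed, the function names are invented here, and the "whole-tag" model is a simplification (under RFC 3066, ISO 639 language codes with an optional ISO 3166 country code needed no registration at all).

```python
# Hypothetical, heavily trimmed registries (assumptions for illustration only).
WHOLE_TAG_REGISTRY = {"en", "ko", "zh-Hant", "zh-Hans"}  # each tag reviewed as a unit
LANGUAGE_SUBTAGS = {"en", "ko", "zh"}                     # registered individually
SCRIPT_SUBTAGS = {"Hang", "Hant", "Hans", "Cyrl", "Arab"}


def valid_whole_tag_style(tag: str) -> bool:
    """Valid only if this exact tag was registered (case-insensitive)."""
    return tag.lower() in {t.lower() for t in WHOLE_TAG_REGISTRY}


def valid_subtag_style(tag: str) -> bool:
    """Valid if every subtag is registered, whatever the combination."""
    parts = tag.split("-")
    if len(parts) == 1:
        return parts[0].lower() in LANGUAGE_SUBTAGS
    if len(parts) == 2:
        language, script = parts
        return (language.lower() in LANGUAGE_SUBTAGS
                and script.title() in SCRIPT_SUBTAGS)
    return False  # longer tags are out of scope for this sketch


def permitted_combinations() -> int:
    """Bare languages plus every language-script pairing (the crossproduct)."""
    return len(LANGUAGE_SUBTAGS) * (1 + len(SCRIPT_SUBTAGS))
```

With these toy registries, valid_subtag_style("en-Hang") is true while valid_whole_tag_style("en-Hang") is false, and the crossproduct permits 18 tags against 4 individually reviewed ones.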

> 3. The various rules and guidelines set down in the draft
> provide a more rigorous registration process based on the
> experience of operating ietf-languages for the seven or so
> years. This could be seen to make it the "best current
> practice" for registering language tags or their components.
> The switch to subtags was chosen to spare the community
> immense numbers of registrations of various subtag variations
> (examples from the current registry: two German orthographic
> subtags, eight registrations; two Chinese script subtags,
> *twelve* registrations). 

In case I haven't made it clear enough in previous notes, I
_like_ this system and the ideas behind it.  I think
"Suppress-script" is a particularly nice idea given the
weaknesses we agree are present in the 3066 model.   I just
think we need to move with caution into somewhat uncharted
territory, doing so in a way that permits and encourages us to
apply more specific guidance for particular applications than
the general guidance of Section 4 of "Registry" (not that I find
anything in that section to disagree with).

> 4. The creation of a registry simplifies the work incumbent on
> implementers or content authors, since they no longer have to
> refer to (under RFC 3066) four separate tag-or-subtag
> repositories and then synthesize the rules in RFC 3066 for
> choosing between certain overlapping subtags (for example ISO
> 639-1/-2). The fact that there is a registry doesn't change
> the fact that "somewhere" there is a list of subtags that may
> be validly combined into tags.

See above.

> 5. There is a perfectly good matching scheme loosely described
> in RFC 3066. This scheme is enshrined in numerous places,
> including RFC 3282 (which, you'll note, also "Obsoletes:
> 1766", an example with 3066 of two RFCs obsolescing the same
> BCP on separate days over a year apart). The additional forms
> of matching described by the matching draft are interesting
> and may be useful in a variety of applications (draft-matching
> gives some examples). But they are unnecessary to the specific
> task of updating RFC 3066. Applications of language tags in
> the future may wish to choose one or another of the other
> schemes from draft-matching to produce more interesting
> results. But such additional schemes are not necessary to the
> task of updating RFC 3066.
> 
> If the community feels that matching is so important that
> draft-registry must deal with it directly, my suggestion would
> be to take Section 2.5 verbatim from RFC 3066 and include it
> in draft-registry. This preserves the vital reference to
> language-ranges. It should be noted that RFC 3066 nowhere
> provides an explicit treatise on matching. Both it and
> draft-registry were written for compatibility with known
> matching schemes. Success or failure of the draft should
> necessarily be measured by its interoperability with existing
> matching protocols. My belief is that there is high
> interoperability, since the matching scheme is quite basic and
> the rules governing tag choice gave careful consideration to
> the problem of script subtags. 

My personal bias about how to do this, based on IETF experience
and some idiosyncrasies of the RFC series (notably the exceeding
bluntness of the "Updates" and "Obsoletes" instruments), is to
tell applications that they need to pick 

	(i) the registry, as defined by an RFC and an IANA
	entity.  I'd hope there would be only one, but Sam's
	suggestion may have some value,

	(ii) a matching rule, as defined by an appropriate RFC
	or text in the RFC defining/specifying the application,
	and optionally, 

	(iii) some application-specific additional rules or
	constraints, specified in the RFC defining/specifying
	the application.

Now, if we were having this discussion in, e.g., a JTC1 context,
I'd probably have a different bias.  But, in the IETF/RFC one,
I'm led to believe that "separate matching rule documents" makes
(ii) much cleaner than having it be "...by an appropriate RFC,
or Section XY of [ltru-registry], or text in the RFC ...".  I'm
especially drawn in that direction because I'm not enamored of
the 3066 matching rules and would prefer that they not become a
permanent default.  And RFC numbers are not a scarce resource.
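
As a rough illustration of the three choices (i)-(iii) above, an application's specification could be thought of as bundling a registry reference, a matching rule, and optional extra constraints. Everything in this Python sketch is hypothetical: the class, its field names, and the example rules are invented here and appear in none of the drafts.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LanguageTagProfile:
    """What one application specification would pin down (hypothetical)."""
    registry_reference: str                    # (i) which registry document applies
    matching_rule: Callable[[str, str], bool]  # (ii) how a range is matched to a tag
    extra_constraints: List[Callable[[str], bool]] = field(default_factory=list)  # (iii)

    def accepts(self, language_range: str, tag: str) -> bool:
        """True if the tag matches the range and passes every extra constraint."""
        return (self.matching_rule(language_range, tag)
                and all(check(tag) for check in self.extra_constraints))


def exact_match(language_range: str, tag: str) -> bool:
    """A deliberately simple stand-in matching rule: case-insensitive equality."""
    return language_range.lower() == tag.lower()


# An imaginary application that allows at most two subtags per tag.
profile = LanguageTagProfile(
    registry_reference="[ltru-registry]",
    matching_rule=exact_match,
    extra_constraints=[lambda tag: len(tag.split("-")) <= 2],
)
```

The point of the shape is that the matching rule is a separate, replaceable part, so a different matching document can be substituted without touching the registry reference.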

But, all of that said, the ultimate semantics of "extra section
of LTRU-registry that specifies the 3066 matching rules" are the
same as those of "extra RFC that makes the 3066 matching rules
explicit".  The essence of my earlier comment was only that this
was a loose end that needed tying off if one wanted to claim
that 3066 was obsolete.  That requirement would be satisfied by
either of these semantically-equivalent approaches.
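
For concreteness, the 3066 matching rule in question is small. As I read Section 2.5 of RFC 3066, a language-range matches a tag if it equals the tag, or equals a prefix of the tag that is immediately followed by "-", with "*" matching any tag. A minimal Python sketch follows; the function name is invented here, and the case-insensitive comparison reflects the general rule that tags are not case-sensitive.

```python
def matches_3066(language_range: str, tag: str) -> bool:
    """Prefix matching in the style of RFC 3066, Section 2.5 (a sketch)."""
    if language_range == "*":
        return True  # the wildcard range matches every tag
    wanted = language_range.lower()
    actual = tag.lower()
    # Equal, or a prefix that ends exactly at a subtag boundary.
    return actual == wanted or actual.startswith(wanted + "-")
```

Under this rule the range "en" matches "en-US" but not "eng", and "en-US" does not match the bare tag "en".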

Which one to pick is, IMO, a strategic decision to be made in
the context of the broad needs and practices of the IETF, and
hence by the IESG with whatever mechanism it chooses to obtain
community input, rather than by a single WG.  

> 6. The tag forms used in the draft are, in fact, being
> registered and adopted. I note that Google this morning
> returns 41,600 hits for "zh-Hant" as a piece of content. Many
> of these appear to be valid usages as language tags---script
> subtags in the wild--rather than just mentions of the
> registration. Thus the draft merely recognizes the "reality on
> the ground" with regard to language tags. It does so by
> reorganizing how tags are registered to make the scheme more
> manageable.

This is great.  But all it means today is that, were this on the
standards track, it would be pretty easy to get it to Draft
under the general criteria that Sam and I have outlined.

> 7. The choice between STD and BCP tracks is really a toss-up.
> There are very good arguments on both sides. The creation and
> management of a registry does not lend itself to STD, but the
> creation and testing of implementations does not lend itself
> to BCP. My thought here is that one can view the draft
> entirely through the lens of existing RFC 3066 implementers:
> these documents represent a set of BCPs related to various
> aspects of registering, choosing, and implementing language
> tags. New implementations may be different as a result of the
> improvements made (certain kinds of assumptions can be made
> about a 3066bis tag that cannot be made about a 3066 tag). All
> such implementations will be recognizably implementations of
> RFC 3066, though, and to the benefit of all concerned (IMHO)
> they represent the best current thinking on the manner in
> which to identify languages on the Internet (given our legacy
> considerations).

We agree about the "toss-up" part.  It is also true that LTRU
was told to develop a BCP and responded by recommending a BCP.
No one can blame the WG for following those instructions.  I
come down on the standards track side because of two things.
First, as I have noted before, we rarely create registries as an
end in themselves.  The second issue is a more basic property of
the IETF: while we describe our decision process in terms of
"rough consensus and running code", I have come to realize that
there should be a background chant of "interoperability,
interoperability, interoperability" every time it is said.  An
interoperability lens causes me to consider 3066 to have been a
marginal case for BCP and the LTRU documents to be over some "I
know it when I see it" line.

As I indicated in my recent note to Doug, none of my suggestions
or comments require, or even suggest, any fundamental changes to
the recommendations of the WG or, for the most part, even to its
documents.  They mostly have to do with procedures, processing
models, and a bit of loose-end-tying-up, but those are issues at
a rather different level.

   regards,
     john
