[Ltru] Re: extlang

"Doug Ewell" <dewell@roadrunner.com> Sun, 24 June 2007 23:35 UTC

I'm doing a lot of catching up, which means reading over a lot of material 
that others have already replied to.  So I apologize in advance if I rehash 
anything that's been decided.

Mark Davis <mark dot davis at icu dash project dot org> quoted John Cowan:

>> [...] We had an excellent idea of what 639-3 would both look and actually 
>> be like when 4646 was finalized.  We couldn't include 639-3 or extlangs 
>> because 639-3 itself was not yet final.
> ...
> For example, you can't mean that every member of the working group had 
> looked it over thoroughly, and explored all the implementation 
> ramifications. I'd like to see a show of hands for those who did -- maybe 
> everyone except for me had, but I'd be rather surprised at that.

That isn't fair.  I doubt it could be said of any standard, or 
specification, or protocol, that every conceivable implementation 
ramification has been fully explored.

I agree with John on this point.  We did have an excellent idea of what ISO 
639-3 would be like by this time.  There have been virtually no structural 
changes in 639-3 since Peter originally formulated the extlang concept and 
we discussed it.  Of course somebody will have missed a detail somewhere, 
but we had the general idea.

> After all, it is trivial to make a 4647bis that adds an optional step for 
> microlanguages, which is that when you get to a microlanguage, the next 
> step is to look at its macrolanguage before falling back to the default. 
> That has the same result (and same problems) as extlang, but is something 
> that is not baked into the standard -- is something that people can 
> implement if they want without impacting matching for everyone else.

I can see value in this approach, because I can see value in updating my 
implementation.  But we have shed so much digital blood over Suppress-Script 
on the basis that matching implementations won't be smart enough to match 
"nl-Latn-NL" with "nl-NL", and won't be updated to do so, that I continue 
not to understand why we should assume any will be updated to fall back to 
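
For concreteness, here is a rough sketch of what that optional step might look like on top of RFC 4647-style lookup. The MACROLANGUAGE table and the function names are hypothetical, made up for illustration; real data would have to come from the ISO 639-3 macrolanguage mappings, and nothing here is proposed text.

```python
# Hypothetical table: encompassed language -> macrolanguage.
# A real implementation would load this from ISO 639-3 data.
MACROLANGUAGE = {"yue": "zh", "cmn": "zh"}


def fallback_chain(language_range, use_macrolanguage=False):
    """Return the candidates tried for a range, most specific first."""
    subtags = language_range.lower().split("-")
    chain = []
    while subtags:
        chain.append("-".join(subtags))
        subtags.pop()
        # RFC 4647 lookup removes a single-character subtag together
        # with the subtag that followed it.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    if use_macrolanguage:
        # Optional extra step: after the bare language subtag, try its
        # macrolanguage before giving up and using the default.
        macro = MACROLANGUAGE.get(chain[-1])
        if macro is not None:
            chain.append(macro)
    return chain


def lookup(language_range, available, default, use_macrolanguage=False):
    """Pick the best available tag for a range, or the default."""
    by_lower = {tag.lower(): tag for tag in available}
    for candidate in fallback_chain(language_range, use_macrolanguage):
        if candidate in by_lower:
            return by_lower[candidate]
    return default
```

Without the extra step, a request for "yue-HK" against content tagged "zh" and "en" falls straight through to the default; with it, "zh" is found. That only helps where the matching implementation has actually been updated.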

I have a real problem with the idea that tag producers should be encouraged 
to tag Cantonese as "yue", thereby drastically reducing the likelihood that 
tag consumers searching for "zh" or "zh-yue" will find it.  The outcome of 
such a scenario will be that tag producers will be reluctant to use the new 
4646bis subtags in general, and may ignore other changes from 4646 to 

>> I realize that this cause is probably not important to you, because you 
>> can (comparatively) easily change all "zh-yue" tags to "yue", but this is 
>> not the case for other users of BCP 47 on and off the Internet, who will 
>> never even hear about the change.
> These are irregular tags anyway, and can stay irregular tags afterwards. 
> We and everyone else already have to deal with equivalences with 
> grandfathered and irregular tags anyway; these are not a real problem.

I feel as though we are recanting what was said on ietf-languages at the 
time "zh-cmn" and friends were registered, that they would eventually become 
generative tags and wouldn't add to the grandfathered-redundant slag heap.

> If you can't make a compelling case that extlang will make BCP 47 better 
> instead of worse, and won't even look at the reasons not to do it, nor 
> even bother to set out a case for it, then why should we add it?

My case for extlang is that it allows simplistic RFR matching engines to 
retrieve "zh-yue" content when handed "zh", but still identifies the content 
as Cantonese for more sophisticated matching engines that can take advantage 
of extlang.  It puts the additional burden of identifying the content on the 
tag producer, where it belongs, not on the tag consumer.  The 
Suppress-Script argument shows that we don't trust tag consumers to be very 
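
To illustrate the difference, here is a minimal sketch of prefix matching in the style of RFC 4647 basic filtering, which is what I am assuming a simplistic engine does. The function name and the sample tags are hypothetical, chosen only for this example.

```python
def basic_filter(language_range, tags):
    """Return the tags matched by a basic language range.

    A range matches a tag if the two are equal (ignoring case) or if
    the tag begins with the range followed by a hyphen.
    """
    wanted = language_range.lower()
    if wanted == "*":
        return list(tags)
    return [
        tag
        for tag in tags
        if tag.lower() == wanted or tag.lower().startswith(wanted + "-")
    ]


# Sample content tags, assumed for illustration only.
content = ["zh-yue", "yue", "zh-Hant", "en"]

chinese = basic_filter("zh", content)     # finds "zh-yue", misses bare "yue"
cantonese = basic_filter("yue", content)  # finds bare "yue", misses "zh-yue"
```

Content tagged "zh-yue" is retrieved for "zh" with no knowledge of macrolanguages on the consumer's side, while content tagged bare "yue" is invisible to the same request.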

>>> B. (optional) Add a field Macrolanguage: to the language subtag 
>>> registry.
>> I am not opposed to this, precisely because encompassed languages and the 
>> corresponding macrolanguage cannot be identified syntactically.
> Good.

But there is no sense in having a Macrolanguage field unless we define a 
specific use for this information in 4646bis.  Just saying "this language 
has a macrolanguage" (or alternatively, "this is a macrolanguage that has 
this list of encompassed languages") doesn't add anything to tagging.  Some 
months ago I suggested a Language-Type field that would reflect the ISO 
639-3 classifications, and several people pointed out that it didn't add 
anything to tagging.

Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
