Re: [Ltru] Re: Macrolanguage and extlang

"Doug Ewell" <> Wed, 18 July 2007 05:41 UTC

Message-ID: <009d01c7c8fe$43f0ee60$6a01a8c0@DGBP7M81>
From: Doug Ewell <>
To: LTRU Working Group <>
Subject: Re: [Ltru] Re: Macrolanguage and extlang
Date: Tue, 17 Jul 2007 22:41:18 -0700

Mark Davis wrote:

> zh = Mandarin since that is what everyone means by "zh" currently.

Hold that thought.

> Scenario 1. The user's browser has the proposed "zh-yue-Hant-US". My 
> lookup falls back to zh, so I serve it up to the user. So even if the 
> target of the match (zh) is not Cantonese, you want a fallback to zh. 
> I'm guessing that you see this as better than if we defined the tag as 
> "yue-Hant-US", since it gets to some fallback that the user is likely 
> to understand. But I don't see this as much different than if we had 
> fr-br-BE (meaning Breton, but fall back to French), or ro-mo (meaning 
> Moldavian, but falling back to Romanian).

You're right: providing a fallback from encompassed languages back to 
their macrolanguages (as defined by Ethnologue) doesn't extend to all 
other imaginable fallback scenarios.  I'm pretty sure nobody ever 
claimed that it would.

> And note that in the fallback, the script and region are completely
> lost.
>
> Scenario 2. The user's browser has zh-cmn-Hant-US. In matching, we
> fall back to zh. Note that in the fallback, the script and region are
> completely lost.

Both of these scenarios assume that script and/or region subtags are
likely to be present.  You haven't addressed my scenario, where they are
not.

> We have essentially just introduced a synonym for zh which causes 
> fallback to lose information, for no good reason.

But according to your earlier statement, "cmn" is already effectively a 
synonym for "zh".

> The problem with extlang is that the fallback from encompassed 
> language to macrolanguage is fundamentally different in kind than a 
> fallback from region to script to base language. In the case of 
> script, like uz-Arab and uz-Latn, or en-US vs en-GB, we really have 
> variations on the same language, and fallback makes sense. We ordered 
> the subtags so that it works optimally overall.


> The encompassed languages, on the other hand, are not just dialects, 
> not just variants. They are languages in their own right.

Agreed, up to a point.  The whole idea of encompassed languages is
that people sometimes consider them to be languages in their own right, 
and sometimes as "dialects" or "variants" of a macrolanguage.  I hate to 
keep harping on Chinese, but try searching on "Chinese dialects" and 
"Chinese languages" and see which search is more likely to tell you more 
about Mandarin vs. Cantonese vs. Wu vs. Hakka.  There are a *lot* of 
people who think of these as variants of a single language.  And the 
same is likely to be true for Standard Arabic vs. Algerian Arabic vs. 
Libyan Arabic vs. Uzbeki Arabic.

> Trying to insert them into the fallback process just screws things up, 
> because they need a "sideways" matching not just simple truncation 
> fallback. If you want to do any fallback with extlang, it would be to 
> fall back from zh-yue-<other stuff> to zh-<other stuff>. That means 
> that in order to do reasonable fallback, you can't just use truncation 
> fallback anyway.

Agreed.  Then again, I'm not the biggest fan of truncation fallback.
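
Mark's "sideways" step, from zh-yue-<other stuff> to zh-<other stuff>, can be sketched as a candidate list. The list first truncates only as far as the extlang, then retries with the extlang removed. `EXTLANGS` is a hypothetical stand-in for registry data. Trying the sideways candidates after "zh-yue" is only one possible ordering, and singleton handling is omitted.

```python
# Hypothetical stand-in for the extlang subtags a registry would define.
EXTLANGS = {"cmn", "yue", "wuu", "hak"}


def truncations(subtags, stop):
    """Joined prefixes of subtags, longest first, down to `stop` subtags."""
    return ["-".join(subtags[:n]) for n in range(len(subtags), stop - 1, -1)]


def sideways_chain(tag):
    subtags = tag.split("-")
    if len(subtags) > 1 and subtags[1].lower() in EXTLANGS:
        # Truncate only as far as primary-extlang (e.g. zh-yue) ...
        chain = truncations(subtags, 2)
        # ... then step sideways: drop the extlang but keep script/region.
        return chain + truncations([subtags[0]] + subtags[2:], 1)
    return truncations(subtags, 1)


def sideways_lookup(tag, available):
    avail = {t.lower(): t for t in available}
    for candidate in sideways_chain(tag):
        if candidate.lower() in avail:
            return avail[candidate.lower()]
    return None


print(sideways_chain("zh-yue-Hant-US"))
print(sideways_lookup("zh-yue-Hant-US", ["zh", "zh-Hant", "en"]))
```

Plain truncation of the same tag against the same list would stop at "zh". The sideways step keeps "Hant", which is exactly the information that Mark points out is otherwise lost.
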

> So I see the situation this way:
> * The only reason for adding the complication of the extlang mechanism 
> is to make truncation fallback work better.
> * Truncation fallback with extlang doesn't work better.

I claim it does work better if there are no script or region or variant 
subtags, and is no worse even if there are.

> So there is no need to make encompassed languages be "secondary" 
> languages by making them be "secondary" subtags.

I do agree that there may be a public perception that extended languages
are second-class in some way, although of course that is not the intent.

> So instead of adding the extlang mechanism to RFC 4646, what we really 
> need to do is to point people to how to handle yue and other 
> encompassed languages along with mo/ro, tl/fil, and other edge cases 
> in a reasonable way, by augmenting matching.

Well, you'll never get me to disagree that matching should be more 
sophisticated than RFR truncation.  But I hope you're not suggesting 
that we add an RFC 4647bis to our plate at this late date.  Look how 
much time we burned on RFC 4647, and look at the end result: we're still 
designing everything under the assumption that matching engines will be 
limited to RFR.
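
Mark's alternative can be sketched as follows: leave encompassed languages as ordinary primary subtags and augment matching instead. The `RELATED` table is entirely hypothetical and is not taken from any registry or from RFC 4647. Once the plain truncation candidates are exhausted, the matcher substitutes a related primary language and keeps the remaining subtags.

```python
# Hypothetical related-language table; not taken from any registry.
RELATED = {"yue": "zh", "cmn": "zh", "mo": "ro", "fil": "tl"}


def truncations(subtags):
    """Joined prefixes of subtags, longest first."""
    return ["-".join(subtags[:n]) for n in range(len(subtags), 0, -1)]


def augmented_lookup(tag, available):
    avail = {t.lower(): t for t in available}
    subtags = tag.split("-")
    candidates = truncations(subtags)
    related = RELATED.get(subtags[0].lower())
    if related:
        # Swap in the related primary language but keep script, region, etc.
        candidates += truncations([related] + subtags[1:])
    for candidate in candidates:
        if candidate.lower() in avail:
            return avail[candidate.lower()]
    return None


print(augmented_lookup("yue-Hant-US", ["zh-Hant", "en"]))  # zh-Hant
print(augmented_lookup("mo", ["ro", "en"]))  # ro
```

Under these assumptions, "yue-Hant-US" reaches "zh-Hant" with no extlang subtag. The cost is that the related-language table has to be defined and maintained outside the tag itself.
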

Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
