Re: [Ltru] Re: Macrolanguage and extlang
"Mark Davis" <mark.davis@icu-project.org> Tue, 17 July 2007 19:19 UTC
Message-ID: <30b660a20707171219q4c824654h7ad9063f23ba26ad@mail.gmail.com>
Date: Tue, 17 Jul 2007 12:19:46 -0700
From: Mark Davis <mark.davis@icu-project.org>
To: Doug Ewell <dewell@roadrunner.com>
Subject: Re: [Ltru] Re: Macrolanguage and extlang
In-Reply-To: <00d701c7c738$841e6930$6a01a8c0@DGBP7M81>
MIME-Version: 1.0
References: <E1I9ghp-0006L9-0h@megatron.ietf.org> <013b01c7c6a8$55cb4a20$6401a8c0@DGBP7M81> <20070715152301.GY9402@mercury.ccil.org> <30b660a20707151612k14b1e578q7cc7887c68ccc785@mail.gmail.com> <00d701c7c738$841e6930$6a01a8c0@DGBP7M81>
Cc: LTRU Working Group <ltru@ietf.org>
I just find your position hard to understand; perhaps you can help me. Maybe some scenarios would help. Suppose that:

- My site has support for zh, zh-Hans, and zh-Hant. zh = Mandarin, since that is what everyone means by "zh" currently. As is customary in fallback, the content of zh is the predominant form (zh-Hans). [This is just for example; if a TW site has a different convention, alternate examples can be given.]
- A user comes in with different requests, listed below.

Scenario 1. The user's browser has the proposed "zh-yue-Hant-US". My lookup falls back to zh, so I serve it up to the user.

So even if the target of the match (zh) is not Cantonese, you want a fallback to zh. I'm guessing that you see this as better than if we defined the tag as "yue-Hant-US", since it gets to some fallback that the user is likely to understand. But I don't see this as much different than if we had fr-br-BE (meaning Breton, but falling back to French), or ro-mo (meaning Moldavian, but falling back to Romanian). And note that in the fallback, the script and region are completely lost.

Scenario 2. The user's browser has zh-cmn-Hant-US. In matching, we fall back to zh. Note that in the fallback, the script and region are completely lost. We have essentially just introduced a synonym for zh which causes fallback to lose information, for no good reason.

The problem with extlang is that the fallback from encompassed language to macrolanguage is fundamentally different in kind than a fallback from region to script to base language. In the case of script (uz-Arab vs. uz-Latn) or region (en-US vs. en-GB), we really have variations on the same language, and fallback makes sense. We ordered the subtags so that it works optimally overall. The encompassed languages, on the other hand, are not just dialects, not just variants. They are languages in their own right.
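As a rough illustration of the two scenarios above, here is a minimal Python sketch of remove-from-right (truncation) fallback. It is a simplification, not the full RFC 4647 lookup algorithm, and the names truncation_fallback, lookup, and SUPPORTED are assumptions made for this example only.

```python
def truncation_fallback(tag):
    """Yield the tag, then each shorter prefix obtained by dropping the last subtag."""
    subtags = tag.split("-")
    while subtags:
        yield "-".join(subtags)
        subtags.pop()


def lookup(requested, supported):
    """Return the first truncated form of `requested` present in `supported`, else None."""
    available = {s.lower(): s for s in supported}
    for candidate in truncation_fallback(requested):
        match = available.get(candidate.lower())
        if match is not None:
            return match
    return None


# Hypothetical site from the scenarios: zh, zh-Hans, and zh-Hant are supported.
SUPPORTED = ["zh", "zh-Hans", "zh-Hant"]

for request in ("zh-yue-Hant-US", "zh-cmn-Hant-US", "zh-Hant-US", "yue-Hant-US"):
    print(request, "->", lookup(request, SUPPORTED))
```

With this sketch, both "zh-yue-Hant-US" and "zh-cmn-Hant-US" end up at plain "zh", because the extlang sits between the language and the script, so "Hant" is discarded before any supported tag is reached. "zh-Hant-US" reaches "zh-Hant", and "yue-Hant-US" finds nothing at all.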

Trying to insert them into the fallback process just screws things up, because they need a "sideways" matching, not just simple truncation fallback. If you want to do any fallback with extlang, it would be to fall back from zh-yue-<other stuff> to zh-<other stuff>. That means that in order to do reasonable fallback, you can't just use truncation fallback anyway.

So I see the situation this way:

1. The only reason for adding the complication of the extlang mechanism is to make truncation fallback work better.
2. Truncation fallback with extlang doesn't work better.
3. So there is no need to make encompassed languages be "secondary" languages by making them be "secondary" subtags.

The goals of extlang are good, to make matching work better, but in practice it just makes things worse. [Speaking to those familiar with C++, it feels a bit like the default assignment operator in C++. Nice in theory, but in practice it gums things up more than it fixes, since once you are beyond very simple (toy) classes, the default is almost always wrong -- but because it is supplied behind your back you don't realize it.]

So instead of adding the extlang mechanism to RFC 4646, what we really need to do is to point people to how to handle yue and other encompassed languages along with mo/ro, tl/fil, and other edge cases in a reasonable way, by augmenting matching.

Mark

On 7/15/07, Doug Ewell <dewell@roadrunner.com> wrote:
>
> Mark Davis wrote:
>
> > The main argument I've heard for extlang is behind-the-scenes-inertia.
> > While we made provision in 4646 for possibly accepting them in the
> > future, it was by no means a done-deal.
>
> The main argument that has been offered is that it makes matching
> easier. Whether that argument was heard is a different question.
>
> > The only reason I've heard advocated for them is that it makes
> > matching easier. But in practice, we have simply not found that to be
> > true. If it is indeed worthwhile to add this mechanism to 4646, a
> > good case needs to be made for it; and inertia isn't a good case.
>
> Here is the argument restated. It is based not on behind-the-scenes
> inertia, but on backward compatibility, the exact same issue that caused
> us to adopt Suppress-Script.
>
> Existing Cantonese text has been tagged as "zh", the basic ISO
> 639-1-based tag, or as "zh-yue", the tag that was registered for this
> purpose back in 1999.
>
> The extlang mechanism would have established "yue" as an extlang under
> "zh", so the proper tagging of Cantonese would continue to be "zh" (more
> general) or "zh-yue" (more specific). Matching engines would continue
> to operate as they do now.
>
> The proposed mechanism establishes "yue" as a primary language subtag,
> so the proper tagging of Cantonese becomes "yue". Matching engines must
> be upgraded to RFC 4646bis in order to have any chance at finding a
> match. The much-beloved and much-catered-to RFC 3066 remove-from-right
> "fallback" algorithm will NEVER find a match between "yue" and "zh",
> regardless of whether script and/or region subtags are involved.
>
> Put another way: Extlangs may not make matching easier, but *not* having
> extlangs will make matching harder.
>
> --
> Doug Ewell * Fullerton, California, USA * RFC 4645 * UTN #14
> http://users.adelphia.net/~dewell/
> http://www1.ietf.org/html.charters/ltru-charter.html
> http://www.alvestrand.no/mailman/listinfo/ietf-languages

--
Mark
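The "sideways" matching suggested in the message, and the quoted objection that plain remove-from-right never connects "yue" with "zh", can be sketched as follows. This is only one possible way to augment matching, written in Python under stated assumptions: the SIDEWAYS table and the function names are hypothetical, and the language pairs are taken from the examples mentioned in this thread rather than from any registry.

```python
# Hypothetical "sideways" table: primary language subtag -> related language to try.
# Pairs come from the examples in this thread; a real table would need curating.
SIDEWAYS = {"yue": "zh", "cmn": "zh", "mo": "ro", "tl": "fil"}


def truncations(subtags):
    """Return remove-from-right prefixes of a subtag list, longest first."""
    return ["-".join(subtags[:n]) for n in range(len(subtags), 0, -1)]


def candidates(tag):
    """List tags to try: plain truncation first, then the sideways language."""
    subtags = tag.split("-")
    result = truncations(subtags)
    primary = subtags[0].lower()
    if primary in SIDEWAYS:
        # Swap the language but keep script and region: yue-Hant-US -> zh-Hant-US.
        result += truncations([SIDEWAYS[primary]] + subtags[1:])
    return result


def augmented_lookup(requested, supported):
    """Return the first candidate for `requested` present in `supported`, else None."""
    available = {s.lower(): s for s in supported}
    for candidate in candidates(requested):
        if candidate.lower() in available:
            return available[candidate.lower()]
    return None


print(candidates("yue-Hant-US"))
print(augmented_lookup("yue-Hant-US", ["zh", "zh-Hans", "zh-Hant"]))  # zh-Hant
```

In this sketch "yue-Hant-US" lands on "zh-Hant", so the script survives the fallback, whereas truncating "zh-yue-Hant-US" would land on bare "zh". Trying every truncation of the original tag before the sideways language is a design choice of the sketch; an implementation could just as well interleave the two lists.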