[Ltru] extlang (was: Punjabi)

(I agree with Frank, we should change the subject line.)

Let me try another way to state it. We've been looking into how to implement
extlang at Google, and it has a number of complications when you start to
look at actual implementations. We want the default fallback to work well,
but using the extlang model we get the equivalent of the "suppress script"
problem.

Right now, 99% of the usage of zh means Mandarin, and 99% of the usage of ar
means Standard Arabic (fuṣḥā). If we introduce zh-cmn and ar-arb, it
complicates lookup considerably. Compatibility forces us to treat "ar" and
"ar-arb" as equivalents, which means we already have non-trivial lookup,
since every longer tag has to match its counterpart, e.g. ar-arb-Arab-EG and
ar-Arab-EG, etc. Encoding arq as ar-arq to do a bit of magic in fallback
isn't worth the trouble it introduces. It may well be that arb is the
best fallback for arq, but it may also be that nb is the best fallback for
da, and we don't try to encode that in the language tags themselves. That
is, trying to guess the best fallback AND bake it into the encoding of the
tags themselves, just for the case of macrolanguages, is a complication
with little benefit.
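To make the compatibility cost concrete, here is a minimal sketch in Python (not taken from any actual implementation). It assumes a hypothetical DEFAULT_EXTLANG table in which cmn and arb are the default extlangs for zh and ar; every comparison then has to pass through a canonicalization step before ar-arb-Arab-EG and ar-Arab-EG can be treated as the same tag.

```python
# Hypothetical table: under the extlang model, these extlangs would be the
# "default" member of their macrolanguage, so "zh-cmn" == "zh" and
# "ar-arb" == "ar" for compatibility with existing content.
DEFAULT_EXTLANG = {"zh": "cmn", "ar": "arb"}


def canonicalize(tag: str) -> str:
    """Lowercase a tag and drop an extlang that only restates the default."""
    subtags = tag.lower().split("-")
    # In the extlang model, a 3-letter alphabetic subtag right after the
    # primary language subtag is an extlang.
    if (
        len(subtags) > 1
        and len(subtags[1]) == 3
        and subtags[1].isalpha()
        and DEFAULT_EXTLANG.get(subtags[0]) == subtags[1]
    ):
        del subtags[1]
    return "-".join(subtags)


def tags_equivalent(a: str, b: str) -> bool:
    """Compare two tags; without canonicalizing, ar-arb-* and ar-* diverge."""
    return canonicalize(a) == canonicalize(b)
```

Only the default extlang is dropped here; ar-arq stays distinct from ar, so the extra step buys equivalence for the common case and nothing more.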

At this point, we'd be better off adding information to the registry about
related languages than using the extlang structure. That is, add
"cmn-CN", because we need to have it, and have an extra field saying that
it is related in a certain way to "zh", information that matchers can use.
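A rough sketch of how such a registry field could feed into matching, assuming a hypothetical RELATED table (the field, its name, and its values are illustrative, not an actual registry format). The candidate list follows the truncation order of RFC 4647 lookup, simplified to ignore wildcards, and only after exhausting the requested tag does it fall back to the related language.

```python
# Hypothetical registry extension: each subtag may name a related language
# and the kind of relation, as an extra field next to the subtag itself.
RELATED = {
    "cmn": ("zh", "macrolanguage"),
    "yue": ("zh", "macrolanguage"),
    "arq": ("ar", "macrolanguage"),
}


def lookup_candidates(language_range: str) -> list[str]:
    """Truncate from the end (RFC 4647 style), then try the related language."""
    subtags = language_range.lower().split("-")
    candidates = []
    while subtags:
        candidates.append("-".join(subtags))
        subtags.pop()
        # Never leave a trailing singleton (e.g. "x") at the end of a candidate.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    # The last candidate is the bare primary language subtag.
    related = RELATED.get(candidates[-1])
    if related is not None:
        candidates.append(related[0])
    return candidates


def lookup(language_range: str, available: set[str], default: str) -> str:
    """Return the first available tag along the fallback chain, else default."""
    lowered = {t.lower(): t for t in available}
    for candidate in lookup_candidates(language_range):
        if candidate in lowered:
            return lowered[candidate]
    return default
```

With this design the tags stay flat (cmn-CN, not zh-cmn-CN), and the relationship to zh is data that a matcher can consult or ignore.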

Mark

On 3/17/07, Doug Ewell <dewell@adelphia.net> wrote:
>
> Mark Davis wrote:
>
> > Let's suppose that I have content tagged with the following:
> >
> > #1 zh-Hant-HK
> > #2 zh-Hans-HK
> > #3 yue-SG
> >
> > If I basic filter for zh, I'll get #1 and #2. With just a bit smarter
> > filter, using information from ISO 639 (maybe put into our registry),
> > I'll also match #3. If I search with zh-Hant, I'll get #1 in either
> > case. If I search for yue, I'll get just #3.
>
> Wait a minute.  You've been arguing that we need to put writing-system
> differences in the script subtag and not in the variant, because filters
> will be too stupid to do anything more than prefix matching.  Then why
> would we design for the possibility of a "smarter filter" that could
> associate "yue" with "zh"?
>
> --
> Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
> http://users.adelphia.net/~dewell/
> http://www1.ietf.org/html.charters/ltru-charter.html
> http://www.alvestrand.no/mailman/listinfo/ietf-languages
>
>
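For reference, the three-item example quoted above can be run through a plain RFC 4647 basic filter and through one hypothetical "smarter" variant. The MACROLANGUAGE table is an assumption standing in for ISO 639-3 macrolanguage data (yue and cmn are both encompassed by zh); it is not an existing registry field.

```python
CONTENT = ["zh-Hant-HK", "zh-Hans-HK", "yue-SG"]

# Hypothetical table derived from ISO 639-3 macrolanguage mappings.
MACROLANGUAGE = {"yue": "zh", "cmn": "zh"}


def basic_match(language_range: str, tag: str) -> bool:
    """RFC 4647 basic filtering: case-insensitive prefix match on subtag boundaries."""
    r, t = language_range.lower(), tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")


def smarter_match(language_range: str, tag: str) -> bool:
    """Also accept tags whose primary language is encompassed by the range's language."""
    if basic_match(language_range, tag):
        return True
    primary, _, rest = tag.lower().partition("-")
    macro = MACROLANGUAGE.get(primary)
    if macro is None:
        return False
    # Rewrite e.g. "yue-sg" as "zh-sg" and retry the prefix match.
    return basic_match(language_range, macro + ("-" + rest if rest else ""))


if __name__ == "__main__":
    for r in ["zh", "zh-Hant", "yue"]:
        print(r, [t for t in CONTENT if basic_match(r, t)],
              [t for t in CONTENT if smarter_match(r, t)])
```

The extra step in smarter_match is a table lookup on the primary subtag, not a prefix comparison, so a filter that does only prefix matching would not return #3 for zh.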
