[Ltru] extlang (was: Punjabi)

"Mark Davis" <mark.davis@icu-project.org> Sat, 17 March 2007 18:49 UTC

Message-ID: <30b660a20703171149i47d09580w126aeb3f9feb8fdf@mail.gmail.com>
Date: Sat, 17 Mar 2007 11:49:13 -0700
From: Mark Davis <mark.davis@icu-project.org>
To: Doug Ewell <dewell@adelphia.net>
Subject: [Ltru] extlang (was: Punjabi)
Cc: LTRU Working Group <ltru@ietf.org>

(I agree with Frank, we should change the subject line.)

Let me try another way to state it. We've been looking into how to implement
extlang at Google, and it has a number of complications when you start to
look at actual implementations. We want the default fallback to work well,
but using the extlang model we get the equivalent of the "suppress script"
problem.

Right now, 99% of the usage of zh means Mandarin, and 99% of the usage of ar
means standard Arabic (fuṣḥā). If we introduce zh-cmn and ar-arb, it
complicates lookup considerably. Compatibility forces us to treat "ar" and
"ar-arb" as equivalents, which means we already have non-trivial lookup,
since we have to have all child languages match, e.g. ar-arb-Arab-EG and
ar-Arab-EG. Encoding arq as ar-arq to do a bit of magic in fallback
isn't worth the trouble introduced by that. It may well be that arb is the
best fallback for arq, but it may also be that nb is the best fallback for
da, which we don't try to incorporate in the language tags themselves. That
is, trying to guess the best fallback AND bake it into the encoding of the
tags themselves, in the case of macrolanguages, is simply a complication for
little benefit.
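
To make the lookup complication concrete, here is a minimal sketch, not anything we have actually deployed. The table DEFAULT_EXTLANG and the functions normalize, prefix_match and match are hypothetical names invented for illustration. It assumes that a bare "ar" or "zh" is to be treated as equivalent to "ar-arb" or "zh-cmn", and shows that plain prefix matching is no longer enough once that equivalence is required.

```python
# Hypothetical table: macrolanguage -> the extlang that a bare tag is assumed to mean.
DEFAULT_EXTLANG = {"ar": "arb", "zh": "cmn"}


def normalize(tag):
    """Drop the assumed-default extlang, so 'ar-arb-Arab-EG' becomes 'ar-Arab-EG'."""
    subtags = tag.split("-")
    if len(subtags) > 1 and DEFAULT_EXTLANG.get(subtags[0].lower()) == subtags[1].lower():
        del subtags[1]
    return "-".join(subtags)


def prefix_match(range_, tag):
    """Plain prefix filtering: the range equals the tag or is a prefix ending at a '-'."""
    r, t = range_.lower(), tag.lower()
    return t == r or t.startswith(r + "-")


def match(range_, tag):
    """Prefix filtering after both sides have been normalized."""
    return prefix_match(normalize(range_), normalize(tag))
```

With plain prefix_match, the range "ar-Arab" does not match "ar-arb-Arab-EG", and "ar-arb" does not match "ar-Arab-EG". Both match only once the extra normalization step is applied, and that step has to be carried by every implementation that does lookup.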

At this point, we'd be better off adding information about related languages
to the registry than using the extlang structure. That is, add "cmn-CN",
because we need to have it, and have an extra field to say that this is
related in a certain way to "zh", information that can be used in matching.
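
As a rough sketch of what matching with such a field might look like: the RELATED_TO table below stands in for the extra registry field, and its name and contents are assumptions for illustration only, as are the function names. The content list is the three tagged items from the quoted message below.

```python
# Hypothetical stand-in for the extra registry field:
# language subtag -> the language it is related to.
RELATED_TO = {"cmn": "zh", "yue": "zh", "arb": "ar", "arq": "ar"}


def basic_filter(range_, tag):
    """Plain prefix filtering: the range equals the tag or is a prefix ending at a '-'."""
    r, t = range_.lower(), tag.lower()
    return t == r or t.startswith(r + "-")


def related_filter(range_, tag):
    """Basic filtering, then one retry with the tag's language swapped for its related language."""
    if basic_filter(range_, tag):
        return True
    lang, _, rest = tag.partition("-")
    related = RELATED_TO.get(lang.lower())
    if related is None:
        return False
    widened = related + ("-" + rest if rest else "")
    return basic_filter(range_, widened)


content = ["zh-Hant-HK", "zh-Hans-HK", "yue-SG"]
basic_zh = [t for t in content if basic_filter("zh", t)]
smart_zh = [t for t in content if related_filter("zh", t)]
```

Here basic_zh holds the two zh-prefixed tags, while smart_zh also picks up yue-SG. A search for zh-Hant returns only zh-Hant-HK under either filter, and a search for yue returns only yue-SG. The tags themselves stay simple, and the relationship lives in data that a matcher can choose to use or ignore.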

Mark

On 3/17/07, Doug Ewell <dewell@adelphia.net> wrote:
>
> Mark Davis wrote:
>
> > Let's suppose that I have content tagged with the following:
> >
> > #1 zh-Hant-HK
> > #2 zh-Hans-HK
> > #3 yue-SG
> >
> > If I basic filter for zh, I'll get #1 and #2. With just a bit smarter
> > filter, using information from ISO 639 (maybe put into our registry),
> > I'll also match #3. If I search with zh-Hant, I'll get #1 in either
> > case. If I search for yue, I'll get just #3.
>
> Wait a minute.  You've been arguing that we need to put writing-system
> differences in the script subtag and not in the variant, because filters
> will be too stupid to do anything more than prefix matching.  Then why
> would we design for the possibility of a "smarter filter" that could
> associate "yue" with "zh"?
>
> --
> Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
> http://users.adelphia.net/~dewell/
> http://www1.ietf.org/html.charters/ltru-charter.html
> http://www.alvestrand.no/mailman/listinfo/ietf-languages
>
>


-- 
Mark