[Ltru] extlang (was: Punjabi)
"Mark Davis" <mark.davis@icu-project.org> Sat, 17 March 2007 18:49 UTC
Message-ID: <30b660a20703171149i47d09580w126aeb3f9feb8fdf@mail.gmail.com>
Date: Sat, 17 Mar 2007 11:49:13 -0700
From: Mark Davis <mark.davis@icu-project.org>
To: Doug Ewell <dewell@adelphia.net>
Subject: [Ltru] extlang (was: Punjabi)
Cc: LTRU Working Group <ltru@ietf.org>
(I agree with Frank, we should change the subject line.)

Let me try another way to state it. We've been looking into how to implement extlang at Google, and it has a number of complications when you start to look at actual implementations.

We want the default fallback to work well, but using the extlang model we get the equivalent of the "suppress script" problem. Right now, 99% of the usage of zh means Mandarin, and 99% of the usage of ar means standard Arabic (fuṣḥā). If we introduce zh-cmn and ar-arb, it complicates lookup considerably. Compatibility forces us to treat "ar" and "ar-arb" as equivalents, which means we already have non-trivial lookup, since we have to have all child languages match, e.g. ar-arb-Arab-EG and ar-Arab-EG, etc.

Encoding arq as ar-arq to do a bit of magic in fallback isn't worth the trouble it introduces. It may well be that arb is the best fallback for arq, but it may also be that nb is the best fallback for da, which we don't try to incorporate in the language tags themselves. That is, trying to guess what the best fallback is AND bake it into the encoding of the tags themselves for the case of macro languages is simply a complication for little benefit.

At this point, we'd be better off adding information to the registry about related languages than using the extlang structure. That is, add "cmn-CN", because we need to have it, and have an extra field to say that this is related in a certain way to "zh", information that can be used in matching.

Mark

On 3/17/07, Doug Ewell <dewell@adelphia.net> wrote:
>
> Mark Davis wrote:
>
> > Let's suppose that I have content tagged with the following:
> >
> > #1 zh-Hant-HK
> > #2 zh-Hans-HK
> > #3 yue-SG
> >
> > If I basic filter for zh, I'll get #1 and #2. With just a bit smarter
> > filter, using information from ISO 639 (maybe put into our registry),
> > I'll also match #3. If I search with zh-Hant, I'll get #1 in either
> > case. If I search for yue, I'll get just #3.
>
> Wait a minute. You've been arguing that we need to put writing-system
> differences in the script subtag and not in the variant, because filters
> will be too stupid to do anything more than prefix matching. Then why
> would we design for the possibility of a "smarter filter" that could
> associate "yue" with "zh"?
>
> --
> Doug Ewell * Fullerton, California, USA * RFC 4645 * UTN #14
> http://users.adelphia.net/~dewell/
> http://www1.ietf.org/html.charters/ltru-charter.html
> http://www.alvestrand.no/mailman/listinfo/ietf-languages
>
>

--
Mark
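
For illustration only, here is a minimal sketch of the two kinds of filter discussed in this message: plain prefix matching, and a slightly smarter filter that consults a registry-style "related language" field. The table `MACROLANGUAGE`, its entries, and both function names are assumptions made up for this sketch; they are not the registry's actual format or any existing library's API.

```python
# Hypothetical registry excerpt (an assumption for this sketch): maps a
# language subtag to the macrolanguage it is related to.
MACROLANGUAGE = {"cmn": "zh", "yue": "zh", "arb": "ar", "arq": "ar"}


def basic_filter(language_range, tag):
    """Return True if the range is a whole-subtag prefix of the tag."""
    wanted = language_range.lower().split("-")
    subtags = tag.lower().split("-")
    return subtags[:len(wanted)] == wanted


def related_filter(language_range, tag):
    """Prefix match, retried with the tag's macrolanguage substituted in."""
    if basic_filter(language_range, tag):
        return True
    subtags = tag.lower().split("-")
    macro = MACROLANGUAGE.get(subtags[0])
    if macro is None:
        return False
    # Replace the primary language subtag with its macrolanguage and retry.
    return basic_filter(language_range, "-".join([macro] + subtags[1:]))


content = ["zh-Hant-HK", "zh-Hans-HK", "yue-SG"]

basic_zh = [t for t in content if basic_filter("zh", t)]
smart_zh = [t for t in content if related_filter("zh", t)]
smart_zh_hant = [t for t in content if related_filter("zh-Hant", t)]
smart_yue = [t for t in content if related_filter("yue", t)]
```

With the three example tags from the quoted message, the basic filter for "zh" selects #1 and #2, the related filter for "zh" also selects #3, "zh-Hant" selects only #1 under either filter, and "yue" selects only #3. The relation lives in a side table rather than in the tag itself, which is the design choice argued for above.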