Re: [Ltru] extlang (was: Punjabi)

Gerard Meijssen <gerard.meijssen@gmail.com> Sun, 18 March 2007 12:16 UTC

Message-ID: <45FD2D84.5050007@gmail.com>
Date: Sun, 18 Mar 2007 13:16:04 +0100
From: Gerard Meijssen <gerard.meijssen@gmail.com>
User-Agent: Thunderbird 1.5.0.10 (Windows/20070221)
MIME-Version: 1.0
To: Mark Davis <mark.davis@icu-project.org>
Subject: Re: [Ltru] extlang (was: Punjabi)
References: <30b660a20703171149i47d09580w126aeb3f9feb8fdf@mail.gmail.com>
In-Reply-To: <30b660a20703171149i47d09580w126aeb3f9feb8fdf@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"; format="flowed"
Content-Transfer-Encoding: 8bit
Cc: Doug Ewell <dewell@adelphia.net>, LTRU Working Group <ltru@ietf.org>

Hi,
Modern computers are quite capable of the "non-trivial" lookup that you
refer to. The reason we should allow both zh-cmn and cmn is that it is
much easier to explain to *people*. I utterly disagree that having both
options is of little benefit. It helps people understand the codes. We
give meaning to these codes, and it is extremely hard to explain how
they work. For people it makes a hell of a difference. You can explain
that zh is the old code and that, as a consequence, Mandarin can be
tagged as either zh-cmn or cmn. People will opt for cmn if they have
the option, and they will be doing it right. Without that option they
will still opt for cmn, and you would call them wrong.

We get much better adoption by being understood. That is more important
than making sure computers know that two codes are equivalent. Computers
are good at that!
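
To make the equivalence point concrete, here is a minimal sketch of how
software could treat zh-cmn and cmn as the same tag. The table
MACROLANGUAGE_MEMBERS and the functions canonicalize() and equivalent()
are hypothetical names invented for this illustration; they are not part
of any registry format or draft, and the table holds only the codes
mentioned in this thread.

```python
# Hypothetical, illustrative subset: macrolanguage -> encompassed languages.
# A real implementation would load this from registry data.
MACROLANGUAGE_MEMBERS = {
    "zh": {"cmn", "yue"},
    "ar": {"arb", "arq"},
}


def canonicalize(tag: str) -> str:
    """Collapse an extlang-style tag such as 'zh-cmn-CN' to 'cmn-CN'.

    Tags without a known macrolanguage prefix are returned unchanged,
    apart from lowercasing the primary language subtag.
    """
    subtags = tag.split("-")
    if len(subtags) >= 2:
        primary, second = subtags[0].lower(), subtags[1].lower()
        if second in MACROLANGUAGE_MEMBERS.get(primary, set()):
            # Drop the macrolanguage prefix; the member code stands alone.
            subtags = subtags[1:]
    subtags[0] = subtags[0].lower()
    return "-".join(subtags)


def equivalent(a: str, b: str) -> bool:
    """Compare two tags case-insensitively after canonicalization."""
    return canonicalize(a).lower() == canonicalize(b).lower()


print(canonicalize("zh-cmn-Hans-CN"))
print(equivalent("zh-cmn", "cmn"))
```

This only covers the two spellings of the same language. It does not
attempt the compatibility case Mark describes, where a bare "ar" also has
to match "ar-arb".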

Thanks,
    Gerard


Mark Davis wrote:
> (I agree with Frank, we should change the subject line.)
>
> Let me try another way to state it. We've been looking into how to 
> implement extlang at Google, and it has a number of complications when 
> you start to look at actual implementations. We want the default 
> fallback to work well, but using the extlang model we get the 
> equivalent of the "suppress script" problem.
>
> Right now, 99% of the usage of zh means Mandarin, 99% of the usage of 
> ar means standard Arabic (fuṣḥā). If we introduce zh-cmn and ar-arb it 
> complicates lookup considerably. Compatibility forces us to treat "ar" 
> and "ar-arb" as equivalents, which means we already have non-trivial 
> lookup, since we have to have all child languages match, e.g. 
> ar-arb-Arab-EG and ar-Arab-EG, etc. Encoding arq as ar-arq to do a bit 
> of magic in fallback isn't worth the trouble introduced by that. It 
> may well be that arb is the best fallback for arq, but it may also be 
> that nb is the best fallback for da, which we don't try to incorporate 
> in the language tags themselves. That is, trying to guess what the 
> best fallback is AND bake it into the encoding of the tags itself for 
> the case of macro languages is simply a complication for little benefit.
>
> At this point, we'd be better off to add information to the registry 
> about related languages than we would to use the extlang structure. 
> That is, add "cmn-CN", because we need to have it, and have an extra 
> field to say that this is related in a certain way to "zh", 
> information that can be used in matching.
>
> Mark
>
> On 3/17/07, Doug Ewell <dewell@adelphia.net> wrote:
>
>     Mark Davis wrote:
>
>     > Let's suppose that I have content tagged with the following:
>     >
>     > #1 zh-Hant-HK
>     > #2 zh-Hans-HK
>     > #3 yue-SG
>     >
>     > If I basic filter for zh, I'll get #1 and #2. With just a bit
>     > smarter filter, using information from ISO 639 (maybe put into
>     > our registry), I'll also match #3. If I search with zh-Hant,
>     > I'll get #1 in either case. If I search for yue, I'll get just #3.
>
>     Wait a minute.  You've been arguing that we need to put writing-system
>     differences in the script subtag and not in the variant, because
>     filters will be too stupid to do anything more than prefix matching.
>     Then why would we design for the possibility of a "smarter filter"
>     that could associate "yue" with "zh"?
>
>     --
>     Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
>     http://users.adelphia.net/~dewell/
>     http://www1.ietf.org/html.charters/ltru-charter.html
>     http://www.alvestrand.no/mailman/listinfo/ietf-languages
>
>
>
>
> -- 
> Mark
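
The filtering example quoted above (content tagged zh-Hant-HK,
zh-Hans-HK and yue-SG) can be sketched as follows. This is only an
illustration under stated assumptions: RELATED_TO stands in for the
"extra field" in the registry that Mark proposes, and basic_filter() and
smart_filter() are hypothetical helper names, not an existing API.

```python
# Hypothetical stand-in for a registry field that relates a language
# to its macrolanguage (only codes from this thread are listed).
RELATED_TO = {"yue": "zh", "cmn": "zh"}


def basic_filter(language_range: str, tags: list[str]) -> list[str]:
    """Plain prefix matching on whole subtags, case-insensitive."""
    prefix = language_range.lower()
    return [
        t for t in tags
        if t.lower() == prefix or t.lower().startswith(prefix + "-")
    ]


def smart_filter(language_range: str, tags: list[str]) -> list[str]:
    """Prefix matching that also tries the related macrolanguage."""
    matched = []
    for tag in tags:
        subtags = tag.split("-")
        candidates = [tag]
        macro = RELATED_TO.get(subtags[0].lower())
        if macro is not None:
            # Also test the tag as if it began with the macrolanguage.
            candidates.append("-".join([macro] + subtags[1:]))
        if any(basic_filter(language_range, [c]) for c in candidates):
            matched.append(tag)
    return matched


content = ["zh-Hant-HK", "zh-Hans-HK", "yue-SG"]
print(basic_filter("zh", content))
print(smart_filter("zh", content))
print(smart_filter("zh-Hant", content))
print(smart_filter("yue", content))
```

With this sketch the basic filter for zh returns only the first two
tags, the smarter filter also returns yue-SG, zh-Hant returns the first
tag in either case, and yue returns just yue-SG, which matches the
behaviour described in the quoted example.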

