Re: [Ltru] Re: extlang

Addison Phillips <addison@yahoo-inc.com> Thu, 30 August 2007 17:13 UTC

Message-ID: <46D6FAA3.2030805@yahoo-inc.com>
Date: Thu, 30 Aug 2007 10:13:07 -0700
From: Addison Phillips <addison@yahoo-inc.com>
To: Peter Constable <petercon@microsoft.com>
Subject: Re: [Ltru] Re: extlang
References: <30b660a20708281459r6000d746qe007f2882fae6d73@mail.gmail.com> <20070828223536.GB31670@mercury.ccil.org> <30b660a20708281812s3401e193u7c90d3ab22ac3eda@mail.gmail.com> <DDB6DE6E9D27DD478AE6D1BBBB83579561ABDC7644@NA-EXMSG-C117.redmond.corp.microsoft.com>
In-Reply-To: <DDB6DE6E9D27DD478AE6D1BBBB83579561ABDC7644@NA-EXMSG-C117.redmond.corp.microsoft.com>
Cc: LTRU Working Group <ltru@ietf.org>

Peter Constable wrote:
>  These are the cases that relate to the
> way language-range works.
> 

Not exactly. Language-ranges are the expression of the preference. While 
HTTP 1.1 and others didn't put a lot of structure around language 
negotiation, we spent some time doing so in RFC 4647.

HTTP 1.1 and other, similar specifications defined what we now call
"basic filtering". That's what Peter means by the above. There is also
"lookup", which is widely implemented (note well that I do not say "more
widely").
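
As a rough illustration of the two mechanisms, here is a minimal Python sketch of basic filtering (RFC 4647 section 3.3.1) and lookup (section 3.4) for a single language range. The function names and the sample tags are assumptions made for this example, not taken from any particular implementation, and the sketch leaves out priority lists and quality weights.

```python
def basic_filter(language_range, tags):
    """Basic filtering: keep tags that equal the range or extend it at a '-'."""
    rng = language_range.lower()
    if rng == "*":
        return list(tags)
    return [t for t in tags
            if t.lower() == rng or t.lower().startswith(rng + "-")]


def lookup(language_range, tags, default=None):
    """Lookup: truncate the range from the end until one tag matches exactly."""
    available = {t.lower(): t for t in tags}
    subtags = language_range.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in available:
            return available[candidate]
        subtags.pop()
        # A single-character subtag (such as 'x') is removed together
        # with the subtag that followed it.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return default
```

Filtering returns every tag that falls under the range, so the range "de" selects both "de" and "de-CH". Lookup returns at most one tag, so the range "de-CH-1996" falls back to "de" when nothing more specific is available.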

In fact, we have several mechanisms for working with language-tagged
material, and each has its own practical uses.

As for extlangs... I'm not sure we can solve this by arguing either 
case. Both the pro- and anti-extlang parties have valid points and in 
some cases the extlang tags *are* more useful and in some cases they 
*are* more harmful.

For a long time I supported extlangs because they were the direction we 
had laid down, in particular for Chinese languages. However, some things 
bother me about this approach:

1. If the languages in question really are distinct languages, why does
subordinating the language make sense? Yes, yesterday you could tag this
or that language only via the less distinct macrolanguage tag. Is that
appropriate?

Thus, if I were to support extlang, it would be based solely on John 
Cowan's argument that we need extlang to prevent a "retagging crisis" 
for languages formerly enclosed by a macrolanguage.

2. Randy suggested (a long while ago now) that cherry-picking from the 
macrolanguage list would be a bad idea. Yet tag stability provisions 
prevent us from taking the list wholesale. And I have some concern that 
the macrolanguage "collections" (if you'll pardon this inaccurate term) 
have yet to be thoroughly tested. They may not be stable or suitable in 
the short-to-medium term. This is not a critique of ISO 639-3's work 
here. It is merely a note of caution.

If I were to support doing extlangs, I would consider each
macrolanguage separately, as a one-time event, and, again, solely as a
compatibility item.

Note that we also face our own exception--I assume future sign languages 
would be treated as extlangs for compatibility reasons. Note that sign 
languages are NOT macrolanguages in the first place--they are already 
exceptional. I would favor eliminating this use of extlang too.

3. Mark's arguments about "better matching choices" really don't fit 
with matching as described in RFC 4647. Applications really do need to 
do more than the trivial matching in BCP 47 in many cases, but these 
depend on application specific needs. "Pure" BCP 47 matching has its 
place and cannot do many of the things that Mark suggests (as with 
Breton -> French fallback).

Mark's arguments about losing subsidiary subtags when matching extlangs 
*do* concern me. I had to modify my implementation of filtering to make 
it work in an extlang world. The modifications were not difficult and 
were reasonably successful. I *could* support a very limited application 
of extlang...

... but I feel that we're doing a disservice to the various languages 
involved by making them extlangs. Shouldn't Cantonese, Wu, or Hakka be 
treated fully as languages? Yes, it saves some retagging in the short 
term, but it is cleaner and clearer to tag languages directly. Users can 
still use the macrolanguage tag instead (I doubt we'll see much 'cmn-*' 
when 'zh-*' is available). Tagging directly eliminates a complexity in
tagging and language negotiation/selection whose only purpose is
compatibility. And content will need to be retagged in either case to take
advantage of the distinction. Content that is not retagged won't be 
distinct.
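
To make the matching trade-off concrete, here is a rough Python sketch that compares basic filtering with the extended filtering of RFC 4647 section 3.3.2. The tags 'zh-yue-Hant-HK' (extlang form) and 'yue-Hant-HK' (direct form) are illustrative assumptions only, and the functions are a sketch of the RFC's algorithms, not the modified filtering implementation mentioned above.

```python
def basic_match(language_range, tag):
    """Basic filtering: exact match, or a prefix that ends at a '-'."""
    rng, tag = language_range.lower(), tag.lower()
    return rng == "*" or tag == rng or tag.startswith(rng + "-")


def extended_match(language_range, tag):
    """Extended filtering: range subtags may skip over tag subtags."""
    rng = language_range.lower().split("-")
    sub = tag.lower().split("-")
    # The first subtags must match unless the range starts with a wildcard.
    if rng[0] != "*" and rng[0] != sub[0]:
        return False
    i = 1  # current position in the tag
    for r in rng[1:]:
        if r == "*":
            continue
        while True:
            if i >= len(sub):
                return False      # ran out of tag subtags
            if sub[i] == r:
                i += 1
                break             # matched; move to the next range subtag
            if len(sub[i]) == 1:
                return False      # never skip past a singleton such as 'x'
            i += 1                # skip an unmatched tag subtag
    return True


extlang_form = "zh-yue-Hant-HK"   # hypothetical extlang-style tag
direct_form = "yue-Hant-HK"       # hypothetical direct tag

print(basic_match("zh", extlang_form))          # True
print(basic_match("zh-Hant", extlang_form))     # False: 'yue' intervenes
print(extended_match("zh-Hant", extlang_form))  # True: 'yue' is skipped
print(basic_match("yue", extlang_form))         # False
print(basic_match("yue", direct_form))          # True
print(basic_match("zh", direct_form))           # False
```

Under basic filtering, the extlang form keeps matching the range 'zh' but not 'zh-Hant' or 'yue'. The direct form matches 'yue' ranges and stops matching 'zh' ranges, which is the retagging cost weighed above.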

Addison

-- 
Addison Phillips
Globalization Architect -- Yahoo! Inc.
Chair -- W3C Internationalization Core WG

Internationalization is an architecture.
It is not a feature.

