Re: [Ltru] Consensus call: extlang

Peter Constable <petercon@microsoft.com> Thu, 29 May 2008 16:53 UTC


> From: ltru-bounces@ietf.org [mailto:ltru-bounces@ietf.org] On Behalf Of
> Leif Halvard Silli


> If the focus is identification, then the fallback and filtering
> doesn't matter (or must be defined subsequently).

Sorry, but this is not a sensible approach to coding.

By "identification", we're talking about devising a coding scheme that provides code elements used to represent a semantic for the purpose of declaring language identity on information objects, and *also* for use in various kinds of processing on those information objects. If declaring identity were the only purpose, and processing were irrelevant, then all bets would be off as to the best way to represent identities -- for instance, a URI pointing to some encyclopedic description strikes me as a far better identifier.

Our language tags exist *because* of processing scenarios. Nobody would bother wasting time adding a tag to metadata on information objects if it was just going to _be_, ignored and taking up space.

The coded representation of a semantic category can take any form, and as long as all are equally documented the identificational value of each of those potential forms is exactly equivalent. For instance, "cmn" and "zh-cmn" are exactly equal in terms of identificational value -- meaning "Mandarin". Mark described these as different, but I think the difference he pointed out was one of processing, not one of identity.

If I know the relationship between Mandarin and "Chinese", then I can make exactly the same processing decisions with "cmn" as I can with "zh-cmn". But the two differ with respect to the kinds of processing mechanisms available to me: "cmn" requires that a process have separate knowledge of the relationship between Mandarin and "Chinese", whereas "zh-cmn" embeds that knowledge in the tag itself. Since both are the same in terms of their identificational value, the question to be asked is which of these is more useful for processing.
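
To make that difference concrete, here is a minimal sketch in Python. The MACROLANGUAGE table and both function names are illustrative assumptions of mine, not anything taken from the drafts or the registry format; the point is only that the second function cannot work without the extra table.

```python
# Hypothetical table a process would have to carry if tags were bare "cmn".
# The two entries are illustrative only, not a complete or official list.
MACROLANGUAGE = {"cmn": "zh", "yue": "zh"}


def is_chinese_by_prefix(tag: str) -> bool:
    """Decide from the tag alone; works when the tag is 'zh' or 'zh-...'."""
    tag = tag.lower()
    return tag == "zh" or tag.startswith("zh-")


def is_chinese_by_table(tag: str) -> bool:
    """Decide using the external table; needed when the tag is bare 'cmn'."""
    primary = tag.lower().split("-")[0]
    return primary == "zh" or MACROLANGUAGE.get(primary) == "zh"
```

With "zh-cmn" the prefix test is enough; with "cmn" the prefix test fails and only the table-driven test gives the same answer.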

And not just for *this* pair, but for all macrolanguage cases -- unless we're considering using extlang for just certain cases.



> Extlang makes identifying/finding *all* Chinese languages simple,

Note that you have already introduced processing.


> for those that need to find them all. It would only require
> that you ask for 'zh'. Thus with extlang one could start tagging
> resources specifically *immediately*, because it should be highly
> compatible with existing application behaviour.

For applications that use right-truncation matching, but not lookup.
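
A rough sketch of the distinction, assuming simplified versions of the filtering and lookup schemes described in RFC 4647. The function names are mine, and the special handling of singleton subtags during truncation is left out.

```python
from typing import Iterable, Optional


def filter_match(language_range: str, tag: str) -> bool:
    """Prefix-style matching: the range matches the tag if it equals the tag
    or is a prefix of it ending at a subtag boundary."""
    r, t = language_range.lower(), tag.lower()
    return t == r or t.startswith(r + "-")


def lookup(language_range: str, available: Iterable[str]) -> Optional[str]:
    """Lookup-style matching: truncate the *range* from the right until it
    exactly equals one of the available tags; return None if nothing matches."""
    by_lower = {t.lower(): t for t in available}
    r = language_range.lower()
    while r:
        if r in by_lower:
            return by_lower[r]
        r = r.rpartition("-")[0]  # drop the last subtag; "" when none remain
    return None
```

Under these assumptions a request for "zh" finds content tagged "zh-cmn" by filtering, but lookup shortens the requested range rather than the content's tag, so the same request finds nothing unless a default is supplied.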


> (Btw, this approach strikes me as highly compatible with another
> popular slogan: Don't break the Web.)

Well, not all servers use right-truncation matching logic, and for those that don't your plan would be breaking the Web. (I'm not saying that the no-extlang path doesn't have some of the same issues; either way, there are things that would need to be done to keep everything working smoothly if we want to transition from a "zh"-only past into a future that includes "cmn"/"yue"/etc.)



Peter