Re: [Ltru] Macrolanguage usage

"Mark Davis" <mark.davis@icu-project.org> Wed, 21 May 2008 04:01 UTC

Message-ID: <30b660a20805201344m22f0f40cmdfba059b0123e477@mail.gmail.com>
Date: Tue, 20 May 2008 13:44:18 -0700
From: Mark Davis <mark.davis@icu-project.org>
To: Leif Halvard Silli <lhs@malform.no>
In-Reply-To: <4832C21A.4050800@malform.no>
MIME-Version: 1.0
References: <mailman.494.1210865385.5128.ltru@ietf.org> <00a901c8b6f5$c04529a0$e6f5e547@DGBP7M81> <30b660a20805161108w578b6cf9g11933ca34996a596@mail.gmail.com> <005901c8b787$930f98c0$6801a8c0@oemcomputer> <30b660a20805161309u67158b6arcb3b2df1c46db6a7@mail.gmail.com> <C9BF0238EED3634BA1866AEF14C7A9E561554BEB09@NA-EXMSG-C116.redmond.corp.microsoft.com> <30b660a20805161415kb1172f0xa6c4dea251344bb6@mail.gmail.com> <4832C21A.4050800@malform.no>
Cc: LTRU Working Group <ltru@ietf.org>
Subject: Re: [Ltru] Macrolanguage usage

On Tue, May 20, 2008 at 5:20 AM, Leif Halvard Silli <lhs@malform.no> wrote:

> Mark Davis 2008-05-16 23.15:
>
>> Yes, clearly. Although even that takes time and effort to do; there is no
>> magic wand. (As I've said, the addition of non-predominant encompassed
>> language subtags like "yue" is important and useful for us. The addition
>> of the predominant encompassed language subtags just means a lot of work
>> for no additional benefit to us or our users.
>>
>
> Sad to hear a Google representative speak like that. Because it does add
> benefit to your users. Macrolanguage information represents reality - it is
> not just theory.


Actually no. According to everything I've heard, the macrolanguage was
devised as a construct to rationalize the inconsistent approach to
languages taken by previous versions of ISO 639. It does not represent any
particular reality beyond that. As said before, whether two languages are
closely related or not is pretty much orthogonal to whether or not they
share a macrolanguage: ro and mo are examples.

You appear to be generalizing from the case of no, nn, nb to all
macrolanguages.


>
> And, btw, for Google, each language tag is also a potential advertising
> channel, for instance. In order to be able to use Nynorsk in advertising, I
> must be reasonably certain that those who are not turned off by that
> receive it. However, as long as Google associates 'no' with 'nb' then you
> are sucking up all the unknowing Nynorsk users who just take what the
> browser/OS offers them.
>
> And for searching, being able to discern between Nynorsk, Bokmål or both
> taken together, would be useful.


I'm not going to speak to the issue of particular language support by Google
or any others except to say that it is always a tradeoff. Each new language
takes effort. Of the hundreds of languages that could be supported, one has
to consider the priority among those: is it better to do Bengali next, or
Nynorsk, and so on?


>
>   Following the de/gsw paradigm would
>> have been *much* simpler. But that is water under the bridge; we just have
>> to deal with the situation as it is, and try to give people guidance as to
>> the best way to handle them.)
>>
>>
>
> Often it is much simpler to ignore than to take notice of, it seems.


We cannot ignore the situation; the question is what the best strategies
are for handling it going forward.


>
>> > If it's "just" some internal database of languages and you have to map
>> > between the actual request/content names anyway, then I don't know why
>> > this working group would care if Google called their internal data "zh"
>> > or "abc" or whatever.
>> >
>>
>> The working group might not, but we have a goal of using BCP 47 not only
>> externally, but in communication *within* Google among the many different
>> products and programs. I assume that's not a bad thing ;-)
>>
>>
>
> It is a bad thing if you force your possibly wrong understanding upon the
> rest of the world.


Nobody's forcing anything.


>
>
>> > If the intent is interchange, then continuing to use "zh" when what is
>> > really meant is a large subset of zh (Mandarin) seems to perpetuate the
>> > existing ambiguity.
>> >
>>
>> The issue is on output. If a program switches to "cmn" from "zh" on
>> output, then an external party who doesn't recognize "cmn" breaks. So that
>> program needs to output "zh" until it is certain that the recipients would
>> all recognize "cmn".
>>
>>
>
> Firstly, you are in fact making an argument here for the use of extlang.


No. And notice again that extlang is not relevant to your issue: the extlang
issue does not pertain to no, nb, and nn; those are already coded. *None of
the extlang proponents except for you are proposing to have "no-nb" or
"no-nn".*

>
>
> And where is this an issue for Google? You control your own applications -
> so it can't be that. The only place I see is in language negotiation for web
> browsers. However, there you are masters in using cookies and IP information
> anyhow.


Modern companies do not live in a vacuum; there are many partners and other
organizations that they need to have effective communication and
interoperability with. Thus the reason for standards.


>
>  We could remain silent on this issue in the spec, but that would just be
>> withholding useful advice for people in terms of "tagging wisely", advice
>> that would allow people to interoperate more effectively.
>>
>>
>
> As long as you are not silent on the issue that 'zh' equally well can be
> used for Cantonese, then please also tell that it can be used for Mandarin.
> Both things are true and must be stated - not hidden.


My opinion, and I've hardly been quiet on the issue, is that for a large
class of implementations it is best for backwards compatibility to continue
to have resource lookup map 'ar' to Standard Arabic, 'zh' map to Mandarin,
and so on, and use the new codes for other languages, e.g. 'yue' for
Cantonese. And the text of draft #14 allows that possibility, which is after
all perfectly legal. If you choose to map 'zh' to Gan in your resource
lookup, be my guest.
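
A minimal sketch of that lookup policy, for illustration only. The table
contents and the names RESOURCES, PREDOMINANT_TO_MACRO and lookup_resource
are hypothetical; folding 'cmn' and 'arb' back onto 'zh' and 'ar' is an
assumption made for this example, not something draft #14 requires.

```python
# Hypothetical resource table: the legacy macrolanguage subtags keep pointing
# at the predominant language, and new subtags are used for everything else.
RESOURCES = {
    "zh": "Mandarin resources",
    "ar": "Standard Arabic resources",
    "yue": "Cantonese resources",
}

# Assumption for this sketch: requests that use the new predominant
# encompassed-language subtags are folded back onto the legacy subtag.
PREDOMINANT_TO_MACRO = {"cmn": "zh", "arb": "ar"}


def lookup_resource(tag):
    """Return the resource for a language tag, or None if there is none.

    Only the primary language subtag is considered; script, region and
    other subtags are ignored to keep the example short.
    """
    primary = tag.split("-")[0].lower()
    primary = PREDOMINANT_TO_MACRO.get(primary, primary)
    return RESOURCES.get(primary)
```

With this table, "zh", "zh-Hans" and "cmn" all resolve to the Mandarin
resources, "yue" resolves to Cantonese, and an unsupported subtag such as
"gan" yields None.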


>
>> > Microsoft is also concerned with the backwards compatibility issues,
>> > however we recognize that existing "zh" tagged data is not necessarily
>> > Mandarin (even though it's likely).  We don't have language detection
>> > tools to try to guess what the application's resources or a web page
>> > actually has for a language.  For back-compat, you'll recall that I
>> > thought zh-cmn helped, which would seem to solve both problems.  (Google'd
>> > have the zh it wants and I'd have the specificity that I'm looking for.)
>> >
>>
>> No, it doesn't solve the problem at all. There are many, many
>> circumstances where you want precise communication, not lookup&fallback. A
>> recipient expecting "zh" (and who doesn't know about the new codes) will
>> break when they get "zh-cmn" just like they will break when they get "cmn":
>> extlang or not makes no difference. (Extlang only really has an effect with
>> lookup; you think positive, I think negative, but let's not discuss that
>> here -- we've all agreed to look at that issue *after* this round.)
>>
>>
>
> But if a recipient expecting 'no' to mean all forms of Norwegian breaks
> when he sees that you use it only for Bokmål, then that doesn't matter for
> you.
>
> Yes, negatively.


Again, 'no' is not relevant to the extlang discussion.


>
>
>> > Presumably (unless it's only for internal use), Microsoft and Google
>> > can't have different definitions of "zh" for interchange.  If the
>> > definitions differ, then indexing of IIS served pages or requests from IE
>> > browsers would not necessarily provide the expected results.
>> >
>>
>> Once both parties can handle both, it is not a problem. And where there is
>> a handshaking protocol it is not a problem. But there will be a *long*
>> transition period where only "zh" can be depended on to work.
>>
>>
>
> The most important thing is probably the goal.


This is completely unrealistic. If you don't have a clear migration path for
people to take, you will never reach that goal.
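
One possible shape for such a migration path on the output side, as a sketch
only. The function name tag_for_output, the LEGACY_SUBTAG table and the idea
of passing in the set of subtags a recipient is known to accept are all
hypothetical; they stand in for whatever handshaking a real protocol offers.

```python
# Assumed mapping from new predominant subtags to the legacy subtag that
# older recipients already understand.
LEGACY_SUBTAG = {"cmn": "zh", "arb": "ar"}


def tag_for_output(language, recipient_known_subtags=None):
    """Choose the primary language subtag to emit for one recipient.

    If nothing is known about the recipient, fall back to the legacy
    subtag so that parties that have never heard of the new codes do
    not break.
    """
    legacy = LEGACY_SUBTAG.get(language)
    if legacy is None:
        # No legacy equivalent (for example "yue"): emit the subtag as-is.
        return language
    if recipient_known_subtags and language in recipient_known_subtags:
        # The recipient has told us it understands the new subtag.
        return language
    return legacy
```

Here tag_for_output("cmn") gives "zh" during the transition, while
tag_for_output("cmn", {"zh", "cmn"}) gives "cmn" once the recipient is known
to handle it.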


>
>
>> > What happens for other language tags that change?  Like Serbian (Serbian
>> > remains the same, but data tagged at the region level will change.)
>> > Needing to support zh + cmn isn't that different than other common
>> > scenarios.
>> >
>>
>> Right, I've been saying all along that this is the same issue with any
>> predominant encompassed language.
>>
>>
>
> As it is the same issue as with any predominant region variant of English,
> for instance.


I have no idea what you mean by this.


>
> --
> leif halvard silli
>



-- 
Mark