Re: [Ltru] Consensus call: extlang

"Mark Davis" <mark.davis@icu-project.org> Tue, 27 May 2008 19:04 UTC

To: John Cowan <cowan@ccil.org>
Cc: LTRU Working Group <ltru@ietf.org>

That's a good point, John. My corrected text would read:

I want to get Mandarin and French. For backwards compatibility, I have to
use "zh" for Mandarin, now and for the foreseeable future (I'd pointed this
out before).

What I'd like to say is

   - under RFC 4646: "zh, fr"

As John points out,  RFC4646 is already complicated by codes with zh-
prefixes, and what I actually have to say is:

   - under RFC 4646: "zh-cmn, zh, fr, zh-gan;q=0, zh-hakka;q=0,
   zh-min-nan;q=0, zh-wuu;q=0, zh-xiang;q=0, zh-yue;q=0"
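To see why the q=0 entries are needed, here is a minimal Python sketch of how a server might score a content tag against an Accept-Language value. It assumes RFC 4647 basic filtering (a range matches a tag if it equals the tag or is a prefix of it ending at a hyphen) and the RFC 2616 rule that a tag gets the quality of the longest matching range, or 0 if nothing matches. The function names are hypothetical, and the parser ignores malformed input.

```python
def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Split an Accept-Language value into (language-range, q) pairs."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        lang_range = parts[0].lower()
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        if lang_range:
            ranges.append((lang_range, q))
    return ranges


def range_matches(lang_range: str, tag: str) -> bool:
    """RFC 4647 basic filtering: '*', an exact match, or a prefix ending at '-'."""
    tag = tag.lower()
    return lang_range == "*" or tag == lang_range or tag.startswith(lang_range + "-")


def quality(header: str, tag: str) -> float:
    """q of the longest range that matches `tag`; 0.0 if no range matches."""
    best_len, best_q = -1, 0.0
    for lang_range, q in parse_accept_language(header):
        if range_matches(lang_range, tag) and len(lang_range) > best_len:
            best_len, best_q = len(lang_range), q
    return best_q


simple = "zh, fr"
strict = ("zh-cmn, zh, fr, zh-gan;q=0, zh-hakka;q=0, zh-min-nan;q=0, "
          "zh-wuu;q=0, zh-xiang;q=0, zh-yue;q=0")

print(quality(simple, "zh-yue"))   # 1.0: "zh" is a prefix of "zh-yue"
print(quality(strict, "zh-yue"))   # 0.0: the longer range "zh-yue;q=0" wins
print(quality(strict, "zh-Hant"))  # 1.0: only "zh" matches
```

With the plain "zh, fr", a document tagged zh-yue is fully acceptable because "zh" is a prefix of it; only an explicit, longer q=0 range pushes it out.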

And under the proposed extlang additions, I have to say:

   - under this proposal: "zh-cmn, zh, fr, zh-cjy;q=0, zh-cpx;q=0,
   zh-czh;q=0, zh-czo;q=0, zh-gan;q=0, zh-hak;q=0, zh-hsn;q=0, zh-mnp;q=0,
   zh-nan;q=0, zh-wuu;q=0, zh-yue;q=0, zh-hakka;q=0, zh-min-nan;q=0,
   zh-yue;q=0"

Thus RFC4646 is already complicated by these codes, and the proposed change
will just make it worse.

To choose a clearer example, if I want Standard Arabic and French:

What I'd like to say, and what I use now under RFC4646, is:

   - "ar, fr"

And under the proposed extlang additions, I have to say:

   - "ar, fr, ar-aao;q=0, ar-abh;q=0, ar-abv;q=0, ar-acm;q=0, ar-acq;q=0,
   ar-acw;q=0, ar-acx;q=0, ar-acy;q=0, ar-adf;q=0, ar-aeb;q=0, ar-aec;q=0,
   ar-afb;q=0, ar-ajp;q=0, ar-apc;q=0, ar-apd;q=0, ar-arq;q=0, ar-ars;q=0,
   ar-ary;q=0, ar-arz;q=0, ar-auz;q=0, ar-avl;q=0, ar-ayh;q=0, ar-ayl;q=0,
   ar-ayn;q=0, ar-ayp;q=0, ar-bbz;q=0, ar-pga;q=0, ar-shu;q=0, ar-ssh;q=0"
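Because the exclusions are purely mechanical, a client could generate them. The Python sketch below assumes the caller already has the list of extlang subtags to exclude; here the list is simply copied from the Arabic example rather than derived from the registry, and `build_accept_language` is a hypothetical helper.

```python
# Extlang subtags copied from the Arabic example; not derived from the registry.
ARABIC_EXTLANGS = [
    "aao", "abh", "abv", "acm", "acq", "acw", "acx", "acy", "adf", "aeb",
    "aec", "afb", "ajp", "apc", "apd", "arq", "ars", "ary", "arz", "auz",
    "avl", "ayh", "ayl", "ayn", "ayp", "bbz", "pga", "shu", "ssh",
]


def build_accept_language(wanted: list[str], prefix: str, excluded: list[str]) -> str:
    """Wanted ranges first, then a 'prefix-subtag;q=0' entry per excluded subtag."""
    exclusions = [f"{prefix}-{subtag};q=0" for subtag in excluded]
    return ", ".join(wanted + exclusions)


header = build_accept_language(["ar", "fr"], "ar", ARABIC_EXTLANGS)
print(header.count(";q=0"))  # 29 exclusions just to say "ar, fr"
```

Generating the string does not remove the underlying problem: every client would need the current subtag list, and the header grows with each new extlang registration.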

Shawn,
Yes, you're right: "zh, fr" means "Chinese and French". The ISO standard
that RFC4646 is based on is woefully underspecified, so you don't know
whether that "Chinese" means Standard Chinese or also includes other
languages that people call Chinese. (Magically, "German" means only
"Standard German", and "French" means "Standard French", and so on, but
"Arabic" doesn't mean "Standard Arabic",...) The text in draft 12 makes it
clear what the extent of zh is. Yet for now and into the future, if I want
Mandarin, I have to include "zh". ("zh-cmn" has been around for almost 3
years (2005-07-15), but we are seeing **no** uptake in any web documents or
Accept-Language values.)

As far as a compromise goes, I could even live with allowing either extlang
or not, since they are not semantically or syntactically the same. Then the
user has the choice of indicating s/he wants

   - Cantonese alone ("yue"), or Cantonese-with-fallback ("zh-yue"), or
   - Babalia Creole Arabic ("bbz") alone, or Babalia Creole
   Arabic-with-fallback ("ar-bbz").
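The fallback difference can be sketched with RFC 4647 Lookup, which truncates a range one subtag at a time until an available tag matches. This minimal Python version assumes the caller passes ranges already sorted by preference and skips the "*" wildcard, as Lookup does; `lookup` is a hypothetical name, not an existing library function.

```python
from typing import Optional


def lookup(priority_list: list[str], available: list[str],
           default: Optional[str] = None) -> Optional[str]:
    """RFC 4647 Lookup: try each range, truncating from the right until a tag matches."""
    by_lower = {tag.lower(): tag for tag in available}
    for lang_range in priority_list:
        candidate = lang_range.lower()
        if candidate == "*":
            continue  # Lookup ignores the wildcard range
        while candidate:
            if candidate in by_lower:
                return by_lower[candidate]
            cut = candidate.rfind("-")
            if cut == -1:
                break  # nothing left to truncate
            candidate = candidate[:cut]
            # Also drop a trailing single-character subtag (e.g. an extension singleton).
            if len(candidate) >= 2 and candidate[-2] == "-":
                candidate = candidate[:-2]
    return default


print(lookup(["zh-yue"], ["zh", "fr"]))  # zh: falls back from zh-yue
print(lookup(["yue"], ["zh", "fr"]))     # None: no shorter form to try
```

"yue" and "bbz" have nothing to truncate to, so they only match content tagged exactly that way, while "zh-yue" and "ar-bbz" fall back to the macrolanguage.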


Peter,
I agree with what you were saying; I didn't intend for it to be misleading.
I was really thinking about the syntactic matching, since extlang is a
syntactic device, and Accept-Language works in terms of that syntax. If we
are dealing with semantic matching, then the situation is a bit different.

Martin,
>> "consider each of the applications of language tags: identification,
>> lookup, filtering, and Accept-Language, and be able to have a reasoned
>> judgment on the technical merits."
> I assume everybody has done this.

Based on the comments, I really doubt that. Do you really think that
everyone has considered the technical ramifications of using this with
Accept-Language, even now?

> If you, or anybody else closely involved with this work, wants to
> claim that they had absolutely no idea that the extlang production
> in RFC 4646 was intended, at least among else if not primarily or
> only, for encompassed languages, or more concretely, for cases such
> as zh-yue and friends, then I'd be extremely surprised.

In that case, I can surprise you ;-)

I can at least speak for myself -- and I was "closely involved" -- that the
extlang mechanism was to allow us the syntactic wiggle room we needed should
we decide to use it for 639-3 in any way. We knew about macrolanguages, but
we had *not* weighed all of the implications at the time, and had not
committed to the use for macrolanguages, because we didn't know what shape
639-3 might take. We even allowed for multiple extlangs in a tag.
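The syntactic room described above can be illustrated with a regular expression for the start of an RFC 4646 tag, where a 2-3 letter language subtag may be followed by up to three 3-letter extlang subtags. This Python sketch, using the hypothetical helper `split_extlangs`, only inspects that prefix; it does not validate the rest of the tag or check any subtag against the registry.

```python
import re
from typing import Optional

# Start of an RFC 4646 tag: a 2-3 letter language subtag followed by up to three
# 3-letter extlang subtags, the last one ending at a hyphen or at the end of the tag.
LANGUAGE_EXTLANG = re.compile(r"^([a-z]{2,3})((?:-[a-z]{3}){0,3})(?=-|$)", re.IGNORECASE)


def split_extlangs(tag: str) -> Optional[tuple[str, list[str]]]:
    """Return (language, [extlangs]), or None if the tag lacks a 2-3 letter language."""
    m = LANGUAGE_EXTLANG.match(tag)
    if m is None:
        return None
    language, ext = m.group(1).lower(), m.group(2)
    return language, [s.lower() for s in ext.split("-") if s]


print(split_extlangs("zh-yue"))      # ('zh', ['yue'])
print(split_extlangs("zh-Hant-TW"))  # ('zh', []): "Hant" is a 4-letter script
print(split_extlangs("zh-min-nan"))  # ('zh', ['min', 'nan'])
```

Purely syntactically, the grandfathered zh-min-nan also fits the shape of a tag with two extlangs.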

Mark


On Mon, May 26, 2008 at 8:21 PM, John Cowan <cowan@ccil.org> wrote:

> Mark Davis scripsit:
>
> > I have been a strong proponent of RFC 4646. But I can't see any way to sell
> > software developers on the ways in which extlang would require a radical
> > change, eg that the Accept-Language value meaning 'Mandarin then French'
> > would be
> >
> >    - under RFC 4646: "zh, fr"
> >    - under this proposal: "zh-cmn, zh, fr, zh-cjy;q=0, zh-cpx;q=0,
> >    zh-czh;q=0, zh-czo;q=0, zh-gan;q=0, zh-hak;q=0, zh-hsn;q=0, zh-mnp;q=0,
> >    zh-nan;q=0, zh-wuu;q=0, zh-yue;q=0".
>
> Actually, to get *exactly* the same effect under RFC 4646 you'd need to
> say "zh-cmn, zh, fr, zh-gan;q=0, zh-hakka;q=0, zh-min-nan;q=0, zh-wuu;q=0;
> zh-xiang;q=0, zh-yue;q=0", and that's not counting the deprecated forms.
>
> --
> In politics, obedience and support      John Cowan <cowan@ccil.org>
> are the same thing.  --Hannah Arendt    http://www.ccil.org/~cowan
>



-- 
Mark