Re: [Ltru] my technical position on extlang

"Mark Davis" <mark.davis@icu-project.org> Sat, 24 May 2008 16:43 UTC

Date: Sat, 24 May 2008 09:43:25 -0700
From: Mark Davis <mark.davis@icu-project.org>
To: John Cowan <cowan@ccil.org>
Cc: LTRU Working Group <ltru@ietf.org>
Subject: Re: [Ltru] my technical position on extlang

Just to make sure: we're talking about content negotiation, e.g. using
Accept-Language to get the language for a website, where "zh" means "zh" or
anything starting with it. The key difference between the models is
identification. With the extlang model, I can't *restrict* the match to what
I want without really jumping through hoops.
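
The matching rule described above ("zh" means "zh" or anything starting with it) can be sketched roughly as follows. This is only an illustration; the function name `matches` is hypothetical and is not taken from any specification or library.

```python
def matches(language_range: str, tag: str) -> bool:
    """True if the tag equals the range or extends it at a '-' boundary."""
    r, t = language_range.lower(), tag.lower()
    return t == r or t.startswith(r + "-")


print(matches("zh", "zh"))      # True
print(matches("zh", "zh-yue"))  # True: extlang-style tags fall under "zh"
print(matches("zh", "yue"))     # False: a standalone tag does not
print(matches("zh", "zha"))     # False: the prefix must end at a '-'
```
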
Here's an example. Let's suppose that content tagged with "fr", "gsw", and
"yue" (or "zh-yue") is available, and nothing else. Let's suppose also
that I understand German, French, and Mandarin, but none of the
following.

   - gct Colonia Tovar German
   - geh Hutterite German
   - gmh Middle High German
   - gml Middle Low German
   - goh Old High German
   - gsw Swiss German
   - nds Low German

   - cjy Jinyu Chinese
   - cpx Pu-Xian Chinese
   - czh Huizhou Chinese
   - czo Min Zhong Chinese
   - gan Gan Chinese
   - hak Hakka Chinese
   - hsn Xiang Chinese
   - mnp Min Bei Chinese
   - nan Min Nan Chinese
   - wuu Wu Chinese
   - yue Yue Chinese

Both models

   - If I specify "de fr", I'll get "fr" under either model. Fine, I can
   understand "fr".
   - If I also wanted to request Swiss German as well as German, I would use
   "gsw de fr", and get "gsw".
   - I only get "gsw" if I request it. That's good, because I don't
   understand "gsw".

No Extlang Model

   - If I specify "zh cmn fr", I'll get "fr" under this model. Fine, I can
   understand "fr".
   - If I also wanted to request Cantonese as well as Mandarin, I would use
   "yue cmn zh fr" and get "yue".
   - I only get "yue" if I request it. That's good, because I don't
   understand "yue".
   - I need to include "zh", and will for the indefinite future because the
   vast majority of Mandarin content is tagged that way.
   - I might get content tagged "zh" that is actually "yue", but the
   chances of that are extremely remote.

Extlang Model

   - If I specify "zh zh-cmn fr", I would get "zh-yue". Broken, I don't
   understand Cantonese!
   - In order to specify that I want "zh" or "cmn" but no other languages, I
   have to use "zh-cjy;q=0, zh-cpx;q=0, zh-czh;q=0, zh-czo;q=0, zh-gan;q=0,
   zh-hak;q=0, zh-hsn;q=0, zh-mnp;q=0, zh-nan;q=0, zh-wuu;q=0, zh-yue;q=0,
   zh, zh-cmn, fr".
   - Moreover, as you said, if another encompassed language shows up for zh,
   I could get that inadvertently unless I add it to my list.
   - Again, I need to include "zh", and will for the indefinite future
   because the vast majority of Mandarin content is tagged that way.
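
To make the comparison concrete, here is a rough sketch of the negotiation in Python. It is a toy model, not how any particular server behaves. The assumptions are mine: a tag is governed by the longest range that matches it, q=0 excludes the tag, and otherwise ranges earlier in the list win. The names `matches`, `negotiate`, and `prefs` are hypothetical.

```python
def matches(language_range, tag):
    """Prefix match: the tag equals the range or extends it at a '-'."""
    r, t = language_range.lower(), tag.lower()
    return t == r or t.startswith(r + "-")


def negotiate(ranges, available):
    """Pick one available tag for an ordered list of (range, q) pairs.

    Assumed rules for this sketch: the longest matching range governs a
    tag, q == 0 excludes it, and earlier ranges beat later ones.
    """
    best = None  # (position of governing range, tag)
    for tag in available:
        candidates = [(len(r), -i, q)
                      for i, (r, q) in enumerate(ranges) if matches(r, tag)]
        if not candidates:
            continue
        _, neg_pos, q = max(candidates)  # longest range, then earliest
        if q == 0:
            continue  # explicitly refused
        if best is None or -neg_pos < best[0]:
            best = (-neg_pos, tag)
    return best[1] if best else None


def prefs(*wanted, excluded=()):
    """Build the range list: exclusions carry q=0, wanted ranges q=1."""
    return [(r, 0.0) for r in excluded] + [(r, 1.0) for r in wanted]


no_extlang_content = ["fr", "gsw", "yue"]
extlang_content = ["fr", "gsw", "zh-yue"]

# No extlang: "zh" cannot match "yue", so French is served.
print(negotiate(prefs("zh", "cmn", "fr"), no_extlang_content))   # fr

# Extlang: "zh" matches "zh-yue", so Cantonese is served.
print(negotiate(prefs("zh", "zh-cmn", "fr"), extlang_content))   # zh-yue

# Extlang with an explicit refusal: back to French.
print(negotiate(prefs("zh", "zh-cmn", "fr", excluded=["zh-yue"]),
                extlang_content))                                # fr
```

The third call only behaves because "zh-yue" is listed by hand; any encompassed language missing from `excluded` would still be caught by "zh".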

Mark

On Fri, May 23, 2008 at 5:11 PM, John Cowan <cowan@ccil.org> wrote:

> Mark Davis scripsit:
>
> > If you don't have any data to back up your assertion that "The best is
> > the enemy of the good."....
>
> So your argument is that languages are *less* likely to be mutually
> intelligible (to some degree) if they are co-members of a macrolanguage
> than if they are not?
>
> All I need is to show that the tendency is the other way: that (a)
> getting Cantonese when you want Mandarin is no worse than getting
> Portuguese when you want French (i.e. both are equally bad); and (b)
> that getting Egyptian Arabic when you want Sa'idi Arabic is better
> than getting English when you want Greek.
>
> --
> One Word to write them all,             John Cowan <cowan@ccil.org>
>  One Access to find them,              http://www.ccil.org/~cowan
> One Excel to count them all,
>  And thus to Windows bind them.                --Mike Champion
>


