Re: extlang (was Re: Suggested language for "mis" (Re: [Ltru] RE: ISO 639-2 decision: "mis"))
"Mark Davis" <mark.davis@icu-project.org> Tue, 19 June 2007 18:30 UTC
Message-ID: <30b660a20706191130x2a83134ned38aed061d551b1@mail.gmail.com>
Date: Tue, 19 Jun 2007 11:30:49 -0700
From: Mark Davis <mark.davis@icu-project.org>
To: John Cowan <cowan@ccil.org>
Subject: Re: extlang (was Re: Suggested language for "mis" (Re: [Ltru] RE: ISO 639-2 decision: "mis"))
In-Reply-To: <20070619013433.GA15048@mercury.ccil.org>
MIME-Version: 1.0
References: <30b660a20706171252l3c61d451p464b96e864d1a515@mail.gmail.com> <007f01c7b166$8ef7bf10$6401a8c0@DGBP7M81> <30b660a20706181006x3efbf772t9a0751feb070a6cb@mail.gmail.com> <20070619013433.GA15048@mercury.ccil.org>
Cc: Doug Ewell <dewell@roadrunner.com>, LTRU Working Group <ltru@ietf.org>
On 6/18/07, John Cowan <cowan@ccil.org> wrote:
>
> Mark Davis scripsit:
>
> > We added extlang to allow ourselves the freedom to make choices
> > when 639-3 came along. We *very clearly did not define its meaning*,
> > because we didn't know what 639-3 was finally going to look like,
>
> Not so much. We had an excellent idea of what 639-3 would both look and
> actually be like when 4646 was finalized. We couldn't include 639-3 or
> extlangs because 639-3 itself was not yet final.

I think perhaps both of our "we"s are overstated. Since we disagree on this point, neither my "we" nor your "we" is inclusive. For example, you can't mean that every member of the working group had looked it over thoroughly and explored all the implementation ramifications. I'd like to see a show of hands for those who did -- maybe everyone except me had, but I'd be rather surprised at that.

> > nor did we have agreement on what we should actually do.
>
> We had at least the consensus of silence; at least, I don't remember any
> complaints at the time. Remember that the development of 4646 started at
> least a year before LTRU was formally created.

I don't think everyone had reviewed it in detail, but perhaps the show of hands would prove me the only one.

> > We *already* had macrolanguages with ISO 639-2 in RFC 4646 and we *did
> > not* use extlang for them: examples are "sr", "hr", "nb", etc.
>
> (Rather, these are examples of languages *encompassed* by macrolanguages,
> henceforth "encompassed languages". I realize that's just a slip.)

True, thanks.

> > We are not going to (and cannot) be forcing users to encode nb as no-nb,
> > nor sr as sh-sr.
>
> Nobody has ever proposed that. Language subtags coming from 639-1
> or 639-2 will not change, even if 639-3 says they encode encompassed
> languages.

My point was that anyone who wants to deal with macrolanguages already has to deal with sr, hr, etc. as primary subtags, not as secondary ones.
Thus whatever mechanisms people have for working with sh, etc. can be extended to other cases without the extlang mechanism.

> > When we ("Google") tried implementing matching with "zh-yue" and others,
> > we found it made things *more* difficult, not less.
>
> Respectfully I suggest that because you ("Google") assign tags to incoming
> rather than outgoing content, your use of BCP 47 is essentially private
> rather than in interchange. That makes it potentially important, but
> definitely not prototypical.

First of all, that isn't true (but thanks for the "respectfully"!). We actually use language tags in a huge number of products, many of which have APIs. We are not fully BCP 47 compliant, but are working towards that.

But the main point is, I want those people who have tried to implement extlang -- professionally, not just in toy programs -- to speak up about their experiences. So if you want to speak to your experience implementing this professionally, I'm all ears.

Addison mentions that fallback matching is not a problem. Mechanically it is a no-brainer. But the *results* of that fallback are what we are having problems with. And *that's* why I raise this issue, and call it "baking in an assumption".

Let me try to set this out, yet again. I'll use the term "microlanguage" for a language encompassed by a macrolanguage, just as a term. I see two options. I may have captured the extlang reasoning incorrectly, so please bear with me. And if there are other reasons for having extlang, that'd be good to hear.

Premises

1. The reason for making microlanguages extlangs instead of primary language subtags is so that truncation-style matching will have better results.
2. Fallback works when there is mutual comprehensibility (not necessarily 100%, but to a high degree); if you fall back to something that is not comprehensible, then fallback has failed.

Option A.

1. Thus for extlang to work for microlanguages, the speakers of any microlanguage sharing a macrolanguage need to be able to understand the speakers of any other microlanguage sharing that macrolanguage.
2. Peter and the ISO JAC can verify that A1 is true: that every speaker of Hakka can understand Jinyu, every speaker of Shihhi Arabic can understand Cypriot Arabic, and so on.
3. Everything is hunky-dory.

Option B.

1. The macrolanguage alone is always assumed to be the "standard", and that can be identified. That is, "zh" is always assumed to be Mandarin, "ar" is always assumed to be "Standard Arabic", etc. (That is, I think, the correct approach, but it is *not* currently in the spec.)
2. Thus for extlang to work for microlanguages, the speakers of any microlanguage sharing a macrolanguage need to be able to understand the speakers of the standard used for that macrolanguage.
3. Peter and the ISO JAC can verify that B2 is true: that every speaker of Hakka can understand Mandarin, every speaker of Shihhi Arabic can understand Standard Arabic, and so on.
4. Everything is hunky-dory.

If neither A nor B is plausible, and we can't find some other compelling reason to have extlangs, it is *far* better to add the Macrolanguage field to the registry and let people implement their own matching, making use of that AND other factors. After all, it is trivial to make a 4647bis that adds an optional step for microlanguages: when you get to a microlanguage, the next step is to look at its macrolanguage before falling back to the default. That has the same result (and the same problems) as extlang, but it is not baked into the standard; it is something that people can implement if they want, without impacting matching for everyone else.

> > Matching "zh" and "yue" is not something you want to do
> > automatically. Moreover, because of #2 we had to have a mechanism for
> > dealing with macrolanguages in RFC 4646 *anyway*.
>
> Very plausible in your circumstances. But note that Yue (Cantonese)
> content is *already* properly tagged "zh-yue" on precisely the theory
> that's being applied to 639-3 encompassed languages.
>
> I realize that this cause is probably not important to you, because
> you can (comparatively) easily change all "zh-yue" tags to "yue", but
> this is not the case for other users of BCP 47 on and off the Internet,
> who will never even hear about the change.

These are irregular tags anyway, and can stay irregular tags afterwards. We and everyone else already have to deal with equivalences with grandfathered and irregular tags; these are not a real problem.

> > Thus to make a proposed change from 4646 to use the extlang mechanism
> > for languages that have macrolanguages, we need a very compelling
> > case that the additional complication solves more problems than it
> > creates. We haven't seen that yet, and certainly have no consensus
> > that it is the case.
>
> On the contrary, the burden of persuasion is with you. Tags like
> zh-yue are already present in 4646, and it's up to you to provide a
> convincing argument to deprecate them. Furthermore, LTRU and its ad hoc
> predecessor have been assuming the extlang structure since at least 2004.
> Derailing that is what will take a "very compelling case".

I disagree strongly. If you can't make a compelling case that extlang will make BCP 47 better instead of worse, won't even look at the reasons not to do it, and won't even bother to set out a case for it, then why should we add it?

> > A. [Don't use extlangs.]
>
> I continue in opposition to this.
>
> > B. (optional) Add a field Macrolanguage: to the language subtag
> > registry.
>
> I am not opposed to this, precisely because encompassed languages and
> the corresponding macrolanguage cannot be identified syntactically.

Good.

> --
> May the hair on your toes never fall out!      John Cowan
> --Thorin Oakenshield (to Bilbo)               cowan@ccil.org

I hope not....

--
Mark
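The optional 4647bis step proposed above can be sketched in Python. The sketch assumes RFC 4647 Lookup-style truncation: drop the last subtag, and also drop any single-character subtag left at the end. A hypothetical MACROLANGUAGE table stands in for the proposed Macrolanguage: registry field. The table entries, the `lookup` function and its `use_macrolanguage` flag are all illustrative assumptions, not an API or registry format from this thread.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical registry excerpt standing in for a "Macrolanguage:" field,
# mapping an encompassed language ("microlanguage") to its macrolanguage.
# Illustrative entries only, not a copy of any registry file.
MACROLANGUAGE: Dict[str, str] = {
    "cmn": "zh",  # Mandarin
    "hak": "zh",  # Hakka
    "yue": "zh",  # Cantonese
    "nb": "no",   # Norwegian Bokmal
}


def truncations(tag: str) -> List[str]:
    """Return the tag and its Lookup-style truncations, longest first."""
    subtags = tag.lower().split("-")
    result: List[str] = []
    while subtags:
        result.append("-".join(subtags))
        subtags.pop()
        # Never stop on a trailing singleton such as "x" or "u".
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return result


def lookup(
    ranges: Sequence[str],
    available: Sequence[str],
    default: Optional[str] = None,
    use_macrolanguage: bool = False,
) -> Optional[str]:
    """Lookup-style matching with an optional (hypothetical) macrolanguage step.

    Each range in the priority list is truncated one subtag at a time.
    With use_macrolanguage=True, once a range is down to a bare
    microlanguage, its macrolanguage is tried before moving on to the
    next range and, finally, to the default.
    """
    by_lower = {tag.lower(): tag for tag in available}  # case-insensitive
    for language_range in ranges:
        candidates = truncations(language_range)
        if use_macrolanguage:
            # The last truncation is the bare primary language subtag.
            macro = MACROLANGUAGE.get(candidates[-1])
            if macro is not None:
                candidates.append(macro)
        for candidate in candidates:
            if candidate in by_lower:
                return by_lower[candidate]
    return default


if __name__ == "__main__":
    print(lookup(["yue-HK"], ["zh", "en"], default="en"))  # en
    print(lookup(["yue-HK"], ["zh", "en"], default="en",
                 use_macrolanguage=True))                  # zh
    print(lookup(["zh-yue"], ["zh", "en"], default="en"))  # zh
```

The second and third calls illustrate the point made above. The macrolanguage step gives the same result as extlang-style truncation of "zh-yue", and therefore carries the same comprehensibility risk. The difference is that an implementation has to opt into it rather than getting it from the tag syntax.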
_______________________________________________
Ltru mailing list
Ltru@ietf.org
https://www1.ietf.org/mailman/listinfo/ltru