Re: extlang (was Re: Suggested language for "mis" (Re: [Ltru] RE: ISO 639-2 decision: "mis"))

"Mark Davis" <mark.davis@icu-project.org> Mon, 18 June 2007 17:06 UTC

Date: Mon, 18 Jun 2007 10:06:50 -0700
From: Mark Davis <mark.davis@icu-project.org>
To: Doug Ewell <dewell@roadrunner.com>
Subject: Re: extlang (was Re: Suggested language for "mis" (Re: [Ltru] RE: ISO 639-2 decision: "mis"))
Cc: LTRU Working Group <ltru@ietf.org>

We added extlang to allow ourselves the freedom to make choices when 639-3
came along. We *very clearly did not define its meaning*, because we didn't
know what 639-3 was finally going to look like, nor did we have agreement on
what we should actually do. We had no commitment to using extlang.

We *already* had macrolanguages from ISO 639-2 in RFC 4646, and we *did not*
use extlang for the languages they encompass: examples are "sr", "hr", "nb",
etc. We are not going to (and cannot) force users to encode nb as no-nb, nor
sr as sh-sr.
These include:

Macrolanguage (subtag, RFC, name)   Encompassed language (subtag, RFC, name)
ak  4646  Akan                      fat  4646  Fanti
ak  4646  Akan                      tw   4646  Twi
no  4646  Norwegian                 nb   4646  Norwegian Bokmål
no  4646  Norwegian                 nn   4646  Norwegian Nynorsk
sh  4646  Serbo-Croatian            bs   4646  Bosnian
sh  4646  Serbo-Croatian            hr   4646  Croatian
sh  4646  Serbo-Croatian            sr   4646  Serbian

When we ("Google") tried implementing matching with "zh-yue" and others, we
found it made things *more* difficult, not less. Matching "zh" and "yue" is
not something you want to do automatically. Moreover, because RFC 4646
already contains macrolanguages (the previous point), we had to have a
mechanism for dealing with them *anyway*.
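
To make the matching concern concrete, here is a minimal sketch of prefix
matching in the style of RFC 4647 basic filtering. It is only an
illustration, not a description of any particular implementation; the
function name basic_filter_match is made up for this example.

```python
def basic_filter_match(language_range: str, tag: str) -> bool:
    """Prefix matching in the style of RFC 4647 basic filtering.

    The range matches a tag that equals it, or that begins with the
    range followed by "-". Comparison is case-insensitive.
    """
    r = language_range.lower()
    t = tag.lower()
    return t == r or t.startswith(r + "-")


# With extlang, Cantonese content would be tagged "zh-yue", so a request
# for plain "zh" picks it up whether or not the application wants that.
print(basic_filter_match("zh", "zh-yue"))  # True

# With "yue" as a primary language subtag, nothing matches "zh" unless
# the application explicitly decides that it should.
print(basic_filter_match("zh", "yue"))     # False
```

With the extlang form the "zh"/"yue" match happens automatically; with a
primary subtag it becomes a decision the matcher has to make on purpose.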

Thus, to depart from RFC 4646 and use the extlang mechanism for languages
that have macrolanguages, we need a very compelling case that the
additional complication solves more problems than it creates. We haven't
seen that yet, and certainly have no consensus that it is the case.

So my proposal is essentially:

A. Keep the same structure and same philosophy as in RFC 4646. ISO 639-3
codes become primary language subtags, whether they have a macro language or
not. These are the edits I passed out previously, plus the removal of 2.2.2
from the draft, and a few other changes (basically reverting to RFC 4646 for
extlang).

B. (optional) Add a field Macrolanguage: to the language subtag registry.
The suggested text changes are:

* in 3.1.2.  Record Definitions, add in the optional fields

"Macrolanguage

   - For fields of type 'language', if there is a macro language in ISO
   639-3, then this field indicates that."

* After 3.1.9 add a new section.

Macrolanguage

The macro language field is added whenever a language has a corresponding
macro language in ISO 639-3. For example, 'sr' (Serbian) will have the
Macrolanguage value 'sh' (Serbo-Croatian). This field is provided for use in
matching algorithms. For more information about the meaning and use of
macrolanguages, see [ISO 639-3].
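
As a rough sketch of how a matcher might consume such a field (an
assumption for illustration, not part of the proposed text): the records
below are abbreviated and hypothetical, showing only the fields needed
here, and the helper names parse_records, macrolanguage_map and matches
are invented for this example. Only primary language subtags are compared.

```python
# Abbreviated, hypothetical registry records separated by "%%".
# "Macrolanguage" is the proposed optional field; real records carry
# more fields than are shown here.
RECORDS = """\
Type: language
Subtag: sr
Description: Serbian
Macrolanguage: sh
%%
Type: language
Subtag: hr
Description: Croatian
Macrolanguage: sh
%%
Type: language
Subtag: sh
Description: Serbo-Croatian
"""


def parse_records(text):
    """Split the text on '%%' and turn each record into a dict of fields."""
    records = []
    for chunk in text.split("%%"):
        fields = {}
        for line in chunk.strip().splitlines():
            name, _, value = line.partition(":")
            fields[name.strip()] = value.strip()
        if fields:
            records.append(fields)
    return records


def macrolanguage_map(records):
    """Map each language subtag to its macrolanguage, where one is given."""
    return {
        r["Subtag"]: r["Macrolanguage"]
        for r in records
        if r.get("Type") == "language" and "Macrolanguage" in r
    }


MACRO = macrolanguage_map(parse_records(RECORDS))


def matches(requested, available, use_macrolanguage=False):
    """Compare two primary language subtags.

    The macrolanguage fallback is opt-in: it applies only when the
    caller asks for it, never automatically.
    """
    requested, available = requested.lower(), available.lower()
    if requested == available:
        return True
    if use_macrolanguage:
        return MACRO.get(available) == requested
    return False


print(matches("sh", "sr"))                          # False
print(matches("sh", "sr", use_macrolanguage=True))  # True
print(matches("sr", "hr", use_macrolanguage=True))  # False
```

The point of the sketch is that the relationship lives in registry data,
so a matcher can choose to use it, rather than being built into the tag
syntax where every prefix matcher applies it unconditionally.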

* Then search for instances of "Suppress-Script" (just as a place to find
where field descriptions are) and add "Macrolanguage" where appropriate,
e.g. in the "LANGUAGE SUBTAG REGISTRATION FORM".

Mark


On 6/17/07, Doug Ewell <dewell@roadrunner.com> wrote:
>
> Mark Davis wrote:
>
> > There are at least three open issues listed in
> > http://www.inter-locale.com/ID/draft-ietf-ltru-4646bis-06.html.
> >
> > Extlang is one of them. I'll step back a bit, since it appears that we
> > don't have consensus about making a change in extlang from RFC 4646.
> >
> > So we need to revive the discussion from back in March.
>
> I agree.  For those who wish to follow the extlang discussion, the question
> is: for languages listed in ISO 639-3 that are not already present in the
> Registry, should we:
>
> 1.  add all of them as primary language subtags, or
> 2.  take the 5% that are encompassed by an ISO 639-3 macrolanguage, add them
> as extended language (extlang) subtags, and add the other 95% as primary
> language subtags?
>
> The discussion in March started with this message from Mark:
> http://www1.ietf.org/mail-archive/web/ltru/current/msg07288.html
>
> Chase the "Follow-Ups" links to follow the thread.
>
> My view is that we should keep the primary/extlang relationship that we have
> been planning since 2004 to reflect the ISO 639-3 macrolanguage
> relationship.  It allows identification of the individual languages while
> staying compatible with the way people have tagged them in the past, and in
> some cases with the way people will always perceive them (i.e. "Arabic"
> rather than 30 different flavors).  Expecting matching engines to have "a
> bit more smarts" so they can match "yue" with "zh" doesn't work for me when
> we can't even expect them to be smart enough to match "en-US" with
> "en-Latn-US".
>
> But that's just my view, and the important thing is for the list to discuss
> the matter and reach a consensus so we can move forward.
>
> --
> Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
> http://users.adelphia.net/~dewell/
> http://www1.ietf.org/html.charters/ltru-charter.html
> http://www.alvestrand.no/mailman/listinfo/ietf-languages
>
>

