[Ltru] Re: extlang

John Cowan <cowan@ccil.org> Tue, 28 August 2007 22:35 UTC

Mark Davis scripsit:

> Not single-handedly: Addison has also commented on it. 

AFAIK he has acquiesced in, but not defended, your position.
(I may be wrong.)

> And on the other side, I've heard 3 strong proponents. So not exactly
> overwhelming either way.

FWIW, I have heard privately from several normally active listmembers
supporting my position.  I assume that you know I wouldn't lie about this.

> I've said before, and will say again: it was never a foregone conclusion. 

I have never said so.  I do say that it is the status quo, and it's
a change to the status quo that requires defending.

> Do I really have to repeat this over and over????

Alas, we do seem to be looping.  I wish I saw the way ahead.

>    1. The macrolanguage is always a better fallback for every encompassed
>    language than other alternatives. Out of the many encompassed languages, you
>    are implying that a speaker of every encompassed language will be able to
>    understand the macrolanguage, or at least better than the alternatives.

RFC 1766 already said that fallback doesn't always work, and sometimes
produces something unintelligible to the requester.

But I believe you are misusing the term "macrolanguage".  A macrolanguage
is not to be equated with a "main" or standardized variety.  Some
macrolanguages like Chinese and Arabic have such varieties, others like
Quechua and Nahuatl do not.

Rather, a macrolanguage is a group of languages that *in some domain*
is recognized as a single language.  By definition, if you speak
Sudan Arabic, you are speaking something that is part of the Arabic
macrolanguage, even if you cannot speak modern standard Arabic at all.
Thus, while it may be empirically sound to assume that anything (or at least
any text) tagged "ar" is in MSA, it is not certain.

>    1. Truncation fallback from zh-cmn-Hant-SG to "zh" loses the Hant and
>       the SG; falling back from ar-arb-SA to 'ar' loses the "SA".

So it does.
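
For readers unfamiliar with the mechanism, truncation fallback can be
sketched roughly as follows.  This is only an illustration in the spirit
of the RFC 4647 lookup scheme, not anyone's actual implementation; the
names truncation_chain and lookup are invented for the example.

```python
def truncation_chain(tag):
    """Return the tag followed by each shorter prefix, dropping one subtag at a time."""
    subtags = tag.split("-")
    chain = []
    while subtags:
        chain.append("-".join(subtags))
        subtags.pop()
        # Do not leave a dangling single-character subtag (e.g. "x") at the end.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return chain


def lookup(tag, available):
    """Return the first available tag found along the truncation chain, else None."""
    by_lower = {a.lower(): a for a in available}
    for candidate in truncation_chain(tag):
        hit = by_lower.get(candidate.lower())
        if hit is not None:
            return hit
    return None
```

If only "zh" is on offer, lookup("zh-cmn-Hant-SG", ["zh", "en"]) walks
through "zh-cmn-Hant" and "zh-cmn" before settling on "zh", at which
point the script and region subtags have indeed been discarded.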

>       2. It introduces ambiguous language names. Right now, in the
>       overwhelming majority of practice, standard Arabic is "ar"; after the
>       change, standard Arabic is "ar-arb".

You propose in the alternative that "ar" and "arb" both be understood
as modern standard Arabic.  I submit that that is worse.

>    1. Anyone who has to deal with language issues on all but a trivial
>       level must already have a mechanism to deal with sh, sr, hr;
>       with no, nb, nn. Those are macrolanguages and encompassed
>       languages. They exist right now WITHOUT an extlang mechanism,
>       and people deal with them. The proposed mechanism won't handle
>       these, and anyone who can handle these doesn't need extlang.

True, but not everyone does.  As I have said before, the majority of
language-sensitive applications, I believe, deal only with recognizing a
short list of languages that they know what to do with, and all others
have the semantics of "unknown language".  It's perfectly compliant
just to look at the first subtag, decide if it is 'en', 'fr', or 'ar',
and throw away all other information; furthermore, I believe this to
be typical.  Most applications just don't involve processing almost
every document known to man.  I want this sort of application to continue
to work without having to be modified to add "arb" as a fourth alternative.
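
A minimal sketch of the kind of application meant here, assuming a
hand-picked table of three languages (the names KNOWN and classify are
hypothetical):

```python
# Hypothetical short list of languages this application knows how to handle.
KNOWN = {"en": "English", "fr": "French", "ar": "Arabic"}


def classify(tag):
    """Look only at the first subtag; everything else is thrown away."""
    primary = tag.split("-", 1)[0].lower()
    return KNOWN.get(primary, "unknown language")
```

Under these assumptions classify("ar-arb-SA") still yields "Arabic",
whereas classify("arb-SA") yields "unknown language" until someone adds
"arb" to the table.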

>       2. With the Macrolanguage field, there is sufficient information
>       for *anyone* who wants to to implement extlang-like fallback
>       (including for sh or no), *without* encumbering the IDs with
>       superfluous information.

I support the Macrolanguage: field for the uses which you mention.
Furthermore, I support extlangs *only* in the context of using newly
introduced 639-3 identifiers that are encompassed by a macrolanguage
registered in 639-2.  Thus, I do *not* support changes to language
tags based on shifting macrolanguage information as SIL learns more,
just the 350+ specific code elements of 639-3, and no more.
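
For comparison, fallback driven by a Macrolanguage: field might look
something like the sketch below.  The MACROLANGUAGE table is a small
hand-written excerpt standing in for data read from the registry, and
the ordering of candidates (plain truncation first, macrolanguage
afterwards) is merely one possible choice, not something the field
itself dictates.

```python
# Illustrative excerpt only; a real implementation would load this from the registry.
MACROLANGUAGE = {"arb": "ar", "cmn": "zh", "yue": "zh", "nb": "no", "nn": "no"}


def prefixes(subtags):
    """All prefixes of a subtag list, longest first, joined with hyphens."""
    return ["-".join(subtags[:n]) for n in range(len(subtags), 0, -1)]


def with_macro_fallback(tag):
    """Truncate the tag as usual, then retry with the macrolanguage as primary subtag."""
    subtags = tag.split("-")
    chain = prefixes(subtags)
    macro = MACROLANGUAGE.get(subtags[0].lower())
    if macro is not None:
        chain += prefixes([macro] + subtags[1:])
    return chain
```

With this table, with_macro_fallback("arb-SA") produces "arb-SA", "arb",
"ar-SA", "ar", and with_macro_fallback("nb") produces "nb", "no",
without the macrolanguage appearing in the tag itself.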

> This is untrue. As soon as we implemented extlang in prototype, we ran into
> the problems listed above. It *didn't* work out of the box.

I can well understand that in certain applications having an unusual
breadth of scope, both in the matter of documents and in the matter
of languages, it might well not.

-- 
Dream projects long deferred             John Cowan <cowan@ccil.org>
usually bite the wax tadpole.            http://www.ccil.org/~cowan
        --James Lileks

