Re: extlang (was Re: Suggested language for "mis" (Re: [Ltru] RE: ISO 639-2 decision: "mis"))

John Cowan <cowan@ccil.org> Wed, 20 June 2007 17:15 UTC

Date: Wed, 20 Jun 2007 13:15:02 -0400
To: Mark Davis <mark.davis@icu-project.org>
Subject: Re: extlang (was Re: Suggested language for "mis" (Re: [Ltru] RE: ISO 639-2 decision: "mis"))
Message-ID: <20070620171502.GL12168@mercury.ccil.org>
From: John Cowan <cowan@ccil.org>
Cc: LTRU Working Group <ltru@ietf.org>, Doug Ewell <dewell@roadrunner.com>

Mark Davis scripsit:

> My point was that anyone who wants to deal with macrolanguages already has
> to deal with sr, hr, etc. as primary subtags, not as secondary ones.

It's not a MUST, it's a MAY.  Matchers are allowed to take into account
any information outside the purely syntactic match algorithms that they
happen to have -- and find useful.  For example, a fallback from Scots
to English probably makes sense everywhere -- if you know Scots, you
know English.  On the other hand, a fallback from Swedish to Finnish
would be a Really Bad Idea, except within Finland, where it might make
all the sense in the world.  (Falling back from English to Scots or
Finnish to Swedish essentially never makes sense.)
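
A matcher that does hold such nonsyntactic knowledge might keep it as a table of one-way fallbacks, each guarded by a condition. The following is a minimal Python sketch under that assumption; the table name `EXTRA_FALLBACKS`, the function `extra_fallbacks`, and the use of a region code as the only context are all hypothetical, chosen only to mirror the Scots and Swedish examples above.

```python
from typing import Callable, Optional

# A condition receives the user's region (or None) and decides whether
# the fallback is acceptable there. (Hypothetical design.)
Condition = Callable[[Optional[str]], bool]

# Each key is a one-way (from, to) pair; reverse pairs are deliberately absent.
EXTRA_FALLBACKS: dict[tuple[str, str], Condition] = {
    ("sco", "en"): lambda region: True,           # Scots -> English: anywhere
    ("sv", "fi"): lambda region: region == "FI",  # Swedish -> Finnish: Finland only
}


def extra_fallbacks(lang: str, region: Optional[str] = None) -> list[str]:
    """Return the languages this matcher may fall back to from `lang`."""
    return [to for (frm, to), ok in EXTRA_FALLBACKS.items()
            if frm == lang and ok(region)]


print(extra_fallbacks("sco"))       # Scots reader
print(extra_fallbacks("sv", "FI"))  # Swedish reader in Finland
print(extra_fallbacks("en"))        # no English -> Scots entry
```

There is no entry from `en` to `sco` or from `fi` to `sv`: leaving out the reverse direction is how the sketch records that such fallbacks are directional.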

So while there is no reason to prevent such nonsyntactic matching,
and 4647 explicitly licenses it, there is no reason to require that
every matcher use it either.  The point of the extlang tags, like their
currently grandfathered predecessors, is to make additional matches easy.

>   1. The reason for making microlanguages be extlang instead of primary
>   sublanguages is so that truncation-style matching will have better 
>   results.

Agreed.
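
For concreteness, here is a minimal Python sketch of truncation-style matching, modeled on the Lookup scheme of RFC 4647 section 3.4 but simplified to a single language range with no `*` wildcard. The function name and the sample tags are illustrative assumptions. It shows why the extlang form truncates to something useful while a bare primary subtag does not.

```python
from typing import Iterable, Optional


def lookup(language_range: str, available: Iterable[str],
           default: Optional[str] = None) -> Optional[str]:
    """Simplified RFC 4647-style Lookup: truncate the range until a tag matches."""
    # Compare case-insensitively, but return the tag as the caller spelled it.
    tags = {t.lower(): t for t in available}
    subtags = language_range.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in tags:
            return tags[candidate]
        subtags.pop()
        # A singleton left at the end (such as the "x" of "x-...") goes too.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    # Nothing matched: use the default, no worse off than before.
    return default


available = ["zh", "en"]
print(lookup("zh-yue-HK", available, default="en"))  # extlang form
print(lookup("yue-HK", available, default="en"))     # primary-subtag form
```

With only `zh` and `en` on offer, `zh-yue-HK` truncates to `zh`, whereas `yue-HK` runs out of subtags and gets the default `en`.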

>   2. Fallback works when there is mutual comprehensibility (not
>   necessarily 100%, but to a high degree); if you fall back to something
>   that is not comprehensible, then fallback has failed.

Two caveats: (a) fallback is one-way, so *mutual* comprehensibility
is not required; (b) Mohawk and English aren't comprehensible at all,
mutually or asymmetrically, but it so happens that the few hundred
Mohawk speakers know English too.

Also, fallback is a matter of best effort; it does not have to work in
every case to be useful in many cases.  In particular, a failed fallback
leaves you no worse off than before.

> Option A.
> 
>   1. Thus for extlang to work for microlanguages, the speakers of any
>   microlanguages sharing a macrolanguage need to be able to understand
>   the speakers of any other microlanguages sharing that macrolanguage.

Not necessarily all, though the more the better, of course.

>   2. Peter and the ISO JAC can verify that A1 is true (that every
>   speaker of Hakka can understand Jinyu, every speaker of Shihhi Arabic
>   can understand Cypriot Arabic, and so on).

Of course that's not the case, as you must know.  Even if true in any
particular case, "every speaker" is an inappropriate standard.  If there
are one or two ancient Mohawks who don't have sufficient command of
English, it can't be helped.

>   3. Everything is hunky-dory.

"Hunky-dory" is the wrong standard.

> Option B.

This doesn't differ much from Option A, except that it changes the
semantics of macrolanguages in a way inconsistent with 639-3:

>   1. The macrolanguage alone is always assumed to be the "standard", 

Otherwise my comments above apply.

> After all, it is trivial to make a 4647bis that adds an optional step
> for microlanguages, which is that when you get to a microlanguage,
> the next step is to look at its macrolanguage before falling back to
> the default. That has the same result (and same problems) as extlang,
> but is not baked into the standard: it is something that people can
> implement if they want without impacting matching for everyone else.

True enough, but it disregards the behavior of the large number of
naive matchers already out there that lack nonsyntactic information.
It also disregards the well-chosen grandfathered tags (most of which will
become redundant in 4646bis along the current lines), which were picked
precisely because they worked tolerably, if not perfectly, with naive
matchers.  This is the primary argument for using extlangs:
they are a conservative extension of what has already been done and what
tags have already been assigned to documents.
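
The optional step proposed for a 4647bis could be layered onto an ordinary truncation loop. Below is a minimal Python sketch under stated assumptions: `MACROLANGUAGE` is a hypothetical, hand-picked excerpt of ISO 639-3 macrolanguage data (Cantonese, Hakka, and Jinyu under Chinese), and `lookup_with_macrolanguage` and its `use_macrolanguage` flag are illustrative names, not an API from any RFC or library.

```python
from typing import Iterable, Optional

# Hypothetical excerpt of ISO 639-3 macrolanguage data: member -> macrolanguage.
MACROLANGUAGE = {"yue": "zh", "hak": "zh", "cjy": "zh"}


def lookup_with_macrolanguage(language_range: str, available: Iterable[str],
                              default: Optional[str] = None,
                              use_macrolanguage: bool = True) -> Optional[str]:
    """Truncating lookup with an optional macrolanguage step before the default."""
    tags = {t.lower(): t for t in available}
    subtags = language_range.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in tags:
            return tags[candidate]
        # Down to a bare primary subtag: optionally try its macrolanguage.
        if use_macrolanguage and len(subtags) == 1:
            macro = MACROLANGUAGE.get(subtags[0])
            if macro is not None and macro in tags:
                return tags[macro]
        subtags.pop()
        # Drop a trailing singleton, as in plain Lookup.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return default


available = ["zh", "en"]
print(lookup_with_macrolanguage("hak-TW", available, "en"))
print(lookup_with_macrolanguage("hak-TW", available, "en", use_macrolanguage=False))
print(lookup_with_macrolanguage("zh-hak-TW", available, "en", use_macrolanguage=False))
```

With the extra step, `hak-TW` reaches `zh`; a naive matcher without the table falls through to `en`; and the extlang-style `zh-hak-TW` reaches `zh` by truncation alone, with no macrolanguage data at all.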

Your argument also proves too much: we find it useful to fall back from
az-Cyrl, az-Latn, and az-Arab to plain az (or vice versa for filtering)
without requiring that these be mutually intercomprehensible (your
Option A) or that one of them is the standard (your Option B).  The
results may be less than ideal, but we live with it.
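
In the filtering direction, the az case falls out of RFC 4647 Basic Filtering (section 3.3.1), in which a range matches any tag that equals it or begins with it followed by "-". A minimal Python sketch, with an illustrative function name and tag list:

```python
def basic_filter(language_range: str, tags: list[str]) -> list[str]:
    """Return the tags matched by a basic language range (case-insensitive)."""
    r = language_range.lower()
    if r == "*":
        return list(tags)
    return [t for t in tags
            if t.lower() == r or t.lower().startswith(r + "-")]


print(basic_filter("az", ["az-Cyrl", "az-Latn", "az-Arab", "en"]))
```

The range `az` selects all three script variants; nothing in the match asks whether they are intercomprehensible or which one is "standard".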

> We and everyone else already have to deal with equivalences with
> grandfathered and irregular tags anyway; these are not a real problem.

Again, everyone else does not *have to* deal with them; they MAY
treat them just like all other tags, at the expense of missing some
reasonable matches.  4647 makes such matches entirely optional.
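
A matcher that does choose to honor equivalences might map grandfathered tags to their registered Preferred-Value before comparing. This minimal Python sketch assumes a two-entry excerpt of that data (`i-klingon` to `tlh`, `art-lojban` to `jbo`); the names `PREFERRED_VALUE` and `same_language` are illustrative, and a real matcher would load the full registry instead.

```python
# Illustrative excerpt of grandfathered tags and their Preferred-Value.
PREFERRED_VALUE = {"i-klingon": "tlh", "art-lojban": "jbo"}


def same_language(a: str, b: str, use_equivalences: bool = False) -> bool:
    """Exact tag comparison, optionally mapping grandfathered tags first."""
    def norm(tag: str) -> str:
        tag = tag.lower()
        return PREFERRED_VALUE.get(tag, tag) if use_equivalences else tag
    return norm(a) == norm(b)


print(same_language("i-klingon", "tlh"))                         # treated like any other tag
print(same_language("i-klingon", "tlh", use_equivalences=True))  # equivalence honored
```

Leaving `use_equivalences` off is the permitted MAY: the comparison still works, it just misses the `i-klingon`/`tlh` match.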

> I disagree strongly. If you can't make a compelling case that extlang
> will make BCP 47 better instead of worse, and won't even look at the
> reasons not to do it, nor even bother to set out a case for it, then
> why should we add it?

It's true, as Dr. Johnson said: "Of an opinion which is no longer
doubted, the evidence ceases to be examined."  I've done my best above.
But I don't yet understand how extlangs make BCP 47 worse, and I think
they actually help, given legacy documents and legacy matchers.

-- 
John Cowan      cowan@ccil.org        http://www.ccil.org/~cowan
        Is it not written, "That which is written, is written"?


_______________________________________________
Ltru mailing list
Ltru@ietf.org
https://www1.ietf.org/mailman/listinfo/ltru