Re: [Ltru] my technical position on extlang

John Cowan <cowan@ccil.org> Fri, 23 May 2008 16:09 UTC

Date: Fri, 23 May 2008 12:09:05 -0400
To: Mark Davis <mark.davis@icu-project.org>
From: John Cowan <cowan@ccil.org>
Cc: LTRU Working Group <ltru@ietf.org>
Subject: Re: [Ltru] my technical position on extlang

Mark Davis scripsit:

[points 1, 2, 3a snipped]

These all amount to "Some implementations are buggy and don't follow
RFC 2616."  Such bugs could be fixed at any time, at which point the
servers would come into compliance.  Historical practice in tagging
documents is important; buggy protocol implementations are not.
(A variant of the ISO C motto "code matters, implementations
don't".)

> [3b]  - Interpreting 'ar' as Standard Arabic is permissible now
> and after,

Certainly.

> With the above list, the practical impact will be that it will stop at
> 'ar' anyway and return Standard Arabic.

As I say, that's a clear violation of the RFC.
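For contrast, RFC 4647 defines two matching schemes. Lookup (section 3.4) returns a single best tag by truncating the requested range one subtag at a time, so an extlang-form range such as `ar-arz` falls back to `ar` when that is all the server has. Filtering is meant to return every matching tag instead. The following is a minimal Python sketch of lookup, assuming the server's available tags fit in memory; the function name and sample tags are illustrative only.

```python
def lookup(priority_list, available, default=None):
    """Sketch of RFC 4647 section 3.4 lookup: try each range in priority
    order, truncating from the right until an available tag matches."""
    # Comparison is case-insensitive; return the tag as the server spells it.
    avail = {t.lower(): t for t in available}
    for language_range in priority_list:
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in avail:
                return avail[candidate]
            subtags.pop()
            # If the new rightmost subtag is a singleton (e.g. 'x'), drop it too.
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()
    # Nothing matched any range: fall back to the caller's default.
    return default


print(lookup(["ar-arz"], ["ar", "de"]))               # falls back to 'ar'
print(lookup(["arz"], ["ar", "de"], default="en"))    # bare 'arz' never reaches 'ar'
```

Note that a bare ISO 639-3 code like `arz` has nothing to truncate to, so lookup only reaches `ar` when the range is written in extlang form.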

> The odds of an implementation supporting "pga" but not ar/arb are
> pretty darn'd low.

Unless it just happens to be a site with lots of audio from around
the arabophone world.  *Across all sites* all non-standard Arabics are
rare, but that doesn't apply to *specific* sites that really need to do
language negotiation.

> Even if the server doesn't support extlang matching, you can get exactly
> what you said you want -- reliably -- with an explicit list:
> 
> arz, arb, ar, arq, aao, bbz, abv, acy, adf, avl, afb, ayh, acw, ayl, acm,
> ary, ars, apc, ayp, acx, aec, ayn, ssh, ajp, apd, pga, acq, abh, aeb, auz

That is exactly what I said.  However, such a list isn't stable over
time as new Arabic varieties get encoded, not to mention that it is
extremely obnoxious for the user to maintain without special support
in the client.
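To make the cost of the explicit-list approach concrete, here is a minimal Python sketch of what a client would have to send as an RFC 2616 Accept-Language value. It assumes the stated order is preserved by giving each range a slightly lower q-value than the previous one (RFC 2616 allows up to three decimal places). The function name `accept_language` and the 0.010 step are illustrative choices, not anything specified in this thread.

```python
# The explicit list quoted above, in the order given.
ARABIC_LIST = [
    "arz", "arb", "ar", "arq", "aao", "bbz", "abv", "acy", "adf", "avl",
    "afb", "ayh", "acw", "ayl", "acm", "ary", "ars", "apc", "ayp", "acx",
    "aec", "ayn", "ssh", "ajp", "apd", "pga", "acq", "abh", "aeb", "auz",
]


def accept_language(ranges, step_milli=10):
    """Build an Accept-Language value with strictly decreasing q-values.

    q-values are handled in thousandths to avoid floating-point drift;
    the first range gets the implicit q=1 and the floor is q=0.001."""
    parts = []
    for i, language_range in enumerate(ranges):
        q_milli = max(1000 - i * step_milli, 1)
        if q_milli == 1000:
            parts.append(language_range)
        else:
            parts.append(f"{language_range};q={q_milli / 1000:.3f}")
    return ", ".join(parts)


print(accept_language(["arz", "arb", "ar"]))
# arz, arb;q=0.990, ar;q=0.980
```

Any newly encoded Arabic variety would have to be added to `ARABIC_LIST` by hand, which is the stability problem with this approach.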

> And this will work even if filtering isn't supported according to the spec;
> and note that we have no guarantee that the server will filter according to
> the spec -- based on the experience with Accept-Language, it's actually
> unlikely.

Most authors don't tag according to the spec either, so why have language
tags at all?  Search engines can and should ignore them; that doesn't
mean they are worthless for other purposes.  We design around the central
point that people tag wisely and programs behave properly, which is not
necessarily the best *implementation* strategy for special cases like
search engines.

> And really, what are the odds that someone wants that exact list
> above? And MUST have it supported with a short enumeration? And does
> not care if additional encompassed languages are added?  According
> to information from Peter, the microlanguages are all independent
> languages, meaning that they are not mutually comprehensible.

The Arabic languages aren't fully mutually intelligible, but that's not
the same as saying intelligibility is zero.  I specifically chose to
exclude Shuwa (Chadian/Nigerian Arabic, 'shu') because it's very remote
from most of the others.

> - Languages that are *very* closely related do not have a macrolanguage
> ("ro" and "mo").

I know.  That doesn't mean not to take advantage of the ISO 639-3
information we do have.
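A client could use the ISO 639-3 macrolanguage information instead of asking the user to hand-maintain a list. It would expand the macrolanguage from a table and drop the varieties the user doesn't want, such as Shuwa. The following is a minimal Python sketch. The table is a hypothetical, partial copy built from the codes quoted in this thread plus `shu`, not an authoritative extract of ISO 639-3, and `expand_macrolanguage` is an illustrative name.

```python
# Hypothetical, partial macrolanguage table: the encompassed languages of
# 'ar' as quoted in this thread, plus 'shu'. Not an official ISO 639-3 copy.
MACROLANGUAGES = {
    "ar": [
        "arz", "arb", "arq", "aao", "bbz", "abv", "acy", "adf", "avl",
        "afb", "ayh", "acw", "ayl", "acm", "ary", "ars", "apc", "ayp",
        "acx", "aec", "ayn", "ssh", "ajp", "apd", "pga", "acq", "abh",
        "aeb", "auz", "shu",
    ],
}


def expand_macrolanguage(macro, exclude=()):
    """Return the macrolanguage code followed by its encompassed languages,
    minus any the user explicitly excludes."""
    excluded = set(exclude)
    members = MACROLANGUAGES.get(macro, [])
    return [macro] + [m for m in members if m not in excluded]


print(expand_macrolanguage("ar", exclude=["shu"]))
```

Updating the shared table when a new variety is encoded updates every client, rather than every user's hand-written list.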

> I don't see extlang as providing any real practical benefit in filtering. It isn't
> useful in a great many other cases of related languages; if I want to use it
> with nn, nb, and no, it doesn't help, nor with ro/mo, various varieties of
> German, and so on. Moreover, if a customer searches for 'ar' and gets a bunch
> of documents in shu, they are as or more likely to be unhappy as if they
> search for 'de' documents and get back a bunch of 'gsw'.

As I keep saying, *search* isn't the issue here.  It's *negotiation
by filtering*.
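The point about negotiation by filtering can be sketched with the basic filtering scheme of RFC 4647 (section 3.3.1). There, a range matches a tag if, case-insensitively, it equals the tag or equals a prefix of the tag followed by "-", and the range "*" matches everything. The sketch assumes a server whose documents are tagged either in extlang form (`ar-arz`) or with the bare ISO 639-3 code (`arz`). Only the former are reached by the single range `ar`. The document inventory and function names are hypothetical.

```python
def basic_filter_match(language_range, tag):
    """RFC 4647 section 3.3.1: the range matches if it equals the tag, or
    equals a prefix of the tag whose next character is '-'."""
    r = language_range.lower()
    t = tag.lower()
    if r == "*":
        return True
    return t == r or t.startswith(r + "-")


def basic_filter(priority_list, tags):
    """Return every tag matched by any range, in priority-list order,
    without duplicates (filtering returns all matches, not one)."""
    result = []
    for language_range in priority_list:
        for tag in tags:
            if tag not in result and basic_filter_match(language_range, tag):
                result.append(tag)
    return result


# Hypothetical document inventory mixing extlang and bare-code tagging.
docs = ["ar", "ar-arz", "ar-pga", "arz", "pga", "de", "gsw"]
print(basic_filter(["ar"], docs))   # ['ar', 'ar-arz', 'ar-pga']
```

The bare `arz` and `pga` documents are invisible to the range `ar`, and `gsw` is likewise invisible to `de`, which is why the tag form matters for filtering even though search engines may ignore it.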

> Putting some ISO language tags into the extlang position just because they
> have a macrolanguage is an unnecessary complication for implementations and
> represents a substantive, controversial change to RFC 4646.

*Now* it does, yes.  All that shows is that I should have thought of
this argument before giving in earlier.

-- 
Values of beeta will give rise to dom!          John Cowan
(5th/6th edition 'mv' said this if you tried    http://www.ccil.org/~cowan
to rename '.' or '..' entries; see              cowan@ccil.org
http://cm.bell-labs.com/cm/cs/who/dmr/odd.html)