Re: [Ltru] my technical position on extlang

John Cowan <cowan@ccil.org> Fri, 23 May 2008 22:55 UTC

Date: Fri, 23 May 2008 18:54:00 -0400
To: Mark Davis <mark.davis@icu-project.org>
From: John Cowan <cowan@ccil.org>
Cc: LTRU Working Group <ltru@ietf.org>
Subject: Re: [Ltru] my technical position on extlang

Mark Davis scripsit:

>    1. "get me languages that are mutually intelligible with X" (maybe
>    to degree Y), and
>    2. "get me the languages that have the same macrolanguage as X"
> 
> Number 1 is very interesting, and would be very useful; but it is
> not at all the same as #2. If macrolanguage were defined as #1, I
> would probably be all in favor of baking it into extlangs. But it is
> not at all the same, as many, many examples illustrate. Moreover,
> "mutual intelligibility" differs whether the content is written or
> spoken - forcing it to be baked into the syntax does not allow for
> that difference.

This is as much as to say, Because we can't have everything, let us
have nothing.  To which I reply, The best is the enemy of the good.

> Also, as I read your response, I think at least part of our apparent
> differences is the use of different terminology.

Indeed.

> I was using "content negotiation" in the lookup sense, which is what is
> typically done with Accept-Language (not always, but typically).

That just turns out not to be the case.

> That is, the client is supplying a list of languages, perhaps with q
> values, and the expectation is that s/he will get one thing back. For
> example, a web page in one of the requested languages, but could be
> any sort of resource. And in such interactions, you do want lookup on
> the individual items; "ar" is not treated like "ar(-.*)"

If you look at RFC 2616, you'll see that what it describes in section
14.4 is filtering:

   A language-range matches a language-tag if it exactly equals the tag,
   or if it exactly equals a prefix of the tag such that the first tag
   character following the prefix is "-".

That's the standard.
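
The rule quoted above is small enough to sketch in a few lines of Python.
This is only an illustration: the function name is made up, and treating
"*" as matching every tag is a simplification of what RFC 2616 says about
the wildcard.

```python
def range_matches(language_range: str, language_tag: str) -> bool:
    """Prefix matching as described in RFC 2616 section 14.4.

    Tags are compared case-insensitively. The "*" handling is a
    simplification: here it simply matches every tag.
    """
    r = language_range.lower()
    t = language_tag.lower()
    if r == "*":
        return True
    # Exact match, or the range is a prefix followed directly by "-".
    return t == r or t.startswith(r + "-")


if __name__ == "__main__":
    for rng, tag in [("ar", "ar"), ("ar", "ar-EG"), ("ar", "arz"), ("en-GB", "en")]:
        print(rng, tag, range_matches(rng, tag))
```

Note that the match runs in one direction only: the range "ar" matches the
tag "ar-EG", but the range "en-GB" does not match the tag "en".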

Now, if you look at the documented behavior of the dominant web server,
Apache, at http://httpd.apache.org/docs/2.2/content-negotiation.html ,
you'll find that it implements that standard.

It's true that if filtering fails to find anything, Apache provides
additional lookup-style fallback, thus:

	When a client requests a page on your server, but the server
	cannot find a single page that matches the Accept-language sent
	by the browser, the server will return either a "No Acceptable
	Variant" or "Multiple Choices" response to the client.

	[snipped explanation of configuring ultimate fallback]

	The server will also attempt to match language-subsets when
	no other match can be found. For example, if a client requests
	documents with the language en-GB for British English, the server
	is not normally allowed by the HTTP/1.1 standard to match that
	against a document that is marked as simply en. (Note that it is
	almost surely a configuration error to include en-GB and not en
	in the Accept-Language header, since it is very unlikely that
	a reader understands British English, but doesn't understand
	English in general. Unfortunately, many current clients have
	default configurations that resemble this.)

	However, if no other language match is possible and the server
	is about to return a "No Acceptable Variants" error or fallback
	to the LanguagePriority, the server will ignore the subset
	specification and match en-GB against en documents. Implicitly,
	Apache will add the parent language to the client's acceptable
	language list with a very low quality value. But note that if
	the client requests "en-GB; q=0.9, fr; q=0.8", and the server
	has documents designated "en" and "fr", then the "fr" document
	will be returned. This is necessary to maintain compliance with
	the HTTP/1.1 specification and to work effectively with properly
	configured clients.

This makes it very clear that lookup is an extension that provides
ultimate fallback for misconfigured browsers, *not* the standard behavior
in handling HTTP Accept-Language: headers.
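
To make the order of operations concrete, here is a rough Python sketch of
filtering with a low-priority parent-language fallback, modelled on the
documentation quoted above. It is not Apache's code: the function names,
the 0.001 fallback quality value, and the choice to truncate to the primary
language subtag are all assumptions made for illustration.

```python
def parse_accept_language(header):
    """Parse an Accept-Language value into (range, q) pairs; q defaults to 1.0."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0
        for param in parts[1:]:
            if param.lower().startswith("q="):
                q = float(param[2:])
        ranges.append((parts[0].lower(), q))
    return ranges


def range_matches(language_range, language_tag):
    """RFC 2616 section 14.4 prefix matching ("*" simplified to match all)."""
    tag = language_tag.lower()
    return (language_range == "*" or tag == language_range
            or tag.startswith(language_range + "-"))


def choose_variant(header, available, fallback_q=0.001):
    """Pick the available tag with the highest quality, or None."""
    ranges = parse_accept_language(header)
    listed = {r for r, _ in ranges}
    # Assumed fallback: add each unlisted parent language with a very low q,
    # so it can win only when nothing the client actually listed matches.
    for r, q in list(ranges):
        parent = r.split("-")[0]
        if q > 0 and "-" in r and parent not in listed:
            ranges.append((parent, fallback_q))
            listed.add(parent)
    best_tag, best_q = None, 0.0
    for tag in available:
        # The quality of a tag is that of the longest range matching it.
        matching = [(len(r), q) for r, q in ranges if range_matches(r, tag)]
        if matching and max(matching)[1] > best_q:
            best_tag, best_q = tag, max(matching)[1]
    return best_tag


if __name__ == "__main__":
    print(choose_variant("en-GB; q=0.9, fr; q=0.8", ["en", "fr"]))
    print(choose_variant("en-GB", ["en", "de"]))
```

With the header from the quoted example, the "fr" document wins over "en",
because the implicit "en" entry carries only the tiny fallback quality. The
"en" document is chosen only when no listed range matches anything.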

> Again, if we are talking about content negotiation, where we return a 'best
> fit', it is going to be a rare case where 'pga' is supported but 'ar' is
> not. 

Consider a collection of oral histories by people in arabophone countries.
Standard Arabic would be rare among them, as most people can't speak it.

> Better to just explicitly list the ones that are wanted.

I may even agree with you here, but I don't think it's our business to
tell people how to configure their browsers.

> That isn't my point. My point is that the explicit list
> 
>    - currently works.
>    - handles everything that extlang could, with finer control
>    - is more powerful, since you can express things that extlang simply can't
>    (eg "ro, mo, no, nn, nb"), and provide exactly the list you want.

It's also painful to set up, less stable than the variety using
macrolanguages, and so on.
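
To illustrate the setup cost, here is a small Python sketch comparing the
two tagging styles under RFC 2616 prefix filtering. The collection is
hypothetical and the four codes are only an illustrative subset of the
languages that ISO 639-3 groups under the Arabic macrolanguage; the helper
names are invented for this example.

```python
def range_matches(language_range, language_tag):
    """RFC 2616 section 14.4 prefix matching, case-insensitive."""
    r, t = language_range.lower(), language_tag.lower()
    return t == r or t.startswith(r + "-")


def filter_tags(ranges, tags):
    """Return the tags matched by at least one of the client's ranges."""
    return [t for t in tags if any(range_matches(r, t) for r in ranges)]


# Illustrative subset only: Standard, Egyptian, Sudanese Creole, Moroccan.
members = ["arb", "arz", "pga", "ary"]

# Content tagged with the macrolanguage in front (extlang style) ...
extlang_tags = ["ar-" + m for m in members]
# ... versus content tagged with the bare individual-language code.
bare_tags = list(members)

if __name__ == "__main__":
    # A browser configured with just "ar" finds everything in extlang style.
    print(filter_tags(["ar"], extlang_tags))
    # The same browser finds nothing when bare codes are used.
    print(filter_tags(["ar"], bare_tags))
    # With bare codes, every wanted language must be listed explicitly.
    print(filter_tags(["arb", "arz"], bare_tags))
```

With bare codes, the user's list also has to be revised whenever a member
language is added to or split out of the macrolanguage; the single range
"ar" does not.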

> > > Putting some ISO language tags into the extlang position just
> > > because they have a macrolanguage is an unnecessary complication
> > > for implementations and represents a substantive, controversial
> > > change to RFC 4646.
> >
> > *Now* it does, yes.  All that shows is that I should have thought of
> > this argument before giving in earlier.
> 
> I didn't quite understand this.

In other words, extlangs were the status quo before the teleconferences,
and if I had held out, they might well still be.  Because I gave in then
(prematurely, as I think now), no-extlangs is the status quo.

-- 
                Si hoc legere scis, nimium eruditionis habes.