[Ltru] John Cowan throws in the towel on extlangs

John Cowan <cowan@ccil.org> Thu, 29 November 2007 14:39 UTC

Date: Thu, 29 Nov 2007 09:39:20 -0500
To: ltru@ietf.org
Message-ID: <20071129143920.GC32134@mercury.ccil.org>
From: John Cowan <cowan@ccil.org>
Subject: [Ltru] John Cowan throws in the towel on extlangs

After yesterday's telcon, it became clear to me that we had no movement
toward reconciling the extlang-yes and the extlang-no factions, and
that if we were ever to make progress, someone would have to concede.
That turns out to be me.  No one else has been actually defending
the use of extlangs, and several people are either mildly or strongly
against them, so all I am doing now is blocking consensus to no purpose.
Therefore, I have reluctantly ceased to block it.

The strongest argument against extlang tags is that our simple
remove-from-right lookup algorithm ends up losing too much information.
If you ask for "zh-yue-Hant", and there are no Cantonese resources, then
by the time you have truncated the tag to just "zh" you have forgotten
that Traditional Han script was required.  To prevent this,
we would have to change RFC 4647 to specify a sequence like zh-yue-Hant >
zh-yue > zh-Hant > zh, where the extlang tag is treated magically;
in essence, stripping it restores all other tags.  That would be fine,
except that no one actually does it or is likely to for some time to come.
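
To make the information loss concrete, here is a minimal sketch of the two
candidate sequences.  It is not text from RFC 4647: the function names are
made up for illustration, tags are assumed to be already case-normalized, and
an extlang is recognized by the simplifying assumption that it is a
three-letter subtag in second position.

```python
def truncation_sequence(tag):
    """Remove-from-right lookup: yield the tag, then ever shorter prefixes."""
    subtags = tag.split("-")
    while subtags:
        yield "-".join(subtags)
        subtags.pop()
        # RFC 4647 lookup also drops a single-character subtag left at the end.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()


def extlang_aware_sequence(tag):
    """Hypothetical variant: stripping the extlang restores the other subtags."""
    subtags = tag.split("-")
    # Assumption: a 3-letter alphabetic subtag in second position is an extlang.
    has_extlang = (
        len(subtags) >= 2 and len(subtags[1]) == 3 and subtags[1].isalpha()
    )
    if not has_extlang:
        return list(truncation_sequence(tag))
    primary, extlang, rest = subtags[0], subtags[1], subtags[2:]
    # First every candidate that still carries the extlang ...
    with_extlang = [
        t for t in truncation_sequence(tag)
        if t.split("-")[:2] == [primary, extlang]
    ]
    # ... then start over without it, keeping script, region and so on.
    without_extlang = list(truncation_sequence("-".join([primary] + rest)))
    return with_extlang + without_extlang


def lookup(requested, available, sequence=truncation_sequence):
    """Return the first candidate present in `available`, or None."""
    for candidate in sequence(requested):
        if candidate in available:
            return candidate
    return None


available = {"zh-Hant", "zh"}
print(lookup("zh-yue-Hant", available))                          # zh
print(lookup("zh-yue-Hant", available, extlang_aware_sequence))  # zh-Hant
```

With only "zh-Hant" and "zh" on offer, the plain sequence lands on "zh" and
the script requirement is gone, while the extlang-aware sequence still finds
"zh-Hant".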

Everything else is more or less symmetrical.  For example, almost
all content tagged 'zh' is in fact Mandarin, as an uncontroversial
consequence of the fact that almost all tagged content is written,
and almost all written Chinese is in fact Mandarin.  There will be a
problem in getting good matches if people use the new 'cmn' subtag,
as naive matchers will not know its relationship to 'zh', but no worse
than the problems shown above for Cantonese.

In practical terms, this means that (unless someone else wants to take
up the cause on the LTRU mailing list and see Bonnie Prince Charlie
come into his own again) we will allow all ISO 639-3 code elements
to be language subtags, and deprecate all the grandfathered forms like
"zh-yue" (Cantonese) in favor of just 'yue'.  Similarly, the currently
redundant sign languages like "sgn-US" will be deprecated in favor of
'ase' and friends.  The existing language tags representing macrolanguages
of course remain intact.
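
For implementers, the deprecation amounts to a small rewrite table.  The
sketch below is only an illustration: the table holds just the two examples
named above, the names DEPRECATED and canonicalize are made up, and carrying
trailing subtags over unchanged is my assumption rather than anything the
registry specifies.

```python
# Illustrative table: deprecated form (lowercased) -> preferred language subtag.
DEPRECATED = {
    "zh-yue": "yue",  # Cantonese
    "sgn-us": "ase",  # American Sign Language
}


def canonicalize(tag):
    """Rewrite a deprecated prefix; any remaining subtags are kept as they are."""
    lowered = tag.lower()
    for old, new in DEPRECATED.items():
        # Match the whole tag or a prefix that ends on a subtag boundary.
        if lowered == old or lowered.startswith(old + "-"):
            return new + tag[len(old):]
    return tag


print(canonicalize("zh-yue-Hant"))  # yue-Hant
print(canonicalize("sgn-US"))       # ase
print(canonicalize("zh-Hant"))      # zh-Hant (macrolanguage tag left intact)
```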

IMO we should strongly consider adding a new informative (and mutable)
"Fallback" header in the registry which will inform people about
problematic cases like "cmn" and "arb" (Standard Arabic), instructing
them which language subtag to fall back to in cases of match failure.
These will have to be cherry-picked, like Suppress-Script, but Peter
Constable has estimated that there are no more than 15 such cases
actually in wide use.  For some macrolanguages, there is no dominant
variety, and no special consideration is as yet required; if new dominant
varieties come to exist in future, new Fallback headers can be added.
RFC 4647bis can then be revised to explain how this header MAY be used
to enhance matching.
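
A matcher might consult such a header roughly as follows.  This is a sketch
under stated assumptions, not a proposal for the actual text: the Fallback
field does not exist yet, the FALLBACK table stands in for it with two
illustrative entries, the function names are made up, and singleton handling
is left out for brevity.

```python
# Stand-in for the proposed registry field: subtag -> subtag to fall back to.
FALLBACK = {
    "cmn": "zh",  # Mandarin -> Chinese
    "arb": "ar",  # Standard Arabic -> Arabic
}


def candidates_with_fallback(tag):
    """Plain right-truncation first, then the same with the fallback subtag."""
    subtags = tag.split("-")
    candidates = ["-".join(subtags[:n]) for n in range(len(subtags), 0, -1)]
    fallback = FALLBACK.get(subtags[0].lower())
    if fallback is not None:
        rest = subtags[1:]
        # Swap in the fallback language but keep script, region and so on.
        candidates += [
            "-".join([fallback] + rest[:n]) for n in range(len(rest), -1, -1)
        ]
    return candidates


def match(requested, available):
    """Return the first candidate present in `available`, or None."""
    for candidate in candidates_with_fallback(requested):
        if candidate in available:
            return candidate
    return None


print(candidates_with_fallback("cmn-Hant"))  # ['cmn-Hant', 'cmn', 'zh-Hant', 'zh']
print(match("cmn-Hant", {"zh-Hant", "zh"}))  # zh-Hant
```

The fallback candidates are tried only after every truncation of the original
tag has failed, which matches the "in cases of match failure" wording above.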

-- 
Mark Twain on Cecil Rhodes:                    John Cowan
I admire him, I freely admit it,               http://www.ccil.org/~cowan
and when his time comes I shall                cowan@ccil.org
buy a piece of the rope for a keepsake.

