[Ltru] Great Script Debate "the Next Generation"... (long)

Addison Phillips <addison@yahoo-inc.com> Thu, 05 October 2006 18:58 UTC

Message-ID: <452555BA.2040601@yahoo-inc.com>
Date: Thu, 05 Oct 2006 11:58:02 -0700
From: Addison Phillips <addison@yahoo-inc.com>
User-Agent: Thunderbird 1.5.0.7 (Windows/20060909)
MIME-Version: 1.0
To: 'LTRU Working Group' <ltru@ietf.org>
Content-Type: text/plain; charset="UTF-8"; format="flowed"
Content-Transfer-Encoding: 7bit
Subject: [Ltru] Great Script Debate "the Next Generation"... (long)

All,

Taking up the gauntlet flung down by John Cowan :-), herewith my
proposal for fixing Suppress-Script.

----
First, some recap of the problem. Suppress-Script (S-S) fields are meant
to identify languages that are written predominantly in a single script.
This is to
warn users and implementers not to form language tags using a script
subtag that is usually redundant, for compatibility with tagging
practices prior to RFC 4646.
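
As a rough illustration of that intent, here is a minimal Python sketch.
The SUPPRESS_SCRIPT table is a small hand-copied excerpt for illustration,
not the real registry, and the function name strip_suppressed_script is
invented for this example. It also assumes a well-formed tag whose script
subtag, if present, directly follows the language subtag.

```python
# Hypothetical excerpt: language subtag -> Suppress-Script value.
# A real implementation would load these from the IANA registry.
SUPPRESS_SCRIPT = {"en": "Latn", "ru": "Cyrl"}


def strip_suppressed_script(tag: str) -> str:
    """Drop a script subtag that the sample table marks as redundant."""
    subtags = tag.split("-")
    language = subtags[0].lower()
    # Assumption: a script subtag is four letters in second position.
    if len(subtags) > 1 and len(subtags[1]) == 4 and subtags[1].isalpha():
        if SUPPRESS_SCRIPT.get(language) == subtags[1].title():
            del subtags[1]
    return "-".join(subtags)
```

Under these assumptions, "en-Latn-US" comes back as "en-US", while
"sr-Latn" is left alone because the sample table has no entry for
Serbian.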

S-S poses a number of interesting problems.

I've cited previously the registration problems. Mainly the problem here
is that most languages fit the pattern of "wanting" some form of S-S
field. Even languages that might not have a clear relationship to a
specific script subtag (cf. Doug's research on Korean) probably should
not use a script subtag.

In particular, creating accurate values for even the ISO 639-1 and ISO
639-2 set of languages would require significant knowledge about the
current and recent historical writing traditions of each given language,
plus, possibly, some knowledge of public policy and/or potential
suppression or abuse of minority tradition in regard to that language.

S-S indicates that a given language is written predominantly in a
specific script, so the burden of proof for a less common language might
be very difficult to meet: the presence of many texts in a specific
script does not "prove the negative", that is, that no significant body
of texts or specific writing tradition exists that uses a different
script.

The main alternative we've dealt with in the past would be an
"Accept-Script" or "Recommend-Script" approach. That design, which was
not adopted in RFC 4646, involves documenting the known cases in which a
script subtag *should* be used, and, in effect, recommending that
languages that do not have an "A-S" field not use a script subtag except
when indicating a specific difference important within a given group of
information items.

A-S avoids the "proving the negative" problem of S-S. Since it applies
to a much smaller set of languages, it probably requires less
registration overhead. However, the burden of proof may be just as
difficult to meet, and A-S is encumbered with essentially the same
problems that attend S-S, since assertions about multiple script usage
are just as potentially disruptive as assertions about the "single
scriptness" of a language.

Removing script information from the registry altogether is appealing as
an alternative. As Mark points out, script subtags are entirely
voluntary and entirely valid. The informational nature of the S-S field
is merely to help guide implementers and users to try and do the right
things. If maintenance is a nightmare, why persist in maintaining
somewhat fictional information?

I do think that guidance for users/implementers is a valid goal here. My
experience as an "eminence grise" for language tags over the past couple
of years is that the level of ill-informedness and mythology surrounding
language tags is pretty deep. Anything we can do to help speed proper
implementation of language tags will help.

The problem here is that I think we're miscasting the role of the script
advisory field or fields in the registry. If we only document the "do
not use" case, users and implementers will remain ignorant of what to do
for languages without the S-S field. Having explained the Chinese issue
several times, it's clear to me that many implementers will not stumble
over the right subtags by accident... and certainly not for languages 
such as Serbian, Uzbek, or Azerbaijani.

If we only document the "do use" cases, though, users and implementers
may not notice the warnings against use of scripts elsewhere, leading to
the problem we initially sought to prevent.

Thus, my proposal for solving the problem:

1. Include the strongest possible warnings about not using script
subtags in 4646bis. This is probably embodied in Karen Broome's
suggested texts.

2. Replace "Suppress-Script" with "Script". If no script field is
supplied, language tags/ranges should still not use a script subtag
unless one is warranted by the information item or request. If the
script field is present and contains a single item, the language is
known to use that script predominantly. If two or more items are
present, the language is commonly written in more than one script. Users 
are advised *not* to use a script subtag unless the language has more 
than one item in the Script field.
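
The rule in item 2 can be sketched in a few lines of Python. This is
only an illustration of the proposal as worded above: the function name
should_use_script_subtag and the sample records are hypothetical, and
"xx" is a placeholder for a language with no Script field at all.

```python
def should_use_script_subtag(script_field: list[str]) -> bool:
    """Apply the proposed rule to one record's Script field values."""
    # Zero items: no script guidance is recorded for the language.
    # One item: the language is written predominantly in that script.
    # In both cases the default advice is to omit the script subtag.
    return len(script_field) > 1


# Hypothetical records, for illustration only.
SCRIPT_FIELD = {
    "en": ["Latn"],
    "sr": ["Cyrl", "Latn"],
    "xx": [],  # placeholder: language with no Script field
}
```

With these sample records the function returns True only for "sr". As
item 2 says, a tag may still carry a script subtag when the information
item or request warrants one; the function captures only the default
advice.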

Potential issues:

1. A Script field with zero items and one with a single item produce the
same behavior. Acquiring a second script, however, requires extra
scrutiny by ietf-languages because it changes the potential default
behavior for tag formation for that language.


Reactions?



-- 
Addison Phillips
Globalization Architect -- Yahoo! Inc.

Internationalization is an architecture.
It is not a feature.

