[Ltru] Re: Non-Latin-1 Description fields in RFC 4645bis

"Doug Ewell" <dewell@roadrunner.com> Fri, 07 December 2007 06:26 UTC

Message-ID: <00de01c8389a$1a388590$6601a8c0@DGBP7M81>
From: Doug Ewell <dewell@roadrunner.com>
To: LTRU Working Group <ltru@ietf.org>
Date: Thu, 06 Dec 2007 22:26:28 -0800
Subject: [Ltru] Re: Non-Latin-1 Description fields in RFC 4645bis

When thinking about how to define "the Latin script" for purposes of 
meeting the requirement in RFC 4646bis, section 3.1.4, it might be 
useful to consider why we have such a requirement in the first place.

We have generally copied Description fields for language, script, and 
region subtags from the respective ISO standard.  The various ISO MAs 
and RAs generally do not go overboard in character repertoire -- in 
particular, ISO 639-3/RA uses ordinary ASCII punctuation in place of 
true click letters -- but in this case they have seen fit to use 
"Māhārāṣṭri Prākrit" as a language name.  (I don't know what, if 
anything, the plain-ASCII equivalent "Maharastri Prakrit" means; I hope 
it's something I can say in public.)  Because we have committed 
ourselves to using the names from the ISO standards, this issue mainly 
concerns the variant subtags.

Many of the fields in the Registry are intended for some type of 
automated processing, but the Description is really meant for human 
consumption.  Software certainly isn't going to do much with it, except 
display it.  Therefore, the Description field ought to be designed with 
human usability in mind.

While the names of languages, writing systems, countries, etc. could 
obviously be written in a huge variety of scripts, most humans 
(professional translators and linguists aside) are unlikely to be able 
to read more than two or three scripts, so it makes sense to impose a 
limit to help ensure scrutability.  Because of the nature of the rest of 
the Registry, some familiarity with the Latin script is more or less 
assumed.  Consequently, we imposed a requirement that every subtag have 
at least one Latin-script description.

There is no written language on earth that uses all of the letters of 
the Latin script, for any reasonable definition of "Latin script" (i.e. 
more than just ASCII).  Nevertheless, a reader can generally recognize 
Latin-script letters that are not part of that reader's alphabet.  For 
example, a Francophone should be able to recognize n-tilde (ñ) as a 
letter belonging to the broader "Latin script" even though it is not in 
the narrower "French alphabet."

Technical limits should not be seen as the primary concern.  The problem 
of limited font coverage, as explained months ago to CE Whitehead, is a 
temporary one that will not last forever.  Programs and operating 
systems are being localized to more and more languages, requiring more 
comprehensive font coverage, and rendering engines are becoming smart 
enough to substitute glyphs from alternative fonts instead of displaying 
square boxes.  On the other hand, an LTRU policy that restricts "the 
Latin script" to an artificial subset is something that will last until 
some future LTRU comes around to change it.

I suggest we adopt a liberal definition of "the Latin script," 
preferably one based on Unicode Standard Annex #24, "Script Names," 
rather than a narrow definition based on legacy character sets or 
subsets, keyboard layouts, or font coverage.
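
As a rough illustration of what a liberal check could look like, here is 
a minimal Python sketch.  It is not a proposed implementation: Python's 
standard library does not expose the Unicode Script property, so the 
sketch approximates it by testing whether a letter's Unicode character 
name begins with "LATIN".  That heuristic misses a few characters that 
UAX #24 does assign to the Latin script (fullwidth forms and the Kelvin 
sign, for example).  The function name is_latin_description is 
hypothetical and is used here only for illustration.

```python
import unicodedata


def is_latin_description(text: str) -> bool:
    """Return True if text has at least one letter and every letter looks Latin.

    Assumption: a character name starting with "LATIN " stands in for the
    Script=Latin property of UAX #24. Non-letters (spaces, digits,
    punctuation, combining marks) are ignored, roughly mirroring the
    Common and Inherited script values.
    """
    letters = [ch for ch in text if unicodedata.category(ch).startswith("L")]
    if not letters:
        return False
    return all(unicodedata.name(ch, "").startswith("LATIN ") for ch in letters)


if __name__ == "__main__":
    for sample in ("Māhārāṣṭri Prākrit", "Maharastri Prakrit", "Русский"):
        print(sample, is_latin_description(sample))
```

Under this check, a name with macrons and underdots is accepted on the 
same footing as its plain-ASCII equivalent, while a Cyrillic-only 
description is rejected.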

These are my opinions; yours may differ.

--
Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
http://home.roadrunner.com/~dewell
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages


