Re: [Ltru] Fw: I-D Action: draft-falk-transliteration-tags-01.txt

"Martin J. Dürst" <> Tue, 21 June 2011 06:39 UTC

Date: Tue, 21 Jun 2011 15:39:03 +0900
From: "Martin J. Dürst"
Organization: Aoyama Gakuin University
To: Mark Davis ☕
Cc: LTRU Working Group

Hello Mark, others,

Overall comment:
The idea of reusing language tags to indicate the 
transliteration/transcription source, and of adding some additional tags to 
distinguish methods, seems reasonable and sound.

The description of the structure of the allowed subtags and of the 
responsibility split between IETF (this draft) and UTC (UTS 35) looks 
quite messy to me, and should be cleaned up. I'd personally prefer that 
UTS 35 (or whatever else on the Unicode side) only define the 
<mechanism> part (after the m0 subtag).
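
To make the split concrete, here is a minimal Python sketch of how I read 
the intended structure: everything before the 't' singleton is the target, 
the subtags between 't' and 'm0' are the reused source language tag, and 
whatever follows 'm0' is the <mechanism> part. The function name 
split_t_extension and the sample tag are illustrations only, not taken from 
the draft, and the parsing is deliberately simplified (it ignores other 
singletons and private use).

```python
def split_t_extension(tag):
    """Split a tag into (target, source, mechanism) lists of subtags.

    Simplifying assumption: at most one 't' singleton and one 'm0' key,
    and no other extensions or private-use subtags in the tag.
    """
    subtags = tag.lower().split("-")
    if "t" not in subtags:
        # No 't' extension: the whole tag describes the content itself.
        return subtags, [], []
    t_index = subtags.index("t")
    target = subtags[:t_index]
    rest = subtags[t_index + 1:]
    if "m0" in rest:
        m_index = rest.index("m0")
        # Source language tag before 'm0', mechanism subtags after it.
        return target, rest[:m_index], rest[m_index + 1:]
    return target, rest, []


# Illustrative tag only; the mechanism subtags are placeholders.
target, source, mechanism = split_t_extension("und-Cyrl-t-und-Latn-m0-ungegn-2007")
print(target, source, mechanism)
```

Under this reading, only the last of the three lists would be for UTS 35 
to define.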

Detailed comments:

"In addition, it may also be important to
    specify a particular specification for the transformation.": Too 
much 'spec' in one sentence.

"For example, if one is transcribing the names of Italian or Russian
    cities on a map for Japanese users, each name will need to be
    transliterated into katakana using rules appropriate for the source
    language and target languages.": "source languages and target language"?

BCP47 required information: The first three paragraphs should move to 
the introduction.

"followed by a sequence of subtags that would form a language tag": Here 
and in general: Don't use 'would'.

    The structure of 't' subtags is determined by the Unicode CLDR
    Technical Committee, in accordance with the policies and procedures
    in, and subject
    to the Unicode Consortium Policies on

The following paragraph is also difficult to understand. I wouldn't know 
exactly what falls on what side. I think one major reason is that we are 
treading new ground here: it's the first time we have a singleton 
definition that allows reuse of language tags (with a few restrictions) 
and also intends to define its own extensions.

    Changes that can be made by successive versions of LDML [UTS35] by
    the Unicode Consortium without requiring a new RFC include the
    allocation of new subtags for use after the 't' extension.  A new RFC
    would be required for material changes to an existing 't' subtag, or
    an incompatible change to the overall syntactic structure of the 't'
    extension; however, such a change would be contrary to the policies
    of the Unicode Consortium, and thus is not anticipated.

2.1 Summary: There seems to be quite some overlap with the part of 
section 2 before the 2.1 heading.

One question I would have as a linguistic researcher is: How much effort 
and time is involved in getting a 'mechanism' approved? If such 
'mechanisms' are e.g. rejected with arguments like "if we accept it, 
then everybody has to implement it" or so, then I would see that as a 

So much for the moment.

Regards,   Martin.

On 2011/06/18 6:07, Mark Davis ☕ wrote:
> Yoshito, Addison, and I had had an action for a while now from the CLDR
> committee to submit a draft for an extension. Rather than go through all
> the problems in the falk draft, we put together an alternative approach,
> leveraging the work we already did for the -u- extension.
> It just got posted at
> Courtney, I think this provides a superset of the functionality that you are
> interested in. Perhaps you can read it over, and we can add you as an author
> of the next version of this draft instead of having the two competing
> proposals.
> Mark
> *— Il meglio è l’inimico del bene —*
> On Wed, Jun 15, 2011 at 10:50, Randy Presuhn wrote:
>> Hi -
>> I started out with an off-list response, but I figure this is
>> something worth sending to the list.
>> Off-list, a contributor asked:
>> ...
>>> I'd love to see your input. I'd like to make sure I understand
>>> all the concerns. Is there any way you could forward this to the list?
>> My response:
>> Sorry, already deleted.  As I recall, the main concerns were
>>   (1) there already *is* support for identifying orthographies
>>       (remember German?)
>>   (2) the I-D seems to assume that transliterations always result
>>       in "Latin" (previous discussion on LTRU included transliterations
>>       to Cyrillic and Hangul, among others)
>>   (3) the "original orthography" is irrelevant for the transliteration
>>       systems I've been able to think of.  (At the same time, some
>>       transliteration systems are quite "lossy" and some don't do
>>       "round trip" very well.)  Consider also the transliteration of
>>       material which was originally in audio form...
>>   (4) The draft doesn't clearly distinguish "orthography" from
>>       "transliteration".  This may be because the boundary between the
>>       two can be fuzzy, but even that is an issue that should be
>>       addressed.
>>   (5) How this fits in with *transcription* systems (e.g. IPA) should be
>>       addressed.  The boundary gets fuzzy with orthographies that are
>>       equivalent to phonemic representations of the language.  (e.g.,
>>       Pinyin for Mandarin)
>>   (6) The proposed singleton usage appears broken and unnecessary.
>> Or something like that.  I may have forgotten something here, or, in the
>> process of reconstruction, thought of something I missed the first time.
>> Randy
>> _______________________________________________
>> Ltru mailing list