Re: [Ltru] Fw: I-D Action: draft-falk-transliteration-tags-01.txt

"Martin J. Dürst" <> Thu, 21 July 2011 10:11 UTC

Date: Thu, 21 Jul 2011 19:11:04 +0900
From: "Martin J. Dürst" <>
Organization: Aoyama Gakuin University
To: Mark Davis ☕ <>
Cc: LTRU Working Group <>

Sorry this is a late reply.

On 2011/06/22 1:00, Mark Davis ☕ wrote:
> Those are good issues; thanks for raising them and starting the discussion.
> Comments below.
> ------------------------------
> Mark
> *— Il meglio è l’inimico del bene —*
> On Mon, Jun 20, 2011 at 23:39, "Martin J. Dürst" <> wrote:
>> Hello Mark, others,
>> Overall comment:
>> The idea to reuse language tags to indicate transliteration/transcription
>> source, and to add some additional tags to distinguish methods seems to be
>> reasonable and sound.
>> The description of the structure of the allowed subtags and of the
>> responsibility split between IETF (this draft) and UTC (UTS 35) looks quite
>> messy to me, and should be cleaned up. I'd personally prefer that UTS 35 (or
>> whatever else on the Unicode side) only define the <mechanism> part (after
>> the m0 subtag).
> That would be my preference as well (can't speak for my coauthors).
> We patterned it this way following what ended up being accepted for the -u-
> extension. That is, the spec is in UTS35, but there is a summary here.

I didn't like that then, I have to admit.

> But
> of course, there are many ways to do it. And maybe this summary is too
> detailed, at least for the mechanism part, and we could just have it in
> UTS35.

The most important thing is to make clear what is 'summary' (i.e. 
non-normative) and what's normative. The second most important thing is 
that the RFC actually define something, not just say "look over there". 
The Unicode side is supposed to be a registry, not a spec (I think Doug 
already pointed that out).

> We considered a number of alternatives:
>     - We could define everything after -t- to be the source language, and
>     everything after -m- to be the mechanism. But that burns 2 extension
>     letters, not just one.
>     - We also considered having everything in the -u extension, for which we
>     already have the structure set up. However, that would force us to have
>     artificial source subtags like 'en0' instead of 'en', because the -u-
>     extension wouldn't allow the 2-letter subtags (it already defines a use for
>     them).
>     - We could also have -t- be just the source, and define the mechanism in
>     -u-, also easy. But we felt it would be better to have everything under one
>     extension.

This is the technical aspect, where I think you got it right. What I'm 
talking about above are questions of what spec says what.
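
To make the structure under discussion concrete, here is a minimal sketch of how a tag using the single -t- extension could be split into target, source, and mechanism. It is an illustration only, not text from the draft or from UTS 35. Assumptions: a field key is one letter followed by one digit (modelled on the "m0" subtag mentioned above), the tags used below are hypothetical examples, and the function name split_t_extension is made up for this sketch.

```python
import re

# Assumption: a field key is one letter plus one digit, e.g. "m0".
FIELD_KEY = re.compile(r"^[a-z][0-9]$")
# Any other single-character subtag starts a different extension.
SINGLETON = re.compile(r"^[a-z0-9]$")


def split_t_extension(tag):
    """Split a tag into (target, source, fields) around the 't' singleton.

    Simplification: subtags belonging to later extensions are dropped.
    """
    subtags = tag.lower().split("-")
    if "t" not in subtags[1:]:
        return "-".join(subtags), None, {}
    t_index = subtags.index("t", 1)
    target = "-".join(subtags[:t_index])
    source = []
    fields = {}
    current_key = None
    for subtag in subtags[t_index + 1:]:
        if SINGLETON.match(subtag):
            break  # another extension (or private use) starts here
        if FIELD_KEY.match(subtag):
            current_key = subtag
            fields[current_key] = []
        elif current_key is None:
            source.append(subtag)  # still inside the source language tag
        else:
            fields[current_key].append(subtag)
    return target, "-".join(source) or None, fields


print(split_t_extension("ja-Kana-t-it"))
print(split_t_extension("und-Cyrl-t-und-latn-m0-ungegn-2007"))
```

Because field keys have a shape that no language, script, or region subtag has, the source can stay a plain 'en' or 'und-latn' and the mechanism still fits under the same singleton. That is the property the second and third alternatives above give up.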

>> Detailed comments:

>> BCP47 required information: The first three paragraphs should move to the
>> introduction.
> Other authors, what do you think?

In version -03, we have two sections titled "Introduction". Not a good 
sign for a spec.

>> "followed by a sequence of subtags that would form a language tag": Here
>> and in general: Don't use 'would'.
> Grammatically, it is that the sequence of subtags *would* form a language
> subtag if they *were* separated out. They are not actually a language tag,
> because they occur in the middle of another language subtag. How would you
> like that to be phrased?

I think "sequence of subtags that form a language tag" is fine.

>>    The structure of 't' subtags is determined by the Unicode CLDR
>>    Technical Committee, in accordance with the policies and procedures
>>    in consortium/tc-procedures.html,
>> and subject
>>    to the Unicode Consortium Policies on
>> .
>> The following paragraph is also difficult to understand. I wouldn't know
>> exactly what falls on what side. I think one major reason is that we are
>> treading new ground here, it's the first time we have a singleton definition
>> that allows reuse of language tags (with a few restrictions) as well as
>> intends to define its own extensions.
> These were both patterned after what was used for the -u- extension. We can
> take a look at them to try to clarify.

Please do.

Regards,    Martin.

>>    Changes that can be made by successive versions of LDML [UTS35] by
>>    the Unicode Consortium without requiring a new RFC include the
>>    allocation of new subtags for use after the 't' extension.  A new RFC
>>    would be required for material changes to an existing 't' subtag, or
>>    an incompatible change to the overall syntactic structure of the 't'
>>    extension; however, such a change would be contrary to the policies
>>    of the Unicode Consortium, and thus is not anticipated.
>> 2.1 Summary: There seems to be quite some overlap with the part of
>> section 2 before the 2.1 heading.
>> One question I would have as a linguistic researcher is: How much effort
>> and time is involved in getting a 'mechanism' approved? If such 'mechanisms'
>> are e.g. rejected with arguments like "if we accept it, then everybody has
>> to implement it" or so, then I would see that as a problem.
> Good point. I'll propose some text.
>> So much for the moment.
>> Regards,   Martin.
>> On 2011/06/18 6:07, Mark Davis ☕ wrote:
>>> Yoshito, Addison, and I had had an action for a while now from the CLDR
>>> committee to submit a draft for an extension. Rather than go through all
>>> the problems in the falk draft, we put together an alternative approach,
>>> leveraging the work we already did for the -u- extension.
>>> It just got posted at
>>> Courtney, I think this provides a superset of the functionality that you
>>> are interested in. Perhaps you can read it over, and we can add you as an
>>> author of the next version of this draft instead of having the two
>>> competing proposals.
>>> Mark
>>> *— Il meglio è l’inimico del bene —*
>>> On Wed, Jun 15, 2011 at 10:50, Randy Presuhn <> wrote:
>>>> Hi -
>>>> I started out with an off-list response, but I figure this is
>>>> something worth sending to the list.
>>>> Off-list, a contributor asked:
>>>> ...
>>>>> I'd love to see your input. I'd like to make sure I understand
>>>>> all the concerns. Is there any way you could forward this to the list?
>>>> My response:
>>>> Sorry, already deleted.  As I recall, the main concerns were
>>>>   (1) there already *is* support for identifying orthographies
>>>>       (remember German?)
>>>>   (2) the I-D seems to assume that transliterations always result
>>>>       in "Latin" (previous discussion on LTRU included transliterations
>>>>       to Cyrillic and Hangul, among others)
>>>>   (3) the "original orthography" is irrelevant for the transliteration
>>>>       systems I've been able to think of.  (At the same time, some
>>>>       transliteration systems are quite "lossy" and some don't do
>>>>       "round trip" very well.)  Consider also the transliteration of
>>>>       material which was originally in audio form...
>>>>   (4) The draft doesn't clearly distinguish "orthography" from
>>>>       "transliteration".  This may be because the boundary between the
>>>>       two can be fuzzy, but even that is an issue that should be
>>>>       addressed.
>>>>   (5) How this fits in with *transcription* systems (e.g. IPA) should be
>>>>       addressed.  The boundary gets fuzzy with orthographies that are
>>>>       equivalent to phonemic representations of the language (e.g.,
>>>>       Pinyin for Mandarin).
>>>>   (6) The proposed singleton usage appears broken and unnecessary.
>>>> Or something like that.  I may have forgotten something here, or, in the
>>>> process of reconstruction, thought of something I missed the first time.
>>>> Randy
>>>> _______________________________________________
>>>> Ltru mailing list