Re: [Ltru] Fw: I-D Action: draft-falk-transliteration-tags-01.txt

"Martin J. Dürst" <duerst@it.aoyama.ac.jp> Thu, 21 July 2011 10:11 UTC

Return-Path: <duerst@it.aoyama.ac.jp>
X-Original-To: ltru@ietfa.amsl.com
Delivered-To: ltru@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 8BF6321F87BC for <ltru@ietfa.amsl.com>; Thu, 21 Jul 2011 03:11:58 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -100.76
X-Spam-Level:
X-Spam-Status: No, score=-100.76 tagged_above=-999 required=5 tests=[AWL=1.030, BAYES_00=-2.599, GB_I_LETTER=-2, HELO_EQ_JP=1.244, HOST_EQ_JP=1.265, MIME_8BIT_HEADER=0.3, USER_IN_WHITELIST=-100]
Received: from mail.ietf.org ([64.170.98.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id zgSyy3PuyUDS for <ltru@ietfa.amsl.com>; Thu, 21 Jul 2011 03:11:54 -0700 (PDT)
Received: from acintmta01.acbb.aoyama.ac.jp (acintmta01.acbb.aoyama.ac.jp [133.2.20.33]) by ietfa.amsl.com (Postfix) with ESMTP id 743DB21F88B7 for <ltru@ietf.org>; Thu, 21 Jul 2011 03:11:54 -0700 (PDT)
Received: from acmse01.acbb.aoyama.ac.jp ([133.2.20.226]) by acintmta01.acbb.aoyama.ac.jp (secret/secret) with SMTP id p6LABrC3026175 for <ltru@ietf.org>; Thu, 21 Jul 2011 19:11:53 +0900
Received: from (unknown [133.2.206.133]) by acmse01.acbb.aoyama.ac.jp with smtp id 1262_fc26_db7b0f0a_b381_11e0_8ad4_001d096c5b62; Thu, 21 Jul 2011 19:11:53 +0900
Received: from [IPv6:::1] ([133.2.210.5]:44546) by itmail.it.aoyama.ac.jp with [XMail 1.22 ESMTP Server] id <S1531992> for <ltru@ietf.org> from <duerst@it.aoyama.ac.jp>; Thu, 21 Jul 2011 19:11:56 +0900
Message-ID: <4E27FB38.7050108@it.aoyama.ac.jp>
Date: Thu, 21 Jul 2011 19:11:04 +0900
From: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
Organization: Aoyama Gakuin University
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.9) Gecko/20100722 Eudora/3.0.4
MIME-Version: 1.0
To: Mark Davis ☕ <mark@macchiato.com>
References: <002701cc29f8$7c3e7d00$6801a8c0@oemcomputer> <BANLkTinv4kB7X_hx6N7=B2-1QO8x3EoosA@mail.gmail.com> <4DF80CD1.1090907@it.aoyama.ac.jp> <000e01cc2b80$563628e0$6801a8c0@oemcomputer> <2CB55BFC7405E94F830537BD924318D5EBF06DE3BF@USSDIXMSG11.am.sony.com> <003d01cc2b84$af13f560$6801a8c0@oemcomputer> <BANLkTi=88jBLkn3eQM6OqALJt87B1APaWg@mail.gmail.com> <4E003C87.5090009@it.aoyama.ac.jp> <BANLkTikyx-YPZQNN1PoHGwGz=WHOxbDRtg@mail.gmail.com>
In-Reply-To: <BANLkTikyx-YPZQNN1PoHGwGz=WHOxbDRtg@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Cc: LTRU Working Group <ltru@ietf.org>, court@infiauto.com
Subject: Re: [Ltru] Fw: I-D Action: draft-falk-transliteration-tags-01.txt
X-BeenThere: ltru@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Language Tag Registry Update working group discussion list <ltru.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/ltru>, <mailto:ltru-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/ltru>
List-Post: <mailto:ltru@ietf.org>
List-Help: <mailto:ltru-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/ltru>, <mailto:ltru-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 21 Jul 2011 10:11:58 -0000

Sorry for the late reply.

On 2011/06/22 1:00, Mark Davis ☕ wrote:
> Those are good issues; thanks for raising them and starting the discussion.
> Comments below.
>
> ------------------------------
> Mark
> *— Il meglio è l’inimico del bene —*
>
>
> On Mon, Jun 20, 2011 at 23:39, "Martin J. Dürst" <duerst@it.aoyama.ac.jp> wrote:
>
>> Hello Mark, others,
>>
>> Overall comment:
>> The idea of reusing language tags to indicate the transliteration/transcription
>> source, and of adding some additional subtags to distinguish methods, seems
>> reasonable and sound.
>>
>> The description of the structure of the allowed subtags and of the
>> responsibility split between IETF (this draft) and UTC (UTS 35) looks quite
>> messy to me, and should be cleaned up. I'd personally prefer that UTS 35 (or
>> whatever else on the Unicode side) only define the <mechanism> part (after
>> the m0 subtag).
>>
>
> That would be my preference as well (can't speak for my coauthors).
>
> We patterned it this way following what ended up being accepted for  the -u-
> extension. That is, the spec is in UTS35, but there is a summary here.

I didn't like that then, I have to admit.

> But
> of course, there are many ways to do it. And maybe this summary is too
> detailed, at least for the mechanism part, and we could just have it in
> UTS35.

The most important thing is to make clear what is 'summary' (i.e. 
non-normative) and what's normative. The second most important thing is 
that the RFC actually define something, not just say "look over there". 
The Unicode side is supposed to be a registry, not a spec (I think Doug
already pointed that out).

> We considered a number of alternatives:
>
>     - We could define everything after -t- to be the source language, and
>     everything after -m- to be the mechanism. But that burns 2 extension
>     letters instead of just one.
>     - We also considered having everything in the -u extension, for which we
>     already have the structure set up. However, that would force us to have
>     artificial source subtags like 'en0' instead of 'en', because the -u-
>     extension wouldn't allow the 2-letter subtags (it already defines a use for
>     them).
>     - We could also have -t- be just the source, and define the mechanism in
>     -u-, also easy. But we felt it would be better to have everything under one
>     extension.

This is the technical aspect, where I think you got it right. What I'm 
talking about above are questions of what spec says what.
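To make the single-extension layout concrete, here is a rough Python sketch (illustrative only, not taken from the draft) of how a parser might split a tag into its base, the source language tag that follows -t-, and the mechanism that follows the m0 subtag. It assumes the -t- extension is the only extension present and that there is no private-use section; the function name split_t_extension and the mechanism subtag "hepburn" are hypothetical.

```python
def split_t_extension(tag: str):
    """Split a tag into (base, source, mechanism) around a -t- extension.

    Illustrative assumptions (hypothetical, not from the draft):
    - 't' is the only extension singleton in the tag, with no private use;
    - the source language tag follows 't' directly;
    - mechanism subtags, if any, follow an 'm0' subtag.
    """
    subtags = tag.lower().split("-")
    if "t" not in subtags:
        # No -t- extension: the whole tag is the base.
        return tag.lower(), None, None
    i = subtags.index("t")
    base = "-".join(subtags[:i])
    rest = subtags[i + 1:]
    if "m0" in rest:
        # Everything before 'm0' is the source; everything after is the mechanism.
        j = rest.index("m0")
        source = "-".join(rest[:j]) or None
        mechanism = "-".join(rest[j + 1:]) or None
    else:
        source = "-".join(rest) or None
        mechanism = None
    return base, source, mechanism
```

For example, split_t_extension("ja-Latn-t-ja-m0-hepburn") returns ("ja-latn", "ja", "hepburn"). Keeping both parts under one singleton works because "m0" contains a digit and is only two characters long, so it can never be mistaken for a subtag of the embedded source language tag.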

>> Detailed comments:

>> BCP47 required information: The first three paragraphs should move to the
>> introduction.
>>
>
> Other authors, what do you think?

In version -03, we have two sections titled "Introduction". Not a good 
sign for a spec.

>> "followed by a sequence of subtags that would form a language tag": Here
>> and in general: Don't use 'would'.
>>
>
> Grammatically, it is that the sequence of subtags *would* form a language
> tag if they *were* separated out. They are not actually a language tag,
> because they occur in the middle of another language tag. How would you
> like that to be phrased?

I think "sequence of subtags that form a language tag" is fine.


>>>>>>
>>    The structure of 't' subtags is determined by the Unicode CLDR
>>    Technical Committee, in accordance with the policies and procedures
>>    in http://www.unicode.org/consortium/tc-procedures.html, and subject
>>    to the Unicode Consortium Policies on
>>    http://www.unicode.org/policies/policies.html.
>>>>>>
>>
>>
>> The following paragraph is also difficult to understand. I wouldn't know
>> exactly what falls on which side. I think one major reason is that we are
>> treading new ground here: it's the first time we have a singleton definition
>> that both allows reuse of language tags (with a few restrictions) and
>> intends to define its own extensions.
>>
>
> These were both patterned after what was used for the -u- extension. We can
> take a look at them to try to clarify.

Please do.

Regards,    Martin.


>>>>>>
>>    Changes that can be made by successive versions of LDML [UTS35] by
>>    the Unicode Consortium without requiring a new RFC include the
>>    allocation of new subtags for use after the 't' extension.  A new RFC
>>    would be required for material changes to an existing 't' subtag, or
>>    an incompatible change to the overall syntactic structure of the 't'
>>    extension; however, such a change would be contrary to the policies
>>    of the Unicode Consortium, and thus is not anticipated.
>>>>>>
>>
>> 2.1 Summary: There seems to be considerable overlap between this summary
>> and the part of section 2 before the 2.1 heading.
>>
>>
>> One question I would have as a linguistic researcher is: How much effort
>> and time is involved in getting a 'mechanism' approved? If such 'mechanisms'
>> are rejected with arguments such as "if we accept it, then everybody has
>> to implement it", then I would see that as a problem.
>>
>
> Good point. I'll propose some text.
>
>
>>
>> So much for the moment.
>>
>>
>> Regards,   Martin.
>>
>>
>>
>> On 2011/06/18 6:07, Mark Davis ☕ wrote:
>>
>>> Yoshito, Addison, and I have had an action item for a while now from the
>>> CLDR committee to submit a draft for an extension. Rather than go through
>>> all the problems in the Falk draft, we put together an alternative
>>> approach, leveraging the work we already did for the -u- extension.
>>>
>>> It just got posted at
>>> http://tools.ietf.org/html/draft-davis-t-langtag-ext-00
>>>
>>> Courtney, I think this provides a superset of the functionality that you
>>> are interested in. Perhaps you can read it over, and we can add you as an
>>> author of the next version of this draft instead of having the two
>>> competing proposals.
>>>
>>> Mark
>>>
>>> *— Il meglio è l’inimico del bene —*
>>>
>>>
>>> On Wed, Jun 15, 2011 at 10:50, Randy Presuhn
>>> <randy_presuhn@mindspring.com> wrote:
>>>
>>>   Hi -
>>>>
>>>> I started out with an off-list response, but I figure this is
>>>> something worth sending to the list.
>>>>
>>>> Off-list, a contributor asked:
>>>>
>>>> ...
>>>>
>>>>> I'd love to see your input. I'd like to make sure I understand
>>>>> all the concerns. Is there any way you could forward this to the list?
>>>>>
>>>>
>>>> My response:
>>>>
>>>> Sorry, already deleted.  As I recall, the main concerns were
>>>>
>>>>   (1) there already *is* support for identifying orthographies
>>>>       (remember German?)
>>>>   (2) the I-D seems to assume that transliterations always result
>>>>       in "Latin" (previous discussion on LTRU included transliterations
>>>>       to Cyrillic and Hangul, among others)
>>>>   (3) the "original orthography" is irrelevant for the transliteration
>>>>       systems I've been able to think of.  (At the same time, some
>>>>       transliteration systems are quite "lossy" and some don't do
>>>>       "round trip" very well.)  Consider also the transliteration of
>>>>       material which was originally in audio form...
>>>>   (4) The draft doesn't clearly distinguish "orthography" from
>>>>       "transliteration".  This may be because the boundary between
>>>>       the two can be fuzzy, but even that is an issue that should be
>>>>       addressed.
>>>>   (5) How this fits in with *transcription* systems (e.g. IPA) should be
>>>>       addressed.  The boundary gets fuzzy with orthographies that are
>>>>       equivalent to phonemic representations of the language (e.g.,
>>>>       Pinyin for Mandarin).
>>>>   (6) The proposed singleton usage appears broken and unnecessary.
>>>>
>>>> Or something like that.  I may have forgotten something here, or, in the
>>>> process of reconstruction, thought of something I missed the first time.
>>>>
>>>> Randy
>>>>
>>>> _______________________________________________
>>>> Ltru mailing list
>>>> Ltru@ietf.org
>>>> https://www.ietf.org/mailman/listinfo/ltru
>>>>
>>>>
>>>
>>>
>>>
>>
>