[Ltru] draft-davis-t-langtag-ext

Mark Davis ☕ <mark@macchiato.com> Wed, 22 June 2011 22:00 UTC

Date: Wed, 22 Jun 2011 15:00:47 -0700
From: Mark Davis ☕ <mark@macchiato.com>
To: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Cc: LTRU Working Group <ltru@ietf.org>, court@infiauto.com
Subject: [Ltru] draft-davis-t-langtag-ext

A new draft has been posted at
http://tools.ietf.org/html/draft-davis-t-langtag-ext-01

Martin, we tried to address your concerns; please take a look and let us
know what you think.

Mark
*— Il meglio è l’inimico del bene —*


On Tue, Jun 21, 2011 at 09:00, Mark Davis ☕ <mark@macchiato.com> wrote:

> Those are good issues; thanks for raising them and starting the discussion.
> Comments below.
>
> ------------------------------
> Mark
> *— Il meglio è l’inimico del bene —*
>
>
>
> On Mon, Jun 20, 2011 at 23:39, "Martin J. Dürst" <duerst@it.aoyama.ac.jp> wrote:
>
>> Hello Mark, others,
>>
>> Overall comment:
>> The idea to reuse language tags to indicate transliteration/transcription
>> source, and to add some additional tags to distinguish methods seems to be
>> reasonable and sound.
>>
>> The description of the structure of the allowed subtags and of the
>> responsibility split between IETF (this draft) and UTC (UTS 35) looks quite
>> messy to me, and should be cleaned up. I'd personally prefer that UTS 35 (or
>> whatever else on the Unicode side) only define the <mechanism> part (after
>> the m0 subtag).
>>
>
> That would be my preference as well (can't speak for my coauthors).
>
> We patterned it this way following what ended up being accepted for the
> -u- extension. That is, the spec is in UTS35, but there is a summary here.
> But of course, there are many ways to do it. And maybe this summary is too
> detailed, at least for the mechanism part, and we could just have it in
> UTS35.
>
> We considered a number of alternatives:
>
>    - We could define everything after -t- to be the source language, and
>    everything after -m- to be the mechanism. But that burns 2 extension
>    letters instead of just one.
>    - We also considered having everything in the -u- extension, for which
>    we already have the structure set up. However, that would force us to have
>    artificial source subtags like 'en0' instead of 'en', because the -u-
>    extension wouldn't allow the 2-letter subtags (it already defines a use for
>    them).
>    - We could also have -t- be just the source, and define the mechanism
>    in -u-, also easy. But we felt it would be better to have everything under
>    one extension.
>
>
>
>>
>>
>> Detailed comments:
>>
>> "In addition, it may also be important to
>>   specify a particular specification for the transformation.": Too much
>> 'spec' in one sentence.
>>
>
> ok
>
>
>>
>> "For example, if one is transcribing the names of Italian or Russian
>>   cities on a map for Japanese users, each name will need to be
>>   transliterated into katakana using rules appropriate for the source
>>   language and target languages.": "source languages and target language"?
>>
>
> yes
>
>
>>
>> BCP47 required information: The first three paragraphs should move to the
>> introduction.
>>
>
> Other authors, what do you think?
>
>
>>
>> "followed by a sequence of subtags that would form a language tag": Here
>> and in general: Don't use 'would'.
>>
>
> Grammatically, the point is that the sequence of subtags *would* form a
> language tag if they *were* separated out. They are not actually a language
> tag, because they occur in the middle of another language tag. How would you
> like that to be phrased?
>
>
>
>
>> >>>>
>>   The structure of 't' subtags is determined by the Unicode CLDR
>>   Technical Committee, in accordance with the policies and procedures
>>   in http://www.unicode.org/consortium/tc-procedures.html, and subject
>>   to the Unicode Consortium Policies on
>>   http://www.unicode.org/policies/policies.html.
>> >>>>
>>
>>
>> The following paragraph is also difficult to understand. I wouldn't know
>> exactly what falls on which side. I think one major reason is that we are
>> treading new ground here: it's the first time we have a singleton definition
>> that both allows reuse of language tags (with a few restrictions) and
>> intends to define its own extensions.
>>
>
> These were both patterned after what was used for the -u- extension. We can
> take a look at them to try to clarify.
>
>
>
>>
>> >>>>
>>   Changes that can be made by successive versions of LDML [UTS35] by
>>   the Unicode Consortium without requiring a new RFC include the
>>   allocation of new subtags for use after the 't' extension.  A new RFC
>>   would be required for material changes to an existing 't' subtag, or
>>   an incompatible change to the overall syntactic structure of the 't'
>>   extension; however, such a change would be contrary to the policies
>>   of the Unicode Consortium, and thus is not anticipated.
>> >>>>
>>
>> 2.1 Summary: There seems to be quite some overlap between this subsection
>> and the part of section 2 before the 2.1 heading.
>>
>>
>> One question I would have as a linguistic researcher is: How much effort
>> and time is involved in getting a 'mechanism' approved? If such 'mechanisms'
>> are e.g. rejected with arguments like "if we accept it, then everybody has
>> to implement it" or so, then I would see that as a problem.
>>
>
> Good point. I'll propose some text.
>
>
>>
>> So much for the moment.
>>
>>
>> Regards,   Martin.
>>
>>
>>
>> On 2011/06/18 6:07, Mark Davis ☕ wrote:
>>
>>> Yoshito, Addison, and I have had an action for a while now from the CLDR
>>> committee to submit a draft for an extension. Rather than go through all
>>> the problems in the Falk draft, we put together an alternative approach,
>>> leveraging the work we already did for the -u- extension.
>>>
>>> It just got posted at
>>> http://tools.ietf.org/html/draft-davis-t-langtag-ext-00
>>>
>>> Courtney, I think this provides a superset of the functionality that you
>>> are interested in. Perhaps you can read it over, and we can add you as an
>>> author of the next version of this draft instead of having the two
>>> competing proposals.
>>>
>>> Mark
>>>
>>> *— Il meglio è l’inimico del bene —*
>>>
>>>
>>> On Wed, Jun 15, 2011 at 10:50, Randy Presuhn
>>> <randy_presuhn@mindspring.com> wrote:
>>>
>>>> Hi -
>>>>
>>>> I started out with an off-list response, but I figure this is
>>>> something worth sending to the list.
>>>>
>>>> Off-list, a contributor asked:
>>>>
>>>> ...
>>>>
>>>>> I'd love to see your input. I'd like to make sure I understand
>>>>> all the concerns. Is there any way you could forward this to the list?
>>>>>
>>>>
>>>> My response:
>>>>
>>>> Sorry, already deleted.  As I recall, the main concerns were
>>>>
>>>>  (1) there already *is* support for identifying orthographies
>>>>      (remember German?)
>>>>  (2) the I-D seems to assume that transliterations always result
>>>>      in "Latin" (previous discussion on LTRU included transliterations
>>>>      to Cyrillic and Hangul, among others)
>>>>  (3) the "original orthography" is irrelevant for the transliteration
>>>>      systems I've been able to think of.  (At the same time, some
>>>>      transliteration systems are quite "lossy" and some don't do
>>>>      "round trip" very well.)  Consider also the transliteration of
>>>>      material which was originally in audio form...
>>>>  (4) The draft doesn't clearly distinguish "orthography" from
>>>>      "transliteration".  This may be because the boundary between the
>>>>      two can be fuzzy, but even that is an issue that should be addressed.
>>>>  (5) How this fits in with *transcription* systems (e.g. IPA) should be
>>>>      addressed.  The boundary gets fuzzy with orthographies that are
>>>>      equivalent to phonemic representations of the language (e.g.,
>>>>      Pinyin for Mandarin).
>>>>  (6) The proposed singleton usage appears broken and unnecessary.
>>>>
>>>> Or something like that.  I may have forgotten something here, or, in the
>>>> process of reconstruction, thought of something I missed the first time.
>>>>
>>>> Randy
>>>>
>>>> _______________________________________________
>>>> Ltru mailing list
>>>> Ltru@ietf.org
>>>> https://www.ietf.org/mailman/listinfo/ltru
>>>>
>>>>
>>>
>>>
>>>
>>
>