Re: [Ltru] Fw: I-D Action: draft-falk-transliteration-tags-01.txt

Mark Davis ☕ <> Tue, 21 June 2011 16:00 UTC

From: Mark Davis ☕ <>
To: Martin J. Dürst <>
Cc: LTRU Working Group <>
Subject: Re: [Ltru] Fw: I-D Action: draft-falk-transliteration-tags-01.txt

Those are good issues; thanks for raising them and starting the discussion.
Comments below.

*— Il meglio è l’inimico del bene —*

On Mon, Jun 20, 2011 at 23:39, "Martin J. Dürst" <> wrote:

> Hello Mark, others,
> Overall comment:
> The idea to reuse language tags to indicate transliteration/transcription
> source, and to add some additional tags to distinguish methods seems to be
> reasonable and sound.
> The description of the structure of the allowed subtags and of the
> responsibility split between IETF (this draft) and UTC (UTS 35) looks quite
> messy to me, and should be cleaned up. I'd personally prefer that UTS 35 (or
> whatever else on the Unicode side) only define the <mechanism> part (after
> the m0 subtag).

That would be my preference as well (can't speak for my coauthors).

We patterned it this way following what ended up being accepted for the -u-
extension. That is, the spec is in UTS35, but there is a summary here. But
of course, there are many ways to do it. And maybe this summary is too
detailed, at least for the mechanism part, and we could just have it in

We considered a number of alternatives:

   - We could define everything after -t- to be the source language, and
   everything after -m- to be the mechanism. But that burns two extension
   letters instead of just one.
   - We also considered having everything in the -u extension, for which we
   already have the structure set up. However, that would force us to have
   artificial source subtags like 'en0' instead of 'en', because the -u-
   extension wouldn't allow the 2-letter subtags (it already defines a use for
   - We could also have -t- be just the source, and define the mechanism in
   -u-, also easy. But we felt it would be better to have everything under one
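
As a rough illustration of the single-extension layout discussed above (source language subtags directly after -t-, mechanism after the m0 subtag), the following Python sketch splits a tag into its parts. The function name and the sample tags are hypothetical and chosen only for illustration; the authoritative syntax is whatever the draft and UTS 35 specify.

```python
def split_t_extension(tag):
    """Split a tag into (target, source, mechanism) lists of subtags.

    Assumption: everything after the 't' singleton up to 'm0' is the
    source, and everything after 'm0' (up to the next one-character
    singleton, if any) is the mechanism. Private-use tags are ignored.
    """
    subtags = tag.lower().split("-")
    if "t" not in subtags:
        return subtags, [], []
    t_index = subtags.index("t")
    target = subtags[:t_index]
    rest = subtags[t_index + 1:]
    # The extension ends at the next one-character singleton, if present.
    for i, subtag in enumerate(rest):
        if len(subtag) == 1:
            rest = rest[:i]
            break
    if "m0" in rest:
        m_index = rest.index("m0")
        return target, rest[:m_index], rest[m_index + 1:]
    return target, rest, []


# Illustrative tags only; 'ungegn' stands in for some mechanism name.
print(split_t_extension("ja-Kana-t-it"))
# (['ja', 'kana'], ['it'], [])
print(split_t_extension("und-Cyrl-t-und-Latn-m0-ungegn"))
# (['und', 'cyrl'], ['und', 'latn'], ['ungegn'])
```

With this layout only one extension letter is consumed, and the source can keep ordinary subtags such as 'it' rather than artificial ones.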

> Detailed comments:
> "In addition, it may also be important to
>   specify a particular specification for the transformation.": Too much
> 'spec' in one sentence.


> "For example, if one is transcribing the names of Italian or Russian
>   cities on a map for Japanese users, each name will need to be
>   transliterated into katakana using rules appropriate for the source
>   language and target languages.": "source languages and target language"?


> BCP47 required information: The first three paragraphs should move to the
> introduction.

Other authors, what do you think?

> "followed by a sequence of subtags that would form a language tag": Here
> and in general: Don't use 'would'.

Grammatically, the point is that the sequence of subtags *would* form a
language tag if they *were* separated out. They are not actually a language
tag, because they occur in the middle of another language tag. How would you
like that to be phrased?
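
To make the "would form a language tag if separated out" point concrete, here is a small Python sketch. It assumes a deliberately simplified pattern (language, optional script, optional region) rather than the full BCP 47 grammar, and the helper names and sample tag are hypothetical.

```python
import re

# Simplified on purpose: language, optional script, optional region.
# The real BCP 47 grammar has more cases (extlang, variants, and so on).
SIMPLE_LANGTAG = re.compile(r"^[a-z]{2,3}(-[a-z]{4})?(-([a-z]{2}|[0-9]{3}))?$")


def embedded_source(tag):
    """Return the subtags between 't' and 'm0' (or the end), joined by '-'.

    Raises ValueError if the tag has no 't' singleton.
    """
    subtags = tag.lower().split("-")
    start = subtags.index("t") + 1
    end = subtags.index("m0") if "m0" in subtags else len(subtags)
    return "-".join(subtags[start:end])


def looks_like_language_tag(text):
    """Check the text against the simplified pattern above."""
    return SIMPLE_LANGTAG.match(text) is not None


source = embedded_source("und-Cyrl-t-und-Latn-m0-ungegn")
print(source)                           # und-latn
print(looks_like_language_tag(source))  # True
```

In place, 'und-latn' is only a run of subtags inside the larger tag; it matches the language-tag pattern only once it is lifted out.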

> >>>>
>   The structure of 't' subtags is determined by the Unicode CLDR
>   Technical Committee, in accordance with the policies and procedures
>   in consortium/tc-procedures.html,
> and subject
>   to the Unicode Consortium Policies on
> .
> >>>>
> The following paragraph is also difficult to understand. I wouldn't know
> exactly what falls on what side. I think one major reason is that we are
> treading new ground here: it's the first time we have a singleton definition
> that allows reuse of language tags (with a few restrictions) and also
> intends to define its own extensions.

These were both patterned after what was used for the -u- extension. We can
take a look at them to try to clarify.

> >>>>
>   Changes that can be made by successive versions of LDML [UTS35] by
>   the Unicode Consortium without requiring a new RFC include the
>   allocation of new subtags for use after the 't' extension.  A new RFC
>   would be required for material changes to an existing 't' subtag, or
>   an incompatible change to the overall syntactic structure of the 't'
>   extension; however, such a change would be contrary to the policies
>   of the Unicode Consortium, and thus is not anticipated.
> >>>>
> 2.1 Summary: There seems to be quite some overlap with the part of
> section 2 before the 2.1 heading.
> One question I would have as a linguistic researcher is: How much effort
> and time is involved in getting a 'mechanism' approved? If such 'mechanisms'
> are e.g. rejected with arguments like "if we accept it, then everybody has
> to implement it" or so, then I would see that as a problem.

Good point. I'll propose some text.

> So much for the moment.
> Regards,   Martin.
> On 2011/06/18 6:07, Mark Davis ☕ wrote:
>> Yoshito, Addison, and I had had an action for a while now from the CLDR
>> committee to submit a draft for an extension. Rather than go through all
>> the problems in the falk draft, we put together an alternative approach,
>> leveraging the work we already did for the -u- extension.
>> It just got posted at
>> Courtney, I think this provides a superset of the functionality that you
>> are interested in. Perhaps you can read it over, and we can add you as an
>> author of the next version of this draft instead of having the two
>> competing proposals.
>> Mark
>> *— Il meglio è l’inimico del bene —*
>> On Wed, Jun 15, 2011 at 10:50, Randy Presuhn <> wrote:
>>> Hi -
>>> I started out with an off-list response, but I figure this is
>>> something worth sending to the list.
>>> Off-list, a contributor asked:
>>> ...
>>>> I'd love to see your input. I'd like to make sure I understand
>>>> all the concerns. Is there any way you could forward this to the list?
>>> My response:
>>> Sorry, already deleted.  As I recall, the main concerns were
>>>  (1) there already *is* support for identifying orthographies
>>>      (remember German?)
>>>  (2) the I-D seems to assume that transliterations always result
>>>      in "Latin" (previous discussion on LTRU included transliterations
>>>      to Cyrillic and Hangul, among others)
>>>  (3) the "original orthography" is irrelevant for the transliteration
>>>      systems I've been able to think of.  (At the same time, some
>>>      transliteration systems are quite "lossy" and some don't do
>>>      "round trip" very well.)  Consider also the transliteration of
>>>      material which was originally in audio form...
>>>  (4) The draft doesn't clearly distinguish "orthography" from
>>>      "transliteration".  This may be because the boundary between the
>>>      two can be fuzzy, but even that is an issue that should be addressed.
>>>  (5) How this fits in with *transcription* systems (e.g. IPA) should be
>>>      addressed.  The boundary gets fuzzy with orthographies that are
>>>      equivalent to phonemic representations of the language (e.g., Pinyin
>>>      for Mandarin).
>>>  (6) The proposed singleton usage appears broken and unnecessary.
>>> Or something like that.  I may have forgotten something here, or, in the
>>> process of reconstruction, thought of something I missed the first time.
>>> Randy
>>> _______________________________________________
>>> Ltru mailing list