Re: [Ltru] Re: script tag for IPA

Addison Phillips <addison@yahoo-inc.com> Sat, 16 September 2006 17:56 UTC

Message-ID: <450C3AA4.20209@yahoo-inc.com>
Date: Sat, 16 Sep 2006 10:55:48 -0700
From: Addison Phillips <addison@yahoo-inc.com>
To: Martin Hosken <martin_hosken@sil.org>
Subject: Re: [Ltru] Re: script tag for IPA
References: <E1GNiLf-0004yQ-BF@megatron.ietf.org> <005701c6d88a$21f245d0$6401a8c0@DGBP7M81> <20060915055401.GC6907@ccil.org> <450A44B6.3060905@sil.org> <6.0.0.20.2.20060915154910.06e040b0@localhost> <450AC979.3050203@yahoo-inc.com> <450B60D4.1040401@sil.org>
In-Reply-To: <450B60D4.1040401@sil.org>
Cc: Doug Ewell <dewell@adelphia.net>, ltru@ietf.org

Martin Hosken wrote:

>> The proper "repair" to this issue is to fix ISO 15924. Multiple script
>> subtags would be very difficult for users to understand and use
>> consistently. And we'd have to deal with canonical ordering rules,
>> prefix checking, and all sorts of other nastiness---all to figure out
>> which Latin transcription was used? Bah.
> 
> The registrar of ISO 15924 has indicated that he has no intention of
> ever giving IPA a script code and that it is a variant of Latn. Perhaps
> you can get him to change his mind, but I doubt it. So where does that
> leave me? How do I tag text in the IPA script that can be in any
> language? You are asking me to live between a rock and a hard place.

Very few ISO standards are entirely under the thumb of one person's 
opinion. Just because Michael indicated his opposition on an 
unaffiliated email list doesn't mean that a carefully prepared request 
to ISO 15924 would fail.

I'm not asking you to live between Scylla and Charybdis. I'm asking 
you to carefully consider what you're asking for. The idea you presented 
would break a number of principles and commitments in the formation of 
language tags. Furthermore, I note that ietf-languages' history as a 
registry is somewhat uneven---something that was reined in under RFC 
4646 on purpose. I also note that adding more complex and arcane 
structures to language tags strikes me as questionable. The "finger of 
doubt" points at 15924 as the source of scripts. I would exhaust that 
angle first.

> 
> As Mark has stated, we need something to indicate that a script variant
> is more significant than a region. 

For starters, I don't believe in "script variants". I think there is a 
disconnect between the granularity of "script" as currently embodied in 
15924 and "script" as needed in language tags. Mark and I were of the 
(possibly mistaken) impression that 15924 would be a bit more expansive 
in their interpretation of "script".

> For example, please prioritise the
> aspects of "UK Glaswegian English written in IPA" in terms of the
> components that have the most significance on the text and you will find
> that UK comes last and Glaswegian second to last. But if IPA is marked
> by an extension, it will come last.

So? Language tags cannot do everything. At some point one must look at 
the fact that two bodies of text have different tags and conclude that 
they are different, possibly mutually unintelligible, entities. RFC 4646 
even says this in Section 4.2. While the relative importance of various 
subtags is a key feature, there may be times that the subtags cannot 
entirely reflect the real relationships of the variations.
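
To make the ordering point concrete, here is a minimal sketch of 
right-to-left truncation, in the manner of the lookup scheme described 
in RFC 4647. The function name is mine, and the tags are illustrative 
only ("Lipa" is not an ISO 15924 code and "glasgow" is not a registered 
variant). It shows that whatever sits last in the tag is what a 
fallback drops first.

```python
def fallback_chain(tag):
    """Return the tag followed by its progressively truncated forms."""
    subtags = tag.split("-")
    chain = []
    while subtags:
        chain.append("-".join(subtags))
        subtags.pop()
        # A single-character subtag (such as "x") cannot end a tag,
        # so it is dropped together with the subtag that followed it.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return chain


# IPA carried in private use is the first thing lost.
print(fallback_chain("en-GB-x-ipa"))
# ['en-GB-x-ipa', 'en-GB', 'en']

# IPA carried as a (hypothetical) script subtag survives longest.
print(fallback_chain("en-Lipa-GB-glasgow"))
# ['en-Lipa-GB-glasgow', 'en-Lipa-GB', 'en-Lipa', 'en']
```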

And I think the correct resolution to the problem would be to get 15924 
to register additional codes or to get a source for codes that mirrors 
15924 while adding additional codes to suit the needs of (say) language 
and locale identification. "en-Lipa-GB-glasgow" would be the best 
solution to your conundrum.

> 
> In discussions with the ISO 15924 registrar on this, he seemed open to
> the idea of extending the private use script code space. In addition, I
> agree that since a script variant (in my 4 character scheme) would
> always occur after a real script, there is no need to worry about
> codespace overlap.

Script "variants" are just scripts in another container. In a language 
tag, I see very little benefit to having a tag like "el-grek-mono" 
instead of an "el-mono" or an "en-Latn-Lipa" over an "en-Lipa".

>> Not to mention: if script variations aren't registered in 15924, where
>> will they come from? What rules will be applied to their registration?
>> Why does anyone think ietf-languages will be a good arbiter of said
>> variations?
> 
> ISO 15924 hasn't scored too highly for us so far. Addressing what a
> script variant really is will need some discussion, of course.

Yes, and from hard experience I don't believe that ietf-languages is a 
better solution. A few email exchanges with ISO 15924 folks do not 
indicate utter failure and other ideas might be useful before creating 
new language subtag fauna.

> 
> Remember that ISO 15924 isn't our standard to control. It's coming from
> TC46.

"Controlling a standard" implies some measure of responsibility. In this 
case, RFC 4646 strives to *avoid* making up its own rules and "working 
around" the underlying ISO standards. In fact, registration of language 
tags has a long and dubious history in this regard: many existing 
registrations have later been regretted when (for example) ISO 639 took 
action at a later date.

The most likely possibility is that ISO 15924, by its definition, does 
not define all of the orthographic variations that are needed in 
language tags. While you (or I) might not like the 15924 JAC's 
decisions, I must admit that they have a certain logic.

However, I also note that modifying language tags is probably not the 
best method of overcoming this deficiency. There is precedent for 
creating a project such as an ISO 15924 Part 2 "Codes for the 
representation of scripts and script variations". Such a project, if it 
shared a codespace with 15924-1, would be a more methodical and 
consistent way to register the values you seek.

> 
> In the meantime, please send me the form to request 7000 language
> variants or extensions (since both are registered by language).

Extensions are NOT AT ALL associated with particular languages (unless 
the RFC that defines them says they are).

Variants are NOT required to be associated with specific languages, 
although there is stern resistance by some to "generic" variants. If you 
were to propose (say) 'phonipa' as a variant with no Prefix field, I 
would probably support it.

And the form is in RFC 4646.

> 
> I would encourage folks to think about how the language tag can be made
> productive. If every possible language tag in effect has to be
> registered, it will push many tags underground and the x- extension
> space will become far more popular than we might want it to be. 

Not really. Hardly anyone uses the registered stuff. And the x- private 
use space was made productive with subtags to facilitate those with 
specialized needs. I *hope* that more people can derive use from tags 
such as en-x-ipa-glasgow (oh, hey! there's your tag :-) ).
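
For what it's worth, consuming such a tag is not hard. A minimal 
sketch (the function name is mine): everything after the "x" singleton 
is private use, and its meaning is set by private agreement rather than 
by the registry.

```python
def split_private_use(tag):
    """Split a tag into its public part and its private-use subtags."""
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    if "x" not in lowered:
        return tag, []
    i = lowered.index("x")
    # Everything after the "x" singleton is private use.
    return "-".join(subtags[:i]), subtags[i + 1:]


print(split_private_use("en-x-ipa-glasgow"))
# ('en', ['ipa', 'glasgow'])
```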

> I, for
> example, have to deal with emerging writing systems and storing data in
> them for archival purposes long before such writing systems are well
> established (in some cases). If every such instance needs to be
> registered, RFC 4646 will be seen as a bottleneck to be worked around
> rather than in collaboration with.
> 

Your use case is explicitly what private-use subtags are for: 
idiosyncratic or specialized requirements.

Yes, every subtag you want to use has to be registered if you want it to 
be non-private-use. But previously you needed to register entire tags 
(including every variation you wanted), so 4646 greatly improves your 
situation.

But I would start with really trying harder with 15924, followed by 
looking at creating a registry for "scripts and script variations", 
preferably within TC 46. Deferring costs us nothing: any obstacle to 
adding "script variants" in the future is one that already confronts us 
today. But I don't see them as strictly necessary yet.

Addison

-- 
Addison Phillips
Globalization Architect -- Yahoo! Inc.

Internationalization is an architecture.
It is not a feature.
