Re: [Ltru] Macrolanguage usage

Leif Halvard Silli <lhs@malform.no> Sun, 25 May 2008 12:05 UTC

Message-ID: <483955E8.1070603@malform.no>
Date: Sun, 25 May 2008 14:04:56 +0200
From: Leif Halvard Silli <lhs@malform.no>
To: Doug Ewell <doug@ewellic.org>
References: <mailman.2658.1211631529.13675.ltru@ietf.org> <001b01c8bdc8$e2d66770$e6f5e547@DGBP7M81>
In-Reply-To: <001b01c8bdc8$e2d66770$e6f5e547@DGBP7M81>
Cc: LTRU Working Group <ltru@ietf.org>
Subject: Re: [Ltru] Macrolanguage usage

Doug Ewell 2008-05-24 20.06:
> Leif Halvard Silli <lhs at malform dot no> wrote:
>   
> > As you know, many tag Norwegian texts as 'no-no'.
>
> Of course, because that means "Norwegian as used in Norway."  (Which is 
> kind of redundant really, [...]

It is redundant. But it is still done. And not only on the Web. 
Yesterday I was looking at the hyphenation dictionaries of OpenOffice. 
All of them are identified with a language tag made of two subtags. 
Apparently the double subtags help users and developers understand better 
what is meant. They give more context.

So for Nynorsk, they used 'nn_NO'. For Bokmål 'nb_NO'. Which is the same 
level of redundancy as 'nn-NO'. I assume that adding the _NO helps users 
to understand what 'nn' and 'nb' mean. Which is kind of backwards. One 
must read it from the right in order to fully understand what it means.

>  since there doesn't seem to be much evidence 
> of variation in the Norwegian language associated with regions other 
> than Norway; but some creators of language tags and locale identifiers 
> feel it is important to apply region subtags consistently.)
>   

To follow the same pattern throughout, you mean. Could be. But I guess 
this pattern arises in the first place because one knows about the need 
to distinguish between e.g. en-US and en-GB. To use the region subtag 
*only* when needed would be too much hassle, I guess ...

(The irony, in the OpenOffice case, is that the nn_NO and nb_NO 
hyphenation dictionaries are *identical*. As are, btw, the en_US, the 
en_GB and the en_CA hyphenation dictionaries.)

> > So it is obvious that when 'no-no' falls back to 'no', then of 
> > course 'no-nn' or 'no-nno' would fall back just as well. Why should I 
> > not believe so?
>
> I ask the co-chairs to settle this matter with a third consensus-call 
> question:
>
> Q3: If we did go back to using "extlang," we could combine this subtag
>     with the region subtag, and require that at most one of the two be
>     used in a single tag.  Possible responses:  (pick ONE)
>         A - I would like this.
>         B - I could live with this.
>         C - I would object to this.
>
> Remember that if we did create such a "Leif rule" for the purpose of 
> allowing two-letter extlangs, as in "no-nn", then:
>
> 1. The region/extlang subtag would have to come AFTER any script 
> subtags, thus: "no-Latn-nn" rather than "no-nn-Latn", and "zh-Hant-cmn" 
> rather than "zh-cmn-Hant" -- unless we wanted to change that existing 
> BCP 47 syntactical rule as well.
>   

But you would still be able to omit the script subtag. In any case, this 
ordering does seem logical to me.

> 2. It would be impossible to tell whether a non-initial two-letter 
> subtag such as 'tw' referred to a region, as in "zh-TW" (Taiwan), or an 
> extlang, as in "ak-tw" (Twi).  Case is not significant in language 
> tags -- unless we wanted to change that existing BCP 47 syntactical rule 
> as well.
>   

But since the most important subtag is supposed to be the first one, this 
does not seem to be much of an issue. Unless we must consider the 
possibility of a mass exchange of language/population between Ghana/Côte 
d'Ivoire and Taiwan.
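
To make the point concrete, here is a minimal Python sketch. It assumes 
the hypothetical rule discussed above; the table and the function name 
are made up for illustration and are not taken from the registry. A 
non-initial two-letter subtag cannot be classified by its shape alone, 
since case is not significant, but reading it together with the first 
subtag settles it, given a table of which primary languages would have 
two-letter extlangs:

```python
# Hypothetical table: primary language -> two-letter extlang subtags.
# Illustration only; these are not entries in the real registry.
HYPOTHETICAL_EXTLANGS = {
    "ak": {"tw"},        # Twi under Akan
    "no": {"nn", "nb"},  # Nynorsk and Bokmål under Norwegian
}


def classify_two_letter_subtag(tag):
    """Classify the first non-initial two-letter subtag of `tag`.

    Returns (subtag, "extlang"), (subtag, "region"), or None if the
    tag has no non-initial two-letter subtag.
    """
    subtags = tag.lower().split("-")  # case is not significant
    primary = subtags[0]
    for subtag in subtags[1:]:
        if len(subtag) == 2 and subtag.isalpha():
            if subtag in HYPOTHETICAL_EXTLANGS.get(primary, set()):
                return (subtag, "extlang")
            return (subtag, "region")
    return None
```

So 'tw' comes out as a region after 'zh' and as an extlang after 'ak'. 
The sketch still needs the table, which is the part of the objection it 
does not remove.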

> 3. It would be impossible to write a tag for, say, "Cantonese as used in 
> Singapore" that also expressed the macrolanguage relationship --  
> whatever that may be -- between 'zh' and 'yue'.
>   

Yes, one would have to choose between 'yue-sg' and 'zh-yue'. The same 
would go for Mandarin in China. Either 'cmn-cn' or 'zh-cmn'.
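
What each choice gives up can be seen from a fallback chain. The Python 
sketch below is a simplified version of truncation-style fallback as in 
RFC 4647 lookup, not a full implementation; the function name is my own. 
It repeatedly drops the last subtag, and drops a trailing 
single-character subtag along with it:

```python
def fallback_chain(tag):
    """Return `tag` followed by each shorter prefix, most specific first."""
    subtags = tag.lower().split("-")  # case is not significant
    chain = []
    while subtags:
        chain.append("-".join(subtags))
        subtags.pop()
        # A single-character subtag (e.g. 'x') never ends a fallback tag.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return chain
```

With 'zh-yue' the chain ends in 'zh', so the macrolanguage relation is 
kept but Singapore is lost. With 'yue-sg' the chain ends in 'yue', so 
the region is kept but 'zh' never appears.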

> > In the draft you sent out you start by saying that "The arguments for 
> > extlang are that they give superior results,". However, this is an 
> > exaggeration of the standpoint that I for instance have. First, I 
> > assume you meant "technically superior". Well, no, I can understand that 
> > using short tags is easier to deal with, technically. And therefore 
> > superior to extlang. (In my testing with Apache, 'nn' and 'nb' were 
> > easier to deal with than 'no-nyn' and 'no-bok'.)
> >
> > But then a problem is that the users "in the wild" still are tagging 
> > Norwegian as if we had an extlang system.
>
> "no-nyn" and "no-bok" are grandfathered tags, registered under RFC 1766 
> in 1995, long before anyone ever used the word "macrolanguage" or 
> "extlang".  Their similarity to extlangs is to be considered 
> coincidental.
>   

Agree - coincidental. But only because 'bok' and 'nyn', by coincidence, 
were not registered as the official codes for Nynorsk and Bokmål in 
ISO 639-2.

> > And this is a special kind of language negotiation. For a small 
> > macrolanguage like Norwegian, we suddenly get 3 options. If instead we 
> > had extlang for Norwegian, we would in reality only have two options.
>
> This isn't new or sudden.

Pardon me for using colorful language (the word "suddenly").

>   You've had 3 options since 2000, when ISO 639 
> registered 'nb' and 'nn', and actually since 1995, when ietf-languages 
> registered the whole tags "no-bok" and "no-nyn" which were not to be 
> considered parsable.
>   

For that matter, 'no-nn', 'no-nb' and 'no' would also be 3 options. That 
is not what I meant.

> > I have already been told that it is very important to read things out 
> > of the tags without needing to look into the registry. And I agree. 
> > That is a basic, and very good thing.
>
> See my note 2 above.  Regions and encompassed languages are not at all 
> the same, and the "Leif rule" would require tag producers and consumers 
> to look in the Registry to see which is intended.  (If Mark can say "a 
> la Ewell" then I can say "the Leif rule.")
>   

I read this with a sense of humor, so that is ok. :-)
-- 
leif halvard silli


