Re: [Ltru] Macrolanguage usage

Doug Ewell 2008-05-24 20.06:
> Leif Halvard Silli <lhs at malform dot no> wrote:
>   
> > As you know, many tag Norwegian texts as 'no-no'.
>
> Of course, because that means "Norwegian as used in Norway."  (Which is 
> kind of redundant really, [...]

It is redundant. But it is still done. And not only on the Web. 
Yesterday I was looking at the hyphenation dictionaries of OpenOffice. 
All of them are identified with a language tag made of two subtags. 
Apparently the two subtags help users/developers better understand 
what is meant. They give more context.

So for Nynorsk they used 'nn_NO', and for Bokmål 'nb_NO', which is the 
same level of redundancy as 'nn-NO'. I assume that adding the _NO helps 
users understand what 'nn' and 'nb' mean. That is kind of backwards: one 
must read the tag from the right in order to fully understand it.
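The OpenOffice identifiers separate language and region with an 
underscore, where a BCP 47 tag uses a hyphen. A minimal Python sketch of 
the conversion, assuming identifiers of the form 
language[_REGION][.codeset][@modifier]; the function name 
locale_id_to_bcp47 is hypothetical, and the casing it applies is only 
the customary convention, since case is not significant in tags:

```python
def locale_id_to_bcp47(locale_id: str) -> str:
    """Convert a POSIX-like identifier such as 'nn_NO' into a BCP 47-style tag.

    Hypothetical helper: assumes the form language[_REGION][.codeset][@modifier].
    """
    # Drop any codeset or modifier suffix, e.g. 'nn_NO.UTF-8' or 'de_DE@euro'.
    base = locale_id.split(".")[0].split("@")[0]
    subtags = base.replace("_", "-").split("-")
    # Case is not significant in BCP 47; this only applies the usual casing
    # conventions: lowercase language, Title-case script, uppercase region.
    out = [subtags[0].lower()]
    for s in subtags[1:]:
        if len(s) == 4 and s.isalpha():
            out.append(s.title())
        elif len(s) == 2 and s.isalpha():
            out.append(s.upper())
        else:
            out.append(s.lower())
    return "-".join(out)


for oo_id in ["nn_NO", "nb_NO", "en_US", "en_GB", "en_CA"]:
    print(oo_id, "->", locale_id_to_bcp47(oo_id))
```

The conversion is purely mechanical; it does not remove the redundant 
region subtag, which would need knowledge that the tag itself does not 
carry.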

>  since there doesn't seem to be much evidence 
> of variation in the Norwegian language associated with regions other 
> than Norway; but some creators of language tags and locale identifiers 
> feel it is important to apply region subtags consistently.)
>   

To follow the same pattern throughout, you mean. Could be. But I guess 
this pattern arises in the first place because one knows about the need 
to distinguish between e.g. en-US and en-GB. To use the region subtag 
*only* when needed would be too much hassle, I guess ...

(The irony, in the OpenOffice case, is that the nn_NO and nb_NO 
hyphenation dictionaries are *identical*. So, by the way, are the en_US, 
en_GB and en_CA hyphenation dictionaries.)

> > So it is obvious that when 'no-no' falls back to 'no', then of 
> > course 'no-nn' or 'no-nno' would fall back just as well. Why should I 
> > not believe so?
>
> I ask the co-chairs to settle this matter with a third consensus-call 
> question:
>
> Q3: If we did go back to using "extlang," we could combine this subtag
>     with the region subtag, and require that at most one of the two be
>     used in a single tag.  Possible responses:  (pick ONE)
>         A - I would like this.
>         B - I could live with this.
>         C - I would object to this.
>
> Remember that if we did create such a "Leif rule" for the purpose of 
> allowing two-letter extlangs, as in "no-nn", then:
>
> 1. The region/extlang subtag would have to come AFTER any script 
> subtags, thus: "no-Latn-nn" rather than "no-nn-Latn", and "zh-Hant-cmn" 
> rather than "zh-cmn-Hant" -- unless we wanted to change that existing 
> BCP 47 syntactical rule as well.
>   

But you would still be able to omit the script subtag. In any case, this 
ordering seems logical to me.
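On the fallback question: RFC 4647 Lookup removes subtags from the right 
end of a tag until a match is found, so with either ordering an 
extlang-style tag would still end at 'no'; only the intermediate 
candidate differs. A simplified sketch (it ignores the '*' wildcard and 
the default value that RFC 4647 also describes; the function names are 
hypothetical):

```python
from typing import List, Optional


def lookup_fallbacks(tag: str) -> List[str]:
    """Candidate tags tried by RFC 4647 Lookup, most specific first (simplified)."""
    subtags = tag.lower().split("-")  # matching is case-insensitive
    chain = []
    while subtags:
        chain.append("-".join(subtags))
        subtags.pop()  # drop the rightmost subtag
        # If that exposes a single-character subtag (an extension or
        # private-use singleton), drop it as well.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return chain


def lookup(tag: str, available: List[str]) -> Optional[str]:
    """Return the first available tag reached by truncation, or None."""
    by_lower = {t.lower(): t for t in available}
    for candidate in lookup_fallbacks(tag):
        if candidate in by_lower:
            return by_lower[candidate]
    return None


print(lookup_fallbacks("no-Latn-nn"))  # ['no-latn-nn', 'no-latn', 'no']
print(lookup_fallbacks("no-nn-Latn"))  # ['no-nn-latn', 'no-nn', 'no']
print(lookup("no-NO", ["no", "nb", "nn"]))  # no
```

With these definitions, lookup("nn-NO", ["no"]) returns None: a tag 
built on 'nn' never truncates to 'no', which is the practical difference 
between a separate code and an extlang-style 'no-nn'.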

> 2. It would be impossible to tell whether a non-initial two-letter 
> subtag such as 'tw' referred to a region, as in "zh-TW" (Taiwan), or an 
> extlang, as in "ak-tw" (Twi).  Case is not significant in language 
> tags -- unless we wanted to change that existing BCP 47 syntactical rule 
> as well.
>   

But since the most important subtag is supposed to be the first one, 
this does not seem to be much of an issue. Unless we must consider the 
possibility of a mass exchange of language/population between Ghana/Côte 
d'Ivoire and Taiwan.
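Point 2, and the reply to it, can be illustrated with a toy classifier. 
The primary subtag does separate 'zh-TW' from 'ak-tw', but only through 
a table lookup, because 'tw' has the same shape (and, ignoring case, the 
same spelling) in both tags. The EXTLANGS and REGIONS sets below are a 
tiny hypothetical stand-in for the Registry, not real registry data:

```python
# Hypothetical registry excerpt under the proposed "Leif rule": two-letter
# extlangs can only be recognized by looking up (prefix, subtag) pairs.
EXTLANGS = {("ak", "tw"), ("no", "nb"), ("no", "nn")}
REGIONS = {"cn", "no", "sg", "tw"}


def second_subtag_kind(tag: str) -> str:
    """Classify the subtag that follows the primary language subtag."""
    subtags = tag.lower().split("-")  # case carries no information
    if len(subtags) < 2:
        return "none"
    primary, second = subtags[0], subtags[1]
    if (primary, second) in EXTLANGS:
        return "extlang"
    if second in REGIONS:
        return "region"
    return "unknown"


for t in ["zh-TW", "ak-tw", "AK-TW", "no-NO", "no-nn"]:
    print(t, second_subtag_kind(t))
```

Removing ('ak', 'tw') from the table turns 'ak-tw' into a region 
reading, which is exactly the dependence on the Registry that point 2 
describes.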

> 3. It would be impossible to write a tag for, say, "Cantonese as used in 
> Singapore" that also expressed the macrolanguage relationship --  
> whatever that may be -- between 'zh' and 'yue'.
>   

Yes, one would have to choose between 'yue-sg' and 'zh-yue'. The same 
would go for Mandarin in China. Either 'cmn-cn' or 'zh-cmn'.
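The Q3 rule can be written as a check that a tag uses at most one 
subtag from the shared extlang/region slot. A sketch, assuming a 
hypothetical extlang table; it recognizes only two-letter and 
three-digit region subtags and ignores variants:

```python
from typing import List

# Hypothetical excerpt: extlangs allowed after a given macrolanguage prefix.
EXTLANGS = {("zh", "cmn"), ("zh", "yue"), ("no", "nb"), ("no", "nn")}


def slot_kinds(tag: str) -> List[str]:
    """List the extlang/region subtags found in a tag, in order."""
    subtags = tag.lower().split("-")
    primary, kinds = subtags[0], []
    for s in subtags[1:]:
        if len(s) == 1:  # singleton: an extension or private use follows
            break
        if (primary, s) in EXTLANGS:
            kinds.append("extlang")
        elif (len(s) == 2 and s.isalpha()) or (len(s) == 3 and s.isdigit()):
            kinds.append("region")
    return kinds


def allowed_under_q3(tag: str) -> bool:
    """Q3: extlang and region share one slot, so at most one may appear."""
    return len(slot_kinds(tag)) <= 1


for t in ["zh-yue", "yue-SG", "zh-yue-SG", "cmn-CN", "zh-cmn", "zh-Hant-cmn"]:
    print(t, allowed_under_q3(t))
```

Here 'zh-yue-SG' is rejected, which is the "Cantonese as used in 
Singapore" case from point 3.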

> > In the draft you sent out you start by saying that "The arguments for 
> > extlang are that they give superior results,". However, this is an 
> > exaggeration of the standpoint that I, for instance, have. First, I 
> > assume you meant "technically superior". Well, no, I can understand 
> > that using short tags is easier to deal with, technically, and 
> > therefore superior to extlang. (In my testing with Apache, 'nn' and 
> > 'nb' were easier to deal with than 'no-nyn' and 'no-bok'.)
> >
> > But then the problem is that users "in the wild" are still tagging 
> > Norwegian as if we had an extlang system.
>
> "no-nyn" and "no-bok" are grandfathered tags, registered under RFC 1766 
> in 1995, long before anyone ever used the word "macrolanguage" or 
> "extlang".  Their similarity to extlangs is to be considered 
> coincidental.
>   

Agreed, coincidental. But only because 'bok' and 'nyn', by coincidence, 
were not registered as the official codes for Bokmål and Nynorsk in 
ISO 639-2.
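Since the whole tags 'no-bok' and 'no-nyn' are registered, rather than 
'bok' and 'nyn' as separate subtags, a consumer has to map them as 
complete strings. The IANA Language Subtag Registry gives them the 
Preferred-Value 'nb' and 'nn' respectively; a minimal sketch hard-coding 
just those two entries (the helper name canonicalize is hypothetical):

```python
# The two grandfathered entries and their Preferred-Value fields, as listed
# in the IANA Language Subtag Registry.
GRANDFATHERED = {"no-bok": "nb", "no-nyn": "nn"}


def canonicalize(tag: str) -> str:
    """Replace a whole grandfathered tag by its preferred value."""
    # The tags are matched as complete strings; 'bok' and 'nyn' are never
    # looked up on their own, since they are not registered subtags.
    return GRANDFATHERED.get(tag.lower(), tag)


for t in ["no-bok", "NO-NYN", "no", "nb-NO"]:
    print(t, "->", canonicalize(t))
```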

> > And this is a special kind of language negotiation. For a small 
> > macrolanguage like Norwegian, we suddenly get 3 options. If instead we
> > had extlang for Norwegian, we would in reality only have two options.
>
> This isn't new or sudden.

Pardon me for using colorful language (the word "suddenly").

>   You've had 3 options since 2000, when ISO 639 
> registered 'nb' and 'nn', and actually since 1995, when ietf-languages 
> registered the whole tags "no-bok" and "no-nyn" which were not to be 
> considered parsable.
>   

For that matter, 'no-nn', 'no-nb' and 'no' would also be 3 options. That 
is not what I meant.
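Where 'no', 'nb' and 'nn' are three unrelated codes, a prefix-based 
matcher cannot relate them. RFC 4647 Basic Filtering, which follows the 
same prefix rule as Accept-Language in HTTP/1.1, shows the difference: a 
request for 'no' selects both written standards only when they are 
tagged 'no-nb' and 'no-nn'. A minimal sketch, not a model of Apache's 
own negotiation logic:

```python
from typing import List


def basic_filter(language_range: str, tags: List[str]) -> List[str]:
    """RFC 4647 Basic Filtering: exact match, or prefix match ending at a '-'."""
    r = language_range.lower()
    if r == "*":
        return list(tags)
    return [t for t in tags if t.lower() == r or t.lower().startswith(r + "-")]


separate_codes = ["no", "nb", "nn"]
extlang_style = ["no", "no-nb", "no-nn"]
print(basic_filter("no", separate_codes))  # ['no']
print(basic_filter("no", extlang_style))   # ['no', 'no-nb', 'no-nn']
```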

> > I have already been told that it is very important to be able to read 
> > things out of the tags without needing to look into the registry. And 
> > I agree. That is a basic, and very good, principle.
>
> See my note 2 above.  Regions and encompassed languages are not at all 
> the same, and the "Leif rule" would require tag producers and consumers 
> to look in the Registry to see which is intended.  (If Mark can say "a 
> la Ewell" then I can say "the Leif rule.")
>   

I read this with a sense of humor, so that is ok. :-)
-- 
leif halvard silli

_______________________________________________
Ltru mailing list
Ltru@ietf.org
https://www.ietf.org/mailman/listinfo/ltru