[Ltru] Re: Macrolanguage and extlang

Here are some comments on Mark and Addison's proposed Macrolanguage 
section.  All of this is presuming that the Macrolanguage field is in 
and the extlang mechanism is out; none of these comments is meant as an 
argument against this course of action.

> The Macrolanguage field contains a primary language subtag that 
> *encompasses* this subtag. That is, this language is a dialect or 
> sub-language of the Macrolanguage, and is called an *encompassed* 
> subtag.

I think we need to adhere very closely to the ISO 639-3 wording rather 
than coming up with our own definition of macrolanguage.  And the ISO 
639-3 wording is fairly clear that encompassed languages are not 
dialects:

"The linguistic varieties denoted by each of the identifiers in this 
part of ISO 639 are assumed to be distinct languages and not dialects of 
other languages, even though for some purposes some users may consider a 
variety listed in this part of ISO 639 to be a "dialect" rather than a 
"language"....  The dialects of a language are included within the 
denotation represented by the identifier for that language. Thus, each 
language identifier represents the complete range of all the spoken or 
written varieties of that language, including any standardized form."

The ISO 639-3 explanation of macrolanguage is at 
http://www.sil.org/iso639-3/scope.asp#M.  Basically the determining 
factor is that the encompassed languages are (a) considered to be 
different languages in some contexts, or by some people, and (b) 
considered to be a single language in (or by) others.  That is the 
concept we need to emphasize, not the standard vs. regional 
relationship.  This is important to help guide users toward the correct 
understanding and choice of macrolanguages and encompassed languages.

> Only values assigned by ISO 639-3 will be considered for inclusion.

I note again that this precludes assigning a macrolanguage of "sgn" to 
any of the 124 sign languages that have ISO 639-3 identifiers.  Although 
it's often a good idea to avoid special exceptions, this particular case 
worries me.

> For example, the language subtags 'nb' (Norwegian Bokmal) and 'nn' 
> (Norwegian Nynorsk) has a Macrolanguage entry of 'no' (Norwegian).

"subtags... have"

> Nor does it define how the encompassed languages are related to 
> one-another.

Remove hyphen.

> In some cases, the Macrolanguage has a standard form as well as a 
> variety of less-common dialects.

"varieties"

> In other cases there is no particular standard form and the 
> encompassed subtags describe specific variations within the parent 
> language.

But again, there must always be the condition that the variations are 
sometimes considered to be a single language.  That's what makes it a 
macrolanguage relationship, and not a collection or something we should 
be using variants for.

> Care in selecting which subtags are used is crucial to 
> interoperability. In general, use the most specific tag. However, 
> where the standard written form of an encompassed language is captured 
> by the Macrolanguage, the Macrolanguage should still be used for 
> written material.

I feel this is too concrete; it almost feels like "ALWAYS use the 
specific tag, except when you MUST NOT."  John pointed out that taggers 
who use the more specific "yue" instead of the more general "zh" might 
be putting themselves at a disadvantage, considering that we expect few 
matching engines to understand how to use the Macrolanguage field (at 
least initially).  This is true even for spoken material, not just 
written.
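
To make the disadvantage concrete, here is a minimal sketch of my own 
(not from the draft) comparing a prefix-only filter in the style of RFC 
4647 basic filtering with one that also consults the Macrolanguage 
field.  The MACROLANGUAGE table and both function names are 
hypothetical; a real engine would load the mapping from the registry.

```python
# Hypothetical excerpt of registry data: subtag -> Macrolanguage value.
MACROLANGUAGE = {"yue": "zh"}


def basic_filter_match(language_range, tag):
    """Prefix match on whole subtags, in the style of basic filtering."""
    r, t = language_range.lower(), tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")


def macro_aware_match(language_range, tag):
    """Like basic_filter_match, but also tries the tag with its primary
    language subtag replaced by that subtag's Macrolanguage value."""
    if basic_filter_match(language_range, tag):
        return True
    primary, sep, rest = tag.partition("-")
    macro = MACROLANGUAGE.get(primary.lower())
    if macro is None:
        return False
    return basic_filter_match(language_range, macro + sep + rest)


# A user asks for "zh"; the content is tagged with the more specific "yue-HK".
print(basic_filter_match("zh", "yue-HK"))   # engine unaware of Macrolanguage
print(macro_aware_match("zh", "yue-HK"))    # engine that consults the field
```

An engine of the first kind never returns "yue" content for a "zh" 
request, which is exactly the situation John described.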

> In particular, chinese language(s) and dialects call for special 
> consideration.

Uppercase "Chinese," and get rid of "dialects."  ISO 639-3 considers 
them languages, and we need to adhere to their model.

> ... languages such as 'yue' (Cantonese) have usually used tags 
> beginning with the subtag 'zh'. This past practice of tagging...

It is still the current practice.  I assume this wording is meant for an 
envisioned future when everyone knows which subtag to use: "Remember way 
back in 2007 when we HAD to tag Cantonese as 'zh'?"  But I don't see 
this tagging practice going away any time soon, and so I suggest the 
word "past" be removed.  This will drive the point home even further 
that filtering and lookup engines need to understand the Macrolanguage 
field and do the right thing with it.

> For example, the information that 'yue' has a macrolangauge of 'zh' 
> could be used in the Lookup algorithm to fallback from a request for 
> "yue-Hans-CN" to "zh-Hans-CN"

"Fall back" is two words when used as a verb.

> For example, in a given application the best fallback for "be" 
> (Breton), may be "fr" (French) -- rather than the more closely related 
> "cy" (Welsh) -- because...

For a smoother read IMHO, remove the intrusive second comma and then 
convert the dashes to commas:

"For example, in a given application the best fallback for "be" (Breton) 
may be 'fr' (French), rather than the more closely related 'cy' (Welsh), 
because..."

> ... Breton readers are far more likely to be able to read French than 
> Welsh.

"Far" sounds hyperbolic and presumptive, even if it is provably true. 
Remove it; the sentence still carries its full impact without it.

--
Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
http://users.adelphia.net/~dewell/
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages
