[Ltru] Re: Macrolanguage and extlang
"Doug Ewell" <dewell@roadrunner.com> Fri, 20 July 2007 04:47 UTC
From: Doug Ewell <dewell@roadrunner.com>
To: LTRU Working Group <ltru@ietf.org>
References: <E1I9ghp-0006L9-0h@megatron.ietf.org>
Date: Thu, 19 Jul 2007 21:47:45 -0700
Here are some comments on Mark and Addison's proposed Macrolanguage section. All of this is presuming that the Macrolanguage field is in and the extlang mechanism is out; none of these comments is meant as an argument against this course of action.

> The Macrolanguage field contains a primary language subtag that
> *encompasses* this subtag. That is, this language is a dialect or
> sub-language of the Macrolanguage, and is called an *encompassed*
> subtag.

I think we need to adhere very closely to the ISO 639-3 wording rather than coming up with our own definition of macrolanguage. And the ISO 639-3 wording is fairly clear that encompassed languages are not dialects:

"The linguistic varieties denoted by each of the identifiers in this part of ISO 639 are assumed to be distinct languages and not dialects of other languages, even though for some purposes some users may consider a variety listed in this part of ISO 639 to be a "dialect" rather than a "language".... The dialects of a language are included within the denotation represented by the identifier for that language. Thus, each language identifier represents the complete range of all the spoken or written varieties of that language, including any standardized form."

The ISO 639-3 explanation of macrolanguage is at http://www.sil.org/iso639-3/scope.asp#M. Basically the determining factor is that the encompassed languages are (a) considered to be different languages in some contexts, or by some people, and (b) considered to be a single language in (or by) others. That is the concept we need to emphasize, not the standard vs. regional relationship. This is important to help guide users toward the correct understanding and choice of macrolanguages and encompassed languages.

> Only values assigned by ISO 639-3 will be considered for inclusion.

I note again that this precludes assigning a macrolanguage of "sgn" to any of the 124 sign languages that have ISO 639-3 identifiers, and even though it's often a good idea to avoid special exceptions, this particular case worries me.

> For example, the language subtags 'nb' (Norwegian Bokmal) and 'nn'
> (Norwegian Nynorsk) has a Macrolanguage entry of 'no' (Norwegian).

"subtags... have"

> Nor does it define how the encompassed languages are related to
> one-another.

Remove hyphen.

> In some cases, the Macrolanguage has a standard form as well as a
> variety of less-common dialects.

"varieties"

> In other cases there is no particular standard form and the
> encompassed subtags describe specific variations within the parent
> language.

But again, there must always be the condition that the variations are sometimes considered to be a single language. That's what makes it a macrolanguage relationship, and not a collection or something we should be using variants for.

> Care in selecting which subtags are used is crucial to
> interoperability. In general, use the most specific tag. However,
> where the standard written form of an encompassed language is captured
> by the Macrolanguage, the Macrolanguage should still be used for
> written material.

I feel this is too concrete; it almost feels like "ALWAYS use the specific tag, except when you MUST NOT." John pointed out that taggers who use the more specific "yue" instead of the more general "zh" might be putting themselves at a disadvantage, considering that we expect few matching engines to understand how to use the Macrolanguage field (at least initially). This is true even for spoken material, not just written.

> In particular, chinese language(s) and dialects call for special
> consideration.

Uppercase "Chinese," and get rid of "dialects." ISO 639-3 considers them languages, and we need to adhere to their model.

> ... languages such as 'yue' (Cantonese) have usually used tags
> beginning with the subtag 'zh'. This past practice of tagging...

It is still the current practice. I assume this wording is meant for an envisioned future when everyone knows which subtag to use: "Remember way back in 2007 when we HAD to tag Cantonese as 'zh'?" But I don't see this tagging practice going away any time soon, and so I suggest the word "past" be removed. This will drive the point home even further that filtering and lookup engines need to understand the Macrolanguage field and do the right thing with it.

> For example, the information that 'yue' has a macrolangauge of 'zh'
> could be used in the Lookup algorithm to fallback from a request for
> "yue-Hans-CN" to "zh-Hans-CN"

"Fall back" is two words when used as a verb.

> For example, in a given application the best fallback for "be"
> (Breton), may be "fr" (French) -- rather than the more closely related
> "cy" (Welsh) -- because...

For a smoother read IMHO, remove the intrusive second comma and then convert the dashes to commas:

"For example, in a given application the best fallback for "be" (Breton) may be 'fr' (French), rather than the more closely related 'cy' (Welsh), because..."

> ... Breton readers are far more likely to be able to read French than
> Welsh.

"Far" sounds hyperbolic and presumptive, even if it is provably true. Remove it; the sentence still carries its full impact without it.

--
Doug Ewell * Fullerton, California, USA * RFC 4645 * UTN #14
http://users.adelphia.net/~dewell/
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages
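
The Lookup fallback mentioned in the message above, from a request for "yue-Hans-CN" to "zh-Hans-CN" by way of the Macrolanguage field, could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the MACROLANGUAGE table is a hand-written excerpt rather than real registry data, the function names are hypothetical, and the ordering (try every truncation of the original tag before substituting the macrolanguage) is one possible choice that the proposed text does not specify.

```python
# Hypothetical excerpt: encompassed primary subtag -> Macrolanguage subtag.
# Hand-written for illustration; a real engine would read the registry.
MACROLANGUAGE = {"yue": "zh", "cmn": "zh", "nb": "no", "nn": "no"}


def truncations(tag):
    """Yield the tag, then progressively shorter tags (RFC 4647-style Lookup)."""
    subtags = tag.split("-")
    while subtags:
        yield "-".join(subtags)
        subtags.pop()
        # A single-character subtag is removed together with the subtag
        # that followed it, so it never ends up as the last subtag.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()


def lookup_candidates(tag):
    """Return fallback candidates, most preferred first.

    Assumption: truncations of the requested tag come first, followed by
    truncations of the same tag with the macrolanguage substituted.
    """
    candidates = list(truncations(tag))
    primary, _, rest = tag.partition("-")
    macro = MACROLANGUAGE.get(primary.lower())
    if macro:
        substituted = macro + ("-" + rest if rest else "")
        candidates.extend(truncations(substituted))
    return candidates


def lookup(tag, available, default=None):
    """Return the first available tag matching a candidate, ignoring case."""
    by_lower = {a.lower(): a for a in available}
    for candidate in lookup_candidates(tag):
        match = by_lower.get(candidate.lower())
        if match is not None:
            return match
    return default


if __name__ == "__main__":
    print(lookup_candidates("yue-Hans-CN"))
    print(lookup("yue-Hans-CN", ["en", "zh-Hans-CN"]))
```

An engine that ignores the Macrolanguage field would stop after "yue" and return the default, which is the disadvantage for "yue" taggers noted in the message.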