Re: [Ltru] extlang
Martin Duerst <duerst@it.aoyama.ac.jp> Wed, 29 August 2007 03:59 UTC
Message-Id: <6.0.0.20.2.20070829120052.05bc6e60@localhost>
Date: Wed, 29 Aug 2007 12:11:43 +0900
To: Randy Presuhn <randy_presuhn@mindspring.com>, LTRU Working Group <ltru@ietf.org>
From: Martin Duerst <duerst@it.aoyama.ac.jp>
Subject: Re: [Ltru] extlang
In-Reply-To: <001501c7e9c3$71f00b80$6801a8c0@oemcomputer>
References: <30b660a20708281459r6000d746qe007f2882fae6d73@mail.gmail.com> <001501c7e9c3$71f00b80$6801a8c0@oemcomputer>
As a technical contributor, I quite agree with Randy.
See below for more details.

At 07:33 07/08/29, Randy Presuhn wrote:
>Hi -
>
>As a technical contributor...
>
>> From: "Mark Davis" <mark.davis@icu-project.org>
>> To: "John Cowan" <cowan@ccil.org>
>> Cc: "LTRU Working Group" <ltru@ietf.org>
>> Sent: Tuesday, August 28, 2007 2:59 PM
>> Subject: [Ltru] extlang
>...
>> You are not revealing some important hidden assumptions in your statements.
>>
>> 1. The macrolanguage is always a better fallback for every encompassed
>> language than other alternatives. Out of the many encompassed languages, you
>> are implying that a speaker of every encompassed language will be able to
>> understand the macrolanguage, or at least better than the alternatives.
>
>I don't see how the use of extlang would require this as an assumption.
>Making such an assumption seems a bit like assuming that all languages whose
>tags begin in "a" are somehow related.
>
>> 2. People don't lose anything by having the fallback. I dispute this
>> as well. As previously noted:
>>    1. Truncation fallback from zh-cmn-Hant-SG to "zh" loses the Hant and
>>    the SG; falling back from ar-arb-SA to 'ar' loses the "SA".
>
>It is the nature of truncation fallback to lose information. No matter
>what order the subtags are trimmed off, someone will be able to argue
>that for some particular case, a different order might have been better.
>This isn't an argument against extlang; it's an argument against unrealistically
>high expectations for truncation fallback.
>
>>    2. It introduces ambiguous language names. Right now, in the
>>    overwhelming majority of practice, standard Arabic is "ar"; after the
>>    change, standard Arabic is "ar-arb".
>
>This would not be desirable. However, I wonder whether the semantic associated
>by most taggers and users of tags with "ar" is "standard Arabic" or merely
>"Arabic".

Well, probably both ways. Most written Arabic is Standard Arabic,
I guess, so even if you assume "ar" just means "whatever Arabic
it means", most stuff tagged "ar" will be Standard Arabic.
The question is whether we can make that assumption stronger,
or whether we can live with such an uncertainty.

I think to some extent, we have always done this. As an example,
"ja" means Japanese, but it also, by suppress-script, means Japanese
written in Kanji-Kana-mixture, and it also, by "tag wisely", means
Japanese as used in Japan. In general, "tag wisely" seems to include
"tag special cases very precisely; general cases can be tagged
more briefly".

So in practice, using "ar" rather than "ar-arb" for Standard Arabic
seems to be extremely feasible. The question may be how we can
say that in our draft without contradicting ourselves, and/or
how we can help practice move that way even if we don't say so
in the spec.

>> 3. People can't get along without this.
>>    1. Anyone who has to deal with language issues on all but a trivial
>>    level must already have a mechanism to deal with sh, sr, hr; with no, nb,
>>    nn. Those are macrolanguages and encompassed languages. They exist right now
>>    WITHOUT an extlang mechanism, and people deal with them. The proposed
>>    mechanism won't handle these, and anyone who can handle these doesn't need
>>    extlang.
>
>I think this argument is flawed in that it neglects the cost of supporting
>such constellations of languages. There are already some messes that we're
>stuck with, and that have to be handled in an ad hoc manner. This doesn't
>justify requiring ad hoc handling for every other such constellation of
>languages.

Thinking about what problems Mark actually has, it may be that
what he is saying is: I know I have to deal with some things
as special cases, but I'd like to limit these to single-tag
cases and not have to combine fallback code and special-case
code.

Mark, if this is what you are after, please confirm, and maybe
give some details. Others, if that doesn't look like it would
work, please give some counterexamples.

Regards,    Martin.

>>    2. With the Macrolanguage field, there is sufficient information
>>    for *anyone* who wants to implement extlang-like fallback (including for
>>    sh or no), *without* encumbering the IDs with superfluous information.
>
>This would be a compelling argument, if fallback were the sole reason for
>extlang.
>However, fallback is not the sole motivation for extlangs; they are also
>of use to taggers with incomplete knowledge of the languages used in the
>materials they are tagging. The library staff in my home town would be
>doing well if they correctly recognized material as "zh" or "ar". It would
>be quite unrealistic to expect them to be any more precise.
>
>Of course we'd all like everyone who has to tag material to have perfect
>knowledge of the languages involved so that tags with sufficient precision
>and accuracy would be used. But we also know that in reality people work
>with incomplete knowledge. Consequently, I think we should allow people
>who by necessity tag with low precision to nonetheless do so accurately.
>
>> > > 2. it is sufficiently better to warrant making the language tags more
>> > > complicated by the addition of this mechanism.
>> >
>> > Language tags become more complicated *if* it is desired to make them
>> > so. Those who find "zh" sufficient may continue to use it while still
>> > interoperating with "zh-cmn", "zh-yue", and so on. Existing deployed
>> > matchers will continue to work, as will existing deployed software
>> > that understands specific tags; they will not need to become more
>> > complicated to understand the out-of-band relationship between "zh",
>> > "cmn", "yue", etc.
>>
>>
>> This is untrue. As soon as we implemented extlang in prototype, we ran into
>> the problems listed above. It *didn't* work out of the box.
>
>I'm missing something. Precisely what scenario was it that was expected to
>work that did not?
>
>Randy
>
>
>
>_______________________________________________
>Ltru mailing list
>Ltru@ietf.org
>https://www1.ietf.org/mailman/listinfo/ltru


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp
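
Note on the truncation fallback discussed in the message above: the following is a minimal Python sketch, not code from any participant in the thread and not a complete RFC 4647 lookup implementation. The function names truncation_chain and lookup are hypothetical, and the sketch deliberately ignores the special handling of single-character (extension and private-use) subtags. It only illustrates how trimming subtags from the right loses information, as in the zh-cmn-Hant-SG and ar-arb-SA examples quoted above.

```python
def truncation_chain(tag):
    """Return the tag followed by each progressively shorter prefix.

    Simplified right-to-left truncation; singleton subtags are not
    treated specially here (an assumption made to keep the sketch short).
    """
    subtags = tag.split("-")
    chain = []
    while subtags:
        chain.append("-".join(subtags))
        subtags = subtags[:-1]  # drop the last subtag and try again
    return chain


def lookup(tag, available):
    """Return the first available tag found along the truncation chain.

    Comparison is case-insensitive; returns None when nothing matches.
    """
    by_lower = {t.lower(): t for t in available}
    for candidate in truncation_chain(tag):
        match = by_lower.get(candidate.lower())
        if match is not None:
            return match
    return None


print(truncation_chain("zh-cmn-Hant-SG"))
print(lookup("ar-arb-SA", ["ar", "ar-SA"]))
```

With only "ar" and "ar-SA" available, the request ar-arb-SA resolves to "ar": the chain tries ar-arb-SA, then ar-arb, then ar, and never considers ar-SA. Any handling of such a case would have to live in separate special-case logic, which is the combination of fallback code and special-case code raised in the reply above.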