Re: [Ltru] extlang

Martin Duerst <duerst@it.aoyama.ac.jp> Wed, 29 August 2007 03:59 UTC

Date: Wed, 29 Aug 2007 12:11:43 +0900
To: Randy Presuhn <randy_presuhn@mindspring.com>, LTRU Working Group <ltru@ietf.org>
From: Martin Duerst <duerst@it.aoyama.ac.jp>
Subject: Re: [Ltru] extlang

As a technical contributor, I quite agree with Randy.
See below for more details.

At 07:33 07/08/29, Randy Presuhn wrote:
>Hi -
>
>As a technical contributor...
>
>> From: "Mark Davis" <mark.davis@icu-project.org>
>> To: "John Cowan" <cowan@ccil.org>
>> Cc: "LTRU Working Group" <ltru@ietf.org>
>> Sent: Tuesday, August 28, 2007 2:59 PM
>> Subject: [Ltru] extlang
>...
>> You are not revealing some important hidden assumptions in your statements.
>> 
>>    1. The macrolanguage is always a better fallback for every encompassed
>>    language than other alternatives. Out of the many encompassed languages, you
>>    are implying that a speaker of every encompassed language will be able to
>>    understand the macrolanguage, or at least understand it better than the
>>    alternatives.
>
>I don't see how the use of extlang would require this as an assumption.
>Making such an assumption seems a bit like assuming that all languages whose
>tags begin with "a" are somehow related.
>
>>    2. People don't lose anything by having the fallback. I dispute this
>>    as well. As previously noted:
>>       1. Truncation fallback from zh-cmn-Hant-SG to "zh" loses the Hant and
>>       the SG; falling back from ar-arb-SA to "ar" loses the "SA".
>
>It is the nature of truncation fallback to lose information.  No matter
>what order the subtags are trimmed off, someone will be able to argue
>that for some particular case, a different order might have been better.
>This isn't an argument against extlang; it's an argument against unrealistically
>high expectations for truncation fallback.
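A minimal sketch of the truncation fallback being discussed, for illustration only. It assumes RFC 4647 Lookup-style trimming (drop the last subtag, and also drop a single-character subtag such as "x" if it is left at the end) and uses the tag itself as the language range; the names truncation_chain and lookup are hypothetical. It shows the loss Mark describes: the chain for zh-cmn-Hant-SG never passes through zh-Hant, so an available zh-Hant resource is skipped in favor of plain zh.

```python
from __future__ import annotations


def truncation_chain(tag: str) -> list[str]:
    """Fallback chain from repeatedly trimming the last subtag (RFC 4647 Lookup style)."""
    subtags = tag.split("-")
    chain = []
    while subtags:
        chain.append("-".join(subtags))
        subtags.pop()
        # A singleton (e.g. "x" for private use) left at the end is trimmed too.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return chain


def lookup(tag: str, available: list[str], default: str | None = None) -> str | None:
    """Return the first available tag (case-insensitive) along the truncation chain."""
    by_lower = {a.lower(): a for a in available}
    for candidate in truncation_chain(tag):
        hit = by_lower.get(candidate.lower())
        if hit is not None:
            return hit
    return default


if __name__ == "__main__":
    print(truncation_chain("zh-cmn-Hant-SG"))
    # ['zh-cmn-Hant-SG', 'zh-cmn-Hant', 'zh-cmn', 'zh']
    print(lookup("zh-cmn-Hant-SG", ["zh", "zh-Hant"]))
    # zh  (zh-Hant is available but never tried)
```

As Randy says, any trimming order loses something; this order simply makes the loss of the script subtag visible.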
>
>>       2. It introduces ambiguous language names. Right now, in the
>>       overwhelming majority of practice, standard Arabic is "ar"; after the
>>       change, standard Arabic is "ar-arb".
>
>This would not be desirable.  However, I wonder whether the semantic associated
>by most taggers and users of tags with "ar" is "standard Arabic" or merely 
>"Arabic".

Well, probably both. Most written Arabic is Standard Arabic,
I guess, so even if you assume "ar" just means "whatever kind of
Arabic it is", most material tagged "ar" will be Standard Arabic.

The question is whether we can make that assumption stronger,
or whether we can live with such an uncertainty.

I think to some extent, we have always done this.
As an example, "ja" means Japanese, but it also, by suppress-script,
means Japanese written in a Kanji-Kana mixture, and it also, by
"tag wisely", means Japanese as used in Japan.

In general, "tag wisely" seems to include "tag special cases
very precisely; general cases can be tagged more briefly".

So in practice, using ar rather than ar-arb for Standard Arabic
seems entirely feasible. The question may be how we can say that
in our draft without contradicting ourselves, and/or how we can
help practice move that way even if we don't say so in the spec.
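The "implied defaults" reading above can be sketched as a small table of subtags a reader assumes when a tag omits them. This is a hypothetical illustration, not anything in the draft: the table IMPLIED and the functions parse and implied_reading are invented for this example, "ja" -> Jpan stands for the suppress-script reading (the script code is assumed here), "ja" -> JP stands for the "tag wisely" reading, and "ar" -> arb models the practice of using "ar" for Standard Arabic. Under such a reading, "ar" and "ar-arb" mean the same thing, while explicit subtags always win.

```python
from __future__ import annotations

# Hypothetical defaults assumed when a tag omits a subtag (illustrative only).
IMPLIED = {
    "ja": {"script": "Jpan", "region": "JP"},  # suppress-script + "tag wisely"
    "ar": {"extlang": "arb"},  # assumption: bare "ar" read as Standard Arabic
}


def parse(tag: str) -> dict[str, str]:
    """Split language[-extlang][-script][-region]; any later subtags are ignored."""
    parts = tag.split("-")
    fields = {"language": parts[0].lower()}
    i = 1
    if i < len(parts) and len(parts[i]) == 3 and parts[i].isalpha():
        fields["extlang"] = parts[i].lower()
        i += 1
    if i < len(parts) and len(parts[i]) == 4 and parts[i].isalpha():
        fields["script"] = parts[i].title()
        i += 1
    if i < len(parts) and (
        (len(parts[i]) == 2 and parts[i].isalpha())
        or (len(parts[i]) == 3 and parts[i].isdigit())
    ):
        fields["region"] = parts[i].upper()
    return fields


def implied_reading(tag: str) -> dict[str, str]:
    """Explicit subtags override the defaults; omitted ones come from IMPLIED."""
    fields = parse(tag)
    return {**IMPLIED.get(fields["language"], {}), **fields}
```

With this reading, implied_reading("ar") equals implied_reading("ar-arb"), and implied_reading("ja-Latn") keeps the explicit Latn script while still assuming the JP region.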

>>    3. People can't get along without this.
>>       1. Anyone who has to deal with language issues on all but a trivial
>>       level must already have a mechanism to deal with sh, sr, hr; with no, nb,
>>       nn. Those are macrolanguages and encompassed languages. They exist right
>>       now WITHOUT an extlang mechanism, and people deal with them. The
>>       proposed mechanism won't handle these, and anyone who can handle these
>>       doesn't need extlang.
>
>I think this argument is flawed in that it neglects the cost of supporting
>such constellations of languages.  There are already some messes that we're
>stuck with, and that have to be handled in an ad hoc manner.  This doesn't
>justify requiring ad hoc handling for every other such constellation of
>languages.

Thinking about what problems Mark actually has, it may be that
what he is saying is: I know I have to deal with some things as
special cases, but I'd like to limit these to single-tag cases
and not to have to combine fallback code and special-case code.

Mark, if this is what you are after, please confirm, and maybe
give some details. Others, if that doesn't look like it would
work, please give some counterexamples.
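One hypothetical reading of "limit special cases to single-tag cases" is sketched below: special cases fire only on an exact whole-tag match, and generic truncation runs as a separate stage with no special-case logic mixed in. The table SPECIAL and its entries are purely illustrative (not a recommendation for sh/sr/hr or no/nb/nn), and the function names are invented for this sketch.

```python
from __future__ import annotations

# Hypothetical whole-tag special cases; entries are illustrative only.
SPECIAL = {
    "nb": ["no"],
    "nn": ["no"],
    "no": ["nb"],
    "hr": ["sh"],
    "sr": ["sh"],
}


def truncations(tag: str) -> list[str]:
    """Plain truncation: drop one subtag at a time from the end."""
    parts = tag.split("-")
    return ["-".join(parts[:n]) for n in range(len(parts), 0, -1)]


def candidates(tag: str) -> list[str]:
    # Stage 1: a special case applies only to an exact, whole-tag match.
    result = [tag] + SPECIAL.get(tag.lower(), [])
    # Stage 2: generic truncation, kept separate from the special cases.
    for candidate in truncations(tag)[1:]:
        if candidate not in result:
            result.append(candidate)
    return result
```

Here candidates("nb") is ["nb", "no"], but candidates("nb-NO") is ["nb-NO", "nb"]: the special case does not fire after truncation unless the table also lists "nb-NO". Whether that limitation is acceptable is the kind of question a counterexample would settle.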

Regards,    Martin.

>>       2. With the Macrolanguage field, there is sufficient information
>>       for *anyone* who wants to implement extlang-like fallback (including
>>       for sh or no), *without* encumbering the IDs with superfluous information.
>
>This would be a compelling argument if fallback were the sole reason for extlang.
>However, fallback is not the sole motivation for extlangs; they are also
>of use to taggers with incomplete knowledge of the languages used in the
>materials they are tagging.  The library staff in my home town would be
>doing well if they correctly recognized material as "zh" or "ar".  It would
>be quite unrealistic to expect them to be any more precise.
>
>Of course we'd all like everyone who has to tag material to have perfect
>knowledge of the languages involved so that tags with sufficient precision
>and accuracy would be used.  But we also know that in reality people work
>with incomplete knowledge.  Consequently, I think we should allow people
>who by necessity tag with low precision to nonetheless do so accurately.
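Mark's claim that a Macrolanguage field alone is enough for extlang-like fallback can be sketched as follows. The MACROLANGUAGE mapping is a hypothetical excerpt, not registry data, and trying the tag's own truncations before the macrolanguage ones is just one possible ordering. Unlike the zh-cmn-Hant-SG truncation quoted earlier, this chain passes through zh-Hant, and material a cataloguer tagged only as "zh", as in Randy's library example, is still reached at the end of the chain.

```python
from __future__ import annotations

# Hypothetical excerpt of a Macrolanguage field: encompassed -> macrolanguage.
MACROLANGUAGE = {"cmn": "zh", "yue": "zh", "arb": "ar", "nb": "no", "nn": "no"}


def truncations(tag: str) -> list[str]:
    """Plain truncation: drop one subtag at a time from the end."""
    parts = tag.split("-")
    return ["-".join(parts[:n]) for n in range(len(parts), 0, -1)]


def macrolanguage_fallback(tag: str) -> list[str]:
    """Try the tag's own truncations, then the same subtags under its macrolanguage."""
    parts = tag.split("-")
    chain = truncations(tag)
    macro = MACROLANGUAGE.get(parts[0].lower())
    if macro is not None:
        chain += truncations("-".join([macro] + parts[1:]))
    return chain
```

For example, macrolanguage_fallback("cmn-Hant-SG") yields cmn-Hant-SG, cmn-Hant, cmn, zh-Hant-SG, zh-Hant, zh, without any extlang subtag appearing in the tags themselves.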
>
>> > >    2. it is sufficiently better to warrant making the language tags more
>> > >    complicated by the addition of this mechanism.
>> >
>> > Language tags become more complicated *if* it is desired to make them
>> > so.  Those who find "zh" sufficient may continue to use it while still
>> > interoperating with "zh-cmn", "zh-yue", and so on.  Existing deployed
>> > matchers will continue to work, as will existing deployed software
>> > that understands specific tags; they will not need to become more
>> > complicated to understand the out-of-band relationship between "zh",
>> > "cmn", "yue", etc.
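The claim that existing matchers keep working can be checked against RFC 4647 basic filtering, sketched below (the function name is hypothetical). A range matches a tag if it equals the tag, or is a prefix of it followed by "-", ignoring case, and "*" matches everything. So a "zh" range does match "zh-cmn" and "zh-yue-HK"; the sketch also shows one place where behavior may surprise someone unaware of extlang: a "cmn" range does not match "zh-cmn". Whether that is the failure Mark's prototype ran into is not established here.

```python
def basic_filter_matches(language_range: str, tag: str) -> bool:
    """RFC 4647 basic filtering: exact match or prefix match ending at a '-'."""
    r, t = language_range.lower(), tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")
```

Because the comparison is purely textual, a matcher like this needs no knowledge of the relationship between "zh" and "cmn"; it also cannot discover that relationship when the range and the tag use different primary subtags.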
>> 
>> 
>> This is untrue. As soon as we implemented extlang in prototype, we ran into
>> the problems listed above. It *didn't* work out of the box.
>
>I'm missing something.  Precisely what scenario was it that was expected to
>work that did not?
>
>Randy
>
>
>
>_______________________________________________
>Ltru mailing list
>Ltru@ietf.org
>https://www1.ietf.org/mailman/listinfo/ltru


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp     
