Re: [Ltru] my technical position on extlang
"Mark Davis" <mark.davis@icu-project.org> Fri, 23 May 2008 21:06 UTC
Message-ID: <30b660a20805231405q56b156c4vbb3b6abda4af3893@mail.gmail.com>
Date: Fri, 23 May 2008 14:05:41 -0700
From: Mark Davis <mark.davis@icu-project.org>
To: John Cowan <cowan@ccil.org>
In-Reply-To: <20080523160905.GD21554@mercury.ccil.org>
References: <30b660a20805181149u2e1e3fb9y1a3b5b751c3e6998@mail.gmail.com> <20080523044305.GB7960@mercury.ccil.org> <30b660a20805230851r519f5d14wd93a92494d1db1c9@mail.gmail.com> <20080523160905.GD21554@mercury.ccil.org>
Cc: LTRU Working Group <ltru@ietf.org>
Subject: Re: [Ltru] my technical position on extlang
A general comment, repeated below. People keep confusing two orthogonal concepts, and I think this is at the root of the extlang issue. These are:

1. "get me languages that are mutually intelligible with X" (maybe to degree Y), and
2. "get me the languages that have the same macrolanguage as X".

Number 1 is very interesting, and would be very useful; but it is not at all the same as #2. If macrolanguage were defined as #1, I would probably be all in favor of baking it into extlangs. But it is not at all the same, as many, many examples illustrate. Moreover, "mutual intelligibility" differs depending on whether the content is written or spoken; forcing it to be baked into the syntax does not allow for that difference.

Also, as I read your response, I think at least part of our apparent difference comes from using different terminology. I was using "content negotiation" in the lookup sense, which is what is typically (though not always) done with Accept-Language. That is, the client supplies a list of languages, perhaps with q values, and expects to get one thing back: for example, a web page in one of the requested languages, though it could be any sort of resource. In such interactions you do want lookup on the individual items; "ar" is not treated like "ar(-.*)".

Filtering is very different. An example would be: give me a list of all the documents in my index that are in "fr, ar, sku" and have X in field 5. In that case I do want to treat "ar" as if it were "ar(-.*)". Accept-Language could also be used for this, but with a different meaning.

Bearing that in mind, let me see if I can make some useful responses.

On Fri, May 23, 2008 at 9:09 AM, John Cowan <cowan@ccil.org> wrote:
> Mark Davis scripsit:
>
> [points 1, 2, 3a snipped]
>
> These all amount to "Some implementations are buggy and don't follow
> RFC 2616."  Such bugs that could be fixed at any time, in which
> case the servers would come into compliance.  Historical practice in
> tagging documents is important, buggy protocol implementations are not.
> (A variant of the ISO C motto about "code matters, implementations
> don't".)

I was really referring to the 'lookup' content negotiation side of things, which may or may not be relevant to what you are discussing. And I think reality is not completely irrelevant to our discussion. ;-)

> > [3b] - Interpreting 'ar' as Standard Arabic is permissible now
> > and after,
>
> Certainly.
>
> > With the above list, the practical impact will be that it will stop at
> > 'ar' anyway and return Standard Arabic.
>
> As I say, that's a clear violation of the RFC.

No, not if we are talking about content negotiation (lookup) rather than filtering. (This may be due to our differences in terminology.)

> > The odds of an implementation supporting "pga" but not ar/arb are
> > pretty darn'd low.
>
> Unless it just happens to be a site with lots of audio from around
> the arabophone world.  *Across all sites* all non-standard Arabics are
> rare, but that doesn't apply to *specific* sites that really need to do
> language negotiation.

Again, if we are talking about content negotiation, where we return a 'best fit', it is going to be a rare case where 'pga' is supported but 'ar' is not. So if one stops at 'ar', "pga" would never be seen, whether it were expressed as "ar-pga" or not. If we are talking about real filtering (all documents that match the list), see below.

> > Even if the server doesn't support extlang matching, you can get exactly
> > what you said you want -- reliably -- with an explicit list:
> >
> > arz, arb, ar, arq, aao, bbz, abv, acy, adf, avl, afb, ayh, acw, ayl, acm,
> > ary, ars, apc, ayp, acx, aec, ayn, ssh, ajp, apd, pga, acq, abh, aeb, auz
>
> Which is just what I said.  However, that isn't very stable with time
> as new Arabic varieties get encoded, not to mention extremely obnoxious
> to the user without special support in the client.

But again, this is an extremely unusual query, since these languages are not mutually comprehensible. And the very fact that you excluded "shu" in your example means that you don't want everything that could be encompassed by "ar". If a new one shows up in the future, maybe you can understand it, maybe you can't and would want it excluded like "shu". And what happens with other macrolanguages: will a speaker of Gan be able to understand Hakka? Better to just explicitly list the ones that are wanted.

> > And this will work even if filtering isn't supported according to the spec;
> > and note that we have no guarantee that the server will filter according to
> > the spec -- based on the experience with Accept-Language, it's actually
> > unlikely.
>
> Most authors don't tag according to the spec either, so why have language
> tags at all?  Search engines can and should ignore them; that doesn't
> mean they are worthless for other purposes.  We design around the central
> point that people tag wisely and programs behave properly, which is not
> necessarily the best *implementation* strategy for special cases like
> search engines.

That isn't my point. My point is that the explicit list:

- currently works;
- handles everything that extlang could, with finer control;
- is more powerful, since you can express things that extlang simply can't (e.g. "ro, mo, no, nn, nb"), and provide exactly the list you want.

> > And really, what are the odds that someone wants that exact list
> > above? And MUST have it supported with a short enumeration? And does
> > not care if additional encompassed languages are added? According
> > to information from Peter, the microlanguages are all independent
> > languages, meaning that they are not mutually comprehensible.
>
> The Arabic languages don't have full mutual intelligibility, but that's not
> the same as saying intelligibility is zero.  I specifically picked Shuwa
> (Chadian/Nigerian) to exclude because it's very remote from most others.

People keep confusing two orthogonal concepts, and I think this is at the root of the extlang issue.

1. "get me languages that are mutually intelligible with X" (maybe to degree Y), and
2. "get me the languages that have the same macrolanguage as X".

Number 1 is very interesting, and would be very useful; but it is not at all the same as #2. If macrolanguage were defined as #1, I would probably be all in favor of baking it into extlangs. But it is not at all the same.

> > - Languages that are *very* closely related do not have a macrolanguage
> > ("ro" and "mo").
>
> I know.  That doesn't mean not to take advantage of the ISO 639-3
> information we do have.

See my comment on mutual intelligibility.

> > I don't see extlang as any really practical benefit in filtering. It isn't
> > useful in a great many other cases of related languages; if I want to use
> > with nn, nb, no it doesn't help, nor with ro/mo, various varieties of
> > German, and so on. Moreover, if a customer searches for 'ar' and get a bunch
> > of documents in shu, they are as or more likely to be unhappy as if they
> > search for 'de' documents and get back a bunch of 'gsw'.
>
> As I keep saying, *search* isn't the issue here.  It's *negotiation
> by filtering*.

See my discussion above.

> > Putting some ISO language tags into the extlang position just because they
> > have a macrolanguage is an unnecessary complication for implementations and
> > represents a substantive, controversial change to RFC 4646.
>
> *Now* it does, yes.  All that shows is that I should have thought of
> this argument before giving in earlier.

I didn't quite understand this.

> --
> Values of beeta will give rise to dom!          John Cowan
> (5th/6th edition 'mv' said this if you tried    http://www.ccil.org/~cowan
> to rename '.' or '..' entries; see              cowan@ccil.org
> http://cm.bell-labs.com/cm/cs/who/dmr/odd.html)

--
Mark
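To make the lookup vs. filtering distinction concrete, here is a rough Python sketch of the two matching schemes defined in RFC 4647. It is an illustration under stated assumptions, not code from the RFC or from any library: the helper names `matches`, `basic_filter` and `lookup` are hypothetical, the tag lists are made-up examples, "ar-pga" is used only to show the extlang form under discussion, and wildcard ranges are simply skipped in lookup.

```python
# Rough sketch of RFC 4647 "basic filtering" vs "lookup".
# Assumptions: helper names and sample tags are hypothetical;
# comparison is case-insensitive; lookup simply skips "*" ranges.
from typing import List, Optional


def matches(language_range: str, tag: str) -> bool:
    """Basic filtering: the range equals the tag, or is a prefix of it
    ending at a subtag boundary. "*" matches every tag."""
    rng, t = language_range.lower(), tag.lower()
    return rng == "*" or t == rng or t.startswith(rng + "-")


def basic_filter(priority_list: List[str], tags: List[str]) -> List[str]:
    """Filtering: return *all* tags matched by any range (in tag order).
    Here a range like "ar" behaves like "ar(-.*)"."""
    return [t for t in tags if any(matches(r, t) for r in priority_list)]


def lookup(priority_list: List[str], available: List[str],
           default: Optional[str] = None) -> Optional[str]:
    """Lookup: return *one* best tag. Each range is truncated from the end
    until it equals an available tag; "ar" never expands to "ar-pga"."""
    avail = {t.lower(): t for t in available}
    for rng in priority_list:
        if rng == "*":
            continue  # simplification: wildcard ranges are ignored here
        subtags = rng.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in avail:
                return avail[candidate]
            subtags.pop()
            # Don't end on a singleton such as "x" (private-use introducer).
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default


if __name__ == "__main__":
    docs = ["ar", "ar-pga", "pga", "shu", "de", "gsw"]
    print(basic_filter(["ar"], docs))         # ['ar', 'ar-pga']
    print(basic_filter(["ar", "pga"], docs))  # ['ar', 'ar-pga', 'pga']
    # A site offering "ar" and "pga": lookup stops at "ar", so "pga" is never reached.
    print(lookup(["arz", "arb", "ar", "pga"], ["ar", "pga", "fr"]))  # ar
    print(lookup(["ar-pga"], ["ar", "fr"]))   # ar (range truncated to "ar")
    print(lookup(["ar"], ["ar-pga"]))         # None: lookup does not expand "ar"
```

In this sketch the extlang form only changes the filtering result ("ar" picks up "ar-pga" but not bare "pga"), while an explicit list such as "ar, pga" selects exactly what is named, and lookup returns a single best fit either way.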