Re: [Ltru] Macrolanguage usage

"Mark Davis" <mark.davis@icu-project.org> Fri, 16 May 2008 18:08 UTC

Date: Fri, 16 May 2008 11:08:27 -0700
From: Mark Davis <mark.davis@icu-project.org>
To: Doug Ewell <doug@ewellic.org>
Cc: LTRU Working Group <ltru@ietf.org>
Subject: Re: [Ltru] Macrolanguage usage

The text does NOT say that. And I think there is some confusion between what
is *permissible* and what is *a good idea*. Let me illustrate the difference
between the two.

   - It is *permissible* (morally and legally) for me to walk through back
   alleys in the San Francisco Tenderloin district at 3 in the morning,
   blindfolded, with hundred dollar bills pinned all over my body.
   - It is, however, not a *good idea*.


So look again at language tags.

*Non-Predominant Encompassed Languages*

   - It is *permissible* to tag Cantonese text with "und" or "zh" or "yue".
      - It is a *good idea* to tag/lookup with "yue".
   - It is *permissible* to tag Tajiki Arabic text with "und" or "ar" or
   "abh".
      - It is a *good idea* to tag/lookup with "abh".

This point should not be at issue, since we say in the spec *and have said
for some time* that people should tag as specifically as needed. Tagging such
text with just "zh" or "ar" would not be sufficient to distinguish it from the
vast amount of material in the predominant encompassed language.
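
The distinction between permissible and recommended tags can be sketched in a few lines of Python. This is only an illustration: the `MACROLANGUAGE` table and both function names are hypothetical, and the table lists only the codes mentioned in this message rather than the full ISO 639-3 mapping.

```python
# Hypothetical table: encompassed language -> its macrolanguage.
# Only the codes named in this message are listed.
MACROLANGUAGE = {
    "yue": "zh",  # Cantonese
    "cmn": "zh",  # Mandarin
    "abh": "ar",  # Tajiki Arabic
    "arb": "ar",  # Standard Arabic
}


def permissible_tags(language):
    """Return the permissible primary subtags, least to most specific."""
    tags = ["und"]
    macro = MACROLANGUAGE.get(language)
    if macro is not None:
        tags.append(macro)
    tags.append(language)
    return tags


def recommended_tag(language):
    """Return the most specific permissible tag ("tag as specifically as needed")."""
    return permissible_tags(language)[-1]


print(permissible_tags("yue"))
print(recommended_tag("abh"))
```

All three tags are accepted for Cantonese, but only the last one separates it from the bulk of "zh" material.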


*Predominant Encompassed Languages*

The only thing that the text is saying is that when there is a predominant
language, for backwards compatibility -- AND with some careful caveats --
the recommendation above is broadened. This is *not* with respect to
permissibility, but with respect to recommended practice.

   - It is *permissible* to tag Mandarin text with "und" or "zh" or "cmn".
      - It is a *good idea* to tag/lookup with "cmn"*, except that for
      backwards compatibility, "zh" may be better for many implementations.*
   - It is *permissible* to tag Standard Arabic text with "und" or "ar" or
   "arb".
      - It is a *good idea* to tag/lookup with "arb"*, except that for
      backwards compatibility, "ar" may be better for many implementations.*

That lets people like Google, which has a very serious and important issue
with backwards compatibility, support the vast amount of Mandarin/Arabic/...
that is tagged with "zh", and continue to tag Mandarin content with "zh". It
also lets people like Shawn's group at Microsoft, who have no backwards
compatibility issues with "cmn" and "zh", shift over immediately.
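
As a rough sketch of how an implementation might encode this choice, the following Python picks between the macrolanguage tag and the specific tag. The tables, the `choose_tag` name, and the `prefer_legacy` flag are all assumptions made for illustration; nothing here is taken from the draft text.

```python
# Hypothetical tables, limited to the codes named in this message.
MACRO_OF = {"cmn": "zh", "yue": "zh", "arb": "ar", "abh": "ar"}
PREDOMINANT = {"zh": "cmn", "ar": "arb"}  # macrolanguage -> predominant language


def choose_tag(language, prefer_legacy):
    """Pick the tag to emit for content in `language`.

    With prefer_legacy=True, a predominant encompassed language keeps its
    macrolanguage tag for backwards compatibility. Every other language
    always gets its own, more specific tag.
    """
    macro = MACRO_OF.get(language)
    if prefer_legacy and macro is not None and PREDOMINANT.get(macro) == language:
        return macro
    return language


print(choose_tag("cmn", prefer_legacy=True))   # implementation with legacy "zh" data
print(choose_tag("cmn", prefer_legacy=False))  # implementation free to shift over
print(choose_tag("yue", prefer_legacy=True))   # non-predominant: unaffected
```

The flag only changes the outcome for Mandarin and Standard Arabic, which is the sense in which the broadening is limited to predominant languages.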

It is also orthogonal to the recommendation for non-predominant encompassed
languages, where we would tag with the more specific tags: "yue",
"yue-Hant-HK", "yue-Hant-US", "abh-AF", and so on, once they are available.
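
To make the lookup side concrete, here is a simplified Python sketch of truncation-style lookup in the spirit of RFC 4647. The `lookup` function and its parameters are hypothetical, and the sketch leaves out parts of the full algorithm such as language priority lists and wildcard handling.

```python
def lookup(requested, available, default=None):
    """Find the best available tag by truncating `requested` from the end.

    Comparison is case-insensitive. When truncation leaves a single-character
    subtag at the end, that subtag is dropped as well.
    """
    by_lower = {tag.lower(): tag for tag in available}
    subtags = requested.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in by_lower:
            return by_lower[candidate]
        subtags.pop()
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return default


print(lookup("yue-Hant-HK", ["zh", "yue"]))
print(lookup("yue-Hant-HK", ["yue-Hant", "yue"]))
print(lookup("yue-Hant-US", ["zh"]))
```

Truncating "yue-Hant-US" never produces "zh", so under this kind of lookup a request for Cantonese only finds content that carries a "yue" tag.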

Mark

On Thu, May 15, 2008 at 6:39 PM, Doug Ewell <doug@ewellic.org> wrote:

> Mark Davis <mark dot davis at icu dash project dot org> wrote:
>
> > - where content written in an encompassed language is also
> > understandable in the predominant language (that being a distinct
> > language encompassed by the same macrolanguage), the content could
> > also be tagged with the macrolanguage identifier. Thus if a Cantonese
> > passage is understandable if read as Mandarin, it could also be tagged
> > with "zh", or where a Tajiki Arabic passage is also understandable in
> > Standard Arabic it could be tagged with "ar".
>
> Cantonese "is a" Chinese, it falls under the Chinese macrolanguage
> umbrella, and therefore it could legitimately be tagged with "zh"
> whether or not it is understandable if read as Mandarin.
>
> Unfortunately, this wording takes us back to the mindset that Chinese =
> Mandarin = "zh", and other Chinese languages are not "zh".  I thought we
> had just agreed not to do that.
>
> --
> Doug Ewell  *  Arvada, Colorado, USA  *  RFC 4645  *  UTN #14
> http://www.ewellic.org
> http://www1.ietf.org/html.charters/ltru-charter.html
> http://www.alvestrand.no/mailman/listinfo/ietf-languages  ˆ
>
> _______________________________________________
> Ltru mailing list
> Ltru@ietf.org
> https://www.ietf.org/mailman/listinfo/ltru
>


