Re: [Ltru] Macrolanguage usage

"Mark Davis" <mark.davis@icu-project.org> Thu, 15 May 2008 21:31 UTC

Message-ID: <30b660a20805151431w4a2f47dem32d566b26ee4a4c1@mail.gmail.com>
Date: Thu, 15 May 2008 14:31:28 -0700
From: Mark Davis <mark.davis@icu-project.org>
To: Shawn Steele <Shawn.Steele@microsoft.com>
In-Reply-To: <C9BF0238EED3634BA1866AEF14C7A9E56155D47BB5@NA-EXMSG-C116.redmond.corp.microsoft.com>
MIME-Version: 1.0
References: <30b660a20805150829hda2c1e4p114504a973843543@mail.gmail.com> <C9BF0238EED3634BA1866AEF14C7A9E56155D47AF4@NA-EXMSG-C116.redmond.corp.microsoft.com> <30b660a20805151400g7f84bc7em81304f19c6b969cc@mail.gmail.com> <C9BF0238EED3634BA1866AEF14C7A9E56155D47BB5@NA-EXMSG-C116.redmond.corp.microsoft.com>
X-Google-Sender-Auth: 9c4eec6f4ae6d4c6
Cc: LTRU Working Group <ltru@ietf.org>
Subject: Re: [Ltru] Macrolanguage usage

Ah, now I understand. We decided some time ago to put the discussion of the
effects of macro/micro languages into 4646 rather than 4647, just to avoid
having to reopen 4647.

Mark

On Thu, May 15, 2008 at 2:26 PM, Shawn Steele <Shawn.Steele@microsoft.com>
wrote:

>  It's not clear to me where 4647 says that zh-SG and cmn-SG may be treated
> as equal (or may return different content).  On the contrary, it seems that
> applications needn't have any additional information.
>
>
>
> "Applications, protocols, and specifications are not required to validate
> or understand any of the semantics of the language tags or ranges or of the
> subtags in them"
>
>
>
> The canonicalizing text also discusses various forms, including zh-CN vs
> zh-Hans, but doesn't mention this cmn/zh variant.  It also mentions looking
> in the registry to recognize & canonicalize grandfathered tags, but neither
> zh nor cmn is grandfathered.
>
>
>
> It could be done with lists, but the idea is that a list isn't necessary
> for correct filtering/matching/lookup.  It also mentions the
> remove-from-right behavior, but not its implications WRT zh/cmn.
>
>
>
> I may have overlooked where this is covered in 4647, but it seems to me
> that zh/cmn-type matching isn't covered there.  Obviously it should be
> optional, not a must, but I don't see language permitting this sort of
> matching.
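
A minimal sketch of the remove-from-right behavior in the RFC 4647 Lookup
scheme (section 3.4), to illustrate the point above: with no extra
information, a range of "cmn-SG" never reaches content tagged "zh-SG". The
function name `lookup` and the sample tag list are assumptions made for
illustration, and the sketch ignores wildcard ("*") ranges.

```python
def lookup(language_range, available_tags, default=None):
    """Truncate the range from the right until it equals an available tag."""
    # Language tags are compared case-insensitively.
    available = {tag.lower(): tag for tag in available_tags}
    subtags = language_range.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in available:
            return available[candidate]
        subtags.pop()
        # A trailing single-character subtag (e.g. "x" or an extension
        # singleton) is removed together with the subtag that followed it.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return default


content = ["zh-SG", "zh", "en"]  # hypothetical set of available content tags
print(lookup("zh-SG", content))       # exact match
print(lookup("zh-Hans-SG", content))  # falls back by truncation
print(lookup("cmn-SG", content))      # no match: truncation never yields "zh"
```

Nothing in the truncation step relates "cmn" to "zh"; any such equivalence
would have to come from additional data supplied by the implementation.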
>
>
>
> - Shawn
>
> *From:* mark.edward.davis@gmail.com [mailto:mark.edward.davis@gmail.com] *On
> Behalf Of *Mark Davis
> *Sent:* Thursday, May 15, 2008 2:01 PM
> *To:* Shawn Steele
> *Cc:* LTRU Working Group
> *Subject:* Re: [Ltru] Macrolanguage usage
>
>
>
> Perhaps it does, but I don't see why it would need to. Could you explain a
> bit more?
>
> Mark
>
> On Thu, May 15, 2008 at 1:09 PM, Shawn Steele <Shawn.Steele@microsoft.com>
> wrote:
>
> Sounds reasonable to me, but it seems like RFC4647 would need to be updated
> to handle the 2nd bullet.
>
>
>
> - Shawn
>
>
>
> *From:* ltru-bounces@ietf.org [mailto:ltru-bounces@ietf.org] *On Behalf Of
> *Mark Davis
> *Sent:* Thursday, May 15, 2008 8:29 AM
> *To:* LTRU Working Group
> *Subject:* [Ltru] Macrolanguage usage
>
>
>
> Peter Constable and I were at the Unicode Technical Committee meeting, and
> had a chance to talk about macrolanguages. One of the key points is to
> provide implementers with enough guidance as to what they should do, while
> not precluding reasonable alternatives. Here's what we were thinking about
> (not in formal language, but the points we wanted to make).
>
> ===
>
> Formally, a macrolanguage identifier could be used to tag or look up
> content in any encompassed language; alternatively, specific,
> individual-language identifiers could be used to tag or look up content in
> those languages. In consideration of these alternatives, the following
> provides guidance for how to maintain backwards compatibility while giving
> the ability to clearly tag and look up content.
>
> 1. Implementations generally should tag and/or look up content with the
> specific language where possible (this is the general recommendation with
> language tags). In the case of macrolanguages, this means that Cantonese
> should be tagged and looked up with "yue", Hakka with "hak", Tajiki Arabic
> with "abh", Plains Cree with "crk", and so on.
>
> 2. An exception to this general recommendation may apply in the case of
> macrolanguages with predominant forms, listed in Table 8. For backwards
> compatibility in those cases:
>
>    - an implementation could tag and/or look up content in the predominant
>    language either with the macrolanguage or the encompassed language (e.g.,
>    either "ar" or "arb" for Standard Arabic).
>    - an implementation could make a distinction between these in lookup,
>    or could return the same content. That is, lookup for "zh-SG" and "cmn-SG"
>    may return the same content, or may return different content if the
>    implementation needs to make a distinction for some purpose.
>
>
>    - where content written in an encompassed language is also
>    understandable in the predominant language (that being a distinct language
>    encompassed by the same macrolanguage), the content could also be tagged
>    with the macrolanguage identifier. Thus if a Cantonese passage is
>    understandable if read as Mandarin, it could also be tagged with "zh", or
>    where a Tajiki Arabic passage is also understandable in Standard Arabic it
>    could be tagged with "ar".
>
> 3. Another exception to this general recommendation applies in the case of
> applications that have limitations that exclude the identifiers for
> encompassed, individual languages of a macrolanguage. For example, some
> content cataloguing systems limit language identifiers to those in ISO
> 639-2; as a result, they may support a macrolanguage identifier but not the
> identifiers for the encompassed languages of that macrolanguage.
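
The choice described in point 2 can be sketched as an optional normalization
step applied before storing or looking up content. Only the "zh"/"cmn" and
"ar"/"arb" pairs come from the text above; the names `PREDOMINANT`,
`lookup_key`, `build_store`, and the `merge` flag are hypothetical, and a
complete table would have to follow the draft's Table 8.

```python
# Macrolanguage -> predominant encompassed language. Only these two pairs
# appear in the message; the remaining entries are not guessed at here.
PREDOMINANT = {"zh": "cmn", "ar": "arb"}


def lookup_key(tag, merge):
    """Return the key under which content for a tag is stored or looked up.

    With merge=True, a macrolanguage primary subtag is replaced by its
    predominant encompassed language, so "zh-SG" and "cmn-SG" share a key.
    With merge=False, tags are only lowercased and therefore stay distinct.
    """
    primary, sep, rest = tag.lower().partition("-")
    if merge:
        primary = PREDOMINANT.get(primary, primary)
    return primary + sep + rest


def build_store(items, merge):
    """Index (tag, content) pairs using the chosen key policy."""
    return {lookup_key(tag, merge): text for tag, text in items}


items = [("zh-SG", "content tagged zh-SG")]  # hypothetical content
merged = build_store(items, merge=True)
distinct = build_store(items, merge=False)

print(merged.get(lookup_key("cmn-SG", merge=True)))     # same content as zh-SG
print(distinct.get(lookup_key("cmn-SG", merge=False)))  # None: kept distinct
```

Other encompassed languages such as "yue" are left untouched by the table, in
line with point 1: they keep their own specific identifiers.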
>
> Mark and Peter
>
> --
> Mark
>
>
>
>
> --
> Mark
>



-- 
Mark