[Ltru] Review of 4646bis-10, macrolanguages in section 4.1

John Cowan <cowan@ccil.org> Fri, 07 December 2007 21:56 UTC

Because of the thicket of rewordings in this part, I'm just presenting
my suggested revised text here.  It's very important to make sure
that we don't talk about "dialects" or "sub-languages" here.  Also,
I've used "Macrolanguage" for the header only, but "macrolanguage"
for the languages.

The affected text begins "Languages with a Macrolanguage field" and ends
"did not specify zh-Hans-CN in their request.)".

        Some of the languages in the registry are labeled
        "macrolanguages" by ISO 639-3, which defines the term as
        "clusters of closely-related language varieties that [...] can
        be considered distinct individual languages, yet in certain
        usage contexts a single language identity for all is needed".
        These correspond to codes registered in ISO 639-2 as single
        languages that were found to correspond to more than one language
        in ISO 639-3.  The languages encompassed by a macrolanguage
        contain a Macrolanguage header in the registry; the macrolanguages
        themselves are not specially marked.

        It is always permitted, and sometimes useful, to tag an
        encompassed language using the subtag for its macrolanguage.
        However, the Macrolanguage field doesn't define what the
        relationship is between the encompassed language and its
        macrolanguage, nor does it define how languages encompassed by the
        same macrolanguage are related to each other.  In some cases,
        one of the encompassed languages serves as a standard
        form for the entire macrolanguage and is frequently identified
        with it; in other cases there is no dominant language, and the
        macrolanguage simply serves as a cover term for the entire group.

        Applications MAY use macrolanguage information to improve matching
        or language negotiation.  For example, the information that 'sr'
        (Serbian) and 'hr' (Croatian) share a macrolanguage expresses
        a closer relation between those languages than between, say,
        'sr' (Serbian) and 'mk' (Macedonian).  It is valid to use the
        subtag of the encompassed language or of the macrolanguage to
        form language tags.  However, many matching applications will
        not be aware of the relationship between the languages.  Care in
        selecting which subtags are used is crucial to interoperability.
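
As a rough illustration of the matching hint described above, here is a
minimal Python sketch.  The MACROLANGUAGE table and both function names
are hypothetical; the table stands in for the Macrolanguage fields of the
registry, and the value 'sh' (Serbo-Croatian, per ISO 639-3) is my
assumption, since the text only says the two languages share a
macrolanguage.

```python
# Hypothetical excerpt: encompassed language subtag -> macrolanguage subtag.
# 'sh' is assumed from ISO 639-3; it is not named in the proposed text.
MACROLANGUAGE = {"sr": "sh", "hr": "sh"}


def primary(tag):
    """Return the lowercased primary language subtag of a language tag."""
    return tag.split("-")[0].lower()


def share_macrolanguage(tag_a, tag_b):
    """True if both tags' primary languages list the same macrolanguage."""
    macro_a = MACROLANGUAGE.get(primary(tag_a))
    macro_b = MACROLANGUAGE.get(primary(tag_b))
    return macro_a is not None and macro_a == macro_b
```

A matcher might treat share_macrolanguage("sr", "hr") being true as a
signal to rank Croatian content above unrelated languages for a Serbian
request; how much weight to give that signal is left to the application.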

        In general, use the most specific tag.  However, where the
        macrolanguage tag has been historically used to denote a dominant
        encompassed language, it SHOULD be used in place of the subtag
        specific to that encompassed language unless it is necessary
        to clearly distinguish the macrolanguage as a whole from the
        dominant language variety.

        In particular, the Chinese family of languages calls for special
        consideration.  Because the written form is very similar for most
        languages having 'zh' as a macrolanguage (and because historically
        subtags for the various encompassed languages were not available),
        languages such as 'yue' (Cantonese) have historically used
        either 'zh' or a tag (now grandfathered) beginning with 'zh'.
        This means that macrolanguage information can be usefully
        applied when searching for content or when providing fallbacks
        in language negotiation.  For example, the information that 'yue'
        has a macrolanguage of 'zh' could be used in the Lookup algorithm
        to fall back from a request for "yue-Hans-CN" to "zh-Hans-CN"
        without losing the script and region information (even though
        the user did not specify "zh-Hans-CN" in their request).
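
To make the last example concrete, here is a minimal Python sketch of
a Lookup-style fallback that consults macrolanguage information.  It is
only one possible reading, not text for the draft: the MACROLANGUAGE
table and the function names are hypothetical, and trying the
macrolanguage only after every truncation of the original request has
failed is my own assumption about ordering.

```python
# Hypothetical excerpt: encompassed language subtag -> macrolanguage subtag.
MACROLANGUAGE = {"yue": "zh"}


def truncations(tag):
    """Yield the tag, then progressively shorter prefixes, Lookup-style."""
    subtags = tag.split("-")
    while subtags:
        yield "-".join(subtags)
        subtags.pop()
        # Do not leave a single-character subtag dangling at the end.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()


def lookup(requested, available, default=None):
    """Return the best available tag for the request, or the default."""
    by_lower = {tag.lower(): tag for tag in available}

    # First pass: ordinary truncation of the requested tag.
    for candidate in truncations(requested):
        if candidate.lower() in by_lower:
            return by_lower[candidate.lower()]

    # Second pass: swap in the macrolanguage, keeping script and region.
    subtags = requested.split("-")
    macro = MACROLANGUAGE.get(subtags[0].lower())
    if macro is not None:
        swapped = "-".join([macro] + subtags[1:])
        for candidate in truncations(swapped):
            if candidate.lower() in by_lower:
                return by_lower[candidate.lower()]

    return default
```

With this sketch, lookup("yue-Hans-CN", ["zh-Hans-CN", "en"]) returns
"zh-Hans-CN", while a server that actually offers "yue" content still
gets "yue" chosen first.  An implementation could equally interleave
the two passes; the proposed text does not say which is preferable.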

--
John Cowan              cowan@ccil.org          http://www.ccil.org/~cowan
Any day you get all five woodpeckers is a good day.  --Elliotte Rusty Harold

