Re: [Ietf-languages] Khmer orthographic reform

"Doug Ewell" <doug@ewellic.org> Thu, 31 October 2019 22:18 UTC

From: "Doug Ewell" <doug@ewellic.org>
To: "Élie Roux" <elie.roux@telecom-bretagne.eu>, "IETF Languages Discussion" <ietf-languages@iana.org>
Cc: "Chris Tomlinson" <chris.j.tomlinson@gmail.com>
Date: Thu, 31 Oct 2019 15:17:19 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-languages/bHx_J6k2BivO6ruYyMXTg_OOPkY>

Élie Roux wrote:

> I have some data in Khmer that I need to use two tags for:
> - one for Khmer as written before the orthographic reforms of the XXth
> c.
> - one for Khmer written according to said reforms

This is a good starting point. At least you have established that the
works need to be tagged distinctly (as opposed to "we need to
distinguish these varieties," which is not always a reason for tagging).

It is not yet clear that two subtags are needed. If the current
orthography is overwhelmingly dominant and can be assumed by default, a
single subtag for the older orthography may suffice, but that discussion
can come later.

> The reform is not exactly clear cut, unfortunately. Let me see if I
> can find any sources.

You'll certainly want to find a reference that explains and describes
the reform. The ones Richard supplied may help. It's not mandatory that
all contemporary writers switched to the new orthography simultaneously
or uniformly; that certainly wasn't the case for, say, '1606nict'.

> What is IANA policy with regards to this kind of situation? Is it
> reasonable for me to propose a "-pre1966" or "-pre20c" subtag for the
> strings that use the old orthography?

It's not "IANA policy," but rather ietf-languages practice. IANA merely
serves as the repository for the Registry.

It's certainly reasonable to request one or two variant subtags (see
above). Check the available references.

The subtag values themselves probably won't be any sort of abbreviation
for "pre-20th century" or the like. Such a subtag could theoretically
apply to dozens or hundreds of languages, with a different meaning for
each; and although the Prefix field is supposed to suggest the languages
for which the subtag is considered suitable, concerns are usually voiced
that this is not sufficient to discourage inappropriate use.
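
As a rough illustration of the two points above, here is a minimal
Python sketch. It assumes the variant syntax from RFC 5646 (5 to 8
alphanumerics, or a digit followed by 3 alphanumerics) and uses a
hand-written, hypothetical excerpt of the Registry holding only the
'1606nict' record with its Prefix 'frm'. The names PREFIXES,
is_wellformed_variant, and prefix_matches are mine, not part of any
library, and the prefix test is simplified to "the tag starts with the
prefix."

```python
import re

# Variant subtag syntax per RFC 5646: 5 to 8 alphanumerics, or a digit
# followed by 3 alphanumerics.
VARIANT_RE = re.compile(r"(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})")

# Hypothetical in-memory excerpt of the Registry: variant -> Prefix values.
# A real tool would parse the full registry file instead.
PREFIXES = {"1606nict": ["frm"]}


def is_wellformed_variant(subtag: str) -> bool:
    """Check only the syntax of a variant subtag, not its registration."""
    return VARIANT_RE.fullmatch(subtag) is not None


def prefix_matches(tag: str, variant: str) -> bool:
    """Simplified check: does `tag` start with a Prefix listed for `variant`?"""
    tag = tag.lower()
    for prefix in PREFIXES.get(variant.lower(), []):
        p = prefix.lower()
        if tag == p or tag.startswith(p + "-"):
            return True
    return False


if __name__ == "__main__":
    for candidate in ("pre1966", "pre20c", "1606nict"):
        print(candidate, is_wellformed_variant(candidate))
    print(prefix_matches("frm-1606nict", "1606nict"))
    print(prefix_matches("km-1606nict", "1606nict"))
```

Both "pre1966" and "pre20c" pass the syntax check, so the objection to
them is about meaning, not form. Note also that a Prefix mismatch such
as "km-1606nict" is something a tool can only flag; nothing in the tag
syntax prevents it, which is why the Prefix field alone is often felt to
be a weak safeguard.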

--
Doug Ewell | Thornton, CO, US | ewellic.org