[Ietf-languages] Khmer orthographic reform

Élie Roux <elie.roux@telecom-bretagne.eu> Wed, 30 October 2019 11:05 UTC

From: Élie Roux <elie.roux@telecom-bretagne.eu>
Date: Wed, 30 Oct 2019 12:00:18 +0100
Message-ID: <CANfi1JjzqHuwZvVRE9-aiRWc2-ZkAacs0R-_asME9wXu4BDJMw@mail.gmail.com>
To: IETF Languages Discussion <ietf-languages@iana.org>
Cc: Chris Tomlinson <chris.j.tomlinson@gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-languages/z7qlXfUe17PqbtKoVLDOvnUVLo4>

Dear all,

I have some Khmer data for which I need two tags:
- one for Khmer as written before the orthographic reforms of the 20th century
- one for Khmer written according to those reforms

The situation has been explained to me as follows:

The reform is not exactly clear cut, unfortunately. Let me see if I
can find any sources. We don't know much about pre-20th-century
reforms; this is an area that needs more research. But a reform
process had begun by the early 20th century, and it culminated in the
compilation of the first comprehensive Khmer-Khmer dictionary, under
the editorship of Juon Ṇāt (Chuon Nath) and others. The first edition
was published by the Institut Bouddhique in 1938. Although there were
some additional spelling reforms in the 1960s and 1980s, most users of
the language today still refer to the fifth edition of the Institut
Bouddhique dictionary, printed in 1966.

Manuscripts produced after the 1920s or so may show reformed
orthographies, but old spelling systems remained in place in some
manuscript traditions for several more decades, particularly when
copying directly from older texts.

Books printed after the 1920s almost never have pre-reform orthography
(19th century books and dictionaries do, however). Many Cambodians
today have never encountered a pre-reform orthography text, though it
is not hard for them to learn to read it.

What is IANA policy with regard to this kind of situation? Would it be
reasonable for me to propose a "pre1966" or "pre20c" variant subtag
(giving tags such as "km-pre1966") for the strings that use the old
orthography?
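
As a quick sanity check on the two candidate names, here is a minimal Python sketch that tests them against the variant-subtag syntax in the RFC 5646 ABNF (5 to 8 alphanumerics, or 4 characters starting with a digit). It checks well-formedness only. The function name is made up for illustration, and neither "pre1966" nor "pre20c" is a registered subtag; the "km-..." tags it builds are hypothetical.

```python
import re

# variant = 5*8alphanum / (DIGIT 3alphanum), per the RFC 5646 ABNF.
VARIANT_RE = re.compile(r"(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})")


def is_wellformed_variant(subtag: str) -> bool:
    """Return True if `subtag` matches the variant-subtag syntax.

    This is a syntax check only; it says nothing about whether the
    subtag is actually registered with IANA.
    """
    return VARIANT_RE.fullmatch(subtag) is not None


# Candidate names from this message (hypothetical, not registered).
candidates = ["pre1966", "pre20c"]
results = {s: is_wellformed_variant(s) for s in candidates}

# Build example tags on the primary language subtag "km" (Khmer).
example_tags = [f"km-{s}" for s in candidates if results[s]]

print(results)
print(example_tags)
```

Both candidates are syntactically acceptable as variant subtags; whether either is appropriate semantically is the question for the list.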