Re: [Ietf-languages] Forms for subtag kmpre20c

Richard Wordingham <> Sun, 01 December 2019 18:50 UTC

Date: Sun, 1 Dec 2019 18:50:39 +0000
From: Richard Wordingham <>
Message-ID: <20191201185039.0ec4bf53@JRWUBU2>

On Sun, 1 Dec 2019 14:14:01 +0000
Michael Everson <> wrote:

> > I am still not happy with it. How many centuries of orthography is
> > this tag supposed to support?  
> Whatever is in Khmer script that's written in an old spelling. How
> does it matter?

You are not the only consumer of subtagging.  For example, etymologies
on the English Wiktionary often tag words as 'Old Khmer'. 

> > What process will be able to do anything with it?  
> The database I'm needing subtags for is mainly bibliographical, it has
> the title for each text in two flavors: the original (old) spelling as
> appearing on the manuscript, and the equivalent in modern spelling
> (Chuon Nath style).

So why does that need a tag?  What of a spelling that used the glottal
stop letter ('QA') when Chuon Nath prescribes a vowel-specific vowel
letter?  Even though the original spelling is in a way more modern
('before its time'), would not that have two spellings recorded for it?

Is this database restricted to manuscripts?

OT: How do you handle glyph differences that are close to being
character differences?  One example, if I understand Antelme, is that
from a lumper's perspective U+17BE used to look like *<U+17C1 E,
U+17B7 I>, but this proscribed combination contrasts with U+17BE in
Thai (perhaps in a wider sense).
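
For anyone who wants to look at the two encodings side by side, here is
a minimal Python sketch using only the standard unicodedata module.  The
helper names (describe, canonically_equivalent) are mine, purely for
illustration.  It lists the characters involved and checks whether
Unicode normalization treats the single vowel sign and the two-sign
sequence as the same text.

```python
import unicodedata

SINGLE = "\u17be"          # the single vowel sign discussed above
SEQUENCE = "\u17c1\u17b7"  # the proscribed two-sign sequence

def describe(text):
    """Return 'U+XXXX NAME' for each code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

def canonically_equivalent(a, b):
    """True if a and b are identical after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

if __name__ == "__main__":
    print(describe(SINGLE))
    print(describe(SEQUENCE))
    print(canonically_equivalent(SINGLE, SEQUENCE))
```

I would expect the last line to print False: as far as I know no Khmer
vowel sign has a canonical decomposition, so if the two look alike in a
given hand or font, a process that wanted to equate them would have to
do so itself.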

> > And again, what are the reforms? What are their dates?  

I hope this is for identification purposes.  Several systems seem to
have been concurrent.

> The reform process is long and complex. I have put the reference to
> this article describing it several times:
> I'm not sure what else I can do... Should I copy paste the article
> content into an email? Here's a short summary:

> 1915: establishment of the committee for editing a Khmer dictionary,
> start of the debates between phonetic vs. etymological spellings
> 1926: establishment of a second committee led by Chuon Nath, using
> mostly etymological spellings
> 1920s: printing houses mostly use reformed orthography, as printing
> was largely under the control of those who favored reform (including
> the French and reformist monks at the Institut Bouddhique), whereas
> manuscripts were generally produced by traditional scribes and
> scholars and used non-reformed orthography
> 1938: first edition of the Dictionnaire Cambodgien by Chuon Nath
> 1967: 5th and final edition of the Dictionnaire Cambodgien
> 1967-1974: Khmer becomes main language in education
> 1972: reform by Loch Phlaeng and the Khmerization movement (based
> more on phonetics, fewer letters, fewer diphthongs), used officially
> from 1985 to 2009
> 2009: official use reverts to Chuon Nath's Dictionnaire Cambodgien
> So I suppose you could arbitrarily pick 1967 and 1972 as dates for the
> two reforms, but it's not clear-cut at all.

Your account hasn't made clear when the use of CVC orthographic
syllables was largely abandoned.  That's not a phonetic v. etymological
difference.  Are we looking at at least three distinguishable writing

> > Our tags generally point TO a reference, and don’t specify
> > themselves by relation to what they are NOT.  
> Well, I guess I'll keep this in a private subtag then. There's no
> homogeneity in the pre-reform Khmer spelling. There is no tag that
> could be defined to point TO it, because it doesn't exist as a
> homogeneous concept that can be agreed upon.

That doesn't stop manuscripts being classified as 'Middle English',
despite its extreme lack of homogeneity.

> For what
> it's worth, I also have a lot of Pāli written using old Khmer
> spelling.

OT: 'Pāli written using old Khmer spelling' is an interesting concept.
Would you care to educate me by elaborating?  I've just been surprised
by the form of some 150-year-old printed Thai-script Pali.  It's
different to the two living orthographies supported on Wiktionary.