Re: [Ietf-languages] Forms for subtag kmpre20c

Richard Wordingham <richard.wordingham@ntlworld.com> Mon, 02 December 2019 14:12 UTC

Date: Mon, 02 Dec 2019 14:12:37 +0000
From: Richard Wordingham <richard.wordingham@ntlworld.com>
Cc: ietf-languages@ietf.org, Trent Walker <trent.thomas.walker@gmail.com>
Message-ID: <20191202141237.1724fc7c@JRWUBU2>
In-Reply-To: <CANfi1Jgz65Ohw8FXSBaLzVx=9Xi_GpMAeL+BnKUoRgG=ij7txQ@mail.gmail.com>
References: <20191121141336.665a7a7059d7ee80bb4d670165c8327d.9a3859061b.wbe@email03.godaddy.com> <CANfi1JjyouJV-CLXdKOwvRxcFPM0csTe8=+44hszSBhVTxd-qA@mail.gmail.com> <CANfi1JjeSo2-Ez52Nu3Lcb3jC9skPp2_YWza8Xnusu0Xi8vHuA@mail.gmail.com> <CANfi1JgVZ=rc1s=ELHoS=tv9HkwuzNCP0PUAZbjXWfWX0UtEXQ@mail.gmail.com> <000501d5a31c$cb6f52e0$624df8a0$@ewellic.org> <7AAF56F5-A51D-45B2-9400-86FB94625A06@gmail.com> <00C5B42F-0871-4A9D-913A-EABAF0344F68@evertype.com> <20191201185039.0ec4bf53@JRWUBU2> <CANfi1Jgz65Ohw8FXSBaLzVx=9Xi_GpMAeL+BnKUoRgG=ij7txQ@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-languages/xwbfeQiLqkakUzKWhg4RGGnm76M>

On Sun, 1 Dec 2019 20:57:58 +0100
Élie Roux <elie.roux@telecom-bretagne.eu> wrote:

> > > Whatever is in Khmer script that's written in an old spelling. How
> > > does it matter?  
> >
> > You are not the only consumer of subtagging.  For example,
> > etymologies on the English Wiktionary often tag words as 'Old
> > Khmer'.  
> 
> Hmm sure, but in my mind this is a different issue: if the -kmpre20c
> were to exist, I can't really see why it couldn't be applied to
> anything written in Khmer script using non-reformed spelling: Old,
> middle, modern Khmer, all sorts of dialects, Pali, Sanskrit, Chinese,
> etc. That's what I meant, I'm not sure how you interpreted my point
> though?

If that's truly the case, the proper tag is und-Khmr.  You then hit the
problem that language tagging doesn't handle exclusions.  At least,
Michael Everson said it doesn't and I have no reason to disbelieve
him.  It also makes sense to me as a policy.
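
To illustrate the point about exclusions, here is a minimal sketch of
basic filtering as described in RFC 4647, written in Python. The function
name basic_filter and the sample tag list are my own, chosen only for
illustration. A language range selects a tag and anything more specific
than it; the range syntax offers no way to say "everything except".

```python
def basic_filter(language_range: str, tag: str) -> bool:
    """RFC 4647 basic filtering: case-insensitive exact or prefix match.

    The range matches when it equals the tag, or when it is a prefix of
    the tag and the next character in the tag is a hyphen.  The range
    "*" matches every tag.
    """
    rng = language_range.lower()
    candidate = tag.lower()
    if rng == "*":
        return True
    return candidate == rng or candidate.startswith(rng + "-")


# Illustrative tags only; not taken from any real catalogue.
tags = ["und-Khmr", "km-Khmr", "pi-Khmr", "km"]

# Only tags beginning with "und-Khmr" are selected.
matched = [t for t in tags if basic_filter("und-Khmr", t)]
```

A range of und-Khmr picks up neither km-Khmr nor pi-Khmr, and there is
no negated range that would select "Khmer script, but not the reformed
spelling".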

> > So why does that need a tag?  
> 
> To change the Lucene analyzer and for the UI (for each string in our
> database, we have a tag, and there's a little popup saying what
> language the string is in). Note that we're dealing mostly with
> low-resource languages, and sometimes non-standardized scripts and
> transliterations; so we had to come up with a very lengthy list of
> private lang tags:
> https://github.com/buda-base/owl-schema/blob/master/lang-tags.md and
> some conventions to refer to the scripts (and script flavors) that are
> not in Unicode:
> https://github.com/buda-base/owl-schema/blob/master/types/scripts.ttl
> (we didn't include the SEA scripts yet, but there are quite a few).

And this immediately undermines the previous generality, as it includes
things like pi-Khmr.
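
A rough sketch of what I mean, in Python: splitting each tag into its
primary language and script subtags shows that tags such as pi-Khmr and
km-Khmr share a script but not a language. The helper
language_and_script is hypothetical and deliberately simplified (it
relies only on the BCP 47 rule that a four-letter alphabetic subtag
after the language is a script, and it ignores grandfathered tags); the
tag list is illustrative.

```python
from collections import defaultdict


def language_and_script(tag: str):
    """Return (primary language, script or None) for a simple BCP 47 tag."""
    parts = tag.split("-")
    language = parts[0].lower()
    script = None
    for sub in parts[1:]:
        if len(sub) == 1:
            # A singleton starts an extension or private-use sequence,
            # so no script subtag can follow it.
            break
        if len(sub) == 4 and sub.isascii() and sub.isalpha():
            # Four ASCII letters in this position is a script subtag;
            # four-character variants must begin with a digit.
            script = sub.title()
            break
    return language, script


# Group the primary languages by the script subtag they are written in.
by_script = defaultdict(list)
for tag in ["pi-Khmr", "km-Khmr", "und-Khmr", "km"]:
    language, script = language_and_script(tag)
    by_script[script].append(language)
```

Grouping this way puts pi, km and und together under Khmr, which is the
set a script-wide spelling label would have to cover.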

> > Is this database restricted to manuscripts?  
> 
> Currently yes, in the future we could imagine cataloging modern
> publications.

What about Khmer-script missionary texts from 1893 printed in
Hong Kong?

> Several in the XXth century, and before that there doesn't seem to be
> any study; so I'm not even sure we can confidently speak of concurrent
> systems, it might have been just very wild.

> > That doesn't stop manuscripts being classified as 'Middle English',
> > despite its extreme lack of homogeneity.  
> 
> Well, I really have no clue about the evolution of the Khmer language,
> in my mind the tag was only about the evolution of spelling
> conventions: the database we have has titles in old spelling + new
> spelling, it's just a mapping of characters and not a change of
> language (if I understand correctly). So in my mind this is a
> different situation from the Middle English / English situation where
> the language itself is different, not just the spelling.

Most living languages are continually changing, though I can't comment
on how hard it is to apply a knowledge of Modern Khmer to understand
Old Khmer and Middle Khmer.  Some sources talk of a change of script,
though what that entails I don't know.  Certainly vowel symbols have
been added (apparently based on Thai additions), and the distinction
between Indic b/v has been reinstated.  Wikipedia starts Modern Khmer
in the later 18th century.

The fact that Middle English as a whole certainly lacked a standard
orthography is no bar to its being classified as a language.  The lack
of standards is therefore not of itself a bar to adding variants for
Old Khmer (if you truly have such materials) or Middle Khmer, and I
suggest it is not necessarily a bar to 'pre 20th century' Modern Khmer.
It might be necessary to provide evidence of a time depth to the
variations - in which case we should try to get the experts involved. A
lack of agreement on classifying substantial documents could be a bar,
though one should expect borderline documents.  How far back the BCP 47
language 'km' should go is another matter, probably not within IETF
jurisdiction.

> > OT: 'Pāli written using old Khmer spelling' is an interesting
> > concept. Would you care to educate me by elaborating?  
> 
> Hmm, I would love to do so but I'm not sure I'm the best person... I
> can refer you to http://www.trentwalker.org/unfoldingbuddhism which
> has some most excellent scholarship and scans of Leporellos, some of
> which contain Pali using the old spelling.
> 
> >  I've just been surprised
> > by the form of some 150 year old printed Thai-script Pali.  It's
> > different to the two living orthographies supported on Wiktionary.  
> 
> Hmm, that could be an entry point to an interesting discussion
> (perhaps to be had outside the list), can you tell me what you're
> referring to?

I was chasing up the references in
https://en.wikipedia.org/wiki/Talk:Thai_script#Sanskrit_and_Pali_Orthographies ,
in particular to the book "Pali Phonetic Edition" "พระไตรปิฎกสัชฌายะ
ฉบับเสียงอ่านปาฬิ", available at
http://plcthinktank.com/e-Book/Resource/0000005319.pdf.  The Wikipedia
article currently only documents the academic, abugida writing of Pali
in the Thai script, with a very misleading reference to the
'alphabetical' writing found in most books of chants. The book shows a
third scheme in use in a 19th century publication.  The third scheme is
primarily also an abugida, but has (encoded) vowel killers (two!)
different to the one used as a vowel killer nowadays, and marks
short /a/ in closed syllables with mai hanakat in much the same way as
the corresponding Khmer symbol samyok sannya is used. Michell's
"Siamese-English dictionary" (1892) noted that Siamese grammarians
regarded this as a vowel symbol.  The third scheme seems to have a
name, 'karayut', so it seems almost ready for registration.  However, I
haven't found names for the modern abugidic and alphabetic systems of
writing Pali.  One practical application would be to distinguish the
variants in spelling dictionaries.  It's the modern, unnamed varieties
for which spelling dictionaries would have most use.

Richard.
