Re: [Ietf-languages] Northern Thai Variants

Richard Wordingham <richard.wordingham@ntlworld.com> Thu, 10 January 2019 22:26 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-languages/LXPrZqkfOBMERZCfoItgIXJ5pvw>

On Thu, 10 Jan 2019 10:57:18 -0500
John Cowan <cowan@ccil.org> wrote:

> On Thu, Jan 10, 2019 at 4:09 AM Richard Wordingham
> <richard.wordingham=40ntlworld.com@dmarc.ietf.org> wrote:
> 
> > 1) Writing in the Tai Tham script (subtag Lana)
> > 2) Writing in the Thai script (subtag Thai) with consonants having
> > the same phonetic values as the corresponding Tai Tham consonants.
> > 3) Writing in the Thai script with stop consonants having
> > the same phonetic values as in Standard Thai.
> >
> > My idea for approaching this is to use the labels:
> >
> > nod-Lana
> > nod-Thai-etymo
> > nod-Thai-phonetic
> >  
> 
> The first is of course fine and needs no registration.
> 
> I am firmly opposed to the second and third: the names are far too
> generic, and people would use them for all sorts of things in all
> sorts of random ways.  We would begin to see "en-US-phonetic" for
> foe-NET-ik ree-SPELL-ingz, something which would have absolutely
> nothing to do with Northern Thai phonetic use of Thai script.  This
> is a principle we have fairly consistently stuck with: variant tags
> are meant to make sense only in specific contexts, with a very few
> exceptions like 'fonipa', since the IPA is inherently generic.

Would a Thai term like _thapsap_ (ทับสัพท์ 'transliterate, transcribe')
be open to the same objection?  It's the nearest I've seen to a name,
but I fear it's totally ambiguous between the two.

> If there are no names for these conventions in Northern Thai or
> Central Thai, try to come up with some.  Perhaps the names of the
> people who established or promoted these orthographies?  That's what
> we've done with sl-bohoric and sl-dajnko as well as be-tarask.  Dates
> can also be helpful, as in ru-luna1918, the Russian orthography
> published by the committee headed by A. V. Lunacharsky in 1918.

Orthography?!?  I was deliberately vague when I said 'varieties of
writing system'. I shall ask around, but I'm not confident I will get
useful answers.

> > The second is the possibility that Scheme 2, but not Scheme 3,
> > should be treated as a transform of Scheme 1, and therefore scheme
> > 2 should be nod-Lana-t-Thai, which would remove the need for
> > variants at this level.  

> I think that should only be applied to a document whose original was
> in Lana script and has been actually transliterated (by hand or
> machine, it doesn't matter) to another script.  It would be suitable
> for a Hebrew-letter version of the Syriac Bible (something I have
> seen done), but not for documents originally written in nod-Thai.

Well, there is at least some material, albeit probably mostly
translated, in 'nod-Thai-etymo' in the Wikipedia Northern Thai
incubator.

Neither nod-Thai-t-nod-Lana nor nod-Thai-t-und-Lana gives any clue as
to the conversion (or correspondence) method.  Looking at the Russian
to Latin script example in RFC 6497, I don't see how one would
distinguish the results of transforms for English and French.  I will
ditch the idea of naming the writing systems as transformations.
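
To make the point concrete, here is a minimal Python sketch that splits such a tag at its 't' singleton. The function name split_t_extension is hypothetical, and it handles only the simple shape of the two tags above, not the full BCP 47 grammar. In both tags, everything after the singleton is just a source language and script; no subtag says how the conversion was done.

```python
def split_t_extension(tag):
    """Split a simple BCP 47 tag at its 't' singleton.

    Hypothetical helper: it only handles tags shaped like
    'nod-Thai-t-nod-Lana' and is not a full BCP 47 parser.
    """
    subtags = tag.lower().split("-")
    if "t" not in subtags:
        # No transform extension: everything describes the content itself.
        return {"target": subtags, "source": []}
    i = subtags.index("t")
    return {"target": subtags[:i], "source": subtags[i + 1:]}


for tag in ("nod-Thai-t-nod-Lana", "nod-Thai-t-und-Lana"):
    print(tag, split_t_extension(tag))
```

The two results differ only in whether the source language is 'nod' or 'und'; the correspondence method is not recoverable from either.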

> A more broadly scoped question: given the difficulties in reading
> Northern Thai, even in Thai script, at all, is it really necessary to
> mark this distinction clearly?  Are there spelling checkers, for
> instance, or is the barrier between the two variants high enough to
> make life difficult for readers of one to handle the other?  Is there
> any significant standardization (even de facto, like standard
> English) around either variant?

Surprisingly enough, printed Northern Thai seems easier to read in the
Lanna script than in the Thai script.  The Thai script does not mark
syllable boundaries as well as the Lanna script does.

Spelling checkers face one major hurdle - guessing word boundaries.  If
the user is prepared to mark word boundaries, then it is fairly simple
to use one on Firefox, and it is possible to use one on LibreOffice -
it will probably be much easier when I get round to using a SIL
packaging tool.  I've been using one for the Lanna script based on a
single dictionary.  Unfortunately, I haven't been able to start talking
about licensing the word list, so the dictionary's copyright stops me
sharing the spell-checker, or more precisely the Hunspell database. I'm
hoping there's no copyright in the vocabulary of a translation of the
New Testament.

For the Lanna script, a major use of spell-checkers is to ensure that
characters are in the right order - otherwise searching text will be
very difficult.  If the appearance of the text were the primary
concern, I would need different spell-checkers for different spelling
schemes.

For the Thai script, spell-checkers are less important - unless they
can be used to fight Thai line-breaking, which is a significant
annoyance for users of minority languages.  Unfortunately, victory
could be marred by searchers not ignoring WJ (U+2060 WORD JOINER).
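
As a rough illustration of the search problem, the Python sketch below compares a naive substring search with one that ignores WJ. The helper name find_ignoring_wj is hypothetical, and the placement of WJ inside the sample word is contrived purely to show the effect; it is not a claim about where WJ would really be inserted.

```python
WORD_JOINER = "\u2060"  # U+2060 WORD JOINER


def strip_wj(s):
    """Remove every WORD JOINER from s."""
    return s.replace(WORD_JOINER, "")


def find_ignoring_wj(haystack, needle):
    """Return True if needle occurs in haystack once WJ is ignored."""
    return strip_wj(needle) in strip_wj(haystack)


# Contrived sample: a WJ inserted in the middle of one word.
text = "ธร" + WORD_JOINER + "รม"

print("ธรรม" in text)                  # naive search misses the word
print(find_ignoring_wj(text, "ธรรม"))  # search that ignores WJ finds it
```

A search tool that does not perform some equivalent of strip_wj on both sides will fail to find text that was protected from line-breaking in this way.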

For spell-checking, the fewer unused words a spell-checker accepts, the
better.  Therefore, it will help to segregate spellings between the two
systems.  At least 12% of the vocabulary is different because words
starting with the voiced unaspirated Indic letters in the Lanna script
are different between the two Thai script systems.  One quickly sees
which system one is reading; I suspect more rapidly than one detects
the difference between 'nod-Thai-etymo' and the standard Thai language.

One difference between the two Thai script spellings is the degree of
Sanskritisation of the spellings.  The 'nod-Thai-phonetic' system is
more heavily Sanskritised, with 'dharma' being ธรรม (the usual Thai
form) as opposed to the Pali-derived ธัมม์ (which is no longer allowed
for Thai by the Royal Institute dictionary, though it retains the
longer form ธัมมะ).
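
The value of segregating the word lists for spell-checking can be sketched in a few lines of Python. This is only a toy under stated assumptions: the labels are the working labels used in this thread, not registered tags; each list holds nothing but the 'dharma' spelling attributed to that system above; and the user is assumed to have marked word boundaries with spaces. The function name matching_systems is hypothetical.

```python
# Toy word lists: each holds only the 'dharma' spelling that this message
# attributes to that system.  Real lists would be far larger.
WORDLISTS = {
    "nod-Thai-phonetic": {"ธรรม"},
    "nod-Thai-etymo": {"ธัมม์"},
}


def matching_systems(text):
    """Return the systems whose word list accepts every word in text.

    Assumes word boundaries have been marked with spaces by the user.
    """
    words = text.split()
    return [name for name, accepted in WORDLISTS.items()
            if words and all(w in accepted for w in words)]


print(matching_systems("ธรรม"))
print(matching_systems("ธัมม์"))
```

With a single merged list both spellings would be accepted everywhere, so a checker could no longer flag a word written in the wrong system.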

One problem with having a user interface in a mixed nod-Thai is that
the user would have to keep switching reading convention, which would be
likely to be confusing.  

Thank you for your comments.

Richard.