[Ietf-languages] Between language and script in Burmese

Simon Cozens <simon@simon-cozens.org> Wed, 17 November 2021 09:20 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-languages/iZ8jb3d1PpNbfWSIQSDcCZHYzgw>

Hello!

I've been working on a system font which covers a number of minority 
languages and scripts of Burma, some of which are not currently 
addressable because they lack IETF (and OpenType) script/language tags, 
or where the correct tag combination is not obvious.

The Burmese script has many language-specific and context-specific 
variant forms (see UTN11 - 
https://www.unicode.org/notes/tn11/UTN11_4.pdf - for examples), and the 
boundary between script and language is not always obvious. Some of 
these differences in letterforms are encoded as separate Unicode 
characters, while others are treated as allographs of the same 
character. It's all a bit of a mess.

The easy problem we have is the Thai Mon language. This is a variant of 
the Mon language used by Mon people in Thailand. It has its own distinct 
script tradition 
(https://www.unicode.org/L2/L2020/20163-arakanese-mon.pdf). There's no 
distinct language subtag for it, but I believe mnw-TH is enough to 
distinguish this language - although we may have to pull some OpenType 
strings so that this distinction actually selects the Thai Mon-specific 
orthographic forms.

The hard problem we have is that some of these language-specific variant 
orthographies are used to write text in *other* languages. In that 
sense, they are essentially functioning as *different scripts* to 
standard Myanmar.

For example: a document written in the Shan language using the Shan 
variant orthography of Burmese is clearly shn-Mymr, and setting the Shan 
language in a document should be enough to activate the Shan variant 
forms. No problems here. And a document written in Pali using the 
standard Burmese orthography is obviously pi-Mymr, and because it's 
standard Burmese, a Burmese font doesn't need to do any magic to get the 
right glyphs.
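
To make the tag structure concrete, here is a minimal Python sketch that 
splits the tags mentioned above into their subtags. It assumes a 
simplified BCP 47 shape (language, optional script, optional region, 
optional variants) and ignores extlang, extension, private-use and 
grandfathered forms; the helper name split_tag is made up for 
illustration and is not part of any library.

```python
import re

# Simplified BCP 47 shape: language[-script][-region][-variant...]
# Assumption: extlang, extensions, private use and grandfathered tags
# are deliberately not handled in this sketch.
TAG_RE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)$"
)


def split_tag(tag):
    """Split a simple language tag into its subtags (hypothetical helper)."""
    m = TAG_RE.match(tag)
    if m is None:
        raise ValueError(f"not a simple language tag: {tag!r}")
    # The variants group looks like "-foo-bar"; drop the empty first piece.
    variants = [v.lower() for v in m.group("variants").split("-") if v]
    return {
        "language": m.group("language").lower(),
        "script": m.group("script").title() if m.group("script") else None,
        "region": m.group("region").upper() if m.group("region") else None,
        "variants": variants,
    }


for tag in ("mnw-TH", "shn-Mymr", "pi-Mymr"):
    print(tag, split_tag(tag))
```

Each tag has exactly one script slot, and in both shn-Mymr and pi-Mymr 
that slot holds Mymr, so nothing in the script subtag itself records 
which flavour of the Myanmar script is in use.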

But what is a document written in the Pali language using the Shan (or 
Khamti, or Mon) orthography? Do we need variant tags to distinguish the 
flavour of Burmese script used in these cases? Shouldn't Shan, Khamti 
and Mon actually be separate scripts? And if not, how on earth are we 
going to get browsers to choose the Shan forms for this document, 
without pretending that it's actually written in the Shan language?

Any advice would be helpful!

Thanks,
Simon