Re: [Ietf-languages] Between language and script in Burmese

Peter Constable <pgcon6@msn.com> Wed, 17 November 2021 20:03 UTC

From: Peter Constable <pgcon6@msn.com>
To: Simon Cozens <simon@simon-cozens.org>, "ietf-languages@ietf.org" <ietf-languages@ietf.org>
Thread-Topic: [Ietf-languages] Between language and script in Burmese
Thread-Index: AQHX25R67ZAsZ1yi40aM+3L2DXhxdKwIIQTg
Date: Wed, 17 Nov 2021 20:03:45 +0000
Message-ID: <MWHPR1301MB2112CAE05F699F78DFA28489869A9@MWHPR1301MB2112.namprd13.prod.outlook.com>
References: <6690448e-380c-e7a7-9d0a-320066e20eae@simon-cozens.org>
In-Reply-To: <6690448e-380c-e7a7-9d0a-320066e20eae@simon-cozens.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-languages/mQ_ih3wHG6M2cEsCWdcdOJOAS5Q>
Subject: Re: [Ietf-languages] Between language and script in Burmese

Hi, Simon

You've touched on the easier cases that can be supported now in BCP 47. For Mon, you suggested mnw-TH could be used, though I would also consider mnw-Thai versus mnw-Mymr: if the only or main distinction to be made in content is the script used, then I'd use script subtags, not region subtags. But if there are dialect differences between what is used in eastern Myanmar and what is used in western Thailand, then that would be a reason to use region subtags.

For Pali written in Myanmar script, if that is all the information to be conveyed, the content should be tagged pi-Mymr, and that tag alone would convey _nothing_ about orthographic or typographic variants. (Though for some language tags a variant that is much more frequently used than others gets tagged as an unmarked case, without any additional subtag to reflect the distinction. Cf. the Suppress-Script field in the subtag registry, for example.) If there are finer distinctions to be made (certainly for orthographic differences), then additional subtags would be needed. If orthographic distinctions correlate closely with region differences, then region subtags could be used to capture that. But for the cases you mention, region is not a good correlate. For those cases, variant subtags would be needed.
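
To make the layering of subtags concrete, here is a minimal Python sketch that splits a tag into language, script, region and variant subtags by shape alone. It is a simplification of the BCP 47 syntax, not a conformant parser: it ignores extlang, extension and private-use subtags and does not consult the subtag registry. The name parse_tag and the variant subtag "shanorth" are hypothetical, invented only for this illustration; no such variant is registered.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Tag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]
    variants: Tuple[str, ...]


# Simplified subtag shapes (no extlang, extensions or private use).
_LANG = re.compile(r"[a-z]{2,3}")
_SCRIPT = re.compile(r"[a-z]{4}")
_REGION = re.compile(r"[a-z]{2}|[0-9]{3}")
_VARIANT = re.compile(r"[a-z0-9]{5,8}|[0-9][a-z0-9]{3}")


def parse_tag(tag: str) -> Tag:
    """Split a tag into language / script / region / variants by shape only."""
    parts = tag.lower().split("-")
    if not _LANG.fullmatch(parts[0]):
        raise ValueError(f"unsupported language subtag in {tag!r}")
    language, rest = parts[0], parts[1:]
    script = region = None
    if rest and _SCRIPT.fullmatch(rest[0]):
        script = rest.pop(0).title()   # conventional casing, e.g. "Mymr"
    if rest and _REGION.fullmatch(rest[0]):
        region = rest.pop(0).upper()   # conventional casing, e.g. "TH"
    for subtag in rest:
        if not _VARIANT.fullmatch(subtag):
            raise ValueError(f"unsupported subtag {subtag!r} in {tag!r}")
    return Tag(language, script, region, tuple(rest))


# Script-based versus region-based distinctions for Mon:
print(parse_tag("mnw-Mymr"))
print(parse_tag("mnw-TH"))
# Pali in Myanmar script, plus a HYPOTHETICAL (unregistered) variant subtag:
print(parse_tag("pi-Mymr-shanorth"))
```

The point of the sketch is only that pi-Mymr carries a language and a script and nothing else; any orthographic distinction has to come from a further subtag.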

Of course, mapping from language tags to OpenType Layout language system tags to implement typographic distinctions is a related but separate matter.
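
As a rough illustration of why that mapping is a separate matter, the following Python sketch looks up an OpenType language system tag for a BCP 47 tag by progressively truncating the tag from the right. The table is illustrative and incomplete, and its values should be checked against the OpenType language system tag registry; the function name langsys_for is hypothetical, and real shaping engines and browsers may resolve tags differently.

```python
from typing import Dict, Optional

# Illustrative, incomplete table: BCP 47 prefix -> OpenType language system tag.
# Values are assumptions to be verified against the OpenType registry.
LANGSYS: Dict[str, str] = {
    "my": "BRM ",   # Burmese
    "shn": "SHN ",  # Shan
    "mnw": "MON ",  # Mon
    "pi": "PAL ",   # Pali
}


def langsys_for(tag: str, table: Dict[str, str] = LANGSYS) -> Optional[str]:
    """Return the entry for the longest matching prefix of the tag, if any."""
    parts = tag.lower().split("-")
    while parts:
        hit = table.get("-".join(parts))
        if hit is not None:
            return hit
        parts.pop()  # drop the rightmost subtag and retry
    return None


for tag in ("shn-Mymr", "pi-Mymr", "mnw-TH"):
    print(tag, "->", langsys_for(tag))
```

With a table keyed on language alone, pi-Mymr can only ever select the Pali language system, so Pali written in a Shan-style orthography would need either a variant subtag with its own table entry or some other selection mechanism.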


Peter


-----Original Message-----
From: Ietf-languages <ietf-languages-bounces@ietf.org> On Behalf Of Simon Cozens
Sent: Wednesday, November 17, 2021 1:20 AM
To: ietf-languages@ietf.org
Subject: [Ietf-languages] Between language and script in Burmese

Hello!

I've been working on a system font which covers a number of minority languages and scripts of Burma, some of which are not currently addressable because they lack IETF (and OpenType) script/language tags, or where the correct tag combination is not obvious.

The Burmese script has many language-specific and context-specific variant forms (see UTN11 -
https://www.unicode.org/notes/tn11/UTN11_4.pdf - for examples), and the boundary between script and language is not always obvious. Some of these differences in letterforms are encoded separately in Unicode and some are treated as allographs. It's all a bit of a mess.

The easy problem we have is the Thai Mon language. This is a variant of the Mon language used by Mon people in Thailand. It has its own distinct script tradition.
(https://www.unicode.org/L2/L2020/20163-arakanese-mon.pdf) There's no distinct language subtag but I believe mnw-TH is enough to distinguish this language - although we may have to pull some OpenType strings to enable that distinction to select Thai Mon specific orthographic forms.

The hard problem we have is that some of these language-specific variant orthographies are used to write text in *other* languages. In that sense, they are essentially functioning as *different scripts* to standard Myanmar.

For example: a document written in the Shan language using the Shan variant orthography of Burmese is clearly shn-Mymr, and setting the Shan language in a document should be enough to activate the Shan variant forms. No problems here. And a document written in Pali using the standard Burmese orthography is obviously pi-Mymr, and because it's standard Burmese, a Burmese font doesn't need to do any magic to get the right glyphs.

But what is a document written in the Pali language using the Shan (or Khamti, or Mon) orthography? Do we need variant tags to distinguish the flavour of Burmese script used in these cases? Shouldn't Shan, Khamti and Mon actually be separate scripts? And if not, how on earth are we going to get browsers to choose the Shan forms for this document, without pretending that it's actually written in the Shan language?

Any advice would be helpful!

Thanks,
Simon

_______________________________________________
Ietf-languages mailing list
Ietf-languages@ietf.org
https://www.ietf.org/mailman/listinfo/ietf-languages