Re: [Ietf-languages] Suggestion to update Urdu Script Designation in the subtag registry

Peter Constable <pgcon6@msn.com> Thu, 13 August 2020 16:29 UTC

From: Peter Constable <pgcon6@msn.com>
To: r12a <ishida@w3.org>, "ietf-languages@ietf.org" <ietf-languages@ietf.org>
Date: Thu, 13 Aug 2020 16:29:33 +0000
Message-ID: <MWHPR1301MB21120388068B8E68EB6C8DE586430@MWHPR1301MB2112.namprd13.prod.outlook.com>
References: <CY4PR0401MB36203305BEFEBF938B654E8FC6420@CY4PR0401MB3620.namprd04.prod.outlook.com> <000201d670e8$d25e7e60$771b7b20$@ewellic.org> <CY4PR0401MB362045E1E4D11D92E1F89443C6420@CY4PR0401MB3620.namprd04.prod.outlook.com> <001a01d670ed$9c868530$d5938f90$@ewellic.org> <f4fa9f5c-3bb6-6b27-f294-7df9e0afa3d4@w3.org>
In-Reply-To: <f4fa9f5c-3bb6-6b27-f294-7df9e0afa3d4@w3.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-languages/vW8-F088pyp5uwGY1vbvjpingxo>
Subject: Re: [Ietf-languages] Suggestion to update Urdu Script Designation in the subtag registry

If typographic styles are to be indicated in a tag that is primarily intended to declare the _language_ of content, then what are the limits to that?

For some languages written with Latin script, certain styles of fonts are preferred. For example, the Puyallup nation just south of where I live uses fonts with a distinctive style. Should Puyallup content be marked up with a tag that includes something indicating that distinct style? If another language doesn’t normally use the Puyallup style fonts but some documents do, should those documents be tagged to indicate that style?

Or within a single language, there may be different typographic preferences. I’m sure you’re familiar with issues regarding accented capitals in French: one convention allows for them, while another convention would never have accented capitals. Do we need variant subtags to indicate such distinctions? Or take this a step further: there are lots of Latin ligatures that _could_ be used in a document but aren’t required. Should language tags reflect levels of ligation that are used?

Certainly there are stylistic preferences associated with some languages, such as Nastaliq for Urdu. (Mind, I suspect Urdu speakers wouldn’t expect or want a Nastaliq font to be used for UI strings on a phone.) I could perhaps see some benefit in having a way to qualify fonts for such distinctions, and to capture that language/style association as a default in something like CLDR. But aside from characterizing font resources, it’s not clear to me where it would be beneficial to use ‘Aran’, either in a language tag or some other context.


Peter

From: Ietf-languages <ietf-languages-bounces@ietf.org> On Behalf Of r12a
Sent: Thursday, August 13, 2020 3:54 AM
To: Doug Ewell <doug@ewellic.org>; 'Daniel LaVon Billings' <daniel=40ChurchofJesusChrist.org@dmarc.ietf.org>
Cc: ietf-languages@ietf.org
Subject: Re: [Ietf-languages] Suggestion to update Urdu Script Designation in the subtag registry

Doug Ewell wrote on 12/08/2020 22:14:

We can certainly check with ISO 15924/RA-JAC to see if there is any unstated expectation that ‘Arab’ implies the Naskh variant.

I would hope not, since Naskh is only one of several writing styles used for Arabic.  These include Naskh, Nastaliq (Aran), Ruq'a, Kano, Kufi, and so on.  If Arab was equated with naskh only, we'd be stuck for what to use to represent text written in the other styles.

I would have thought that, generally speaking, the presence of ur would already indicate that an application should by default use a nastaliq font for Urdu (and ks for Kashmiri), without the need to further qualify.  Additional subtags are mostly useful for modifying the default assumptions that come with the language, rather than completing the intent.

It seems to me that Aran might appeal for languages such as Persian, which are commonly written in naskh style, but can be written in a kind of nastaliq, so the -Aran marker could help to indicate that distinction. In a similar way, then, -Arab could be used after ur to indicate that a non-nastaliq font should be used.  But the problem here seems to be that -Arab and -Aran only work for a tiny subset of the writing-style identifiers that are actually needed. There are also other places where it would be useful to distinguish between particular styles. For example, Hausa in Arabic script can be written with the hafs or warsh orthographies (typically requiring different fonts because they include different character repertoires), but in Nigeria Hausa also uses the Kano writing-style.  One might also want to label text that uses a maghrebi style font in North Africa. Etc.
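
A minimal sketch of the default-plus-override reading described above, not a definitive implementation. The tables and the function name `preferred_style` are hypothetical and exist only for illustration; they are not part of BCP 47, CLDR, or any library. The sketch assumes that a bare language subtag implies a default writing style, and that a script subtag of Aran or Arab, when present, overrides it.

```python
from typing import Optional

# Hypothetical defaults an application might assume when a tag has no script subtag.
DEFAULT_STYLE = {"ur": "nastaliq", "ks": "nastaliq", "fa": "naskh"}

# Assumed reading of the two script subtags discussed here: Arab only says
# "not nastaliq"; it does not single out naskh, Kano, Kufi, or any other style.
SCRIPT_STYLE = {"Aran": "nastaliq", "Arab": "non-nastaliq"}


def preferred_style(tag: str) -> Optional[str]:
    """Return a writing-style hint for a language tag, or None if unknown."""
    subtags = tag.split("-")
    language = subtags[0].lower()
    for sub in subtags[1:]:
        if len(sub) == 1:
            # Singleton: extension or private-use subtags follow, so stop looking.
            break
        if len(sub) == 4 and sub.isalpha():
            # A four-letter alphabetic subtag after the language is a script subtag.
            return SCRIPT_STYLE.get(sub.title())
    return DEFAULT_STYLE.get(language)


print(preferred_style("ur"))       # default for Urdu
print(preferred_style("ur-Arab"))  # script subtag overrides the default
print(preferred_style("fa-Aran"))  # Persian explicitly marked as nastaliq
```

The two-entry `SCRIPT_STYLE` table also makes the limitation concrete: under this reading nothing in the tag can select Kano, Kufi, Ruq'a, or a maghrebi style.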

I think the -Arab subtag is mostly useful for languages (such as the many in Central Asia and nearby) that can be written in more than one script, and where you need to explicitly distinguish whether, say, Latin, Arabic, or Cyrillic should be used for a given bit of text. But even there my personal preference is only to use the script tag when I need to make a meaningful distinction, not all the time.

I find myself wondering whether Aran ought really to have been a variant subtag, to which we could add others for different writing styles.  In particular, because some of the usage distinctions just mentioned can't be expressed by combining language and region tags.


ri