Re: [Ietf-languages] Suggestion to update Urdu Script Designation in the subtag registry

Doug Ewell <doug@ewellic.org> Thu, 13 August 2020 21:09 UTC

From: Doug Ewell <doug@ewellic.org>
To: 'Richard Wordingham' <richard.wordingham=40ntlworld.com@dmarc.ietf.org>, ietf-languages@ietf.org
References: <CY4PR0401MB36203305BEFEBF938B654E8FC6420@CY4PR0401MB3620.namprd04.prod.outlook.com> <000201d670e8$d25e7e60$771b7b20$@ewellic.org> <CY4PR0401MB362045E1E4D11D92E1F89443C6420@CY4PR0401MB3620.namprd04.prod.outlook.com> <001a01d670ed$9c868530$d5938f90$@ewellic.org> <f4fa9f5c-3bb6-6b27-f294-7df9e0afa3d4@w3.org> <MWHPR1301MB21120388068B8E68EB6C8DE586430@MWHPR1301MB2112.namprd13.prod.outlook.com> <000001d6719a$9c3c7b40$d4b571c0$@ewellic.org> <20200813202934.3b348a9d@JRWUBU2>
In-Reply-To: <20200813202934.3b348a9d@JRWUBU2>
Date: Thu, 13 Aug 2020 15:08:47 -0600
Message-ID: <001c01d671b5$efad4e60$cf07eb20$@ewellic.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-languages/GEvAvcSiEA6MbKSJLjnuZOBElXc>

Richard Wordingham wrote:

>> For the OP’s original scenario of Urdu in Nastaliq, I don’t have any
>> issue with specifying “ur-Aran” when the choice of a Nastaliq font is
>> considered important, especially since 'Aran' already exists. It's
>> just not what Suppress-Script is for.
>
> I am now very confused.  BCP 47 says one SHOULD NOT tag text as
> en-Latn.

Not unless you have a particular reason for calling out the script. For example, you might have two parallel English texts, one in normal Latin and one in the Gaelic variety. It might be appropriate to tag them as "en-Latn" and "en-Latg" because you want to call special attention to the distinction. RFC 5646, Section 4.1, item 2, first bullet point shows another example.

That is not the same as reflexively tagging every piece of "en" text as "en-Latn", which was the concern 15 years ago.

> So, if I were cataloguing the various texts cluttering my house and
> recording their language and script, I would have some that were
> English in the Latin script and some that were English in the Thai
> script.*
>
> *These arise for teaching and, at risk, as semi-encrypted messages.

And presumably the Latin-script and Thai-script texts were not parallel texts presented in contrast, as posited above.

> So, if I chose to use BCP 47 to record the language, the former should
> be recorded as "en" rather than as "en-Latn" and the latter could be
> recorded as "en-Thai".

Yes, because English is so overwhelmingly written in Latin script, and not in Thai or any other script (your personal collection notwithstanding), that Latin could reasonably be assumed for the former.

> Then, when I came to use the catalogue, I would know that those
> labelled as "en-Thai" were in the Thai script, but for those labelled
> "en", I should be unsure of the script; it would be improper to assume
> that the script was the Latin script, though that would be the best
> **guess**.

Maybe I'm unclear about the difference between "assume" and "guess" in this context.

Your knowledge of English, or some institution's, should indicate that English is written predominantly in Latin script. The Suppress-Script for 'en' is there because English is written so predominantly in Latin script that to say so in a language tag is normally redundant.
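
As a rough illustration only, the way a tag producer might apply this could be sketched in Python as below. The function name make_tag, the force_script parameter, and the one-entry table are hypothetical; the only registry fact used is the Suppress-Script for 'en' mentioned above, and a real tool would read the values from the IANA Language Subtag Registry.

```python
# Hypothetical, hand-written excerpt. Only the 'en' entry comes from the
# discussion above; a real implementation would load the IANA registry.
SUPPRESS_SCRIPT = {"en": "Latn"}


def make_tag(language, script=None, force_script=False):
    """Build a language tag, omitting the script when it is redundant.

    force_script models the "particular reason for calling out the script"
    case, such as parallel texts tagged "en-Latn" and "en-Latg".
    """
    if script is None:
        # Nothing is known or needs to be said about the script.
        return language
    if not force_script and SUPPRESS_SCRIPT.get(language) == script:
        # The script is the overwhelmingly usual one, so leave it out.
        return language
    return f"{language}-{script}"


print(make_tag("en", "Latn"))                     # usual case
print(make_tag("en", "Thai"))                     # unusual script is called out
print(make_tag("en", "Latn", force_script=True))  # deliberate contrast
```

The default leaves the script out, and the caller has to opt in to "en-Latn", which mirrors the distinction between deliberately calling out the script and reflexively adding it.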

To project this onto the Urdu-in-Nastaliq scenario, you would need to conclude that tag consumers encountering plain "ur" would be so likely to assume "ur-Aran" (specifically Nastaliq), and so unlikely to assume "ur-Arab" (Arabic with no indication as to calligraphic style), that "ur-Arab" is a special case to be called out in the tag, sort of like "en-Thai".

> In the implausible case that I knew that something was in English but
> hadn't looked at it to determine the script, I presume I should record
> the language and script as "en-Zyyy".

I would presume that if you just didn't look at it, it should be "en". If you looked at it and literally cannot determine what the script is, either because you don't have the knowledge or resources to identify it or because it's a new Voynich or whatever, then it would be "en-Zyyy".
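
A minimal sketch of that three-way distinction for a catalogue, again in Python and again with made-up names (ScriptStatus, catalogue_tag) and a one-entry stand-in for the registry's Suppress-Script data:

```python
from enum import Enum


class ScriptStatus(Enum):
    NOT_EXAMINED = "not examined"    # nobody looked at the item
    UNDETERMINED = "undetermined"    # looked, but could not identify the script
    IDENTIFIED = "identified"        # looked and identified the script


# Hypothetical stand-in for registry data; only 'en' is taken from the thread.
SUPPRESS_SCRIPT = {"en": "Latn"}


def catalogue_tag(language, status, script=None):
    """Return the tag to record for an item, given what is known of its script."""
    if status is ScriptStatus.NOT_EXAMINED:
        return language                      # e.g. "en"
    if status is ScriptStatus.UNDETERMINED:
        return f"{language}-Zyyy"            # e.g. "en-Zyyy"
    if script is None:
        raise ValueError("an identified script must be supplied")
    if SUPPRESS_SCRIPT.get(language) == script:
        return language                      # usual script, left out
    return f"{language}-{script}"            # e.g. "en-Thai"


print(catalogue_tag("en", ScriptStatus.NOT_EXAMINED))
print(catalogue_tag("en", ScriptStatus.UNDETERMINED))
print(catalogue_tag("en", ScriptStatus.IDENTIFIED, "Thai"))
```

Note that under this sketch an unexamined item and an item identified as Latin-script both come out as plain "en", which is exactly the ambiguity Richard is asking about.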

--
Doug Ewell | Thornton, CO, US | ewellic.org