Re: [Ietf-languages] Suggestion to update Urdu Script Designation in the subtag registry

Doug Ewell <doug@ewellic.org> Thu, 13 August 2020 21:09 UTC

From: Doug Ewell <doug@ewellic.org>
To: 'Richard Wordingham' <richard.wordingham=40ntlworld.com@dmarc.ietf.org>, ietf-languages@ietf.org
Date: Thu, 13 Aug 2020 15:08:47 -0600
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-languages/GEvAvcSiEA6MbKSJLjnuZOBElXc>

Richard Wordingham wrote:

>> For the OP’s original scenario of Urdu in Nastaliq, I don’t have any
>> issue with specifying “ur-Aran” when the choice of a Nastaliq font is
>> considered important, especially since 'Aran' already exists. It's
>> just not what Suppress-Script is for.
>
> I am now very confused.  BCP 47 says one SHOULD NOT tag text as
> en-Latn.

Not unless you have a particular reason for calling out the script. For example, you might have two parallel English texts, one in ordinary Latin script and one in the Gaelic variant of Latin script. It might be appropriate to tag them as "en-Latn" and "en-Latg" because you want to call special attention to the distinction. RFC 5646, Section 4.1, item 2, first bullet point shows another example.

That is not the same as reflexively tagging every piece of "en" text as "en-Latn", which was the concern 15 years ago.

> So, if I were cataloguing the various texts cluttering my house and
> recording their language and script, I would have some that were
> English in the Latin script and some that were English in the Thai
> script.*
>
> *These arise for teaching and, at risk, as semi-encrypted messages.

And presumably the Latin-script and Thai-script texts were not parallel texts presented in contrast, as posited above.

> So, if I chose to use BCP 47 to record the language, the former should
> be recorded as "en" rather than as "en-Latn" and the latter could be
> recorded as "en-Thai".

Yes, because English is so overwhelmingly written in Latin script, and not in Thai or any other script (your personal collection notwithstanding), that Latin script can reasonably be assumed for the former.

> Then, when I came to use the catalogue, I would know that those
> labelled as "en-Thai" were in the Thai script, but for those labelled
> "en", I should be unsure of the script; it would be improper to assume
> that the script was the Latin script, though that would be the best
> **guess**.

Maybe I'm unclear about the difference between "assume" and "guess" in this context.

Your own knowledge of English, or some institution's, should indicate that English is written predominantly in Latin script. The Suppress-Script for 'en' is there because that predominance is so strong that saying so in a language tag is normally redundant.
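
To make the "assume" reading concrete, here is a rough Python sketch of how a tag consumer might apply Suppress-Script. It is only an illustration: the names SUPPRESS_SCRIPT, split_script and effective_script are made up for this example, the table holds just the one 'en' entry discussed above rather than the real registry, and the parsing is deliberately simplified (it ignores grandfathered tags and does no validation).

```python
# Hypothetical one-entry excerpt for illustration; the real registry is far larger.
SUPPRESS_SCRIPT = {"en": "Latn"}


def _is_alpha(subtag, length):
    """True if subtag is exactly `length` ASCII letters."""
    return len(subtag) == length and subtag.isascii() and subtag.isalpha()


def split_script(tag):
    """Return (language, script or None) using a simplified BCP 47 parse."""
    subtags = tag.split("-")
    language = subtags[0].lower()
    rest = subtags[1:]
    i = 0
    # Skip up to three extended language subtags (three letters each).
    while i < len(rest) and i < 3 and _is_alpha(rest[i], 3):
        i += 1
    # A script subtag, if present, is four letters and comes next.
    if i < len(rest) and _is_alpha(rest[i], 4):
        return language, rest[i].title()
    return language, None


def effective_script(tag):
    """Return (script, how): how is 'explicit', 'assumed' or 'unknown'."""
    language, script = split_script(tag)
    if script is not None:
        return script, "explicit"
    default = SUPPRESS_SCRIPT.get(language)
    if default is not None:
        # No script subtag: fall back to the registry's Suppress-Script.
        return default, "assumed"
    return None, "unknown"


for tag in ("en", "en-Latg", "en-Thai"):
    print(tag, effective_script(tag))
```

The point of returning "assumed" separately from "explicit" is that the consumer can still tell a stated script from a defaulted one, while treating plain "en" as Latin for practical purposes.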

To project this onto the Urdu-in-Nastaliq scenario, you would need to conclude that tag consumers encountering plain "ur" would be so likely to assume "ur-Aran" (specifically Nastaliq), and so unlikely to assume "ur-Arab" (Arabic script with no indication of calligraphic style), that "ur-Arab" is a special case to be called out in the tag, sort of like "en-Thai".

> In the implausible case that I knew that something was in English but
> hadn't looked at it to determine the script, I presume I should record
> the language and script as "en-Zyyy".

I would presume that if you just didn't look at it, it should be "en". If you looked at it and literally cannot determine what the script is, either because you don't have the knowledge or resources to identify it or because it's a new Voynich or whatever, then it would be "en-Zyyy".
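
Putting the cataloguing cases from this message together, the decision could be sketched roughly as follows. Again this is only an illustration: catalogue_tag and its parameters are hypothetical names, and the table is the same one-entry excerpt for 'en', not the real registry.

```python
# Hypothetical one-entry excerpt for illustration; the real registry is far larger.
SUPPRESS_SCRIPT = {"en": "Latn"}


def catalogue_tag(language, examined, script=None, call_out_script=False):
    """Pick a tag for a catalogued text.

    examined        -- whether anyone looked at the text to determine its script
    script          -- four-letter script code if identified, else None
    call_out_script -- True when the script distinction matters, for example
                       parallel texts presented in contrast
    """
    if not examined:
        # Nobody looked: record just the language.
        return language
    if script is None:
        # Looked, but the script could not be determined.
        return language + "-Zyyy"
    if script == SUPPRESS_SCRIPT.get(language) and not call_out_script:
        # The overwhelmingly usual script: stating it is normally redundant.
        return language
    return language + "-" + script


print(catalogue_tag("en", examined=False))
print(catalogue_tag("en", examined=True))
print(catalogue_tag("en", examined=True, script="Latn"))
print(catalogue_tag("en", examined=True, script="Thai"))
print(catalogue_tag("en", examined=True, script="Latn", call_out_script=True))
```

The call_out_script flag corresponds to the "en-Latn" versus "en-Latg" case earlier in this message, where the script is named on purpose rather than reflexively.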

--
Doug Ewell | Thornton, CO, US | ewellic.org
