Re: [Ietf-languages] Suggestion to update Urdu Script Designation in the subtag registry

Doug Ewell <doug@ewellic.org> Fri, 14 August 2020 05:18 UTC

From: Doug Ewell <doug@ewellic.org>
To: 'Richard Wordingham' <richard.wordingham=40ntlworld.com@dmarc.ietf.org>, ietf-languages@ietf.org
References: <CY4PR0401MB36203305BEFEBF938B654E8FC6420@CY4PR0401MB3620.namprd04.prod.outlook.com> <000201d670e8$d25e7e60$771b7b20$@ewellic.org> <CY4PR0401MB362045E1E4D11D92E1F89443C6420@CY4PR0401MB3620.namprd04.prod.outlook.com> <001a01d670ed$9c868530$d5938f90$@ewellic.org> <f4fa9f5c-3bb6-6b27-f294-7df9e0afa3d4@w3.org> <MWHPR1301MB21120388068B8E68EB6C8DE586430@MWHPR1301MB2112.namprd13.prod.outlook.com> <000001d6719a$9c3c7b40$d4b571c0$@ewellic.org> <20200813202934.3b348a9d@JRWUBU2> <001c01d671b5$efad4e60$cf07eb20$@ewellic.org> <20200814012621.2c6a9b69@JRWUBU2>
In-Reply-To: <20200814012621.2c6a9b69@JRWUBU2>
Date: Thu, 13 Aug 2020 23:17:57 -0600
Message-ID: <000001d671fa$4618c740$d24a55c0$@ewellic.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-languages/ok9wi3zY1QhBoWvdJrUGGcGz7xI>

Richard Wordingham wrote:

>>> Then, when I came to use the catalogue, I would know that those
>>> labelled as "en-Thai" were in the Thai script, but for those
>>> labelled "en", I should be unsure of the script; it would be
>>> improper to assume that the script was the Latin script, though that
>>> would be the best **guess**.
>
>> Maybe I'm unclear about the difference between "assume" and "guess"
>> in this context.
>
> If what I did with one of them depended on the script, then if I "guess"
> I need to check the script: if I "assume", I don't check.

Um, OK.

> Now, I can't rely on the suppress-script field to assume that "en"
> means "en-Latn"; that would be an improper use of the field.  Is that
> correct?

I have the feeling I'm being set up for something, but here goes:

The Suppress-Script field on 'en' means that IF content tagged as "en" is written at all (it could be spoken, or use some other modality), it is PROBABLY written in the Latin script. You cannot be 100% certain that it is written, or that it is written in the Latin script. If you need 100% certainty, the text should be tagged as "en-Latn".

But it does not follow from this that all English text written in the Latin script should be tagged "en-Latn". Most of the time, the assumption is correct and adequate. This is why the word SHOULD is used, instead of MUST.
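To make the "assume" versus "guess" distinction concrete, here is a minimal sketch in Python of how a consumer might read this. The table SUPPRESS_SCRIPT and the function script_of are hypothetical names invented for illustration; real Suppress-Script values come from the IANA Language Subtag Registry, and the parsing below ignores grandfathered tags and does no validation.

```python
# Hypothetical one-entry excerpt; real values live in the IANA registry.
SUPPRESS_SCRIPT = {"en": "Latn"}


def script_of(tag):
    """Return (script, certainty) for a simple, well-formed BCP 47 tag.

    certainty is "explicit" when the tag carries a script subtag,
    "probable" when only a Suppress-Script default is available, and
    "unknown" otherwise. "probable" is a guess: the content might not
    be written at all, or might be in another script.
    """
    subtags = tag.split("-")
    language = subtags[0].lower()
    for subtag in subtags[1:]:
        if len(subtag) == 1:
            # A singleton starts an extension or private-use sequence;
            # no script subtag can appear after it.
            break
        if len(subtag) == 4 and subtag.isalpha():
            # Four letters after the language: a script subtag.
            return subtag.title(), "explicit"
    default = SUPPRESS_SCRIPT.get(language)
    if default is not None:
        return default, "probable"
    return None, "unknown"


print(script_of("en"))       # a default, not a guarantee
print(script_of("en-Latn"))  # stated outright
print(script_of("en-Thai"))  # stated outright, overrides the default
```

A caller whose processing depends on the script would treat "probable" as a prompt to check the content, and "explicit" as something it can rely on.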

> Surely an English text in your conscript Ewellic should be labelled
> "en-Zzzz", for 'uncoded script'.

I could do that, or I could use a script subtag in the private-use area (e.g. "en-Qabe") or a private-use subtag (e.g. "en-x-ewellic"). Each of these approaches has its pros and cons.
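The three options can be told apart mechanically. The sketch below assumes a simple, well-formed tag and relies on two facts from RFC 5646 and ISO 15924: "Zzzz" is the code for an uncoded script, and "Qaaa" through "Qabx" are the script codes reserved for private use. The function name script_kind and its return strings are invented for illustration.

```python
def script_kind(tag):
    """Classify how a simple BCP 47 tag conveys script information."""
    subtags = tag.split("-")
    for subtag in subtags[1:]:
        if subtag.lower() == "x":
            # Everything after 'x' is private use; its meaning is by
            # private agreement and cannot be inferred from the tag.
            return "private-use sequence"
        if len(subtag) == 1:
            # Some other singleton: an extension, not a script.
            break
        if len(subtag) == 4 and subtag.isalpha():
            code = subtag.title()
            if code == "Zzzz":
                return "uncoded script"
            if "Qaaa" <= code <= "Qabx":
                return "private-use script subtag"
            return "other script subtag"
    return "no script information"


for example in ("en-Zzzz", "en-Qabe", "en-x-ewellic", "en"):
    print(example, "->", script_kind(example))
```

The string comparison works because all candidates are exactly four letters in the same capitalization, so lexicographic order matches the order of the reserved range.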

> So are you saying that the script is only 'undetermined' if an attempt
> to determine its script code has failed?

That is my understanding. I welcome the opinions of others on this question. I don't think it is covered in RFC 5646, and I don't believe the subject has come up on this list before.

Do you have a concrete use case surrounding this?

> You seem to be saying that my index SHOULD NOT use a BCP 47 tag to
> record whether a text in English is in the Latin script.  On the other
> hand, it could be used to record the script of Northern Thai texts.

In your everyday life, it is not necessary under normal circumstances to point out that your car has four wheels, because the overwhelming majority of cars do.

That is how I view Suppress-Script. The distinction is not between English per se and Northern Thai per se, nor between the Latin script and other scripts.

> Would a rule that the script must be indicated somehow make a
> difference, e.g. by making plain "en" or "ur" imply that the script
> subtag had been suppressed?

A rule that the script must be indicated would be the exact opposite of what we were trying to accomplish in 2005, which is backward compatibility with the huge volume of existing language-tagged data.

Suppression of a script subtag isn't usually that kind of active thought process, like "I really want to indicate that this English text is in Latin script, but BCP 47 said I mustn't." It's more a matter of not stating that your car has four wheels, which might be obvious enough without saying so, or which might not even be relevant.

--
Doug Ewell | Thornton, CO, US | ewellic.org