Re: [Ietf-languages] Suggestion to update Urdu Script Designation in the subtag registry

Richard Wordingham <richard.wordingham@ntlworld.com> Fri, 14 August 2020 00:26 UTC

Date: Fri, 14 Aug 2020 01:26:21 +0100
From: Richard Wordingham <richard.wordingham@ntlworld.com>
To: ietf-languages@ietf.org
Message-ID: <20200814012621.2c6a9b69@JRWUBU2>
In-Reply-To: <001c01d671b5$efad4e60$cf07eb20$@ewellic.org>
References: <CY4PR0401MB36203305BEFEBF938B654E8FC6420@CY4PR0401MB3620.namprd04.prod.outlook.com> <000201d670e8$d25e7e60$771b7b20$@ewellic.org> <CY4PR0401MB362045E1E4D11D92E1F89443C6420@CY4PR0401MB3620.namprd04.prod.outlook.com> <001a01d670ed$9c868530$d5938f90$@ewellic.org> <f4fa9f5c-3bb6-6b27-f294-7df9e0afa3d4@w3.org> <MWHPR1301MB21120388068B8E68EB6C8DE586430@MWHPR1301MB2112.namprd13.prod.outlook.com> <000001d6719a$9c3c7b40$d4b571c0$@ewellic.org> <20200813202934.3b348a9d@JRWUBU2> <001c01d671b5$efad4e60$cf07eb20$@ewellic.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-languages/-rUn0lABd7n0wvshttiP69-CQYk>
Subject: Re: [Ietf-languages] Suggestion to update Urdu Script Designation in the subtag registry

On Thu, 13 Aug 2020 15:08:47 -0600
"Doug Ewell" <doug@ewellic.org> wrote:

> Richard Wordingham wrote:
> 
> >> For the OP’s original scenario of Urdu in Nastaliq, I don’t have
> >> any issue with specifying “ur-Aran” when the choice of a Nastaliq
> >> font is considered important, especially since 'Aran' already
> >> exists. It's just not what Suppress-Script is for.  
> >
> > I am now very confused.  BCP 47 says one SHOULD NOT tag text as
> > en-Latn.  

> Not unless you have a particular reason for calling out the script.
> For example, you might have two parallel English texts, one in normal
> Latin and one in the Gaelic variety. It might be appropriate to tag
> them as "en-Latn" and "en-Latg" because you want to call special
> attention to the distinction. RFC 5646, Section 4.1, item 2, first
> bullet point shows another example.

> That is not the same as reflexively tagging every piece of "en" text
> as "en-Latn", which was the concern 15 years ago.

> > So, if I were cataloguing the various texts cluttering my house and
> > recording their language and script, I would have some that were
> > English in the Latin script and some that were English in the Thai
> > script.*
> >
> > *These arise for teaching and, at risk, as semi-encrypted
> > messages.  

> And presumably the Latin-script and Thai-script texts were not
> parallel texts presented in contrast, as posited above.

Parallel texts would indeed be a rare case.

> > So, if I chose to use BCP 47 to record the language, the former
> > should be recorded as "en" rather than as "en-Latn" and the latter
> > could be recorded as "en-Thai".  

> Yes, because English is so overwhelmingly written in Latin script and
> not in Thai or other script (your personal collection
> notwithstanding) that this could be reasonably assumed for the former.

> > Then, when I came to use the catalogue, I would know that those
> > labelled as "en-Thai" were in the Thai script, but for those
> > labelled "en", I should be unsure of the script; it would be
> > improper to assume that the script was the Latin script, though
> > that would be the best **guess**.  

> Maybe I'm unclear about the difference between "assume" and "guess"
> in this context.

If what I did with one of them depended on the script, then if I "guess",
I need to check the script; if I "assume", I don't check.

Now, I can't rely on the Suppress-Script field to assume that "en"
means "en-Latn"; that would be an improper use of the field.  Is that
correct?
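
The guess/assume distinction can be made concrete with a minimal Python
sketch of how a catalogue reader might interpret a tag.  The table holds
only the registry's Suppress-Script values for "en" (Latn) and "ur"
(Arab); the function name script_of and the status labels "explicit",
"guess" and "unknown" are hypothetical, chosen for illustration only.

```python
# Illustrative subset of the registry's Suppress-Script values.
SUPPRESS_SCRIPT = {"en": "Latn", "ur": "Arab"}


def script_of(tag):
    """Return (script, status) for a simple language[-Script] tag.

    status is "explicit" when the tag carries a script subtag,
    "guess" when the script is only inferred from Suppress-Script,
    and "unknown" when there is nothing to go on.
    """
    subtags = tag.split("-")
    language = subtags[0].lower()
    # A script subtag is four letters directly after the language
    # (extlang subtags are not handled in this sketch).
    if len(subtags) > 1 and len(subtags[1]) == 4 and subtags[1].isalpha():
        return subtags[1].title(), "explicit"
    if language in SUPPRESS_SCRIPT:
        return SUPPRESS_SCRIPT[language], "guess"
    return None, "unknown"
```

A caller that treats a "guess" as settled is "assuming" in the sense
above; one that goes and inspects the text is "guessing".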

> > In the implausible case that I knew that something was in English
> > but hadn't looked at it to determine the script, I presume I should
> > record the language and script as "en-Zyyy".  

> I would presume that if you just didn't look at it, it should be
> "en". If you looked at it and literally cannot determine what the
> script is, either because you don't have the knowledge or resources
> to identify it or because it's a new Voynich or whatever, then it
> would be "en-Zyyy".

Surely an English text in your conscript Ewellic should be
labelled "en-Zzzz", for 'uncoded script'.  So are you saying that the
script is only 'undetermined' if an attempt to determine its script
code has failed?

You seem to be saying that my index SHOULD NOT use a BCP 47 tag to
record whether a text in English is in the Latin script.  On the
other hand, it could be used to record the script of Northern Thai
texts.
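
The tagging choices discussed so far can be sketched in Python as
follows.  This encodes one reading of the thread, not a rule stated in
BCP 47: the function catalogue_tag, its parameters, and the "uncoded"
marker are all hypothetical.

```python
# Illustrative subset of the registry's Suppress-Script values.
SUPPRESS_SCRIPT = {"en": "Latn", "ur": "Arab"}


def catalogue_tag(language, script=None, examined=True):
    """Choose a tag for a catalogue entry.

    script is an ISO 15924 code, the marker "uncoded" for a script
    with no code, or None if the script could not be determined.
    """
    if not examined:
        return language              # nobody looked at the text
    if script is None:
        return language + "-Zyyy"    # looked, script undetermined
    if script == "uncoded":
        return language + "-Zzzz"    # script has no ISO 15924 code
    if SUPPRESS_SCRIPT.get(language) == script:
        return language              # script subtag suppressed
    return language + "-" + script
```

Under these rules catalogue_tag("en", "Latn") and
catalogue_tag("en", examined=False) both yield plain "en", so the tag
alone cannot say which case applies.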

Would it make a difference if my index had a rule that the script must
always be indicated somehow, so that plain "en" or "ur" implied that the
script subtag had been suppressed?

Richard.