Re: [Ietf-languages] Suggestion to update Urdu Script Designation in the subtag registry

Mark Davis ☕️ <mark@macchiato.com> Thu, 13 August 2020 16:53 UTC

References: <CY4PR0401MB36203305BEFEBF938B654E8FC6420@CY4PR0401MB3620.namprd04.prod.outlook.com> <000201d670e8$d25e7e60$771b7b20$@ewellic.org> <CY4PR0401MB362045E1E4D11D92E1F89443C6420@CY4PR0401MB3620.namprd04.prod.outlook.com> <001a01d670ed$9c868530$d5938f90$@ewellic.org> <f4fa9f5c-3bb6-6b27-f294-7df9e0afa3d4@w3.org> <CAE=3Ky-ZR1py3+Ok1i+YjDR-WUH1Q=0bahZhcAC_Y+i+xc80Cw@mail.gmail.com> <20200813150859.0309f1b0@JRWUBU2>
In-Reply-To: <20200813150859.0309f1b0@JRWUBU2>
From: Mark Davis ☕️ <mark@macchiato.com>
Date: Thu, 13 Aug 2020 09:52:58 -0700
Message-ID: <CAJ2xs_GOmtJ7W4GHQWoG2F-OX+SGe1yxnswOWk+yFwCr9thM2g@mail.gmail.com>
To: Richard Wordingham <richard.wordingham=40ntlworld.com@dmarc.ietf.org>
Cc: ietf-languages@ietf.org
Content-Type: multipart/alternative; boundary="000000000000ba2ee505acc52360"
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-languages/hJmmIDmz7tv-Qn6cJ2DGqz4L7bc>
Subject: Re: [Ietf-languages] Suggestion to update Urdu Script Designation in the subtag registry

CLDR does have likely subtags, which provide default interpretations for
incomplete language tags.

https://unicode-org.github.io/cldr-staging/charts/38/supplemental/likely_subtags.html

For Hausa, the default script varies by region.
https://unicode-org.github.io/cldr-staging/charts/38/supplemental/likely_subtags.html#ha
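
As a rough illustration of how such a table can fill in an incomplete tag, here is a minimal Python sketch. It is a simplification, not the full algorithm from the CLDR specification: the function names are hypothetical, the lookup order is reduced to four steps, and the three Hausa entries are an illustrative excerpt that should be checked against the chart above.

```python
# Illustrative excerpt only (an assumption for this sketch); the real
# table is CLDR's likely-subtags data, which is far larger.
LIKELY_SUBTAGS = {
    "ha": "ha-Latn-NG",
    "ha-CM": "ha-Arab-CM",
    "ha-SD": "ha-Arab-SD",
}


def parse(tag):
    """Split a simple tag into (language, script, region).

    Handles only language, 4-letter script, and 2-letter or 3-digit
    region subtags; variants and extensions are ignored.
    """
    parts = tag.replace("_", "-").split("-")
    lang = parts[0].lower()
    script = region = None
    for p in parts[1:]:
        if len(p) == 4 and p.isalpha():
            script = p.title()
        elif (len(p) == 2 and p.isalpha()) or (len(p) == 3 and p.isdigit()):
            region = p.upper()
    return lang, script, region


def add_likely_subtags(tag):
    """Fill in a missing script and/or region from the table above."""
    lang, script, region = parse(tag)
    # Try the most specific key first, then progressively less specific.
    candidates = [
        (lang, script, region),
        (lang, None, region),
        (lang, script, None),
        (lang, None, None),
    ]
    for cand in candidates:
        key = "-".join(p for p in cand if p)
        match = LIKELY_SUBTAGS.get(key)
        if match:
            _, m_script, m_region = parse(match)
            # Subtags already present in the input are never overridden.
            filled = (lang, script or m_script, region or m_region)
            return "-".join(p for p in filled if p)
    return tag  # no data: leave the tag unchanged


print(add_likely_subtags("ha"))       # ha-Latn-NG
print(add_likely_subtags("ha-CM"))    # ha-Arab-CM
print(add_likely_subtags("ha-Latn-CM"))  # ha-Latn-CM
```

With this excerpt, a bare "ha" defaults to Latin script in Nigeria, while "ha-CM" and "ha-SD" default to Arabic script, which is the region dependence noted above.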


Richard, these are all 'best available data', and are continually being
refined. Getting accurate data is especially a problem for longer-tail
languages. Where there are howlers, we'd like to know about them. Saurashtra
is still an issue (see the above page), but I see no bug report containing
the term "Saurashtra" except for the following two, which appear unrelated.


CLDR-11994: Review localeId display names
<https://unicode-org.atlassian.net/browse/CLDR-11994>

CLDR-9408: Review and fix primarily spoken languages
<https://unicode-org.atlassian.net/browse/CLDR-9408>

https://unicode-org.atlassian.net/issues/?filter=-2&jql=text~Saurashtra%20and%20project%3Dcldr%20and%20status%20!%3Ddone

Could you file a bug report for Saurashtra and any other cases you know of?

Mark


On Thu, Aug 13, 2020 at 7:09 AM Richard Wordingham <richard.wordingham=40ntlworld.com@dmarc.ietf.org> wrote:

> On Thu, 13 Aug 2020 14:49:27 +0200
> Hugh Paterson III <sil.linguist@gmail.com> wrote:
>
> > ri,
> >
> > In Nigeria Hausa can also be written with the Latin script. Where can
> > I go to find what the basic default assumptions are for a language
> > tag? Is the default always Latin?
>
> Unless there is a suppress-script tag for the language, the default is
> not normatively defined.  The ICU database, CLDR, seems to have
> empirical fallback rules, but they're not trustworthy.  They have
> (have had?) such howlers as the Saurashtra script being the default for
> the Saurashtra language.  (The proposal for the script's encoding
> noted that speakers usually used a different script for their
> language.  I have reported the howler.)  It's a lot of work to dig this
> information out, and sane fallback might even depend on the context, as
> I've noticed in the choice of digits for page numbering in Thai -
> magazines use (Western) Arabic digits, while the government sponsored
> dictionary uses Thai digits. The Northern Thai nameplates in a Chiang
> Mai university will use the Tai Tham script, but if you're trying to
> spread the word of God, you'll use the Thai script with Bangkok spelling
> rules.  The recent New Testament translation into Northern Thai used
> both scripts because of the competing requirements. (The two NT versions
> correspond very exactly.) Most Northern Thai speakers can function in
> Thai.
>
> Richard.