Re: [Ietf-languages] Between language and script in Burmese

r12a <ishida@w3.org> Fri, 19 November 2021 12:50 UTC

To: Simon Cozens <simon@simon-cozens.org>
Cc: Martin Hosken <martin_hosken@sil.org>, Peter Constable <pgcon6@msn.com>, ietf-languages@ietf.org, "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-languages/hVz6RdKB2-wTHif_388Z8duRAhQ>

fwiw, here are some thoughts for my 2p:

It seems to me that the less you have to remember that a particular 
combination of tags means x or y, the better.  If we can use tags that 
are easy to understand and guess without having to remember special 
rules or conventions, then we're much better off.

I think the language tag should always reflect the actual language of
the text, regardless of script or orthography. Labelling something as
shn when it's not will mess up things like voice browsers, spell
checkers, and possibly layout, OpenType features, etc.

We actually appear to have a few script tags that are in fact
orthography tags, such as Aran, Syrn, and Syrj (see
https://r12a.github.io/app-subtags/?lookup=aran,syrn,syrj) but it's
never been clear to me why that is a good idea (except for Hant and
Hans, but the situation is somewhat different there).  I'm inclined to
think that script tags should just be distinctive at the script level.

I'm also wary of the idea of using region tags to specify a particular
orthography (like we used to with zh-TW and zh-CN), unless you really
want to identify the orthography only on the basis of a particular
regional standard (like en-GB), because (a) the usage may not be cleanly
defined by a region tag (e.g. things like Azeri, Kurdish, etc., or use of
zh-CN for Singapore) and (b) it may be difficult to know/remember which
region tag should be used. These relationships with region may also
change in the future, as people migrate or countries mutate.

On the other hand, we have *loads* of variant tags that are related to
specific orthographies. See
https://r12a.github.io/app-subtags/?find=ortho which lists around 50.

So I'm inclined to think that perhaps the solution is to submit a
request for new orthographic variant subtags.
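
To make the shape of that concrete, here is a rough Python sketch of how
a tag splits into language, script, region and variant subtags, so that
the orthography sits in the variant slot and the language subtag stays
honest. It only handles the simple language[-script][-region][-variant...]
shape (no extlang, extensions or private use). The function name
split_tag is made up for illustration, and "shanorth" is a purely
hypothetical variant subtag, not something in the registry; "1996" is a
real registered variant for German orthography.

```python
import re

def split_tag(tag):
    """Split a simple language tag into its main subtags.

    Assumption: only language[-script][-region][-variant...] is
    handled; extlang, extensions, private use and grandfathered
    tags are out of scope for this sketch.
    """
    parts = tag.split("-")
    result = {
        "language": parts[0].lower(),
        "script": None,
        "region": None,
        "variants": [],
    }
    rest = parts[1:]
    # Script subtags are exactly four letters (e.g. Mymr, Hant).
    if rest and re.fullmatch(r"[A-Za-z]{4}", rest[0]):
        result["script"] = rest.pop(0).title()
    # Region subtags are two letters or three digits (e.g. TW, 419).
    if rest and re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", rest[0]):
        result["region"] = rest.pop(0).upper()
    # Variants are 5-8 alphanumerics, or 4 characters starting with a digit.
    for sub in rest:
        if re.fullmatch(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}", sub):
            result["variants"].append(sub.lower())
        else:
            raise ValueError(f"unexpected subtag: {sub!r}")
    return result

# Registered variant: German in the 1996 orthography.
print(split_tag("de-1996"))
# Hypothetical variant, for illustration only: Pali in a Shan orthography.
print(split_tag("pi-Mymr-shanorth"))
```

In both cases the language subtag still says what language the text is
in, and a spell checker or voice browser can ignore the variant if it
doesn't know it.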

ri





Martin J. Dürst wrote on 19/11/2021 10:21:
> Hello Simon, others,
>
> On 2021-11-19 17:10, Simon Cozens wrote:
>
>> But Shan Pali is a useful case because it does ask the question of 
>> whether something like Shan - and all these variant forms of Burmese 
>> orthography - is a language or a script (or a script variant). Some 
>> of them diverge quite strongly from the standard Burmese letterforms 
>> but IETF considers them all Mymr. I’d question that.
>
> It's not the IETF that decides on script codes. The IETF just pulls 
> together various ISO standards into a single language tag, and has 
> some escape hatches such as variant subtags when the ISO standards are 
> not enough.
>
> Regards,   Martin.