Re: [Ietf-languages] BCP47 violation in the recent extlang ajp change

Hugh Paterson III <sil.linguist@gmail.com> Tue, 28 March 2023 21:26 UTC

MIME-Version: 1.0
References: <871qlbvbgo.fsf@gmail.com> <PH0PR03MB6606F4AFF9773C419EF09401CA8A9@PH0PR03MB6606.namprd03.prod.outlook.com> <7f1e4b72-fc25-bec8-9e36-cbffbdd6eeda@it.aoyama.ac.jp> <CAE=3Ky_fi5ixQp+gYJeV4kYAtr0BmTbRgiaLDUJLDAk8AE-xjA@mail.gmail.com> <SJ0PR03MB65981CE49906B746C69BCD03CA889@SJ0PR03MB6598.namprd03.prod.outlook.com>
In-Reply-To: <SJ0PR03MB65981CE49906B746C69BCD03CA889@SJ0PR03MB6598.namprd03.prod.outlook.com>
From: Hugh Paterson III <sil.linguist@gmail.com>
Date: Tue, 28 Mar 2023 14:25:47 -0700
Message-ID: <CAE=3Ky8R3mKWn+JRnPo-V5m+g+W=jsQ+_T2g3xBS6Ct-p26MNg@mail.gmail.com>
To: Doug Ewell <doug@ewellic.org>
Cc: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, Christian Despres <christian.j.j.despres@gmail.com>, "ietf-languages@ietf.org" <ietf-languages@ietf.org>
Content-Type: multipart/alternative; boundary="0000000000006ea36c05f7fc8149"
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-languages/HWl2Is0erotKUF3KJp5mWfMIG0E>
Subject: Re: [Ietf-languages] BCP47 violation in the recent extlang ajp change
List-Id: "Review of requests for language tag registration according to BCP 47 \(RFC 4646\)" <ietf-languages.ietf.org>

If -t and -u are not to be included in BCP-47 (as reserved entities),
because they ought to be independently manageable (which I assume is the
purpose for a separate RFC), then should the -x extension be kicked out to
its own RFC? Why should it be included in RFC5646? Wouldn't the same
reasons for inclusion or exclusion exist for all singletons? (§2.2/3.7).

Without linking and registration of the various RFC documents defining
singletons, doesn't it mean that the only singleton a parser can depend on
when implementing BCP-47 is -x-, as that is the only one acknowledged?
Different communities could then define their own singletons, and those
definitions could clash. For example, I could create an organization called
the Open Language Advocacy Community and define a set of singletons, some
of which may clash with RFC6067 and RFC6497. Presumably nothing is stopping
some of these other singletons from becoming registered as RFCs. This
situation may create confusion for parsers that expect -u and -t to be
defined by RFC6067 and RFC6497 respectively.
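
To make the parser concern concrete, here is a minimal sketch of how an
implementation might pick the singletons out of a tag and look each one up
in a table of definitions it knows about. The function names and the table
are hypothetical, written for illustration only; the table lists just the
three singletons mentioned in this thread, and the splitting logic is a
simplification rather than a full BCP-47 parser.

```python
# Hypothetical table: the singletons named in this thread and where each
# is defined. 'x' (private use) is defined by RFC 5646 itself.
KNOWN_SINGLETONS = {
    "u": "RFC 6067",
    "t": "RFC 6497",
    "x": "RFC 5646 (private use)",
}


def find_singletons(tag):
    """Return the singleton subtags of `tag` in order of appearance.

    Simplified on purpose: the tag is not validated, and a leading
    one-character subtag other than 'x' (as in grandfathered tags such
    as 'i-klingon') is skipped rather than interpreted.
    """
    subtags = tag.lower().split("-")
    found = []
    for index, subtag in enumerate(subtags):
        if len(subtag) != 1:
            continue
        if index == 0 and subtag != "x":
            continue  # not an extension singleton; not handled here
        found.append(subtag)
        if subtag == "x":
            # Everything after -x- is private use and may itself contain
            # one-character subtags, so stop scanning.
            break
    return found


def classify_singletons(tag):
    """Map each singleton in `tag` to its known definition, or None."""
    return {s: KNOWN_SINGLETONS.get(s) for s in find_singletons(tag)}


if __name__ == "__main__":
    for example in ("en-US-u-ca-gregory", "ja-t-it-x-foo-a", "de-a-value"):
        print(example, classify_singletons(example))
```

A singleton that maps to None (the -a- in the last example) is one whose
semantics this parser cannot know from its own table, which is the case
the paragraph above is worried about.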

Maybe the more pertinent architectural approach is to have a database of
singletons where registration is required, like the IANA database. On my
first read of §2.2 and §3.7, I understood there to be only a single
possible universe of singletons. Should I instead understand the current
architecture as a multiverse of infinite options with regard to the
semantics of any singleton other than -x-, with the caveat that in each
world of the multiverse -x- will in fact have the same semantics?

Kind Regards,
Hugh


On Tue, Mar 28, 2023 at 1:55 PM Doug Ewell <doug@ewellic.org> wrote:

> Hugh Paterson III wrote:
>
> > 1. Any larger BCP-47 effort for revision should likely include
> > incorporating the Unicode Extensions, RFC6067 and RFC6497:
> >
> > https://cldr.unicode.org/index/bcp47-extension
> > http://www.unicode.org/reports/tr35/#36-unicode-bcp-47-u-extension
>
> The T and U extensions are quite properly described in separate RFCs, as
> specified in Section 3.7 of RFC 5646. Folding them into an RFC 5646bis
> would place them on a different definitional level from any future
> extensions that might be created, which would be a Bad Thing™.
>
> I assume the purpose in doing so would be to call better attention to the
> extensions, since RFC 5646 itself predates them and so cannot refer to
> them. The Wikipedia page on BCP 47 already describes the extensions and
> links to the extension RFCs, as does my page; perhaps the maintainers of
> other sites that link to RFC 5646 could be persuaded to link to those
> documents as well (and also to RFC 4647).
>
> > 2. The issue with using Glottolog codes is specifically that ISO 639-3
> > is scoped at the language level, while the informatic model of the
> > Glottolog is specifically scoped to the document level, rather than
> > the language level. That is, Glottolog codes can span the classes of
> > macro-language, idiolect, dialect, or hypothetical reconstructed
> > language.  The two scopes are inconsistent with regard to purpose. If
> > the purpose of the positions of the constructed BCP-47 tags is to stay
> > consistent, I don't see how Glottolog codes become a possibility
> > except after -x- or some other yet-to-be defined identifier.
>
> I agree wholeheartedly, but one does occasionally hear suggestions to do
> this. It’s almost as if 8,000 language subtags, plus variants and the
> ability to register more, plus private-use sequences, weren’t enough.
>
> --
> Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org
>
>