Re: [Ietf-languages] Adding prefixes with dialect variants to Occitan orthographic variants

Mark Davis ☕️ <mark@macchiato.com> Mon, 19 April 2021 21:16 UTC

From: Mark Davis ☕️ <mark@macchiato.com>
Date: Mon, 19 Apr 2021 14:15:49 -0700
Message-ID: <CAJ2xs_Ewa-XoHYPLoZSgP_Ov7_JA-bqhbVU4sJdbmznLoEtd=Q@mail.gmail.com>
To: Doug Ewell <doug@ewellic.org>
Cc: "ietf-languages@iana.org" <ietf-languages@iana.org>, info@locongres.org, b.dazeas@locongres.org, "Phillips, Addison" <addison=40lab126.com@dmarc.ietf.org>, David Mediavilla <nkd595qbd4@liamekaens.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-languages/3O7Rr1xLH5Zs6Dd4GBZkhH3r4j4>
Subject: Re: [Ietf-languages] Adding prefixes with dialect variants to Occitan orthographic variants

See also the message in the other thread.

Mark


On Mon, Apr 19, 2021 at 12:49 PM Doug Ewell <doug@ewellic.org> wrote:

> Mark Davis wrote:
>
> > However, since that time, truncation is far less of a concern, and the
> > bigger issue is that BCP47 doesn't provide a well-defined canonical
> > order of all prefixes. Without that, comparison of language tags is
> > fundamentally flawed.
>
> I agree with Addison that an ordering mechanism does exist, between the
> text of BCP 47 (Section 4.1, item 6) and the Prefix values in the Registry.
> There may be some gaps, making the mechanism less than perfect
> mathematically.
>

Sorting algorithms, for example, depend on mathematical correctness: they
need a well-defined total order.


>
> > So CLDR extends BCP47 to simply order the variants alphabetically. Any
> > application that cares about variants (and there are very few) can
> > treat those variants they care about simply as a set. Thus both
> > oc-gascon-phonipa and oc-phonipa-gascon are interpreted by CLDR
> > as [language=oc, variantSet={gascon, phonipa}].
>
> However, processes that use BCP 47 but not CLDR need to work with what is
> in BCP 47, and that includes adding Prefix values to guide the use of
> multiple variants.
>

A general-purpose implementation that depends on an ordering of variants
that is not *required* by BCP47 is a fragile implementation. And note that
the canonicalization process in BCP47 does not put the variant tags in a
particular order. The incoming language tags sl-rozaj-biske and
sl-biske-rozaj are both equally valid.
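
To make the set-based treatment concrete, here is a minimal Python sketch.
It is an illustration under simplifying assumptions, not CLDR's actual
code: the helper names (variant_key, normalize) are hypothetical, it
handles only tags of the form language[-script][-region][-variant...],
rejects extensions and private use, and lowercases everything rather than
applying the recommended casing.

```python
import re

# Shape of a variant subtag in BCP 47: 5-8 alphanumerics, or a digit
# followed by 3 alphanumerics. Script (4 letters) and region (2 letters
# or 3 digits) subtags never match this pattern.
_VARIANT = re.compile(r"^(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3})$")


def variant_key(tag):
    """Return (non-variant subtags, set of variants) for a simple tag."""
    subtags = tag.lower().split("-")
    base, variants = [subtags[0]], []  # first subtag is the language
    for sub in subtags[1:]:
        if len(sub) == 1:
            # Singletons introduce extensions or private use.
            raise ValueError("not handled in this sketch: " + tag)
        (variants if _VARIANT.match(sub) else base).append(sub)
    return tuple(base), frozenset(variants)


def normalize(tag):
    """Rebuild the tag with its variants in alphabetical order."""
    base, variants = variant_key(tag)
    return "-".join(list(base) + sorted(variants))


# Both orders compare equal once the variants are treated as a set.
print(variant_key("sl-rozaj-biske") == variant_key("sl-biske-rozaj"))
print(normalize("oc-phonipa-gascon"))
```

Comparing the keys, or the normalized strings, gives the same answer
whichever order the incoming tag used, so nothing depends on an ordering
that BCP47 does not require.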

>
> >> on which combinations might be valid
> >
> > I know what you mean there, but this would be better phrased as "on
> > which combinations are useful". All combinations of registered variant
> > subtags are valid; it is just that some of them are pointless.
>
> Yes, I was imprecise in my use of "valid", which is a term of art and
> needs to be used precisely. I meant combinations that would be "meaningful"
> in Occitan or have a snowball's chance of existing in non-artificial text
> samples, not "allowable" by the rules of BCP 47 or any other specification.
>
> --
> Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org
>
>
>