Re: [Ietf-languages] Fwd: I-D Action: draft-msporny-d-langtag-ext-00.txt

Mark Davis ☕️ <mark@macchiato.com> Mon, 27 May 2019 19:07 UTC

References: <155881874982.30992.4869767614014356043@ietfa.amsl.com> <49b6a1de-e016-514f-90e4-24703b5818d2@it.aoyama.ac.jp> <63b4f786-8b44-ecdf-ed33-ff0567ecc839@digitalbazaar.com> <000001d51425$a48ac140$eda043c0$@ewellic.org> <CAJ2xs_EwKg3Tu5etk-ELXXd0u2Go-6TZbGm3QsBxV1upKTa8_g@mail.gmail.com> <000001d514ae$b15bdbf0$141393d0$@ewellic.org>
In-Reply-To: <000001d514ae$b15bdbf0$141393d0$@ewellic.org>
From: Mark Davis ☕️ <mark@macchiato.com>
Date: Mon, 27 May 2019 21:07:28 +0200
Message-ID: <CAJ2xs_GiwkqHPxsoW91ZbA82o1oosXNb=Hm2XOuKuEkMMcNBhA@mail.gmail.com>
To: Doug Ewell <doug@ewellic.org>
Cc: "Phillips, Addison" <addison@lab126.com>, IETF Languages Discussion <ietf-languages@iana.org>, "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, Manu Sporny <msporny@digitalbazaar.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-languages/2Zet-XPu_8rTMTZTf2hbFoARLSo>
Subject: Re: [Ietf-languages] Fwd: I-D Action: draft-msporny-d-langtag-ext-00.txt

Replies inline below.


On Mon, May 27, 2019 at 7:07 PM Doug Ewell <doug@ewellic.org> wrote:

> Mark Davis wrote:
>
> > 1. So from the tag "ar-Arab", we get the script "Arab". Then use
> > https://github.com/unicode-org/cldr/blob/master/common/properties/scriptMetadata.txt,
> > which has a mapping from script to direction (RTL=YES). (I'm pointing
> > to trunk, just so people can read the file easily; one would use the
> > latest release.)
>
> Yes, all right, if you are a text process that renders or formats text,
> but you don't know that the Arabic script is RTL, then fine, use CLDR to
> tell you that. OK.
>
> > 2. But let's suppose that you have just "ar". Since the script is not
> > explicit, the best way to get it is also CLDR. You can use
> > https://github.com/unicode-org/cldr/blob/master/common/supplemental/likelySubtags.xml,
> > which has a mapping from language or language+region to default
> > language-script-region. So "ar" => "ar_Arab_EG", from which we get the
> > script "Arab", and then use step 1. Or from "fr" you'd get "Latn" and
> > map it to RTL=NO.
>
> You don't need CLDR for this step. The BCP 47 Language Subtag Registry
> could have told you that:
>
> Type: language
> Subtag: ar
> Description: Arabic
> Added: 2005-10-16
> Suppress-Script: Arab      ← THIS
> Scope: macrolanguage
>
> I mean, jeez, that's why we invented Suppress-Script in the first place.
> (Well, sort of. We invented it to go the other way, when you're *creating*
> a tag, so when you're tempted to write "ar-Arab" you can stop and just
> write "ar" instead. But this scenario works just as well.)


> If you have a more complex matching scenario in mind, involving region and
> variant subtags, or crossing over from Breton to French or whatever, then
> go ahead and use CLDR. Or, I suppose, you can also use it here if you
> already didn't know Arabic was a RTL script in #1 and you had to have CLDR
> available for that anyway.
>
> This is a lot like RFC 4647 matching: it isn't good enough for all
> scenarios, so you might need customized or more ambitious matching
> algorithms for those, but for many simpler cases it's just fine. But it
> DOES exist and there's no need to pretend it doesn't and invent another
> basic scheme.
>

There are Suppress-Script values for 134 languages, while there are 1,320
language-script values in CLDR. CLDR also covers cases where the best script
depends on both the language and the region (e.g. az-IQ).
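
For comparison, the Suppress-Script lookup amounts to reading one field
from a registry record. The following is a minimal sketch, assuming the
record text quoted above (without the arrow annotation); the function
names are hypothetical, and folded lines and repeated fields such as
multiple Description entries are ignored.

```python
# Sketch only: reads a single BCP 47 registry record given as text.
# RECORD is the record quoted in this thread; parse_record and
# suppress_script are illustrative names, not part of any library.
RECORD = """\
Type: language
Subtag: ar
Description: Arabic
Added: 2005-10-16
Suppress-Script: Arab
Scope: macrolanguage
"""


def parse_record(text):
    """Return the record's fields as a dict (last value wins on repeats)."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


def suppress_script(record_text):
    """Return the Suppress-Script value, or None if the record has none."""
    return parse_record(record_text).get("Suppress-Script")
```

A record without a Suppress-Script field yields None, which is the
coverage gap at issue here.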

If someone wants to limit their usage to what's in Suppress-Script, it is
up to them. But my advice (which is what I was providing) would be to use
the more complete set in CLDR.
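
The two-step CLDR lookup described above (tag to likely script, then
script to direction) can be sketched roughly as follows. The tables are
small hand-copied samples for illustration, not the real data; an actual
implementation would load likelySubtags.xml and scriptMetadata.txt from
the latest CLDR release. All function names are hypothetical, and the
tag splitting handles only simple language[-script][-region] tags.

```python
# Illustrative excerpts only; assume the real tables come from CLDR.
LIKELY_SUBTAGS = {
    "ar": "ar_Arab_EG",
    "fr": "fr_Latn_FR",
    "az": "az_Latn_AZ",
    "az_IQ": "az_Arab_IQ",  # script depends on language AND region
}
SCRIPT_IS_RTL = {"Arab": True, "Latn": False}


def split_tag(tag):
    """Split a simple tag into (language, script, region).

    Variants, extensions, and private-use subtags are ignored.
    """
    parts = tag.replace("_", "-").split("-")
    language = parts[0].lower()
    script = None
    region = None
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha() and script is None:
            script = part.title()
        elif region is None and (
            (len(part) == 2 and part.isalpha())
            or (len(part) == 3 and part.isdigit())
        ):
            region = part.upper()
    return language, script, region


def likely_script(tag):
    """Return the explicit script, else the likely script, else None."""
    language, script, region = split_tag(tag)
    if script:
        return script
    keys = [f"{language}_{region}"] if region else []
    keys.append(language)
    for key in keys:
        if key in LIKELY_SUBTAGS:
            return LIKELY_SUBTAGS[key].split("_")[1]
    return None


def direction(tag):
    """Return "rtl", "ltr", or None when the script is unknown."""
    script = likely_script(tag)
    if script not in SCRIPT_IS_RTL:
        return None
    return "rtl" if SCRIPT_IS_RTL[script] else "ltr"
```

With these sample tables, direction("ar") and direction("az-IQ") both
resolve to "rtl", while direction("fr") and direction("az") resolve to
"ltr".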


>
> > It isn't that Arabic would be displayed left to right, it is what
> > establishes the paragraph ordering. The problem arises when you have
> > mixed text. Look at the following example, using the convention that
> > lowercase = English and uppercase=Arabic. The majority of the text and
> > the first strong character are both English, but the sentence is meant
> > to be used in an Arabic environment, so the default paragraph
> > embedding level needs to be RTL.
>
> It still doesn't sound as though inserting 'ltr' or 'rtl' into the
> language tag would solve this problem. We have Unicode bidi controls for
> this. (I know HTML has its own tags for this, which are preferred in that
> context, but the GitHub thread indicates this isn't mostly about HTML.)
>
It wouldn't, and as I said (how did you miss this?), I agree that having
-d- is a bad idea.

What I was explaining was that "what rendering process is likely to display
Arabic left-to-right?" is not the right issue. You don't need to know the
paragraph embedding level to display pure Arabic. If you are curious about
the Unicode Bidirectional Algorithm, you can consult the spec (UAX #9) or,
better yet, the W3C page on it.
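
To illustrate why the paragraph direction matters only for mixed text,
here is a simplified sketch of a first-strong heuristic. It is an
assumption-laden approximation: it ignores isolates and embeddings, and
the function name is hypothetical. It uses Python's standard
unicodedata.bidirectional() to read each character's bidi class.

```python
import unicodedata

RLM = "\u200f"  # RIGHT-TO-LEFT MARK, an invisible strong RTL character


def first_strong_direction(text):
    """Guess paragraph direction from the first strong character.

    Simplified: does not skip isolate or embedding sequences.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return None  # no strong character found


# Mostly English text ending in an Arabic word ("\u0639\u0631\u0628\u064a").
mixed = "english text then \u0639\u0631\u0628\u064a"
pure_arabic = "\u0639\u0631\u0628\u064a"
```

Here the heuristic guesses "rtl" for pure_arabic without any outside
help, but guesses "ltr" for mixed even if the sentence is meant for an
Arabic environment. Supplying the direction out of band, or prefixing a
bidi control such as RLM, changes the result for the mixed case.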


> >> 4. Scripts exist in other directionalities besides LTR and RTL...
> >
> > While this is true, for the [v]ast majority of cases, LTR and RTL are
> > the important issues. Most computer systems don't really handle
> > vertical natively; one needs to have more specialized text processing
> > systems, and that is not, I imagine, the target for this syntax.
>
> Today, yes. N years from now, when these specialized text processing
> systems become mainstream, maybe it will be a different story. You don't
> want to lock out that possibility by hard-coding the set of allowable
> values into the RFC for all time.
>

Again, -d- is a bad idea anyway, whether or not there is a registry and
whether or not there are more values at the outset. There are bigger fish
to fry.

>
> > I don't see that there is any reason to approve it, given that it is,
> > as far as I can tell, completely unnecessary and would just complicate
> > implementers' lives to no good end.
>
> Agreed, obviously.
>
> What we need to do is get together with the folks on the GitHub thread and
> explain the situation to them, how this proposed solution is neither
> necessary nor sufficient, and show them the right way(s) to do what they
> need.
>

I think Addison and Manu will probably be taking feedback from this list
back into the GitHub thread.


> --
> Doug Ewell | Thornton, CO, US | ewellic.org