Re: [Ietf-languages] Fwd: I-D Action: draft-msporny-d-langtag-ext-00.txt

Mark Davis ☕️ <mark@macchiato.com> Mon, 27 May 2019 13:53 UTC

MIME-Version: 1.0
References: <155881874982.30992.4869767614014356043@ietfa.amsl.com> <49b6a1de-e016-514f-90e4-24703b5818d2@it.aoyama.ac.jp> <63b4f786-8b44-ecdf-ed33-ff0567ecc839@digitalbazaar.com> <000001d51425$a48ac140$eda043c0$@ewellic.org>
In-Reply-To: <000001d51425$a48ac140$eda043c0$@ewellic.org>
From: Mark Davis ☕️ <mark@macchiato.com>
Date: Mon, 27 May 2019 15:52:23 +0200
Message-ID: <CAJ2xs_EwKg3Tu5etk-ELXXd0u2Go-6TZbGm3QsBxV1upKTa8_g@mail.gmail.com>
To: Doug Ewell <doug@ewellic.org>, "Phillips, Addison" <addison@lab126.com>
Cc: Manu Sporny <msporny@digitalbazaar.com>, "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, IETF Languages Discussion <ietf-languages@iana.org>
Content-Type: multipart/alternative; boundary="0000000000005265510589dedc51"
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-languages/PEVFPGwcmSixlOKvcOja_hBZcDY>
Subject: Re: [Ietf-languages] Fwd: I-D Action: draft-msporny-d-langtag-ext-00.txt

Doug, I agree with most of the points you are making, especially #1.

I think what they are trying to do is shoehorn in a parameter that lets
them set the paragraph embedding level (https://unicode.org/reports/tr9/#BD4)
for the Bidi Algorithm. But instead of that hack, as you point out, one can
deduce the direction from the language tag. The best way to do this is to
get the ordering from CLDR for an ordinary language tag like "ar" or
"ar-Arab".

1. So from the tag "ar-Arab", we get the script "Arab". Then use
https://github.com/unicode-org/cldr/blob/master/common/properties/scriptMetadata.txt,
which has a mapping from script to direction (RTL=YES). (I'm pointing to
trunk, just so people can read the file easily; one would use the latest
release.)

2. But let's suppose that you have just "ar". Since the script is not
explicit, the best way to get it is also CLDR. You can use
https://github.com/unicode-org/cldr/blob/master/common/supplemental/likelySubtags.xml,
which has a mapping from language or language+region to default
language-script-region. So "ar" => "ar_Arab_EG", from which we get the
script "Arab", and then use step 1. Or from "fr" you'd get "Latn" and map
it to RTL=NO.
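
As a rough sketch of steps 1 and 2 (not CLDR's actual API): the two
dictionaries below are tiny hand-copied stand-ins for likelySubtags.xml and
scriptMetadata.txt, and the names script_of and direction are hypothetical.
A real implementation would load the data from the latest CLDR release and
would also try language+region before falling back to language alone.

```python
# Illustrative excerpts only (assumption): real code would parse the CLDR
# files likelySubtags.xml and scriptMetadata.txt instead of hard-coding.
LIKELY_SUBTAGS = {
    "ar": "ar_Arab_EG",
    "fr": "fr_Latn_FR",
    "he": "he_Hebr_IL",
}
SCRIPT_IS_RTL = {"Arab": True, "Hebr": True, "Latn": False}


def script_of(tag):
    """Return the explicit script subtag of a language tag, or None."""
    subtags = tag.replace("_", "-").split("-")
    for sub in subtags[1:]:
        if len(sub) == 1:
            # A singleton starts an extension or private-use sequence.
            break
        if len(sub) == 4 and sub.isalpha():
            return sub.title()
    return None


def direction(tag):
    """Return "rtl", "ltr", or None if the tables cannot decide."""
    script = script_of(tag)                       # step 1: explicit script
    if script is None:                            # step 2: likely subtags
        language = tag.replace("_", "-").split("-")[0].lower()
        likely = LIKELY_SUBTAGS.get(language)
        if likely is None:
            return None
        script = script_of(likely)
    is_rtl = SCRIPT_IS_RTL.get(script)
    if is_rtl is None:
        return None
    return "rtl" if is_rtl else "ltr"
```

With these tables, direction("ar-Arab") and direction("ar") both give "rtl",
and direction("fr") gives "ltr".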

A few more comments below.

On Mon, May 27, 2019 at 2:47 AM Doug Ewell <doug@ewellic.org> wrote:

> Manu Sporny wrote:
>
> > There is a time pressure here. Our i18n concerns have been hanging
> > out there for more than 9 months and our WG charter is up in a couple
> > of months. We need to wrap this up in 3 weeks. Or to put it another
> > way, if we don't wrap this up in 3 weeks, we won't be addressing this
> > issue, which would be a shame.
>
> I know it is flippant to say "that's not our problem," and I apologize in
> advance for that, but trying to push through this extension quickly,
> without consulting or even notifying the language-tagging community, does
> not seem to me an appropriate way to compensate for this lapse. It was only
> by chance that Martin happened to spot this I-D and was able to bring it to
> our attention.
>
> Apparently Addison did know about this effort, and is credited in the
> Acknowledgements section of the I-D, but it would be nice if the author(s)
> of an extension proposal would check in with ietf-languages as part of
> their effort. RFC 5646 does not require this; I wish it did. The IETF at
> large and W3C are not experts in this field, and probably will not be able
> to detect significant operational problems in such a proposal.
>
> > In any case, if you're going to engage in this discussion, the issue
> > #3 above is probably the place to do it.
>
> I believe THIS LIST is the place to discuss this I-D. (Definitely not on
> some GitHub account.)
>
> I have other questions and/or concerns, some of which overlap with
> Martin's:
>
> 1. In the proposal's lone example, the Arabic script is a right-to-left
> script. How does "ar-d-rtl" indicate right-to-left directionality in a way
> that "ar-Arab" does not?


> 2. Given #1, and given that the script subtag 'Arab' is a Suppress-Script
> for the language subtag 'ar' (which means "ar" is equivalent to "ar-Arab"
> for almost all purposes), how is "ar" not sufficient? I agree with Martin's
> comment here: what rendering process is likely to display Arabic
> left-to-right?
>

The issue isn't that Arabic would be displayed left to right; it is what
establishes the paragraph direction. The problem arises when you have mixed
text. Look at the following example, using the convention that lowercase =
English and uppercase = Arabic. The majority of the text and the first
strong character are both English, but the sentence is meant to be used in
an Arabic environment, so the default paragraph embedding level needs to be
RTL.

rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz IS A LONG
WORD.
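
To make the mismatch concrete, here is a simplified sketch of the
first-strong heuristic (the idea behind rules P2 and P3 of UAX #9). It is
not a full implementation: it ignores isolates and explicit embedding
controls, and the function name first_strong_direction is hypothetical. The
sample string uses real Arabic letters (written as escapes) in place of the
uppercase convention above.

```python
import unicodedata


def first_strong_direction(text):
    """Guess paragraph direction from the first strong bidi character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":            # strong left-to-right
            return "ltr"
        if bidi_class in ("R", "AL"):    # strong right-to-left
            return "rtl"
    return None                          # no strong character found


# Latin text first, Arabic text second: the heuristic says LTR even if the
# string is meant for an RTL paragraph.
mixed = "rindfleisch \u0627\u0644\u0639\u0631\u0628\u064a\u0629"
print(first_strong_direction(mixed))
```

For the mixed string the heuristic reports "ltr", so in an Arabic
environment the paragraph direction has to come from somewhere other than
the text itself, for example from the language tag as described above.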

>
> 3. I also agree with Martin that the definition "automatically detected"
> for subtag 'auto' is not adequate. How does it differ from leaving off the
> D extension altogether?
>

Agreed, not well specified. But -d- is not needed in the first place, so
moot.

>
> 4. Scripts exist in other directionalities besides LTR and RTL. Chinese,
> Japanese, and Korean can be written top-to-bottom, right-to-left. Mongolian
> in Mongolian script is properly written top-to-bottom, left-to-right, but
> is sometimes (although incorrectly) rendered LTR as well. Some languages
> have been written boustrophedon, either with or without reversing the
> glyphs when transitioning from LTR to RTL. None of these scenarios are
> covered in the proposal, but some of them seem much more in need of
> explicit marking than the Arabic example.
>

While this is true, for the vast majority of cases, LTR and RTL are the
important issues. Most computer systems don't really handle vertical
natively; one needs to have more specialized text processing systems, and
that is not, I imagine, the target for this syntax.


> 5. Given #4, the lack of a registry for the proposed extension, or even
> the mention of one, is a significant problem. The set of exactly 3 values
> associated with this extension ('ltr', 'rtl', and 'auto') would be fixed;
> adding to it would require updating the RFC, which is much more work than
> updating a registry.
>

Agreed, that would be a major drawback.  But -d- is not needed in the first
place, so moot.


> Without these issues being addressed in a satisfactory way, I would lobby
> IETF not to approve this I-D.
>

I don't see that there is any reason to approve it, given that it is, as
far as I can tell, completely unnecessary and would just complicate
implementers' lives to no good end.

>
> --
> Doug Ewell | Thornton, CO, US | ewellic.org
>