Re: [Ietf-languages] Fwd: I-D Action: draft-msporny-d-langtag-ext-00.txt

"Phillips, Addison" <addison@lab126.com> Tue, 28 May 2019 00:37 UTC

From: "Phillips, Addison" <addison@lab126.com>
To: Mark Davis ☕️ <mark@macchiato.com>, Doug Ewell <doug@ewellic.org>
CC: Manu Sporny <msporny@digitalbazaar.com>, "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, IETF Languages Discussion <ietf-languages@iana.org>, "Richard Ishida (ishida@w3.org)" <ishida@w3.org>
Date: Tue, 28 May 2019 00:29:27 +0000
Message-ID: <1f0cb20200f24ba58510ca2753705867@EX13D08UWB002.ant.amazon.com>
References: <155881874982.30992.4869767614014356043@ietfa.amsl.com> <49b6a1de-e016-514f-90e4-24703b5818d2@it.aoyama.ac.jp> <63b4f786-8b44-ecdf-ed33-ff0567ecc839@digitalbazaar.com> <000001d51425$a48ac140$eda043c0$@ewellic.org> <CAJ2xs_EwKg3Tu5etk-ELXXd0u2Go-6TZbGm3QsBxV1upKTa8_g@mail.gmail.com>
In-Reply-To: <CAJ2xs_EwKg3Tu5etk-ELXXd0u2Go-6TZbGm3QsBxV1upKTa8_g@mail.gmail.com>
Accept-Language: en-US
Content-Language: en-US
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-languages/f8nx8ajKdBlAn0voftmR-rXvSmw>
Subject: Re: [Ietf-languages] Fwd: I-D Action: draft-msporny-d-langtag-ext-00.txt

Some background would probably be useful in this discussion. I’m playing catch-up here due to the U.S. holiday.

This is an outgrowth of a recent review of a number of specifications at W3C (in Manu’s case, specifically the Verifiable Claims spec). Like many current specs, VC uses JSON as a file format and depends on Linked Data specifications such as RDF and JSON-LD. The W3C I18N working group identified some time ago that these formats do not provide for metadata to set the base direction used by the Unicode Bidirectional Algorithm (UBA), and we’ve been raising the visibility of this: we have an open issue with the TAG and comments on a variety of specs.

Here are useful links for context: [1][2]. To document the requirements and current status Richard Ishida and I have been editing a W3C note “string-meta” [3] which might make a good explainer (send comments to our github).

The problem here is there are specs that need to progress in maturity, whose authors accept the need to manage base direction, but which have no way to do so with RDF/JSON-LD compliant serializations.

From the discussion below/elsewhere, I’m obviously aware of this I-D. However, I do not agree with it at present. I think that there are other, better ways to address this issue.

I’m not going to respond to the entirety of this thread, because this discussion is better off happening at [1], in the RDF space, where it is actually a problem. I would urge those interested to decamp there.

Additionally, I am hosting a call on Wednesday at 1300 UTC. Details can be found at [1].

Thanks,

Addison

Addison Phillips
Sr. Principal SDE – I18N (Amazon)
Chair (W3C I18N WG)

Internationalization is not a feature.
It is an architecture.



[1] https://github.com/w3c/rdf-dir-literal/issues/3#issuecomment-495937506
[2] https://github.com/w3ctag/design-reviews/issues/178
[3] https://w3c.github.io/string-meta/ (or the official copy: https://www.w3.org/TR/string-meta)

From: Mark Davis ☕️ [mailto:mark@macchiato.com]
Sent: Monday, May 27, 2019 6:52 AM
To: Doug Ewell <doug@ewellic.org>; Phillips, Addison <addison@lab126.com>
Cc: Manu Sporny <msporny@digitalbazaar.com>; Martin J. Dürst <duerst@it.aoyama.ac.jp>; IETF Languages Discussion <ietf-languages@iana.org>
Subject: Re: [Ietf-languages] Fwd: I-D Action: draft-msporny-d-langtag-ext-00.txt

Doug, I agree with most of the points you are making, especially #1.

I think what they are trying to do is shoehorn in a parameter that lets them set the paragraph embedding level (https://unicode.org/reports/tr9/#BD4) for the Bidi Algorithm. But instead of that hack, as you point out, one can deduce the direction from the language tag. The best way to do this is to get the ordering from CLDR for an ordinary language tag like "ar" or "ar-Arab".

1. So from the tag "ar-Arab", we get the script "Arab". Then use https://github.com/unicode-org/cldr/blob/master/common/properties/scriptMetadata.txt, which has a mapping from script to direction (RTL=YES). (I'm pointing to trunk, just so people can read the file easily; one would use the latest release.)

2. But let's suppose that you have just "ar". Since the script is not explicit, the best way to get it is also CLDR. You can use https://github.com/unicode-org/cldr/blob/master/common/supplemental/likelySubtags.xml, which has a mapping from language or language+region to default language-script-region. So "ar" => "ar_Arab_EG", from which we get the script "Arab", and then use step 1. Or from "fr" you'd get "Latn" and map it to RTL=NO.
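Steps 1 and 2 above can be sketched roughly as follows. This is only a minimal illustration: the two tables are tiny hand-copied stand-ins for likelySubtags.xml and scriptMetadata.txt (real code would load the latest CLDR release), the function names `script_of` and `base_direction` are made up for this example, and the language+region lookup mentioned in step 2 is left out.

```python
# Hypothetical excerpts standing in for the CLDR data files.
LIKELY_SUBTAGS = {
    "ar": "ar_Arab_EG",
    "he": "he_Hebr_IL",
    "fr": "fr_Latn_FR",
    "en": "en_Latn_US",
}
SCRIPT_RTL = {"Arab": True, "Hebr": True, "Latn": False}


def script_of(tag):
    """Return the script for a language tag, or None if it cannot be found."""
    subtags = tag.replace("_", "-").split("-")
    # Step 1: an explicit script subtag is four letters, e.g. "Arab".
    for sub in subtags[1:]:
        if len(sub) == 1:
            # A singleton starts an extension or private-use sequence.
            break
        if len(sub) == 4 and sub.isalpha():
            return sub.title()
    # Step 2: no explicit script, so fall back to the likely subtags.
    likely = LIKELY_SUBTAGS.get(subtags[0].lower())
    if likely is None:
        return None
    return likely.split("_")[1]


def base_direction(tag):
    """Map a language tag to "rtl" or "ltr", or None if the data is missing."""
    script = script_of(tag)
    if script not in SCRIPT_RTL:
        return None
    return "rtl" if SCRIPT_RTL[script] else "ltr"


for tag in ("ar-Arab", "ar", "fr"):
    print(tag, base_direction(tag))
```

With the excerpted data, "ar-Arab" and "ar" both resolve to the script "Arab" and hence RTL, while "fr" resolves to "Latn" and LTR.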

A few more comments below.

On Mon, May 27, 2019 at 2:47 AM Doug Ewell <doug@ewellic.org> wrote:
Manu Sporny wrote:

> There is a time pressure here. Our i18n concerns have been hanging
> out there for more than 9 months and our WG charter is up in a couple
> of months. We need to wrap this up in 3 weeks. Or to put it another
> way, if we don't wrap this up in 3 weeks, we won't be addressing this
> issue, which would be a shame.

I know it is flippant to say "that's not our problem," and I apologize in advance for that, but trying to push through this extension quickly, without consulting or even notifying the language-tagging community, does not seem to me an appropriate way to compensate for this lapse. It was only by chance that Martin happened to spot this I-D and was able to bring it to our attention.

Apparently Addison did know about this effort, and is credited in the Acknowledgements section of the I-D, but it would be nice if the author(s) of an extension proposal would check in with ietf-languages as part of their effort. RFC 5646 does not require this; I wish it did. The IETF at large and W3C are not experts in this field, and probably will not be able to detect significant operational problems in such a proposal.

> In any case, if you're going to engage in this discussion, the issue
> #3 above is probably the place to do it.

I believe THIS LIST is the place to discuss this I-D. (Definitely not on some GitHub account.)

I have other questions and/or concerns, some of which overlap with Martin's:

1. In the proposal's lone example, the Arabic script is a right-to-left script. How does "ar-d-rtl" indicate right-to-left directionality in a way that "ar-Arab" does not?

2. Given #1, and given that the script subtag 'Arab' is a Suppress-Script for the language subtag 'ar' (which means "ar" is equivalent to "ar-Arab" for almost all purposes), how is "ar" not sufficient? I agree with Martin's comment here: what rendering process is likely to display Arabic left-to-right?

It isn't that Arabic would be displayed left to right; the issue is what establishes the paragraph ordering. The problem arises when you have mixed text. Look at the following example, using the convention that lowercase = English and uppercase = Arabic. The majority of the text and the first strong character are both English, but the sentence is meant to be used in an Arabic environment, so the default paragraph embedding level needs to be RTL.

rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz IS A LONG WORD.
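To make the point concrete, here is a small sketch of the "first strong character" heuristic applied to a string shaped like the example above. It is a simplification of the Bidi Algorithm's paragraph-level rules (it does not skip over isolate sequences), the function name `first_strong_direction` is invented for illustration, and the sample uses real Arabic letters, written as escapes, in place of the uppercase convention.

```python
import unicodedata


def first_strong_direction(text):
    """Guess the paragraph direction from the first strong character.

    Returns "ltr", "rtl", or None when the text has no strong character.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return None


# A Latin-script word followed by Arabic text ("long word").
sample = "rindfleischetikett \u0643\u0644\u0645\u0629 \u0637\u0648\u064a\u0644\u0629"

# The heuristic sees the Latin letter first and reports "ltr",
# even though the sentence is meant for an RTL paragraph.
print(first_strong_direction(sample))
```

The heuristic reports LTR for the sample, which is the wrong paragraph direction for a sentence intended for an Arabic environment; that is why something other than the text itself has to supply the base direction.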

3. I also agree with Martin that the definition "automatically detected" for subtag 'auto' is not adequate. How does it differ from leaving off the D extension altogether?

Agreed, not well specified. But -d- is not needed in the first place, so moot.

4. Scripts exist in other directionalities besides LTR and RTL. Chinese, Japanese, and Korean can be written top-to-bottom, right-to-left. Mongolian in Mongolian script is properly written top-to-bottom, left-to-right, but is sometimes (although incorrectly) rendered LTR as well. Some languages have been written boustrophedon, either with or without reversing the glyphs when transitioning from LTR to RTL. None of these scenarios are covered in the proposal, but some of them seem much more in need of explicit marking than the Arabic example.

While this is true, for the vast majority of cases, LTR and RTL are the important issues. Most computer systems don't really handle vertical text natively; one needs more specialized text processing systems, and that is not, I imagine, the target for this syntax.


5. Given #4, the lack of a registry for the proposed extension, or even the mention of one, is a significant problem. The set of exactly 3 values associated with this extension ('ltr', 'rtl', and 'auto') would be fixed; adding to it would require updating the RFC, which is much more work than updating a registry.

Agreed, that would be a major drawback. But -d- is not needed in the first place, so moot.


Without these issues being addressed in a satisfactory way, I would lobby IETF not to approve this I-D.

I don't see that there is any reason to approve it, given that it is, as far as I can tell, completely unnecessary and would just complicate implementers' lives to no good end.

--
Doug Ewell | Thornton, CO, US | ewellic.org

