Re: [art] Artart last call review of draft-ietf-core-problem-details-05 (minor correction)
"Martin J. Dürst" <duerst@it.aoyama.ac.jp> Thu, 23 June 2022 07:00 UTC
To: Carsten Bormann <cabo@tzi.org>, Harald Alvestrand <harald@alvestrand.no>
Cc: art@ietf.org, core@ietf.org, draft-ietf-core-problem-details.all@ietf.org, last-call@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/art/-yBJpMtl2DljVMJ3rLoZFce9Qyw>

[just a very minor correction to my comments below: The heading
"Directionality Information" should be moved down, just below the text
"boil that ocean".]

Regards,   Martin.

On 2022-06-23 15:47, Martin J. Dürst wrote:
> Dear Core and I18N experts,
>
> Some comments on the I18N aspects of Tag 38 below.
>
> [Sorry this answer took so long, and got so long. The two 'long's
> influenced each other :-).]
>
> On 2022-06-16 01:23, Carsten Bormann wrote:
>>
>> Hi Harald,
>>
>> thank you for this thoughtful review.
>
>>> The “Tag 38 internationalized string”
>>> This document adds an appendix defining an “internationalized string”
>>> format that adds a BCP 47 language tag and an Unicode-based direction
>>> indicator to an UTF-8 string. This is laudable; RFC 2277 section 4
>>> pointed out the need for this ability 24 years ago.
>
> I think that Language-Tagged Strings (CBOR Tag 38,
> https://datatracker.ietf.org/doc/html/draft-ietf-core-problem-details-06#appendix-A)
> are a very good step ahead. At least for CBOR, in many cases from now
> on, the answer might just be "use Tag 38" (assuming we get the details
> right).
>
>
>>> Unfortunately neither definition is problem-free.
>>>
>>> First of all, this tag, if useful at all, is of far greater utility
>>> than the error format. Burying it in an appendix of a document whose
>>> stated purpose is something else makes it far more difficult to refer
>>> to than it needs to be.
>>
>> That is usually not a problem. The focal point for finding a CBOR tag
>> for a specific application is the CBOR tag registry; this then points
>> to the places where the specifications for the tags can be found
>> (which in this case is easily expressed as “Appendix A of RFC XXXX”).
>
> Separate Draft or Not
> =====================
>
> I agree with Harald that it should be a separate draft; it would
> definitely help with visibility of I18N in general and the issue of
> strings with language and directionality information inside and outside
> the IETF (not only the visibility within the CBOR community, which may
> be covered by the tag registry). Being able to say "look at RFC XXXX for
> a good example" is way better than being able to say "look at appendix X
> of RFC YYYY for a good example".
>
> I understand Francesca's arguments, too, but I think the investment in a
> separate draft would be well worth the effort. I'm willing to contribute
> although I guess that Carsten would do the necessary work in less time
> than it takes him to get anybody else up to speed.
>
>
>>> Second, the “detailed semantics” has chosen to include the quite
>>> complex BNF of RFC 5646 translated into CDDL; this may have some use,
>>> but BCP 47 is a moving target;
>>
>> We intend tag38 to be useful for the current form of BCP 47, so it is
>> hard to plan for the future. If BCP 47 needs to be considered
>> unstable, we could of course define a “bcp47-extension” alternative
>> with a CDDL feature control operator.
>
> (NOT!) Copying BCP 47 Grammar
> =============================
>
> I also agree with Harald that the definition of 'Language-Tagged
> Strings' has room for improvement. First, as Harald said, it repeats the
> BCP 47 grammar when we very well know that repeating grammars is usually
> a bad idea. I'm really not sure why CBOR wants to check each and every
> detail of the current language tag syntax. My understanding was that
> CBOR was (among other things, if not primarily) for constrained devices.
> I just cannot see the motivation for embedding a list of legacy tags
> into a constrained device.
>
> I also don't know about other technology on a similar level as CBOR that
> would do so.
> As an example, XML had productions 33-38 (see
> https://www.w3.org/TR/1998/REC-xml-19980210#sec-lang-tag), but they were
> removed as early as 2000 (see
> https://www.w3.org/TR/2000/REC-xml-20001006#sec-lang-tag), for very good
> reasons. I really have difficulty imagining why CBOR would want to
> make the same mistake that XML fixed more than 20 years ago.
>
> Similarly, XML Schema Datatypes only gives a very simple regular
> expression ([a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*) and notes
> (see https://www.w3.org/TR/xmlschema11-2/#language):
>
> [[[[
> Note: The regular expression above provides the only normative
> constraint on the lexical and value spaces of this type. The additional
> constraints imposed on language identifiers by [BCP 47] and its
> successor(s), and in particular their requirement that language codes be
> registered with IANA or ISO if not given in ISO 639, are not part of
> this datatype as defined here.
> ]]]]
> Again, XML Schema would have done something more precise if anybody had
> been convinced that such precision made sense.
>
>
> Another way to see this is that in general, when giving restrictive
> syntactic rules, there's the question of "bang for the buck". The
> complexity of the language tag syntax rules, down to the legacy
> (grandfathered) stuff, means that the cost ("buck") is quite high. This
> not only includes implementation and memory footprint, but also testing
> and everything else.
>
> On the other hand, the "bang" is quite low, for two reasons:
> First, without a check against the registry, a lot of garbage still can
> go through. Think e.g. "en-UK", which looks reasonable and fits the
> grammar, but is not allowed (UK is not a country code, "en-GB" is
> correct). Second, most actual language tags, in particular for
> constrained devices, are more on the level of "fr" or "en-US", which
> means that on most actual data, the full syntax isn't really exercised.
> Which further means that software with implementation bugs in the syntax
> testing part doesn't get weeded out.
>
> The main mechanisms (if any) that will help to make sure these language
> tags are correct are the following:
> 1) On the 'sender' side, texts will be translated, by "hand" or using
> some localization tools, and the correct language tags will be set there
> (because somebody translating to Ukrainian, or their tool, knows the
> correct tag is "uk", and not something else).
> 2) On the 'receiver' side, user preferences will be expressed as
> language tags (or prefixes,...), which should assure that correctly
> tagged data gets shown and incorrectly tagged data gets ignored.
>
> To summarize, copying the grammar from BCP 47 brings extremely little
> bang for rather high costs. Get rid of it in the same way other
> standards which have thought this through have gotten rid of a detailed
> grammar. If you want something that gives you a minimal plausibility
> test (catch cases where e.g. the text and the language tag got swapped
> by some accident,...), do what XML Schema did.
>
> This will also be future proof. There are many changes to BCP 47 that
> have been discussed in the past (although none of these got traction, or
> are expected to get traction in the near future), but changing the basic
> syntax constraint expressed by XML Schema was never considered an
> option. On the other hand, it was always clear to the people involved
> that users of language tags shouldn't create artificial barriers to
> future changes. It would be really a pity if CBOR created such a barrier
> just because they could. Things such as "CDDL feature control operators"
> are great where they actually serve a purpose; here I don't think they
> would.
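
For illustration, here is a minimal sketch of the "minimal plausibility
test" discussed in the quoted text, assuming Python and its standard re
module. The function name looks_like_language_tag is hypothetical and not
taken from any specification; the pattern is the XML Schema one quoted
earlier. The sketch only checks the shape of a tag, so "en-UK" is accepted
even though it is not a valid tag according to the registry.

```python
import re

# Pattern quoted from XML Schema Datatypes (the "language" datatype).
LANGUAGE_PATTERN = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def looks_like_language_tag(tag: str) -> bool:
    """Hypothetical helper: shape check only, no registry lookup.

    Catches accidents such as swapping the text and the language tag,
    but cannot tell whether the subtags are actually registered.
    """
    return LANGUAGE_PATTERN.fullmatch(tag) is not None


if __name__ == "__main__":
    for candidate in ["fr", "en-US", "uk", "en-UK", "Hello, world!"]:
        print(repr(candidate), looks_like_language_tag(candidate))
```

A receiver would still rely on matching against user preferences, as
described in point 2) of the quoted text, to filter out tags that pass
this check but are meaningless.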
>
>
> Directionality Information
> ==========================
>
> Regarding language tags, in addition, there is the following note:
> [[[[
> NOTE: The Unicode Standard [Unicode-14.0.0] includes a set of
> characters designed for tagging text (including language tagging), in
> the range U+E0000 to U+E007F. Although many applications, including
> RDF, do not disallow these characters in text strings, the Unicode
> Consortium has deprecated these characters and recommends annotating
> language via a higher-level protocol instead. See the section
> "Deprecated Tag Characters" in Section 23.9 of [Unicode-14.0.0].
> ]]]]
> It's weird for the IETF to refer (only) to the Unicode standard here
> even though the IETF has deprecated this kind of language tagging in
> RFC 6082 (see https://www.rfc-editor.org/rfc/rfc6082.html). So please
> cite that RFC.
>
>
>>> having CDDL parsers try to validate tags according to this grammar is
>>> not going to be useful. If included at all, this needs to be clearly
>>> marked with text saying that BCP 47 is normative for this grammar,
>>> and that language tag parsers should NOT try to reject tags based on
>>> this grammar; instead, they should be treated as strings, and looked
>>> up against relevant language handling APIs. (“zh-ZZ” is perfectly
>>> valid according to the grammar, but is semantically invalid according
>>> to BCP 47).
>>
>> Here again, it is hard to capture semantics in a structural definition.
>> Our document is going to reference RFC 5646 (including its ABNF), as
>> that is the current definition; if BCP 47 is updated, the effect of
>> that update on this document will need new consideration.
>
> No, please. I understand that in some areas, you don't want to allow
> gratuitous changes to your network and software based on changes to
> technology that you use. But for language tags, such a mindset is really
> counterproductive.
> Some of the changes to BCP 47 that have been
> discussed are to include some subtags for dialects. Now if such a change
> happened, there are two questions relevant for CBOR:
> 1) How many cases would there be in the CBOR landscape where people
> would want to use such subtags? The answer would probably be: Very few,
> so a change (using a "CDDL feature control operator" or whatever) would
> have very low priority. But why should people be prohibited from using
> such subtags if they want to use them?
> 2) What's the problem in letting such subtags through the current
> infrastructure? My guess is that there's no problem at all. When there
> are parallel texts, one tagged with "en-US" and the other with one of
> these dialect subtags, the chance is very high that a recipient will be
> displaying the former. Would that be a problem?
>
>
>>> Note also that the sentence “Data items with tag 38 that do not meet
>>> the criteria above are invalid (see Section 5.3.2 of [STD94]).” is
>>> really hard to parse semantically, given that section 5.3.2 of
>>> RFC 8949 doesn’t use the word “invalid”, it uses “inadmissible
>>> value”. I do not recommend rejecting unknown language tags.
>>
>> They may not be rejected, they are just not “valid” in RFC 8949 sense
>> (they are still well-formed). I would expect language tags to evolve
>> within the grammar defined by RFC 5646 (which does have an extension
>> point); if that is a mistaken assumption, please let us know.
>
> In the short term (my average guess at "short term" would be 10 years or
> so), evolution *within* RFC 5646 is definitely the main focus. In the
> really long term, I guess anything that fits the XML Schema production
> is fair game. That restriction has been there since the original RFC
> 1766, and provides some actual "bang for the buck". It is also baked
> into technologies such as XML Schema which would provide a very strong
> argument to not give up on it.
> In all the work on revising RFC 1766
> (which I co-chaired, and which was quite long-winded), changing the rule
> that each subtag had to be 8 characters or less was never strongly
> disputed at all.
>
>
>>> Thirdly, the definition of the tri-state direction attribute can be
>>> made clearer; in particular, the Unicode Bidirectional Algorithm
>>> (UAX#9) should be referenced, with particular reference to
>>> https://www.unicode.org/reports/tr9/tr9-44.html#Markup_And_Formatting
>>> - the important property here is that the desired semantic is
>>> isolation - the markup is intended to have zero influence on strings
>>> outside the embedded string - the semantics of embedding in RLI…PDI
>>> is the desired effect.
>>
>> Tag38 does not provide a way to handle embedding, so we are not trying
>> to boil that ocean yet.
>
> Again, I agree with Harald here. But first, please be careful.
> "embedding" has a very narrow technical meaning in the Bidi Algorithm
> (UAX #9). Tag 38 doesn't need a way to handle embeddings in this sense.
> When Harald used the term "embedded string", he didn't use "embedded" in
> this very narrow technical sense, but in a more general sense, namely
> that the string from Tag 38 is expected to be put into some
> (surrounding) context. That might mean that it shows up by itself
> somewhere, or that it gets included in a larger text of some sorts.
>
> In the draft, you have the following text:
> [[[[
> The optional third element, if present, is a Boolean value that
> indicates a direction: false for "ltr" direction, true for "rtl"
> direction. If the third element is absent, no indication is made
> about the direction; it can be explicitly given as null to express
> the same while overriding any context that might be considered
> applying to this element.
> Note that the proper processing of
> Language and Direction Metadata is an active area of investigation;
> the reader is advised to consult ongoing standardization activities
> such as [STRING-META] when processing the information represented in
> this tag.
> ]]]]
>
> [override is also a technical term in the Bidi Algorithm]
>
> I think this text is very important, so I'll go into some detail.
> First (minor nit), it says "If the third element is absent ...". Because
> this is in a paragraph that starts with "The optional third element
> ...", I think it would be better to say "If this element is absent ...".
>
> Next, let me make sure that I get this right: This is a Boolean value,
> but it can in effect have four different states, yes? That would be:
> - True (rtl)
> - False (ltr)
> - null (no indication about direction, but overriding any context)
> - absent (no indication about direction, but context may apply)
> If that's true, then it might be good to put that into a more structured
> form (something like the above list).
>
> [very major point] The main problem is with the last sentence. There's
> not much of a point in defining a field for directionality if it's not
> clear what that is supposed to be used for. I'm also not sure where the
> claim "the proper processing of Language and Direction Metadata is an
> active area of investigation" came from, and why it is here.
>
> It is true that some areas of bidi processing (e.g. the best consistent
> way to display IRIs that contain pieces of text from both
> directionalities) are not solved yet, or even (as the example a
> line ago) are not even actively being investigated because the general
> agreement is that the problem is too difficult to have a solution.
> It is also true that "Strings on the Web: Language and Direction
> Metadata" (https://www.w3.org/TR/string-meta/) is still in Draft status.
>
> But neither of these facts should have to influence the specification of
> Tag 38.
> [StringMeta] (3.4 What consumers need to do to support
> direction, https://www.w3.org/TR/string-meta/#what_consumers_do), Harald
> and I all agree about what the right thing to do is: Use Bidi isolation
> (in the technical sense of
> https://www.unicode.org/reports/tr9/#Explicit_Directional_Isolates).
>
> So given all the above considerations, what about rewriting the
> paragraph under consideration along the following lines:
>
> [[[[
> The optional third element, if present, is a Boolean value that
> indicates a direction, as follows:
> - false: LTR direction. The text is expected to be displayed
> with LTR base direction if standalone, and isolated with LTR
> direction (enclosed in LRI ... PDI or equivalent, see [1]) in
> the context of a longer string or text.
> - true: RTL direction. The text is expected to be displayed
> with RTL base direction if standalone, and isolated with RTL
> direction (enclosed in RLI ... PDI or equivalent, see [1]) in
> the context of a longer string or text.
> - absent: no indication is made about the direction
> - (explicit) null: no indication is made about the direction,
> but any directionality context applying to this element (e.g.,
> base directionality information for an entire CBOR message or
> part thereof) is ignored.
> ]]]]
> [1] Unicode® Standard Annex #9, Unicode Bidirectional Algorithm, Section
> 2.7 Markup and Formatting Characters,
> https://www.unicode.org/reports/tr9/#Markup_And_Formatting
>
> I'm not really sure yet about the 'absent' and 'null' entries, neither
> about whether they are really distinct nor about whether the
> specification is good enough (we might want to specify FIRST STRONG
> ISOLATE semantics).
>
>
> Hope this helps. Let's make sure together that we get this right.
>
> Regards, Martin.

--
Prof. Dr.sc. Martin J. Dürst
Department of Intelligent Information Technology
College of Science and Engineering
Aoyama Gakuin University
Fuchinobe 5-1-10, Chuo-ku, Sagamihara
252-5258 Japan
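
For illustration, the isolation behaviour proposed in the quoted message
could be sketched roughly as follows, assuming Python. The function name
isolate is hypothetical. Treating a missing direction (absent or null,
collapsed here into None) as FIRST STRONG ISOLATE is only the possibility
floated in the message, not settled semantics, and the sketch does not
model the difference between 'absent' and 'null'.

```python
from typing import Optional

# Explicit directional isolate characters from UAX #9.
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate(text: str, direction: Optional[bool]) -> str:
    """Hypothetical helper: wrap a Tag 38 string for use in a longer text.

    direction follows the third element of Tag 38:
    False = LTR, True = RTL, None = no indication given.
    """
    if direction is False:
        opener = LRI
    elif direction is True:
        opener = RLI
    else:
        # Assumption: fall back to first-strong isolation when no
        # direction is indicated.
        opener = FSI
    return opener + text + PDI


if __name__ == "__main__":
    # Show the code points so the invisible isolates can be inspected.
    wrapped = isolate("\u05e9\u05dc\u05d5\u05dd", True)
    print([f"U+{ord(ch):04X}" for ch in wrapped])
```

In every branch the string is isolated, so it has no influence on the
ordering of the surrounding text, which is the property the thread agrees
on.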