Re: [art] Artart last call review of draft-ietf-core-problem-details-05 (minor correction)

"Martin J. Dürst" <duerst@it.aoyama.ac.jp> Thu, 23 June 2022 07:00 UTC

Message-ID: <cf39dd0a-3da5-0088-8716-b326031ec253@it.aoyama.ac.jp>
Date: Thu, 23 Jun 2022 16:00:00 +0900
From: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
To: Carsten Bormann <cabo@tzi.org>, Harald Alvestrand <harald@alvestrand.no>
Cc: art@ietf.org, core@ietf.org, draft-ietf-core-problem-details.all@ietf.org, last-call@ietf.org
References: <165511479760.19573.12671700576299137749@ietfa.amsl.com> <63D13796-758D-469B-AFA8-3050C9F87819@tzi.org> <dde9d36c-61e5-afcc-e15a-787c99d5fba9@it.aoyama.ac.jp>
Organization: Aoyama Gakuin University
In-Reply-To: <dde9d36c-61e5-afcc-e15a-787c99d5fba9@it.aoyama.ac.jp>
Archived-At: <https://mailarchive.ietf.org/arch/msg/art/-yBJpMtl2DljVMJ3rLoZFce9Qyw>
Subject: Re: [art] Artart last call review of draft-ietf-core-problem-details-05 (minor correction)

[Just a very minor correction to my comments below:
the heading "Directionality Information" should be moved down, to just
below the text "boil that ocean".]

Regards,   Martin.


On 2022-06-23 15:47, Martin J. Dürst wrote:
> Dear Core and I18N experts,
> 
> Some comments on the I18N aspects of Tag 38 below.
> 
> [Sorry this answer took so long, and got so long. The two 'long's 
> influenced each other :-).]
> 
> On 2022-06-16 01:23, Carsten Bormann wrote:
>>
>> Hi Harald,
>>
>> thank you for this thoughtful review.
> 
>>> The “Tag 38 internationalized string”
>>> This document adds an appendix defining an “internationalized string”
>>> format that adds a BCP 47 language tag and an Unicode-based direction
>>> indicator to an UTF-8 string. This is laudable; RFC 2277 section 4
>>> pointed out the need for this ability 24 years ago.
> 
> I think that Language-Tagged Strings (CBOR Tag 38, 
> https://datatracker.ietf.org/doc/html/draft-ietf-core-problem-details-06#appendix-A) 
> are a very good step forward. At least for CBOR, in many cases from now
> on, the answer might just be "use Tag 38" (assuming we get the details 
> right).
> 
> 
>>> Unfortunately neither definition is problem-free.
>>>
>>> First of all, this tag, if useful at all, is of far greater utility
>>> than the error format. Burying it in an appendix of a document whose
>>> stated purpose is something else makes it far more difficult to refer
>>> to than it needs to be.
>>
>> That is usually not a problem.  The focal point for finding a CBOR tag 
>> for a specific application is the CBOR tag registry; this then points 
>> to the places where the specifications for the tags can be found 
>> (which in this case is easily expressed as “Appendix A of RFC XXXX”).
> 
> Separate Draft or Not
> =====================
> 
> I agree with Harald that it should be a separate draft; it would
> definitely help with the visibility of I18N in general, and of the issue
> of strings with language and directionality information in particular,
> both inside and outside the IETF (not only within the CBOR community,
> which may be covered by the tag registry). Being able to say "look at
> RFC XXXX for a good example" is much better than being able to say "look
> at appendix X of RFC YYYY for a good example".
> 
> I understand Francesca's arguments, too, but I think the investment in a
> separate draft would be well worth the effort. I'm willing to contribute,
> although I guess that Carsten would do the necessary work in less time
> than it would take him to get anybody else up to speed.
> 
> 
>>> Second, the “detailed semantics” has chosen to include the quite
>>> complex BNF of RFC 5646 translated into CDDL; this may have some use,
>>> but BCP 47 is a moving target;
>>
>> We intend tag38 to be useful for the current form of BCP 47, so it is 
>> hard to plan for the future.  If BCP 47 needs to be considered 
>> unstable, we could of course define a “bcp47-extension” alternative 
>> with a CDDL feature control operator.
> 
> (NOT!) Copying BCP 47 Grammar
> =============================
> 
> I also agree with Harald that the definition of 'Language-Tagged
> Strings' has room for improvement. First, as Harald said, it repeats the
> BCP 47 grammar, even though we know very well that repeating grammars is
> usually a bad idea. I'm really not sure why CBOR wants to check each and
> every detail of the current language tag syntax. My understanding was
> that CBOR was (among other things, if not primarily) for constrained
> devices. I just cannot see the motivation for embedding a list of legacy
> tags into a constrained device.
> 
> I also don't know of any other technology at a similar level to CBOR
> that does so. As an example, XML had productions 33-38 (see
> https://www.w3.org/TR/1998/REC-xml-19980210#sec-lang-tag), but they were
> removed as early as 2000 (see
> https://www.w3.org/TR/2000/REC-xml-20001006#sec-lang-tag), for very good
> reasons. I really have difficulty imagining why CBOR would want to
> make the same mistake that XML fixed more than 20 years ago.
> 
> Similarly, XML Schema Datatypes only gives a very simple regular 
> expression ([a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*) and notes
> (see https://www.w3.org/TR/xmlschema11-2/#language):
> 
> [[[[
> Note: The regular expression above provides the only normative 
> constraint on the lexical and value spaces of this type. The additional 
> constraints imposed on language identifiers by [BCP 47] and its 
> successor(s), and in particular their requirement that language codes be 
> registered with IANA or ISO if not given in ISO 639, are not part of 
> this datatype as defined here.
> ]]]]
> Again, XML Schema would have done something more precise if anybody had 
> been convinced that such precision made sense.
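As an illustration of how little code the XML Schema approach needs, here is a minimal sketch that applies the regular expression quoted above as a plausibility check. The function name `is_plausible_language_tag` is hypothetical, and the check deliberately says nothing about whether the subtags are registered.

```python
import re

# The xs:language pattern quoted above from XML Schema Datatypes.
XSD_LANGUAGE = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def is_plausible_language_tag(tag: str) -> bool:
    """Shape check only: 1-8 letters, then '-'-separated runs of 1-8
    alphanumerics. No registry lookup, so well-formed garbage passes."""
    return XSD_LANGUAGE.fullmatch(tag) is not None


for candidate in ["fr", "en-US", "zh-Hant-TW", "en-UK", "Hello world", "abcdefghi"]:
    print(candidate, is_plausible_language_tag(candidate))
```

Swapped fields (e.g. "Hello world" in the tag slot) are caught, while "en-UK" passes; a check of this kind is cheap, but it only catches gross accidents.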
> 
> 
> Another way to see this is that in general, when imposing restrictive
> syntactic rules, there's the question of "bang for the buck". The
> complexity of the language tag syntax rules, down to the legacy
> (grandfathered) stuff, means that the cost ("buck") is quite high. This
> not only includes implementation and memory footprint, but also testing
> and everything else.
> 
> On the other hand, the "bang" is quite low, because of two reasons:
> First, without a check against the registry, a lot of garbage still can 
> go through. Think e.g. "en-UK", which looks reasonable and fits the 
> grammar, but is not allowed (UK is not a country code, "en-GB" is 
> correct). Second, most actual language tags, in particular for 
> constrained devices, are more on the level of "fr" or "en-US", which 
> means that on most actual data, the full syntax isn't really exercised. 
> Which further means that software with implementation bugs in the syntax 
> testing part doesn't get weeded out.
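The "en-UK" point can be shown with a toy two-stage check: the syntax test accepts the tag, and only a registry lookup rejects it. In this sketch, `classify` and `KNOWN_REGIONS` are hypothetical; the region set is a tiny made-up excerpt standing in for the IANA Language Subtag Registry, which is far larger and changes over time.

```python
import re

# Minimal shape check, same as XML Schema's xs:language pattern.
SYNTAX = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")

# Hypothetical, tiny excerpt of region subtags, for illustration only.
KNOWN_REGIONS = {"FR", "GB", "JP", "UA", "US"}


def classify(tag: str) -> str:
    """Toy check: shape first, then a (fake) registry lookup for regions."""
    if SYNTAX.fullmatch(tag) is None:
        return "malformed"
    for sub in tag.split("-")[1:]:
        if len(sub) == 1:  # a singleton starts an extension or private use
            break
        # Before any singleton, a two-letter subtag is a region subtag.
        if len(sub) == 2 and sub.isalpha() and sub.upper() not in KNOWN_REGIONS:
            return "well-formed but not registered"
    return "plausible"


for tag in ["en-GB", "en-UK", "zh-ZZ", "en UK"]:
    print(tag, classify(tag))
```

Only the second stage catches "en-UK" or Harald's "zh-ZZ", and that stage needs registry data, which is exactly what a grammar copied into CDDL cannot provide.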
> 
> The main mechanisms (if any) that will help to make sure these language 
> tags are correct are the following:
> 1) On the 'sender' side, texts will be translated, by "hand" or using 
> some localization tools, and the correct language tags will be set there 
> (because somebody translating to Ukrainian, or their tool, knows the 
> correct tag is "uk", and not something else).
> 2) On the 'receiver' side, user preferences will be expressed as
> language tags (or prefixes, ...), which should ensure that correctly
> tagged data gets shown and incorrectly tagged data gets ignored.
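The receiver-side mechanism in point 2 can be sketched with the basic filtering scheme of RFC 4647 (Section 3.3.1), where a language range matches a tag if it equals the tag or is a prefix of it followed by "-". The helper `pick_text` and the dictionary of parallel texts are hypothetical; real systems may use other matching schemes, such as lookup.

```python
from typing import Dict, List, Optional


def basic_filter_match(language_range: str, tag: str) -> bool:
    """RFC 4647 basic filtering: case-insensitive equality or prefix
    match at a '-' boundary; '*' matches every tag."""
    r, t = language_range.lower(), tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")


def pick_text(preferences: List[str], texts: Dict[str, str]) -> Optional[str]:
    """Hypothetical helper: return the first text matching the user's
    ranges in priority order, or None if nothing matches."""
    for language_range in preferences:
        for tag, text in texts.items():
            if basic_filter_match(language_range, tag):
                return text
    return None


parallel = {"en-US": "Hello", "uk": "Привіт", "en-UK": "Hello (mis-tagged)"}
print(pick_text(["uk", "en"], parallel))  # Привіт
print(pick_text(["en-GB"], parallel))     # None: "en-UK" is never matched
```

The mis-tagged "en-UK" text is simply never selected by a user asking for "en-GB", without any syntax validation being involved.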
> 
> To summarize, copying the grammar from BCP 47 brings extremely little
> bang for rather high costs. Get rid of it, in the same way that other
> standards which have thought this through have gotten rid of a detailed
> grammar. If you want something that gives you a minimal plausibility
> test (catching cases where e.g. the text and the language tag got
> swapped by some accident, ...), do what XML Schema did.
> 
> This will also be future-proof. Many changes to BCP 47 have been
> discussed in the past (although none of them got traction, or is
> expected to get traction in the near future), but changing the basic
> syntax constraint expressed by XML Schema was never considered an
> option. On the other hand, it was always clear to the people involved
> that users of language tags shouldn't create artificial barriers to
> future changes. It would really be a pity if CBOR created such a barrier
> just because it could. Things such as "CDDL feature control operators"
> are great where they actually serve a purpose; here I don't think they
> would.
> 
> 
> Directionality Information
> ==========================
> 
> Regarding language tags, in addition, there is the following note:
> [[[[
> NOTE: The Unicode Standard [Unicode-14.0.0] includes a set of
>     characters designed for tagging text (including language tagging), in
>     the range U+E0000 to U+E007F.  Although many applications, including
>     RDF, do not disallow these characters in text strings, the Unicode
>     Consortium has deprecated these characters and recommends annotating
>     language via a higher-level protocol instead.  See the section
>     "Deprecated Tag Characters" in Section 23.9 of [Unicode-14.0.0].
> ]]]]
> It's weird for the IETF to refer (only) to the Unicode Standard here,
> even though the IETF itself deprecated this kind of language tagging in
> RFC 6082 (see https://www.rfc-editor.org/rfc/rfc6082.html). So please
> cite that RFC.
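For readers unfamiliar with the deprecated mechanism, here is a minimal sketch of what it looks like in text: U+E0001 LANGUAGE TAG followed by "tag" versions of ASCII characters (U+E0020..U+E007E). The function names are hypothetical. As a caveat, characters from this block are still used legitimately in emoji tag sequences (such as subdivision flags), so a check like this is better used to flag text than to reject or strip it.

```python
from typing import Optional

LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG (deprecated)


def contains_tag_characters(s: str) -> bool:
    """True if s contains any character from the Tags block U+E0000..U+E007F."""
    return any(0xE0000 <= ord(ch) <= 0xE007F for ch in s)


def deprecated_language_tag(s: str) -> Optional[str]:
    """Return the ASCII tag spelled by tag characters after U+E0001, or None."""
    start = s.find(LANGUAGE_TAG)
    if start == -1:
        return None
    chars = []
    for ch in s[start + 1:]:
        cp = ord(ch)
        if 0xE0020 <= cp <= 0xE007E:  # tag versions of ASCII 0x20..0x7E
            chars.append(chr(cp - 0xE0000))
        else:
            break
    return "".join(chars)


sample = LANGUAGE_TAG + "\U000E0065\U000E006E" + "Hello"
print(contains_tag_characters(sample), deprecated_language_tag(sample))  # True en
```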
> 
> 
>>> having CDDL parsers try to validate tags according to this grammar is
>>> not going to be useful. If included at all, this needs to be clearly
>>> marked with text saying that BCP 47 is normative for this grammar, and
>>> that language tag parsers should NOT try to reject tags based on this
>>> grammar; instead, they should be treated as strings, and looked up
>>> against relevant language handling APIs. (“zh-ZZ” is perfectly valid
>>> according to the grammar, but is semantically invalid according to
>>> BCP 47).
>>
>> Here again, it is hard to capture semantics in a structural definition.
>> Our document is going to reference RFC 5646 (including its ABNF), as 
>> that is the current definition; if BCP 47 is updated, the effect of 
>> that update on this document will need new consideration.
> 
> No, please. I understand that in some areas, you don't want to allow 
> gratuitous changes to your network and software based on changes to 
> technology that you use. But for language tags, such a mindset is really 
> counterproductive. Some of the changes to BCP 47 that have been
> discussed would add subtags for dialects. Now if such a change
> happened, there are two questions relevant for CBOR:
> 1) How many cases would there be in the CBOR landscape where people
> would want to use such subtags? The answer would probably be: very few,
> so a change (using a "CDDL feature control operator" or whatever) would
> have very low priority. But why should people be prohibited from using
> such subtags if they want to use them?
> 2) What's the problem in letting such subtags through the current
> infrastructure? My guess is that there's no problem at all. When there
> are parallel texts, one tagged with "en-US" and the other with one of
> these dialect subtags, the chance is very high that a recipient will
> display the former. Would that be a problem?
> 
> 
>>> Note also that the sentence “Data items with tag 38 that do not meet
>>> the criteria above are invalid (see Section 5.3.2 of [STD94]).” is
>>> really hard to parse semantically, given that section 5.3.2 of RFC 8949
>>> doesn’t use the word “invalid”, it uses “inadmissible value”. I do not
>>> recommend rejecting unknown language tags.
>>
>> They may not be rejected, they are just not “valid” in RFC 8949 sense 
>> (they are still well-formed).  I would expect language tags to evolve 
>> within the grammar defined by RFC 5646 (which does have an extension 
>> point); it that is a mistaken assumption, please let us know.
> 
> In the short term (my average guess at "short term" would be 10 years or 
> so), evolution *within* RFC 5646 is definitely the main focus. In the 
> really long term, I guess anything that fits the XML Schema production 
> is fair game. That restriction has been there since the original RFC 
> 1766, and provides some actual "bang for the buck". It is also baked in 
> into technologies such as XML Schema which would provide a very strong 
> argument to not give up on it. In all the work on revising RFC 1766 
> (which I co-chaired, and which was quite long-winded), changing the rule 
> that each subtag had to be 8 characters or less was never strongly 
> disputed at all.
> 
> 
>>> Thirdly, the definition of the tri-state direction attribute can be
>>> made clearer; in particular, the Unicode Bidirectional Algorithm
>>> (UAX#9) should be referenced, with particular reference to
>>> https://www.unicode.org/reports/tr9/tr9-44.html#Markup_And_Formatting -
>>> the important property here is that the desired semantic is isolation -
>>> the markup is intended to have zero influence on strings outside the
>>> embedded string - the semantics of embedding in RLI…PDI is the desired
>>> effect.
>>
>> Tag38 does not provide a way to handle embedding, so we are not trying 
>> to boil that ocean yet.
> 
> Again, I agree with Harald here. But first, a word of caution:
> "embedding" has a very narrow technical meaning in the Bidi Algorithm
> (UAX #9). Tag 38 doesn't need a way to handle embeddings in this sense.
> When Harald used the term "embedded string", he didn't use "embedded" in
> this very narrow technical sense, but in a more general sense, namely
> that the string from Tag 38 is expected to be put into some
> (surrounding) context. That might mean that it shows up by itself
> somewhere, or that it gets included in a larger text of some sort.
> 
> In the draft, you have the following text:
> [[[[
>     The optional third element, if present, is a Boolean value that
>     indicates a direction: false for "ltr" direction, true for "rtl"
>     direction.  If the third element is absent, no indication is made
>     about the direction; it can be explicitly given as null to express
>     the same while overriding any context that might be considered
>     applying to this element.  Note that the proper processing of
>     Language and Direction Metadata is an active area of investigation;
>     the reader is advised to consult ongoing standardization activities
>     such as [STRING-META] when processing the information represented in
>     this tag.
> ]]]]
> 
> [override is also a technical term in the Bidi Algorithm]
> 
> I think this text is very important, so I'll go into some detail.
> First (minor nit), it says "If the third element is absent ...". Because
> this is in a paragraph that starts with "The optional third element
> ...", I think it would be better to say "If this element is absent ...".
> 
> Next, let me make sure that I get this right: This is a Boolean value, 
> but it can in effect have four different states, yes? That would be:
> - True (rtl)
> - False (ltr)
> - null (no indication about direction, but overriding any context)
> - absent (no indication about direction, but context may apply)
> If that's true, then it might be good to put that into a more structured
> form (something like the above list).
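The four states listed above can be made explicit in code. This minimal sketch assumes the Tag 38 array has already been decoded into a Python list, with CBOR false/true/null mapped to False/True/None; the `Direction` enum and `direction_of` function are hypothetical names.

```python
from enum import Enum


class Direction(Enum):
    LTR = "false"
    RTL = "true"
    NULL = "null"      # no indication, but overrides any context
    ABSENT = "absent"  # no indication; context may apply


def direction_of(item: list) -> Direction:
    """Classify the decoded content of a Tag 38 array into one of four states."""
    if len(item) == 2:
        return Direction.ABSENT
    if len(item) != 3:
        raise ValueError("Tag 38 content must have 2 or 3 elements")
    third = item[2]
    if third is None:
        return Direction.NULL
    if third is True:
        return Direction.RTL
    if third is False:
        return Direction.LTR
    raise ValueError("third element must be a Boolean or null")


print(direction_of(["en", "Hello"]), direction_of(["en", "Hello", None]))
```

Identity checks (`is True`, `is False`) are used so that integers such as 0 or 1 are not silently accepted as directions.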
> 
> [very major point] The main problem is with the last sentence. There's 
> not much of a point in defining a field for directionality if it's not 
> clear what that is supposed to be used for. I'm also not sure where the 
> claim "the proper processing of Language and Direction Metadata is an 
> active area of investigation" came from, and why it is here.
> 
> It is true that there are some areas of bidi processing that are not
> solved yet, and some (e.g. the best consistent way to display IRIs that
> contain pieces of text from both directionalities) that are not even
> being actively investigated, because the general agreement is that the
> problem is too difficult to have a solution.
> It is also true that "Strings on the Web: Language and Direction 
> Metadata" (https://www.w3.org/TR/string-meta/) is still in Draft status.
> 
> But neither of these facts should have to influence the specification of 
> Tag 38. [STRING-META] (Section 3.4, "What consumers need to do to
> support direction", https://www.w3.org/TR/string-meta/#what_consumers_do),
> Harald and I all agree about what the right thing to do is: use Bidi isolation
> (in the technical sense of 
> https://www.unicode.org/reports/tr9/#Explicit_Directional_Isolates).
> 
> So given all the above considerations, what about rewriting the 
> paragraph under consideration along the following lines:
> 
> [[[[
>     The optional third element, if present, is a Boolean value that
>     indicates a direction, as follows:
>     - false: LTR direction. The text is expected to be displayed
>       with LTR base direction if standalone, and isolated with LTR
>       direction (enclosed in LRI ... PDI or equivalent, see [1]) in
>       the context of a longer string or text.
>     - true: RTL direction. The text is expected to be displayed
>       with RTL base direction if standalone, and isolated with RTL
>       direction (enclosed in RLI ... PDI or equivalent, see [1]) in
>       the context of a longer string or text.
>     - absent: no indication is made about the direction
>     - (explicit) null: no indication is made about the direction,
>       but any directionality context applying to this element (e.g.,
>       base directionality information for an entire CBOR message or
>       part thereof) is ignored.
> ]]]]
> [1] Unicode® Standard Annex #9, Unicode Bidirectional Algorithm, Section 
> 2.7  Markup and Formatting Characters, 
> https://www.unicode.org/reports/tr9/#Markup_And_Formatting
> 
> I'm not really sure yet about the 'absent' and 'null' entries, neither
> about whether they are really distinct nor about whether the
> specification is good enough (we might want to specify FIRST STRONG
> ISOLATE semantics).
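The isolation semantics discussed above can be sketched as a small wrapper that a consumer might apply before inserting a Tag 38 string into a longer text. Per UAX #9, LRI (U+2066) and RLI (U+2067) open left-to-right and right-to-left isolates, FSI (U+2068) opens a first-strong isolate, and PDI (U+2069) closes any of them. The function name `isolate` is hypothetical, and using FSI for both 'absent' and 'null' is an assumption reflecting the open question above, not settled behavior.

```python
LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"


def isolate(text: str, direction) -> str:
    """Wrap text so it cannot affect the bidi layout of its surroundings.

    direction: False -> LRI...PDI, True -> RLI...PDI.
    ASSUMPTION: anything else (absent or null, passed as None) -> FSI...PDI.
    """
    if direction is True:
        opener = RLI
    elif direction is False:
        opener = LRI
    else:
        opener = FSI
    return opener + text + PDI


line = "Saved " + isolate("שלום", True) + " to disk"
print([hex(ord(c)) for c in line if c in (LRI, RLI, FSI, PDI)])
```

Where markup is available (e.g. HTML `dir` attributes with `bdi` elements), an equivalent markup-level isolate would be used instead of the control characters.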
> 
> 
> Hope this helps. Let's make sure together that we get this right.
> 
> Regards,    Martin.
> 

-- 
Prof. Dr.sc. Martin J. Dürst
Department of Intelligent Information Technology
College of Science and Engineering
Aoyama Gakuin University
Fuchinobe 5-1-10, Chuo-ku, Sagamihara
252-5258 Japan