Re: [Rpat] [Xml-sg-cmt] [AD] <u>: Re: AUTH48: RFC-to-be 9290 <draft-ietf-core-problem-details-08> for your review

Jay Daley <exec-director@ietf.org> Tue, 13 September 2022 10:15 UTC

From: Jay Daley <exec-director@ietf.org>
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3696.120.41.1.1\))
Date: Tue, 13 Sep 2022 11:15:04 +0100
References: <20220804195913.906BF55ECC@rfcpa.amsl.com> <4018FA5D-1031-437C-B9D6-A496A6B100B8@tzi.org> <7E8CED15-A7F7-4F9B-B54D-C858B7E64255@amsl.com> <40DB4CD3-2D93-4481-BFB6-7F374A4128B6@tzi.org> <37BDB141-B704-476E-BC06-C58AD319CAA7@amsl.com> <0DB1E862-25B5-491C-ACC1-13AD9ED9743B@tzi.org> <DB9PR08MB6524DBA3C87BDBF8BAE1F7FA9C419@DB9PR08MB6524.eurprd08.prod.outlook.com> <62BA61C2-0558-4C4F-A138-11F8C8EF159E@tzi.org> <76F4210F-CDB4-4824-824E-79BF9E4CAE08@amsl.com> <CAL0qLwa9sQZqWe8c-pgAobA__feSY=+68rUkryK2EEYMc9DVBg@mail.gmail.com> <1B242483-1E4E-4ECE-972A-871BA046049B@tzi.org> <09F4BF47-84A8-4F37-A1F9-345761CC04A7@tzi.org> <382E70EA-DD5B-48D6-8CF7-70F23522D6F6@amsl.com> <2CD40313-987E-484A-934F-E9182ECF8EB0@tzi.org> <46bd7895-ee80-8c4a-60cc-f7a3542d0ab8@taugh.com> <51394456-C071-4064-96C9-ED450C428909@tzi.org> <F477788C-0BBE-47D9-86B2-264416C42411@tzi.org> <AAB26AAC-1C47-4198-AC2C-F7B747A78ED5@amsl.com> <4E2087D2-6C39-4510-87BC-E5254A184D9B@tzi.org> <b04288a9-6a4c-ced3-2dd8-a81424dd11c0@stpeter.im>
To: Peter Saint-Andre <stpeter@stpeter.im>, Carsten Bormann <cabo@tzi.org>, Sandy Ginoza <sginoza@amsl.com>, rpat@rfc-editor.org, "John R. Levine" <johnl@taugh.com>, RFC Editor <rfc-editor@rfc-editor.org>
In-Reply-To: <b04288a9-6a4c-ced3-2dd8-a81424dd11c0@stpeter.im>
Message-Id: <5B37D903-0FD3-4BCB-BC72-2E546FEE65AB@ietf.org>
X-Mailer: Apple Mail (2.3696.120.41.1.1)
Archived-At: <https://mailarchive.ietf.org/arch/msg/rpat/Drve8JNKld8dlDIlIj0PijD7fyc>
Subject: Re: [Rpat] [Xml-sg-cmt] [AD] <u>: Re: AUTH48: RFC-to-be 9290 <draft-ietf-core-problem-details-08> for your review

> Allowing non-ASCII text everywhere might be construed as a policy topic...

I’ve reread RFC 7997 and, to my surprise, I agree with Carsten that non-ASCII characters are allowed in the body text with some constraints, and that in this specific case the constraints do not apply.

My analysis begins with this statement, which, while frustratingly written in terms of what is not allowed rather than what is allowed, does imply that non-ASCII characters used as part of an example can appear anywhere without special handling:

> 3.1. General Usage throughout a Document
> 
> Where the use of non-ASCII characters is purely part of an example and not otherwise required for correct protocol operation, escaping the non-ASCII character is not required.

A qualifier immediately follows that statement. It is again frustratingly cryptic, and I don’t understand how or why it is presented as a qualifier; however, my best guess is that it does not apply where the non-ASCII characters are not being used in a word that would normally be spelt in English without them (e.g. café).

> Note, however, that as the language of the RFC Series is English, the use of non-ASCII characters is based on the spelling of words commonly used in the English language following the guidance in the Merriam-Webster dictionary [MerrWeb].

Later in the document there is a discussion about code points, which again I do not think applies. That text is specifically about cases where the individual characters need to be identified (which is how I interpret the word "mention" in this text), and in Carsten’s case all that matters is that people are aware this is an RTL string in Hebrew script. In other words, if this RFC were specifying which Hebrew characters could be used in a particular place then they would have to be specified as below, but not if they are just any Hebrew characters used as an example.

> 3.4. Body of the Document
> 
> When the mention of non-ASCII characters is required for correct protocol operation and understanding, the characters' Unicode code points must be used in the text. The addition of each character name is encouraged.
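To illustrate what that requirement asks for in practice, here is a minimal Python sketch that lists the code point and character name of each non-ASCII character in a string. The function name describe_code_points is hypothetical and the "U+XXXX NAME" layout is only an assumption about a reasonable presentation; it is not taken from RFC 7997 or from xml2rfc.

```python
import unicodedata


def describe_code_points(text):
    """Return 'U+XXXX NAME' for each non-ASCII character in text.

    Hypothetical helper for illustration only; the output layout is an
    assumption, not a format mandated by RFC 7997.
    """
    described = []
    for ch in text:
        if ord(ch) > 0x7F:  # anything outside the 7-bit ASCII range
            # Fall back to a placeholder if the character has no name.
            name = unicodedata.name(ch, "<unnamed>")
            described.append(f"U+{ord(ch):04X} {name}")
    return described


if __name__ == "__main__":
    # A single Hebrew letter, written as an escape to keep this file ASCII.
    for entry in describe_code_points("\u05d0"):
        print(entry)
```

This is the kind of expansion that only matters when the specific characters are significant; for an arbitrary example string it adds nothing the reader needs.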

Then we get to the current position, which is based on changes made after the publication of RFC 7991/7997. In my view those changes rest on a mistaken reading of 3.4 above, because 3.4 only applies where the actual characters matter (and therefore need to be mentioned). They also rest on an entirely novel concept of being able to "distinguish between intentional and non-intentional inclusion of non-ASCII characters".

From https://www.ietf.org/archive/id/draft-levkowetz-xml2rfc-v3-implementation-notes-13.html#name-new-section-2x-u 

> 3.1.24. New Section 2.X, <u>
> 
> Thinking about being able to issue warnings both during xml2rfc processing and when running idnits, it seems very hard to distinguish between intentional and non-intentional inclusion of non-ASCII characters in document text.
> 
> In addition to the problem of correctly detecting non-intentional use of Unicode characters, there is also the issue (for authors) of correctly converting given Unicode characters to one of the forms recommended in [RFC7997], and the issue (for idnits) of verifying that any Unicode characters or strings are correctly represented as Unicode code-point values next to the literal character or string.
> 
> One solution to this could be to not try to guess, or establish heuristics, but instead use a v3 schema element with preptool validation to ensure a straightforward solution to all the issues, as follows:
> 
> Proposal: Limit the arbitrary placement of Unicode characters and strings in the body of a document, and control the expansion of the Unicode code-points by requiring that Unicode characters and strings be placed within a specific element if they are to occur in the body of a document. Such an expansion is already mandated by Section 3.4 of [RFC7997]; but without schema support, it would be very hard for tools to enforce this. The text in Appendix A.1 is proposed for inclusion in RFC 7991-bis as a new section.
> 
> Implementation: Implemented as described in Appendix A.1.
> 
> Heather's indication 20 Jul 2019: Isn't this already required by 7997??

I don’t actually want to undo the addition of <u>, because I can see it is useful when section 3.4 above applies. I do, however, agree with Carsten that xml2rfc should allow non-ASCII characters in the body of a document without their being wrapped in <u> tags (or <name> or any other workaround), and that it is up to the RPC, not the tool, to decide whether that usage is acceptable. All the tool needs to do is provide a clear warning that they exist so that the RPC knows they are there.
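As a rough sketch of the warn-but-allow behaviour described above, the following Python reports every non-ASCII character with its position and leaves the decision to a human. The function names and the warning wording are assumptions made for illustration; this is not how xml2rfc is implemented, and it works on plain text lines rather than on the XML source.

```python
def find_non_ascii(lines):
    """Yield (line_number, column, character) for each non-ASCII character.

    Line and column numbers are 1-based. Hypothetical helper, not part
    of xml2rfc.
    """
    for line_no, line in enumerate(lines, start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0x7F:
                yield line_no, col, ch


def warn_non_ascii(lines):
    """Return one warning string per non-ASCII character; never fails."""
    return [
        f"warning: line {line_no}, column {col}: "
        f"non-ASCII character U+{ord(ch):04X}"
        for line_no, col, ch in find_non_ascii(lines)
    ]


if __name__ == "__main__":
    body = ["An example in plain ASCII.", "A word such as caf\u00e9."]
    for message in warn_non_ascii(body):
        print(message)
```

The point of the design is that nothing is rejected: the output is a list for the RPC to review, and acceptance stays a human decision.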

In this case my advice to the RPC would be that this usage is acceptable because the text is just an example of RTL text using Hebrew and the choice of what characters are displayed is largely irrelevant.

Jay

-- 
Jay Daley
IETF Executive Director
exec-director@ietf.org