Re: [xml2rfc] [Rfc-markdown] [Tools-discuss] New xml2rfc release: v3.16.0

Marc Petit-Huguenin <marc@petit-huguenin.org> Fri, 20 January 2023 13:16 UTC

Message-ID: <147fb362-e522-8ad4-51e2-2363a6e0eeb8@petit-huguenin.org>
Date: Fri, 20 Jan 2023 05:16:16 -0800
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.6.0
Content-Language: en-US
To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, Jay Daley <exec-director@ietf.org>
Cc: xml2rfc@ietf.org, tools-discuss <tools-discuss@ietf.org>
References: <CAD2=Z87EMetcpv66YY_b2+X1-yFy4cTpKMjPoJL=cH99c7P_Uw@mail.gmail.com> <9d719176-a4eb-7cce-e706-10325700531c@petit-huguenin.org> <F1A5624B-16D0-4463-AC5F-B0A03F3B94B6@ietf.org> <8f5a497e-4135-7c0c-46cb-c3fe4791e9f3@petit-huguenin.org> <c3b3064f-e505-f504-f258-06f0d824ed4b@it.aoyama.ac.jp>
From: Marc Petit-Huguenin <marc@petit-huguenin.org>
In-Reply-To: <c3b3064f-e505-f504-f258-06f0d824ed4b@it.aoyama.ac.jp>
Content-Type: multipart/signed; micalg="pgp-sha256"; protocol="application/pgp-signature"; boundary="------------6h98moUGS9YXG8Zau1F1ZtwY"
Archived-At: <https://mailarchive.ietf.org/arch/msg/xml2rfc/GpEnq3ELXOn0efZk3I6XOGJiRqw>
Subject: Re: [xml2rfc] [Rfc-markdown] [Tools-discuss] New xml2rfc release: v3.16.0

On 1/19/23 22:10, Martin J. Dürst wrote:
> On 2023-01-20 02:20, Marc Petit-Huguenin wrote:
> 
>> RFC 7997 clearly says that Unicode CANNOT be used except in a finite list of cases:
>>
>> 1. Purely part of an example (3.1)
>> 2. for English words imported from foreign languages, with the strict constraint that they are defined in the Merriam-Webster dictionary (3.1).
>> 3. person or organization names (3.2, 3.3)
>> 4. when the Unicode character is described, instead of being used (3.4)
>> 5. in a table
>> 6. in code
>> 7. in a bibliographic item
>> 8. in address information
>>
>> The modification above is clearly not restricted to these cases.
>>
>> I notice that the xml2rfc language already contains some elements that can be used to enforce these cases.  Where one is missing, a new element could be added:
>>
>> (1) An <artwork> element can contain Unicode
>> (2) a new element (as <t> content) can mark words that can contain Unicode.  xml2rfc can then extract them and check that they are valid English words
>> (3) <contact> can contain Unicode
>> (4) <u>, used to describe a Unicode character, can contain Unicode
>> (5) a <tr> element can contain Unicode
>> (6) a <sourcecode> element can contain Unicode
>> (7) a <reference> can contain Unicode (not just the organization/address)
>> (8) The <address> and <organization> elements can contain Unicode.
> 
> This doesn't include point 1) in your first list (Purely part of an example (3.1)). I guess they could go under (2), but the part "check that they are valid English words" would have to move from xml2rfc to people. Or there could be a second new element such as <example>. The content of this would again have to be checked by people.

<artwork> is one of the ways an example can be inserted in an xml2rfc document, so it could be used to clearly mark the examples in a document that contain unrestricted Unicode.
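
To make the element-based approach concrete, here is a minimal sketch in Python of a check that reports non-ASCII text only when it appears outside the elements listed in (1) and (3) to (8).  This is not how xml2rfc implements anything; the ALLOWED set and the bare_unicode() function are hypothetical names of mine, and the sketch looks only at text content, not at attributes.

```python
import xml.etree.ElementTree as ET

# Assumption: element names taken from the proposal quoted above.
ALLOWED = {"artwork", "sourcecode", "contact", "u", "tr",
           "reference", "address", "organization"}


def has_non_ascii(text):
    """Return True if text contains any character outside ASCII."""
    return text is not None and not text.isascii()


def bare_unicode(xml_text):
    """Return the tags of elements holding non-ASCII text outside ALLOWED."""
    found = []

    def walk(elem, permitted):
        # Unicode is permitted in an allowed element and in all its descendants.
        permitted = permitted or elem.tag in ALLOWED
        if not permitted and has_non_ascii(elem.text):
            found.append(elem.tag)
        for child in elem:
            walk(child, permitted)
            # child.tail is text that follows the child but belongs to elem.
            if not permitted and has_non_ascii(child.tail):
                found.append(elem.tag)

    walk(ET.fromstring(xml_text), False)
    return found


sample = ("<rfc><t>caf\u00e9</t><artwork>\u00e9</artwork>"
          "<t>ok <u>\u00e9</u> na\u00efve</t></rfc>")
print(bare_unicode(sample))
```

On that sample, both <t> elements are reported, while the content of <artwork> and <u> is accepted.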

> 
> But then, we don't actually have to go that far. The announcement to which you reacted contains the following:
> 
> ```
> * New flag --warn-bare-unicode when set, xml2rfc warns about bare
> Unicode in the <t> elements. By default, this is set to False.
> ```
> 
> Because it is very easy for a program to detect (non-ASCII) Unicode, there isn't even a need for any new element.

By that reasoning, is there any need for any element?

> 
>> Doing that and documenting it in the next revision of RFC 7991 seems the sensible thing to do.
>>
>> But unconditionally letting everyone add Unicode characters willy-nilly looks to me like a way to, at some point in the future, be able to say that we have no other choice than officially authorizing Unicode everywhere because there are already too many legacy RFCs doing that (a well-known tactic to work around standards).
> 
> The addition of the --warn-bare-unicode flag should be enough to show that there is no intention to let everyone add Unicode characters willy-nilly. I'm sure the RPC knows how to use that flag.
> 
> Regards,   Martin.

-- 
Marc Petit-Huguenin
Email: marc@petit-huguenin.org
Blog: https://marc.petit-huguenin.org
Profile: https://www.linkedin.com/in/petithug