Re: [Rswg] Updating RFC 7997 (The Use of Non-ASCII Characters in RFCs)

Martin Thomson <mt@lowentropy.net> Thu, 02 March 2023 06:08 UTC

Date: Thu, 02 Mar 2023 17:08:16 +1100
From: Martin Thomson <mt@lowentropy.net>
To: rswg@rfc-editor.org
Content-Type: text/plain
Archived-At: <https://mailarchive.ietf.org/arch/msg/rswg/wN6fv5BczZ-L56FI-LbJ66I7TqM>
Subject: Re: [Rswg] Updating RFC 7997 (The Use of Non-ASCII Characters in RFCs)

John,

The depth of analysis that you would like to see applied might differ substantially from mine.  I would rather have an OK solution soon than a perfect solution that takes a lot of energy to produce.  If there is a bit of wiggle room and we have to rely on authors, the RPC, and (in the event of failure) stream managers to exercise good judgment, then I don't see that as a failure.

1.  Availability.  We've got Cherokee script in this draft and that's fine.

2. Searching.  The mess of normalization forms and whatnot is far less of a problem here than in something like the DNS.  Where it really matters, like in protocol elements, we can trust protocol authors and their reviewers to use or require appropriate levels of precision.  Otherwise, the documents will still be in English, so while searching might fail due to a poor choice, that only happens for the sorts of things that Unicode is used for (names, examples, etc.).  Again, I don't think that we need to make this hard.

3. RTL.  HTML has a solution here.  We should carefully reuse their design (we might choose not to take on vertical writing systems) rather than try to invent our own thing.  Our needs are extremely modest here, as we are not producing entire texts in languages other than English.  Supporting names and small examples appears to work without a lot of fancy machinery, and we can likely choose not to use text that causes spectacular failures.  Spot fixing as the need arises might be better.
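
On point 2 (searching), here is a minimal Python sketch of what I mean by the problem being tractable for names and examples.  It is an illustration only, not a proposal for RPC tooling: the helper name nfc_contains and the sample strings are made up, and it assumes that NFC normalization on both sides is good enough, which will not hold for every script or for compatibility characters.

```python
import unicodedata


def nfc_contains(needle: str, haystack: str) -> bool:
    """Substring search after normalizing both sides to NFC.

    Hypothetical helper for illustration; it only handles canonical
    equivalence and ignores case and language-specific matching rules.
    """
    norm_needle = unicodedata.normalize("NFC", needle)
    norm_haystack = unicodedata.normalize("NFC", haystack)
    return norm_needle in norm_haystack


# The same name written two ways that usually display identically.
composed = "Jos\u00e9"      # "é" as a single code point (U+00E9)
decomposed = "Jose\u0301"   # "e" followed by a combining acute accent (U+0301)

text = "Thanks to " + decomposed + " for the review"

print(composed == decomposed)        # False: the code points differ
print(composed in text)              # False: a naive search misses the name
print(nfc_contains(composed, text))  # True: normalized forms match
```

A search tool that normalizes like this would cover the common case of an accented name typed differently by the author and the reader.  The cases that this does not cover are the ones I would leave to author and reviewer judgment.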

On Thu, Mar 2, 2023, at 15:40, John C Klensin wrote:
> Paul,
>
> Replying to the first message in the thread, but I've read all
> the others.  In addition to the more procedural, document
> relationship, and general content  issues those comments have
> mostly been about, I think the draft is going to need some more
> substantive work.  A few rather high level points by means of
> example (these are not the only ones):
>
> (1) While I agree with John Levine [1] that almost all systems
> have Unicode support these days, that support gets
> seriously non-uniform when one starts looking at little-used and
> archaic scripts, some symbol sets, etc.   John's "for some
> version of reasonable" comment hints at the problem.    While it
> might be reasonable to put the specific instructions about what
> is considered reasonable to the Style Guide, it seems to me that
> this document should at least work out the theory or, at least,
> that it should not be completed and published until the Style
> Guide has addressed those issues in depth.
>
> (2) For most writing systems, at least most contemporary ones,
> Unicode has the unfortunate property that there can be several
> different ways to express the same conceptual character or
> string and the nice property that they usually display the same
> way (i.e., for display purposes, those different forms make no
> difference).  However, as soon as one imposes a requirement for
> searching, things get more complicated and one has to specify
> rules about how characters and strings are represented or about
> properties of the search procedure that go beyond matching at
> the octet level and into areas that require an understanding of
> the possible variations and that are, in turn, often
> language-dependent at the level of BCP 47 language codings and
> not just, e.g., "French".  At one point, we believed that
> normalization would solve those problems and it does... except
> where it doesn't.
>
> (3) There are many circumstances in which there is a need to
> embed left-to-right text in right-to-left strings and vice
> versa.  Simple application of Unicode's "Bidi" procedure can
> produce unintended (sometimes astonishing) results.  The
> problems are best dealt with by careful markup, but it is not
> clear that the markup mechanisms we have are sufficient.
>
> In addition to the issues with addressing those questions at
> all, different possible answers may imply very different
> requirements for expertise and expected times for document
> processing within the RPC.
>
> I don't think the document should move forward without issues of
> those types being addressed.  And it is not clear to me that
> trying to dig into them on this list would be the optimal way
> forward.
>
>     john
>
> [1]
> https://mailarchive.ietf.org/arch/msg/rswg/WW-b0P5aNtJFDw2OZbsIsfF0AYM
>
> --On Tuesday, February 28, 2023 20:08 +0000 Paul Hoffman
> <paul.hoffman@icann.org> wrote:
>
>> Greetings again. The message from the RSAB last week
>> <https://mailarchive.ietf.org/arch/msg/rswg/BgB7Ue-vNLCa3GTUTXZVsx6_VcA>
>> showed that the RPC is having problems with RFC
>> 7997. RFC 7997 has a number of internal inconsistencies that
>> will make publication of RFCs more work for the RPC and the
>> stream managers. Although it was part of the series of RFCs
>> related to the new grammar and output formats, it dealt with a
>> different topic that had long interested the IETF: the use of
>> non-ASCII characters in names and technical text.
>> 
>> Given that, I have prepared
>> <https://datatracker.ietf.org/doc/draft-hoffman-rfc7997bis/>
>> to clean up RFC 7997, to make it internally consistent, and to
>> deal with situations like the one from the RSAB of documents
>> that have good uses for non-ASCII characters. These changes
>> will likely spark concern for some people about not being able
>> to read some RFCs, but the world of terminal software,
>> browsers, and PDF viewers has moved far forwards in the past
>> six years. Also,  the publication of RFC 9280 makes it clear
>> that the RFC streams themselves should be determining the
>> contents of the RFCs in their stream, with coordination among
>> the streams.
>> 
>> If working on this is of interest to the WG (even if there are
>> differences of opinion of what the rules should be), it could
>> be adopted as a WG document.