Re: [rfc-i] Unicode in xml2rfc v3

Marc Petit-Huguenin <marc@petit-huguenin.org> Sun, 20 December 2020 16:39 UTC

To: John R Levine <johnl@taugh.com>, rfc-interest@rfc-editor.org
References: <20201219215415.CFEBA2AE17AC@ary.qy> <53f68fa2-933f-8909-0c37-6e8e1d5e9c9b@petit-huguenin.org> <93bd1bc0-229-3914-ba71-ccaf1976f69@taugh.com>
From: Marc Petit-Huguenin <marc@petit-huguenin.org>
Message-ID: <7b588e1d-74db-4bab-6154-4e8306fd779b@petit-huguenin.org>
Date: Sun, 20 Dec 2020 08:39:31 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.6.0
MIME-Version: 1.0
In-Reply-To: <93bd1bc0-229-3914-ba71-ccaf1976f69@taugh.com>
Content-Language: en-US
Subject: Re: [rfc-i] Unicode in xml2rfc v3

On 12/19/20 5:29 PM, John R Levine wrote:
> On Sat, 19 Dec 2020, Marc Petit-Huguenin wrote:
>> I care exclusively about specifications that can be implemented as interoperable programs.  The minimal formulation for such specifications is a dependent type, which can always be expressed in ASCII.
> 
> I entirely agree that it makes sense to write code in ASCII.
> 
> But most of the contents of RFCs is not code, it's text, and we have hundreds of years of experience typesetting text.  Look at any decently produced book or magazine and you will see that the character set is a lot broader than ASCII, which makes it a lot more readable.
> 

I am not arguing that all text produced should be in ASCII.  In fact, non-normative parts of a standard could use non-ASCII -- I do not read these anyway, because I believe that a standard should be implementable even after being stripped of all the informative parts (abstract, introduction, overview of operations, examples, any diagram that is not complete, any list that is not exhaustive, informative references, appendices).

I would even go further: everything produced by the IRTF stream should use the whole Unicode character set, so that the documents look more like a paper written in LaTeX, and should just forgo the text version.  These documents are meant to be read, not to be implemented, and there is really nothing that comes close to a nicely typeset PDF for absorbing information.

Where I draw the line is in the parts that are meant to be implemented, aka normative text.  These are to be (at least virtually) translated into a dependent type (or higher-order intuitionistic logic), itself then derived into a program (aka the stuff I am paid to produce).  To be able to do that translation I need unambiguous text, with as little flourish as possible.  Even before adding non-ASCII characters, my observation is that the normative text in RFCs already has way too many words, and I wish that an editor specialized in the topic (not the RFC Editor) would spend time distilling these sentences to their essence.  That in turn would make the translation explained above easier, which in turn would make the programs derived from it safer, which in turn would make the Internet work better.

Now, admitting non-ASCII for some RFCs or some parts of an RFC requires discipline, so that's a dead end.  One way could be to annotate each section with a normative boolean attribute and prevent <u> (and other stuff) in a normative=true section.  That will not happen, so as a fallback the <u> thing seems a good way for me to generate a plain text that replaces these with the CLDR short text (as suggested in a previous email) when preparing for printing.  I already use a patched xml2rfc (to print RFCs with pagination), so that's not a big deal.  I do not believe that the IETF Trust allows for redistribution of stripped-down RFCs, so other people will not profit from that improvement.
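
To make the two ideas in the paragraph above concrete, here is a minimal sketch in Python, under several assumptions.  The normative attribute is hypothetical (it is not part of the xml2rfc v3 vocabulary), the function names and the sample document are invented for illustration, and Python's unicodedata character names stand in for the CLDR short names, which the standard library does not provide.  It is not how the patched xml2rfc mentioned above works.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Invented sample; the "normative" attribute is hypothetical.
SAMPLE = """<rfc>
<section normative="true"><t>Send <u>&#x3c0;</u> bytes.</t></section>
<section normative="false"><t>Thanks to <u>Zo&#xeb;</u>.</t></section>
</rfc>"""


def find_u_in_normative(elem, normative=False):
    """Return the text of every <u> inside a section marked normative="true".

    The flag is inherited by nested elements until another section
    overrides it with its own normative attribute.
    """
    if elem.tag == "section" and "normative" in elem.attrib:
        normative = elem.get("normative") == "true"
    hits = []
    if elem.tag == "u" and normative:
        hits.append(elem.text or "")
    for child in elem:
        hits.extend(find_u_in_normative(child, normative))
    return hits


def ascii_fallback(text):
    """Replace each non-ASCII character with a bracketed ASCII description.

    Uses the Unicode character name as a stand-in for the CLDR short
    name; unnamed characters fall back to their U+XXXX code point.
    """
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        else:
            out.append("[" + unicodedata.name(ch, "U+%04X" % ord(ch)) + "]")
    return "".join(out)


def replace_u_text(root):
    """Rewrite the text of every <u> element in place to its ASCII fallback."""
    for u in root.iter("u"):
        u.text = ascii_fallback(u.text or "")


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(find_u_in_normative(root))   # <u> contents that a linter would reject
    replace_u_text(root)
    print("".join(root.itertext()))    # ASCII-only text for printing
```

The first function is the "discipline" variant (reject <u> in normative sections); the other two are the fallback variant (keep <u> but render it in ASCII when preparing a plain-text copy).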

-- 
Marc Petit-Huguenin
Email: marc@petit-huguenin.org
Blog: https://marc.petit-huguenin.org
Profile: https://www.linkedin.com/in/petithug