[Tools-discuss] Why do we even have text formats any more?

Martin Thomson <mt@lowentropy.net> Wed, 28 July 2021 01:54 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/mKS3YnmcWsCrDa47_VnSwoVDuW8>

I realize that this might be a little inflammatory as far as subjects go, but bear with me.

There are probably a few narrow cases where rendering plain text is better than HTML. But what we've been doing for years (thanks to Henrik's great tool) is taking text and turning it into HTML using the power of regular expressions. That's been good, but it's not always reliable (how many errata mention that "Section X of [FOO]" links to Section X of this document?). It has also lagged as the text format changed (case in point: the lack of a table of contents).
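
To make the "Section X of [FOO]" failure concrete, here is a minimal Python sketch of how a regex-based linker can go wrong. It is not the htmlizer's actual code: the patterns, the `link_sections` helper, and the `#section-N` anchor format are all assumptions made for illustration.

```python
import re

# Naive pattern (hypothetical): treat every "Section N" as a reference to
# the current document.
NAIVE = re.compile(r"Section (\d+(?:\.\d+)*)")

# More careful pattern (hypothetical): refuse to match when the section
# number is followed by " of [", which signals a section of another
# document. The "\.?\d" alternative stops the regex engine from
# backtracking to a shorter number such as "4" in "4.2".
CAREFUL = re.compile(r"Section (\d+(?:\.\d+)*)(?!\.?\d| of \[)")


def link_sections(text: str, pattern: re.Pattern) -> str:
    """Wrap each match in a link to an assumed '#section-N' anchor."""
    return pattern.sub(r'<a href="#section-\1">\g<0></a>', text)


sample = "See Section 3 and Section 4.2 of [FOO]."

# The naive pattern links both references into this document, including
# the one that points at [FOO].
print(link_sections(sample, NAIVE))

# The careful pattern links only the local reference.
print(link_sections(sample, CAREFUL))
```

Even the more careful pattern is only a heuristic: it depends on the exact wording "of [", which is the kind of fragility that working from structured HTML or XML avoids.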

Here's an alternative: style the HTML so that it looks like the text.  I tried this and it worked shockingly well.

Repo: https://github.com/martinthomson/rfc-txt-html
Demo: https://martinthomson.github.io/rfc-txt-html/diff.html

This isn't perfect, but it seems pretty good to me. Keep in mind that this took only a little bit of time to sketch out. No doubt it can be improved. The readme lists a bunch of issues I found, all minor.

I don't think that this is the end of text, but it is a possible way to limit our use of the htmlizer[1]. People who need to automate access to content might still use text, though I would argue that XML is superior in that regard. The other thing that comes to mind is diffs: HTML-native diff tools are somewhat less than ideal. Either way, serving HTML is just better.

Enjoy,
Martin


[1] Though I still see a shocking number of people authoring in XML (or XML-capable input formats) and submitting in text.  But I think we have plans to limit that.