Re: [Tools-discuss] Why do we even have text formats any more?

Martin Thomson <mt@lowentropy.net> Wed, 28 July 2021 06:59 UTC

Mime-Version: 1.0
Message-Id: <4e967ce8-618d-4c00-b873-e6b64b54659a@www.fastmail.com>
In-Reply-To: <14bd112c-fd34-44ce-dcbc-9f3b989cdd7d@nostrum.com>
References: <4d70a1ac-a275-420a-83f6-99dfd5b5385c@www.fastmail.com> <14bd112c-fd34-44ce-dcbc-9f3b989cdd7d@nostrum.com>
Date: Wed, 28 Jul 2021 16:59:00 +1000
From: "Martin Thomson" <mt@lowentropy.net>
To: tools-discuss@ietf.org
Content-Type: text/plain
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/cYI1sbVz2P_x_tyIzKmgGbaebYM>
Subject: Re: [Tools-discuss] Why do we even have text formats any more?
X-BeenThere: tools-discuss@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: IETF Tools Discussion <tools-discuss.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tools-discuss/>
List-Post: <mailto:tools-discuss@ietf.org>
List-Help: <mailto:tools-discuss-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tools-discuss>, <mailto:tools-discuss-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 28 Jul 2021 06:59:22 -0000

On Wed, Jul 28, 2021, at 12:22, Robert Sparks wrote:
> "for new things"

Of course.  Anything that is in v3 format could get this treatment.

> diff

Yeah, I noted that.

It is clear to me that HTML diff tools are just not good enough.  Oddly, this is not because they couldn't just diff the text, but because they try to do so much more: they capture changes in style or element choice as well as changes in the text.

For this, for Ekr's use case, and for anything else where the goal is to cut out noise, having text renderings available is useful.
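As a rough illustration of "just diff the text", here is a Python sketch that strips the markup from two HTML renderings before handing them to `difflib`. The helpers `html_to_lines` and `text_diff` are hypothetical, not part of any existing IETF tool, and the sketch does nothing special about `<script>` or `<style>` content. With this approach a switch from `<em>` to `<strong>` produces no diff at all, while a wording change does.

```python
import difflib
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects character data only; tags and attributes are discarded."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_lines(html):
    """Return the text of an HTML fragment as a list of non-empty lines."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.chunks)
    # Collapse whitespace within each line so reflowed markup is not reported as a change.
    lines = (" ".join(line.split()) for line in text.splitlines())
    return [line for line in lines if line]


def text_diff(old_html, new_html):
    """Unified diff of two HTML renderings, comparing text content only."""
    return list(difflib.unified_diff(
        html_to_lines(old_html), html_to_lines(new_html),
        fromfile="old", tofile="new", lineterm=""))


old = "<p>Clients <em>MUST</em> send a nonce.</p>"
styled = "<p>Clients <strong>MUST</strong> send a nonce.</p>"
reworded = "<p>Clients <em>MUST</em> send a fresh nonce.</p>"

print(text_diff(old, styled))  # prints []: the change is markup only
print("\n".join(text_diff(old, reworded)))
```

The trade-off is exactly the one described: a reviewer loses visibility of deliberate element changes, but gains a diff with far less noise.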

My pitch here is mostly about the most common case, which is people reading the documents in a web browser (and maybe those who print them too).  Keeping text for people whose processes or tools rely on it is probably necessary, if only so that the entire series can be presented in a nominally consistent format.

I do think that the htmlizer can go in that case.

> create a writer for it that builds it from the xml source rather than trying to 
> pull things by heuristics from the text. 

Yeah, this is a neat idea, and it would avoid some of the issues I've discovered.  That said, given the HTML restyling option here, the marginal benefit doesn't seem to justify the engineering cost involved.

> the processor at the datatracker will get the references wrong. 

Interesting.  Is this because people use some combination of entity references and processing instructions and fear that those will be filled in poorly?  That seems like a reasonable concern.  Part of it might derive from the fact that people have their own (skewed or hacked) local reference caches that they want to use in generating the final output.  I believe that xml2rfc can already produce XML with the references inlined, drawing on local sources.  Maybe we just need to document that process better for those people.
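To make the "inline references from a local cache" idea concrete, here is a Python sketch built on the standard library's `xml.etree.ElementInclude`. It assumes references are pulled in with XInclude (`xi:include`) elements rather than entities or processing instructions, and that the local cache is a directory holding one file per reference, named after the last segment of the reference URL. Both the cache layout and the helpers `make_cache_loader` and `inline_references` are hypothetical; this is not a description of how xml2rfc does it.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from urllib.parse import urlparse
from xml.etree import ElementInclude


def make_cache_loader(cache_dir):
    """Return an ElementInclude loader that resolves each href from cache_dir only.

    'https://example.org/bibxml/reference.RFC.2119.xml' maps to
    '<cache_dir>/reference.RFC.2119.xml'. No network access is attempted.
    """
    def loader(href, parse, encoding=None):
        name = os.path.basename(urlparse(href).path)
        path = os.path.join(cache_dir, name)
        if not os.path.exists(path):
            raise FileNotFoundError(f"no cached copy of {href} at {path}")
        if parse == "xml":
            return ET.parse(path).getroot()
        with open(path, encoding=encoding or "utf-8") as f:
            return f.read()
    return loader


def inline_references(xml_text, cache_dir):
    """Replace every xi:include in xml_text with the cached document it names."""
    root = ET.fromstring(xml_text)
    ElementInclude.include(root, loader=make_cache_loader(cache_dir))
    return ET.tostring(root, encoding="unicode")


# Example: a draft fragment that includes one reference by (illustrative) URL.
draft = ('<references xmlns:xi="http://www.w3.org/2001/XInclude">'
         '<xi:include href="https://example.org/bibxml/reference.RFC.2119.xml"/>'
         '</references>')
with tempfile.TemporaryDirectory() as cache:
    with open(os.path.join(cache, "reference.RFC.2119.xml"), "w", encoding="utf-8") as f:
        f.write('<reference anchor="RFC2119"><front><title>Key words</title></front></reference>')
    print(inline_references(draft, cache))
```

The output is self-contained XML that could be submitted as-is, so whatever processor runs at the datatracker has nothing left to resolve.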

That, and make the entire references infrastructure more reliable.  I routinely get build failures caused by failed reference fetches.
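One common mitigation for flaky fetches is to retry with exponential backoff and fall back to the last good copy. The Python sketch below shows that pattern; `fetch_reference`, the dict used as a cache, and the retry parameters are illustrative assumptions, not anything xml2rfc or the IETF bibxml service provides.

```python
import time


def fetch_reference(url, fetch, cache, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Fetch url via fetch(url), retrying failures with exponential backoff.

    On success the body is stored in cache (a plain dict in this sketch).
    If every attempt fails, the previously cached copy is returned; if
    there is none, the last error is re-raised.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            body = fetch(url)
            cache[url] = body
            return body
        except OSError as err:  # urllib's URLError and socket timeouts are OSError subclasses
            last_error = err
            if attempt + 1 < attempts:
                sleep(base_delay * 2 ** attempt)  # with the defaults: 1s, then 2s
    if url in cache:
        return cache[url]
    raise last_error
```

In a real build, `fetch` might be something like `lambda u: urllib.request.urlopen(u, timeout=10).read()`, and the cache would need to persist on disk between runs for the fallback to help.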