Re: [rfc-i] line wrapping in XML

Martin Thomson <mt@lowentropy.net> Fri, 30 October 2020 00:57 UTC

Date: Fri, 30 Oct 2020 11:56:41 +1100
From: Martin Thomson <mt@lowentropy.net>
To: rfc-interest@rfc-editor.org
Subject: Re: [rfc-i] line wrapping in XML

I just tested tidy, and it's almost good enough, but it has one major problem that I've found.

FWIW, this is what I ran:
tidy -xml -o tls.xml -w 80 -i draft-ietf-quic-tls.xml
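A rough way to see what a run like this changes is to compare the text content of draft-ietf-quic-tls.xml and tls.xml with each run of whitespace collapsed to a single space. A re-wrapped line then compares equal, but a deleted space does not. This is a minimal sketch using Python's xml.etree.ElementTree, assuming both files parse without a DTD (no undefined entity references). `text_diffs` is a hypothetical helper, not part of tidy or xml2rfc. Comments are dropped by the default parser. The sketch treats every whitespace run as potentially significant, so indentation that tidy adds between block elements where there was none before will also be reported.

```python
import re
import xml.etree.ElementTree as ET


def _norm(s):
    """Collapse each run of XML whitespace to one space; None becomes ''."""
    return re.sub(r"[ \t\r\n]+", " ", s or "")


def _flatten(elem, out):
    """Append tag, attribute and normalized-text tokens in document order."""
    out.append(("start", elem.tag, tuple(sorted(elem.attrib.items()))))
    out.append(("text", _norm(elem.text)))
    for child in elem:
        _flatten(child, out)
        # The tail is the text after the child's end tag, e.g. " is" after <xref/>.
        out.append(("text", _norm(child.tail)))
    out.append(("end", elem.tag))
    return out


def text_diffs(before, after):
    """Return (position, before_token, after_token) wherever the documents differ."""
    a = _flatten(ET.fromstring(before), [])
    b = _flatten(ET.fromstring(after), [])
    diffs = [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(a) != len(b):
        diffs.append((min(len(a), len(b)), ("length", len(a)), ("length", len(b))))
    return diffs
```

For the files above, this would be called as `text_diffs(open("draft-ietf-quic-tls.xml", "rb").read(), open("tls.xml", "rb").read())`. Reading in binary lets the parser honour the XML declaration's encoding. An empty list means that only the layout of whitespace changed.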

The problem is that there are places in the document like this:

> This document describes how QUIC <xref target="QUIC-TRANSPORT" format="default" /> is

And tidy removes the space after the <xref/>, which is quite unfortunate: that space is part of the sentence, so without it the rendered reference runs straight into "is".
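For comparison, here is a minimal sketch of a wrapper that cannot make this particular mistake. It is not an existing tool. The only change it ever makes is to replace an existing run of spaces or tabs with a newline plus the current line's indentation. It rests on several assumptions:

- The input is well-formed XML.
- A newline and a space are interchangeable in ordinary text, as they are wherever xml2rfc collapses whitespace.
- `artwork` and `sourcecode` (the hypothetical `VERBATIM` set) are the only elements whose whitespace is literal.

The regular-expression tokenizer is a simplification, not a full XML parser.

```python
import re

# Markup tokens copied through unchanged: comments, CDATA sections,
# processing instructions, a DOCTYPE (with an optional internal subset),
# and ordinary tags (quoted attribute values may contain ">").
MARKUP = re.compile(
    r"<!--.*?-->"
    r"|<!\[CDATA\[.*?\]\]>"
    r"|<\?.*?\?>"
    r"|<!DOCTYPE[^\[>]*(?:\[.*?\])?\s*>"
    r"""|<(?:[^>"']|"[^"]*"|'[^']*')*>""",
    re.DOTALL,
)
# Assumption: these are the only elements whose whitespace is rendered literally.
VERBATIM = {"artwork", "sourcecode"}


def _atoms(doc):
    """Split doc into ("lit", s), ("space", s) and ("hard", s) pieces."""
    atoms, pos, verbatim = [], 0, None

    def text(chunk):
        if verbatim is not None:
            if chunk:
                atoms.append(("lit", chunk))  # inside artwork/sourcecode: untouched
            return
        # Odd indices are runs of XML whitespace; a no-break space stays text.
        for i, piece in enumerate(re.split(r"([ \t\r\n]+)", chunk)):
            if not piece:
                continue
            if i % 2 == 0:
                atoms.append(("lit", piece))
            else:
                atoms.append(("hard" if "\n" in piece else "space", piece))

    for m in MARKUP.finditer(doc):
        text(doc[pos:m.start()])
        tag = m.group()
        atoms.append(("lit", tag))
        name = re.match(r"</?([^\s/>]*)", tag).group(1)
        if verbatim is None:
            if name in VERBATIM and not tag.startswith("</") and not tag.endswith("/>"):
                verbatim = name
        elif tag.startswith("</") and name == verbatim:
            verbatim = None
        pos = m.end()
    text(doc[pos:])
    return atoms


def wrap(doc, width=72):
    """Break long lines only by turning existing space runs into newlines."""
    atoms = _atoms(doc)
    out, col, indent = [], 0, ""
    for i, (kind, s) in enumerate(atoms):
        if kind == "space":
            # Everything up to the next break opportunity must stay on one line.
            run, j = "", i + 1
            while j < len(atoms) and atoms[j][0] == "lit":
                run += atoms[j][1]
                j += 1
            if col > len(indent) and col + len(s) + len(run.split("\n", 1)[0]) > width:
                s = "\n" + indent  # still whitespace, so the text is unchanged
        out.append(s)
        if "\n" in s:
            last = s.rsplit("\n", 1)[1]
            col, indent = len(last), re.match(r"[ \t]*", last).group()
        else:
            col += len(s)
    return "".join(out)
```

The design choice is deliberately conservative. The wrapper only breaks long lines and never joins short ones, so existing line breaks survive. A line holding a single long tag (such as the <xref/> above) may still exceed `width`, because there is no whitespace inside it that can safely be turned into a newline.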

On Fri, Oct 30, 2020, at 11:35, Martin Thomson wrote:
> And I should read what I paste.  This says that the option is ignored
> in XML mode, which isn't great.  I think that you might have to use
> xml:space="preserve" on those elements that you want to protect, so a
> wrapper might be needed.  It's a small wrapper, though.
> 
> On Fri, Oct 30, 2020, at 11:32, Martin Thomson wrote:
> > You can configure tidy to preserve whitespace in certain elements, so 
> > it seems like a viable option.
> > 
> >  <option class="TidyMarkupTeach">
> >   <name>new-pre-tags</name>
> >   <type>Tag Names</type>
> >   <default />
> >   <example>tagX, tagY, ...</example>
> >   <description>This option specifies new tags that are to be processed 
> > in exactly the same way as HTML's <code>&lt;pre&gt;</code> element. 
> > This option takes a space or comma separated list of tag names. 
> > <br/>Unless you declare new tags, Tidy will refuse to generate a tidied 
> > file if the input includes previously unknown tags. <br/>Note you 
> > cannot as yet add new CDATA elements. <br/>This option is ignored in 
> > XML mode. </description>
> >   <seealso>new-blocklevel-tags</seealso>
> >   <seealso>new-empty-tags</seealso>
> >   <seealso>new-inline-tags</seealso>
> >   <seealso>custom-tags</seealso>
> >     <eqconsole />
> >  </option>
> > 
> > -- from `tidy -xml-config`
> > 
> > 
> > On Fri, Oct 30, 2020, at 11:28, Mark Nottingham wrote:
> > > Nope; tried, but it removes spaces in odd places that are semantically 
> > > significant.
> > > 
> > > 
> > > > On 30 Oct 2020, at 11:26 am, Salz, Rich <rsalz@akamai.com> wrote:
> > > > 
> > > > 
> > > >>   1. Can we identify a freely available tool for correctly line-wrapping XML that people can insert into their toolchain pre-RPC, so that the RPC doesn't need to hand-wrap lines?
> > > > 
> > > > https://www.html-tidy.org/
> > > > 
> > > > 
> > > > 
> > > 
> > > --
> > > Mark Nottingham   https://www.mnot.net/
> > > 
> > > _______________________________________________
> > > rfc-interest mailing list
> > > rfc-interest@rfc-editor.org
> > > https://www.rfc-editor.org/mailman/listinfo/rfc-interest