Re: [rfc-i] line wrapping in XML

Mark Nottingham <mnot@mnot.net> Fri, 30 October 2020 00:57 UTC

From: Mark Nottingham <mnot@mnot.net>
In-Reply-To: <ae62b047-be53-40ca-8a3d-1b6dcb195ecc@www.fastmail.com>
Date: Fri, 30 Oct 2020 11:56:51 +1100
Message-Id: <831F4F53-DD3C-4B8E-A970-7CA794F25848@mnot.net>
References: <30D23CA0-2A80-4BA3-AC18-285CF45FB5FF@mnot.net> <5D30DC79-3EF7-4BAC-BAAD-07224122C8D7@akamai.com> <996A8A43-640B-4888-B4CB-8A1081C4986C@mnot.net> <567ecf8f-2c0f-48ae-bb66-2e594c67bcfd@www.fastmail.com> <ae62b047-be53-40ca-8a3d-1b6dcb195ecc@www.fastmail.com>
To: Martin Thomson <mt@lowentropy.net>

The problem is that Tidy removes significant whitespace in blocks like this:

<t>This document introduces a set of common data structures for use in definitions of new HTTP field values to address these problems. In particular, it defines a generic, abstract model for them, along with a concrete serialization for expressing that model in HTTP <xref target="RFC7230" format="default"/> header and trailer fields.</t>

which becomes:

<t>This document introduces a set of common data structures for use in
definitions of new HTTP field values to address these problems. In
particular, it defines a generic, abstract model for them, along with a
concrete serialization for expressing that model in HTTP 
<xref target="RFC7230" format="default" />header and trailer fields.</t>

Notice that the space after the <xref> element has been removed, so the reference now runs straight into "header". I don't think xml:space="preserve" is the solution here.
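
For anyone who wants to catch this in a toolchain, here is a rough sketch (not something from Tidy or xml2rfc) of a check that compares the XML before and after re-wrapping. It assumes that runs of whitespace inside running text may be collapsed to a single space, but that whitespace may not appear or disappear next to an inline element. The function names and the shortened sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def inline_stream(elem: ET.Element) -> str:
    """Flatten an element into text, keeping a marker for every tag boundary."""
    parts = [f"<{elem.tag}>", elem.text or ""]
    for child in elem:
        parts.append(inline_stream(child))
        # Text that follows the child's end tag, e.g. " header and ..."
        parts.append(child.tail or "")
    parts.append(f"</{elem.tag}>")
    return "".join(parts)


def normalized(xml_fragment: str) -> str:
    """Collapse whitespace runs so that ordinary re-wrapping compares equal."""
    return " ".join(inline_stream(ET.fromstring(xml_fragment)).split())


# Shortened versions of the paragraph quoted above (assumed sample data).
before = ('<t>model in HTTP <xref target="RFC7230" format="default"/> '
          'header and trailer fields.</t>')
# Harmless re-wrap: a space between two words became a newline.
rewrapped = ('<t>model in\nHTTP <xref target="RFC7230" format="default"/> '
             'header and\ntrailer fields.</t>')
# What Tidy produced: the space after <xref/> is gone.
tidied = ('<t>model in HTTP \n<xref target="RFC7230" format="default" />'
          'header and trailer fields.</t>')

print(normalized(before) == normalized(rewrapped))
print(normalized(before) == normalized(tidied))
```

The tag markers are what make the difference visible: comparing only the concatenated text would treat the two versions as equal, because the space before the <xref> element is still there.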

This has been reported to Tidy many times over the years:
 https://sourceforge.net/p/tidy/bugs/663/
 https://sourceforge.net/p/tidy/bugs/740/
 https://github.com/htacg/tidy-html5/issues/818

Cheers,


> On 30 Oct 2020, at 11:35 am, Martin Thomson <mt@lowentropy.net> wrote:
> 
> And I should read what I paste.  This says that it ignores the option for XML mode, which isn't great.  I think that you might have to use xml:space="preserve" on those elements that you want to protect, so a wrapper might be needed.  It's a small wrapper though.
> 
> On Fri, Oct 30, 2020, at 11:32, Martin Thomson wrote:
>> You can configure tidy to preserve whitespace in certain elements, so 
>> it seems like a viable option.
>> 
>> <option class="TidyMarkupTeach">
>>  <name>new-pre-tags</name>
>>  <type>Tag Names</type>
>>  <default />
>>  <example>tagX, tagY, ...</example>
>>  <description>This option specifies new tags that are to be processed 
>> in exactly the same way as HTML's <code>&lt;pre&gt;</code> element. 
>> This option takes a space or comma separated list of tag names. 
>> <br/>Unless you declare new tags, Tidy will refuse to generate a tidied 
>> file if the input includes previously unknown tags. <br/>Note you 
>> cannot as yet add new CDATA elements. <br/>This option is ignored in 
>> XML mode. </description>
>>  <seealso>new-blocklevel-tags</seealso>
>>  <seealso>new-empty-tags</seealso>
>>  <seealso>new-inline-tags</seealso>
>>  <seealso>custom-tags</seealso>
>>    <eqconsole />
>> </option>
>> 
>> -- from `tidy -xml-config`
>> 
>> 
>> On Fri, Oct 30, 2020, at 11:28, Mark Nottingham wrote:
>>> Nope; tried, but it removes spaces in odd places that are semantically 
>>> significant.
>>> 
>>> 
>>>> On 30 Oct 2020, at 11:26 am, Salz, Rich <rsalz@akamai.com> wrote:
>>>> 
>>>> 
>>>>>  1. Can we identify a freely available tool for correctly line-wrapping XML that people can insert into their toolchain pre-RPC, so that the RPC doesn't need to hand-wrap lines?
>>>> 
>>>> https://www.html-tidy.org/
>>>> 
>>>> 
>>>> 
>>> 
>>> --
>>> Mark Nottingham   https://www.mnot.net/
>>> 
>>> _______________________________________________
>>> rfc-interest mailing list
>>> rfc-interest@rfc-editor.org
>>> https://www.rfc-editor.org/mailman/listinfo/rfc-interest
>>> 

--
Mark Nottingham   https://www.mnot.net/

_______________________________________________
rfc-interest mailing list
rfc-interest@rfc-editor.org
https://www.rfc-editor.org/mailman/listinfo/rfc-interest