Re: [xml2rfc] Insufficiency of txt format

Carsten Bormann <cabo@tzi.org> Mon, 01 February 2021 17:21 UTC

From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <9739f26c-30d2-7b04-e866-b556c05ce07a@alum.mit.edu>
Date: Mon, 01 Feb 2021 18:21:21 +0100
Cc: xml2rfc@ietf.org
Message-Id: <153F2EFE-0450-463C-B9D2-3601CD2F4E4E@tzi.org>
References: <20210130190821.7504E6D02AD4@ary.qy> <fcf04c37-7d63-337f-a434-92bb26aa27cd@alum.mit.edu> <rv7pcs$fv0$1@gal.iecc.com> <9739f26c-30d2-7b04-e866-b556c05ce07a@alum.mit.edu>
To: Paul Kyzivat <pkyzivat@alum.mit.edu>
Archived-At: <https://mailarchive.ietf.org/arch/msg/xml2rfc/njbhZ0n1yv4eG0m_Hob-uxfLtr4>
Subject: Re: [xml2rfc] Insufficiency of txt format
Precedence: list

On 2021-02-01, at 17:34, Paul Kyzivat <pkyzivat@alum.mit.edu> wrote:
> 
> I'll be dead before v2 xml is.

I hope not, but this is an indication of the timelines we have to think in here.

Also, our documents have drawn-out timelines, during which the authoring-format-du-jour will change (as will even the “canonical” format).

So txt-level comparison needs to be part of any sustainable long-term strategy.

The interesting part here is normalization.  Rfcdiff has been normalizing the txt all along by removing footers and headers.  It could add normalization steps such as turning box-drawing characters into legacy ASCII.  If thinking about the comparison story is part of the strategy, such a presentation change could become invisible to long-timespan diffs.
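To make the normalization idea concrete, here is a minimal sketch in Python. It is not how rfcdiff actually does it; the footer pattern, the "first non-blank line after a form feed is the running header" rule, and the character table are all assumptions chosen for illustration, and the table is not exhaustive.

```python
import re

# Illustrative (not exhaustive) mapping from box-drawing characters to ASCII.
BOX_TO_ASCII = {
    "─": "-", "━": "-", "═": "=",
    "│": "|", "┃": "|", "║": "|",
}
# Corners and junctions all become "+", as in classic ASCII tables.
for ch in "┌┐└┘├┤┬┴┼╔╗╚╝╠╣╦╩╬┏┓┗┛┣┫┳┻╋":
    BOX_TO_ASCII[ch] = "+"
_TABLE = str.maketrans(BOX_TO_ASCII)

# Assumed footer shape: a line ending in "[Page N]".
FOOTER = re.compile(r"\[Page \d+\]\s*$")


def normalize_txt(text):
    """Return text with page furniture removed and box drawing ASCII-fied."""
    out = []
    skip_header = False
    # split("\n") rather than splitlines(): the latter also splits on form feed.
    for line in text.split("\n"):
        if "\f" in line:
            skip_header = True
            line = line.replace("\f", "")
        if FOOTER.search(line):
            continue  # drop the page footer
        if skip_header and line.strip():
            skip_header = False
            continue  # drop the running header that follows a page break
        out.append(line.translate(_TABLE).rstrip())
    # Collapse the runs of blank lines left behind by removed furniture.
    collapsed = []
    for line in out:
        if line == "" and collapsed and collapsed[-1] == "":
            continue
        collapsed.append(line)
    return "\n".join(collapsed)
```

Running both the old and the new rendering through such a function before diffing is what would make the switch to box-drawing characters invisible in the comparison.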

Doing XML-level comparison could be aided by first converting to a format such as PYX (a line-based ESIS representation).  Again, further normalization is crucial (as we have already seen in the example of random attribute order). The ugliest part here is likely to be white-space handling, which is a very sorry story for XML in general, but could be handled in a schema-driven way for RFCXML.  Also, preptool-generated attributes could be removed in the normalization (but they can also carry useful information).
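A rough sketch of the XML side, again in Python and again only an illustration: it emits PYX-style lines ("(" start tag, "A" attribute, "-" character data, ")" end tag) from the standard library's ElementTree, sorts attributes by name so that attribute order cannot show up in a diff, and optionally drops a caller-supplied set of attributes. The function name to_pyx and the drop_attrs parameter are made up for this example, and no white-space normalization is attempted.

```python
import xml.etree.ElementTree as ET


def _escape(s):
    # PYX keeps one event per line, so backslashes, newlines and tabs are escaped.
    return s.replace("\\", "\\\\").replace("\n", "\\n").replace("\t", "\\t")


def to_pyx(elem, drop_attrs=frozenset()):
    """Return a list of PYX-style lines for elem, attributes sorted by name."""
    lines = ["(" + elem.tag]
    for name in sorted(elem.attrib):
        if name not in drop_attrs:
            lines.append("A" + name + " " + _escape(elem.attrib[name]))
    if elem.text:
        lines.append("-" + _escape(elem.text))
    for child in elem:
        lines.extend(to_pyx(child, drop_attrs))
        if child.tail:
            # Text following a child element belongs to the parent's content.
            lines.append("-" + _escape(child.tail))
    lines.append(")" + elem.tag)
    return lines


if __name__ == "__main__":
    # Hypothetical input; "pn" stands in for a preptool-generated attribute.
    doc = '<section pn="section-1"><t anchor="x">Hi <em>there</em>!</t></section>'
    print("\n".join(to_pyx(ET.fromstring(doc), drop_attrs={"pn"})))
```

The resulting line lists can be fed to any ordinary line-based diff. Whether to drop preptool-generated attributes is left to the caller, since they can also carry useful information.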

Grüße, Carsten
