Re: [xml2rfc] Insufficiency of txt format

Carsten Bormann <cabo@tzi.org> Mon, 01 February 2021 17:21 UTC

From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <9739f26c-30d2-7b04-e866-b556c05ce07a@alum.mit.edu>
Date: Mon, 1 Feb 2021 18:21:21 +0100
Cc: xml2rfc@ietf.org
Message-Id: <153F2EFE-0450-463C-B9D2-3601CD2F4E4E@tzi.org>
References: <20210130190821.7504E6D02AD4@ary.qy> <fcf04c37-7d63-337f-a434-92bb26aa27cd@alum.mit.edu> <rv7pcs$fv0$1@gal.iecc.com> <9739f26c-30d2-7b04-e866-b556c05ce07a@alum.mit.edu>
To: Paul Kyzivat <pkyzivat@alum.mit.edu>
Archived-At: <https://mailarchive.ietf.org/arch/msg/xml2rfc/njbhZ0n1yv4eG0m_Hob-uxfLtr4>
Subject: Re: [xml2rfc] Insufficiency of txt format

On 2021-02-01, at 17:34, Paul Kyzivat <pkyzivat@alum.mit.edu> wrote:
> 
> I'll be dead before v2 xml is.

I hope not, but this is an indication of the timelines we have to think in here.

Also, our documents have drawn-out timelines, during which the authoring-format-du-jour will change (as will even the “canonical” format).

So txt-level comparison needs to be part of any sustainable long-term strategy.

The interesting part here is normalization.  Rfcdiff has been normalizing the txt all along by removing page headers and footers.  It could add further normalization steps, such as turning box-drawing characters into legacy ASCII.  If thinking about the comparison story is part of the strategy, a presentation change like that could become invisible to long-timespan diffs.
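A minimal Python sketch of that kind of txt normalization, assuming the usual RFC txt layout: pages separated by form feeds, footer lines ending in "[Page N]", and a running header as the first non-blank line of every page after the first. The name `normalize_rfc_txt` and the box-drawing mapping are illustrative only; this is not how rfcdiff is actually implemented.

```python
import re

# Illustrative (partial) mapping from Unicode box-drawing characters to legacy ASCII art.
BOX_TO_ASCII = str.maketrans({
    "─": "-", "━": "-",
    "│": "|", "┃": "|",
    "┌": "+", "┐": "+", "└": "+", "┘": "+",
    "├": "+", "┤": "+", "┬": "+", "┴": "+", "┼": "+",
})

# Assumed footer shape: a line ending in "[Page N]".
FOOTER_RE = re.compile(r"\[Page \d+\]\s*$")


def normalize_rfc_txt(text: str) -> list[str]:
    """Return normalized lines suitable for a line-based diff (a sketch)."""
    result: list[str] = []
    for index, page in enumerate(text.split("\f")):
        # Drop footer lines such as "Author   Informational   [Page 3]".
        lines = [ln for ln in page.splitlines() if not FOOTER_RE.search(ln)]
        if index > 0:
            # Assume the first non-blank line of a continuation page is the running header.
            for i, ln in enumerate(lines):
                if ln.strip():
                    del lines[i]
                    break
        for ln in lines:
            ln = ln.translate(BOX_TO_ASCII).rstrip()
            # Collapse runs of blank lines (and leading blanks) left behind by page breaks.
            if ln == "" and (not result or result[-1] == ""):
                continue
            result.append(ln)
    # Trim trailing blank lines.
    while result and result[-1] == "":
        result.pop()
    return result
```

Feeding the normalized lines of two versions into `difflib.unified_diff` would then hide both pagination and a switch from ASCII art to box-drawing characters, leaving only substantive changes visible.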

Doing XML-level comparison could be aided by first converting to a format such as Pyx (a line-based ESIS representation).  Again, further normalization is crucial (as the example of random attribute order has already shown).  The ugliest part here is likely to be whitespace handling, which is a very sorry story for XML in general, but could be handled in a schema-driven way for RFCXML.  Also, preptool-generated attributes could be removed during normalization (although they can also carry useful information).
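A rough Python sketch of the XML side: serialize RFCXML into PYX-style lines (`(` start tag, `A` attribute, `-` text, `)` end tag), normalizing along the way by sorting attributes, collapsing whitespace, and dropping preptool attributes. The names in `PREPTOOL_ATTRS` and `PRESERVE_SPACE` are assumptions standing in for what a real schema-driven normalizer would derive from the RFCXML grammar; note also that `ElementTree` silently drops comments and processing instructions.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed examples of preptool-generated attributes; a real list would come
# from the preptool specification rather than this hard-coded set.
PREPTOOL_ATTRS = frozenset({"pn", "slugifiedName"})

# Assumed elements whose text is whitespace-significant (a stand-in for
# schema-driven whitespace handling).
PRESERVE_SPACE = frozenset({"artwork", "sourcecode"})


def _escape(text: str) -> str:
    # Each PYX event occupies one line, so escape backslashes, newlines, and tabs.
    return text.replace("\\", "\\\\").replace("\n", "\\n").replace("\t", "\\t")


def to_pyx(xml_text: str) -> list[str]:
    """Serialize XML to normalized PYX-style lines suitable for line diffs."""
    root = ET.fromstring(xml_text)
    lines: list[str] = []

    def emit_text(text: Optional[str], preserve: bool) -> None:
        if not text:
            return
        if not preserve:
            # Collapse whitespace runs; whitespace-only text disappears entirely.
            text = " ".join(text.split())
        if text:
            lines.append("-" + _escape(text))

    def walk(elem: ET.Element) -> None:
        preserve = elem.tag in PRESERVE_SPACE
        lines.append("(" + elem.tag)
        # Sorted attribute order removes diffs caused by random attribute order.
        for name in sorted(elem.attrib):
            if name not in PREPTOOL_ATTRS:
                lines.append(f"A{name} {_escape(elem.attrib[name])}")
        emit_text(elem.text, preserve)
        for child in elem:
            walk(child)
            # A child's tail text belongs to the parent element's content.
            emit_text(child.tail, preserve)
        lines.append(")" + elem.tag)

    walk(root)
    return lines
```

Collapsing whitespace this crudely also drops the spaces between inline elements and the surrounding text, which is exactly where a schema that knows RFCXML's mixed-content elements would do better.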

Regards, Carsten