Re: Evolving document sources over a long time (Re: Comments on draft-roach-bis-documents-00)

David Noveck <davenoveck@gmail.com> Sun, 12 May 2019 15:11 UTC

To: RFC Interest <rfc-interest@rfc-editor.org>
Cc: IETF Discussion Mailing List <ietf@ietf.org>

>> Apparently an .xml file used to produce an rfc is not necessarily
acceptable to later versions of xml2rfc.

> Most of your changes have to do with the better validation in the current
tools, namely:

At least more extensive validation.

> — validating the syntax of anchors (no spaces, plus signs)

It is not clear to me, given that anchors are always in quotes, why these
restrictions were added.

In any case, they were added with no concern about the fact that many
existing .xml files would be invalidated.  I hope future instances of such
changes can be avoided.
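
As an illustration of the kind of check involved, here is a minimal Python sketch that lists anchors a stricter tool might reject. The pattern is a conservative ASCII approximation of an XML name and is an assumption, not the rule xml2rfc actually applies; the function name `bad_anchors` and the sample document are hypothetical. It also assumes the source parses on its own, with no unresolved external entities.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: a conservative ASCII stand-in for "valid XML name".
# This is NOT xml2rfc's own validation rule.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")

def bad_anchors(xml_text):
    """Return anchor values, in document order, that fail the pattern."""
    root = ET.fromstring(xml_text)
    return [
        el.get("anchor")
        for el in root.iter()
        if el.get("anchor") is not None and not _NAME.match(el.get("anchor"))
    ]

# Hypothetical sample: one acceptable anchor, one with a space, one with a plus.
sample = """<rfc><middle>
<section anchor="intro" title="Intro"/>
<section anchor="read write" title="Spaces"/>
<section anchor="a+b" title="Plus"/>
</middle></rfc>"""

print(bad_anchors(sample))  # prints ['read write', 'a+b']
```

Running something like this over an old source file before feeding it to a newer tool would at least show how many anchors are affected.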

> — validating artwork.

Artwork, by its nature, does not need to be validated.  I really don't want
xml2rfc warnings about the quality of my artwork :-)

However, changes in the processing of character entities within artwork have
given rise to unexpected rejections of existing xml files :-(

> For the latter, doing a wholesale <![CDATA[ … ]]> is always a better
approach than sprinkling &amp;/&lt;

That's good advice which the nfsv4 working group would do well to follow
for future RFCs.

As our RFCs often have lots of XDR, it is all too easy for a case of
amper-lt/amper-gt dysphoria to develop (not in the DSM :-)

> (you almost never need &gt;, by the way).  (See also authoring tools
below.)

If it's *almost* never, my tendency would be to use it always, just in case.
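
On the escaping points above, a minimal Python sketch may help. It parses the same hypothetical XDR-like line written three ways (entity-escaped, wrapped in CDATA, and with only `<` escaped) and shows that an XML parser sees identical text in each case. The fragment `opaque data<MAX_LEN>;` is made up for illustration, and the parser here is Python's standard library one, not xml2rfc.

```python
import xml.etree.ElementTree as ET

# Hypothetical XDR-like line, written three ways.
escaped = "<artwork>opaque data&lt;MAX_LEN&gt;;</artwork>"
wrapped = "<artwork><![CDATA[opaque data<MAX_LEN>;]]></artwork>"
lt_only = "<artwork>opaque data&lt;MAX_LEN>;</artwork>"  # raw '>' left as-is

escaped_text = ET.fromstring(escaped).text
wrapped_text = ET.fromstring(wrapped).text
lt_only_text = ET.fromstring(lt_only).text

print(wrapped_text)                                  # opaque data<MAX_LEN>;
print(escaped_text == wrapped_text == lt_only_text)  # True
```

The one place a raw `>` is not allowed in element content is the sequence `]]>`, which is also the one sequence a CDATA section cannot contain.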

> The anchor syntax — it is just too bad the v1 tool didn’t check that.

Since we can't go back in time and change the v1 tool, a better way to
provide compatibility would be for the existing tool to get a processing
option, allowing it to accept formerly acceptable anchors that have been
forbidden in recent times.

> > I was able to get an .xml from which a .txt file could be generated,
but despite my best efforts there were differences (more than a few)
between it and rfc5661, which your document would consider to be "spurious"
but appear to be unavoidable. In any case they are not gratuitous changes
and should not interfere with consideration of the document.   The majority
of the diffs arise from the following  issues:
> >       • Despite the fact that the xml for the reference sections of
both xml files are identical and the processing options are identical
(symrefs="no" sortrefs="yes") the reference ids in rfc5661Ready.txt and in
rfc5661 are different so that rfcdiff shows every line containing a
reference as part of a diff.  Apparently, different versions of xml2rfc use
different approaches to sorting references.
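
One way to see past the renumbering noise described above is to mask numeric reference tags before comparing the two text files. The following is a minimal sketch using Python's difflib; the function names, the `[N]` masking token, and the sample lines are hypothetical, and rfcdiff itself is not involved. Note that the pattern would also mask any other bracketed number, such as an array index.

```python
import difflib
import re

# Matches numeric reference tags such as [12]; also matches other
# bracketed numbers, which is a known limitation of this sketch.
_NUMERIC_REF = re.compile(r"\[\d+\]")

def mask_refs(lines):
    """Replace every numeric tag with a fixed token so renumbering is ignored."""
    return [_NUMERIC_REF.sub("[N]", line) for line in lines]

def substantive_changes(old_lines, new_lines):
    """Compare masked lines, but report the original (unmasked) text."""
    matcher = difflib.SequenceMatcher(
        a=mask_refs(old_lines), b=mask_refs(new_lines), autojunk=False
    )
    return [
        (tag, old_lines[i1:i2], new_lines[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

# Hypothetical sample: line 1 differs only by reference number, line 2 by wording.
old = ["See [12] for details.", "The server MUST reply."]
new = ["See [14] for details.", "The server MUST respond."]

print(substantive_changes(old, new))
# prints [('replace', ['The server MUST reply.'], ['The server MUST respond.'])]
```

This does nothing for the reference sections themselves, whose order and layout differ, so those would still need to be compared separately.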

> Protip: DO NOT USE numeric references.  Ever.

Good advice, but we need to deal with the fact that inappropriate choices were
made in the past and incorporated into published RFCs.

> This was stylistically appealing for some tiny documents, but rarely is
appropriate for actual specifications (and certainly not for NFSv4-sized
ones!).

I'm not sure how this choice came to be.   The actual .xml file was
generated by make and I don't think anybody really paid attention to the
processing instructions which were fixed early on and never really
reconsidered.
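
Since the processing instructions were set once and then left alone, a quick way to see what a source file actually asks for is to list them. Here is a minimal Python sketch, assuming instructions of the form `<?rfc name="value"?>` with double-quoted values; the function name `rfc_pi_settings` and the sample text are hypothetical.

```python
import re

# Assumption: PIs look like <?rfc name="value" ...?> with double quotes.
_PI = re.compile(r"<\?rfc\s+(.*?)\?>", re.S)
_ATTR = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')

def rfc_pi_settings(xml_text):
    """Collect name/value pairs from all <?rfc ...?> instructions; later ones win."""
    settings = {}
    for body in _PI.findall(xml_text):
        settings.update(_ATTR.findall(body))
    return settings

# Hypothetical sample header of an RFC source file.
sample = (
    '<?xml version="1.0"?>\n'
    '<?rfc symrefs="no"?>\n'
    '<?rfc sortrefs="yes" toc="yes"?>\n'
    "<rfc/>"
)

print(rfc_pi_settings(sample))
# prints {'symrefs': 'no', 'sortrefs': 'yes', 'toc': 'yes'}
```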

>>       • There are a fair number of differences that seem to have arisen
because the RFC editor made minor corrections directly on the .txt file so
that each such correction (while valid) is reported by rfcdiff as a
difference.

> Right.  That will be less of a problem in the future,

I hope so.  Why do you think that this will be getting better?

> but it does require tediously porting back changes to the document
source.

Yes :-(

> >       • The reference sections are exposed to the same sorting issues
as the reference ids.  To the naked eye, and to rfcdiff, they look very
different, despite the fact that they contain the same references.

> The sorting issue should be taken care of by not using numeric references

I don't think it will be.   It will eliminate diffs due to different
numeric tags but the layout of the reference sections will be different,
which will show up as a diff.

I hope Adam's document would allow documents with such diffs to be
processed by his procedure.

> (really, for the sake of your readers, please don’t).

For RFC5661 and documents derived from it using Adam's procedure, that ship
has already sailed :-(.

However, when there is a full bis succeeding RFC5661, not processed
according to Adam's procedure, it should not use numeric references :-)

> Since RFC 5661, we also got DOIs on RFCs, so it is inevitable there are a
lot of diffs.

It is not inevitable as shown by the fact that I didn't run into that
issue.   It's kind of nice to know that there was an issue out there that I
didn't run into :-)

For reasons I really don't understand, the xml for rfc5661 does not
include RFC references from external libraries.   It includes them inline,
so a new rfc derived from that xml file will not include DOIs.   That is
not a problem for Adam's procedure, but it may be for the IESG or the RFC
editor.   I hope that, in processing RFCs using Adam's procedure, people
will overlook the lack of DOIs in the same way that they overlook other
aspects of the document that would prevent a new document of that form from
being published.
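
To gauge how many inline references would be affected, one could list the RFC references that carry no DOI entry. This is a minimal Python sketch, assuming references are written inline as `<reference>` elements with `<seriesInfo name="..." value="..."/>` children and that the source parses on its own; the function name, anchors, and placeholder values in the sample are hypothetical.

```python
import xml.etree.ElementTree as ET

def rfc_references_without_doi(xml_text):
    """Return anchors of references that have an RFC seriesInfo but no DOI one."""
    root = ET.fromstring(xml_text)
    missing = []
    for ref in root.iter("reference"):
        names = {info.get("name") for info in ref.iter("seriesInfo")}
        if "RFC" in names and "DOI" not in names:
            missing.append(ref.get("anchor"))
    return missing

# Hypothetical sample with placeholder numbers: one reference lacks a DOI.
sample = """<rfc><back><references title="Normative References">
<reference anchor="ref-a"><front><title>Example A</title></front>
<seriesInfo name="RFC" value="0001"/></reference>
<reference anchor="ref-b"><front><title>Example B</title></front>
<seriesInfo name="RFC" value="0002"/>
<seriesInfo name="DOI" value="10.17487/RFC0002"/></reference>
</references></back></rfc>"""

print(rfc_references_without_doi(sample))  # prints ['ref-a']
```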

> Again, tedious, but not really avoidable.

In cases in which it is not avoidable, the addition of DOIs should not
prevent processing of a document according to Adam's procedure.

> Of course, I would not recommend directly authoring in XML these days

Why not?

> (there are now good markdown and asciidoc choices, as well as an org-mode
one if that is your thing),

Where are these documented?

> but that was the way things were done in 2010.

I'm prepared to stick with that, unless there is something better about the
alternatives.

> (If you do want to make the transition, there are some conversion tools
available; I may be able to help.)

I would only make a transition for new documents.

For documents to be processed according to Adam's procedure, the likelihood
of minor diffs arising is such that I don't think a transition is possible.

For a later full bis, I would be replacing major sections of an existing
xml-written document which would make any transition especially difficult.

For completely new documents, my concerns would be to make sure that the
needs of those who might later be called upon to update the document were
taken into account.   Relevant issues:

   - Could we be sure that the new source could be processed at a later
   time, and give rise to the same xml?
   - Would the xml produced by a new tool be easily human-editable?


On Sat, May 11, 2019 at 2:10 AM Carsten Bormann <cabo@tzi.org> wrote:

> Hi David,
>
> Thank you for your article in ietf@ietf.org, it is really a little
> treasure-trove.
>
> Let me extract those parts that pertain to your experiences with the
> evolution of RFCXML and add rfc-interest; please continue discussion on
> rfc-interest (which you may want to subscribe to if RFC authoring is part of
> your life).
>
> > On May 11, 2019, at 01:31, David Noveck <davenoveck@gmail.com> wrote:
> >
> > Apparently an .xml file used to produce an rfc is not necessarily
> acceptable to later versions of xml2rfc.  You can get an idea of the issues
> by looking at rc5661Base.xml and the file I wound up with,
> rfc5661Ready.xml; both are attached.
>
> Most of your changes have to do with the better validation in the current
> tools, namely:
>
> — validating the syntax of anchors (no spaces, plus signs)
> — validating artwork.
>
> For the latter, doing a wholesale <![CDATA[ … ]]> is always a better
> approach than sprinkling &amp;/&lt; (you almost never need &gt;, by the
> way).  (See also authoring tools below.)
>
> The anchor syntax — it is just too bad the v1 tool didn’t check that.
>
> > I was able to get an .xml from which a .txt file could be generated,
> but despite my best efforts there were differences (more than a few)
> between it and rfc5661, which your document would consider to be "spurious"
> but appear to be unavoidable. In any case they are not gratuitous changes
> and should not interfere with consideration of the document.   The majority
> of the diffs arise from the following  issues:
> >       • Despite the fact that the xml for the reference sections of both
> xml files are identical and the processing options are identical
> (symrefs="no" sortrefs="yes") the reference ids in rfc5661Ready.txt and in
> rfc5661 are different so that rfcdiff shows every line containing a
> reference as part of a diff.  Apparently, different versions of xml2rfc use
> different approaches to sorting references.
>
> Protip: DO NOT USE numeric references.  Ever.  This was stylistically
> appealing for some tiny documents, but rarely is appropriate for actual
> specifications (and certainly not for NFSv4-sized ones!).
>
> >       • There are a fair number of differences that seem to have arisen
> because the RFC editor made minor corrections directly on the .txt file so
> that each such correction (while valid) is reported by rfcdiff as a
> difference.
>
> Right.  That will be less of a problem in the future, but it does require
> tediously porting back changes to the document source.
>
> >       • The reference sections are exposed to the same sorting issues as
> the reference ids.  To the naked eye, and to rfcdiff, they look very
> different, despite the fact that they contain the same references.
>
> The sorting issue should be taken care of by not using numeric references
> (really, for the sake of your readers, please don’t).
> Since RFC 5661, we also got DOIs on RFCs, so it is inevitable there are a
> lot of diffs.  Again, tedious, but not really avoidable.
>
> Of course, I would not recommend directly authoring in XML these days
> (there are now good markdown and asciidoc choices, as well as an org-mode
> one if that is your thing), but that was the way things were done in 2010.
> (If you do want to make the transition, there are some conversion tools
> available; I may be able to help.)
>
> Grüße, Carsten
>
>