Re: [netmod] Benjamin Kaduk's Discuss on draft-ietf-netmod-artwork-folding-09: (with DISCUSS and COMMENT)

Benjamin Kaduk <kaduk@mit.edu> Wed, 11 September 2019 00:03 UTC

Date: Tue, 10 Sep 2019 19:03:38 -0500
From: Benjamin Kaduk <kaduk@mit.edu>
To: Kent Watsen <kent+ietf@watsen.net>
Cc: The IESG <iesg@ietf.org>, "netmod-chairs@ietf.org" <netmod-chairs@ietf.org>, draft-ietf-netmod-artwork-folding@ietf.org, "netmod@ietf.org" <netmod@ietf.org>
Message-ID: <20190911000337.GQ18198@kduck.mit.edu>
References: <156766366671.22774.7481795788724573201.idtracker@ietfa.amsl.com> <0100016d0372debf-16e6e132-b334-41b3-ad9c-953fd9314963-000000@email.amazonses.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <0100016d0372debf-16e6e132-b334-41b3-ad9c-953fd9314963-000000@email.amazonses.com>
User-Agent: Mutt/1.10.1 (2018-07-13)
Archived-At: <https://mailarchive.ietf.org/arch/msg/netmod/0icg3aA4-ZSOSLETMnF5FKzZOyA>
Subject: Re: [netmod] Benjamin Kaduk's Discuss on draft-ietf-netmod-artwork-folding-09: (with DISCUSS and COMMENT)
Precedence: list

On Thu, Sep 05, 2019 at 10:02:03PM +0000, Kent Watsen wrote:
> Hi Ben,
> 
> Thank you for your review.  Comments below.
> 
> Update @ https://tools.ietf.org/html/draft-ietf-netmod-artwork-folding-10 <https://tools.ietf.org/html/draft-ietf-netmod-artwork-folding-10>
> 
> Kent // as co-author
> 
> 
> > On Sep 5, 2019, at 2:07 AM, Benjamin Kaduk via Datatracker <noreply@ietf.org> wrote:
> > 
> > Benjamin Kaduk has entered the following ballot position for
> > draft-ietf-netmod-artwork-folding-09: Discuss
> > 
> > When responding, please keep the subject line intact and reply to all
> > email addresses included in the To and CC lines. (Feel free to cut this
> > introductory paragraph, however.)
> > 
> > 
> > Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
> > for more information about IESG DISCUSS and COMMENT positions.
> > 
> > 
> > The document, along with other ballot positions, can be found here:
> > https://datatracker.ietf.org/doc/draft-ietf-netmod-artwork-folding/
> > 
> > 
> > 
> > ----------------------------------------------------------------------
> > DISCUSS:
> > ----------------------------------------------------------------------
> > 
> > I think the procedures described herein are incomplete without a footer
> > to terminate the un-folding process.  Otherwise, it seems that the
> > described algorithms would leave the two-line header for the second and
> > subsequent instances of folded text in a single document.  (If we tried
> > to just blindly remove all instances of the header without seeking
> > boundaries, then we would misreconstruct content when different folding
> > algorithms are used in the same document with the single-backslash
> > algorithm occurring first.)
> 
> Are you referring to when an RFC contains multiple inclusions and one is
> trying to unfold them all at once?   That's not the intention here, as

Yes, that was what I was thinking; sorry for missing or misinterpreting the
notes in Sections 7.2/8.2.

> noted in paragraph 3 in both sections 7.2 and 8.2.  FWIW, this sounds
> like the framing problem that the WG discussed with the conclusion that
> extracting from plain-text is dead, now that XML is the required
> submission format, and XML provides a superior framing mechanism than any
> footer we could add.
> 
> BTW, yes, each text inclusion in a single RFC may independently be folded
> using either the '\' or '\\' strategy, with the recommendation that '\'
> always be tried first and '\\' only used when '\' fails.
> 
> If referring to a single text content instance, could you provide an
> example illustrating the concern?
> 
> 
> 
> 
> > I don't think it's proper to refer to a script that requires bash
> > specifically as a "POSIX shell script".  I did not attmept to check
> > whether any bash-specific features are used or this requirements stems
> > solely from the shebang line, though.
> 
> I just changed "POSIX" to "Bash" in the title for Appendix A.
> 
> Not that it matters, but "--posix" is passed into `bash` on the first
> line of the script  ;)
> 
> 
> 
> > I think the shell script does need to use double-quotes around some
> > variable expansions, especially "$infile" and "$outfile", to work
> > properly for filenames containing spaces.  We do quote "$infile" when
> > we're checking that it exists, just not (most of the time) when we
> > actually use it!
> 
> Updated.
> 
> 
> 
> > In addition to the above, I also share Alissa's (and Mirja's) concerns,
> > but feel that Discuss is more appropriate than Abstain, so we can
> > discuss what the best way to get this content published is.  For it's
> > fine content, and we should see it published; it's just not immediately
> > clear to me what the right way to do so is.
> 
> Agreed.   For now, I've changed it to Informational, but I think there
> remains a discussion around if the draft should be re-rerun through the
> IAB stream.   My responses today to Alissa's Abstain and Suresh Discuss
> dig into this.  Is it okay to use those threads for this item?

Please do; this point was mostly intended to make sure that we didn't
inadvertently approve the document while those discussions were still going
on.

> 
> > ----------------------------------------------------------------------
> > COMMENT:
> > ----------------------------------------------------------------------
> > 
> > Section 4.1
> > 
> >   Automated folding of long lines is needed in order to support draft
> >   compilations that entail a) validation of source input files (e.g.,
> >   XML, JSON, ABNF, ASN.1) and/or b) dynamic generation of output, using
> >   a tool that doesn't observe line lengths, that is stitched into the
> >   final document to be submitted.
> > 
> > I don't think the intended meaning of "source input files" will be
> > clear to all readers just from this text.  Some discussion of how RFCs
> > can consider source code, data structures, generated output, etc., that
> > have standalone representations and natural formats, and the need to
> > display their contents in the RFC format that has different
> > requirements might be helpful context for this paragraph and the next.
> 
> Is the updated text more understandable?

Yes, thanks

> 
> 
> > Section 7.1.2
> > 
> > For some reason my mental model of "RFC style" does not use the word
> > "really" in this way, and prefers alternatives like "very" or
> > "exceptionally".  (Also in Section 8.1.2.)
> 
> s/Really/Exceptionally/ in both cases.
> 
> 
> > Section 7.2.1
> > 
> >   1.  Determine where the fold will occur.  This location MUST be
> >   before or at the desired maximum column, and MUST NOT be chosen such
> >   that the character immediately after the fold is a space (' ')
> >   character.  For forced foldings, the location is between the
> > 
> > This is a rather awkward natural line break.  I suggest an RFC Editor
> > note to make sure that the punctuation around the space character all
> > appears on the same line.
> 
> RFC Editor note added, near the top of the draft.
> 
> 
> 
> >   3.  On the following line, insert any number of space (' ')
> >   characters.
> > 
> > I'm not sure I'd characterize the procedure as "complete" when it
> > leaves the value of the output subject to implementation choice such as
> > this.  (Note that the next paragraph talks about the resulting
> > "arbitrary number of space" characters, and would presumably also need
> > to be adjusted if this text was adjusted.) We also don't seem to bound
> > this number of spaces to be fewer than the target line length, which
> > only matters in some weirdly pedantic sense.
> 
> Added "subject to the resulting line not exceeding the desired maximum"
> to both locations in the draft.
> 
> 
> 
> > Section 7.2.2
> > 
> >   Scan the beginning of the text content for the header described in
> >   Section 7.1.1.  If the header is not present, starting on the first
> >   line of the text content, exit (this text contents does not need to
> >   be unfolded).
> > 
> > I'm not sure I understand what "starting on the first line of the text
> > content" is intended to mean.  (Also in 8.2.2.)
> 
> I think you are saying that it seems overly prescriptive, given that the
> previous sentence says "beginning" and "header", it defies logic that the
> header might not start on the first line and, by this text calling it
> out, it suggests something special is going on.   Is this what you mean?
> To be clear, the only intention here is to catch the case whereby there
> might be, e.g., some blank lines preceding the header.   Do you think the
> "starting on the first line of the text content" fragment should be
> removed?

I think I was too confused by the text to be complaining that it was overly
prescriptive :(
I guess my complaint is that it seems ambiguous whether this is "the
procedure says: start on the first line of text content, and check for the
header" or "If the header is not present [anywhere in the content], start
on the first line of content, and exit".  That is, I think the order in
which the clauses appear confuses me, with perhaps some exacerbation by
verb tense.  I support being able to cope with some blank lines preceding
the header!

> > Section 8.2.1
> > 
> >   If this text content needs to and can be folded, insert the header
> >   described in Section 8.1.1, ensuring that any additional printable
> >   characters surrounding the header do not result in a line exceeding
> >   the desired maximum.
> > 
> > We discussed above some cases when text could not be folded using the
> > algorithm from Section 7.2.1; in what case could text not be folded
> > with this algorithm?  Just the case when the implementation doesn't
> > support forced folding?
> 
> Yes, that's the only case known.   But what does this have to do with
> Section 8.2.1?  Are you keying off of the "needs to" part?   Is it okay?

I was just trying to check that we have given the reader enough information
to ascertain the "can be folded" result.

> > Section 10
> > 
> > We should warn against implementations scanning past the end of a
> > buffer (containing the entire contents of a file) when checking what's
> > in the beginning of the next line -- if a file ends with a backslash
> > and "end of line" but no further content, we could perform an out of
> > bounds access if the code assumes it is safe to check for the next
> > line's initial content.
> 
> Both Sections 7.2.2 and 8.2.2 describe conditions to determine when
> unfolding occurs. AFIACT, in both cases, the unfolding algorithm stays
> within the bounds of those conditions.  

These procedures are fine if you're operating in a context where you
interact with the text corpus via "get next line" operations.  But I don't
think we have limited ourselves to such contexts; consider the case where I
(foolishly) write text-processing code in C, and read(2) the text in
question into a memory buffer.  I'm on my own for linebreak detection, and
if I start peeking past escape characters, it's not so hard to imagine that
I could fail to check for "end of buffer" and trigger undefined behavior.

> For instance, given the input sequence [ '\' '\n' EOF] , the 7.2.2
> algorithm would replace it with [ EOF ] and the 8.2.2 algorithm wouldn't
> even attempt to unfold it since the condition of the next line containing
> a second '\' character isn't met.
> 
> Is this Security Consideration needed?

Well, it's a nonblocking comment.  So if the above description seems
totally implausible to you, I can accept it not being included in the
document.

> 
> 
> > Section 12.2
> > 
> > I think that RFC 7991 could be normative, since we say "per RFC 7991"
> > to describe some requirements on behavior.  Likewise for RFC 7994,
> > whose character encoding requirements we incorporate by reference.
> 
> Given that this format may be used in contexts outside the IETF, it seems
> that understanding RFC 7991 is optional.  Agreed?

For most of the occurrences of 7991 references, I agree with you.  The only
one that makes me think otherwise is in Section 7.1.2:

   The character encoding is the same as described in Section 2 of
   [RFC7994], except that, per [RFC7991], tab characters are prohibited.

which is a statement of behavior that defers to an external specification.

> 
> 
> > Appendix A
> > 
> > I could perhaps argue that we should include a reference to POSIX for
> > "POSIX shell script" but find it somewhat hard to believe that this
> > would be a problem in practice.  It's also moot since we require bash
> > specifically, so we'd need to reference bash instead of POSIX.
> 
> Per above, "POSIX" is now "Bash" in the title.   I added an Informative
> reference for Bash.

Thanks!

> 
> >   copy/paste the script for local use.  As should be evident by the
> >   lack of the mandatory header described in Section 7.1.1, these
> >   backslashes do not designate a folded line, such as described in
> >   Section 7.
> > 
> > It perhaps should be, but I think currently is not -- we only talk
> > about using the two-line header to detect instances of folding, without
> > mention of a requirement to be contained within <CODE BEGINS>/<CODE
> > ENDS> or similar.
> 
> Correct.  The 2-line header is missing.  That <CODE BEGINS>/<CODE ENDS>
> appears is secondary.  Is there anything to be done here?

In light of the previous discussion about extracting artwork individually
from the document, probably not.
Though it seems the -10 has added a line-wrapping header to the script,
which seems to be inadvertent, if I understand correctly.

> > It seems that my perception of "common shell style" diverges from that
> > presented in this document, which is not necessarily problematic.
> > (Things like what diagnostics go to stdout vs. stderr, use or ">
> > /dev/null" vs ">> /dev/null", etc.)
> 
> I fixed one "> /dev/null" case.

Heh, I was trying to say that I prefer to always write "> /dev/null", while
acknowledging that my preference is irrelevant for this document.  I'm glad
it helped to fix a consistency nit, though!

> As for style, we could review line by line but, for the cases where
> output is directed to /dev/null/, it's unclear where the output is
> needed, only the exit code status ever seems to matter.  
> 
> 
> >     printf "Usage: rfcfold [-s <strategy>] [-c <col>] [-r] -i <infile>"
> >     printf " -o <outfile>\n"
> > 
> > This summary usage line doesn't mention -d, -q, or -h.  (Maybe it
> > doesn't have to, of course.)
> 
> Added.
> 
> 
> >     # ensure input file doesn't contain a TAB grep $'\t' $infile >>
> >     /dev/null 2>&1
> > 
> > (`grep -q` is a thing, here and elsewhere.)
> 
> Added.
> 
> 
> >     # unfold wip file "$SED" '{H;$!d};x;s/^\n//;s/\\\n *//g'
> >     $temp_dir/wip > $outfile
> > 
> > [I don't remember why the s/^\n// is needed; similarly for the
> > unfold_it_2() case.]
> 
> Erik responded to this point already.
> 
> 
> >     if [[ $strategy -eq 2 ]]; then min_supported=`expr ${#hdr_txt_2} +
> >     8` else min_supported=`expr ${#hdr_txt_1} + 8` fi
> > 
> > On the face of it this seems like it will produce "folded" output that
> > exceeds the line length, when we give min_supported of 54, use
> > autodetection of strategy, and have input that is incompatible with
> > fold_it_1().
> 
> Fixed off-by-one error.
> 
> 
> 
> >     process_input $@
> > 
> > Need double-quotes around "$@" to properly handle arguments with
> > embedded spaces.
> 
> Added.

Thanks!

I'll try to find time to look at the new script with an eye for quoting,
and update my position in the datatracker; please start complaining if I
haven't done so and the other threads about where/how to publish have come
to a conclusion.

-Ben

[netmod] Benjamin Kaduk's Discuss on draft-ietf-n… Benjamin Kaduk via Datatracker
Re: [netmod] Benjamin Kaduk's Discuss on draft-ie… Erik Auerswald
Re: [netmod] Benjamin Kaduk's Discuss on draft-ie… Kent Watsen
Re: [netmod] Benjamin Kaduk's Discuss on draft-ie… Benjamin Kaduk
Re: [netmod] Benjamin Kaduk's Discuss on draft-ie… Erik Auerswald
Re: [netmod] Benjamin Kaduk's Discuss on draft-ie… Kent Watsen