Re: [netmod] Benjamin Kaduk's Discuss on draft-ietf-netmod-artwork-folding-09: (with DISCUSS and COMMENT)

Kent Watsen <kent+ietf@watsen.net> Thu, 05 September 2019 22:02 UTC

Return-Path: <0100016d0372debf-16e6e132-b334-41b3-ad9c-953fd9314963-000000@amazonses.watsen.net>
X-Original-To: netmod@ietfa.amsl.com
Delivered-To: netmod@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id E798E120B46; Thu, 5 Sep 2019 15:02:10 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.896
X-Spam-Level:
X-Spam-Status: No, score=-1.896 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_NONE=0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (1024-bit key) header.d=amazonses.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id U7Pa8Mop9cOK; Thu, 5 Sep 2019 15:02:06 -0700 (PDT)
Received: from a8-88.smtp-out.amazonses.com (a8-88.smtp-out.amazonses.com [54.240.8.88]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 20EC0120B75; Thu, 5 Sep 2019 15:02:04 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/simple; s=6gbrjpgwjskckoa6a5zn6fwqkn67xbtw; d=amazonses.com; t=1567720923; h=From:Message-Id:Content-Type:Mime-Version:Subject:Date:In-Reply-To:Cc:To:References:Feedback-ID; bh=+DF3d4yw7tjd+/mQjBX5sc+vjrypzQ0pRuViGJRTGLk=; b=iBfWFaltcV79FVK7B/EIlRnxR1kZDOKCCAJIUakScmVZ9QwjbFW40Fb8zLano6lB XQlZGGR4yDP0nKJR5CZoaah5z62QOLMgRa/RlzcNLdS8tp2D0Y7JiZmV2tVo3bAGAr9 tI2UGqwshJREGMYHaVf408fZZ0dHMgRlUT6FYxw0=
From: Kent Watsen <kent+ietf@watsen.net>
Message-ID: <0100016d0372debf-16e6e132-b334-41b3-ad9c-953fd9314963-000000@email.amazonses.com>
Content-Type: multipart/alternative; boundary="Apple-Mail=_1EF271D6-7CC1-4FEE-A73D-3C3755F9C368"
Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.11\))
Date: Thu, 5 Sep 2019 22:02:03 +0000
In-Reply-To: <156766366671.22774.7481795788724573201.idtracker@ietfa.amsl.com>
Cc: The IESG <iesg@ietf.org>, "netmod-chairs@ietf.org" <netmod-chairs@ietf.org>, draft-ietf-netmod-artwork-folding@ietf.org, "netmod@ietf.org" <netmod@ietf.org>
To: Benjamin Kaduk <kaduk@mit.edu>
References: <156766366671.22774.7481795788724573201.idtracker@ietfa.amsl.com>
X-Mailer: Apple Mail (2.3445.104.11)
X-SES-Outgoing: 2019.09.05-54.240.8.88
Feedback-ID: 1.us-east-1.DKmIRZFhhsBhtmFMNikgwZUWVrODEw9qVcPhqJEI2DA=:AmazonSES
Archived-At: <https://mailarchive.ietf.org/arch/msg/netmod/EqQFQqcO7cNoYaFYmkRwdrTDdFY>
Subject: Re: [netmod] Benjamin Kaduk's Discuss on draft-ietf-netmod-artwork-folding-09: (with DISCUSS and COMMENT)
X-BeenThere: netmod@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: NETMOD WG list <netmod.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/netmod>, <mailto:netmod-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/netmod/>
List-Post: <mailto:netmod@ietf.org>
List-Help: <mailto:netmod-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/netmod>, <mailto:netmod-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 05 Sep 2019 22:02:18 -0000

Hi Ben,

Thank you for your review.  Comments below.

Update @ https://tools.ietf.org/html/draft-ietf-netmod-artwork-folding-10 <https://tools.ietf.org/html/draft-ietf-netmod-artwork-folding-10>

Kent // as co-author


> On Sep 5, 2019, at 2:07 AM, Benjamin Kaduk via Datatracker <noreply@ietf.org> wrote:
> 
> Benjamin Kaduk has entered the following ballot position for
> draft-ietf-netmod-artwork-folding-09: Discuss
> 
> When responding, please keep the subject line intact and reply to all
> email addresses included in the To and CC lines. (Feel free to cut this
> introductory paragraph, however.)
> 
> 
> Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
> for more information about IESG DISCUSS and COMMENT positions.
> 
> 
> The document, along with other ballot positions, can be found here:
> https://datatracker.ietf.org/doc/draft-ietf-netmod-artwork-folding/
> 
> 
> 
> ----------------------------------------------------------------------
> DISCUSS:
> ----------------------------------------------------------------------
> 
> I think the procedures described herein are incomplete without a footer
> to terminate the un-folding process.  Otherwise, it seems that the
> described algorithms would leave the two-line header for the second and
> subsequent instances of folded text in a single document.  (If we tried
> to just blindly remove all instances of the header without seeking
> boundaries, then we would misreconstruct content when different folding
> algorithms are used in the same document with the single-backslash
> algorithm occurring first.)

Are you referring to when an RFC contains multiple inclusions and one is trying to unfold them all at once?   That's not the intention here, as noted in paragraph 3 in both sections 7.2 and 8.2.  FWIW, this sounds like the framing problem that the WG discussed with the conclusion that extracting from plain-text is dead, now that XML is the required submission format, and XML provides a superior framing mechanism than any footer we could add.

BTW, yes, each text inclusion in a single RFC may independently be folded using either the '\' or '\\' strategy, with the recommendation that '\' always be tried first and '\\' only used when '\' fails.

If referring to a single text content instance, could you provide an example illustrating the concern?




> I don't think it's proper to refer to a script that requires bash
> specifically as a "POSIX shell script".  I did not attmept to check
> whether any bash-specific features are used or this requirements stems
> solely from the shebang line, though.

I just changed "POSIX" to "Bash" in the title for Appendix A.

Not that it matters, but "--posix" is passed into `bash` on the first line of the script  ;)



> I think the shell script does need to use double-quotes around some
> variable expansions, especially "$infile" and "$outfile", to work
> properly for filenames containing spaces.  We do quote "$infile" when
> we're checking that it exists, just not (most of the time) when we
> actually use it!

Updated.



> In addition to the above, I also share Alissa's (and Mirja's) concerns,
> but feel that Discuss is more appropriate than Abstain, so we can discuss
> what the best way to get this content published is.  For it's fine
> content, and we should see it published; it's just not immediately clear
> to me what the right way to do so is.

Agreed.   For now, I've changed it to Informational, but I think there remains a discussion around if the draft should be re-rerun through the IAB stream.   My responses today to Alissa's Abstain and Suresh Discuss dig into this.  Is it okay to use those threads for this item?


> ----------------------------------------------------------------------
> COMMENT:
> ----------------------------------------------------------------------
> 
> Section 4.1
> 
>   Automated folding of long lines is needed in order to support draft
>   compilations that entail a) validation of source input files (e.g.,
>   XML, JSON, ABNF, ASN.1) and/or b) dynamic generation of output, using
>   a tool that doesn't observe line lengths, that is stitched into the
>   final document to be submitted.
> 
> I don't think the intended meaning of "source input files" will be clear
> to all readers just from this text.  Some discussion of how RFCs can
> consider source code, data structures, generated output, etc., that have
> standalone representations and natural formats, and the need to display
> their contents in the RFC format that has different requirements might
> be helpful context for this paragraph and the next.

Is the updated text more understandable?



> Section 7.1.2
> 
> For some reason my mental model of "RFC style" does not use the word
> "really" in this way, and prefers alternatives like "very" or
> "exceptionally".  (Also in Section 8.1.2.)

s/Really/Exceptionally/ in both cases.


> Section 7.2.1
> 
>   1.  Determine where the fold will occur.  This location MUST be
>       before or at the desired maximum column, and MUST NOT be chosen
>       such that the character immediately after the fold is a space ('
>       ') character.  For forced foldings, the location is between the
> 
> This is a rather awkward natural line break.  I suggest an RFC Editor
> note to make sure that the punctuation around the space character all
> appears on the same line.

RFC Editor note added, near the top of the draft.



>   3.  On the following line, insert any number of space (' ')
>       characters.
> 
> I'm not sure I'd characterize the procedure as "complete" when it leaves
> the value of the output subject to implementation choice such as this.
> (Note that the next paragraph talks about the resulting "arbitrary
> number of space" characters, and would presumably also need to be
> adjusted if this text was adjusted.)
> We also don't seem to bound this number of spaces to be fewer than the
> target line length, which only matters in some weirdly pedantic sense.

Added "subject to the resulting line not exceeding the desired maximum" to both locations in the draft.



> Section 7.2.2
> 
>   Scan the beginning of the text content for the header described in
>   Section 7.1.1.  If the header is not present, starting on the first
>   line of the text content, exit (this text contents does not need to
>   be unfolded).
> 
> I'm not sure I understand what "starting on the first line of the text
> content" is intended to mean.  (Also in 8.2.2.)

I think you are saying that it seems overly prescriptive, given that the previous sentence says "beginning" and "header", it defies logic that the header might not start on the first line and, by this text calling it out, it suggests something special is going on.   Is this what you mean?   To be clear, the only intention here is to catch the case whereby there might be, e.g., some blank lines preceding the header.   Do you think the "starting on the first line of the text content" fragment should be removed?



> Section 8.2.1
> 
>   If this text content needs to and can be folded, insert the header
>   described in Section 8.1.1, ensuring that any additional printable
>   characters surrounding the header do not result in a line exceeding
>   the desired maximum.
> 
> We discussed above some cases when text could not be folded using the
> algorithm from Section 7.2.1; in what case could text not be folded with
> this algorithm?  Just the case when the implementation doesn't support
> forced folding?

Yes, that's the only case known.   But what does this have to do with Section 8.2.1?  Are you keying off of the "needs to" part?   Is it okay?



> Section 10
> 
> We should warn against implementations scanning past the end of a buffer
> (containing the entire contents of a file) when checking what's in the
> beginning of the next line -- if a file ends with a backslash and "end
> of line" but no further content, we could perform an out of bounds
> access if the code assumes it is safe to check for the next line's
> initial content.

Both Sections 7.2.2 and 8.2.2 describe conditions to determine when unfolding occurs. AFIACT, in both cases, the unfolding algorithm stays within the bounds of those conditions.  

For instance, given the input sequence [ '\' '\n' EOF] , the 7.2.2 algorithm would replace it with [ EOF ] and the 8.2.2 algorithm wouldn't even attempt to unfold it since the condition of the next line containing a second '\' character isn't met.

Is this Security Consideration needed?



> Section 12.2
> 
> I think that RFC 7991 could be normative, since we say "per RFC 7991" to
> describe some requirements on behavior.  Likewise for RFC 7994, whose
> character encoding requirements we incorporate by reference.

Given that this format may be used in contexts outside the IETF, it seems that understanding RFC 7991 is optional.  Agreed?



> Appendix A
> 
> I could perhaps argue that we should include a reference to POSIX for
> "POSIX shell script" but find it somewhat hard to believe that this
> would be a problem in practice.  It's also moot since we require bash
> specifically, so we'd need to reference bash instead of POSIX.

Per above, "POSIX" is now "Bash" in the title.   I added an Informative reference for Bash.


>   copy/paste the script for local use.  As should be evident by the
>   lack of the mandatory header described in Section 7.1.1, these
>   backslashes do not designate a folded line, such as described in
>   Section 7.
> 
> It perhaps should be, but I think currently is not -- we only talk about
> using the two-line header to detect instances of folding, without
> mention of a requirement to be contained within <CODE BEGINS>/<CODE
> ENDS> or similar.

Correct.  The 2-line header is missing.  That <CODE BEGINS>/<CODE ENDS> appears is secondary.  Is there anything to be done here?



> It seems that my perception of "common shell style" diverges from that
> presented in this document, which is not necessarily problematic.
> (Things like what diagnostics go to stdout vs. stderr, use or ">
> /dev/null" vs ">> /dev/null", etc.)

I fixed one "> /dev/null" case.

As for style, we could review line by line but, for the cases where output is directed to /dev/null/, it's unclear where the output is needed, only the exit code status ever seems to matter.  


>     printf "Usage: rfcfold [-s <strategy>] [-c <col>] [-r] -i <infile>"
>     printf " -o <outfile>\n"
> 
> This summary usage line doesn't mention -d, -q, or -h.  (Maybe it
> doesn't have to, of course.)

Added.


>     # ensure input file doesn't contain a TAB
>     grep $'\t' $infile >> /dev/null 2>&1
> 
> (`grep -q` is a thing, here and elsewhere.)

Added.


>     # unfold wip file
>     "$SED" '{H;$!d};x;s/^\n//;s/\\\n *//g' $temp_dir/wip > $outfile
> 
> [I don't remember why the s/^\n// is needed; similarly for the
> unfold_it_2() case.]

Erik responded to this point already.


>     if [[ $strategy -eq 2 ]]; then
>       min_supported=`expr ${#hdr_txt_2} + 8`
>     else
>       min_supported=`expr ${#hdr_txt_1} + 8`
>     fi
> 
> On the face of it this seems like it will produce "folded" output that
> exceeds the line length, when we give min_supported of 54, use
> autodetection of strategy, and have input that is incompatible with
> fold_it_1().

Fixed off-by-one error.



>     process_input $@
> 
> Need double-quotes around "$@" to properly handle arguments with
> embedded spaces.

Added.


Kent // as co-author