Re: [netmod] Benjamin Kaduk's Discuss on draft-ietf-netmod-artwork-folding-09: (with DISCUSS and COMMENT)
Kent Watsen <kent+ietf@watsen.net> Mon, 04 November 2019 18:19 UTC
From: Kent Watsen <kent+ietf@watsen.net>
Date: Mon, 04 Nov 2019 18:18:59 +0000
Cc: The IESG <iesg@ietf.org>, "netmod-chairs@ietf.org" <netmod-chairs@ietf.org>, draft-ietf-netmod-artwork-folding@ietf.org, "netmod@ietf.org" <netmod@ietf.org>, Erik Auerswald <auerswal@unix-ag.uni-kl.de>
To: Benjamin Kaduk <kaduk@mit.edu>
Archived-At: <https://mailarchive.ietf.org/arch/msg/netmod/v_BALh00BnhgUF5DLLIMUeSILNM>
Subject: Re: [netmod] Benjamin Kaduk's Discuss on draft-ietf-netmod-artwork-folding-09: (with DISCUSS and COMMENT)

Hi Ben,

I sent you a private email before indicating that changes to the
`rfcfold` script addressing your concerns were in progress. I'm happy
to announce that an update has just been posted containing these
changes (much thanks to my co-author, Erik, CC-ed).

Please see below as well.

Thanks,
Kent // co-author


>>> ----------------------------------------------------------------------
>>> DISCUSS:
>>> ----------------------------------------------------------------------
>>>
>>> I think the procedures described herein are incomplete without a footer
>>> to terminate the un-folding process. Otherwise, it seems that the
>>> described algorithms would leave the two-line header for the second and
>>> subsequent instances of folded text in a single document. (If we tried
>>> to just blindly remove all instances of the header without seeking
>>> boundaries, then we would misreconstruct content when different folding
>>> algorithms are used in the same document with the single-backslash
>>> algorithm occurring first.)
>>
>> Are you referring to when an RFC contains multiple inclusions and one is
>> trying to unfold them all at once? That's not the intention here, as
>
> Yes, that was what I was thinking; sorry for missing or misinterpreting the
> notes in Sections 7.2/8.2.

This issue is resolved.

>> noted in paragraph 3 in both sections 7.2 and 8.2. FWIW, this sounds
>> like the framing problem that the WG discussed with the conclusion that
>> extracting from plain-text is dead, now that XML is the required
>> submission format, and XML provides a better framing mechanism than any
>> footer we could add.
>>
>> BTW, yes, each text inclusion in a single RFC may independently be folded
>> using either the '\' or '\\' strategy, with the recommendation that '\'
>> always be tried first and '\\' only used when '\' fails.
>>
>> If referring to a single text content instance, could you provide an
>> example illustrating the concern?
>>
>>> I don't think it's proper to refer to a script that requires bash
>>> specifically as a "POSIX shell script". I did not attempt to check
>>> whether any bash-specific features are used or this requirement stems
>>> solely from the shebang line, though.
>>
>> I just changed "POSIX" to "Bash" in the title for Appendix A.
>>
>> Not that it matters, but "--posix" is passed into `bash` on the first
>> line of the script ;)
>>
>>
>>> I think the shell script does need to use double-quotes around some
>>> variable expansions, especially "$infile" and "$outfile", to work
>>> properly for filenames containing spaces. We do quote "$infile" when
>>> we're checking that it exists, just not (most of the time) when we
>>> actually use it!
>>
>> Updated.
>>
>>
>>> In addition to the above, I also share Alissa's (and Mirja's) concerns,
>>> but feel that Discuss is more appropriate than Abstain, so we can
>>> discuss what the best way to get this content published is. For it's
>>> fine content, and we should see it published; it's just not immediately
>>> clear to me what the right way to do so is.
>>
>> Agreed. For now, I've changed it to Informational, but I think there
>> remains a discussion around whether the draft should be re-run through
>> the IAB stream. My responses today to Alissa's Abstain and Suresh's
>> Discuss dig into this. Is it okay to use those threads for this item?
>
> Please do; this point was mostly intended to make sure that we didn't
> inadvertently approve the document while those discussions were still going
> on.

This issue is currently with the IESG.
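
A quick Bash illustration of the quoting issue raised above (and of the later "$@" item): without double-quotes, a filename containing spaces is split into several words before the command ever sees it. The helpers count_args and forward are hypothetical, written only for this demonstration; they are not part of the rfcfold script.

```shell
#!/usr/bin/env bash
# Illustration only: count_args and forward are hypothetical helpers,
# not functions from the rfcfold script.

# Print how many separate arguments the function received.
count_args() {
  printf '%s\n' "$#"
}

infile="my input file.txt"

count_args $infile     # unquoted: word splitting yields 3 arguments
count_args "$infile"   # quoted: the filename stays 1 argument

# Forwarding a script's own arguments has the same issue; "$@" keeps
# each original argument intact, as in: process_input "$@"
forward() {
  count_args "$@"
}
forward "a b" "c"      # 2 arguments, embedded space preserved
```

The same rule applies to "$outfile" and to every place a path reaches a command such as grep or sed.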

>>> ----------------------------------------------------------------------
>>> COMMENT:
>>> ----------------------------------------------------------------------
>>>
>>> Section 4.1
>>>
>>>    Automated folding of long lines is needed in order to support draft
>>>    compilations that entail a) validation of source input files (e.g.,
>>>    XML, JSON, ABNF, ASN.1) and/or b) dynamic generation of output, using
>>>    a tool that doesn't observe line lengths, that is stitched into the
>>>    final document to be submitted.
>>>
>>> I don't think the intended meaning of "source input files" will be
>>> clear to all readers just from this text. Some discussion of how RFCs
>>> can consider source code, data structures, generated output, etc., that
>>> have standalone representations and natural formats, and the need to
>>> display their contents in the RFC format that has different
>>> requirements might be helpful context for this paragraph and the next.
>>
>> Is the updated text more understandable?
>
> Yes, thanks

Great, this issue is closed.


>>> Section 7.1.2
>>>
>>> For some reason my mental model of "RFC style" does not use the word
>>> "really" in this way, and prefers alternatives like "very" or
>>> "exceptionally". (Also in Section 8.1.2.)
>>
>> s/Really/Exceptionally/ in both cases.
>>
>>
>>> Section 7.2.1
>>>
>>>    1.  Determine where the fold will occur. This location MUST be
>>>        before or at the desired maximum column, and MUST NOT be chosen
>>>        such that the character immediately after the fold is a space
>>>        (' ') character. For forced foldings, the location is between the
>>>
>>> This is a rather awkward natural line break. I suggest an RFC Editor
>>> note to make sure that the punctuation around the space character all
>>> appears on the same line.
>>
>> RFC Editor note added, near the top of the draft.
>>
>>
>>>    3.  On the following line, insert any number of space (' ')
>>>        characters.
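
For readers following the Section 7.2.1 steps quoted above, here is a minimal Bash sketch of steps 1 through 4 applied to a single line. It is not the rfcfold implementation: the function name fold_line_1 and its INDENT parameter are hypothetical, and it assumes ASCII input without TABs, a line that does not already end in '\', and no forced folding (it simply returns 1 when no allowed fold point exists).

```shell
#!/usr/bin/env bash
# Sketch only (not rfcfold): fold ONE line with the single-backslash
# strategy. Hypothetical usage: fold_line_1 LINE MAX_COLUMN [INDENT]
# Assumptions: ASCII text, no TABs, LINE does not already end in '\',
# and no forced folding (returns 1 if no fold point is allowed).
fold_line_1() {
  local line=$1 max=$2 indent=${3:-0}
  local pad prefix="" width cut
  printf -v pad '%*s' "$indent" ''   # spaces for continuation lines
  while :; do
    width=$(( max - ${#prefix} ))     # columns available on this line
    if (( ${#line} <= width )); then
      printf '%s%s\n' "$prefix" "$line"
      return 0
    fi
    (( width >= 2 )) || return 1      # need one character plus the '\'
    cut=$(( width - 1 ))              # reserve the last column for '\'
    # Step 1: the character right after the fold must not be a space.
    while (( cut > 0 )) && [[ ${line:cut:1} == " " ]]; do
      cut=$(( cut - 1 ))
    done
    (( cut > 0 )) || return 1
    # Step 2: emit the segment, then '\' and end-of-line.
    printf '%s%s\\\n' "$prefix" "${line:0:cut}"
    # Steps 3-4: indent the remainder (never past MAX_COLUMN) and repeat.
    line=${line:cut}
    prefix=$pad
  done
}
```

The fold point moves left whenever the next character would be a space, because unfolding strips a continuation line's leading spaces; the INDENT argument stands in for step 3's spaces and is bounded so that no output line exceeds MAX_COLUMN.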
>>>
>>> I'm not sure I'd characterize the procedure as "complete" when it
>>> leaves the value of the output subject to implementation choice such as
>>> this. (Note that the next paragraph talks about the resulting
>>> "arbitrary number of space" characters, and would presumably also need
>>> to be adjusted if this text was adjusted.) We also don't seem to bound
>>> this number of spaces to be fewer than the target line length, which
>>> only matters in some weirdly pedantic sense.
>>
>> Added "subject to the resulting line not exceeding the desired maximum"
>> to both locations in the draft.
>>
>>
>>> Section 7.2.2
>>>
>>>    Scan the beginning of the text content for the header described in
>>>    Section 7.1.1. If the header is not present, starting on the first
>>>    line of the text content, exit (this text content does not need to
>>>    be unfolded).
>>>
>>> I'm not sure I understand what "starting on the first line of the text
>>> content" is intended to mean. (Also in 8.2.2.)
>>
>> I think you are saying that it seems overly prescriptive: given that the
>> previous sentence says "beginning" and "header", it defies logic that the
>> header might not start on the first line, and by calling it out, this
>> text suggests something special is going on. Is this what you mean?
>> To be clear, the only intention here is to catch the case whereby there
>> might be some blank lines preceding the header. Do you think the
>> "starting on the first line of the text content" fragment should be
>> removed?
>
> I think I was too confused by the text to be complaining that it was overly
> prescriptive :(
> I guess my complaint is that it seems ambiguous whether this is "the
> procedure says: start on the first line of text content, and check for the
> header" or "If the header is not present [anywhere in the content], start
> on the first line of content, and exit".
> That is, I think the order in
> which the clauses appear confuses me, with perhaps some exacerbation by
> verb tense. I support being able to cope with some blank lines preceding
> the header!

I have removed the "starting on the first line of the text content"
fragment from both 7.2.2 and 8.2.2, since it seemed unnecessary and
caused confusion.


>>> Section 8.2.1
>>>
>>>    If this text content needs to and can be folded, insert the header
>>>    described in Section 8.1.1, ensuring that any additional printable
>>>    characters surrounding the header do not result in a line exceeding
>>>    the desired maximum.
>>>
>>> We discussed above some cases when text could not be folded using the
>>> algorithm from Section 7.2.1; in what case could text not be folded
>>> with this algorithm? Just the case when the implementation doesn't
>>> support forced folding?
>>
>> Yes, that's the only case known. But what does this have to do with
>> Section 8.2.1? Are you keying off of the "needs to" part? Is it okay?
>
> I was just trying to check that we have given the reader enough information
> to ascertain the "can be folded" result.

I wish to amend my previous statement: other reasons that might lead to
unfoldability include:

  1) presence of a TAB character. This issue is already discussed in
     this draft.

  2) presence of ASCII-based control characters. This issue was not
     discussed previously (nor in RFC 7991), but control characters in
     general (i.e., beyond TAB) are an issue. But the issue may be just
     a limitation in command line tools like `sed` that are
     byte-oriented more so than character-oriented. Thus, in the latest
     update, the `rfcfold` script now issues a *warning* if it detects
     any ASCII control characters.

  3) presence of non-ASCII (e.g., UTF-8) characters. This issue was not
     discussed previously (nor in RFC 7991), but multibyte characters and
     multi-width characters are not supported by `sed`.
     It is unclear from RFC 7991 and RFC 7994 whether such characters may
     appear in <sourcecode> and <artwork> inclusions, but presumably they
     MAY (e.g., the XML file format is known to support UTF-8
     encodings). To be safe, the `rfcfold` script now issues a *warning*
     if it detects any non-ASCII characters.


>>> Section 10
>>>
>>> We should warn against implementations scanning past the end of a
>>> buffer (containing the entire contents of a file) when checking what's
>>> in the beginning of the next line -- if a file ends with a backslash
>>> and "end of line" but no further content, we could perform an out of
>>> bounds access if the code assumes it is safe to check for the next
>>> line's initial content.
>>
>> Both Sections 7.2.2 and 8.2.2 describe conditions to determine when
>> unfolding occurs. AFAICT, in both cases, the unfolding algorithm stays
>> within the bounds of those conditions.
>
> These procedures are fine if you're operating in a context where you
> interact with the text corpus via "get next line" operations. But I don't
> think we have limited ourselves to such contexts; consider the case where I
> (foolishly) write text-processing code in C, and read(2) the text in
> question into a memory buffer. I'm on my own for linebreak detection, and
> if I start peeking past escape characters, it's not so hard to imagine that
> I could fail to check for "end of buffer" and trigger undefined behavior.
>
>> For instance, given the input sequence [ '\' '\n' EOF ], the 7.2.2
>> algorithm would replace it with [ EOF ] and the 8.2.2 algorithm wouldn't
>> even attempt to unfold it since the condition of the next line containing
>> a second '\' character isn't met.
>>
>> Is this Security Consideration needed?
>
> Well, it's a nonblocking comment. So if the above description seems
> totally implausible to you, I can accept it not being included in the
> document.

I'll choose this route, thanks.
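
To make the Section 10 point concrete, here is a line-oriented Bash sketch of single-backslash unfolding for one text content. Because it only ever asks for the next line with `read`, a trailing '\' at the end of input simply ends the loop, matching the [ '\' '\n' EOF ] case above, and it tolerates blank lines before the header. The function unfold_1 is hypothetical; the header is passed in as a marker string (not the draft's exact header text), and the sketch assumes the header line is followed by one empty line.

```shell
#!/usr/bin/env bash
# Sketch only (not rfcfold): unfold ONE text content that was folded with
# the single-backslash strategy, reading it line by line from stdin.
# Assumptions: $1 is a caller-supplied header marker, matched as a
# substring; the header line is followed by one empty line.
unfold_1() {
  local marker=$1 line acc="" pending=0
  # Skip any blank lines that precede the header.
  while IFS= read -r line; do
    [[ -z $line ]] || break
  done
  # No header: nothing to unfold; the caller keeps the content as is.
  [[ $line == *"$marker"* ]] || return 1
  # Consume the empty line that completes the two-line header.
  IFS= read -r line && [[ -z $line ]] || return 1
  # "|| [[ -n $line ]]" also handles a last line with no trailing newline.
  while IFS= read -r line || [[ -n $line ]]; do
    if (( pending )); then
      # Continuation line: drop its leading spaces.
      line=${line#"${line%%[! ]*}"}
    fi
    if [[ $line == *\\ ]]; then
      acc+=${line%\\}     # drop the '\' and join with the next line
      pending=1
    else
      printf '%s\n' "$acc$line"
      acc="" pending=0
    fi
  done
  # A '\' on the very last line has no next line: read has already
  # reported end of input, so flush without looking any further.
  if (( pending )); then
    printf '%s' "$acc"
  fi
}
```

A C version working on a read(2) buffer would need an explicit end-of-buffer check at exactly the point where `read` fails here.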


>>> Section 12.2
>>>
>>> I think that RFC 7991 could be normative, since we say "per RFC 7991"
>>> to describe some requirements on behavior. Likewise for RFC 7994,
>>> whose character encoding requirements we incorporate by reference.
>>
>> Given that this format may be used in contexts outside the IETF, it seems
>> that understanding RFC 7991 is optional. Agreed?
>
> For most of the occurrences of 7991 references, I agree with you. The only
> one that makes me think otherwise is in Section 7.1.2:
>
>    The character encoding is the same as described in Section 2 of
>    [RFC7994], except that, per [RFC7991], tab characters are prohibited.
>
> which is a statement of behavior that defers to an external specification.

Okay, RFC 7991 is now a normative reference.


>>> Appendix A
>>>
>>> I could perhaps argue that we should include a reference to POSIX for
>>> "POSIX shell script" but find it somewhat hard to believe that this
>>> would be a problem in practice. It's also moot since we require bash
>>> specifically, so we'd need to reference bash instead of POSIX.
>>
>> Per above, "POSIX" is now "Bash" in the title. I added an Informative
>> reference for Bash.
>
> Thanks!
>
>>
>>>    copy/paste the script for local use. As should be evident by the
>>>    lack of the mandatory header described in Section 7.1.1, these
>>>    backslashes do not designate a folded line, such as described in
>>>    Section 7.
>>>
>>> It perhaps should be, but I think currently is not -- we only talk
>>> about using the two-line header to detect instances of folding, without
>>> mention of a requirement to be contained within <CODE BEGINS>/<CODE ENDS>
>>> or similar.
>>
>> Correct. The 2-line header is missing. That <CODE BEGINS>/<CODE ENDS>
>> appears is secondary. Is there anything to be done here?
>
> In light of the previous discussion about extracting artwork individually
> from the document, probably not.

Okay, this issue is closed.

> Though it seems the -10 has added a line-wrapping header to the script,
> which seems to be inadvertent, if I understand correctly.

That was a mistake. The authors added a build-time test-case ensuring
that the `rfcfold` script doesn't require folding when appearing in
Appendix A.


>>> It seems that my perception of "common shell style" diverges from that
>>> presented in this document, which is not necessarily problematic.
>>> (Things like what diagnostics go to stdout vs. stderr, use of
>>> "> /dev/null" vs ">> /dev/null", etc.)
>>
>> I fixed one "> /dev/null" case.
>
> Heh, I was trying to say that I prefer to always write "> /dev/null", while
> acknowledging that my preference is irrelevant for this document. I'm glad
> it helped to fix a consistency nit, though!

The script now uses "> /dev/null" throughout.

>> As for style, we could review line by line but, for the cases where
>> output is directed to /dev/null, it's unclear where the output is
>> needed; only the exit code status ever seems to matter.
>>
>>
>>>   printf "Usage: rfcfold [-s <strategy>] [-c <col>] [-r] -i <infile>"
>>>   printf " -o <outfile>\n"
>>>
>>> This summary usage line doesn't mention -d, -q, or -h. (Maybe it
>>> doesn't have to, of course.)
>>
>> Added.
>>
>>
>>>   # ensure input file doesn't contain a TAB
>>>   grep $'\t' $infile >> /dev/null 2>&1
>>>
>>> (`grep -q` is a thing, here and elsewhere.)
>>
>> Added.
>>
>>
>>>   # unfold wip file
>>>   "$SED" '{H;$!d};x;s/^\n//;s/\\\n *//g' $temp_dir/wip > $outfile
>>>
>>> [I don't remember why the s/^\n// is needed; similarly for the
>>> unfold_it_2() case.]
>>
>> Erik responded to this point already.
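
The input checks discussed in this thread (the TAB error using `grep -q`, plus the new control-character and non-ASCII warnings) might look roughly like the Bash sketch below. It is not the code in the posted rfcfold update: check_input is a hypothetical name, and counting bytes with `tr` octal ranges under LC_ALL=C is a choice made here to get a byte-oriented view.

```shell
#!/usr/bin/env bash
# Sketch only (not the rfcfold source): byte-oriented input checks.
# A TAB is treated as an error; other ASCII control characters and
# non-ASCII bytes only produce warnings on stderr.
check_input() {
  local infile=$1 n
  if grep -q $'\t' "$infile"; then
    printf 'Error: infile "%s" contains a TAB character.\n' "$infile" >&2
    return 1
  fi
  # Keep only control bytes other than TAB (\011) and LF (\012); count them.
  # CR (\015) is included, so CRLF files will also warn.
  n=$(( $(LC_ALL=C tr -dc '\000-\010\013-\037\177' < "$infile" | wc -c) ))
  if (( n > 0 )); then
    printf 'Warning: infile "%s" contains %d ASCII control characters.\n' \
      "$infile" "$n" >&2
  fi
  # Delete every 7-bit ASCII byte; whatever remains is non-ASCII.
  n=$(( $(LC_ALL=C tr -d '\000-\177' < "$infile" | wc -c) ))
  if (( n > 0 )); then
    printf 'Warning: infile "%s" contains %d non-ASCII bytes.\n' \
      "$infile" "$n" >&2
  fi
  return 0
}
```

Counting bytes rather than characters sidesteps locale handling entirely, which suits a warning whose purpose is to flag input that byte-oriented tools may mishandle.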
>>
>>
>>>   if [[ $strategy -eq 2 ]]; then
>>>     min_supported=`expr ${#hdr_txt_2} + 8`
>>>   else
>>>     min_supported=`expr ${#hdr_txt_1} + 8`
>>>   fi
>>>
>>> On the face of it this seems like it will produce "folded" output that
>>> exceeds the line length, when we give min_supported of 54, use
>>> autodetection of strategy, and have input that is incompatible with
>>> fold_it_1().
>>
>> Fixed off-by-one error.
>>
>>
>>>   process_input $@
>>>
>>> Need double-quotes around "$@" to properly handle arguments with
>>> embedded spaces.
>>
>> Added.
>
> Thanks!
>
> I'll try to find time to look at the new script with an eye for quoting,
> and update my position in the datatracker; please start complaining if I
> haven't done so and the other threads about where/how to publish have come
> to a conclusion.
>
> -Ben

Please let us know if you see any other issues needing to be addressed!

Thanks,
Kent // co-author