Re: [abnf-discuss] ABNF colloquialism for end-of-line

Sean Leonard <dev+ietf@seantek.com> Wed, 15 November 2017 23:36 UTC

Content-Type: text/plain; charset="utf-8"
Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\))
From: Sean Leonard <dev+ietf@seantek.com>
In-Reply-To: <3fbd228d-c6cf-be73-c7f2-f6b15979b852@gmail.com>
Date: Thu, 16 Nov 2017 07:35:52 +0800
Cc: ABNF-Discuss <abnf-discuss@ietf.org>
Content-Transfer-Encoding: quoted-printable
Message-Id: <2329FFC4-0791-4B70-A8D2-E6FC33F6C483@seantek.com>
References: <97E6D6C0-7010-46D6-8641-670F10A2504C@seantek.com> <3fbd228d-c6cf-be73-c7f2-f6b15979b852@gmail.com>
To: Dave Crocker <dcrocker@gmail.com>
X-Mailer: Apple Mail (2.3273)
Archived-At: <https://mailarchive.ietf.org/arch/msg/abnf-discuss/uGh55JGfUv2KSILbRUtjqnKcoQc>
Subject: Re: [abnf-discuss] ABNF colloquialism for end-of-line

Hello Dave,

> On Nov 15, 2017, at 11:37 PM, Dave Crocker <dcrocker@gmail.com> wrote:
> 
> On 11/15/2017 3:54 AM, Sean Leonard wrote:
>> Hello ABNF “Doctors”:
>> On a recent thread on cbor@ <https://mailarchive.ietf.org/arch/msg/cbor/ZL2NvalH6jmSVqfvkBH4af3BHsE>, the following question arose:
>> What is the best ABNF colloquialism for end-of-line?
>> CR LF is the “Internet standard newline”, with rule name <CRLF> [RFC5234]. However, there is a desire amongst some specification writers to admit Unix LF as well. I have not heard of people clamoring for bare CR; however if bare LF is on the table then bare CR might be on the table too, for completeness. A quite small minority of published RFC specifications define an end-of-line production as CRLF / LF, and an even smaller minority of published RFC specifications allow for bare CR as well.
>> So I just wanted to see what ABNF people’s opinions are on this one.
> 
> 
> Let's start from first principles:
> 
>     Interoperability is improved by simplicity. Simplicity is improved by having a single, canonical form for data; participants that do not use that form as their 'native' form convert to the canonical form. Core ABNF rules are designed to aid simplicity.
> 
>     When designing ABNF rulesets, decide whether the goal is interoperability -- that is, a common representation for participants in a standardized protocol -- or is to describe raw, non-canonical data; that is, a range of native data.  Clarity about this distinction aids in making design choices for rules.
> 
> 
> Given that the thread in CBOR says 'matching rules', I'm guessing that the goal here is to describe freeform data coming from the net.  Hence, requiring a simple, canonicalized data form is not appropriate.  (This is an essential point; if it's not correct, then what follows won't be either.)

The original thread was about the Matching Rules appendix, which was based on the adjacent ABNF grammar for CDDL (CBOR Data Definition Language, or whatever the acronym is these days). It’s really to describe the grammar that specification writers are to write, akin to ABNF itself, ASN.1, programming language code, etc.

> 
> Note that the term "Internet standard newline" in RFC5234 is merely a comment to the ABNF Core Rule definition of CRLF.

Yes. Oh, I forgot the reference that standardizes CRLF as the actual Internet standard; I guess the latest is RFC 5198?

> 
> What you most definitely do /not/ want is to redefine crlf to cover both crlf and lf.

Yeah, I think that is dumb, but (as I documented) a handful of RFCs do just that.

> 
> But I gather you /do/ want a rule that covers both CRLF and LF.
> 
> So define one.
> 
> For example:
> 
>   cbor-nl = crlf / lf
> 
> Is there some problem with using this?

My opinion is that newline === CRLF in IETF Internet standards, and since ABNF is itself an Internet standard, newline in ABNF ought to mean CRLF as well. A local operating environment can convert CRLF to whatever it uses natively. For example, the C standard library treats \n (aka LF) as the end-of-line marker for text streams, and converts to and from the platform convention on ingress/egress. Ditto for HTML/XML. (This is congruent with your first paragraph on simplicity.)
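
As a rough illustration of that ingress/egress conversion, here is a minimal sketch in Python (chosen only for brevity; the C library does the equivalent for text-mode streams). It uses the `newline` parameter of `io.TextIOWrapper`. The helper names `read_wire` and `write_wire` are hypothetical, and the sketch assumes ASCII data whose in-memory form uses LF only.

```python
import io

def read_wire(data: bytes) -> str:
    """Ingress: decode wire bytes into text that uses LF as its only line ending."""
    # newline=None selects universal-newlines mode, so CRLF is read back as LF.
    with io.TextIOWrapper(io.BytesIO(data), encoding="ascii", newline=None) as f:
        return f.read()

def write_wire(text: str) -> bytes:
    """Egress: encode LF-only text, writing each LF as CRLF on the wire."""
    buf = io.BytesIO()
    # newline="\r\n" makes the wrapper translate every LF it writes into CRLF.
    wrapper = io.TextIOWrapper(buf, encoding="ascii", newline="\r\n")
    wrapper.write(text)
    wrapper.flush()
    return buf.getvalue()

wire = write_wire("line one\nline two\n")
assert wire == b"line one\r\nline two\r\n"
assert read_wire(wire) == "line one\nline two\n"
```

The grammar only ever sees CRLF; the LF form exists purely inside the local environment.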

I understand why people want to define EOL = CRLF / LF; however, ABNF producers such as abnfgen will (probably) output sample data with randomly mixed line endings. The author’s intent with EOL = CRLF / LF is to say that a format (rather than a wire protocol) ought to accept either CRLF or LF as the line ending in a file, but the rule has the effect of saying that CRLF and LF can be freely interspersed within the same PDU. (And, in so doing, it privileges LF but excludes CR, NEL, and LS / PS, which have no better or worse claim to being line endings.)
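
To make the mixed-endings point concrete, here is a small Python sketch. The `document` and `line` rules are hypothetical, made up for this example (only `EOL = CRLF / LF` comes from the discussion), and the regular expression is my hand translation of them, not the output of any ABNF tool.

```python
import re

# Hand-written regex for a hypothetical grammar:
#   document = *( line EOL )
#   line     = *( %x20-7E )      ; printable ASCII only (an assumption)
#   EOL      = CRLF / LF
DOCUMENT = re.compile(rb"(?:[\x20-\x7e]*(?:\r\n|\n))*")

def matches_grammar(data: bytes) -> bool:
    """True if the whole input is derivable from the grammar above."""
    return DOCUMENT.fullmatch(data) is not None

def endings_used(data: bytes) -> set:
    """Return the distinct line-ending sequences that occur in the input."""
    return set(re.findall(rb"\r\n|\r|\n", data))

mixed = b"first\r\nsecond\nthird\r\n"
assert matches_grammar(mixed)                    # the grammar accepts it...
assert endings_used(mixed) == {b"\r\n", b"\n"}   # ...even though endings are mixed
assert not matches_grammar(b"first\rsecond\n")   # bare CR is rejected outright
```

If the intent is "either convention, but consistently within one file", that has to be stated in prose or checked separately (for instance by requiring `endings_used` to return at most one element); the ABNF rule alone does not say it.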

In my own RFC 7468, I said:
 eol        = CRLF / CR / LF
 eolWSP     = WSP / CR / LF
 W          = WSP / CR / LF / %x0B / %x0C ; whitespace

and while I don’t necessarily view that as a mistake, I’m just pointing out that a better way to handle the issue would probably have been to say that CRLF *is* the end-of-line marker, and that when ingressing/egressing data, an implementation treats whatever the local convention is as if it were CRLF in the ABNF.
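
A sketch of that alternative, in Python: canonicalize to CRLF at the boundary, then match against a grammar that only knows CRLF. The function names and the CRLF-only `STRICT_DOCUMENT` pattern (lines of printable ASCII) are assumptions for illustration; none of this is taken from RFC 7468.

```python
import re

# CRLF is listed first so that a CR LF pair is treated as one line ending.
_ANY_EOL = re.compile(r"\r\n|\r|\n")

# Hypothetical CRLF-only grammar: document = *( *( %x20-7E ) CRLF )
STRICT_DOCUMENT = re.compile(r"(?:[\x20-\x7e]*\r\n)*")

def to_canonical(text: str) -> str:
    """Ingress: rewrite CRLF, bare CR and bare LF as CRLF."""
    return _ANY_EOL.sub("\r\n", text)

def from_canonical(text: str, local_eol: str = "\n") -> str:
    """Egress: rewrite CRLF as the local convention (LF by default)."""
    return text.replace("\r\n", local_eol)

local = "alpha\nbeta\rgamma\r\n"
canonical = to_canonical(local)
assert canonical == "alpha\r\nbeta\r\ngamma\r\n"
assert STRICT_DOCUMENT.fullmatch(canonical) is not None
assert STRICT_DOCUMENT.fullmatch(local) is None   # raw local form does not match
assert to_canonical(canonical) == canonical       # canonicalization is idempotent
```

The grammar stays simple, and the tolerance for local conventions lives in one conversion step instead of in every production that mentions a line ending.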

One reason to have a uniform ABNF for line endings (i.e., CRLF) is inter-specification reuse. For example, URI [RFC3986] has ABNF productions that are reused throughout many other specifications, including specs that are CRLF-oriented (mail...) and specs that are LF-oriented (HTML). URIs do not include line breaks, but if you want to reuse a production from another RFC that does include line breaks, and the line-ending definitions of the two specs don’t match, then it will be a problem.

Sean

> 
> d/
> 
> -- 
> Dave Crocker
> Brandenburg InternetWorking
> bbiw.net