Re: [abnf-discuss] ABNF colloquialism for end-of-line

Sean Leonard <> Mon, 20 November 2017 09:19 UTC

With respect to the original topic about end-of-line, I think the 
solution has only 10% to do with ABNF per se, and 90% to do with the 
surrounding computing environment. Therefore, I have come up with the 
following conventions based on inspection of a large quantity of RFCs:

Some protocols are binary-oriented (unit is based on octet), others are 
text-oriented (unit is based on code point, e.g., ASCII, or Unicode/UTF-8).

Within the text-oriented protocols, some are line-oriented and some are 
whitespace-oriented. (Some are oriented towards both.)

CRLF is the Internet standard newline (RFC 5198). LF is the XML standard 
newline (XML specification, Section 2.11). CRLF is the Windows 
standard newline. LF is the Unix standard newline. (No citations 
necessary.) And so on.

A format is a "protocol" for our purposes if the format is sent over the 
Internet. Specifically, RFC 2046 says:

    The canonical form of any MIME "text" subtype MUST always represent a
    line break as a CRLF sequence.  Similarly, any occurrence of CRLF in
    MIME "text" MUST represent a line break.  Use of CR and LF outside of
    line break sequences is also forbidden.

When ABNF is used to describe a line-oriented protocol that has to do 
with the Internet, it ought to use <CRLF>.
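
As an illustration of the CRLF convention, here is a minimal sketch of a recognizer for a made-up line-oriented format. The grammar in the comment and the names LINE, MESSAGE and parse_lines are hypothetical, invented for this example; only CRLF, VCHAR, SP and HTAB come from the RFC 5234 core rules.

```python
import re

# Hypothetical grammar, written with RFC 5234 core rules:
#   message = 1*( line CRLF )
#   line    = *( VCHAR / SP / HTAB )
LINE = rb"[\x21-\x7E\x20\x09]*"
MESSAGE = re.compile(rb"(?:" + LINE + rb"\r\n)+")


def parse_lines(data: bytes) -> list:
    """Return the lines of a CRLF-delimited message, or raise ValueError."""
    if MESSAGE.fullmatch(data) is None:
        raise ValueError("not a sequence of CRLF-terminated lines")
    # Every line ends in CRLF, so the final element of the split is empty.
    return data.split(b"\r\n")[:-1]
```

A bare LF (or a bare CR) is rejected rather than silently accepted, which is the point of naming <CRLF> in the grammar instead of a looser end-of-line rule.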

When ABNF is used to describe a line-oriented protocol that does not 
have to do with the Internet, it can define line markers "to taste"; a 
common convention is <EOL>. But don't define EOL = CRLF because there is 
no point. EOL is an indication to the reader that something is going on 
with end-of-line markers, but there needs to be a good reason. Don't 
redefine CRLF (or LF for that matter) either, because that confuses readers.

When ABNF is used to describe a whitespace-oriented (but not 
line-oriented) protocol, it is acceptable to define whitespace as a glob 
of any of SP, HTAB, CR, and LF. A key example is JSON (RFC 7159), which is 
whitespace-oriented, but does not care about lines. It is media-typed as 
application/json, not text/json, for that and related reasons.
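
A small sketch of what "does not care about lines" means in practice, using Python's standard json module. The token list and the helper name decode_with_separator are made up for this example; the point is only that any of the four whitespace characters (or none) between tokens yields the same value.

```python
import json

# The four characters JSON treats as insignificant whitespace.
JSON_WS = (" ", "\t", "\n", "\r")

# Tokens of one small JSON text; whitespace may appear between any of them.
TOKENS = ["{", '"a"', ":", "[", "1", ",", "2", "]", "}"]


def decode_with_separator(sep: str):
    """Join the tokens with the given whitespace and decode the result."""
    return json.loads(sep.join(TOKENS))


# Try each whitespace character, a CRLF pair, and no whitespace at all.
results = [decode_with_separator(ws) for ws in JSON_WS + ("\r\n", "")]
all_equal = all(r == results[0] for r in results)
```

Whether the sender used CRLF, LF, or no line breaks at all makes no difference to the decoded value, so a grammar for such a format has no need for a <CRLF> or <EOL> rule.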

When ABNF is used to describe a binary protocol, do whatever, but don't 
use <CRLF> or <EOL> rule names in a binary protocol definition since 
those conventions imply a text-oriented protocol.

When in doubt, ask: "If this format were to be dumped into a MIME part, 
would this format foreseeably be transmitted as text/* such as 
text/plain or message/* such as message/http? Or is it going to have to 
be transmitted as application/* such as application/xml?" That gives a 
reasonable guide.

If there is disagreement about line endings and it matters to the 
protocol, say something about that during "preprocessing", rather than 
in the (A)BNF definition. An example is CSS3: Section 3.3 of 
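
A minimal sketch of handling line endings in a preprocessing step, so that the grammar proper only ever sees one kind of newline. The function names and the choice of which sequences count as line breaks (CRLF, lone CR, form feed) are assumptions made for illustration, not a quotation from any specification.

```python
import re

# CRLF must come first in the alternation so a pair becomes one LF, not two.
_LINE_BREAK = re.compile(r"\r\n|\r|\f")


def preprocess(text: str) -> str:
    """Normalize every recognized line break to a single LF."""
    return _LINE_BREAK.sub("\n", text)


def split_lines(text: str) -> list:
    """Split preprocessed input on LF, the only newline the parser sees."""
    return preprocess(text).split("\n")
```

With this in front of the parser, the (A)BNF can mention a single newline rule and stay silent about platform differences.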



On 11/19/2017 10:56 AM, Dave Crocker wrote:
> On 11/16/2017 1:26 PM, Sean Leonard wrote:
>> To add some color to this point, “cuts” was discussed in the CBOR WG 
>> in the context of CDDL. It is a technique from Parsing Expression 
>> Grammars. Here is an overview:
> ...
>> Basically it commits the parser at a particular point, so that it 
>> does not backtrack.
>> However, PEGs are ordered; ABNF is unordered. With ABNF (as presently 
>> constituted), all alternatives in a choice are considered 
>> simultaneously (order is not relevant). Even if you match one 
>> alternative, you’re supposed to try all other alternatives.
> (Disclaimer:  What follows is pure personal opinion.)
> I'll offer this for consideration, without also offering any specific 
> action...
> Why did ABNF become popular?
> (For a time, RFC 733 and then RFC 822 were the most-cited RFCs. It 
> turned out this was not due to the email portion; rather, folk were 
> re-using the ABNF meta-specification, which is why the later revision 
> effort to RFC 822 split the ABNF text out into its own RFC.)
> At the time ABNF was defined in the late 1970s, most specs provided 
> their own variation of BNF.  Everyone wanted tailoring to the basic 
> tool.  But while folk have often wanted to enhance ABNF, over the 
> years, the 'let's define a new variant' tendency mostly died out -- 
> ignoring the much more recent move towards JSON... Why did this 
> popularity happen?
> Languages need to balance expressive power against human usage 
> complexity.  Enough but not too much, of each.  ABNF seemed to strike 
> a good balance.  (I like to think the documentation clarity in RFC 733 
> also helped, but then I'm quite biased about this, given how much 
> effort I put into that aspect of the work...)
> I think the biggest danger in creating a meta-language is 
> specification obscurity.  The tendency to want to add features can 
> too-easily create too much complexity for easy human comprehension.  
> The result is that seemingly-simple specifications can too-easily have 
> implications that are not understood by most readers.
> Computers are not the target audience for computer languages. Human 
> readers are.  Subtle effects (never mind side-effects) are very easily 
> missed by human readers.
> d/