Re: [abnf-discuss] ABNF colloquialism for end-of-line

Sean Leonard <dev+ietf@seantek.com> Mon, 20 November 2017 09:19 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/abnf-discuss/2rDcbfsBe1ulrN1pwFetaaE6rRw>

With respect to the original topic about end-of-line, I think the
solution has only 10% to do with ABNF per se, and 90% to do with the
surrounding computing environment. Therefore, I have come up with the
following conventions, based on inspection of a large quantity of RFCs:

Some protocols are binary-oriented (unit is based on octet), others are 
text-oriented (unit is based on code point, e.g., ASCII, or Unicode/UTF-8).

Within the text-oriented protocols, some are line-oriented and some are 
whitespace-oriented. (Some are oriented towards both.)

CRLF is the Internet standard newline (RFC 5198). LF is the XML standard
newline (https://www.w3.org/TR/xml/, Section 2.11). CRLF is the Windows
standard newline. LF is the Unix standard newline. (No citations
necessary.) And so on.

A format is a "protocol" for our purposes if the format is sent over the 
Internet. Specifically, RFC 2046 says:

    The canonical form of any MIME "text" subtype MUST always represent a
    line break as a CRLF sequence.  Similarly, any occurrence of CRLF in
    MIME "text" MUST represent a line break.  Use of CR and LF outside of
    line break sequences is also forbidden.



Therefore:
When ABNF is used to describe a line-oriented protocol that has to do 
with the Internet, it ought to use <CRLF>.
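
As a rough illustration of the RFC 2046 rule quoted above, here is a
minimal Python sketch. The function names are hypothetical, and treating
a bare CR or a bare LF as a line break during conversion is an
assumption on my part (the RFC only says they are forbidden in the
canonical form).

```python
import re

# A CR not followed by LF, or an LF not preceded by CR.
_BARE_CR_OR_LF = re.compile(r"\r(?!\n)|(?<!\r)\n")


def is_canonical_text(body: str) -> bool:
    """Return True if every CR and LF in body is part of a CRLF pair."""
    return _BARE_CR_OR_LF.search(body) is None


def to_canonical_text(body: str) -> str:
    """Rewrite CRLF, bare CR, and bare LF line breaks as CRLF.

    Assumption: bare CR and bare LF are both meant as line breaks.
    """
    return re.sub(r"\r\n|\r|\n", "\r\n", body)
```

A sender would call to_canonical_text() on local text before putting it
on the wire, and a receiver could use is_canonical_text() as a cheap
sanity check.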

When ABNF is used to describe a line-oriented protocol that does not
have to do with the Internet, it can define line markers "to taste"; a
common convention is <EOL>. But don't define EOL = CRLF, because there is
no point. EOL signals to the reader that something unusual is going on
with end-of-line markers, so there needs to be a good reason to
introduce it. Don't redefine CRLF (or LF, for that matter) either,
because that confuses readers.

When ABNF is used to describe a whitespace-oriented (but not
line-oriented) protocol, it is acceptable to define whitespace as a glob
of any of SP, HTAB, CR, and LF. A key example is JSON (RFC 7159), which
is whitespace-oriented but does not care about lines. It is media-typed
as application/json, not text/json, for that and related reasons.
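
The JSON point can be checked with a short Python sketch using the
standard json module. The respace() helper and the token list are
made up for illustration; the idea is only that any mix of SP, HTAB,
CR, and LF between tokens yields the same parsed value.

```python
import json
from itertools import cycle

# The four JSON whitespace characters, plus CR LF as a two-character run.
JSON_WS = [" ", "\t", "\n", "\r", "\r\n"]


def respace(tokens, separators=JSON_WS):
    """Join JSON tokens, cycling through the given whitespace separators."""
    seps = cycle(separators)
    return "".join(tok + next(seps) for tok in tokens)


tokens = ["{", '"a"', ":", "[", "1", ",", "2", "]", ",",
          '"b"', ":", "true", "}"]
compact = "".join(tokens)   # no whitespace at all
spread = respace(tokens)    # SP, HTAB, LF, CR, CRLF scattered between tokens

# Both texts parse to the same value; line structure carries no meaning.
same = json.loads(compact) == json.loads(spread)
```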

When ABNF is used to describe a binary protocol, do whatever, but don't 
use <CRLF> or <EOL> rule names in a binary protocol definition since 
those conventions imply a text-oriented protocol.

When in doubt, ask: "If this format were to be dumped into a MIME part,
would this format foreseeably be transmitted as text/* (such as
text/plain) or message/* (such as message/http)? Or is it going to have
to be transmitted as application/* (such as application/xml)?" That
gives you a reasonable guide.
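
The rule of thumb above could be written down as a tiny Python check.
This is only a sketch of the heuristic: the function name is
hypothetical, and a real decision would still need a human to look at
the format.

```python
def looks_line_oriented(media_type: str) -> bool:
    """Guess from the top-level media type whether <CRLF> is the natural fit.

    text/* and message/* suggest a line-oriented protocol; anything else
    (e.g., application/*) suggests the format sets its own rules.
    """
    top_level = media_type.split("/", 1)[0].strip().lower()
    return top_level in {"text", "message"}
```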

If there is disagreement about line endings and it matters to the
protocol, say something about that during "preprocessing", rather than
in the (A)BNF definition. An example is CSS3: Section 3.3 of
<https://www.w3.org/TR/css-syntax-3/>.
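
A minimal Python sketch of such a preprocessing step follows. It
assumes the newline handling in that CSS section amounts to replacing
CR, FF, and CR LF with a single LF; the other steps in that section are
left out, and the function name is made up.

```python
import re

# CR LF must come first so the pair collapses to one LF, not two.
_NEWLINE_LIKE = re.compile(r"\r\n|\r|\f")


def preprocess(text: str) -> str:
    """Normalize line endings so the grammar proper only ever sees LF."""
    return _NEWLINE_LIKE.sub("\n", text)
```

With this in front of the parser, the grammar can use a single newline
rule and stay silent about the other conventions.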

Regards,

Sean

On 11/19/2017 10:56 AM, Dave Crocker wrote:
> On 11/16/2017 1:26 PM, Sean Leonard wrote:
>> To add some color to this point, “cuts” was discussed in the CBOR WG 
>> in the context of CDDL. It is a technique from Parsing Expression 
>> Grammars. Here is overview:
> ...
>> Basically it commits the parser at a particular point, so that it 
>> does not backtrack.
>>
>> However, PEGs are ordered; ABNF is unordered. With ABNF (as presently 
>> constituted), all alternatives in a choice are considered 
>> simultaneously (order is not relevant). Even if you match one 
>> alternative, you’re supposed to try all other alternatives.
>
>
> (Disclaimer:  What follows is pure personal opinion.)
>
>
> I'll offer this for consideration, without also offering any specific 
> action...
>
> Why did ABNF become popular?
>
> (For a time, RFC 733 and then RFC 822 were the most-cited RFCs. It 
> turned out this was not due to the email portion but folk were 
> re-using the ABNF meta-specification, which is why the later, revision 
> effort to RFC 822 split the ABNF text out into its own RFC.)
>
> At the time ABNF was defined in the latter 1970s, most specs provided 
> their own variation of BNF.  Everyone wanted tailoring to the basic 
> tool.  But while folk have often wanted to enhance ABNF, over the 
> years, the 'let's define a new variant' tendency mostly died out -- 
> ignoring the much more recent move towards JSON... Why did this 
> popularity happen?
>
> Languages need to balance expressive power against human usage 
> complexity.  Enough but not too much, of each.  ABNF seemed to strike 
> a good balance.  (I like to think the documentation clarity in RFC 733 
> also helped, but then I'm quite biased about this, given how much 
> effort I put into that aspect of the work...)
>
> I think the biggest danger in creating a meta-language is 
> specification obscurity.  The tendency to want to add features can 
> too-easily create too much complexity for easy human comprehension.  
> The result is that seemingly-simple specifications can too-easily have 
> implications that are not understood by most readers.
>
> Computers are not the target audience for computer languages. Human 
> readers are.  Subtle effects (nevermind side-effects) are very easily 
> missed by human readers.
>
> d/