Re: [abnf-discuss] ABNF colloquialism for end-of-line

Sean Leonard <dev+ietf@seantek.com> Thu, 16 November 2017 21:26 UTC

Content-Type: text/plain; charset="utf-8"
Mime-Version: 1.0 (Mac OS X Mail 11.1 \(3445.4.7\))
From: Sean Leonard <dev+ietf@seantek.com>
In-Reply-To: <7db503ef-3db4-9a72-6d14-001831742600@gmail.com>
Date: Fri, 17 Nov 2017 05:26:34 +0800
Cc: Carsten Bormann <cabo@tzi.org>, ABNF-Discuss <abnf-discuss@ietf.org>
Content-Transfer-Encoding: quoted-printable
Message-Id: <62B9A765-E6EE-4C20-9A4E-58ADA9FDE975@seantek.com>
References: <97E6D6C0-7010-46D6-8641-670F10A2504C@seantek.com> <3fbd228d-c6cf-be73-c7f2-f6b15979b852@gmail.com> <477FA5E8-FBAA-47D4-98A6-79DBAE4498C7@tzi.org> <7db503ef-3db4-9a72-6d14-001831742600@gmail.com>
To: Dave Crocker <dcrocker@gmail.com>
X-Mailer: Apple Mail (2.3445.4.7)
Archived-At: <https://mailarchive.ietf.org/arch/msg/abnf-discuss/iTOIQWSAuRFWa8K2UmefnAV6h_w>
Subject: Re: [abnf-discuss] ABNF colloquialism for end-of-line


> On Nov 16, 2017, at 11:31 PM, Dave Crocker <dcrocker@gmail.com> wrote:
> 
> Carsten,
> 
> On 11/15/2017 6:35 PM, Carsten Bormann wrote:
>> Hi Dave,
>> On Nov 15, 2017, at 23:37, Dave Crocker <dcrocker@gmail.com> wrote:
>>> 
>>> [...]
> 
>> PS.: If ABNF had cuts, I’d write
>>    EOL = [CR ^] LF
>> to get better errors on stray CRs.  Maybe not so important these days.
> 
> Sorry.  Not sure what you mean.  "cuts"? "^"?

To add some color to this point, “cuts” were discussed in the CBOR WG in the context of CDDL. A cut is a technique from Parsing Expression Grammars (PEGs). Here is an overview:

http://www.romanredz.se/papers/CSP2014.pdf

https://en.wikipedia.org/wiki/Parsing_expression_grammar
(see soft cut)

Basically, a cut commits the parser at a particular point, so that it does not backtrack past that point.

However, PEGs are ordered; ABNF is unordered. With ABNF (as presently constituted), all alternatives in a choice are considered simultaneously (order is not relevant). Even if you match one alternative, you’re supposed to try all other alternatives.

EOL = [CR ~CUT~] LF
EOL = CR ~CUT~ LF / LF

are equivalent. The point is that once a CR is encountered, an LF is supposed to follow. If no LF follows, the parser still fails, but it reports the more specific error that EOL did not match because there was no LF after the CR. However, a cut also takes the initial CR out of the equation for any later alternative, so if you have:

EOL = CR ~CUT~ LF / LF / CR "foo"

the last alternative will never be reached, because a cut is order-sensitive: once the parser has matched the first CR and passed the ~CUT~, it is committed to that alternative, so CR "foo" can never be matched. The analogous technique in regular expressions is “atomic grouping”.
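
To make the cut behaviour concrete, here is a minimal Python sketch of a hand-written, ordered matcher for EOL = CR ~CUT~ LF / LF / CR "foo". The names match_eol and CutError are made up for this illustration; they do not come from ABNF, CDDL, or any parser library, and this is only one plausible reading of how a cut could behave.

```python
CR, LF = "\r", "\n"


class CutError(Exception):
    """Hypothetical error: a match failed after the parser passed a cut."""


def match_eol(text, pos=0):
    """Ordered match of EOL = CR ~CUT~ LF / LF / CR "foo" at text[pos:].

    Returns the offset just past the match, or None if no alternative
    applies. Raises CutError if a CR was seen but no LF follows.
    """
    # Alternative 1: CR ~CUT~ LF
    if text.startswith(CR, pos):
        # Past the cut: a failure here is reported, not backtracked over.
        if text.startswith(LF, pos + 1):
            return pos + 2
        raise CutError(f"EOL: expected LF after CR at offset {pos + 1}")
    # Alternative 2: LF
    if text.startswith(LF, pos):
        return pos + 1
    # Alternative 3: CR "foo". Never reached with a leading CR, because
    # alternative 1 has already committed or raised by then.
    if text.startswith(CR + "foo", pos):
        return pos + 4
    return None
```

With this sketch, a stray CR yields the specific "expected LF after CR" error instead of a generic failure, and the input CR "foo" raises that same error rather than matching the third alternative.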

With a context-free grammar, it is better to think about all alternatives simultaneously.
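
By contrast, a rough sketch of the unordered view tries every alternative and collects all the ways the rule can match, so the order in which the alternatives are listed makes no difference. The names below (ALTERNATIVES, match_sequence, match_eol_unordered) are hypothetical, and the grammar is the cut-free EOL = CR LF / LF / CR "foo".

```python
CR, LF = "\r", "\n"

# EOL = CR LF / LF / CR "foo", each alternative as a sequence of literals.
ALTERNATIVES = [(CR, LF), (LF,), (CR, "foo")]


def match_sequence(seq, text, pos):
    """Match the literals in seq one after another; return end offset or None."""
    for literal in seq:
        if not text.startswith(literal, pos):
            return None
        pos += len(literal)
    return pos


def match_eol_unordered(text, pos=0, alternatives=ALTERNATIVES):
    """Try every alternative and return the set of end offsets that match."""
    ends = set()
    for seq in alternatives:
        end = match_sequence(seq, text, pos)
        if end is not None:
            ends.add(end)
    return ends
```

Here the input CR "foo" does match (through the third alternative), and passing the alternatives in reverse order gives the same result.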

Sean