Re: [Cbor] cddl 0.8.17: Add .abnf (draft-ietf-cbor-cddl-control)

Carsten Bormann <cabo@tzi.org> Fri, 26 February 2021 23:52 UTC

Date: Sat, 27 Feb 2021 00:51:55 +0100
Cc: cbor@ietf.org
To: Paul Kyzivat <pkyzivat@alum.mit.edu>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/Hj2BVrFBN0Q-U2ZSB9WOu77PJ2U>

Hi Paul,

> The obvious revision would be to simply change the syntax of rule to:
> 
>     rule           =  WSP rulename defined-as elements c-nl

Well, a single WSP won’t cut it (the indent may be several spaces or tabs)…

    rule           =  *WSP rulename defined-as elements c-nl

> 
> That seems unambiguous to me. But my formal parsing knowledge isn't sufficient to say if it is LL(1).

Once you allow white space before rulenames, it is most definitely not LL(1).

When you see…

A = B
  C

…you need to look ahead one more token (past the C) to distinguish, say…

A = B
  C = D
(New rule)

…from…

A = B
  C D
(Still in same rule.)

I do think it is LL(2), but I didn’t formally check.
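A minimal Ruby sketch of that two-token decision, assuming a hypothetical whitespace-splitting tokenizer (so `C=D` written without spaces is not handled). The first token of an indented line cannot tell a new rule from a continuation; only the second token can.

```ruby
# Hypothetical sketch: classify an indented line once leading
# whitespace before rulenames is allowed (rule = *WSP rulename ...).
RULENAME = /\A[A-Za-z][A-Za-z0-9-]*\z/

def classify(line)
  tokens = line.split # naive whitespace tokenizer, for illustration only
  return :blank if tokens.empty?
  # a non-rulename first token can only be part of a continuation
  return :continuation unless tokens[0].match?(RULENAME)
  # token 1 ("C") is ambiguous; token 2 decides, hence LL(2)
  %w[= =/].include?(tokens[1]) ? :new_rule : :continuation
end

classify("  C = D") # => :new_rule
classify("  C D")   # => :continuation
```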

I think that’s why RFC 5234 insists on an indent on continuation lines, as in

         c-wsp          =  WSP / (c-nl WSP)

(Having at least one WSP after a c-nl identifies the line as continuation line.)

So a naive parser can recover the rule structure simply by throwing away comments, deleting each newline that is followed by leading whitespace, and thereby combining the continuation lines into one logical line per rule.
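A rough Ruby sketch of such a naive pre-pass. The names `strip_comment` and `logical_rules` are made up for illustration, and the sketch assumes RFC 5234 syntax in which a `;` inside a quoted string or a `<...>` prose value does not start a comment; it does not validate anything.

```ruby
# Hypothetical helper: drop a ";" comment, but not a ";" that sits
# inside "..." (char-val) or <...> (prose-val).
def strip_comment(line)
  quote = nil
  line.each_char.with_index do |ch, i|
    if quote
      quote = nil if ch == quote
    elsif ch == '"'
      quote = '"'
    elsif ch == "<"
      quote = ">"
    elsif ch == ";"
      return line[0...i]
    end
  end
  line
end

# Hypothetical helper: join each line that starts with WSP onto the
# previous one, yielding one logical line per rule.
def logical_rules(text)
  rules = []
  text.each_line do |raw|
    line = strip_comment(raw.chomp).rstrip
    next if line.strip.empty?              # blank or comment-only line
    if line.match?(/\A[ \t]/) && !rules.empty?
      rules[-1] += " " + line.strip        # continuation line
    else
      rules << line                        # rule starts in column 1
    end
  end
  rules
end

logical_rules("a = b ; comment\n    / c\nd = \"x;y\"\n")
# => ["a = b / c", "d = \"x;y\""]
```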

If this looks like RFC 822 continuation lines to you, that is not a surprise :-) 
(Ken L. Harrenstien invented the ABNF notation for RFC 733 that then became RFC 822; previous versions of that document had used a more conventional ::= style BNF, as in
         <field>           ::=   <field-name> ":" <field-body>
         <field-body>      ::=   <field-body-contents>
                               | <field-body-contents> <crlf>
                                    <linear-white-space-char>
                                    <field-body>
in RFC 724.  Shudder.)


> BAP is old school and parses using yacc and lex.

The trick is that BAP's lexer (scanner.l) keeps a current indent.
When that is not set, leading whitespace is ignored.
When a rule name is first seen, the indent is set to the current column.
After that, lines that have less leading whitespace than the indent cause warnings; the indent is essentially removed from the view of the parser.
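The same idea can be sketched in Ruby. This is a hypothetical illustration, not a port of BAP's scanner.l: recognizing a rule name by a leading ALPHA, and handling an under-indented line by warning and then stripping whatever whitespace it has, are assumptions about the behavior described.

```ruby
# Hypothetical sketch of BAP-style indent tracking (not scanner.l).
# Returns [text_with_indent_removed, warnings].
def remove_rule_indent(text)
  indent = nil
  warnings = []
  out = text.lines.each_with_index.map do |line, i|
    lead = line[/\A[ \t]*/].size
    if indent.nil?
      # no indent yet: ignore leading whitespace; the first line that
      # starts with a rule name (ALPHA) fixes the indent
      indent = lead if line[lead].to_s.match?(/[A-Za-z]/)
      line[lead..]
    elsif line.strip.empty?
      line[lead..]                      # blank lines lose their whitespace
    elsif lead < indent
      warnings << "line #{i + 1}: indented less than the first rule"
      line[lead..]
    else
      line[indent..]                    # hide the indent from the parser
    end
  end
  [out.join, warnings]
end
```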

The effect is pretty close to

# strip the common indent (SP/HTAB only, as in ABNF's WSP) from every line
indent = spec.lines.grep(/\S/).map {|l| l[/\A[ \t]*/].size}.min || 0
spec = spec.lines.map {|l| l.sub(/\A[ \t]{0,#{indent}}/, "")}.join

…which I’m contemplating putting into the ABNF support of the CDDL tool (but then it would also need to go into the text of the CDDL .abnf specification).

> (And doesn't derive its parser from the abnf of abnf.) That means it's an LALR parser.

Yes.  With those scanner hacks.

> Is there a public tool that can generate a parser from abnf that can parse abnf?

Well, I have been using my abnftt tool(*) (https://github.com/cabo/abnftt) as part of the CDDL tool’s ABNF capability.  It converts an ABNF grammar into a PEG grammar (in “treetop” syntax), which can then be used for validation.
Note that a PEG grammar behaves slightly differently from a traditional BNF grammar, so I had to make this change to the ABNF grammar of ABNF:

         defined-as     =  *c-wsp ("=" / "=/") *c-wsp
➔
         defined-as     =  *c-wsp ("=/" / "=") *c-wsp

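(A PEG tries the alternatives of a choice in order and commits to the first one that matches.  With the original order, “=” matches the first character of “=/”, so the “=/” alternative is never tried and the leftover “/” makes the rest of the rule fail; this is called “prioritized choice”.)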
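A minimal Ruby sketch of why the order matters, assuming a heavily simplified rule head (rulename, defined-as, then something that must look like the start of `elements`). `Enumerable#find` gives the first-match-wins behavior of a PEG choice; this is an illustration, not the code abnftt generates.

```ruby
# Simplified sketch of PEG prioritized choice in defined-as.
DEFINED_AS_RFC5234 = ["=", "=/"] # alternative order in RFC 5234
DEFINED_AS_PEG     = ["=/", "="] # order after the change above

# Returns [rulename, operator], or nil if the PEG-style parse fails.
def parse_rule_head(line, alternatives)
  m = line.match(/\A([A-Za-z][A-Za-z0-9-]*)[ \t]*/) or return nil
  pos = m.end(0)
  # prioritized choice: take the FIRST alternative that matches here;
  # once it succeeds, the remaining alternatives are never tried
  op = alternatives.find { |alt| line[pos, alt.size] == alt } or return nil
  pos += op.size
  pos += line[pos..][/\A[ \t]*/].size
  # crude stand-in for "elements": must begin like an ABNF element
  return nil unless line[pos..].match?(/\A[A-Za-z0-9"%(\[<*]/)
  [m[1], op]
end
```

With the RFC 5234 order, `parse_rule_head("a =/ b", DEFINED_AS_RFC5234)` returns nil because “=” is taken and the leftover “/” cannot start `elements`; with the swapped order it returns `["a", "=/"]`, and plain `a = b` parses either way.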

I once heard there is an ANTLR module for ABNF (not just an ANTLR parser for ABNF!), but I didn’t follow up.

Who else should we draw into this conversation?

Grüße, Carsten

(*) which is a rewrite of the abnc tool that I used to align the ABNF in Appendix A of RFC 4997 https://tools.ietf.org/html/rfc4997#appendix-A with that RFC’s examples (note the case-sensitive literals on the next page :-).
The abnc tool is still at the core of the CDDL tool, which I need to port to abnftt…