[Cbor] Indentation and parsing (Re: cddl 0.8.17: Add .abnf (draft-ietf-cbor-cddl-control))

Carsten Bormann <cabo@tzi.org> Sat, 27 February 2021 19:30 UTC

From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <d0a2a7bb-b84f-da24-d706-4b626571b2c8@alum.mit.edu>
Date: Sat, 27 Feb 2021 20:30:51 +0100
Cc: cbor@ietf.org, Paul Kyzivat <pkyzivat@alum.mit.edu>
Reply-To: abnf-discuss@ietf.org
Message-Id: <AA88AA86-485D-4359-80D5-8EE7F952279A@tzi.org>
References: <AC771EE9-9672-4B2D-B66A-2C815D102687@tzi.org> <358dd1e3-74a5-c32e-235a-c0a6c308ae6f@alum.mit.edu> <CD82E5A7-4D46-412F-B45F-B4970C2FF8E8@tzi.org> <3a80f922-cfc9-2831-c4f0-7c918e211d74@alum.mit.edu> <8B7422CF-6DE1-4ACB-B6A8-BE0B466BB072@tzi.org> <d0a2a7bb-b84f-da24-d706-4b626571b2c8@alum.mit.edu>
To: abnf-discuss@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/YOIZeaaaMx_CiO91RRxoGKpQ7vg>

The CBOR WG maintains CDDL, the ABNF equivalent for data structured in a JSON- or CBOR-like manner.

Over at cbor@ietf.org, we are currently discussing the integration of ABNF support into CDDL for describing the content of text and byte strings [1].  

[1]: <https://www.ietf.org/archive/id/draft-ietf-cbor-cddl-control-02.html#name-embedded-abnf>

A prototype for that support is already implemented in the original CDDL tool; support in another CDDL tool is in progress.

The one surprise that I had when I implemented this was that I had completely forgotten that ABNF is indentation sensitive (or, more precisely, rules out indentation except on continuation lines, where indentation is required).  Tools like BAP have been working around this aspect in a sufficiently effective way that I simply lost track of that fact.

I think we have a way forward for dealing with this in CDDL (namely, dealing with indentation before handing over parts to ABNF).
But several interesting side discussions ensued, and Paul suggested below that maybe we should take this to abnf-discuss now.
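
To illustrate what "dealing with indentation before handing over parts to ABNF" could look like, here is a minimal sketch in Python. It is not taken from either CDDL tool; the function name `strip_common_indent` and the sample input are made up for illustration. It assumes that the rule lines are the least indented lines of the embedded text, so that removing the common leading whitespace puts rules in column 0 and leaves continuation lines indented.

```python
import textwrap


def strip_common_indent(abnf_text: str) -> str:
    """Remove the leading whitespace shared by all non-blank lines.

    Hypothetical helper: assumes rule lines are the least indented
    lines, so they end up in column 0 while continuation lines
    keep their extra indentation.
    """
    return textwrap.dedent(abnf_text)


# Embedded ABNF as it might appear inside an indented block.
embedded = "    a = b /\n        c\n    d = a\n"

# After dedenting, "a" and "d" start in column 0; the continuation
# line "c" is still indented, as strict ABNF requires.
print(strip_common_indent(embedded), end="")
```

This only handles uniform extra indentation; it does nothing for text where individual rules are indented differently from each other.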

One question was how the fact that proper ABNF does not accept indentation on a rule can be repaired.  A followup question was, of course, how much the parsing of ABNF depends on the way indentation is handled.  More below; a whole thread is in the archive [2].  (Another question is how a relaxed syntax could be expressed in ABNF itself.)

[2]: <https://mailarchive.ietf.org/arch/msg/cbor/p5D3G6DNy5ZAiurp2HLA4E8VtAM>

Opinions welcome; I have set reply-to to abnf-discuss (but am aware that this setting is ignored for wide-replies by most mail readers).

Grüße, Carsten

> On 2021-02-27, at 16:41, Paul Kyzivat <pkyzivat@alum.mit.edu> wrote:
> On 2/26/21 6:51 PM, Carsten Bormann wrote:
>> Hi Paul,
>>> The obvious revision would be to simply change the syntax of rule to:
>>>     rule           =  WSP rulename defined-as elements c-nl
>> Well, a single WSP won’t cut it…
>>     rule           =  *WSP rulename defined-as elements c-nl
>>> That seems unambiguous to me. But my formal parsing knowledge isn't sufficient to say if it is LL(1).
>> Once you allow white space before rulenames, it is most definitely not LL(1).
>> When you see…
>> A = B
>>   C
>> …you need to look ahead one more to distinguish, say…
>> A = B
>>   C = D
>> (New rule)
>> …from…
>> A = B
>>   C D
>> (Still in same rule.)
>> I do think it is LL(2), but I didn’t formally check.
> If there is no separate lexer, so that terminal symbols are all single bytes, then isn't the required lookahead effectively unbounded?
>> Who else should we draw into this conversation?
> We should restart the discussion on abnf-discuss.
> 	Thanks,
> 	Paul
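
The lookahead question in the quoted exchange can be made concrete with a small Python sketch. It is an illustration under simplifying assumptions, not a full ABNF parser: the names `NEW_RULE`, `starts_new_rule` and `split_rules` are made up, and comments or line breaks between the rulename and the "=" are ignored. An indented line is treated as a new rule if it begins with a rulename followed by "=" (or "=/"), and as a continuation otherwise. With tokens, that is one token past the rulename; on raw characters, the pattern has to scan across the whole rulename before it can decide, however long that is.

```python
import re

# rulename = ALPHA *(ALPHA / DIGIT / "-"), followed by "=" (also covers "=/").
# Simplification: only spaces and tabs may sit between rulename and "=".
NEW_RULE = re.compile(r"[ \t]*[A-Za-z][A-Za-z0-9-]*[ \t]*=")


def starts_new_rule(line: str) -> bool:
    """True if the line looks like 'rulename =' after optional indentation."""
    return NEW_RULE.match(line) is not None


def split_rules(text: str) -> list[str]:
    """Group lines into rules, joining continuation lines with a space."""
    rules: list[list[str]] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if starts_new_rule(line) or not rules:
            rules.append([line.strip()])
        else:
            rules[-1].append(line.strip())
    return [" ".join(parts) for parts in rules]


print(split_rules("A = B\n  C = D\n"))  # two rules
print(split_rules("A = B\n  C D\n"))    # one rule with a continuation
```

The first call yields two rules and the second yields a single rule, matching the two cases distinguished in the quoted example.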