Re: [Cbor] CDDL parsing questions

Carsten Bormann <cabo@tzi.org> Thu, 18 August 2022 10:53 UTC

From: Carsten Bormann <cabo@tzi.org>
Date: Thu, 18 Aug 2022 12:53:33 +0200
Cc: Derek Atkins <derek@ihtfp.com>, cbor@ietf.org
To: Toerless Eckert <tte@cs.fau.de>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/Ugo8CHN14_eActOI-vKFElrZGXY>

On 2022-08-18, at 12:43, Toerless Eckert <tte@cs.fau.de> wrote:
> 
> Agreed. But it seems we do not agree on what to call that level of receiver
> processing. I was trying to call it CDDL parsing. Before CBOR, we would
> have called it protocol parsing, I guess.

Well, the terminology is all over the place here.

I prefer to reserve the term “parsing” for text-based protocols that are best handled with parser generators.

A protocol decoder often has two levels: the lexical syntax (length fields etc.), which breaks up the bytes into, say, TLVs, and the TLV processor, which creates a semantic representation from them.  In practice, though, TLV processing is rarely done in a particularly structured way, hence all the CVEs.
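The two levels can be sketched roughly as follows. This is a minimal Python sketch under assumed conditions: the wire format (1-byte type, 1-byte length), the type codes, and the names split_tlvs, process_tlvs and Endpoint are all hypothetical. The lexical level only splits the bytes and checks length fields; the semantic level turns the TLVs into an application object.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple

# Hypothetical wire format: 1-byte type, 1-byte length, then `length` value bytes.
def split_tlvs(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Lexical level: break the byte string into (type, value) pairs."""
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError("truncated TLV header")
        t, length = data[pos], data[pos + 1]
        end = pos + 2 + length
        # Check the length field against the remaining input before slicing;
        # skipping this kind of check is a classic source of vulnerabilities.
        if end > len(data):
            raise ValueError("TLV length exceeds remaining input")
        yield t, data[pos + 2:end]
        pos = end

@dataclass
class Endpoint:
    name: Optional[str] = None
    port: Optional[int] = None

# Hypothetical type codes for this sketch.
TYPE_NAME = 1
TYPE_PORT = 2

def process_tlvs(data: bytes) -> Endpoint:
    """Semantic level: turn the TLVs into an application-level object."""
    ep = Endpoint()
    for t, value in split_tlvs(data):
        if t == TYPE_NAME:
            ep.name = value.decode("utf-8")
        elif t == TYPE_PORT:
            if len(value) != 2:
                raise ValueError("port must be 2 bytes")
            ep.port = int.from_bytes(value, "big")
        # Unknown types are silently skipped here; a real protocol must define this.
    return ep
```

For example, process_tlvs(bytes([1, 3]) + b"foo" + bytes([2, 2, 0x1F, 0x90])) yields Endpoint(name='foo', port=8080).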

What you seem to be alluding to is the ingestion of a CBOR data item, in the CBOR generic data model, into the semantic categories that the application wants.  CDDL can describe some, but not all, of this process.
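A rough Python sketch of such ingestion, assuming the CBOR data item has already been decoded by some CBOR library into generic Python values (lists, ints, ...). The Range type, the CDDL rule in its comment, and ingest_range are hypothetical. The structural checks correspond to what a CDDL validator could cover; the start <= end check relates two fields to each other and goes beyond what that rule states.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical application type; a matching (equally hypothetical) CDDL
# rule would be:  Range = [start: uint, end: uint]
@dataclass(frozen=True)
class Range:
    start: int
    end: int

def is_uint(x: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly:
    # CBOR true/false are not unsigned integers.
    return isinstance(x, int) and not isinstance(x, bool) and x >= 0

def ingest_range(item: Any) -> Range:
    """Map a data item in the CBOR generic data model (already decoded
    into Python lists/ints) into the application's Range type."""
    # Structural checks: the part a CDDL validator could do for us.
    if not isinstance(item, list) or len(item) != 2:
        raise ValueError("expected a 2-element array")
    start, end = item
    if not (is_uint(start) and is_uint(end)):
        raise ValueError("expected two unsigned integers")
    # Semantic check relating two fields; not part of the CDDL rule above.
    if start > end:
        raise ValueError("start must not exceed end")
    return Range(start, end)
```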

CDDL can be used to write complex grammars that require more look-ahead than one would like, e.g.:

Message = Message1 / Message2

Message1 = [foo, bar, 1]

Message2 = [foo, bar, 2]

Don’t do that.
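To make the look-ahead cost concrete, here is a minimal Python sketch. It assumes the array elements arrive one at a time from an iterator and that foo and bar are processed differently depending on the message type; the RULES table and process_trailing are hypothetical. Because the discriminator comes last, foo and bar have to be buffered uninterpreted until the trailing 1 or 2 has been seen.

```python
from typing import Any, Callable, Dict, Iterator, List, Tuple

# Hypothetical processing rules: how foo and bar are interpreted depends on
# the message type (this dependency is what makes the look-ahead hurt).
RULES: Dict[int, Callable[[Any, Any], Tuple[str, Any, Any]]] = {
    1: lambda foo, bar: ("Message1", foo, bar),
    2: lambda foo, bar: ("Message2", bar, foo),  # e.g. different field roles
}

def process_trailing(elements: Iterator[Any]) -> Tuple[str, Any, Any]:
    """Message1 = [foo, bar, 1], Message2 = [foo, bar, 2].
    `elements` yields the array elements in wire order."""
    # Look-ahead buffer: foo and bar cannot be interpreted yet.
    pending: List[Any] = [next(elements), next(elements)]
    kind = next(elements)  # only now do we know which message this is
    if kind not in RULES:
        raise ValueError(f"neither Message1 nor Message2: {kind!r}")
    return RULES[kind](*pending)
```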

(A tool that flags excessive look-ahead requirements would be useful.
In this case, putting the discriminator up front is helpful:
Message1 = [1, foo, bar]
Message2 = [2, foo, bar]
Whether the look-ahead actually hurts depends on whether foo and bar have to be processed differently in a Message1 than in a Message2.  In some cases, the form that nominally requires more look-ahead is actually easier to process, because there is no such dependency.
)
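The discriminator-up-front variant can be sketched under similar assumptions (hypothetical HANDLERS table and process_leading; string case-folding merely stands in for real per-message processing). The first element selects the handlers, so foo and bar can be processed as soon as they arrive, with nothing held back.

```python
from typing import Any, Callable, Dict, Iterator, Tuple

# Hypothetical per-message handlers for (foo, bar), selected as soon as the
# discriminator (now the first element) has been read.
HANDLERS: Dict[int, Tuple[Callable[[Any], Any], Callable[[Any], Any]]] = {
    1: (str.upper, str.lower),  # Message1: one treatment of foo and bar
    2: (str.lower, str.upper),  # Message2: a different treatment
}

def process_leading(elements: Iterator[Any]) -> Tuple[int, Any, Any]:
    """Message1 = [1, foo, bar], Message2 = [2, foo, bar].
    The discriminator comes first, so no element needs to be buffered."""
    kind = next(elements)
    if kind not in HANDLERS:
        raise ValueError(f"neither Message1 nor Message2: {kind!r}")
    handle_foo, handle_bar = HANDLERS[kind]
    foo = handle_foo(next(elements))  # processed immediately on arrival
    bar = handle_bar(next(elements))
    return kind, foo, bar
```

If foo and bar were processed identically in both messages, the trailing-discriminator form would not need to hold anything back uninterpreted either, which is the case where that form can be just as easy to process.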

Grüße, Carsten