Re: [Cbor] CDDL parsing questions

Toerless Eckert <tte@cs.fau.de> Fri, 19 August 2022 06:36 UTC

Date: Fri, 19 Aug 2022 08:36:25 +0200
From: Toerless Eckert <tte@cs.fau.de>
To: Carsten Bormann <cabo@tzi.org>
Cc: Derek Atkins <derek@ihtfp.com>, cbor@ietf.org
Message-ID: <Yv8vaeVShMFNJ9IL@faui48e.informatik.uni-erlangen.de>
References: <Yv13HuFndByI/TtZ@faui48e.informatik.uni-erlangen.de> <2d9abb4cff288213ee021bfb5d57f5a6.squirrel@mail2.ihtfp.org> <Yv4XtKqLUrto4f/c@faui48e.informatik.uni-erlangen.de> <76F35EDA-ADAE-49D2-BEB0-15B73CAC0A39@tzi.org>
In-Reply-To: <76F35EDA-ADAE-49D2-BEB0-15B73CAC0A39@tzi.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/wpWKpb2H7kGPYJzJC9oc-TtJ-yo>
Subject: Re: [Cbor] CDDL parsing questions

On Thu, Aug 18, 2022 at 12:53:33PM +0200, Carsten Bormann wrote:
> Well, the terminology is all over the place here.

Meaning you can't tell me the right words to use for what I want to describe? ;-)

> I prefer to reserve the term “parsing” to text-based protocols that are best handled with parser generators.

Except for "text-based protocols", that is exactly what I was thinking of, if
I understand you correctly:

"CDDL parser":
A program that takes CDDL as input and generates a program which, given CBOR
input, spits out a structure (tree?) of CDDL names, each pointing to the "parsed"
CDDL structure that it represents.
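To make this "CDDL parser" idea concrete, here is a minimal hand-written sketch. It assumes the CBOR input has already been decoded into Python lists, ints and strings by some CBOR library, and the grammar dictionary is a hypothetical hand translation of a tiny CDDL fragment, not the output of any real CDDL tool:

```python
from typing import Any, Union

# Hypothetical hand translation of a tiny CDDL fragment:
#   Message = [foo, bar, kind]
#   foo = int
#   bar = tstr
#   kind = int
# A rule is either a Python type (a leaf) or a list of rule names
# (an array whose elements must match those rules, in order).
Rule = Union[type, list]
GRAMMAR: dict[str, Rule] = {
    "Message": ["foo", "bar", "kind"],
    "foo": int,
    "bar": str,
    "kind": int,
}

def annotate(name: str, item: Any) -> tuple:
    """Return a tree of (rule_name, value_or_children), or raise ValueError."""
    rule = GRAMMAR[name]
    if isinstance(rule, type):
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(item, rule) or isinstance(item, bool):
            raise ValueError(f"{name}: expected {rule.__name__}, got {item!r}")
        return (name, item)
    if not isinstance(item, list) or len(item) != len(rule):
        raise ValueError(f"{name}: expected an array of {len(rule)} elements")
    # Attach the CDDL name to every element, recursively.
    return (name, [annotate(sub, elem) for sub, elem in zip(rule, item)])

tree = annotate("Message", [7, "hello", 1])
print(tree)  # ('Message', [('foo', 7), ('bar', 'hello'), ('kind', 1)])
```

The output is exactly the "tree of CDDL names" described above; anything that does not fit raises an error instead of receiving a name.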

> A protocol decoder often has two levels: the lexical syntax (length fields etc.), breaking up the bytes into, say, TLVs, and the TLV processor that creates a semantic representation.  Except that TLVs are rarely done in a particularly structured way, hence all the CVEs.

CVE?

> What you seem to be alluding to, is the ingestion of a CBOR data item in CBOR generic data model into the semantic categories that application wants.  CDDL can describe some, but not all of this process.

My main point is not even about implementation, but see above.

My main point is that if we use CDDL to specify protocol structures with CDDL
names, then we need an agreement about what it means for protocol input/output
to comply (or not) with that CDDL specification. To me, input complies if I
could run the above "CDDL parser" on my CBOR protocol structure and it would
attach to it the CDDL name that I think that CBOR protocol structure represents.
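Part of that agreement is what happens when more than one name fits. In CDDL a type choice simply matches if any alternative matches, so the sketch below shows one possible, stricter reading (hypothetical, not something CDDL itself defines): an item complies as the intended name only if that is the sole alternative it matches. The two alternatives are made-up examples over already-decoded CBOR values:

```python
from typing import Any, Callable

def is_int(x: Any) -> bool:
    # Exclude bool, which Python treats as a subclass of int.
    return isinstance(x, int) and not isinstance(x, bool)

# Hypothetical alternatives of one CDDL type choice, roughly:
#   Pair   = [int, int]
#   IntSeq = [* int]
ALTERNATIVES: dict[str, Callable[[Any], bool]] = {
    "Pair": lambda x: isinstance(x, list) and len(x) == 2 and all(map(is_int, x)),
    "IntSeq": lambda x: isinstance(x, list) and all(map(is_int, x)),
}

def matching_names(item: Any) -> list[str]:
    """Every alternative name whose definition the item satisfies."""
    return [name for name, pred in ALTERNATIVES.items() if pred(item)]

def complies_as(item: Any, intended: str) -> bool:
    """Strict reading: comply as `intended` only if it is the sole match."""
    return matching_names(item) == [intended]

print(matching_names([1, 2]))            # ['Pair', 'IntSeq']
print(complies_as([1, 2], "Pair"))       # False: it is also an IntSeq
print(complies_as([1, 2, 3], "IntSeq"))  # True
```

Under this reading, [1, 2] is ambiguous and therefore does not comply as either name, which is exactly the kind of case the specification would need to settle.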

This is not really any different whether I specify in CDDL or in ASCII art,
except that I think we never philosophized about how to determine whether or
not a protocol structure complies with the specification, because,
intuitively or from experience, we always chose to define protocol structures
simple enough that there was not much to discuss.

> CDDL can be used to write complex grammars that require more look-ahead than one would like to have, e.g.
> 
> Message = Message1 / Message2
> 
> Message1 = [foo, bar, 1]
> 
> Message2 = [foo, bar, 2]
> 
> Don’t do that.

Exactly. I think this is what our "CDDL protocol" in question does, or
at least would do if we went down that path; hence this mailing list thread.
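What the trailing discriminator costs a decoder can be sketched as follows, assuming (hypothetically) that foo and bar are plain ints and that the array elements arrive one at a time. The alternative cannot be chosen until the third element, so foo and bar must be held in a buffer without knowing which rule they belong to:

```python
from typing import Iterable

# Carsten's example, with foo and bar hypothetically taken to be ints:
#   Message  = Message1 / Message2
#   Message1 = [foo, bar, 1]
#   Message2 = [foo, bar, 2]
DISCRIMINATORS = {1: "Message1", 2: "Message2"}

def classify_trailing(elements: Iterable[int]) -> tuple[str, list[int], int]:
    """Return (alternative name, buffered foo/bar, elements read before deciding).

    Sketch only: elements after the third are not checked."""
    buffered: list[int] = []
    for count, elem in enumerate(elements, start=1):
        if count == 3:
            # Only now is the choice decidable; foo and bar sat in a buffer.
            if elem not in DISCRIMINATORS:
                raise ValueError(f"unknown discriminator {elem!r}")
            return DISCRIMINATORS[elem], buffered, count
        buffered.append(elem)
    raise ValueError("array too short to decide between Message1 and Message2")

print(classify_trailing([10, 20, 2]))  # ('Message2', [10, 20], 3)
```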

> (A tool that flags excessive look-ahead requirements would be useful.
> In this case, putting the discriminator up front is helpful:
> Message1 = [1, foo, bar]
> Message2 = [2, foo, bar]

Exactly. But IMHO that is ONLY necessary/beneficial if we have a good
definition of which "CDDL protocols" can afford this lookahead and which ones can't, e.g.:

  good-protocol = [This, is, a, lovely, protocol, ",", dear]
  bad-protocol  = [This, is, a, lovely, protocol, ",", idiot]
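The "tool that flags excessive look-ahead" could, in its simplest form, count how many leading array elements must be examined before every pair of alternatives can be told apart. The sketch below is hypothetical: it treats each alternative as a fixed array of literal tokens (in real CDDL, This, is, ... would be type names) and ignores the array-length header that CBOR provides up front:

```python
def lookahead_needed(alternatives: dict[str, list[str]]) -> int:
    """Leading elements to examine before all alternatives are distinguishable."""
    names = list(alternatives)
    needed = 1
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            x, y = alternatives[a], alternatives[b]
            # Length of the common prefix shared by the two alternatives.
            common = 0
            while common < min(len(x), len(y)) and x[common] == y[common]:
                common += 1
            if common == len(x) == len(y):
                raise ValueError(f"{a} and {b} are indistinguishable")
            needed = max(needed, common + 1)
    return needed

protocols = {
    "good-protocol": ["This", "is", "a", "lovely", "protocol", ",", "dear"],
    "bad-protocol":  ["This", "is", "a", "lovely", "protocol", ",", "idiot"],
}
print(lookahead_needed(protocols))  # 7
```

With the discriminator up front, as in Message1 = [1, foo, bar] and Message2 = [2, foo, bar], the same check returns 1.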

For on-the-wire protocols I don't think I have ever seen this kind of "lookahead",
but in parsers for programming and human languages it is of course common.

So now I fundamentally start to wonder whether we are missing out on a wonderful
world of richer, and for some reason better, syntax in on-the-wire protocols,
solely because we previously designed on-the-wire protocols primarily so
that "hand-written parsers" would be easy to write, whereas human/computer-language
parsers long ago went beyond that layer of the problem and "automated"
the parsing, thereby achieving far more flexible syntax.

But then of course I go back and ask: what is the simplest _good_
example of why we would want lookahead? Right now, the answer our protocol
in question gives me is a bit like "we forgot to avoid lookahead in our original
design, and now that we want to extend the protocol with maximum backward
compatibility, we create lookahead". But I am not persuaded that this is
a good-enough reason.

> Whether the look-ahead actually hurts depends on whether the processing of foo and bar depends on whether the message is a message1 or a message2.  In some cases, the form that nominally requires more look-ahead is easier to process, because there is no such dependency...

I'm not sure I can construct an example from what you said.
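One possible reading of Carsten's remark, as a hypothetical sketch: if foo and bar are processed identically in Message1 and Message2, a decoder can handle them as they arrive and let the trailing discriminator select only what happens afterwards, so no buffering or backtracking is needed. The processing functions are made-up placeholders, and the input is assumed to be a well-formed 3-element array:

```python
from typing import Iterable

def process_foo(foo: int) -> int:
    return foo * 10   # placeholder work, identical for both message types

def process_bar(bar: int) -> int:
    return bar + 1    # placeholder work, identical for both message types

def handle(elements: Iterable[int]) -> tuple[str, int, int]:
    it = iter(elements)
    foo = process_foo(next(it))  # no need to know the message type yet
    bar = process_bar(next(it))
    kind = next(it)              # the discriminator arrives last
    if kind == 1:
        return ("Message1", foo, bar)
    if kind == 2:
        return ("Message2", foo, bar)
    raise ValueError(f"unknown discriminator {kind!r}")

print(handle([3, 4, 2]))  # ('Message2', 30, 5)
```

If process_foo instead had to behave differently per message type, the decoder would be back to buffering foo and bar until the end of the array.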

Cheers
    Toerless

> Grüße, Carsten
> 

-- 
---
tte@cs.fau.de