Re: [Cbor] CDDL parsing questions

Toerless Eckert <tte@cs.fau.de> Fri, 19 August 2022 06:36 UTC

Date: Fri, 19 Aug 2022 08:36:25 +0200
From: Toerless Eckert <tte@cs.fau.de>
To: Carsten Bormann <cabo@tzi.org>
Cc: Derek Atkins <derek@ihtfp.com>, cbor@ietf.org
Message-ID: <Yv8vaeVShMFNJ9IL@faui48e.informatik.uni-erlangen.de>
References: <Yv13HuFndByI/TtZ@faui48e.informatik.uni-erlangen.de> <2d9abb4cff288213ee021bfb5d57f5a6.squirrel@mail2.ihtfp.org> <Yv4XtKqLUrto4f/c@faui48e.informatik.uni-erlangen.de> <76F35EDA-ADAE-49D2-BEB0-15B73CAC0A39@tzi.org>
In-Reply-To: <76F35EDA-ADAE-49D2-BEB0-15B73CAC0A39@tzi.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/wpWKpb2H7kGPYJzJC9oc-TtJ-yo>
Subject: Re: [Cbor] CDDL parsing questions
Precedence: list

On Thu, Aug 18, 2022 at 12:53:33PM +0200, Carsten Bormann wrote:
> Well, the terminology is all over the place here.

Meaning you can't tell me the right words to use for what i want to describe? ;-)

> I prefer to reserve the term “parsing” to text-based protocols that are best handled with parser generators.

Except for "text-based protocols", that was exactly what i was thinking of - if
i correctly understand you:

"CDDL parser:"
A program that generates, from CDDL input, a second program which takes CBOR
input and spits out a structure (tree?) of CDDL names, each pointing to the
"parsed" CDDL structures that it represents.
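
A minimal sketch of that idea in Python, under loud assumptions: the RULES
table is hand-written and hypothetical (it stands in for what a generator
might emit from CDDL), and the input is an already-decoded data item (nested
Python lists) rather than raw CBOR bytes. It is not how any existing CDDL
tool works, only an illustration of "attach CDDL names to the input".

```python
# Hypothetical rule table standing in for what a generator might emit from
# CDDL such as:   point = [int, int]    label = [tstr, point]
# Each entry in a rule is either a Python type or the name of another rule.
RULES = {
    "point": [int, int],
    "label": [str, "point"],
}


def annotate(item, rule_name):
    """Return a (rule_name, children) tree if item matches the rule, else None."""
    pattern = RULES[rule_name]
    if not isinstance(item, list) or len(item) != len(pattern):
        return None
    children = []
    for value, expected in zip(item, pattern):
        if isinstance(expected, str):      # reference to another named rule
            sub = annotate(value, expected)
            if sub is None:
                return None
            children.append(sub)
        elif type(value) is expected:      # leaf: exact type match (bool is not int)
            children.append(value)
        else:
            return None
    return (rule_name, children)


def matching_rules(item):
    """List every rule name whose structure the decoded item complies with."""
    return [name for name in RULES if annotate(item, name) is not None]
```

In this sketch, "complies with the specification" simply means that
matching_rules() returns the name one expected for the input.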

> A protocol decoder often has two levels: the lexical syntax (length fields etc.), breaking up the bytes into, say, TLVs, and the TLV processor that creates a semantic representation.  Except that TLVs are rarely done in a particularly structured way, hence all the CVEs.

CVE?

> What you seem to be alluding to, is the ingestion of a CBOR data item in CBOR generic data model into the semantic categories that application wants.  CDDL can describe some, but not all of this process.

My main point is not even about implementation, but see above.

My main point is that if we use CDDL to specify protocol structures with CDDL
names, we then need an agreement about what it means for protocol input/output
to comply with that CDDL specification or not. To me, that is the case if i
could have the above "CDDL parser", and it would take my CBOR protocol structure
input and attach to it the CDDL name that i think that CBOR protocol structure
represents.

This is not really any different whether i specify in CDDL or in ASCII-art,
only that i think we never philosophized about the process of determining
whether or not a protocol structure is compliant with the specification,
because we intuitively/from-experience always chose to define protocol
structures simple enough that we didn't have much to discuss.

> CDDL can be used to write complex grammars that require more look-ahead than one would like to have, e.g.
> 
> Message = Message1 / Message2
> 
> Message1 = [foo, bar, 1]
> 
> Message2 = [foo, bar, 2]
> 
> Don’t do that.

Exactly. This is what i think our "CDDL protocol" in question does, or
at least would do if we went down that path, and hence this mailing list thread.

> (A tool that flags excessive look-ahead requirements would be useful.
> In this case, putting the discriminator up front is helpful:
> Message1 = [1, foo, bar]
> Message2 = [2, foo, bar]

Exactly. But IMHO that is ONLY necessary/beneficial if we do have a good
definition as to which "CDDL protocols" can and which ones can't afford this
lookahead:

  good-protocol = [This, is, a, lovely, protocol, ",", dear]
  bad-protocol  = [This, is, a, lovely, protocol, ",", idiot]
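
To put a number on "lookahead" in examples like these, here is a rough
sketch in Python. The function name elements_needed is made up for
illustration, and it assumes each array entry can be treated as an opaque
token (equal tokens match the same values, different tokens never match the
same value), which real CDDL types do not guarantee.

```python
def elements_needed(alternatives):
    """Number of leading array elements a decoder must read before at most
    one alternative can still match; None if the alternatives never diverge.

    Assumption: entries are opaque tokens compared only for equality.
    """
    length = max(len(alt) for alt in alternatives)
    for n in range(1, length + 1):
        prefixes = [tuple(alt[:n]) for alt in alternatives]
        if len(set(prefixes)) == len(prefixes):   # all prefixes are distinct
            return n
    return None


# Discriminator last versus discriminator first.
trailing = [["foo", "bar", 1], ["foo", "bar", 2]]
leading = [[1, "foo", "bar"], [2, "foo", "bar"]]
print(elements_needed(trailing), elements_needed(leading))
```

With the discriminator last, all three elements have to be read before the
alternatives can be told apart; with it first, one element is enough.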

For on-the-wire-protocols i don't think i have ever seen this "lookahead",
but in programming and human language parsers it is of course common.

So now i fundamentally start to wonder if we're not missing out on a wonderful
world of richer, and for some reason better, syntax in on-the-wire protocols
solely because previously we designed on-the-wire protocols primarily
so that "hand-written-parsers" could be easy, whereas those human/computer-language
parsers already went way beyond that layer of the problem and had "automated"
the parsing, hence achieving far more flexible syntax.

But then of course i go back and ask: what is the simplest _good_
example of why we would want lookahead? Right now, the answer from our
protocol in question seems to me to be a bit "we forgot to avoid lookahead
in our original design, and when we now want to extend the protocol with
maximum backward compatibility, we create lookahead". But i am not persuaded
that this is a good-enough reason.

> Whether the look-ahead actually hurts depends on whether the processing of foo and bar depends on whether the message is a message1 or a message2.  In some cases, the form that nominally requires more look-ahead is easier to process, because there is no such dependency...

Not sure if i can make up an example from what you said.
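
One possible reading, as a sketch only: if foo and bar are handled the same
way for both message types, a decoder can process them as they arrive and
look at the trailing discriminator last, so the nominal lookahead costs
nothing. The handler below is hypothetical; treating foo and bar as integers
that get summed is an assumption made purely for illustration.

```python
def handle_stream(elements):
    """Process [foo, bar, kind] element by element, with no buffering.

    Assumption: foo and bar are integers whose handling (here, a sum) is
    identical for Message1 and Message2.
    """
    it = iter(elements)
    foo = next(it)
    bar = next(it)
    total = foo + bar              # shared work, done before kind is known
    kind = next(it)                # trailing discriminator, inspected last
    names = {1: "Message1", 2: "Message2"}
    if kind not in names:
        raise ValueError(f"unknown discriminator {kind!r}")
    return names[kind], total
```

If the meaning of foo depended on kind instead, the decoder would have to
hold foo and bar back until kind arrived, and that is where the lookahead
would start to hurt.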

Cheers
    Toerless

> Grüße, Carsten
> 

-- 
---
tte@cs.fau.de
