Re: [Cbor] CDDL Codegen in production

Robert Aftias <rob@emurgo.io> Wed, 09 September 2020 00:04 UTC

To: Carsten Bormann <cabo@tzi.org>
Cc: Sebastien Guillemot <sebastien@emurgo.io>, cbor@ietf.org, Nicolas Arqueros <nicolas@emurgo.io>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/ws44uIYzHWlavT0Hq_0u5QJBEFI>

Hi Carsten

I'm the dev of our cddl-codegen tool. To address your comments: most of the
limitations were simply due to time constraints and priorities. The one
thing I can think of that wasn't purely a time issue is being able to
define human-readable names for types/fields in some contexts, specifically
map keys where the key value is not very informative. I thought about using
comments for this, but the cddl parsing library that Andrew wrote strips
comments out. And even if comments were preserved, associating them
unambiguously with fields/types would be difficult, especially without
taking whitespace into account, when for example map keys are uints or
other types that don't provide much context.

I imagine this could be a common situation (maps with uint keys), as it
shrinks the resulting CBOR compared to descriptive key names. Encoding as
an array (so the keys aren't stored in the CBOR and the fields can still
have descriptive names) with optional fields can make the deserialization
code more complicated in some situations, because it increases ambiguity
during deserialization (compare [?uint, uint, ?text, text] with
{ ? 0: uint, 1: uint, ? 2: text, 3: text }). That is part of the reason our
codegen lib doesn't support optional fields in deserialization; the other
reasons are time constraints and the fact that our partner's CDDL files
don't use them.
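To make the ambiguity concrete, here is a rough Rust sketch (not cddl-codegen code) that decodes both layouts from the comparison above. The Item enum stands in for a real CBOR decoder's output, and the Fields struct with fields a, b, c, d is a hypothetical name chosen for illustration. With the array form the decoder has to try each combination of present/absent optional fields against the array length and element types; with the uint-keyed map it just looks fields up by key.

```rust
/// Hypothetical, already-decoded CBOR items (a real decoder would supply these).
#[derive(Debug, Clone, PartialEq)]
enum Item {
    Uint(u64),
    Text(String),
}

/// The four fields of the example; the names a..d are made up.
#[derive(Debug, PartialEq)]
struct Fields {
    a: Option<u64>,
    b: u64,
    c: Option<String>,
    d: String,
}

fn as_uint(item: &Item) -> Option<u64> {
    match item {
        Item::Uint(n) => Some(*n),
        _ => None,
    }
}

fn as_text(item: &Item) -> Option<String> {
    match item {
        Item::Text(s) => Some(s.clone()),
        _ => None,
    }
}

/// One guess about which optionals are present in [?uint, uint, ?text, text].
fn try_layout(items: &[Item], has_a: bool, has_c: bool) -> Option<Fields> {
    if items.len() != 2 + usize::from(has_a) + usize::from(has_c) {
        return None;
    }
    let mut it = items.iter();
    let a = if has_a { Some(as_uint(it.next()?)?) } else { None };
    let b = as_uint(it.next()?)?;
    let c = if has_c { Some(as_text(it.next()?)?) } else { None };
    let d = as_text(it.next()?)?;
    Some(Fields { a, b, c, d })
}

/// Array form: position alone doesn't say whether a leading uint is `a` or `b`,
/// so every combination of optional fields is tried until one fits.
fn from_array(items: &[Item]) -> Option<Fields> {
    [(true, true), (true, false), (false, true), (false, false)]
        .iter()
        .find_map(|&(has_a, has_c)| try_layout(items, has_a, has_c))
}

/// Map form { ? 0: uint, 1: uint, ? 2: text, 3: text }: fields are found by key.
/// (A present optional field with the wrong type is treated as absent here;
/// a strict decoder would reject it instead.)
fn from_map(entries: &[(u64, Item)]) -> Option<Fields> {
    let get = move |key: u64| entries.iter().find(|(k, _)| *k == key).map(|(_, v)| v);
    Some(Fields {
        a: get(0).and_then(as_uint),
        b: as_uint(get(1)?)?,
        c: get(2).and_then(as_text),
        d: as_text(get(3)?)?,
    })
}
```

For a three-element array the array decoder has to inspect element types to tell [uint, uint, text] from [uint, text, text]; the map decoder never has to guess.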

For example, to give key 1 a meaningful name we could comment it as:

foo = {
 ? 0 : bar,
 ; meaningful-name
 ? 1: uint,
 ? 2: baz
}

or

foo = {
 ? 0 : bar,
 ? 1 : uint, ; meaningful-name
 ? 2 : baz
}

but AST-wise those comments end up in different places (between keys 0 and
1 in the first example, between keys 1 and 2 in the second), so without
taking whitespace into account it's ambiguous which key the comment is
describing. Even with whitespace it's a little subjective if you formatted
the CDDL in other ways.
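One possible whitespace-aware rule, sketched in Rust purely as an illustration (it works on raw source lines rather than on the AST from Andrew's cddl crate, and attach_comments is a made-up name): a trailing ; comment belongs to the entry on its own line, and a comment on a line by itself belongs to the next entry. Under that rule both examples above give key 1 the name meaningful-name, but the result depends entirely on line layout, which is the subjectivity described above.

```rust
/// Hypothetical heuristic: pair each map entry's key with a nearby `;` comment.
/// - a trailing comment belongs to the entry on the same line;
/// - a comment on its own line belongs to the next entry.
/// Very rough parsing: only `? key : type` style entries are recognized, and a
/// `;` inside a text key would be misread as the start of a comment.
fn attach_comments(body: &str) -> Vec<(String, Option<String>)> {
    let mut entries = Vec::new();
    let mut pending: Option<String> = None;
    for line in body.lines() {
        let (code, comment) = match line.split_once(';') {
            Some((code, comment)) => (code.trim(), Some(comment.trim().to_string())),
            None => (line.trim(), None),
        };
        if code.is_empty() {
            // Comment-only line: hold it for the next entry.
            if comment.is_some() {
                pending = comment;
            }
            continue;
        }
        // `? 1 : uint,` -> key `1`; lines without `:` (e.g. `foo = {`) are skipped.
        if let Some((key, _)) = code.split_once(':') {
            let key = key.trim().trim_start_matches('?').trim().to_string();
            // A same-line comment wins; a standalone comment is consumed either way.
            let standalone = pending.take();
            entries.push((key, comment.or(standalone)));
        }
    }
    entries
}
```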

Even if we don't use the comment for the field name, it would be very nice
to be able to inject the comment into the resulting Rust code.

Our cddl-codegen tool uses the field/key name as the Rust field name if one
exists, and the type otherwise, but in this situation it's not very helpful
to have a bunch of fields named key_0, key_1, key_2, etc.
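As a sketch of what that could look like on the Rust side, here is a hypothetical naming rule (the Key type and the to_ident, field_name and field_decl functions are invented for illustration, not cddl-codegen's actual code): text keys become snake_case field names, uint keys use a comment-derived name when one is available and fall back to key_N otherwise, and any comment is also carried over as a /// doc comment on the generated field.

```rust
/// A CDDL map key, simplified (hypothetical).
enum Key {
    Text(String),
    Uint(u64),
}

/// Turn `meaningful-name` into a Rust-style identifier `meaningful_name`.
fn to_ident(s: &str) -> String {
    s.trim().replace(|c: char| c == '-' || c == ' ', "_").to_lowercase()
}

/// Hypothetical rule: text key, else comment, else the `key_N` fallback.
fn field_name(key: &Key, comment: Option<&str>) -> String {
    match (key, comment) {
        (Key::Text(name), _) => to_ident(name),
        (Key::Uint(_), Some(c)) => to_ident(c),
        (Key::Uint(n), None) => format!("key_{}", n),
    }
}

/// Emit one struct field, carrying the CDDL comment over as a doc comment.
fn field_decl(key: &Key, comment: Option<&str>, rust_type: &str) -> String {
    let mut out = String::new();
    if let Some(c) = comment {
        out.push_str(&format!("/// {}\n", c.trim()));
    }
    out.push_str(&format!("pub {}: {},", field_name(key, comment), rust_type));
    out
}
```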

This also happens when you inline types within other types, like:

foo = [
    0: [uint, text],
    1: { * uint => text }
]
since we need to define new types for both of these due to wasm-bindgen
(Rust to Wasm auto-conversion) restrictions. However, in this case it's not
a big deal at all, since it would probably be better practice anyway to
define new types and refer to them from the outer type, and this wouldn't
change the CBOR representation of the CDDL. We did the same to get around
uninformative names for variants of group choices.
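To illustrate why pulling the inline types out is harmless, here is a small hand-rolled Rust sketch (no CBOR crate; UintTextPair and UintTextMap are hypothetical names for the two formerly inline types) that encodes foo after naming its inner types. Type names never appear in CBOR, and keys on array members are not encoded, so foo is written as a plain two-element array and the bytes are the same whatever the inner types are called.

```rust
use std::collections::BTreeMap;

/// Write a CBOR initial byte plus argument (RFC 8949 major types 0-5).
fn head(out: &mut Vec<u8>, major: u8, n: u64) {
    let mt = major << 5;
    match n {
        0..=23 => out.push(mt | n as u8),
        24..=0xff => {
            out.push(mt | 24);
            out.push(n as u8);
        }
        0x100..=0xffff => {
            out.push(mt | 25);
            out.extend_from_slice(&(n as u16).to_be_bytes());
        }
        0x1_0000..=0xffff_ffff => {
            out.push(mt | 26);
            out.extend_from_slice(&(n as u32).to_be_bytes());
        }
        _ => {
            out.push(mt | 27);
            out.extend_from_slice(&n.to_be_bytes());
        }
    }
}

fn write_text(out: &mut Vec<u8>, s: &str) {
    head(out, 3, s.len() as u64);
    out.extend_from_slice(s.as_bytes());
}

/// uint-text-pair = [uint, text]        (hypothetical name)
struct UintTextPair(u64, String);
/// uint-text-map = { * uint => text }   (hypothetical name)
struct UintTextMap(BTreeMap<u64, String>);
/// foo = [ 0: uint-text-pair, 1: uint-text-map ]
struct Foo {
    pair: UintTextPair,
    map: UintTextMap,
}

impl Foo {
    fn to_cbor(&self) -> Vec<u8> {
        let mut out = Vec::new();
        head(&mut out, 4, 2); // foo: array of 2 (member keys 0/1 are not encoded)
        head(&mut out, 4, 2); // [uint, text]
        head(&mut out, 0, self.pair.0);
        write_text(&mut out, &self.pair.1);
        head(&mut out, 5, self.map.0.len() as u64); // { * uint => text }
        for (k, v) in &self.map.0 {
            // BTreeMap iterates keys in ascending order.
            head(&mut out, 0, *k);
            write_text(&mut out, v);
        }
        out
    }
}
```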

Maybe there's another reasonable approach, though, for field names in cases
where restructuring without changing the underlying CBOR can't achieve it?

A cddl-to-cddl topological rule sorter would possibly be useful for others
to have. Right now, to get around the lack of one, our lib does multiple
passes over the CDDL to figure out whether some features are transitively
unsupported, as I didn't have time to implement a proper type dependency
system. The multi-pass approach gets the job done, but it's not the most
elegant solution. It does, however, preserve declaration order, which could
be desirable in the generated code. For sockets, I just didn't look into how
codegen would be impacted at all, since the CDDL files we were parsing didn't
use them in a necessary manner, so I didn't devote any time to it.
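For reference, the core of a rule sorter that keeps declaration order where it can might look like the following Rust sketch (topo_sort is a hypothetical function, and the rules are assumed to be already extracted from the parsed CDDL as (name, referenced names) pairs). It repeatedly emits the earliest-declared rule whose dependencies have all been emitted, so a rule only moves when it has to. Names that aren't rules (prelude types such as uint) and direct self-references are ignored; mutually recursive rules, which CDDL allows, are reported as an error rather than handled.

```rust
use std::collections::HashSet;

/// Order rules so each comes after the rules it references, keeping
/// declaration order wherever dependencies allow (a simple O(n^2) topological
/// sort). `rules` is (rule name, referenced names) in declaration order and is
/// assumed to have unique names (merge `/=` and `//=` extensions first).
/// Returns `Err` with the unsorted rule names if they form a cycle.
fn topo_sort<'a>(rules: &[(&'a str, Vec<&'a str>)]) -> Result<Vec<String>, Vec<String>> {
    let known: HashSet<&'a str> = rules.iter().map(|(name, _)| *name).collect();
    let mut emitted: HashSet<&'a str> = HashSet::new();
    let mut order = Vec::new();
    while order.len() < rules.len() {
        // Earliest-declared rule whose known dependencies are all emitted.
        let next = rules.iter().find(|(name, deps)| {
            !emitted.contains(name)
                && deps
                    .iter()
                    .all(|dep| dep == name || !known.contains(dep) || emitted.contains(dep))
        });
        match next {
            Some((name, _)) => {
                emitted.insert(*name);
                order.push(name.to_string());
            }
            None => {
                // Every remaining rule waits on another one: a cycle.
                return Err(rules
                    .iter()
                    .filter(|(name, _)| !emitted.contains(name))
                    .map(|(name, _)| name.to_string())
                    .collect());
            }
        }
    }
    Ok(order)
}
```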

A cddl-to-cddl generics resolver would remove the need for us to support
generics. Andrew's cddl parsing lib does have generic information that I
believe I could use to implement generics in our codegen tool, but I had
other priorities and haven't had time to try implementing it yet.
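In its simplest form, such a resolver could monomorphize each use of a generic rule by substituting the arguments for the parameters in the rule body. Here is a token-level Rust sketch of that substitution step (instantiate is a hypothetical helper that works on the body as text rather than on the cddl crate's AST, and it ignores string literals and comments):

```rust
use std::collections::HashMap;

/// Substitute generic arguments for parameters in a rule body, e.g. for
/// `pair<a, b> = [a, b]` used as `pair<uint, text>`. Works token by token, so
/// an identifier like `bar` is left alone when the parameter `a` is replaced.
fn instantiate(params: &[&str], args: &[&str], body: &str) -> String {
    assert_eq!(params.len(), args.len(), "wrong number of generic arguments");
    let subst: HashMap<&str, &str> = params.iter().copied().zip(args.iter().copied()).collect();
    // Characters that can appear in a CDDL identifier.
    let is_ident_char = |c: char| c.is_alphanumeric() || "_-$@.".contains(c);
    let mut out = String::new();
    let mut token = String::new();
    for c in body.chars() {
        if is_ident_char(c) {
            token.push(c);
        } else {
            // End of a token: replace it if it is a parameter name.
            out.push_str(subst.get(token.as_str()).copied().unwrap_or(&token));
            token.clear();
            out.push(c);
        }
    }
    out.push_str(subst.get(token.as_str()).copied().unwrap_or(&token));
    out
}
```

A full resolver would then emit each substituted body as a new, non-generic rule and rewrite the use sites to refer to it; how those new rules are named is a design choice left open here.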

Thanks,
Rob

On Fri, Sep 4, 2020 at 4:05 AM Carsten Bormann <cabo@tzi.org> wrote:

> Hi Sebastien,
>
> this is very encouraging news!
>
> Yes, I’m very curious.  We are seeing a lot of pickup of CBOR in the
> Fintech space, and it is nice to see that CDDL is being embraced, too.
>
> I see that you have open-sourced the code generator at
> https://github.com/Emurgo/cddl-codegen (with MIT License).
> I’m impressed to see that you achieved all that in ~2700 lines of code.
>
> > There are still rough edges to our codegen library because some features
> of CDDL don't work well with codegen (hopefully CDDL can improve this going
> forward!),
>
> We are certainly interested in hearing what CDDL features are troublesome
> to implement in a code generator.  The github repo lists a number of
> features that are not implemented, but doesn’t say whether you just didn’t
> need them or they are simply too much work.
>
> Some of these features could be addressed by CDDL-to-CDDL preprocessors.
> E.g., maybe one of the functions I could add to the cddlc tool is a
> Generics resolver, as well as a topological rule sorter (which might then
> also do the necessary work for sockets).
>
> Grüße, Carsten
>
>