Re: [Cbor] Eric Rescorla's Discuss on draft-ietf-cbor-cddl-06: (with DISCUSS and COMMENT)

Carsten Bormann <cabo@tzi.org> Tue, 20 November 2018 16:14 UTC

From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <154267838656.26631.4178048052675609107.idtracker@ietfa.amsl.com>
Date: Tue, 20 Nov 2018 17:14:22 +0100
Cc: The IESG <iesg@ietf.org>, cbor@ietf.org, Barry Leiba <barryleiba@computer.org>, cbor-chairs@ietf.org, draft-ietf-cbor-cddl@ietf.org
Message-Id: <D99E5141-4C3E-475F-8F25-3018BBC2B484@tzi.org>
References: <154267838656.26631.4178048052675609107.idtracker@ietfa.amsl.com>
To: Eric Rescorla <ekr@rtfm.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/rrY_IijGAL7WC7q73DGCxyV35Fg>
Subject: Re: [Cbor] Eric Rescorla's Discuss on draft-ietf-cbor-cddl-06: (with DISCUSS and COMMENT)

Hi Eric,

thank you for this detailed review.

Comments below.

Grüße, Carsten



> On Nov 20, 2018, at 02:46, Eric Rescorla <ekr@rtfm.com> wrote:
> 
> Eric Rescorla has entered the following ballot position for
> draft-ietf-cbor-cddl-06: Discuss
> 
> When responding, please keep the subject line intact and reply to all
> email addresses included in the To and CC lines. (Feel free to cut this
> introductory paragraph, however.)
> 
> 
> Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
> for more information about IESG DISCUSS and COMMENT positions.
> 
> 
> The document, along with other ballot positions, can be found here:
> https://datatracker.ietf.org/doc/draft-ietf-cbor-cddl/
> 
> 
> 
> ----------------------------------------------------------------------
> DISCUSS:
> ----------------------------------------------------------------------
> 
> Rich version of this review at:
> https://mozphab-ietf.devsvcdev.mozaws.net/D4234
> 
> 
> I am marking this document discussed because I have concerns about
> whether this document can be interoperably implemented. I have noted a
> number of points below.
> 
> DETAIL
> S 3.5.2.
>>     point.)
>> 
>>  3.5.2.  Tables
>> 
>>     A table can be specified by defining a map with entries where the
>>     keytype is not single-valued, e.g.:
> 
> this is the first use of the term single-valued, so I don't know how
> to interpret this.

A single-valued type is a type that consists of only a single value.
(In CDDL, constants give rise to single-valued types — there is no separate concept for carrying around values.)

I have made this more explicit now:

   before a colon).  A string also is a type (that contains a single
   value only — the given string), so another
   form for this example is:

(567e8ee in the current github version https://github.com/cbor-wg/cddl )

> More generally: it seems like:
> 
> ```
>  square-roots = {x => y}
>                             x = int
>                             y = float
> ```
> 
> Defines a map and yet
> 
> ```
>  square-roots = {x => y}
>                             y = float
> ```
> 
> Defines a struct. Is that correct? If so, does that mean that I don't
> know whether something is a map or a struct until I ahve parsed the
> whole definition?

Struct, table, etc. are "four loosely distinguishable styles of composition".
These usage styles carry no interoperability semantics.

Maps and arrays are the two kinds of containers offered by CBOR.  Maps are suitable for carrying tables and structs, arrays for vectors and records.  In the table/vector case, the map/array is indeed the high-level semantics implied by the style; for a struct/record, the map/array is just a convenient container.

CDDL could be defined without discussing these usage styles.
The text is meant to lead in to the way CDDL is composed of types and the construct of a _group_, which enables the description of structs and records (maps/arrays that are not homogeneous tables or vectors).

> S 3.5.3.
>>                        mynumber = int / float
>> 
>>  3.5.3.  Non-deterministic order
>> 
>>     While the way arrays are matched is fully determined by the Parsing
>>     Expression Grammar (PEG) algorithm, matching is more complicated for
> 
> PEG is an informative reference, and this text seems to create a
> normative dependency.

Many of our specifications make use of theory in one form or another without needing a normative dependency on a textbook.
RFC 5234 does not even have a reference for BNF.  Since PEGs have been a staple of parsing theory for less than 15 years, we thought a reference might be useful.

> S 3.6.
>>                            * tstr => any
>>                          }
>> 
>>  3.6.  Tags
>> 
>>     A type can make use of a CBOR tag (major type 6) by using the
> 
> What happens if I define a type twice? Is that permitted?

Adam Roach also brought this up — which led to 60c22df.

> S 3.7.
>>                              buuid = #6.37(bstr)
>> 
>>     In the following example, usage of the tag 32 for URIs is optional:
>> 
>>                          my_uri = #6.32(tstr) / tstr
>> 
> 
> I am basically unable to make sense of this section. Your previous
> example of tags used #7.25 but here you are specifying everything as
> using 6.

6 is CBOR’s major type for tags.  #7.25 is not a tag, but a (16-bit) floating point value.
The example shows how to construct tagged items by using the #6.xxx(ttt) syntax, where xxx is a tag value and ttt is the type of the data item enclosed by the tag.
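
To make the difference between #6.32 and #7.25 concrete, here is a minimal Python sketch (not part of the draft) that splits the head of an encoded CBOR data item into major type, additional information, and argument, following the encoding rules of RFC 7049.  The function name read_head and the sample byte strings are only illustrative assumptions.

```python
from typing import Tuple


def read_head(data: bytes) -> Tuple[int, int, int]:
    """Return (major type, additional information, argument) for the
    head of the first CBOR data item in `data`."""
    initial = data[0]
    major = initial >> 5           # upper 3 bits: major type 0..7
    info = initial & 0x1F          # lower 5 bits: additional information
    if info < 24:
        return major, info, info   # the argument is the 5-bit value itself
    if info <= 27:
        size = 1 << (info - 24)    # 1, 2, 4, or 8 bytes follow
        if len(data) < 1 + size:
            raise ValueError("truncated head")
        return major, info, int.from_bytes(data[1:1 + size], "big")
    raise ValueError("reserved or indefinite-length additional information")


# d8 20 61 61: tag 32 (major type 6) enclosing the text string "a"
print(read_head(bytes.fromhex("d8206161")))   # (6, 24, 32)
# f9 3c 00: major type 7, additional information 25, i.e. a 16-bit float;
# the argument holds the raw bits of the float (here 1.0)
print(read_head(bytes.fromhex("f93c00")))     # (7, 25, 15360)
```

The first item has major type 6 and is therefore a tag; the second has major type 7 and is a floating point value, not a tag.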

> It seems like the semantics here are something to the effect of:
> 
> X = #6.Y(Z)
> 
> means: act as if this were a thing of type Z but it's tagged by Y.
> 
> Is that correct? But then is this about the wire encoding or the
> interpretation or both? And in either case, what if what appears on
> the wire has a different tag.

Tags are visible at the CBOR data model level.
A tag with a different value than Y would not match #6.Y(Z), independent of whether the enclosed data item matches Z.
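
As a rough illustration of this matching rule, the following Python sketch models data items at the data model level and checks them against my_uri = #6.32(tstr) / tstr.  The Tagged class and the helper names (tagged, tstr, choice) are assumptions made up for this sketch; they are not taken from the CDDL tool.

```python
from dataclasses import dataclass
from typing import Any, Callable

Matcher = Callable[[Any], bool]


@dataclass
class Tagged:
    """A tagged data item as seen at the CBOR data model level."""
    tag: int
    value: Any


def tstr(item: Any) -> bool:
    """Matches text strings."""
    return isinstance(item, str)


def tagged(tag_number: int, inner: Matcher) -> Matcher:
    """Models #6.Y(Z): the tag number must be Y AND the content must match Z."""
    def match(item: Any) -> bool:
        return (isinstance(item, Tagged)
                and item.tag == tag_number
                and inner(item.value))
    return match


def choice(*alternatives: Matcher) -> Matcher:
    """Models a type choice (a / b / ...)."""
    return lambda item: any(alt(item) for alt in alternatives)


my_uri = choice(tagged(32, tstr), tstr)

print(my_uri(Tagged(32, "http://example.com")))  # True
print(my_uri("http://example.com"))              # True
print(my_uri(Tagged(33, "http://example.com")))  # False
```

The last item carries a text string, but because its tag number is 33 rather than 32 it matches neither alternative.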

> S 3.8.2.
>>                          cwr: 15,
>>                          ns: 0,
>>                        ) / (4..7) ; data offset bits
>> 
>>                        rwxbits = uint .bits rwx
>>                        rwx = &(r: 2, w: 1, x: 0)
> 
> What is the scope of the definition for r, w, and x? is it global.

There is no scope — everything is global.
(The only scopes are for the formal parameters of generics.)

> S 3.9.
>>              $$tcp-option //= (
>>              sack-permitted: true
>>              )
>> 
>>     Names that start with a single "$" are "type sockets", names with a
>>     double "$$" are "group sockets".  It is not an error if there is no
> 
> what is the difference between these two?

Type sockets take additional types via /=.
Group sockets take alternative groups via //=.
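
A loose Python model of the two kinds of sockets, under the assumption that a type alternative is a predicate on one data item and a group alternative is a predicate on a collection of map entries.  The class Sockets, its method names, and the "$colour" socket are invented for this sketch.

```python
class Sockets:
    """Keeps the alternatives plugged into type and group sockets."""

    def __init__(self):
        self.type_sockets = {}
        self.group_sockets = {}

    def add_type(self, name, alternative):
        """Models `$name /= alternative`."""
        self.type_sockets.setdefault(name, []).append(alternative)

    def add_group(self, name, alternative):
        """Models `$$name //= ( ... )`."""
        self.group_sockets.setdefault(name, []).append(alternative)

    def type_matches(self, name, item):
        # A socket nothing was plugged into is an empty choice: no match.
        return any(alt(item) for alt in self.type_sockets.get(name, []))

    def group_matches(self, name, entries):
        return any(alt(entries) for alt in self.group_sockets.get(name, []))


spec = Sockets()
# $$tcp-option //= ( sack-permitted: true )
spec.add_group("$$tcp-option", lambda e: e == {"sack-permitted": True})
# $colour /= "red"   (hypothetical type socket)
spec.add_type("$colour", lambda item: item == "red")

print(spec.group_matches("$$tcp-option", {"sack-permitted": True}))  # True
print(spec.type_matches("$colour", "red"))                           # True
print(spec.type_matches("$unused", "red"))                           # False
```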

> S 7.3.
>>     order of the rules given.  (It is not an error to extend a rule name
>>     that has not yet been defined; this makes the right hand side the
>>     first entry in the choice being created.)
>> 
>>     genericparm = "<" S id S *("," S id S ) ">"
>>     genericarg = "<" S type1 S *("," S type1 S ) ">"
> 
> What is the meaning of  <a/b>

I don’t know — currently it is not allowed to have type choices in a generic argument.
(Maybe I’m missing some context.)

> ----------------------------------------------------------------------
> COMMENT:
> ----------------------------------------------------------------------
> 
> S 1.2.
>>     capitals, as shown here.
>> 
>>  1.2.  Terminology
>> 
>>     New terms are introduced in _cursive_.  CDDL text in the running text
>>     is in "typewriter".
> 
> I think you mean that these types of text will be indicated by being
> bracketed by _ or ", but this isn't clear from this text. It's
> especially not clear when you then quote byte and octet.

XML2RFC currently uses quotes for indicating typewriter text.
The text also uses quotes for the usual purposes.
The intro could be improved, but the gamut in XML2RFC is right now limited.

> S 2.
>>     There are a number of more or less atomic elements of a CBOR data
>>     model, such as numbers, simple values (false, true, nil), text and
>>     byte strings; CDDL does not focus on specifying their structure.
>>     CDDL of course also allows adding a CBOR tag to a data item.
>> 
>>     The more important components of a data structure definition language
> 
> More important than what?

Than the atomic elements (because these are mostly trivial to specify).

> S 2.1.
>>                               }
>> 
>>                   Figure 1: Using a group directly in a map
>> 
>>     The three entries of the group are written between the curly braces
>>     that create the map: Here, "age", "name", and "employer" are the
> 
> This is kind of a confusing way to introduce this. Its seems like the
> relevant point is that:
> 
> () makes a group
> {()} or {} makes a map.
> 
> No?

We have tried a number of ways to say this.  Maybe this very short form can work better as an introduction.

> S 2.1.
>>                            }
>> 
>>                            dog = {
>>                              identity,
>>                              leash-length: float,
>>                            }
> 
> So to be clear, this is a mixin, as in Go structs?

Yes.

> S 2.1.2.
>>     only one big protocol data unit that has all definitions ad hoc where
>>     needed.
>> 
>>  2.1.2.  Syntax
>> 
>>     The composition syntax intends to be concise and easy to read:
> 
> Syntax does not intend. perhasp you mean it is intended to be.

Yes.

Now 09f6b55

> 
> 
> S 2.2.3.
>>     tool when displaying integers that are taken from that choice).
>> 
>>  2.2.3.  Representation Types
>> 
>>     CDDL allows the specification of a data item type by referring to the
>>     CBOR representation (major and minor numbers).  How this is used
> 
> RFC 7049 does not refer to “minor" numbers.

(Already fixed in -06, 05b34be.)

> S 2.2.3.
>>     in the prelude:
>> 
>>        my_breakfast = #6.55799(breakfast)   ; cbor-any is too general!
>>        breakfast = cereal / porridge
>>        cereal = #6.998(tstr)
>>        porridge = #6.999([liquid, solid])
> 
> What is the parenthetical syntax here?

This is the syntax for specifying tags and the type for their enclosed value.

> S 3.1.
>>        according to the respective syntactic rules of that definition.
>> 
>>     o  A name can consist of any of the characters from the set {'A',
>>        ..., 'Z', 'a', ..., 'z', '0', ..., '9', '_', '-', '@', '.', '$'},
>>        starting with an alphabetic character (including '@', '_', '$')
>>        and ending in one or a digit.
> 
> This seems to say that names end in a digit, but none of your name
> examples do.

Clarified in 9a611bb

> S 3.2.
>>     An optional _occurrence_ indicator can be given in front of a group
>>     entry.  It is either one of the characters '?' (optional), '*' (zero
>>     or more), or '+' (one or more), or is of the form n*m, where n and m
>>     are optional unsigned integers and n is the lower limit (default 0)
>>     and m is the upper limit (default no limit) of occurrences.
>> 
> 
> isn’t bare * then just a degenerate case of n*m

Yes.
It is still useful to also show * besides ? and +.
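
The relationship between the four forms can be sketched in a few lines of Python.  The function name occurrence_bounds is an assumption, None stands for "no limit", and only decimal numbers are handled.

```python
import re
from typing import Optional, Tuple


def occurrence_bounds(indicator: str) -> Tuple[int, Optional[int]]:
    """Map an occurrence indicator to (lower limit, upper limit)."""
    if indicator == "?":
        return 0, 1
    if indicator == "+":
        return 1, None
    m = re.fullmatch(r"([0-9]*)\*([0-9]*)", indicator)
    if m is None:
        raise ValueError("not an occurrence indicator: %r" % indicator)
    lower = int(m.group(1)) if m.group(1) else 0     # default 0
    upper = int(m.group(2)) if m.group(2) else None  # default: no limit
    return lower, upper


for text in ("?", "*", "+", "2*5", "3*", "*4"):
    print(text, occurrence_bounds(text))
```

A bare "*" falls out of the n*m branch with both defaults applied, giving (0, None).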

> S 3.5.1.
>>     written with quoted strings in the member key positions.  More
>>     generally, all the types defined can be used in a keytype position by
>>     following them with a double arrow -- in particular, the double arrow
>>     is necessary if a type is named by an identifier (which would be
>>     interpreted as a string before a colon).  A string also is a (single-
>>     valued) type, so another form for this example is:
> 
> I'm sorry, I'm not following this. Can you give an example of when you
> have to use =>

You have to use “=>” whenever the key is a name that refers to a type.  Before “:”, a name becomes a bare word (a string without the “…” decoration).
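
A small Python sketch of the difference, assuming a rule "x = int" exists.  The RULES table and the function key_matcher are hypothetical names used only here.

```python
# Hypothetical rule table standing in for the CDDL rule "x = int".
RULES = {
    "x": lambda item: isinstance(item, int) and not isinstance(item, bool),
}


def key_matcher(name: str, separator: str):
    """Return a predicate for a member key written as `name` followed
    by ":" or "=>"."""
    if separator == ":":
        # Before a colon, a name is a bare word: the text string itself.
        return lambda key: key == name
    if separator == "=>":
        # Before a double arrow, a name refers to the rule of that name.
        return RULES[name]
    raise ValueError("separator must be ':' or '=>'")


colon_key = key_matcher("x", ":")    # as in  {x: y}
arrow_key = key_matcher("x", "=>")   # as in  {x => y}

print(colon_key("x"), colon_key(4))  # True False
print(arrow_key("x"), arrow_key(4))  # False True
```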

> S 3.10.
>>        message<t, v> = {type: t, value: v}
>> 
>>     When using a generic rule, the formal parameters are bound to the
>>     actual arguments supplied (also using angle brackets), within the
>>     scope of the generic rule (as if there were a rule of the form
>>     parameter = argument).
> 
> This looks more like a macro than a generic.

It is not just lexical replacement, which is what “macro” usually implies.
(Strictly speaking, it is comparable to a “macro” in the Scheme tradition, but the way this concept is used in CDDL is now better known as “generic type”.)
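
One way to picture this, as a sketch only: a generic rule behaves like a function whose formal parameters are bound to types, not like text pasted into the rule body.  The Python below models message<t, v> = {type: t, value: v}; the matcher helpers are assumptions of the sketch.

```python
def is_tstr(item):
    return isinstance(item, str)


def is_int(item):
    return isinstance(item, int) and not isinstance(item, bool)


def message(t, v):
    """Models message<t, v> = {type: t, value: v}.

    t and v are bound to the actual arguments when the generic rule is
    used, as if there were rules "t = ..." and "v = ..." in its scope.
    """
    def match(item):
        return (isinstance(item, dict)
                and set(item) == {"type", "value"}
                and t(item["type"])
                and v(item["value"]))
    return match


reboot = message(is_tstr, is_int)   # message<tstr, int>

print(reboot({"type": "reboot", "value": 3}))  # True
print(reboot({"type": 1, "value": 3}))         # False
```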

> S 7.3.
>>     definition with different member names); RFC 7071 could be read to
>>     forbid the repetition of ext-value ("A specific reputon-element MUST
>>     NOT appear more than once" is ambiguous.)
>> 
>>     The CDDL tool (which hasn't quite been trained for polite
>>     conversation) says:
> 
> It seems like it might be a good idea to clean up the examples here so
> they are in fact polite.

We can do that if it is considered desirable.

> S 7.3.
>>        "Thumbnail": {"Width": 1111, "Height": 176, "Url": 32("scrog")},
>>        "IDs": []}}
>> 
>>  Appendix B.  ABNF grammar
>> 
>>     The following is a formal definition of the CDDL syntax in Augmented
> 
> It's pretty odd to have the formal specification -- which clearly is
> needed to understand and implement this document — in an appendix.

In -06, we have prefixed the normative appendices with

   This appendix is normative.

(So many documents reference specific appendices of draft-ietf-cbor-cddl that we tried to minimize confusion by moving them around as little as possible.)


> S 7.3.
>>        CRLF = %x0A / %x0D.0A
>> 
>>                             Figure 14: CDDL ABNF
>> 
>>     Note that this ABNF does not attempt to reflect the detailed rules of
>>     what can be in a prefixed byte string.
> 
> I am trusting others to have read the ABNF.
> 
> 
> S 7.3.
>>     for a group expression (production "grpent"), with the intention that
>>     the semantics does not change when the name is replaced by its
>>     (parenthesized if needed) definition.  Note that whether the name
>>     defined by a rule stands for a type or a group isn't always
>>     determined by syntax alone: e.g., "a = b" can make "a" a type if "b"
>>     is one, or a group if "b" is one.  More subtly, in "a = (b)", "a" may
> 
> It would be clearer to say ‘if "b" is a type' and 'if "b" is a group'

Now 9afb8cb.

> S 7.3.
>>     type = type1 S *("/" S type1 S)
>> 
>>     A type can be given as a choice between one or more types.  The
>>     choice matches a data item if the data item matches any one of the
>>     types given in the choice.  The choice uses Parsing Expression
>>     Grammar [PEG] semantics: The first choice that matches wins.  (As a
> 
> Is this the only information I need?

I think that is the essence; it certainly helps to be familiar with PEG theory.
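
For readers less familiar with PEGs, the "first choice that matches wins" rule can be sketched in Python as follows.  The function first_match and the predicates are assumptions for illustration, not how the CDDL tool is implemented.

```python
def is_int(item):
    return isinstance(item, int) and not isinstance(item, bool)


def is_float(item):
    return isinstance(item, float)


def is_number(item):
    return is_int(item) or is_float(item)


def first_match(alternatives, item):
    """Try the alternatives in the order given; return the name of the
    first one that matches, or None if none does."""
    for name, predicate in alternatives:
        if predicate(item):
            return name
    return None


# mynumber = int / float
mynumber = [("int", is_int), ("float", is_float)]
print(first_match(mynumber, 3))      # int
print(first_match(mynumber, 2.5))    # float
print(first_match(mynumber, "x"))    # None

# With overlapping alternatives, the order decides which one is used.
overlapping = [("number", is_number), ("int", is_int)]
print(first_match(overlapping, 3))   # number
```

Whether a single data item matches the choice at all does not depend on the order; the order only decides which alternative is taken, which starts to matter once choices are nested inside arrays and maps.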

> S 7.3.
>>     control operators.
>> 
>>     group = grpchoice S *("//" S grpchoice S)
>> 
>>     A group matches any sequence of key/value pairs that matches any of
>>     the choices given (again using Parsing Expression Grammar semantics).
> 
> This also seems not fully defined.

Similar.  We did add Section 3.5.3 to provide more details for the more complicated map matching case.

.oOo.