Re: [Cbor] Eric Rescorla's Discuss on draft-ietf-cbor-cddl-06: (with DISCUSS and COMMENT)

Carsten Bormann <cabo@tzi.org> Wed, 21 November 2018 13:10 UTC

From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <CABcZeBMb8nV-Bm1R3n9Dk-TyWZjwcOe7dxhAFfHaQRyPcBAROQ@mail.gmail.com>
Date: Wed, 21 Nov 2018 14:10:24 +0100
Cc: IESG <iesg@ietf.org>, cbor@ietf.org, Barry Leiba <barryleiba@computer.org>, cbor-chairs@ietf.org, draft-ietf-cbor-cddl@ietf.org
X-Mao-Original-Outgoing-Id: 564498620.182718-bd85c8e7d25904cbd6a2d7057ac33b2e
Content-Transfer-Encoding: quoted-printable
Message-Id: <6882FA6E-B656-4539-AB4D-CF9272B7A269@tzi.org>
References: <154267838656.26631.4178048052675609107.idtracker@ietfa.amsl.com> <D99E5141-4C3E-475F-8F25-3018BBC2B484@tzi.org> <CABcZeBMb8nV-Bm1R3n9Dk-TyWZjwcOe7dxhAFfHaQRyPcBAROQ@mail.gmail.com>
To: Eric Rescorla <ekr@rtfm.com>
X-Mailer: Apple Mail (2.3445.9.1)
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/8nj4r-9YKKHeilGdlxxggajXwuo>
Subject: Re: [Cbor] Eric Rescorla's Discuss on draft-ietf-cbor-cddl-06: (with DISCUSS and COMMENT)

Hi Eric,

you probably won’t have time to read this before the telechat, but I would like to keep the ball rolling.

(For some reason, my mail reader doesn’t pick up the quoting correctly in this reply; I hope it is still readable enough.)

Grüße, Carsten


> On Nov 20, 2018, at 20:15, Eric Rescorla <ekr@rtfm.com> wrote:
> 
> 
> 
> On Tue, Nov 20, 2018 at 8:14 AM Carsten Bormann <cabo@tzi.org> wrote:
> Hi Eric,
> 
> thank you for this detailed review.
> 
> Comments below.
> 
> Grüße, Carsten
> 
> 
> 
> > On Nov 20, 2018, at 02:46, Eric Rescorla <ekr@rtfm.com> wrote:
> > 
> > Eric Rescorla has entered the following ballot position for
> > draft-ietf-cbor-cddl-06: Discuss
> > 
> > When responding, please keep the subject line intact and reply to all
> > email addresses included in the To and CC lines. (Feel free to cut this
> > introductory paragraph, however.)
> > 
> > 
> > Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
> > for more information about IESG DISCUSS and COMMENT positions.
> > 
> > 
> > The document, along with other ballot positions, can be found here:
> > https://datatracker.ietf.org/doc/draft-ietf-cbor-cddl/
> > 
> > 
> > 
> > ----------------------------------------------------------------------
> > DISCUSS:
> > ----------------------------------------------------------------------
> > 
> > Rich version of this review at:
> > https://mozphab-ietf.devsvcdev.mozaws.net/D4234
> > 
> > 
> > I am marking this document discussed because I have concerns about
> > whether this document can be interoperably implemented. I have noted a
> > number of points below.
> > 
> > DETAIL
> > S 3.5.2.
> >>     point.)
> >> 
> >>  3.5.2.  Tables
> >> 
> >>     A table can be specified by defining a map with entries where the
> >>     keytype is not single-valued, e.g.:
> > 
> > this is the first use of the term single-valued, so I don't know how
> > to interpret this.
> 
> A single-valued type is a type that consists of only a single value.
> (In CDDL, constants give rise to single-valued types — there is no separate concept for carrying around values.)
> 
> I have made this more explicit now:
> 
>    before a colon).  A string also is a type (that contains a single
>    value only — the given string), so another
>    form for this example is:
> 
> If you want to say "single-valued type" later, you need to define it here.

I have now expanded the overly terse expression "single-valued" (and one instance of "multi-valued") in 9b06c30.

> (567e8ee in the current github version https://github.com/cbor-wg/cddl )
> 
> > More generally: it seems like:
> > 
> > ```
> >  square-roots = {x => y}
> >                             x = int
> >                             y = float
> > ```
> > 
> > Defines a map and yet
> > 
> > ```
> >  square-roots = {x => y}
> >                             y = float
> > ```
> > 
> > Defines a struct. Is that correct? If so, does that mean that I don't
> > know whether something is a map or a struct until I have parsed the
> > whole definition?
> 
> Struct, table, etc. are "four loosely
> distinguishable styles of composition".
> There are no interoperability semantics attached to these usage styles.
> 
> Maps and arrays are the two kinds of containers offered by CBOR.  Maps are suitable for carrying around tables and structs, arrays for vectors and records.  In the table/vector case, the map/array is indeed the high level semantics implied by the style; for struct/record map/array are just convenient containers.
> 
> I get that, but I don't think it's responsive here. In the former case {1: 1.0} is a valid input, whereas in the latter case, {"x": 1.0} is a valid input. And as far as I can tell, until I have seen the definition for |x| I can't tell which one. Is that correct?

I’m not sure I understand the question; maybe you mean

> square-roots = {x: y}
>                              y = float

… for the second case?
x is not an identifier then, but a bare word, which gets translated to

> square-roots = {"x" => y}
>                             y = float


Am I on the right track understanding your question?

> > S 3.5.3.
> >>                        mynumber = int / float
> >> 
> >>  3.5.3.  Non-deterministic order
> >> 
> >>     While the way arrays are matched is fully determined by the Parsing
> >>     Expression Grammar (PEG) algorithm, matching is more complicated for
> > 
> > PEG is an informative reference, and this text seems to create a
> > normative dependency.
> 
> Many of our specifications make use of theory in one form or another without needing a normative dependency on a textbook.
> RFC 5234 does not even have a reference for BNF.  Since PEGs have been a staple of parsing theory for less than 15 years, we thought a reference might be useful.
> 
> Then you need to rewrite this text, because here and below, it implies that I need to read PEG in order to understand this specification. 

That will be a bit more work than I can do today, so there will be a separate e-mail when that is done.
 
> > S 3.6.
> >>                            * tstr => any
> >>                          }
> >> 
> >>  3.6.  Tags
> >> 
> >>     A type can make use of a CBOR tag (major type 6) by using the
> > 
> > What happens if I define a type twice? Is that permitted?
> 
> Adam Roach also brought this up — which led to 60c22df.
> 
> I’m surprised that you allow any redefinition at all.

That turned out to be useful when we started to add often-used definitions to the prelude: old specs that already contained those definitions just continued to work (while specs that conflicted with the new prelude definitions got an error).

While there is no plan to further expand the prelude, a process of combining two specifications by naively concatenating them is still helped by allowing identical redefinitions.
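
As a rough illustration of that rule (a sketch only; `combine_rules` and the representation of rules as name-to-text dictionaries are assumptions made for this example, not how the CDDL tool is implemented), identical redefinitions can be accepted while conflicting ones are rejected:

```python
def combine_rules(*specs):
    """Merge rule tables (name -> definition text), tolerating identical repeats.

    Hypothetical helper: a real tool would compare parsed rules,
    not raw strings.
    """
    combined = {}
    for spec in specs:
        for name, definition in spec.items():
            if name in combined and combined[name] != definition:
                raise ValueError(f"conflicting redefinition of {name!r}")
            combined[name] = definition
    return combined


# An old spec that repeats a prelude definition verbatim still combines cleanly.
prelude = {"biguint": "#6.2(bstr)"}
old_spec = {"biguint": "#6.2(bstr)", "my_uri": "#6.32(tstr) / tstr"}
merged = combine_rules(prelude, old_spec)
```

Calling `combine_rules(prelude, {"biguint": "#6.3(bstr)"})` raises `ValueError` in this sketch, mirroring the error for conflicting definitions.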

> > S 3.7.
> >>                              buuid = #6.37(bstr)
> >> 
> >>     In the following example, usage of the tag 32 for URIs is optional:
> >> 
> >>                          my_uri = #6.32(tstr) / tstr
> >> 
> > 
> > I am basically unable to make sense of this section. Your previous
> > example of tags used #7.25 but here you are specifying everything as
> > using 6.
> 
> 6 is CBOR’s major type for tags.  #7.25 is not a tag, but a (16-bit) floating point value.
> 
> Yes, I understand that.
> 
> 
> The example shows how to construct tagged items by using the #6.xxx(ttt) syntax, where xxx is a tag value and ttt is the type of the data item enclosed by the tag.
> 
> > It seems like the semantics here are something to the effect of:
> > 
> > X = #6.Y(Z)
> > 
> > means: act as if this were a thing of type Z but it’s tagged by Y.

That is not how I would describe what tags are about.

E.g., using CBOR diagnostic notation, h'123456789a' is a byte string, but 2(h'123456789a') is the number 78187493530.
This no longer acts as if it were a byte string.  It is tagged with tag number 2 (which stands for unsigned bignum).

A type for tagged data items like this is described in the prelude by:

biguint = #6.2(bstr)
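
To make the interpretation concrete, here is a minimal sketch in Python (the function name `biguint_value` is made up for this example): the content of a tag 2 item is a byte string that is read as a big-endian unsigned integer.

```python
def biguint_value(content: bytes) -> int:
    """Interpret the byte string inside tag 2 as an unsigned bignum (big-endian)."""
    return int.from_bytes(content, "big")


value = biguint_value(bytes.fromhex("123456789a"))
print(value)  # 78187493530
```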

> > 
> > Is that correct? But then is this about the wire encoding or the
> > interpretation or both?

It is about the interpretation (data model), which of course influences the wire encoding (but only as much as that is dictated by the data model).

> And in either case, what if what appears on
> > the wire has a different tag.

That would then not match #6.Y(Z).
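
A small sketch of that matching rule, assuming a hypothetical `Tagged` class for tagged data items and a predicate standing in for the type Z (neither is taken from an actual CBOR library):

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tagged:
    """Hypothetical stand-in for a CBOR tagged data item."""
    tag: int
    content: Any


def matches_tagged(item: Any, tag: int, content_matches: Callable[[Any], bool]) -> bool:
    """Return True only if item carries exactly `tag` and its content matches Z."""
    return (
        isinstance(item, Tagged)
        and item.tag == tag
        and content_matches(item.content)
    )


def is_bstr(x: Any) -> bool:
    """Predicate standing in for the CDDL type bstr."""
    return isinstance(x, bytes)


# biguint = #6.2(bstr)
right_tag = matches_tagged(Tagged(2, b"\x12\x34"), 2, is_bstr)  # True
wrong_tag = matches_tagged(Tagged(3, b"\x12\x34"), 2, is_bstr)  # False: different tag
untagged = matches_tagged(b"\x12\x34", 2, is_bstr)              # False: no tag at all
```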

> Tags are visible at the CBOR data model level.
> A tag with a different value than Y would not match #6.Y(Z), independent of whether the enclosed data item matches Z.
> 
> I don't understand what this means. Can you please answer the specific questions I asked above?

I have now tried harder.
 
> > S 3.8.2.
> >>                          cwr: 15,
> >>                          ns: 0,
> >>                        ) / (4..7) ; data offset bits
> >> 
> >>                        rwxbits = uint .bits rwx
> >>                        rwx = &(r: 2, w: 1, x: 0)
> > 
> > What is the scope of the definition for r, w, and x? Is it global?
> 
> There is no scope — everything is global.
> (The only scopes are for the formal parameters of generics.)
> 
> So, r, w, and x are global?

The answer is a bit more complicated, as “r”, “w”, and “x” are constants (text strings), due to the way

  r: 2

…is a shorthand for

  "r" => 2
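
The shorthand can be sketched as a textual rewrite (illustrative only; `desugar_member` is a made-up helper, the name pattern is simplified, and a real implementation would work on the parsed grammar rather than with a regular expression):

```python
import re

# A bare word before a colon; the character set is a simplification
# of the name syntax and is an assumption of this sketch.
_BAREWORD = re.compile(r"^\s*([A-Za-z_@$][A-Za-z0-9_@$.\-]*)\s*:\s*(.+?)\s*$")


def desugar_member(entry: str) -> str:
    """Rewrite `bareword: type` into `"bareword" => type`.

    Entries in any other form (for example, ones already using =>)
    are returned unchanged; this sketch does not handle them.
    """
    m = _BAREWORD.match(entry)
    if m is None:
        return entry
    key, value_type = m.groups()
    return f'"{key}" => {value_type}'


print(desugar_member("r: 2"))    # "r" => 2
print(desugar_member("x => y"))  # x => y
```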

> > S 3.9.
> >>              $$tcp-option //= (
> >>              sack-permitted: true
> >>              )
> >> 
> >>     Names that start with a single "$" are "type sockets", names with a
> >>     double "$$" are "group sockets".  It is not an error if there is no
> > 
> > what is the difference between these two?
> 
> Type sockets take additional types via /=.
> Group sockets take alternative groups via //=.
> 
> Please add this to the text.

Now in b2b2b3b.

> > S 7.3.
> >>     order of the rules given.  (It is not an error to extend a rule name
> >>     that has not yet been defined; this makes the right hand side the
> >>     first entry in the choice being created.)
> >> 
> >>     genericparm = "<" S id S *("," S id S ) ">"
> >>     genericarg = "<" S type1 S *("," S type1 S ) ">"
> > 
> > What is the meaning of <a/b>?
> 
> I don’t know — currently it is not allowed to have type choices in a generic argument.
> (Maybe I’m missing some context.)
> 
> No, I misread the ABNF for type as type1.
> 
> 
> > ----------------------------------------------------------------------
> > COMMENT:
> > ----------------------------------------------------------------------
> > 
> > S 1.2.
> >>     capitals, as shown here.
> >> 
> >>  1.2.  Terminology
> >> 
> >>     New terms are introduced in _cursive_.  CDDL text in the running text
> >>     is in "typewriter".
> > 
> > I think you mean that these types of text will be indicated by being
> > bracketed by _ or ", but this isn't clear from this text. It's
> > especially not clear when you then quote byte and octet.
> 
> XML2RFC currently uses quotes for indicating typewriter text.
> 
> Well, this just turns into quoted text in the text version of the RFC, so it's
> not typewriter text. If you're going to use syntactic conventions, you need
> to more clearly state them for the text version.

Now in 98c2bd3

> > S 2.
> >>     There are a number of more or less atomic elements of a CBOR data
> >>     model, such as numbers, simple values (false, true, nil), text and
> >>     byte strings; CDDL does not focus on specifying their structure.
> >>     CDDL of course also allows adding a CBOR tag to a data item.
> >> 
> >>     The more important components of a data structure definition language
> > 
> > More important than what?
> 
> Than the atomic elements (because these are mostly trivial to specify).
> 
> Please add this to the text. 

Well, maybe it’s not that important that they are more important (which is subjective, anyway).

Now in 0289e69

> > S 2.2.3.
> >>     in the prelude:
> >> 
> >>        my_breakfast = #6.55799(breakfast)   ; cbor-any is too general!
> >>        breakfast = cereal / porridge
> >>        cereal = #6.998(tstr)
> >>        porridge = #6.999([liquid, solid])
> > 
> > What is the parenthetical syntax here?
> 
> This is the syntax for specifying tags and the type for their enclosed value.
> 
> I don't understand that. Please expand.

Maybe I don’t understand what part you mean by parenthetical syntax?

The part in parentheses, i.e. the Y in #6.X(Y), is the data type of the data item to be tagged.
In these examples, the data types are
breakfast
tstr
[liquid, solid]  (an array with exactly two members, the first of type liquid, the second of type solid).

> > S 3.1.
> >>        according to the respective syntactic rules of that definition.
> >> 
> >>     o  A name can consist of any of the characters from the set {'A',
> >>        ..., 'Z', 'a', ..., 'z', '0', ..., '9', '_', '-', '@', '.', '$'},
> >>        starting with an alphabetic character (including '@', '_', '$')
> >>        and ending in one or a digit.
> > 
> > This seems to say that names end in a digit, but none of your name
> > examples do.
> 
> Clarified in 9a611bb
> 
> > S 3.2.
> >>     An optional _occurrence_ indicator can be given in front of a group
> >>     entry.  It is either one of the characters '?' (optional), '*' (zero
> >>     or more), or '+' (one or more), or is of the form n*m, where n and m
> >>     are optional unsigned integers and n is the lower limit (default 0)
> >>     and m is the upper limit (default no limit) of occurrences.
> >> 
> > 
> > Isn’t bare * then just a degenerate case of n*m?
> 
> Yes.
> It is still useful to also show * besides ? and +.
> 
> Sure. I just meant that you might just explain n*m rather than *

Right, the bare * is redundant, but we would still like it to be in this list.
Should we make the fact that * is just n*m with both limits omitted more explicit?
(We were assuming readers would recognize this from ABNF.)
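
For illustration, a sketch that maps each occurrence indicator to a (lower, upper) pair, with `None` standing for "no limit"; `parse_occurrence` is a hypothetical helper, not part of any CDDL tool:

```python
from typing import Optional, Tuple


def parse_occurrence(indicator: str) -> Tuple[int, Optional[int]]:
    """Return (lower, upper) for '?', '+', or the general n*m form."""
    if indicator == "?":
        return (0, 1)
    if indicator == "+":
        return (1, None)
    lower, star, upper = indicator.partition("*")
    if star != "*":
        raise ValueError(f"not an occurrence indicator: {indicator!r}")
    # Omitted n defaults to 0; omitted m means no upper limit.
    return (int(lower) if lower else 0, int(upper) if upper else None)


print(parse_occurrence("*"))    # (0, None): the bare star is n*m with both defaults
print(parse_occurrence("2*5"))  # (2, 5)
print(parse_occurrence("1*"))   # (1, None), the same limits as '+'
```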


> > S 3.5.1.
> >>     written with quoted strings in the member key positions.  More
> >>     generally, all the types defined can be used in a keytype position by
> >>     following them with a double arrow -- in particular, the double arrow
> >>     is necessary if a type is named by an identifier (which would be
> >>     interpreted as a string before a colon).  A string also is a (single-
> >>     valued) type, so another form for this example is:
> > 
> > I'm sorry, I'm not following this. Can you give an example of when you
> > have to use =>
> 
> If you need to use a name for a type.  With “:”, names become bare words (strings without the “…” decoration).
> 
> OK, this isn’t clear, so maybe you can rewrite it.

Now in 3ebf61d
> 
> 
> > S 3.10.
> >>        message<t, v> = {type: t, value: v}
> >> 
> >>     When using a generic rule, the formal parameters are bound to the
> >>     actual arguments supplied (also using angle brackets), within the
> >>     scope of the generic rule (as if there were a rule of the form
> >>     parameter = argument).
> > 
> > This looks more like a macro than a generic.
> 
> It is not just lexical replacement, which is what “macro” usually implies.
> 
> In what way is it not lexical replacement?

The arguments need to be well-formed, parseable as a type1, and they are handed in as such to the rule.

E.g., the following does not work the way lexical substitution would:

  foo<parm> = { alice: parm }

  bar = foo< int
             bob: float >

It does not give you

  bar = { alice: int
          bob: float }

which could also be written

  bar = { alice: int,  bob: float }

> (Strictly speaking, it is comparable to a “macro” in the Scheme tradition, but the way this concept is used in CDDL is now better known as “generic type”.)
> 
> > S 7.3.
> >>     definition with different member names); RFC 7071 could be read to
> >>     forbid the repetition of ext-value ("A specific reputon-element MUST
> >>     NOT appear more than once" is ambiguous.)
> >> 
> >>     The CDDL tool (which hasn't quite been trained for polite
> >>     conversation) says:
> > 
> > It seems like it might be a good idea to clean up the examples here so
> > they are in fact polite.
> 
> We can do that if it is considered desirable.
> 
> Near-mandatory, I would think.

I gave the tool another chance.
(It still has a much larger vocabulary than I have, but the new result sounds trigger-free to me.)
Now in f107464

> 
> 
> > S 7.3.
> >>        "Thumbnail": {"Width": 1111, "Height": 176, "Url": 32("scrog")},
> >>        "IDs": []}}
> >> 
> >>  Appendix B.  ABNF grammar
> >> 
> >>     The following is a formal definition of the CDDL syntax in Augmented
> > 
> > It's pretty odd to have the formal specification -- which clearly is
> > needed to understand and implement this document — in an appendix.
> 
> In -06, we have prefixed the normative appendices with
> 
>    This appendix is normative.
> 
> (So many documents reference specific appendices of draft-ietf-cbor-cddl that we tried to minimize confusion by minimizing moving them around.)
> 
> 
> 
> > S 7.3.
> >>     type = type1 S *("/" S type1 S)
> >> 
> >>     A type can be given as a choice between one or more types.  The
> >>     choice matches a data item if the data item matches any one of the
> >>     types given in the choice.  The choice uses Parsing Expression
> >>     Grammar [PEG] semantics: The first choice that matches wins.  (As a
> > 
> > Is this the only information I need?
> 
> I think that is the essence; it certainly helps to be familiar with PEG theory.
> 
> Then you should probably rewrite this text to make that clear.

I’ll put that into the PEG rewrite bin, for the next round.
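
As a sketch of the "first choice that matches wins" rule for type choices (the names and the representation of types as Python predicates are assumptions made for this example):

```python
from typing import Any, Callable, Optional, Sequence, Tuple

Alternative = Tuple[str, Callable[[Any], bool]]


def first_match(item: Any, alternatives: Sequence[Alternative]) -> Optional[str]:
    """Try alternatives in the order given; return the name of the first that matches."""
    for name, matches in alternatives:
        if matches(item):
            return name
    return None


# mynumber = int / float
mynumber = [
    ("int", lambda x: isinstance(x, int) and not isinstance(x, bool)),
    ("float", lambda x: isinstance(x, float)),
]

print(first_match(1, mynumber))      # int
print(first_match(1.5, mynumber))    # float
print(first_match("one", mynumber))  # None
```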

> 
> -Ekr
> 
> 
> > S 7.3.
> >>     control operators.
> >> 
> >>     group = grpchoice S *("//" S grpchoice S)
> >> 
> >>     A group matches any sequence of key/value pairs that matches any of
> >>     the choices given (again using Parsing Expression Grammar semantics).
> > 
> > This also seems not fully defined.
> 
> Similar.  We did add Section 3.5.3 to provide more details for the more complicated map matching case.
> 
> .oOo.