Re: [Cbor] Benjamin Kaduk's No Objection on draft-ietf-cbor-cddl-05: (with COMMENT)

Carsten Bormann <cabo@tzi.org> Sun, 24 March 2019 13:16 UTC

Content-Type: text/plain; charset="utf-8"
Mime-Version: 1.0 (Mac OS X Mail 11.5 \(3445.9.1\))
From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <154258532777.2578.8591857608340815014.idtracker@ietfa.amsl.com>
Date: Sun, 24 Mar 2019 14:16:03 +0100
Cc: The IESG <iesg@ietf.org>, cbor@ietf.org, barryleiba@computer.org, cbor-chairs@ietf.org, draft-ietf-cbor-cddl@ietf.org
Content-Transfer-Encoding: quoted-printable
Message-Id: <FC14236A-FF4B-422A-AFF3-0BA0ED40CB25@tzi.org>
References: <154258532777.2578.8591857608340815014.idtracker@ietfa.amsl.com>
To: Benjamin Kaduk <kaduk@MIT.EDU>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/ZCKAKbENcet4E6CJaQ50m-52b8g>
Subject: Re: [Cbor] Benjamin Kaduk's No Objection on draft-ietf-cbor-cddl-05: (with COMMENT)
Precedence: list

Hi Benjamin,

apologies for sitting on this for so long; it was masked for me by a thread on one of the sub-issues.  I’m preparing a -08 right now to capture some of the finer points raised; this lives in https://github.com/cbor-wg/cddl/pull/28 and in https://cbor-wg.github.io/cddl/prepare-08/ and the plan is to submit it as -08 within the next few hours.

> On Nov 19, 2018, at 00:55, Benjamin Kaduk <kaduk@MIT.EDU> wrote:
> 
> Benjamin Kaduk has entered the following ballot position for
> draft-ietf-cbor-cddl-05: No Objection
> 
> When responding, please keep the subject line intact and reply to all
> email addresses included in the To and CC lines. (Feel free to cut this
> introductory paragraph, however.)
> 
> 
> Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
> for more information about IESG DISCUSS and COMMENT positions.
> 
> 
> The document, along with other ballot positions, can be found here:
> https://datatracker.ietf.org/doc/draft-ietf-cbor-cddl/
> 
> 
> 
> ----------------------------------------------------------------------
> COMMENT:
> ----------------------------------------------------------------------
> 
> Thanks for updating the editor's copy pursuant to the secdir review!
> 
> As I was reading, I wondered about potential confusion between a numerical
> value and the corresponding text string when used as a keytype, especially
> for barewords.  The bareword ABNF requires a leading EALPHA, which should
> force the right parsing, while the memberkey ABNF still allows literal
> values to be used as keys.  I do wonder, though, if the 'id' ABNF's
> limitations on textual names (i.e., strings that could be interpreted as
> numbers are disallowed) should be mentioned in the main text as how
> disambiguation is enforced in general.

For my taste, this is already covered by

   This is actually a complete example: an identifier that is followed
   by a colon can be directly used as the text string for a member key
   (we speak of a "bareword" member key), as can a double-quoted string
   or a number.

(There indeed is an assumption that the reader is vaguely familiar with programming language concepts such as “identifiers”.)

> It's a little weird to use PersonalData as an example, given the privacy
> considerations inherent in storing personal data, but I guess this is not
> really a flaw in the spec.

Textbooks that introduce data description formats tend to use these examples; in hindsight you are right that we might be a bit off-script here.

> Section 1
> 
> Nit: bullet (G3) lacks grammatical parallelism with its sibling bullets;
> something like "Be able to" would restore parity.

Fixed in -08.

> Section 2
> 
>   1.  Instead of defining all four types of composition in CDDL
>       separately, or even defining one kind for arrays (vectors and
>       records) and one kind for maps (tables and structs), there is
>       only one kind of composition in CDDL: the _group_ (Section 2.1).
> 
> This perhaps reads a bit strongly, as we do go on to define syntactic sugar
> for arrays and maps, even though they build on the shared group
> abstraction.

The syntactic sugar essentially encapsulates groups, so I think the statement is appropriate.

> Section 2.1
> 
>   Note that the (curly) braces signify the creation of a map; the
>   groups themselves are neutral as to whether they will be used in a
>   map or an array.
> [...]
>   Note that the lists inside the braces in the above definitions
>   constitute (anonymous) groups, while "identity" is a named group.
> 
> I might add another sentence in one of these places foreshadowing the
> behavior that groups are "macro-like" in the sense that when used in the
> description of another group, their contents are siblings of the elements
> that are new in the other group, as opposed to being part of a nested
> structure.

-08 now says:

   Note that the lists inside the braces in the above definitions
   constitute (anonymous) groups, while `identity` is a named group,
   which can then be included as part of other groups (anonymous as in
   the example, or themselves named).

It is a bit hard to explain the splicing nature of that (splicing as in ,@) without confusing the people who would never expect there to be a non-splicing version (as in ,); I think the context of the example should make this clear enough.
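As a rough analogy only (Python lists standing in for CDDL groups, with hypothetical entry names and a made-up (key, type) pair representation), the difference between the splicing inclusion that CDDL groups have and a non-splicing one that they do not have could be sketched as:

```python
# Hypothetical stand-ins for CDDL group entries: (key, type) pairs.
identity = [("age", "0..120"), ("name", "tstr")]


def splice(group, *extra):
    """Include a named group so that its entries become siblings of the
    new entries (the behaviour described for CDDL groups)."""
    return [*group, *extra]


def nest(group, *extra):
    """The non-splicing alternative: the group stays a nested unit.
    Shown only for contrast; this is not what CDDL group inclusion does."""
    return [group, *extra]


person_spliced = splice(identity, ("employer", "tstr"))
person_nested = nest(identity, ("employer", "tstr"))

print(person_spliced)  # three sibling entries
print(person_nested)   # two entries, the first being the whole group
```

The spliced result is flat, which is what a reader of the `identity` example would normally expect anyway.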

> 
> Section 3.1
> 
>   o  CDDL uses UTF-8 [RFC3629] for its encoding.
> 
> It's pretty rare for it to be sufficient to just say "UTF-8" in a technical
> spec; what kind of internationalization review has been done?  Do we need
> to specify anything about normalization or canonicalization?

We are not doing any of that, but could say explicitly so.  So -08 now says:

  * CDDL uses UTF-8 {{RFC3629}} for its encoding.  Processing of CDDL
    does not involve Unicode normalization processes.
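To illustrate what "no normalization" implies in practice, here is a minimal sketch (the helper name is made up for this example) that compares two text strings purely by their UTF-8 bytes, so that a precomposed and a decomposed spelling of the same visible text count as different:

```python
import unicodedata

# Two spellings that render identically: precomposed vs. decomposed "e acute".
composed = "caf\u00e9"
decomposed = unicodedata.normalize("NFD", composed)


def same_text_string(a: str, b: str) -> bool:
    """Compare by UTF-8 bytes only, with no normalization step
    (hypothetical helper, assumed to mirror the sentence added in -08)."""
    return a.encode("utf-8") == b.encode("utf-8")


print(same_text_string(composed, composed))
print(same_text_string(composed, decomposed))
```

Under this reading, anyone who needs the two spellings to match has to normalize before the data reaches CDDL processing.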

> Section 3.5.1
> 
>   The "struct" usage of maps is similar to the way JSON objects are
>   used in many JSON applications.
> 
>   A map is defined in the same way as defining an array (see
>   Section 3.4), except for using curly braces "{}" instead of square
>   brackets "[]".
> 
> Taken together, these paragraphs read as if (1) a struct is a type of map,
> and (2) a map uses curly brackets.  But the following example shows a struct
> as enclosed within square brackets.  Where am I going wrong?

The text is indeed wrong.  Sorry for not noticing this after all these editing rounds.
The sentence introducing the example now reads:

   The following is an example of a record with a structure embedded:

And the sentence after it:

  When encoding, the Geography record is encoded using a CBOR array with
  two members (the keys for the group entries are ignored), whereas the
  GpsCoordinates structure is encoded as a CBOR map with two key/value
  pairs.

> 
>         GpsCoordinates = {
>           longitude      : uint,            ; multiplied by 10^7
>           latitude       : uint,            ; multiplied by 10^7
>         }
> 
> It is perhaps irresponsible to include an example that does not specify the
> units of the measurement (e.g., degrees or radians).

Yes, 

GpsCoordinates = {
  longitude      : uint,            ; degrees, scaled by 10^7
  latitude       : uint,            ; degrees, scaled by 10^7
}

is better.  Now in -08.
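A small sketch of the array-versus-map point, using a hand-written encoder for just the CBOR major types needed here (unsigned integers, text strings, arrays, maps).  The city name, the coordinate values, and the helper names are assumptions for illustration and are not taken from the draft:

```python
def encode_head(major: int, value: int) -> bytes:
    """Encode a CBOR initial byte plus its argument for a major type."""
    if value < 24:
        return bytes([(major << 5) | value])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < (1 << (8 * size)):
            return bytes([(major << 5) | additional_info]) + value.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode(item) -> bytes:
    """Encode a tiny subset of CBOR: uint, tstr, array, map."""
    if isinstance(item, int) and not isinstance(item, bool) and item >= 0:
        return encode_head(0, item)
    if isinstance(item, str):
        data = item.encode("utf-8")
        return encode_head(3, len(data)) + data
    if isinstance(item, list):
        return encode_head(4, len(item)) + b"".join(encode(x) for x in item)
    if isinstance(item, dict):
        body = b"".join(encode(k) + encode(v) for k, v in item.items())
        return encode_head(5, len(item)) + body
    raise TypeError(f"unsupported item: {item!r}")


def scale_degrees(degrees: float) -> int:
    """Degrees scaled by 10^7, as in the comment on the CDDL entries."""
    return round(degrees * 10**7)


# The record is an array (keys dropped); the structure is a map (keys kept).
gps_coordinates = {"longitude": scale_degrees(8.8), "latitude": scale_degrees(53.0)}
geography = ["Bremen", gps_coordinates]
encoded = encode(geography)
print(encoded.hex())
```

The first byte of the output announces an array of two items, and the byte following the city name announces a map of two pairs.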

> Section 3.8.6
> 
>   value from being sent over the wire.  This control is only meaningful
>   when the control type is used in an optional context; otherwise there
>   would be no way to express the default value.
> 
> Maybe s/express/utilize/?  That is, the ".default" control still expresses
> what the default value would be, but that information would never be used.

“make use of” now in -08.

> Section 5
> 
>   o  Where the CDDL includes extension points, the impact of extensions
>      on the security of the system needs to be carefully considered.
> 
> Would it make sense to also add guidance for judicious use of .within to
> constrain extension points?

Probably.  The level of experience with the .within construct is not yet very high, though, so maybe that is better addressed in another document after a few more years of experience.

>   Writers of CDDL specifications are strongly encouraged to value
>   simplicity and transparency of the specification over its elegance.
>   Keep it as simple as possible while still expressing the needed data
>   model.
> 
> Perhaps "simplicity of [type] constructions", since some readers may equate
> simplicity [of design] and elegance.

Maybe a better word for “simplicity” here would be “clarity”.
(The word “simple” is then used in the second sentence.)
I briefly thought about “plainness”, but “clarity” is probably closest.

> Section 6.1
> 
> I don't really understand why there's a need for distinctions based on the
> presence of an internal dot, especially given that this document does not
> define any such operators.  What would such a control operator look like?

At some point, we might want to namespace the control operators.  E.g., an SDO that needs control operators to peek into their proprietary data formats might want to define a set of operators such as, say,

.sdo.part1
.sdo.part2

if their data format has a part1 and a part2.  We might want to control such a namespacing process in a separate specification; that is why these names are essentially reserved.

> Section 7.2
> 
> It seems that RFC 4648 might need to be a normative reference given that it
> specifies how some byte string literals are interpreted in EDN.

(Indeed.  This was fixed in -07.)

> Appendix B
> 
> On first glance I wonder if some of the S should be 1*WS to avoid parsing
> ambiguities, but I did not think about it very hard.

The one ambiguity I’m aware of that has been hit in practice is:
1..2          (range)
one..two      (single identifier)
vs.
one .. two    (range)

This is warned about in Section 2.2.2.1.

Apart from that, we are assuming the usual greedy lexer for identifiers; without that, there are many constructions such as

foo=bar=int

that parse ambiguously (it doesn’t parse at all with a greedy lexer).

(The ABNF has been made workable directly in a tool by applying parsing expression grammar (PEG) semantics to it, which is what I would recommend to anyone using ABNF.)

We have since written Appendix A to discuss “ambiguity” vs. “prioritized choice”.
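The greedy-lexer behaviour behind the range examples above can be sketched with a toy tokenizer.  This is a simplification under stated assumptions: the identifier pattern is my reading of the `id` rule (a leading letter, "@", "_" or "$", with "-" and "." allowed only in the interior), numbers are reduced to unsigned integers, and only the tokens needed for these examples are recognized:

```python
import re

# Alternatives are tried in order; each one matches as much as it can.
TOKEN = re.compile(
    r"""
    (?P<id>[A-Za-z@_$](?:[-.]*[A-Za-z@_$0-9])*)
  | (?P<uint>[0-9]+)
  | (?P<rangeop>\.\.\.?)
  | (?P<assign>=)
  | (?P<ws>\s+)
    """,
    re.VERBOSE,
)


def tokenize(text: str):
    """Split text into (kind, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}: {text[pos]!r}")
        if match.lastgroup != "ws":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for sample in ("1..2", "one..two", "one .. two"):
    print(sample, "->", tokenize(sample))
```

With this tokenizer, "one..two" comes out as a single identifier token, while the other two inputs yield a range operator between two operands.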

>   Note that this ABNF does not attempt to reflect the detailed rules of
>   what can be in a prefixed byte string.
> 
> Before I made it this far, I was going to note that the "bytes" definition
> seems to allow me to use a "h" or "b64" prefix with "arbitrary" contents; it
> seems that an alternate construction could embody the semantic restrictions
> for such strings into the ABNF.  How bad would it be if a future update to
> this document attempted to actually reflect the "detailed rules of what can
> be in a prefixed byte string"?

I would love to write that piece of ABNF at some point, when I have ample opportunities for debugging it…  There are several examples in RFCs for doing the exercise for (padded) base64; after the insertion of a few S and removal of padding, that might work well here as well.   But nobody has done that work yet.
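Until such ABNF exists, the check can be done after parsing.  The following is a loose sketch, not the missing grammar: it assumes that whitespace inside the quotes is ignored, that hex content must be an even number of hex digits, and that base64 content may use either the classic or the URL-safe alphabet with padding optional.  The function name is made up for this example:

```python
import base64
import binascii


def decode_prefixed(prefix: str, content: str) -> bytes:
    """Decode the content of a prefixed byte string literal.

    Raises ValueError if the content does not fit the prefix.
    """
    compact = "".join(content.split())  # drop all whitespace (assumption)
    if prefix == "h":
        return bytes.fromhex(compact)  # ValueError on odd length or non-hex
    if prefix == "b64":
        # Map the URL-safe alphabet onto the classic one and re-pad.
        classic = compact.rstrip("=").replace("-", "+").replace("_", "/")
        if len(classic) % 4 == 1:
            raise ValueError("impossible base64 length")
        padded = classic + "=" * (-len(classic) % 4)
        try:
            return base64.b64decode(padded, validate=True)
        except binascii.Error as exc:
            raise ValueError(f"invalid base64 content: {exc}") from exc
    raise ValueError(f"unknown prefix: {prefix!r}")


print(decode_prefixed("h", "68 65 6c 6c 6f"))
print(decode_prefixed("b64", "aGVsbG8"))
```

Doing this as a post-parse step keeps the grammar simple, at the cost of reporting such errors later than a purely grammatical check would.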

> Appendix D
> 
> I can't decide if most of the "#" entries need double-quotes around them to
> parse properly as ABNF.  Is it best to think about this CBOR major/minor
> notation as an extension to standard ABNF?

(Covered in a separate reply.)

Grüße, Carsten
