Re: [Cbor] WGLC review of draft-ietf-cbor-7049bis-09

Carsten Bormann <cabo@tzi.org> Sat, 07 December 2019 17:36 UTC

Content-Type: text/plain; charset="utf-8"
Mime-Version: 1.0 (Mac OS X Mail 11.5 $3445.9.1$)
From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <036501d5ac98$201dd490$60597db0$@augustcellars.com>
Date: Sat, 07 Dec 2019 18:36:31 +0100
Cc: draft-ietf-cbor-7049bis@ietf.org, cbor@ietf.org
Content-Transfer-Encoding: quoted-printable
Message-Id: <CF8D72A5-5378-4550-9311-62241E28746F@tzi.org>
References: <036501d5ac98$201dd490$60597db0$@augustcellars.com>
To: Jim Schaad <ietf@augustcellars.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/gM_8uBMVnY6w7BKtrVBDqwby6RU>
Subject: Re: [Cbor] WGLC review of draft-ietf-cbor-7049bis-09
Precedence: list

Hi Jim,

Thank you for this review.

> On Dec 7, 2019, at 01:49, Jim Schaad <ietf@augustcellars.com> wrote:
> 
> Here is my WGLC review of the document.
> 
> *  In paragraph 2 of section 1, I don't understand how the first sentence
> leads into the second sentence.  The first talks about specific design
> goals, so I expected a discussion of some of those goals.  Instead I get a
> discussion of the data model.  Paragraph 3 seems to follow better from the
> first sentence of paragraph 2.

I don’t think the first sentence is meant to lead into the second sentence.  This brief introduction provides a set of facts that distinguish CBOR from previous efforts but are not necessarily interrelated.
Paragraph 3 could be moved as a second sentence of paragraph 2, but that would distract a bit from the brief explanation of what CBOR is.

> * In section 1 - paragraph 3.  I think this would fit better in section 1.1
> rather than doing the forward reference.

Section 1.1 is about the objectives behind CBOR.  Section 1 paragraph 3 is about previous formats.

If we do want to restructure this, maybe it would be worth to do a subsection “history” (or “related work”?) with the JSON material and the pointer to appendix E.

> * Section 1.2 - The terminology of Encoder and Decoder do not seem to match
> each other.  The decoder decodes a well-formed CBOR data item.  The Encoder
> generates the representation of a CBOR data item.  I assume that the decoder
> should decode the representation of a well-formed CBOR data item.  The other
> way around would be that the Encoder creates a well-formed CBOR data item.

Right.  This is one instance where we don’t properly handle the distinction between the abstract data item and its representation format/encoded data item.

This is an issue that would need to go through the whole definition section and probably much more; e.g., well-formed is a property of an instance of the representation format.  Should we go for that?

Symmetry between encoder and decoder should be easy to achieve even without such massive surgery, e.g., s/well-formed CBOR data item/well-formed encoded CBOR data item/ (maybe even without the “CBOR”).

> * Section 2.2 last paragraph.  /be considered duplicates/be considered
> duplicates, even if encoded as different major types,/   This is a belts and
> suspenders suggestion but not something I would die for.

I’m not sure I like the “if” (integer 0 and float 0.0 are always encoded as different major types).  How about “while”?

> * Section 3 - para 1 - I do not believe that the encoding is summarized in
> table 6.  This table just summarized the first byte of the encoding.  Is
> this pointing to the right table or is the text just slightly messed up?

It is pointing to the right table.  That table is indeed indexed by the first byte, but does have more information in its right column.  (It *is* a summary.)

How about:
"The encoding is
summarized in {{jumptable}}, indexed by the initial byte.”
(in PR below).

> * Section 3.2.3 - The text is silent on a chuck being of length zero.
> Should this be stated as discouraged but legal?

I don’t know why this needs to be discouraged but it certainly is not ruled out.
To this reader, chunks of length 0 need to be discussed as much as chunks of length 4711, which are also not ruled out.

> * Section 3.4.2 - Is this section and 3.4.3 supposed to be at the same level
> s 3.4.1 or are they meant to be children?

Fixed in PR below.  I’m not a big fan of subsubsubsections, so I just got rid of the section head for 3.4.1 entirely, adding the last paragraph to the introduction of Section 3.4:

Protocols using tag numbers 0 and 1 extend the generic data model
({{cbor-data-models}}) with data items representing points in time;
tag numbers 2 and 3, with arbitrarily sized integers; and tag numbers
4 and 5, with floating point values of arbitrary size and precision.

> * Section 4.1 - I don't know what 1_000_000_000 is supposed to represent.
> My first assumption is that this is decimal, but it is not completely clear
> that is correct.  

Good languages have had these for a while, but even Java 7, C#, and Python allow these underscores now…
Changed text explaining this in 1.2 slightly in the PR below:
Underscores can be added to a number solely for
readability.

> * Section 4.2.2 - Do we want to include a point in this section about saying
> that applications may also need to set rules about how the content of a tag
> is going to be expressed.  One example would be tag 6.1 where an application
> could state that all times are in UTC or that times are only expressed to
> the whole second.

I would add (in PR below):  Deterministic encoding considerations also apply to the content of tags.
Would that address your point?

> * Section 5.2 - I think that /containing a byte string or a text
> string/containing a byte string or containing a text string/  The byte
> string is an error in itself, the text string is only an issue if the
> embedded text is incorrect.

Added a comma to that (in PR below).

> * Section 9.2 - Are you intending to update the template to point to this
> document rather than saying with RFC 7049?  This is not stated here.

Not sure about updating the template, but the registries should indeed be updated to this specification.

Added at the end of Section 9 in PR below:

\[To be removed by RFC editor:]
IANA is requested to update these registries to point to the present
document instead of RFC 7049.

> * Appendix B - Is there a reason why the first entry is Integer rather than
> Unsigned Integer?

Not really (well, 0 to 23 are unsigned even if we don’t say so :-).
Fixed in PR below.

> * Pseudo code - Should there be a break after case 7  (switch (mt))?  Having
> it would be cleaner

It is cleaner if we plan to update this code later.
I don’t think we do…
(And I don’t think we need another line in that code.)

> * Appendix F -  This is missing some things which seem to be rather
> important.
> - New method to sort maps
> - Lots of work on describing the CBOR Data Models
> - Serialization considerations  + Preferred - Canonical
> - Change in how validation/validity is described
> - Appendix G

Indeed.
I think we need to do a separate PR for the list of changes.

> * Appendix G - /exexactly/exactly/

Fixed in PR #144 (already merged).

I have prepare a pull request with the changes that I already know what I would propose to make from these comments:

https://github.com/cbor-wg/CBORbis/pull/149

Of course, the questions raised above could lead to more changes.

Grüße, Carsten

[Cbor] WGLC review of draft-ietf-cbor-7049bis-09 Jim Schaad
Re: [Cbor] WGLC review of draft-ietf-cbor-7049bis… Laurence Lundblade
Re: [Cbor] WGLC review of draft-ietf-cbor-7049bis… Carsten Bormann
Re: [Cbor] WGLC review of draft-ietf-cbor-7049bis… Jim Schaad
Re: [Cbor] WGLC review of draft-ietf-cbor-7049bis… Carsten Bormann