Re: [Cbor] 🔔 WGLC on draft-ietf-cbor-7049bis-09

Carsten Bormann <cabo@tzi.org> Wed, 29 January 2020 14:49 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/obDIJJQR1rx4TkTrSl8LACNZViY>

Hi Francesca,

here are my comments on your review.

> * Section 3.4.7
>  
> While reading this again, I realized that CBOR sequences cannot be tagged, as by definition they are not one data item. I think being able to tag CBOR sequence with the self-describing tag in the scenario described in this section would be good.

You can tag any single data item in the CBOR sequence.
Since CBOR sequences are just concatenated encoded data items, I see no easy way to add some overall information to the sequence itself.
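
To illustrate (a minimal sketch, not text from the draft; the helper names encode_head, encode_uint, encode_text and tag are made up for this example): a CBOR sequence is just the concatenation of the encoded items, so tag 55799 can wrap the first item, but nothing can wrap the sequence as a whole.

```python
def encode_head(major: int, arg: int) -> bytes:
    """Encode a CBOR head (major type plus argument) in its shortest form."""
    if arg < 0:
        raise ValueError("argument must be non-negative")
    if arg < 24:
        return bytes([(major << 5) | arg])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < (1 << (8 * size)):
            return bytes([(major << 5) | additional_info]) + arg.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_uint(n: int) -> bytes:
    return encode_head(0, n)  # major type 0: unsigned integer


def encode_text(s: str) -> bytes:
    data = s.encode("utf-8")
    return encode_head(3, len(data)) + data  # major type 3: text string


def tag(number: int, encoded_item: bytes) -> bytes:
    return encode_head(6, number) + encoded_item  # major type 6: tag


items = [encode_uint(1), encode_text("a"), encode_uint(500)]

# The sequence itself: plain concatenation, no enclosing head.
sequence = b"".join(items)

# Only an individual item can carry the self-described tag (55799).
tagged_first = b"".join([tag(55799, items[0])] + items[1:])

print(sequence.hex())      # 0161611901f4
print(tagged_first.hex())  # d9d9f70161611901f4
```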

> * Section 4.2.2
>  
> Second to last sub-bullet: "If a protocol includes a field that can express integers..."
>  
> I noted an inconsistency here with the text on preferred encoding, which prefers using major types 0 and 1 (see the text in Section 3.4.4, "The preferred encoding of an integer...")

That section has recently been edited, and there are some new edits waiting in #165.
I hope to be able to cover this in #165.  The intention is to recommend using the preferred encoding for integers that fit into mt0/1, i.e., basic integers.  But a protocol could deviate, at the cost of requiring more work in the application and possibly the generic codec (which would need to separately handle the non-preferred case).
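
As a rough sketch of that intention (my illustration, not text from the draft or from #165; encode_int and encode_head are hypothetical names): an encoder uses major type 0/1 whenever the integer fits and only falls back to a bignum (tag 2/3) outside that range.

```python
def encode_head(major: int, arg: int) -> bytes:
    """Encode a CBOR head (major type plus argument) in its shortest form."""
    if arg < 0:
        raise ValueError("argument must be non-negative")
    if arg < 24:
        return bytes([(major << 5) | arg])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < (1 << (8 * size)):
            return bytes([(major << 5) | additional_info]) + arg.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_int(n: int) -> bytes:
    """Encode an integer, preferring major types 0/1 (basic integers)."""
    if 0 <= n < 2**64:
        return encode_head(0, n)        # major type 0: unsigned integer
    if -(2**64) <= n < 0:
        return encode_head(1, -1 - n)   # major type 1: negative integer
    # Outside the basic range: bignum, tag 2 (unsigned) or tag 3 (negative),
    # wrapping a byte string that holds the magnitude in network byte order.
    tag_number, magnitude = (2, n) if n > 0 else (3, -1 - n)
    payload = magnitude.to_bytes((magnitude.bit_length() + 7) // 8, "big")
    return encode_head(6, tag_number) + encode_head(2, len(payload)) + payload


print(encode_int(10).hex())      # 0a
print(encode_int(-1).hex())      # 20
print(encode_int(2**64).hex())   # c249010000000000000000
```

A decoder that wants to tolerate the non-preferred form would additionally have to accept, e.g., 0xc2410a (tag 2 around a one-byte byte string) as the integer 10, which is the extra work mentioned above.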
 
> * Section 9.5
>  
> Considering the Apps Area Working Group does not exist anymore, should the contact here be updated?

Yes, I have added this to #161.
My assumption is that the change controller simply is IESG.
I note a lot of variation in how this is handled in recent RFCs, e.g., in RFC 8628 there are registrations that have change controller IETF (OAuth extensions); all others there have IESG.

> Minor/Editorials
>  
> * Contributing
>  
> It might be good to put in a note for the RFC Editor to remove this section.

(BTW, the thing is a “note”, not a section: it is at the same level as abstracts.)
Added in new branch “francesca-editorial”.
 
> 
> * Section 1.1, Point 2, sub-bullet 1
>  
> For readability, I would put an example of "very small amount of code" number directly in the text, in the parenthesis when mentioning class 1 constrained nodes.

What would be in that example?
I’d be very hesitant to add much to this section, which should stay concise.

>  
> * Section 1.1, Point 4, sub-bullet 1
>  
> "and by implementation complexity maintaining a lower bound" does not read correctly to me. Am I missing something?

Citing the full text:

4. The serialization must be reasonably compact, but data compactness
   is secondary to code compactness for the encoder and decoder.

    * "Reasonable" here is bounded by JSON as an upper bound in size,
      and by implementation complexity maintaining a lower bound.

This is a bound to “reasonable”: we don’t want to increase implementation complexity considerably to improve compactness.  So implementation complexity maintains a lower bound to what is reasonable to achieve for the size.  I have tried rephrasing that in the new branch.
 
> * Section 1.1, Point 5, sub-bullet 1
>  
> I would suggest to add "for example for devices of class 1" as an example of what "reasonably frugal in CPU" means.

But it means much more than that, as the next sentence explains.
Again, I’d prefer to avoid adding much complexity to this section.
 
> * Section 1.2
>  
> The term "representation format" is only used in this section twice, everywhere else "encoded data item" is used. I would go ahead and remove this formulation, and only use encoded data item. This would also make the part on decoder and encoder more symmetric (right now Decoder talks about "encoded data item" and Encoder talks about "representation format").

While this would simplify things, it would also increase the reliance on a term that the reader is still trying to understand at this point (essentially increasing circularity).  “Representation format” should be sufficiently generic for introductory text, with no need for definitions.
 
> * Section 1.2, "Valid: "
>  
> This paragraph talks about "semantic restrictions that apply to CBOR data items"; it would be good to add a hint on where these are defined in the specification.

Right.  That would be a reference to Section 5.3, I’d assume, not to all ~ 81 places that talk about validity?
 
> * Section 2
>  
> "A simple value, identified by a number between 0 and 255, but distinct from that number"
>  
> I had to read this sentence several times to understand that the part "but distinct from that number" is meant to note to the reader that the value of the item is not the number it's identified by. I would formulate as written here, rather than as it is now. ("Note that the value of the item is not the number it's identified by")

Well, we need to define it, not just add notes.
Hmm.
 
> * Section 3.1, Major type 4
>  
> The text states that arrays can also be called sequences. With the publication of CBOR Sequences, can we remove this statement, as sequences are (although related) different things?

Good point.
CBOR sequences are indeed different, but arrays are still called sequences in many other places (which might include the applications sitting on top of CBOR).
Need to think about a better way to say “are also called”.
Attempt in new branch.
 
> * Section 3.1, Major type 5
>  
> Using underscoring to highlight a term (in this case "pairs") should be explained in terminology.

Solved by RFCXMLv3 :-)
(Still needed for the plain text version, as is the equivalent for typeset versions.)
 
> * Section 3.1, Major type 6
>  
> To be consistent with other major types, it might be good to shortly mention ranges for tags here.

Good point!
 
> * Section 3.2
>  
> While the motivation for arrays and maps is obvious, I would have appreciated some more text on motivation (or an example of use case) where indefinite-length strings are useful.

That would be an expansion of calling out “streaming”, right?
Attempt is in the branch.
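
A small sketch of the streaming case (my own illustration, not the text in the branch; stream_byte_string and encode_head are made-up names): the encoder can emit chunks as they arrive, without knowing the total length up front.

```python
from typing import Iterable, Iterator


def encode_head(major: int, arg: int) -> bytes:
    """Encode a CBOR head (major type plus argument) in its shortest form."""
    if arg < 0:
        raise ValueError("argument must be non-negative")
    if arg < 24:
        return bytes([(major << 5) | arg])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < (1 << (8 * size)):
            return bytes([(major << 5) | additional_info]) + arg.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def stream_byte_string(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield the encoding of an indefinite-length byte string piece by piece."""
    yield b"\x5f"  # major type 2, additional information 31: start of indefinite length
    for chunk in chunks:
        if chunk:  # empty chunks carry no data, so this sketch simply skips them
            yield encode_head(2, len(chunk)) + chunk  # each chunk is a definite-length byte string
    yield b"\xff"  # "break" stop code


# The chunks could come from a socket or a file; a list stands in for that here.
incoming = [bytes.fromhex("aabbccdd"), bytes.fromhex("eeff99")]
encoded = b"".join(stream_byte_string(incoming))
print(encoded.hex())  # 5f44aabbccdd43eeff99ff
```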

> * Section 3.2.3
>  
> I don't understand the link between the previous sentence and this one:
>  
> "   Note that this implies that the bytes of a single UTF-8 character
>    cannot be spread between chunks: a new chunk can only be started at a
>    character boundary."
>  
> Nor am I sure of the meaning of the term "spread" here.

All component strings need to be valid, i.e., sequences of UTF-8 characters; this means a single character cannot be started in one component string and then go on in the next component string.
"split up" may be better.

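To make the chunking constraint concrete (a sketch only; split_utf8 is a hypothetical helper, not something from the draft): when cutting UTF-8 data into chunks, the cut has to move back to a character boundary so that every chunk is valid UTF-8 on its own.

```python
from typing import List


def split_utf8(data: bytes, max_chunk: int) -> List[bytes]:
    """Split valid UTF-8 into chunks of at most max_chunk bytes, each valid UTF-8."""
    if max_chunk < 4:
        # A UTF-8 character takes up to 4 bytes; smaller chunks might not fit one.
        raise ValueError("max_chunk must be at least 4")
    chunks = []
    start = 0
    while start < len(data):
        end = min(start + max_chunk, len(data))
        # If the cut would land on a continuation byte (0b10xxxxxx), back up
        # until it sits on the first byte of a character.
        while end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        chunks.append(data[start:end])
        start = end
    return chunks


text = "a\u00e9\u20ac\U0001F600b"  # characters of 1, 2, 3, 4 and 1 bytes
chunks = split_utf8(text.encode("utf-8"), 4)

# Each chunk decodes on its own; a naive cut every 4 bytes would not.
print([chunk.decode("utf-8") for chunk in chunks])
```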
>  
> * Section 3.3
>  
> Re-appearance of the term "sequence", which I would still avoid.

Yes.  We also wanted to avoid “byte string”, because that is always confusing (in particular if the data item encodes a byte string).  Maybe “sequence of bytes”?  Or maybe:

For example, assume an encoded data item consisting of the bytes:


>  
> * Table 4
>  
> I would have explicitly stated what data items were allowed for each tag number, rather than writing multiple.

We could do that.  There are two cases: Essentially anything (21–23, 55799), and the numbers allowed by tag 1; for the latter we could write “integer or float”, and “(any)” for the former.

> * Section 3.4.4
>  
> The term "preferred encoding" appears here for the first time without any reference or introduction.

(I think this is now 3.4.3.)
The text now has a pointer to Section 4.1, and we are using “preferred serialization”.

>  
> * Section 3.4.5
>  
> "while the mantissa also can be" -> "while the mantissa can also be"

Yes.
 
> * Section 3.4.5
>  
> Expand NaN on first appearance here (instead of 5.6.1)

#165 now has a mention (and expansion) in the terminology section.
We could expand here as well, but I’d probably leave the expansion in 5.6.1, because there is not much context about non-finites here, while there is in 3.4.5.
 
> * Section 4.2.2
>  
> "may want to exclude them from interchange, interchanging"
>  
> I would reformulate this.

Because it is wrong, misleading, or because the same word is used twice?

If the latter,
"may want to exclude them from the protocol format, interchanging"
maybe?
 
> * Section 4.2.3
>  
> Capitalize section title

I’m sure the RFC editor will have a lot of these.
Fixed in the branch now.
 
> * Section 4.2.3
>  
> First paragraph: please add a reference to 4.2.1 when talking about core deterministic encoding requirements.

Yes.
 
> * Section 5.4
>  
> "A generic encoder also may want" -> "A generic encoder may also want"

Yes.
 
> * Section 5.6
>  
> "Duplicate keys are also prohibited by CBOR decoders that
>    enforce validity (Section 5.4)."
>  
> I have a slight problem with the term "prohibited by" decoders... Decoders do not prohibit, at most they do not accept.

Good point.  So let’s say “not accepted”.
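
What “not accepted” could look like in a validity-enforcing decoder, as a sketch under assumptions (reject_duplicate_keys is a made-up name, keys are assumed hashable, and comparing (type, value) is only a stand-in for the key equivalence the document defines):

```python
from typing import Any, Hashable, List, Tuple


def reject_duplicate_keys(pairs: List[Tuple[Hashable, Any]]) -> List[Tuple[Hashable, Any]]:
    """Return the decoded key/value pairs unchanged, or raise on a duplicate key."""
    seen = set()
    for key, _value in pairs:
        # Assumption for this sketch: two keys are "the same" if they have the
        # same Python type and compare equal, so 1 and 1.0 stay distinct.
        marker = (type(key), key)
        if marker in seen:
            raise ValueError(f"map not accepted: duplicate key {key!r}")
        seen.add(marker)
    return pairs


print(reject_duplicate_keys([(1, "a"), (1.0, "b"), ("1", "c")]))  # accepted

try:
    reject_duplicate_keys([(1, "a"), (2, "b"), (1, "c")])
except ValueError as err:
    print(err)  # the decoder does not accept the map
```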

>  
> * Section 5.6
>  
> "except to specify that some, orders are disallowed" -> remove comma

(Fixed previously.)

>  
> * Section 7.1, last sub-bullet
>  
> Please reference section 7.2.

Yes.  (Let’s see whether the RFC editor throws that out again…)

>  
> * Section 8.1
>  
> I like examples. I would have liked an example for the second paragraph of this section.

Good point.
 
> * Appendix G
>  
> "this may not be actually be an error" -> "this may not actually be an error"

Yes.

Now PR #166: https://github.com/cbor-wg/CBORbis/pull/166

Grüße, Carsten