Re: [Cbor] [Technical Errata Reported] RFC7049 (6221)

Carsten Bormann <cabo@tzi.org> Mon, 06 July 2020 09:37 UTC

Content-Type: text/plain; charset="utf-8"
Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.80.23.2.2\))
From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <CE6DBF36-E47A-4794-B37E-367BA15C61C7@apple.com>
Date: Mon, 06 Jul 2020 11:37:38 +0200
Cc: Jim Schaad <ietf@augustcellars.com>, Paul Hoffman <paul.hoffman@vpnc.org>, cbor@ietf.org
Content-Transfer-Encoding: quoted-printable
Message-Id: <7195076E-C5B1-461B-A2BC-B1B5E8DC2E69@tzi.org>
References: <20200704225242.3264EF406D5@rfc-editor.org> <25ADFCDD-1B4D-4A9C-87DE-780F89DC0F87@tzi.org> <CE6DBF36-E47A-4794-B37E-367BA15C61C7@apple.com>
To: Stuart Cheshire <cheshire@apple.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/W8HL0H5gxMWOeRXBI4T3M2ihnfU>
Subject: Re: [Cbor] [Technical Errata Reported] RFC7049 (6221)
Precedence: list

Hi Stuart,

(Trimming CC list)

Indeed, feedback is very welcome.

The repository is at https://github.com/cbor-wg/CBORbis, which is also where the pull requests that lead to future versions are made.  The current I-D version, for which publication has been requested, is indeed draft-ietf-cbor-7049bis-14.

> A minor textual suggestion. One of the function definitions below has a space before the opening parenthesis and one does not. I would suggest making them consistent.
> well_formed (breakable = false)
> well_formed_indefinite(mt, breakable)

Fixed.

> The comments in the pseudocode use the terms “finite data item” and “finite-length chunk”, which are not used elsewhere in the RFC. Perhaps, to be consistent with the rest of the document, I suggest “definite-length item” and “definite-length chunk”.

Yes!  Fixed.

> One thing I would suggest to improve the clarity of the code would be not to overload the value zero to mean *both* that an item with major type zero was consumed (positive integer) *and* that an indefinite-length item was consumed (of any major type).
> 
> Perhaps well_formed_indefinite could return -1 when a "break" stop code is consumed, and -2 when an entire indefinite-length item is consumed.
> 
> Or maybe make well_formed_indefinite end with “return mt | 0x80” so that it *does* return the correct major type, but with the top bit set to show that it was an indefinite-length variant of that major type.

I went for 99.
(Shades of FORTRAN…)

> Also, the final comment “no break out” is a little confusing. What it really means is that this *is* the end of an indefinite-length item, so it really *does* break out of processing a complete indefinite-length item.
> 
> Maybe the comments at the end of well_formed and well_formed_indefinite (respectively) could be:
> 
> well_formed:
>     return mt;                    // definite-length data item
> 
> well_formed_indefinite:
>     return mt | 0x80;             // indefinite-length data item
> or
>     return -2;                    // indefinite-length data item

Fixed (with 99).
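
For illustration only, here is a rough Python transcription of the two functions as discussed above.  This is a sketch, not the text of the draft: the Checker class, the NotWellFormed exception, and the names BREAK and INDEFINITE are made up for this example.  Only the return values come from this thread: -1 for a consumed "break", 99 for a complete indefinite-length item.

```python
BREAK = -1        # a "break" stop code (0xff) was consumed
INDEFINITE = 99   # a complete indefinite-length item was consumed


class NotWellFormed(Exception):
    """Hypothetical error type standing in for fail() in the pseudocode."""


class Checker:
    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def take(self, n: int) -> bytes:
        if self.pos + n > len(self.data):
            raise NotWellFormed("input ends inside a data item")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def well_formed(self, breakable: bool = False) -> int:
        # process initial byte and any following argument bytes
        ib = self.take(1)[0]
        mt, ai = ib >> 5, ib & 0x1F
        val = ai
        if ai in (24, 25, 26, 27):
            val = int.from_bytes(self.take(1 << (ai - 24)), "big")
        elif ai in (28, 29, 30):
            raise NotWellFormed("reserved additional information")
        elif ai == 31:
            return self.well_formed_indefinite(mt, breakable)
        # process content; major types 0, 1, 7 have none
        if mt in (2, 3):
            self.take(val)
        elif mt == 4:
            for _ in range(val):
                self.well_formed()
        elif mt == 5:
            for _ in range(val * 2):
                self.well_formed()
        elif mt == 6:
            self.well_formed()
        elif mt == 7 and ai == 24 and val < 32:
            raise NotWellFormed("bad simple value")
        return mt  # definite-length data item

    def well_formed_indefinite(self, mt: int, breakable: bool) -> int:
        if mt in (2, 3):
            while (it := self.well_formed(True)) != BREAK:
                if it != mt:  # need definite-length chunk of same type
                    raise NotWellFormed("bad chunk in indefinite-length string")
        elif mt == 4:
            while self.well_formed(True) != BREAK:
                pass
        elif mt == 5:
            while self.well_formed(True) != BREAK:
                self.well_formed()
        elif mt == 7:
            if breakable:
                return BREAK  # signal break out
            raise NotWellFormed("break without enclosing indefinite-length item")
        else:
            raise NotWellFormed("indefinite length not allowed for this major type")
        return INDEFINITE  # indefinite-length data item
```

Because 99 can never equal a major type (0 to 7), the chunk check "it != mt" rejects an indefinite-length string used as a chunk without any extra test, and a return value of 0 now only ever means major type 0.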

>> * Shouldn’t we allow indefinite strings as chunks in indefinite strings?
>> 
>> No.  An implementation that wants to put together indefinite strings from strings that are (possibly) indefinite can simply take the brackets off the inner ones.  The actual use case for indefinite strings is “streaming”, where you just don’t know how long your string will be before you have to start sending it, called “chunking” in other contexts.  When you do that and have an indefinite string to send, it is very easy to take off the brackets (0x5f/0x7f and 0xff).
>> 
>> Having multiple ways to say the same thing can always lead to interoperability issues (and increases the cost of interoperability tests).
>> 
>> Worse, some applications or implementations will start to ascribe semantics to the presence or absence of redundant pairs of brackets.
>> 
>> Indefinite-length strings already add considerable complexity to some CBOR-consuming code; removing the ability to rely on only definite-length chunks being in there would add further complexity.
> 
> I agree with you 100% on this. The arguments you make, plus the stack usage point that Jim Schaad made, are all good.
> 
> However I didn’t find any explanation like this in draft-ietf-cbor-7049bis-14.

This is what I wrote to go into Section 3.2.3:

(Note that a decision has been made not to allow nesting
indefinite-length strings as chunks into indefinite-length strings.
This would require decoder implementations to keep a stack, or at
least a count, of nesting levels.  It is also unnecessary on the
encoder side, as the inner indefinite-length string would consist of
chunks, which can simply be put right into the outer indefinite-length
string.)
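
To make the encoder-side point concrete, here is a minimal Python sketch of "taking off the brackets".  The function names strip_brackets and join_as_indefinite are invented for this example, and it assumes every input is already a well-formed encoded string of the same major type (all byte strings or all text strings); it does no validation beyond checking for the trailing break.

```python
from typing import Iterable


def strip_brackets(item: bytes) -> bytes:
    """Return the encoded definite-length chunk(s) of one encoded string.

    An indefinite-length string loses its opening byte (0x5f or 0x7f)
    and its trailing break (0xff); a definite-length string is already
    a single chunk and is returned unchanged.
    """
    if item[:1] in (b"\x5f", b"\x7f"):
        if item[-1:] != b"\xff":
            raise ValueError("indefinite-length string lacks its break")
        return item[1:-1]
    return item


def join_as_indefinite(parts: Iterable[bytes], text: bool = False) -> bytes:
    """Emit one flat indefinite-length string from encoded string parts."""
    opener = b"\x7f" if text else b"\x5f"
    return opener + b"".join(strip_brackets(p) for p in parts) + b"\xff"


# A definite-length part followed by an indefinite-length part:
flat = join_as_indefinite([
    bytes.fromhex("4101"),          # h'01'
    bytes.fromhex("5f41024103ff"),  # (_ h'02', h'03')
])
print(flat.hex())
```

The result contains only definite-length chunks, so a decoder never has to track nesting levels inside a string.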

> Given that the CBOR format could trivially support nested indefinite-length strings, there may be temptation for creative implementers to “improve” CBOR by allowing this. I can imagine a discussion between engineers debating whether to do this. Having clear text in the RFC stating that this was considered and rejected, and is considered unnecessary and a bad idea, and why, would help avoid those engineering debates ending with the wrong conclusion.

I agree that it is good to be explicit about these decisions.

I turned these fixes/additions into a pull request, which I hope to turn into -15 together with any changes from the AD review.

https://github.com/cbor-wg/CBORbis/pull/203
https://cbor-wg.github.io/CBORbis/input-stuart/draft-ietf-cbor-7049bis.html
https://tools.ietf.org/rfcdiff?url1=https://tools.ietf.org/id/draft-ietf-cbor-7049bis.txt&url2=https://cbor-wg.github.io/CBORbis/input-stuart/draft-ietf-cbor-7049bis.txt

(The rfcdiff is less useful than expected because the xml2rfc processing in the I-D submission pipeline is different from that in the version of Martin Thomson’s I-D repo template we seem to be using.)

Grüße, Carsten
