Re: [Cbor] [Technical Errata Reported] RFC7049 (6221)

Carsten Bormann <cabo@tzi.org> Mon, 06 July 2020 09:37 UTC

From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <CE6DBF36-E47A-4794-B37E-367BA15C61C7@apple.com>
Date: Mon, 06 Jul 2020 11:37:38 +0200
Cc: Jim Schaad <ietf@augustcellars.com>, Paul Hoffman <paul.hoffman@vpnc.org>, cbor@ietf.org
To: Stuart Cheshire <cheshire@apple.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/W8HL0H5gxMWOeRXBI4T3M2ihnfU>
Subject: Re: [Cbor] [Technical Errata Reported] RFC7049 (6221)

Hi Stuart,

(Trimming CC list)

Indeed, feedback is very welcome.

The repository is at https://github.com/cbor-wg/CBORbis, which is also where the pull requests that lead to future versions are made. The current I-D version, for which publication has been requested, is indeed draft-ietf-cbor-7049bis-14.

> A minor textual suggestion. One of the function definitions below has a space before the opening parenthesis and one does not. I would suggest making them consistent.
> well_formed (breakable = false)
> well_formed_indefinite(mt, breakable)

Fixed.

> The comments in the pseudocode use the terms “finite data item” and “finite-length chunk”, which are not used elsewhere in the RFC. Perhaps, to be consistent with the rest of the document, I suggest “definite-length item” and “definite-length chunk”.

Yes!  Fixed.

> One thing I would suggest to improve the clarity of the code would be not to overload the value zero to mean *both* that an item with major type zero was consumed (positive integer) *and* that an indefinite-length item was consumed (of any major type).
> 
> Perhaps well_formed_indefinite could return -1 when a "break" stop code is consumed, and -2 when an entire indefinite-length item is consumed.
> 
> Or maybe make well_formed_indefinite end with “return mt | 0x80” so that it *does* return the correct major type, but with the top bit set to show that it was an indefinite-length variant of that major type.

I went for 99.
(Shades of FORTRAN…)

> Also, the final comment “no break out” is a little confusing. What it really means is that this *is* the end of an indefinite-length item, so it really *does* break out of processing a complete indefinite-length item.
> 
> Maybe the comments at the end of well_formed and well_formed_indefinite (respectively) could be:
> 
> well_formed:
>     return mt;                    // definite-length data item
> 
> well_formed_indefinite:
>     return mt | 0x80;             // indefinite-length data item
> or
>     return -2;                    // indefinite-length data item

Fixed (with 99).
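
For readers following along, here is a minimal Python sketch of a well-formedness check using the return-value convention discussed above: the major type for a definite-length item, -1 for a consumed "break" stop code, and 99 for a complete indefinite-length item. The names Reader, NotWellFormed, BREAK and INDEFINITE are assumptions made for this illustration and do not appear in the draft; the draft's own pseudocode remains the reference.

```python
# Illustrative sketch only; helper names are assumptions, not the draft's text.

BREAK = -1        # a "break" stop code was consumed
INDEFINITE = 99   # a complete indefinite-length item was consumed


class NotWellFormed(Exception):
    """Raised when the input is not well-formed CBOR."""


class Reader:
    """Hypothetical byte source that fails on truncated input."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def take(self, n):
        if self.pos + n > len(self.data):
            raise NotWellFormed("truncated input")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk


def well_formed(r, breakable=False):
    # Process the initial byte: major type and additional information.
    ib = r.take(1)[0]
    mt, ai = ib >> 5, ib & 0x1F
    val = ai
    if ai in (24, 25, 26, 27):
        val = int.from_bytes(r.take(1 << (ai - 24)), "big")
    elif ai in (28, 29, 30):
        raise NotWellFormed("reserved additional information")
    elif ai == 31:
        return well_formed_indefinite(r, mt, breakable)
    # Process the content; major types 0, 1 and 7 have none.
    if mt in (2, 3):
        r.take(val)                      # bytes / UTF-8
    elif mt == 4:
        for _ in range(val):
            well_formed(r)
    elif mt == 5:
        for _ in range(2 * val):
            well_formed(r)
    elif mt == 6:
        well_formed(r)                   # one embedded data item
    elif mt == 7 and ai == 24 and val < 32:
        raise NotWellFormed("bad simple value")
    return mt                            # definite-length data item


def well_formed_indefinite(r, mt, breakable):
    if mt in (2, 3):
        while (it := well_formed(r, True)) != BREAK:
            if it != mt:                 # need definite-length chunk of same type
                raise NotWellFormed("bad chunk")
    elif mt == 4:
        while well_formed(r, True) != BREAK:
            pass
    elif mt == 5:
        while well_formed(r, True) != BREAK:
            well_formed(r)
    elif mt == 7:
        if breakable:
            return BREAK                 # signal break out
        raise NotWellFormed("break outside indefinite-length item")
    else:
        raise NotWellFormed("wrong major type for indefinite length")
    return INDEFINITE                    # indefinite-length data item
```

Because 99 can never equal a major type, an indefinite-length string offered as a chunk of another indefinite-length string fails the "same type" comparison without any extra test.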

>> * Shouldn’t we allow indefinite strings as chunks in indefinite strings?
>> 
>> No.  An implementation that wants to put together indefinite strings from strings that are (possibly) indefinite can simply take the brackets off the inner ones.  The actual use case for indefinite strings is “streaming”, where you just don’t know how long your string will be before you have to start sending it, called “chunking” in other contexts.  When you do that and have an indefinite string to send, it is very easy to take off the brackets (0x5f/0x7f and 0xff).
>> 
>> Having multiple ways to say the same thing can always lead to interoperability issues (and increases the cost of interoperability tests).
>> 
>> Worse, some applications or implementations will start to ascribe semantics to the presence or absence of redundant pairs of brackets.
>> 
>> Indefinite-length strings already add considerable complexity to some CBOR-consuming code; removing the ability to rely on only definite-length chunks being in there would add further complexity.
> 
> I agree with you 100% on this. The arguments you make, plus the stack usage point that Jim Schaad made, are all good.
> 
> However I didn’t find any explanation like this in draft-ietf-cbor-7049bis-14.

This is what I wrote to go into Section 3.2.3:

(Note that a decision has been made not to allow nesting
indefinite-length strings as chunks into indefinite-length strings.
This would require decoder implementations to keep a stack, or at
least a count, of nesting levels.  It is also unnecessary on the
encoder side, as the inner indefinite-length string would consist of
chunks, which can simply be put right into the outer indefinite-length
string.)
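
As an illustration of the encoder-side point, the following Python sketch splices already-encoded strings into one indefinite-length string by removing the inner brackets (0x5f/0x7f and 0xff). The function names chunks_of and concat_indefinite are hypothetical, and the sketch assumes that every input is a single well-formed byte or text string item.

```python
# Illustrative sketch only; assumes each input is one well-formed string item.

def chunks_of(encoded):
    """Return the chunk bytes carried by an encoded string item.

    A definite-length string is already a single chunk. An indefinite-length
    string loses its opening 0x5f/0x7f and its closing 0xff.
    """
    if encoded[0] in (0x5F, 0x7F):
        if encoded[-1] != 0xFF:
            raise ValueError("missing break")
        return encoded[1:-1]
    return encoded


def concat_indefinite(mt, items):
    """Build one indefinite-length string of major type mt (2 or 3)."""
    out = bytearray([mt << 5 | 31])      # 0x5f for bytes, 0x7f for text
    for item in items:
        if item[0] >> 5 != mt:
            raise ValueError("mismatched major type")
        out += chunks_of(item)           # inner chunks go straight in
    out.append(0xFF)                     # closing break
    return bytes(out)
```

For example, concat_indefinite(2, [bytes.fromhex("4101"), bytes.fromhex("5f41024103ff")]) yields the bytes 5f 41 01 41 02 41 03 ff, a single flat indefinite-length byte string with no nesting.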

> Given that the CBOR format could trivially support nested indefinite-length strings, there may be temptation for creative implementers to “improve” CBOR by allowing this. I can imagine a discussion between engineers debating whether to do this. Having clear text in the RFC stating that this was considered and rejected, and is considered unnecessary and a bad idea, and why, would help avoid those engineering debates ending with the wrong conclusion.

I agree that it is good to be explicit about these decisions.

I turned these fixes/additions into a pull request, which I hope to turn into -15 together with any changes from the AD review.

https://github.com/cbor-wg/CBORbis/pull/203
https://cbor-wg.github.io/CBORbis/input-stuart/draft-ietf-cbor-7049bis.html
https://tools.ietf.org/rfcdiff?url1=https://tools.ietf.org/id/draft-ietf-cbor-7049bis.txt&url2=https://cbor-wg.github.io/CBORbis/input-stuart/draft-ietf-cbor-7049bis.txt

(The rfcdiff is less useful than expected because the xml2rfc processing in the I-D submission pipeline is different from that in the version of Martin Thomson’s I-D repo template we seem to be using.)

Grüße, Carsten