Re: [Cbor] Artart early review of draft-ietf-cbor-file-magic-02

Carsten Bormann <cabo@tzi.org> Wed, 04 August 2021 19:58 UTC

From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <162801879089.20863.15297297905691186349@ietfa.amsl.com>
Date: Wed, 04 Aug 2021 21:58:43 +0200
Cc: art@ietf.org, cbor@ietf.org, draft-ietf-cbor-file-magic.all@ietf.org
Message-Id: <67F388BD-ED6B-42A5-8BEA-8C49ED9774D3@tzi.org>
References: <162801879089.20863.15297297905691186349@ietfa.amsl.com>
To: Bernard Aboba <bernard.aboba@gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/NBSIylNU1-p7lYvpvuko63o6pDc>
Subject: Re: [Cbor] Artart early review of draft-ietf-cbor-file-magic-02

Hi Bernard,

Thank you for this thoughtful review.

> On 2021-08-03, at 21:26, Bernard Aboba via Datatracker <noreply@ietf.org> wrote:
> 
> Reviewer: Bernard Aboba
> Review result: Ready with Issues
> 
> Review of draft-ietf-cbor-file-magic-02
> Reviewer: Bernard Aboba
> Result: Ready with Issues
> 
> An overall comment. This document seems to propose more than one option in
> several cases.  I wonder whether the multiple options will turn out to be
> used in practice.  This makes me wonder if a document status of Experimental
> might be a better choice (so we could try it out and see what turns out to be
> needed), rather than BCP.

We started out with the intention of having just one way.
It turned out that we need both the flexibility of the CBOR-sequence-based approach and compatibility with implementations that do not support CBOR sequences.
Both will be used in practice: only the CBOR-sequence-based approach is applicable to file formats that encode a CBOR sequence, and only the CBOR-data-item-based approach is compatible with data-item-only implementations.

> Section 2.
> 
>   A magic number is ideally a unique fingerprint, present in the first
>   4 or 8 bytes of the file, which does not change when the contents
>   change, and does not depend upon the length of the file.
> 
> [BA] Why have two supported lengths? I realize that they can be distinguished,
> but having two potential lengths is likely to lead to one of them being less
> widely supported or tested.

This section provides a quick look at the requirements, or maybe at potential goals that we could set ourselves.
Applications such as file(1) that interpret magic numbers actually have significantly more flexibility than just 4 or 8 bytes.
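
To illustrate that last point, here is a minimal sketch of prefix-based detection with signatures of different lengths. It is not how file(1) is implemented; the table and the function name sniff are made up for this example. The three signatures themselves are the well-known ones for PNG, gzip, and the self-described CBOR tag 55799 from RFC 8949.

```python
# Illustrative signature table; the prefixes differ in length on purpose.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),       # 8 bytes
    (b"\xd9\xd9\xf7", "self-described CBOR"),  # 3 bytes (tag 55799, RFC 8949)
    (b"\x1f\x8b", "gzip data"),                # 2 bytes
]


def sniff(data: bytes) -> str:
    """Return the label of the first signature that prefixes `data`."""
    for prefix, label in SIGNATURES:
        if data.startswith(prefix):
            return label
    return "unknown"
```

A detector of this kind is not tied to a fixed 4- or 8-byte window; it only needs each signature to be unambiguous among the formats it knows.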

> Section 3.2
> 
> [BA] Why support both CBOR Tag Wrapped and CBOR Tag Sequence?
> The logic is explained in Section 4, but I'm not sure I buy it:
> 
>   "The use of CBOR Tag Wrapped format is easier to retrofit to an
>   existing format with existing and unchangeable on-disk format."
> 
> [BA] Overall, it seems more likely to me that CBOR will be used for new file
> formats than being retrofitted to new ones (which would be complicated by
> backward compatibility issues). Or are there specific cases where a retrofit
> is being seriously considered?

Many CBOR-based structures are already defined, usually designed mainly around the needs of interchange over protocols such as CoAP and HTTP.
The formats differ in their characteristics (e.g., we have RFC 8949 data items and RFC 8742 CBOR sequences).
But all of them need to be stored at some point, if only for diagnostics.
To support both data items and sequences, it is best to have magic number formats that are appropriate for each.
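
The difference between the two shapes can be sketched as follows. This is a simplified illustration, not the exact layout in the draft: it uses a single made-up tag number (HYPOTHETICAL_TAG is not an IANA assignment), and the helper tag_head is written for this example only.

```python
import struct

HYPOTHETICAL_TAG = 0x01020304  # made-up tag number, for illustration only


def tag_head(tag: int) -> bytes:
    """Encode a CBOR tag head (major type 6) in its shortest form."""
    if tag < 24:
        return bytes([0xC0 | tag])
    if tag < 1 << 8:
        return b"\xd8" + struct.pack(">B", tag)
    if tag < 1 << 16:
        return b"\xd9" + struct.pack(">H", tag)
    if tag < 1 << 32:
        return b"\xda" + struct.pack(">I", tag)
    return b"\xdb" + struct.pack(">Q", tag)


ITEM = b"\x83\x01\x02\x03"  # the CBOR array [1, 2, 3]

# Tag-wrapped: the whole file is still ONE data item, tag(content),
# so a data-item-only decoder can read it.
wrapped_file = tag_head(HYPOTHETICAL_TAG) + ITEM

# Sequence-based: a tagged marker item comes first, followed by any
# number of further items (RFC 8742 CBOR sequence).
MARKER = b"\x43" + b"BOR"  # byte string of length 3 containing "BOR"
sequence_file = tag_head(HYPOTHETICAL_TAG) + MARKER + ITEM + ITEM
```

Both files start with the same fixed tag head, but only the first is a single data item, and only the second can carry more than one item.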

Most likely, a single application (that is not like file(1)) will simply need to recognize its own file format or a few related ones.

I added "even if the program only supports a single data item.", as this was the main argument why the CBOR-sequence-based magic number could not cover all applications.

> NITs:
> 
> Section 3.3:
> 
> "
>   3.  The three byte CBOR byte string containing 0x42_4F_52.  When
>       encoded it shows up as "CBOR"
> 
> ...
> 
>   The third part is a constant value 0x43_42_4f_52, "CBOR".  That is,
>   the CBOR encoded data item for the three byte sequence 0x42_4f_52
>   ("BOR").  This is the data item that is tagged.
> "
> 
> [BA] The latter text is more clear than the former, since 0x42_4F_52
> is indeed "BOR". I'd suggest deleting the second sentence in 3, "When
> encoded..."

Thanks.  I deleted the sentence as suggested and clarified some more of the text.
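
For readers following along, the relationship between "BOR" and "CBOR" can be checked with a short sketch. The function encode_bstr is written for this example and only handles byte strings shorter than 24 bytes, where the length fits into the initial byte (RFC 8949, major type 2).

```python
def encode_bstr(payload: bytes) -> bytes:
    """Encode a short CBOR byte string (major type 2, length below 24)."""
    if len(payload) >= 24:
        raise ValueError("this sketch only handles lengths below 24")
    # Initial byte: major type 2 in the top three bits, length in the low five.
    head = (2 << 5) | len(payload)
    return bytes([head]) + payload


encoded = encode_bstr(b"BOR")
print(encoded.hex())  # 43424f52
print(encoded)        # b'CBOR'
```

The initial byte 0x43 happens to be ASCII "C", which is why the three content bytes 0x42_4f_52 ("BOR") show up as "CBOR" once encoded.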

All the changes I am proposing are in
https://github.com/cbor-wg/cbor-magic-number/pull/8

Grüße, Carsten