Re: [Cbor] Éric Vyncke's No Objection on draft-ietf-cbor-file-magic-11: (with COMMENT)

Carsten Bormann <cabo@tzi.org> Wed, 20 April 2022 20:43 UTC


Hi Éric,

thank you for this review.

I have collected my proposed changes based on these and other comments in 

https://github.com/cbor-wg/cbor-magic-number/pull/21

under the commit
https://github.com/cbor-wg/cbor-magic-number/pull/21/commits/21e6541

Grüße, Carsten


[…]
> 
> The document, along with other ballot positions, can be found here:
> https://datatracker.ietf.org/doc/draft-ietf-cbor-file-magic/
> 
> 
> 
> ----------------------------------------------------------------------
> COMMENT:
> ----------------------------------------------------------------------
> 
> Thank you for the work put into this document. This document describes a nice
> addition to CBOR.
> 
> Please find below some non-blocking COMMENT points (but replies would be
> appreciated even if only for my own education), and some nits.
> 
> Special thanks to Christian Amsüss for the shepherd's write-up even if the
> justification for the intended status is somehow weak (but at least present).
> 
> I hope that this helps to improve the document,
> 
> Regards,
> 
> -éric
> 
> ## General comment
> 
> I was about to ballot a DISCUSS due to the absence of BCP14 and any normative
> language in a standard track document. Explanations from the authors/WG/AD will
> be more than welcome as I am not convinced by the shepherd's explanation
> ("let's avoid down ref in the future").

I’m not sure that BCP14 language MUST be used in standards-track documents.
This specification follows the style of describing a format as it is, which is equivalent to saying how it MUST be…

The more interesting question is whether specifications of this kind need to be standards-track.  This is indeed something I would love to see guidance on.
The style “let’s approve this as Informational and then use downref to make it normative” doesn’t strike me as the right way to do things either, even though it is widely used in certain areas.

> ## Abstract
> 
> Just wondering whether the "on-disk" actually reflects the use of the Unix
> `file` command as this command also works on many other file systems / storage.

Indeed, I replaced this with “stored” already in
https://github.com/cbor-wg/cbor-magic-number/pull/21/commits/8152734

> The abstract seems to indicate that there is ONE format while the rest of the
> document is about THREE formats. Suggest to update the abstract to reflect the
> choice among 3 formats.

A choice among 3 subformats is a format :-)
But more seriously, I don’t know whether the choice is really the information that needs to stand out in the abstract.  All three subformats are rather similar (“culturally compatible”), so I do think of this as a single format that just happens to need three subvariants.

> ## Section 1
> 
> Unsure whether the comparison of Unix file with TCP/IP stream brings any value.

The underlying observation certainly has value to me, but here it confuses more than it elucidates.
Elided.

> Should there be a reference for "MIME type" ?

Not really, as we don’t want to reinforce this outdated terminology (but we still need to mention it, as it is way more selective for many people than the more correct one).  [Would you want to reference RFC 6101 “SSL” where the aim is to explain “TLS”?]

> Should there be a reference for "media type registration template" ?

Yes!  Referencing 6838 now.

> BTW, this section reads like a nice set of anecdotes, which is easy to read,
> but a more logical flow would be a plus for the reader.

Cleaned up somewhat based on Pete’s comments; not sure when we’ll have reached diminishing returns.

> A reference to CBOR RFC 8742 is probably required.

Yes, that’s why there is one in a couple of places, including Section 1.
(You probably mean a specific point that we missed?)

> "magic number" should perhaps be introduced before citing it ?

Yes!  Citing the term actually also is a great idea:

  MAGIC:
    title: "archive (library) file format"
    target: https://www.bell-labs.com/usr/dmr/www/man51.pdf#page=4
    date: 1971-11-03
    author:
    - name: Dennis Ritchie
    rc: >
      in Bell Labs, Unix Programmer's Manual, First Edition: File Formats


> Unsure whether the paragraph "A major inspiration ..." brings any value to the
> text.

As Michael said, this is about his motivation for starting this work, and this particular observation certainly helped motivate me to join the effort later as well.

> ## Section 1.2
> 
> Is the 'magic number' unique per CBOR protocol or just for any CBOR protocol?
> (i.e., this definition seems really restricted to CBOR and not to `file`)
> Should it also be distinct from other 'magic numbers' in /etc/magic ?

We already added text plus a reference to MAGIC above.
Added “specific” (CBOR protocol).
The “unique” really already says that this number is intended to be distinct — it is hard to completely avoid collisions in this space though.

> ## Section 2
> 
> I am confused, and probably misread the document, but earlier I had the
> impression that THREE methods will be specified in this document and this
> section defines only TWO.

Section 1 now says about the third method:

>> This third method has been placed in {{headers}} because it is not
>> about identifying media types containing CBOR-encoded data items.


> ## Section 2.1
> 
> Should the 4-byte magic number be different than the existing 'magic numbers'
> supported by `file` ?

It generally doesn’t have to be, as it is preceded by the first four bytes of the signature d9d9f7da or d9d9f8da.  If these sequences are used in /etc/magic (/usr/share/file/magic/*), that would create a higher probability of collision, but that doesn’t seem to be the case today.
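
To make the byte layout concrete, here is a minimal Python sketch of a check for the 8-byte prefix described above. The function name `split_magic` is hypothetical; the sketch relies only on the CBOR encodings themselves: 0xd9d9f7 and 0xd9d9f8 are the heads of tags 55799 and 55800, and 0xda introduces a tag with a 4-byte argument.

```python
from typing import Optional, Tuple

# First four bytes of the signature, as quoted above, mapped to the
# outer tag number they encode.
PREFIXES = {
    bytes.fromhex("d9d9f7da"): 55799,
    bytes.fromhex("d9d9f8da"): 55800,
}

def split_magic(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (outer tag, 4-byte tag number), or None if no signature matches."""
    if len(data) < 8:
        return None
    outer = PREFIXES.get(data[:4])
    if outer is None:
        return None
    # The 4-byte tag argument is in network byte order (big-endian).
    return outer, int.from_bytes(data[4:8], "big")
```

A `file`-style matcher would key on all eight bytes, so a 4-byte value that also appears in an existing magic database only collides if it is preceded there by one of the two prefixes.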

> ## Section 2.2.1
> 
> I must be really confused (and sorry about that) but the example contains a
> 0x00 which contradicts the above "avoid values that have an embedded zero byte"
> (from section 2.1).

Right.  Maybe the mapping from the 2¹⁶ content-format numbers to the 2¹⁶ tags allocated in Appendix A could be modified to follow the suggestion.

So far, simplicity won out over consideration for bad C code; base-255 arithmetic is one obvious solution but essentially would require a little program to convert the content-format number to the tag number (and some 511 more allocations).

But wait, we can snip out the “65000-65535 | Experimental use (no operational use)” part of the range — which may be the right thing to do anyway, so we could use something almost reasonable like (ct / 255 + 1) << 8 | (ct % 255 + 1) or some such.
Maybe let’s think about the mapping for a couple more days…
I’ll make a proposal.
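
As a rough illustration of the idea floated above, the following Python sketch computes the 16-bit value under that formula and checks that neither of its bytes is zero. The function name `ct_to_tag_low16` and the 0..64999 input limit (the range left after snipping the experimental block) are assumptions for illustration; this is a sketch of the formula as written in this message, not a settled allocation.

```python
def ct_to_tag_low16(ct: int) -> int:
    """Map a content-format number to 16 bits with no zero byte (sketch)."""
    if not 0 <= ct < 65000:
        raise ValueError("content-format number outside 0..64999")
    high = ct // 255 + 1   # always in 1..255 for ct < 65000
    low = ct % 255 + 1     # always in 1..255
    return (high << 8) | low

# Exhaustive check over the assumed input range.
values = [ct_to_tag_low16(ct) for ct in range(65000)]
assert all(v >> 8 != 0 and v & 0xFF != 0 for v in values)  # no zero byte
assert len(set(values)) == len(values)                      # no collisions
```

There are 255 * 255 = 65025 byte pairs without a zero byte, so the 65000 remaining content-format numbers fit; the full 65536 would not, which is why dropping the experimental range matters here.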

> ## Section 3
> 
> I really love the "COVID vaccination certificate that needs to be displayed in
> QR code form" example (just use "COVID-19" perhaps) but should a reference be
> added ?

Not sure, as there are several, and the documents to be referenced are somewhat volatile.  Or do you have a good reference to use here?  (It wouldn't really add to the substance, but it would still feel right to have one.)

> The considerations for the protocol designers do not really fit my ideas about
> a standard track document but rather of a BCP.

There is a base protocol that is hard and fast and quite OK for standards track.
There are choices in that base protocol that do require some trade-offs; these are the considerations for protocol designers.
I seem to remember that we have a lot of standards-track protocols with choices left to the user :-) (and these are often much less well explained).

> # NITS
> 
> ## Section 1
> 
> In "two possible methods of enveloping data are presented: a CBOR Protocol
> designer will specify one", should the ":" rather be a ";" ? Anyway, the RFC
> editor will review it ;-)

Pete Resnick already made me change this :-)

https://github.com/cbor-wg/cbor-magic-number/pull/21/commits/e476afb

> ## Section 2.3
> 
> Please be consistent with the use of lower case or upper case for hexadecimal
> numbers, e.g., 0x42_4f_52

Right.

Cbor-pretty is upper-case though, and the tool I’m using to check diagnostic notation is, too (RFC 8949 is silent on this, but RFC 4648 Section 8 uses upper case).  But we can fix the 0x… constants and examples to be lower case.

Grüße, Carsten