Re: [Cbor] AD review of draft-ietf-cbor-file-magic-10

Carsten Bormann <cabo@tzi.org> Sat, 02 April 2022 17:45 UTC

Return-Path: <cabo@tzi.org>
X-Original-To: cbor@ietfa.amsl.com
Delivered-To: cbor@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 516393A00E5; Sat, 2 Apr 2022 10:45:58 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.909
X-Spam-Level:
X-Spam-Status: No, score=-1.909 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id pbawPuf9G24I; Sat, 2 Apr 2022 10:45:53 -0700 (PDT)
Received: from gabriel-smtp.zfn.uni-bremen.de (gabriel-smtp.zfn.uni-bremen.de [IPv6:2001:638:708:32::15]) (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 580623A0C00; Sat, 2 Apr 2022 10:45:50 -0700 (PDT)
Received: from [192.168.217.118] (p5089ad4f.dip0.t-ipconnect.de [80.137.173.79]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by gabriel-smtp.zfn.uni-bremen.de (Postfix) with ESMTPSA id 4KW4Fk0sZ6zDCdR; Sat, 2 Apr 2022 19:45:46 +0200 (CEST)
Content-Type: text/plain; charset="utf-8"
Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.120.23.2.7\))
From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <HE1PR07MB4217656476550D97A1ACEA6A98E09@HE1PR07MB4217.eurprd07.prod.outlook.com>
Date: Sat, 02 Apr 2022 19:45:45 +0200
Cc: "draft-ietf-cbor-file-magic@ietf.org" <draft-ietf-cbor-file-magic@ietf.org>, "cbor@ietf.org" <cbor@ietf.org>
X-Mao-Original-Outgoing-Id: 670614345.5552469-01492a1d9368ca3183e6680fb134bb21
Content-Transfer-Encoding: quoted-printable
Message-Id: <47C46EEB-3736-403A-AA39-6989A3BD755A@tzi.org>
References: <HE1PR07MB4217656476550D97A1ACEA6A98E09@HE1PR07MB4217.eurprd07.prod.outlook.com>
To: Francesca Palombini <francesca.palombini@ericsson.com>
X-Mailer: Apple Mail (2.3608.120.23.2.7)
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/HEsGTm2dw6y3CincJuBC7h_zZ_4>
Subject: Re: [Cbor] AD review of draft-ietf-cbor-file-magic-10
X-BeenThere: cbor@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Concise Binary Object Representation \(CBOR\)" <cbor.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/cbor>, <mailto:cbor-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/cbor/>
List-Post: <mailto:cbor@ietf.org>
List-Help: <mailto:cbor-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/cbor>, <mailto:cbor-request@ietf.org?subject=subscribe>
X-List-Received-Date: Sat, 02 Apr 2022 17:45:59 -0000

Hi Francesca,

thank you for this detailed review.

I have created a pull request with changes for the items that I think we could resolve:

https://github.com/cbor-wg/cbor-magic-number/pull/20

A few responses below, with a few questions embedded.

I haven’t done 14 to 19 yet.

Next up will be coordinating with Michael, who apparently didn’t get my messages.

Grüße, Carsten


> 4. ----
>  
>    Examples of CBOR Protocols currently under development include CoSWID
>    [I-D.ietf-sacm-coswid] and EAT [I-D.ietf-rats-eat].  COSE itself
>  
> FP: Please expand acronyms on first use (here and later - see https://www.rfc-editor.org/materials/abbrev.expansion.txt for the list of acronyms that don't need to be expanded, such as ASN.1). Also for later in the text, it would be good to add references when possible (for example, where DER or PKCS1 are mentioned).

One interesting case is PEM, where I cannot find a useful reference (PEM is really the industry term for “multi-line base64 in an envelope that has BEGIN/END and a fuzzy format/purpose indicator”).  It is also an abbreviation that according to the RFC editor needs to be expanded, but it does not mean “Privacy-enhanced mail” in the context of base64-encoded certificates any more.

>  5. -----
>  
>    A major inspiration for this document is observing the mess in
>    certain ASN.1 based systems where most files are PEM encoded; these
>  
> FP: Can we replace "mess" with a less colloquial term such as "confusion"?

My dictionary suggested “disarray” :-)
 
> 6. -----
>  
>    data item defined in [STD94].
>  
>    The term "diagnostic notation" refers to the human-readable notation
>    for CBOR data items defined in Section 8 of [RFC8949] and Appendix G
>  
> FP: (Here and later on in the doc) Please reference consistently either STD94 or RFC8949 - just use one (and fix the reference accordingly).

There is some brokenness about referencing Internet Standards in xml2rfc; I hope the workaround I found survives the RFC editor.
 
> 7. -----
>  
>    16777216) to 0xFFFFFFFF (decimal 4294967295).  It is further
>    suggested to avoid values that have an embedded zero byte in the four
>    bytes of their binary representation (such as 0x12003456).
>  
> FP: Why is it suggested to avoid embedded zero byte?

Some broken implementations use (zero-byte terminated) C strings for magic numbers, so avoiding zero bytes can avoid some pain.

➔
 It is further suggested to avoid values that have an embedded zero byte in
-the four bytes of their binary representation (such as 0x12003456).
+the four bytes of their binary representation (such as 0x12003456), as
+these may confuse implementations that treat the magic number as a C string.

> 9. -----
>  
>    Whether these two tags should be removed for specific further
>    processing is a decision made by the CBOR Protocol specification.
>  
> FP: It is not very clear to me what the sentence adds: this does not help the implementer make that decision, since it does not give some guidance about what could be a reason for removing the tags for processing. It is a "you may do that if you so wish" without any additional information.

I came up with this replacement:

A CBOR Protocol specification may specify the use of two tags only for
specific cases.  For instance, it might use them when storing the
representation in a local file or for Web access, but not when using them
in protocol messages that already provide the necessary context.
 
> 10. -----
>  
> Section 2.2.1.
>  
> FP: (Weak) suggestion to add an informative reference to senml+cbor Content Format IANA registration (so just to the IANA registry).

Right now we don’t have a way to point to specific registrations (I have suggested that to IANA earlier, but it will take some time until procedures are adjusted and it can be decided whether and how to enable that).
So this reference now points to the entire (sub-)registry.
 
> 11. -----
>  
>       0x42_4F_52 ('BOR' in diagnostic notation).
>  
> FP: Could you avoid the underscores? I don't think they are necessary and they are not hex valid syntax (and same comments for other occurrences throughout the doc)

They are certainly not necessary.
Since Ada started this convention in 1979, in many languages (e.g, Perl, Ruby, Rust, Swift, recently Java 7+, Python 3.6, but notably not C++), underscores are part of the valid syntax for numbers:
>> 0x42_4F_52
=> 4345682
>> 0x42_4F_52.to_s(16)
=> "424f52"
I think this is more of a matter of taste.

(So, weak pushback, but if others agree with the comment, the underscores are of course removed quickly.  If only we had thin spaces in RFCs...)

> 12. -----
>  
> 3.  Advice to Protocol Developers
>  
> FP: Really nitpicking here :) But could we use "Protocol Authors", to be consistent with the terminology used elsewhere in the doc?

Actually, "protocol designers" is the most used term in the current text.
We normalized to that now.
 
> 13. -----
>  
>  
>    As an example of where there was a problem with previous security
>    systems, "PEM" format certificate files grew to be able to contain
>    multiple certificates by simple concatenation.  The PKCS1 format
>  
>  
> FP: Can we add an informative reference if readers want to dive deeper into this?

If you mean informative reference to PEM, see above.
If you mean an informative reference to the, er, disarray reported here, there are lots of stackoverflow entries with people confused by this (and by the openssl conversion routines that silently drop all but the first cert).
We certainly can go searching for a better citation than those, but I don’t know whether there actually is a specification we could point to.
 

TODO section follows.


> 14. -----
>  
>    then it may be easier to explain if the Labeled CBOR Sequence format
>    is used.
>  
> FP: I am not sure I understand the meaning of the sentence.
>  
> 15. -----
>  
> Section 3
>  
> FP: This is a very useful section, however I think I haven't understood from it the main advantage from using the tag wrapped vs the labeled CBOR sequence, except the fact that "programs need to be able to skip the first item of the sequence" in order to use the labeled CBOR sequence, which to me does not seem like such a huge requirement. Is that another way to say "programs that do not support CBOR sequences"? Then it would be good to explicitely state that, IMO.
>  
> 16. -----
>  
> Section 5.1 and 5.2
>  
> FP: Please add that this is the CBOR Tags registry. Also please add a note to IANA that the existing registration should be updated to reference this document and update the fields as defined.
>  
> Additional comment: I think it is better to described the Data Item as "byte string" as for the current registration, rather than update to "tagged byte string" - that is not a CBOR Data Item per se.
>  
> 17. -----
>  
>    Semantics:
>       for each tag number NNNNNNNN, the representation of content-format
>       (RFC7252) NNNNNNNN-1668546560
>  
> FP: I think this should get a reference to https://www.iana.org/assignments/core-parameters/core-parameters.xhtml#content-formats rather than RFC7252. I also wonder how this is going to look like in the registry, but I guess IANA will tell us.
>  
> Also it is a bit confusing to use "NNNNNNNN" and not just "N" (as I read it as a decimal, and 8xN doesn't really give me any more information about the number).
>  
> 18. -----
>  
> Appendix A
>  
> FP: I have some vague memory of this being discussed during interims - so please remind me if I have forgotten, but why is this an appendix rather than in the document itself?
>  
> 19. -----
>  
> Appendix C
>  
> FP: I think a bit more text around the motivation and need for such a tag would be useful.