Re: [Cbor] file-magic: "What to pick" text?

Michael Richardson <mcr+ietf@sandelman.ca> Wed, 23 June 2021 18:42 UTC

From: Michael Richardson <mcr+ietf@sandelman.ca>
To: Christian Amsüss <christian@amsuess.com>, cbor@ietf.org
In-reply-to: <YNNdHguTxEY32yEa@hephaistos.amsuess.com>
References: <YNNdHguTxEY32yEa@hephaistos.amsuess.com>
Comments: In-reply-to Christian Amsüss <christian@amsuess.com> message dated "Wed, 23 Jun 2021 18:11:10 +0200."
Date: Wed, 23 Jun 2021 14:42:41 -0400
Message-ID: <400788.1624473761@dooku>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/-VX54BuvNmCyXhhAKAn8cpyFs6c>
Subject: Re: [Cbor] file-magic: "What to pick" text?

Christian Amsüss <christian@amsuess.com> wrote:
    > Listening to today's COSE discussion, I think that with file-magic, it'd
    > help to include some guidance on what to pick when developing a CBOR
    > based format, especially (but not only) if the content-format proposal
    > gets included:

    > * Why would an application pick a wrapped or a sequence tag?

So, how about a new section "Advice to Protocol Developers"?

    > * Are there good reasons to register a content format (for inline use)
    > *and* a tag for the above? Under which conditions should which be
    > used?

By content-format, do you mean a MIME-TYPE?
I think that there is a good reason to have a MIME-TYPE: even if a writer
thinks that their CBOR is perfect and will never be revised into something
that is not CBOR, the MIME-TYPE can go in an Accept: header to ask for that type.
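As a minimal sketch of that use, a client built on Python's standard `urllib.request` names the type in its `Accept:` header. The media type `application/example+cbor` and the URL are hypothetical placeholders, and no request is actually sent.

```python
import urllib.request

# Hypothetical media type for a CBOR-based format; a real protocol
# would register its own.
MEDIA_TYPE = "application/example+cbor"


def build_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that asks for MEDIA_TYPE."""
    return urllib.request.Request(url, headers={"Accept": MEDIA_TYPE})


# Placeholder URL, for illustration only.
req = build_request("https://example.com/credential")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```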

https://github.com/cbor-wg/cbor-magic-number/pull/5

# Advice to Protocol Developers

This document introduces a choice between two formats: the CBOR Tag Sequence and the CBOR Tag Wrapped format.
Which should a protocol designer use?

This discussion assumes that an object is stored in a file, perhaps one named by a system operator in a configuration file.

Examples include a private key used in COSE operations, a public key or certificate in C509 or another CBOR format, a recorded sensor reading stored for later transmission, and a COVID vaccination certificate that needs to be displayed as a QR code.

Both the CBOR Tag Sequence and the CBOR Tag Wrapped format can be trivially removed by an application before it sends the CBOR content out on the wire.

The CBOR Tag Sequence is slightly easier to remove: in most cases a CBOR parser returns it as a separate unit and then returns the actual CBOR item, which could be anything at all, including CBOR tags that *do* need to be sent on the wire.

On the other hand, having the CBOR Tag Sequence in the file requires that every program that examines the file be able to skip what appears to be an empty CBOR item.
Programs that do not expect the CBOR Tag Sequence but would still operate without a problem include any program that already processes the file as a CBOR Sequence.
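To make the two paragraphs above concrete, here is a minimal Python sketch of the CBOR Tag Sequence case. It assumes a header item made of tag 55800 wrapping a hypothetical four-byte protocol tag (`PROTO_TAG`) around the byte string "BOR"; these numbers and all helper names are illustrative assumptions, not taken from this message. A sequence-aware reader sees the header as one more item, and stripping it before transmission amounts to dropping the first item. The tiny decoder handles definite-length items only.

```python
# Minimal CBOR helpers: definite-length items only (a sketch, not a full decoder).

def head(major: int, value: int) -> bytes:
    """Encode a CBOR initial byte plus its argument."""
    if value < 24:
        return bytes([(major << 5) | value])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * size):
            return bytes([(major << 5) | ai]) + value.to_bytes(size, "big")
    raise ValueError("argument too large")


def read_head(buf: bytes, pos: int) -> tuple[int, int, int]:
    """Decode the head at pos; return (major type, argument, next offset)."""
    major, ai = buf[pos] >> 5, buf[pos] & 0x1F
    pos += 1
    if ai < 24:
        return major, ai, pos
    if ai > 27:
        raise ValueError("indefinite length or reserved value: not handled here")
    size = 1 << (ai - 24)
    return major, int.from_bytes(buf[pos:pos + size], "big"), pos + size


def item_end(buf: bytes, pos: int) -> int:
    """Return the offset just past the data item that starts at pos."""
    major, arg, pos = read_head(buf, pos)
    if major in (2, 3):                  # byte/text string: skip its content
        return pos + arg
    if major in (4, 5):                  # array: arg elements; map: arg key/value pairs
        for _ in range(arg if major == 4 else 2 * arg):
            pos = item_end(buf, pos)
        return pos
    if major == 6:                       # tag: skip the enclosed item
        return item_end(buf, pos)
    return pos                           # integers, simple values, floats


def split_sequence(buf: bytes) -> list[bytes]:
    """Split a CBOR Sequence into the encoded bytes of each top-level item."""
    items, pos = [], 0
    while pos < len(buf):
        end = item_end(buf, pos)
        if end > len(buf):
            raise ValueError("truncated item")
        items.append(buf[pos:end])
        pos = end
    return items


PROTO_TAG = 0x45584D50  # hypothetical protocol tag ("EXMP" in ASCII)
# Assumed header layout: 55800(PROTO_TAG(h'424F52')), 12 bytes on disk.
HEADER = head(6, 55800) + head(6, PROTO_TAG) + head(2, 3) + b"BOR"

payload = head(4, 2) + head(0, 1) + head(0, 2)   # the array [1, 2]
on_disk = HEADER + payload

items = split_sequence(on_disk)                  # the header is just one more item
on_wire = b"".join(items[1:] if items and items[0] == HEADER else items)
print(on_disk.hex(), "->", on_wire.hex())
```

Note that the byte string's head (0x43) happens to be ASCII "C", so under this assumed layout the header ends in the readable bytes "CBOR".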

As an example of a problem with earlier security systems, "PEM" format certificate files grew the ability to contain multiple certificates by simple concatenation.
The PKCS1 format could also contain a private key object followed by one or more certificate objects, but only in PEM format.
In binary DER format, concatenated certificates were not compatible with most programs.

The CBOR Tag Wrapped format is easier to retrofit to an existing protocol whose on-disk format already exists and cannot be changed.
An existing program is expected to trivially ignore these new tags when reading CBOR from disk.
But a naive program might then also transmit them across the network.
Removing the CBOR Tag Wrapped format requires knowledge of the two tags involved, because any other tags present might be needed and must be left in place.
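A minimal Python sketch of that last point follows. It assumes the wrapper is tag 55799 (self-described CBOR) around a hypothetical four-byte protocol tag `PROTO_TAG`; the helper names `wrap` and `unwrap` are likewise illustrative. The unwrapping function checks for exactly those two tags and strips only them, so a tag that belongs to the data itself (here tag 1, an epoch-based date/time) stays in place.

```python
SELF_DESCRIBED = 55799   # self-described CBOR tag, encoded as d9 d9 f7
PROTO_TAG = 0x45584D50   # hypothetical protocol-specific tag (assumption)


def head(major: int, value: int) -> bytes:
    """Encode a CBOR initial byte plus its argument."""
    if value < 24:
        return bytes([(major << 5) | value])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * size):
            return bytes([(major << 5) | ai]) + value.to_bytes(size, "big")
    raise ValueError("argument too large")


def read_head(buf: bytes, pos: int) -> tuple[int, int, int]:
    """Decode the head at pos; return (major type, argument, next offset)."""
    major, ai = buf[pos] >> 5, buf[pos] & 0x1F
    pos += 1
    if ai < 24:
        return major, ai, pos
    if ai > 27:
        raise ValueError("indefinite length or reserved value: not handled here")
    size = 1 << (ai - 24)
    return major, int.from_bytes(buf[pos:pos + size], "big"), pos + size


def wrap(item: bytes, proto_tag: int = PROTO_TAG) -> bytes:
    """On-disk form 55799(proto_tag(item)); 8 bytes longer for a 4-byte proto_tag."""
    return head(6, SELF_DESCRIBED) + head(6, proto_tag) + item


def unwrap(buf: bytes, proto_tag: int = PROTO_TAG) -> bytes:
    """Strip exactly the two known outer tags; leave any other tags alone."""
    pos = 0
    for expected in (SELF_DESCRIBED, proto_tag):
        major, arg, pos = read_head(buf, pos)
        if major != 6 or arg != expected:
            raise ValueError("not in the expected CBOR Tag Wrapped format")
    return buf[pos:]


# A data item that itself starts with a tag: 1(1624473761), an epoch date/time.
inner = head(6, 1) + head(0, 1624473761)
on_disk = wrap(inner)
print(on_disk.hex(), "->", unwrap(on_disk).hex())
```

A decoder that merely skipped every leading tag would also discard tag 1 here, which is why removal needs to know which two tags to expect.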

Here are some considerations:

## Is the on-wire format new?

If the on-wire format is new, then it could be specified to use the CBOR Tag Wrapped format, provided the extra eight bytes are not a problem.
The disk format is then identical to the on-wire format.

If the eight bytes are a problem (and they usually are if CBOR is being considered), then the CBOR Tag Sequence format should be adopted for on-disk storage.
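The eight bytes can be accounted for with a short Python sketch. It assumes the wrapper is tag 55799 (self-described CBOR) around a hypothetical protocol tag `PROTO_TAG` whose number needs a four-byte argument. For comparison, it also sizes an assumed Tag Sequence header (tag 55800 around `PROTO_TAG` around the byte string "BOR"). That header costs bytes only on disk, because it is removed before transmission.

```python
PROTO_TAG = 0x45584D50  # hypothetical protocol tag needing a 4-byte argument


def head_len(value: int) -> int:
    """Size in bytes of a CBOR head whose argument is value."""
    if value < 24:
        return 1
    if value < 1 << 8:
        return 2
    if value < 1 << 16:
        return 3
    if value < 1 << 32:
        return 5
    return 9


# CBOR Tag Wrapped: 55799(PROTO_TAG(item)); this prefix also goes on the wire.
wrapped_overhead = head_len(55799) + head_len(PROTO_TAG)

# Assumed CBOR Tag Sequence header: 55800(PROTO_TAG(h'424F52')); disk only.
sequence_header = head_len(55800) + head_len(PROTO_TAG) + head_len(3) + len(b"BOR")

print("wrapped, per message on the wire:", wrapped_overhead)
print("sequence header, per file on disk:", sequence_header)
```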

## Can many items be trivially concatenated?

If the programs that read the contents of the file already expect to process all of the items in the file (not just the first), then the CBOR Tag Sequence format may be easily retrofitted.

Programs that have not yet been updated may throw errors or warnings on encountering the CBOR Tag Sequence, but this may not be a problem.
If it is, then the CBOR Tag Wrapped format should be considered instead.

If only one item is ever expected in the file, then the use of the CBOR Tag Sequence may present an implementation hurdle to programs that previously just read a single value and used it.
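Both situations appear in the minimal Python sketch below. It assumes a header item made of tag 55800 around a hypothetical four-byte `PROTO_TAG` around the byte string "BOR", and it handles definite-length items only. The reader names are illustrative. A reader that already loops over every item only needs to recognize and skip the header. A legacy reader that decodes just the first item now gets the header instead of its data.

```python
def head(major: int, value: int) -> bytes:
    """Encode a CBOR initial byte plus its argument."""
    if value < 24:
        return bytes([(major << 5) | value])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * size):
            return bytes([(major << 5) | ai]) + value.to_bytes(size, "big")
    raise ValueError("argument too large")


def read_head(buf: bytes, pos: int) -> tuple[int, int, int]:
    """Decode the head at pos; return (major type, argument, next offset)."""
    major, ai = buf[pos] >> 5, buf[pos] & 0x1F
    pos += 1
    if ai < 24:
        return major, ai, pos
    if ai > 27:
        raise ValueError("indefinite length or reserved value: not handled here")
    size = 1 << (ai - 24)
    return major, int.from_bytes(buf[pos:pos + size], "big"), pos + size


def item_end(buf: bytes, pos: int) -> int:
    """Return the offset just past the data item that starts at pos."""
    major, arg, pos = read_head(buf, pos)
    if major in (2, 3):                  # byte/text string: skip its content
        return pos + arg
    if major in (4, 5):                  # array/map: skip elements or key/value pairs
        for _ in range(arg if major == 4 else 2 * arg):
            pos = item_end(buf, pos)
        return pos
    if major == 6:                       # tag: skip the enclosed item
        return item_end(buf, pos)
    return pos                           # integers, simple values, floats


PROTO_TAG = 0x45584D50  # hypothetical protocol tag
HEADER = head(6, 55800) + head(6, PROTO_TAG) + head(2, 3) + b"BOR"  # assumed layout


def read_all_items(buf: bytes) -> list[bytes]:
    """Multi-item reader, updated to skip the header wherever it appears."""
    items, pos = [], 0
    while pos < len(buf):
        end = item_end(buf, pos)
        if buf[pos:end] != HEADER:
            items.append(buf[pos:end])
        pos = end
    return items


def read_first_item(buf: bytes) -> bytes:
    """Legacy single-value reader: decodes only the first item in the file."""
    return buf[:item_end(buf, 0)]


value = head(3, 5) + b"hello"                  # the text string "hello"
old_file, new_file = value, HEADER + value

print(read_all_items(new_file) == [value])     # True: header skipped
print(read_first_item(old_file) == value)      # True: worked before the header
print(read_first_item(new_file) == value)      # False: it got the header instead
```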

## Are there tags at the start?

If the protocol expects to use other tag values at the top level, then the on-disk layout may be easier to explain if the CBOR Tag Sequence format is used, because the protocol's own tags then remain at the start of its data item.
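As an illustration, suppose the protocol's data item is itself tagged at the top level. Here it is a COSE_Sign1 structure under tag 18, with empty placeholder fields rather than a real signature. The minimal Python sketch below prints both on-disk layouts and relies on two assumptions. The wrapped form is taken to be tag 55799 around a hypothetical four-byte `PROTO_TAG`. The Tag Sequence is taken to use a separate header item of tag 55800 around `PROTO_TAG` around "BOR". In the sequence form, tag 18 is still the first byte of its own item, exactly as it would appear on the wire. In the wrapped form, it is nested two tags deep.

```python
def head(major: int, value: int) -> bytes:
    """Encode a CBOR initial byte plus its argument."""
    if value < 24:
        return bytes([(major << 5) | value])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * size):
            return bytes([(major << 5) | ai]) + value.to_bytes(size, "big")
    raise ValueError("argument too large")


PROTO_TAG = 0x45584D50  # hypothetical protocol tag

# Placeholder COSE_Sign1: 18([h'' protected, {} unprotected, nil payload, h'' signature])
cose = head(6, 18) + head(4, 4) + head(2, 0) + head(5, 0) + bytes([0xF6]) + head(2, 0)

# CBOR Tag Sequence (assumed layout): separate header item, then the item as on the wire.
header = head(6, 55800) + head(6, PROTO_TAG) + head(2, 3) + b"BOR"
sequence_file = header + cose

# CBOR Tag Wrapped (assumed layout): the item nested inside two further tags.
wrapped_file = head(6, 55799) + head(6, PROTO_TAG) + cose

print("sequence:", header.hex(), "|", cose.hex())
print("wrapped: ", wrapped_file.hex())
```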


-- 
Michael Richardson <mcr+IETF@sandelman.ca>, Sandelman Software Works
 -= IPv6 IoT consulting =-