[Cbor] CBOR magic number, file format and tags

Michael Richardson <mcr+ietf@sandelman.ca> Wed, 20 January 2021 23:56 UTC

From: Michael Richardson <mcr+ietf@sandelman.ca>
To: cbor@ietf.org
CC: cose <cose@ietf.org>
Date: Wed, 20 Jan 2021 18:56:01 -0500
Subject: [Cbor] CBOR magic number, file format and tags
Hi, I was thinking about this yesterday too, and after the discussion this
morning at COSE, I wrote:


which is at:

# Introduction

Since very early in computing, operating systems have sought ways to mark
which files could be processed by which programs.

For instance, the Unix file(1) command, which has existed since 1973
({{file}}), has been able to identify many file formats for decades.


As CBOR becomes a more and more common encoding for artifacts, identifying
them merely as CBOR is probably not useful.

This document provides a way to encode a magic number into the beginning of a CBOR format file.
Two options are presented, with the intention of standardizing only one.

These proposals are invasive to how CBOR protocols are written to disk, but in both cases, the
proposed envelope does not require that the tag be transferred on the wire.

Some protocols may benefit from having such a magic number on the wire if they
are presently using a different (legacy) encoding scheme and need to determine,
before invoking a CBOR decoder, whether the sender is using the legacy scheme or the new CBOR scheme.

# Requirements for a Magic Number

A magic number is ideally a unique fingerprint, present in the first 4 or 8 bytes of the file,
which does not change when the content changes, and does not depend upon the length of the file.

Less ideal solutions have a pattern that needs to be matched, but in which some bytes need to be ignored.
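
The difference between the two kinds of match can be sketched as follows. This is only an illustration: the function names `matches_fixed` and `matches_masked`, and the idea of expressing the ignored bytes as a mask, are assumptions made for the example and are not part of either proposal.

```python
def matches_fixed(data: bytes, magic: bytes) -> bool:
    """Ideal case: the magic number is a fixed prefix of the file.

    The result does not depend on the rest of the content or its length.
    """
    return data[:len(magic)] == magic


def matches_masked(data: bytes, pattern: bytes, mask: bytes) -> bool:
    """Less ideal case: some bytes of the pattern must be ignored.

    A mask byte of 0x00 means "ignore this position"; 0xFF means
    "this position must match exactly". (Hypothetical convention.)
    """
    if len(pattern) != len(mask):
        raise ValueError("pattern and mask must have the same length")
    if len(data) < len(pattern):
        return False
    return all((d & m) == (p & m) for d, p, m in zip(data, pattern, mask))
```

A fixed prefix can be checked with a single comparison, whereas a masked pattern needs a per-byte test and a mask that every tool must agree on.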

# Proposal One

This proposal uses a CBOR Array of size two.
The first byte is therefore 0b100_00010 (0x82).

Array element number one is a CBOR integer in the range 0x80000000 to 0xffffffff.
This number is the magic number described below in {{magictable}}.

For a magic number 0x87654321, this results in a six-byte sequence:

  0b100_00010 0b000_11010 0x87 0x65 0x43 0x21

Array element number two is whatever the original CBOR content is supposed to be.
Due to the array construct with known size, there is no further syntax required.
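
A minimal sketch of Proposal One in Python, working directly on bytes so that no CBOR library is needed. The function names `wrap_proposal_one` and `sniff_proposal_one` are hypothetical, and the sketch assumes the original content has already been encoded as a single CBOR data item.

```python
import struct
from typing import Optional

ARRAY_OF_TWO = 0x82  # 0b100_00010: major type 4 (array), length 2
UINT32_HEAD = 0x1A   # 0b000_11010: major type 0 (unsigned), 4-byte value follows


def wrap_proposal_one(magic: int, encoded_item: bytes) -> bytes:
    """Prefix one already-encoded CBOR item with the six-byte header."""
    if not 0x80000000 <= magic <= 0xFFFFFFFF:
        raise ValueError("magic must be in the range 0x80000000 to 0xffffffff")
    # ">BBI" is big-endian with no padding: 1 + 1 + 4 = 6 bytes.
    return struct.pack(">BBI", ARRAY_OF_TWO, UINT32_HEAD, magic) + encoded_item


def sniff_proposal_one(data: bytes) -> Optional[int]:
    """Return the magic number if the six-byte header is present, else None."""
    if len(data) < 6 or data[0] != ARRAY_OF_TWO or data[1] != UINT32_HEAD:
        return None
    (magic,) = struct.unpack(">I", data[2:6])
    return magic if magic >= 0x80000000 else None
```

With the magic number 0x87654321 and an empty map (0xa0) as the content, `wrap_proposal_one` yields the bytes `82 1a 87 65 43 21 a0`, and `sniff_proposal_one` recovers 0x87654321 from them without decoding the content.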

# Proposal Two

This proposal uses a CBOR Sequence {{!RFC8742}}.

The first item in the sequence is a CBOR integer in the range 0x80000000 to 0xffffffff.
This number is the magic number described below in {{magictable}}.

For a magic number 0x87653412, this results in a five-byte sequence:

  0b000_11010 0x87 0x65 0x34 0x12

This is followed by one or more CBOR data items of whatever type was intended.
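
A comparable sketch for Proposal Two, again on raw bytes. The names `wrap_proposal_two` and `split_proposal_two` are hypothetical, and the sketch assumes the remaining items of the sequence are already encoded and simply concatenated.

```python
import struct
from typing import Optional, Tuple

UINT32_HEAD = 0x1A  # 0b000_11010: major type 0 (unsigned), 4-byte value follows


def wrap_proposal_two(magic: int, encoded_items: bytes) -> bytes:
    """Prepend the magic number as the first item of a CBOR Sequence."""
    if not 0x80000000 <= magic <= 0xFFFFFFFF:
        raise ValueError("magic must be in the range 0x80000000 to 0xffffffff")
    # ">BI" is big-endian with no padding: 1 + 4 = 5 bytes.
    return struct.pack(">BI", UINT32_HEAD, magic) + encoded_items


def split_proposal_two(data: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (magic, remaining bytes) if the five-byte header is present."""
    if len(data) < 5 or data[0] != UINT32_HEAD:
        return None
    (magic,) = struct.unpack(">I", data[1:5])
    if magic < 0x80000000:
        return None
    return magic, data[5:]
```

Because a CBOR Sequence is plain concatenation, removing the header is a matter of dropping the first five bytes; the remaining bytes can be handed to a decoder unchanged.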

... and then some variations.

(I probably had more important work I should have been doing)

