Re: [Cbor] [COSE] CBOR magic number, file format and tags

David Waite <david@alkaline-solutions.com> Thu, 21 January 2021 01:30 UTC

From: David Waite <david@alkaline-solutions.com>
Date: Wed, 20 Jan 2021 18:30:41 -0700
Message-Id: <93533744-E76C-42C6-A0C3-F95708727A0A@alkaline-solutions.com>
References: <10306.1611186961@localhost>
Cc: cbor@ietf.org, cose <cose@ietf.org>
In-Reply-To: <10306.1611186961@localhost>
To: Michael Richardson <mcr+ietf@sandelman.ca>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/Xu0Qm2dnWlRl-MrQpiW1KVWLkWk>
Subject: Re: [Cbor] [COSE] CBOR magic number, file format and tags

There are a lot of ways to go here. For instance, it could be a tag indicating a two-element array, the first element being an integer or a MIME type.

-DW

> On Jan 20, 2021, at 4:56 PM, Michael Richardson <mcr+ietf@sandelman.ca> wrote:
> 
> 
> Hi, I was thinking about this yesterday too, and after the discussion this
> morning at COSE, I wrote:
> 
>         https://datatracker.ietf.org/doc/draft-richardson-cbor-file-magic/
> 
> which is at:
>         https://github.com/mcr/cbor-magic-number
> 
> 
> # Introduction
> 
> Since very early in computing, operating systems have sought ways to mark
> which files could be processed by which programs.
> 
> For instance, the Unix file(1) command, which has existed since 1973
> ({{file}}), has been able to identify many file formats for decades.
> 
> ...
> 
> As CBOR becomes a more and more common encoding for artifacts, identifying
> them merely as CBOR is probably not useful.
> 
> This document provides a way to encode a magic number into the beginning of a CBOR format file.
> Two options are presented, with the intention of standardizing only one.
> 
> These proposals are invasive to how CBOR protocols are written to disk, but in both cases, the
> proposed envelope does not require that the tag be transferred on the wire.
> 
> Some protocols may benefit from having such a magic number on the wire if they
> are presently using a different (legacy) encoding scheme, and need to determine
> before invoking a CBOR decoder whether the sender is using the legacy scheme or the new CBOR scheme.
> 
> # Requirements for a Magic Number
> 
> A magic number is ideally a unique fingerprint, present in the first 4 or 8 bytes of the file,
> which does not change when the content changes, and does not depend upon the length of the file.
> 
> Less ideal solutions have a pattern that needs to be matched, but in which some bytes need to be ignored.
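
To make the difference between the two kinds of match concrete, here is a rough sketch in Python. The function name `matches_magic` and the `mask` parameter are made up for illustration and do not come from the draft. With no mask it is the "ideal" fixed-prefix comparison; with a mask, a zero mask byte means "ignore this position", which is the "less ideal" case.

```python
from typing import Optional


def matches_magic(data: bytes, pattern: bytes, mask: Optional[bytes] = None) -> bool:
    """Return True if data starts with pattern, ignoring bits cleared in mask."""
    if len(data) < len(pattern):
        return False
    if mask is None:
        # Ideal case: a fixed fingerprint at the start of the file.
        return data[:len(pattern)] == pattern
    if len(mask) != len(pattern):
        raise ValueError("mask and pattern must have the same length")
    # Less ideal case: compare only the bits selected by the mask.
    return all((d & m) == (p & m) for d, p, m in zip(data, pattern, mask))


# Exact match against a six-byte fingerprint.
print(matches_magic(bytes.fromhex("821a87654321a0"), bytes.fromhex("821a87654321")))

# Masked match: the second byte is ignored (hypothetical pattern).
print(matches_magic(bytes.fromhex("82ff87"), bytes.fromhex("820087"), bytes.fromhex("ff00ff")))
```

Tools such as file(1) can handle both forms, but the unmasked form is simpler to register and to test.
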
> 
> # Proposal One
> 
> This proposal uses a CBOR Array of size two.
> The first byte is therefore 0b100_00010 (0x82).
> 
> Array element number one is a CBOR integer in the range 0x80000000 to 0xffffffff.
> This number is the magic number described below in {{magictable}}.
> 
> For a magic number 0x87654321, this results in a six-byte sequence:
> 
> ~~~~
>  0b100_00010 0b000_11010 0x87 0x65 0x43 0x21
> ~~~~
> 
> Array element number two is whatever the original CBOR content is supposed to be.
> Due to the array construct with known size, there is no further syntax required.
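
As a sanity check on the byte layout above, a minimal sketch in Python using only the standard library. The name `wrap_proposal_one` is hypothetical, and `content` is assumed to be a single CBOR data item that has already been encoded by some other means.

```python
import struct


def wrap_proposal_one(magic: int, content: bytes) -> bytes:
    """Prefix an already-encoded CBOR item with array(2) and a uint32 magic."""
    if not 0x80000000 <= magic <= 0xFFFFFFFF:
        raise ValueError("magic must be in the range 0x80000000..0xffffffff")
    # 0x82 = major type 4 (array), length 2
    # 0x1a = major type 0 (unsigned integer), 4-byte argument follows
    return bytes([0x82, 0x1A]) + struct.pack(">I", magic) + content


# 0xa0 is an empty CBOR map, standing in for the original content.
framed = wrap_proposal_one(0x87654321, bytes([0xA0]))
print(framed.hex())  # 821a87654321a0
```

Restricting the magic to 0x80000000 and above keeps the integer in its 4-byte form under preferred serialization, so the first six bytes stay fixed regardless of the content that follows.
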
> 
> # Proposal Two
> 
> This proposal uses a CBOR Sequence {{!RFC8742}}.
> 
> The first item in the sequence is a CBOR integer in the range 0x80000000 to 0xffffffff.
> This number is the magic number described below in {{magictable}}.
> 
> For a magic number 0x87653412, this results in a five-byte sequence:
> 
> ~~~~
>  0b000_11010 0x87 0x65 0x34 0x12
> ~~~~
> 
> This is followed by one or more CBOR data items of whatever type was intended.
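
The sequence variant can be sketched the same way. Again this is only an illustration: `wrap_proposal_two` and `strip_proposal_two` are hypothetical names, and each element of `items` is assumed to be an already-encoded CBOR data item.

```python
import struct


def _prefix(magic: int) -> bytes:
    if not 0x80000000 <= magic <= 0xFFFFFFFF:
        raise ValueError("magic must be in the range 0x80000000..0xffffffff")
    # 0x1a = major type 0 (unsigned integer), 4-byte argument follows
    return b"\x1a" + struct.pack(">I", magic)


def wrap_proposal_two(magic: int, *items: bytes) -> bytes:
    """Build a CBOR sequence: the magic integer, then the encoded items."""
    if not items:
        raise ValueError("at least one CBOR data item must follow the magic")
    return _prefix(magic) + b"".join(items)


def strip_proposal_two(data: bytes, magic: int) -> bytes:
    """Check the five-byte prefix and return the remaining sequence bytes."""
    prefix = _prefix(magic)
    if not data.startswith(prefix):
        raise ValueError("magic number not found at start of data")
    return data[len(prefix):]


# 0xa0 is an empty map and 0x01 the integer 1, standing in for real content.
blob = wrap_proposal_two(0x87653412, b"\xa0", b"\x01")
print(blob.hex())  # 1a87653412a001
```

Since a CBOR sequence is plain concatenation, removing the magic before sending on the wire amounts to dropping the first five bytes, with no re-encoding of what follows.
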
> 
> ... and then some variations.
> 
> (I probably had more important work I should have been doing)
> 
> --
> Michael Richardson <mcr+IETF@sandelman.ca>   . o O ( IPv6 IøT consulting )
>           Sandelman Software Works Inc, Ottawa and Worldwide