Re: [Cbor] [COSE] CBOR magic number, file format and tags

worley@ariadne.com Fri, 12 February 2021 03:27 UTC

From: worley@ariadne.com (Dale R. Worley)
To: Carsten Bormann <cabo@tzi.org>
Cc: mcr@sandelman.ca, cbor@ietf.org, doug@ewellic.org
In-Reply-To: <3DD6CB17-103F-48BF-A4EF-B2AEF1573C93@tzi.org> (cabo@tzi.org)
Sender: worley@ariadne.com (Dale R. Worley)
Date: Thu, 11 Feb 2021 22:27:02 -0500
Message-ID: <87h7miotix.fsf@hobgoblin.ariadne.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/Lwit786EUBatnOe1MNNId-9PuPw>
Subject: Re: [Cbor] [COSE] CBOR magic number, file format and tags

Carsten Bormann <cabo@tzi.org> writes:

>> On 2021-01-23, at 04:14, Dale R. Worley <worley@ariadne.com> wrote:
>> 
>> Here's an alternative.  It is aligned with the magic number tag of CBOR
>> itself (55799) and the CBOR way of doing things.  Specifically, reserve
>> a large range of tags (such as 55800 to 65000, or 100000 to 109999) for
>> use as magic numbers; the CBOR object has the appropriate magic number
>> tag applied to that, and optionally, the CBOR magic number tag 55799
>> applied to that.  That can leave the generic CBOR magic number visible
>> at the start of the file, immediately followed by the bytes of the
>> specific magic number for the object type.
>
> Right.  Something like this would probably make sense as an
> alternative to the primary approach proposed, which is based on CBOR
> sequences.  If you want to keep the magic-numbered file a single data
> item, this is the way to go.
>
> Two issues:
>
> - This should probably be a 1+4-byte tag.  The range you mention leads
> to a zero byte in the magic number.  Not sure if that is a bug or a
> feature.  (It is also way too small.)

I'm confident that zero bytes in magic numbers are OK: "file" lets the
byte values it tests against be specified in hex, and in general "file"
already handles lots of binary formats.  Of course, the range(s) you
allocate for magic-number tags depend on how many distinct magic numbers
you anticipate.  Naively, I expect 10,000 to be sufficient.  OTOH, you
can always define an additional range if you run out.
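The zero-byte question can be checked directly.  In CBOR (RFC 8949,
section 3) a tag head is 0xc0 plus an additional-information value,
followed by 1, 2, 4 or 8 big-endian bytes; tags 256 to 65535 use the
1+2-byte form (0xd9 ...) and tags up to 2^32-1 use the 1+4-byte form
(0xda ...).  The Python sketch below shows the bytes that would lead off
a file under the proposal (55799 outermost, then the type-specific tag)
and counts how many tags in each proposed range put a 0x00 into the
magic number.  cbor_tag_head and has_zero_byte are illustrative helpers,
and 55800 stands in for a hypothetical, unallocated type-specific tag.

```python
def cbor_tag_head(tag: int) -> bytes:
    """Encode the head of a CBOR tag (major type 6), per RFC 8949 section 3."""
    if not 0 <= tag < 2**64:
        raise ValueError("tag number out of range")
    mt = 6 << 5  # major type 6 in the top three bits: 0xc0
    if tag < 24:
        return bytes([mt | tag])  # value fits in the initial byte
    # Additional info 24/25/26/27 means 1/2/4/8 following big-endian bytes.
    for ai, width in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if tag < 1 << (8 * width):
            return bytes([mt | ai]) + tag.to_bytes(width, "big")
    raise AssertionError("unreachable")


def has_zero_byte(tag: int) -> bool:
    """True if the encoded tag head (the would-be magic number) contains 0x00."""
    return 0 in cbor_tag_head(tag)


# Self-described CBOR (tag 55799) encodes as d9 d9 f7.
print(cbor_tag_head(55799).hex())  # d9d9f7
# 55799 outermost, then a HYPOTHETICAL type-specific tag 55800:
# these are the first bytes of the file.
print((cbor_tag_head(55799) + cbor_tag_head(55800)).hex())  # d9d9f7d9d9f8
# Tags in the 100000-109999 range need the 1+4-byte form.
print(cbor_tag_head(100000).hex())  # da000186a0
# How many tags in each proposed range yield a zero byte?
print(sum(has_zero_byte(t) for t in range(55800, 65001)))    # 36
print(sum(has_zero_byte(t) for t in range(100000, 110000)))  # 10000
```

In the 55800 to 65000 range only the 36 tags whose low byte is 0x00 are
affected, while every tag from 100000 to 109999 carries a leading 0x00
in its 4-byte argument.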

> - As I mentioned before, we don't have a good way to assign a purpose
> to a range and then let IANA do the allocation within the range.

I'm not so sure that's important.  As the CBOR Tags registry
demonstrates, you can specify to IANA how registration is to be done in
different tag number ranges; presumably the parties allowed to register
in a given range can be trusted to enforce the desired semantics;
otherwise, other rules would have been specified.

The deficiency of a tag-only approach is that it doesn't allow
"parameterized" types.  A lot of "file" entries discern not only the
overall type of the file but also various additional facts about it.
For example, picking one at random:

     # Digital Symphony data files
     # From: Bernard Jungen (bern8817@euphonynet.be)
     0		string	\x02\x01\x13\x13\x13\x01\x0d\x10	Digital Symphony sound sample (RISC OS),
     >8		byte	x	version %d,
     >9		pstring	x	named "%s",
     >(9.b+19)	byte	=0	8-bit logarithmic
     >(9.b+19)	byte	=1	LZW-compressed linear
     >(9.b+19)	byte	=2	8-bit linear signed
     >(9.b+19)	byte	=3	16-bit linear signed
     >(9.b+19)	byte	=4	SigmaDelta-compressed linear
     >(9.b+19)	byte	=5	SigmaDelta-compressed logarithmic
     >(9.b+19)	byte	>5	unknown format

determines not only that the file is a "Digital Symphony sound sample"
but also its version, its name (the "named" field), and its encoding type.
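For readers less familiar with magic(5) syntax, the entry above can be
hand-translated into a short Python sketch.  It checks the 8 magic bytes
at offset 0, reports the byte at offset 8 as the version, reads the
Pascal string (a length byte followed by that many characters) at offset
9 as the name, and follows the indirect offset (9.b+19), i.e. the value
of the byte at offset 9 plus 19, to the encoding byte.  This is an
illustrative approximation, not how "file" itself is implemented;
describe_dsym and the sample bytes are hypothetical.

```python
from typing import Optional

# Magic bytes at offset 0, copied from the "file" entry above.
DSYM_MAGIC = b"\x02\x01\x13\x13\x13\x01\x0d\x10"

# Encoding byte values and their descriptions, from the same entry.
DSYM_ENCODINGS = {
    0: "8-bit logarithmic",
    1: "LZW-compressed linear",
    2: "8-bit linear signed",
    3: "16-bit linear signed",
    4: "SigmaDelta-compressed linear",
    5: "SigmaDelta-compressed logarithmic",
}


def describe_dsym(data: bytes) -> Optional[str]:
    """Approximate the magic(5) entry: a description, or None if no match."""
    if not data.startswith(DSYM_MAGIC):
        return None
    parts = ["Digital Symphony sound sample (RISC OS)"]
    if len(data) > 8:
        # ">8 byte x": the byte at offset 8 is reported as the version.
        parts.append(f"version {data[8]}")
    if len(data) > 9:
        # ">9 pstring x": length byte at offset 9, then the characters.
        name_len = data[9]
        name = data[10:10 + name_len].decode("latin-1")
        parts.append(f'named "{name}"')
        # ">(9.b+19) byte": indirect offset = (byte at offset 9) + 19.
        enc_offset = data[9] + 19
        if enc_offset < len(data):
            # Values above 5 fall through to "unknown format".
            parts.append(DSYM_ENCODINGS.get(data[enc_offset], "unknown format"))
    return ", ".join(parts)


# HYPOTHETICAL sample: version 3, name "piano", encoding byte at 5 + 19 = 24.
sample = DSYM_MAGIC + bytes([3, 5]) + b"piano" + bytes(9) + bytes([2])
print(describe_dsym(sample))
# Digital Symphony sound sample (RISC OS), version 3, named "piano", 8-bit linear signed
```

The point is that the identification step also yields several parameters
(version, name, encoding), which a bare type tag has no room to carry.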

Simply prefixing a "Digital Symphony CBOR object" with a tag identifying
it as such provides no place to hold such data, whereas making the
magic-number information a sequence makes it easy to store annotations
like these.  OTOH, the interesting parameters of a "Digital Symphony
CBOR object" would presumably be stored at fixed indexes within the base
object itself.
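To make the last point concrete, here is a minimal Python sketch of the
tag-only layout: an optional outer tag 55799, a type-specific
magic-number tag, and a base array whose first three entries hold
version, name and encoding at fixed indexes.  Tag 55800, the field
order, and all helper names are hypothetical illustrations, not
allocations or anything taken from a draft; the encoder and decoder
handle only the small subset of CBOR needed here.

```python
from dataclasses import dataclass
from typing import Any, Tuple

SELF_DESCRIBED_CBOR = 55799  # RFC 8949 self-described CBOR tag
DSYM_MAGIC_TAG = 55800       # HYPOTHETICAL type-specific magic-number tag


@dataclass
class Tagged:
    number: int
    content: Any


def _head(major: int, arg: int) -> bytes:
    """CBOR head: 3-bit major type plus argument (RFC 8949, section 3)."""
    if arg < 24:
        return bytes([major << 5 | arg])
    for ai, width in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < 1 << (8 * width):
            return bytes([major << 5 | ai]) + arg.to_bytes(width, "big")
    raise ValueError("argument too large")


def encode(item: Any) -> bytes:
    """Encode the subset used here: tag, unsigned int, bytes, text, array."""
    if isinstance(item, Tagged):
        return _head(6, item.number) + encode(item.content)
    if isinstance(item, int) and item >= 0:
        return _head(0, item)
    if isinstance(item, bytes):
        return _head(2, len(item)) + item
    if isinstance(item, str):
        raw = item.encode("utf-8")
        return _head(3, len(raw)) + raw
    if isinstance(item, list):
        return _head(4, len(item)) + b"".join(encode(x) for x in item)
    raise TypeError(f"unsupported: {type(item).__name__}")


def decode(data: bytes, pos: int = 0) -> Tuple[Any, int]:
    """Decode one item of the same subset; returns (item, next position)."""
    major, ai = data[pos] >> 5, data[pos] & 0x1F
    pos += 1
    if ai < 24:
        arg = ai
    else:
        width = {24: 1, 25: 2, 26: 4, 27: 8}[ai]  # indefinite lengths unsupported
        arg = int.from_bytes(data[pos:pos + width], "big")
        pos += width
    if major == 0:
        return arg, pos
    if major in (2, 3):
        raw = data[pos:pos + arg]
        return (raw.decode("utf-8") if major == 3 else raw), pos + arg
    if major == 4:
        items = []
        for _ in range(arg):
            item, pos = decode(data, pos)
            items.append(item)
        return items, pos
    if major == 6:
        content, pos = decode(data, pos)
        return Tagged(arg, content), pos
    raise ValueError(f"unsupported major type {major}")


def read_dsym_params(data: bytes) -> dict:
    """Strip the optional 55799 tag, check the type tag, read fixed indexes."""
    item, _ = decode(data)
    if isinstance(item, Tagged) and item.number == SELF_DESCRIBED_CBOR:
        item = item.content
    if not (isinstance(item, Tagged) and item.number == DSYM_MAGIC_TAG):
        raise ValueError("not a (hypothetical) Digital Symphony CBOR object")
    version, name, encoding = item.content[:3]  # HYPOTHETICAL fixed layout
    return {"version": version, "name": name, "encoding": encoding}


blob = encode(Tagged(SELF_DESCRIBED_CBOR,
                     Tagged(DSYM_MAGIC_TAG, [3, "piano", 2, b"\x00\x01"])))
print(blob[:6].hex())          # d9d9f7d9d9f8
print(read_dsym_params(blob))  # {'version': 3, 'name': 'piano', 'encoding': 2}
```

The "magic number" seen by "file" is still just the leading tag bytes;
the parameters are recovered only by decoding the base object, which is
the trade-off against putting annotations in a leading sequence item.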

Dale