Re: [Cbor] [COSE] CBOR magic number, file format and tags

Carsten Bormann <cabo@tzi.org> Thu, 21 January 2021 03:31 UTC

From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <YAjkmwsdqw0P+gA1@meili.valhalla.31bits.net>
Date: Thu, 21 Jan 2021 04:31:19 +0100
Cc: Michael Richardson <mcr+ietf@sandelman.ca>, cbor@ietf.org, cose <cose@ietf.org>
Message-Id: <4192413D-0D60-4AFB-8897-FE2A09780E83@tzi.org>
References: <3C77CB5D-6AEA-4D70-96A2-3826DB8DAB18@island-resort.com> <10306.1611186961@localhost> <YAjT3j4cwvnLR4AA@meili.valhalla.31bits.net> <14857.1611195109@localhost> <YAjkmwsdqw0P+gA1@meili.valhalla.31bits.net>
To: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/Ap16ktmMzPOlrcqYFpGHToz2AJk>

Challenge accepted.
CBOR diagnostic notation is quite versatile; you just have to tell cbor2diag.rb what you want.

cbor2diag.rb -t: bytes_as_text, i.e., use text form for bytes if possible.

$ printf cbor | cbor2diag.rb -t
"bor"
$ printf CBOR | cbor2diag.rb -t
'BOR'

Oh, and, BTW:

cbor2diag.rb -e: try_decode_embedded, i.e., try embedded CBOR.

$ printf CBOR | cbor2diag.rb -te
<< 'OR' >>
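
To see why these inputs decode the way they do, here is a minimal sketch in Python (standard library only; the helper names decode_head and decode_string are made up for illustration and are not part of cbor2diag.rb or any CBOR library). It relies only on the CBOR head layout: the top three bits of the initial byte are the major type, the low five bits the additional information. It handles definite-length strings and nothing else.

```python
def decode_head(data: bytes):
    """Return (major_type, argument, head_length) for the CBOR head at data[0]."""
    initial = data[0]
    major, info = initial >> 5, initial & 0x1F
    if info < 24:
        # The argument is stored directly in the initial byte.
        return major, info, 1
    if info in (24, 25, 26, 27):
        n = 1 << (info - 24)  # 1, 2, 4 or 8 argument bytes follow
        return major, int.from_bytes(data[1:1 + n], "big"), 1 + n
    raise ValueError("indefinite-length or reserved additional info not handled")


def decode_string(data: bytes):
    """Decode one definite-length byte string (major 2) or text string (major 3)."""
    major, length, head_len = decode_head(data)
    payload = data[head_len:head_len + length]
    if major == 2:
        return payload
    if major == 3:
        return payload.decode("utf-8")
    raise ValueError("not a byte or text string")


print(decode_string(b"cbor"))                 # 'c' = 0x63: text string, length 3
print(decode_string(b"CBOR"))                 # 'C' = 0x43: byte string, length 3
print(decode_string(decode_string(b"CBOR")))  # 'B' = 0x42: byte string, length 2
```

The last line mirrors what -e does here: the byte string 'BOR' happens to be well-formed CBOR itself, namely a byte string of length 2 containing 'OR'.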

(Do a “gem update” to get cbor2diag.rb version 0.5.9 for -t.)

Back to the magic number issue:

For CBOR data items, I prefer registering a tag and prefixing the item with that tag's head (*) over prefixing with an array head and a string.  See, we have a magic number registry built right into CBOR...
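
As a rough illustration of using a tag head as a magic number, here is a Python sketch (standard library only). The function names tag_head and looks_like_rains are hypothetical; the tag number is the RAINS tag from the footnote. It only compares leading bytes and does not validate the tagged item.

```python
def tag_head(tag: int) -> bytes:
    """Encode the head of a CBOR tag (major type 6) in its shortest form."""
    if tag < 0:
        raise ValueError("tag numbers are unsigned")
    if tag < 24:
        return bytes([0xC0 | tag])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if tag < 1 << (8 * size):
            return bytes([0xC0 | info]) + tag.to_bytes(size, "big")
    raise ValueError("tag number does not fit in 64 bits")


# Tag 15309736 is 0x00E99BA8, so the head is DA 00 E9 9B A8.
RAINS_MAGIC = tag_head(15309736)


def looks_like_rains(data: bytes) -> bool:
    """Cheap file-type sniffing: does the data start with the tag head?"""
    return data.startswith(RAINS_MAGIC)


print(RAINS_MAGIC.hex())
print(RAINS_MAGIC[2:].decode("utf-8"))
print(looks_like_rains(RAINS_MAGIC + b"foooooo"))
```

The design point is that the five-byte head doubles as the file magic, so a generic CBOR decoder still sees one ordinary tagged data item.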

For CBOR sequences, obviously prefixing the sequence with a CBOR-encoded (text or byte) string sounds quite good.  Making that optional (to save bytes when you don’t need it) requires that the sequence otherwise cannot start with that kind of string.
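
A sketch of the optional-prefix idea in Python (standard library only). The magic string "MYFMT" and the function split_magic are invented for this example; nothing here is a registered format.

```python
# Hypothetical magic: the CBOR text string "MYFMT",
# i.e. head 0x65 (major type 3, length 5) followed by the five characters.
MAGIC = bytes([0x60 | 5]) + b"MYFMT"


def split_magic(data: bytes):
    """Return (had_magic, remaining_sequence_bytes)."""
    if data.startswith(MAGIC):
        return True, data[len(MAGIC):]
    return False, data


print(split_magic(MAGIC + b"\x01\x02"))  # prefix present, sequence of 1, 2
print(split_magic(b"\x01\x02"))          # prefix omitted
```

This is exactly where the constraint bites: an unprefixed sequence whose first real item is the text string "MYFMT" would be indistinguishable from a prefixed one, so the format has to rule that out.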

Grüße, Carsten

(*) E.g., tag 15309736 for the RAINS protocol:

>> CBOR::Tagged.new(15309736, "oooooo").to_cbor
=> "\xDA\x00\xE9\x9B\xA8foooooo"
>> CBOR::Tagged.new(15309736, "oooooo").to_cbor[2..].force_encoding(Encoding::UTF_8)
=> "雨foooooo"

Oh, for those who don’t speak Chinese:

$ trans 雨
雨
(Yǔ)

rain

Definitions of 雨
[ 简体中文 -> English ]

noun
    rain
        雨, 雨天, 霅

adjective
    rainy
        雨, 多雨的

雨
    rain


> On 2021-01-21, at 03:19, Josef 'Jeff' Sipek <jeffpc@josefsipek.net> wrote:
> 
> On Wed, Jan 20, 2021 at 21:11:49 -0500, Michael Richardson wrote:
>> 
>> Josef 'Jeff' Sipek <jeffpc@josefsipek.net> wrote:
>>> Recently, I was thinking about the fun fact that the Unicode string
>>> "bor" and the byte string "BOR" end up getting encoded as "cbor" and
>>> "CBOR", respectively.
>> 
>> obiwan-[work/richardson/cbor-file-magic](2.6.6) mcr 10355 %echo -n cbor | cbor2diag.rb
>> "bor"
>> 
>> obiwan-[work/richardson/cbor-file-magic](2.6.6) mcr 10356 %echo -n CBOR | cbor2diag.rb
>> h'424F52'
> 
> Which is hex for 'B' 'O' 'R'. :)
> 
> I'm guessing cbor2diag.rb sees the raw data and hexdumps it to be helpful.
> Here's what python has to say:
> 
> In [1]: import cbor
> 
> In [2]: cbor.loads(b'cbor')
> Out[2]: 'bor'
> 
> In [3]: cbor.loads(b'CBOR')
> Out[3]: b'BOR'
> 
> Note the difference in return type - the first is a unicode string, the
> second is a byte string.
> 
> Jeff.