Re: [Cbor] Proposal for Deterministic CBOR (dCBOR) discussion at May 17th meeting

Carsten Bormann <cabo@tzi.org> Mon, 08 May 2023 20:09 UTC

To: Christopher Allen <christophera@lifewithalacrity.com>
Cc: cbor@ietf.org, Wolf McNally <wolf@wolfmcnally.com>, Shannon.Appelcline@gmail.com
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/xGJBGtUeJHIa4rY5xIh2ROLr4Gc>

Hi Christopher,

Thank you for this proposal.

I will be on vacation on the 17th (starting this Friday), so I’m sending some comments to the list ahead of time.

> ## Numerical Reduction
> 
> Proposal: All numbers MUST be reduced to the smallest possible representation. Failure to do so is a well-formness error.

(I don’t think we want to change the term “well-formedness”.  I’d say this is a validity error.)

I went ahead and implemented what I understand this to mean in the cbor-dcbor gem, so you can

  gem install cbor-dcbor

and go play.

Here are a few examples (*):

$ irb -rcbor-dcbor

>> 0.to_dcbor.hexs
=> "00"
>> 1.to_dcbor.hexs
=> "01"
>> 1.0.to_dcbor.hexs
=> "01"
>> 1.1.to_dcbor.hexs
=> "fb 3f f1 99 99 99 99 99 9a"
>> 1.5.to_dcbor.hexs
=> "f9 3e 00"

So we see the usual problem: decimal fractions land exactly on values that fit the short binary float formats only by accident.  (Asking the application to find the closest number that is encodable as a float16, as in
>> 1.099609375.to_dcbor.hexs
=> "f9 3c 66"
would probably require the application to care about the encoding length and to have a good understanding of what precision is actually needed.)
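For illustration only, here is a rough Ruby sketch of what such an application-side step might look like: round a value to the nearest number representable as an IEEE 754 binary16 (10 fraction bits), using round-half-to-even. The method name nearest_half is made up for this example; it covers the normal and subnormal float16 range and does not handle overflow above 65504.

```ruby
# Hypothetical helper (not part of cbor-dcbor): round x to the nearest
# value representable as an IEEE 754 binary16 (float16).
# Overflow beyond the float16 range (max 65504) is not handled.
def nearest_half(x)
  return x if x.zero?
  # Math.frexp gives x = frac * 2**exp with 0.5 <= |frac| < 1,
  # so the exponent for a 1.xxx significand is exp - 1.
  e = Math.frexp(x)[1] - 1
  # Below 2**-14, float16 values are subnormal with fixed spacing 2**-24.
  e = [e, -14].max
  ulp = 2.0**(e - 10) # float16 stores 10 fraction bits
  # Scale to integer units of the float16 spacing, round to even, scale back.
  (x / ulp).round(half: :even) * ulp
end

nearest_half(1.1) # => 1.099609375
```

The value this produces for 1.1 is the 1.099609375 from the example above; the sketch does nothing to answer whether losing that precision is acceptable for a given application.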

>> (2**31).to_dcbor.hexs
=> "1a 80 00 00 00"
>> (2**32-1).to_dcbor.hexs
=> "1a ff ff ff ff"
>> (2**32).to_dcbor.hexs
=> "c2 45 01 00 00 00 00"

shows the transition from 1+4-byte integers (major type 0) to 1+1+5-byte bignums (tag 2 wrapping a major type 2 byte string).

>> (2**48-1).to_dcbor.hexs
=> "c2 46 ff ff ff ff ff ff"
>> (2**48).to_dcbor.hexs
=> "1b 00 01 00 00 00 00 00 00"

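shows the transition back from 1+1+6-byte bignums to 1+8-byte integers: at 2**48 the bignum would need 1+1+7 bytes, so both forms take nine bytes and the plain integer wins the tie.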
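If you want to check the integer rule without installing the gem, here is a minimal Ruby sketch of the behavior shown above, assuming the rule is "emit whichever of a plain integer (major type 0/1) or a bignum (tag 2/3) is shorter, with ties going to the plain integer". The method names (cbor_head, plain_int, bignum, encode_reduced_int, hex) are hypothetical; this is not the cbor-dcbor code.

```ruby
# Hypothetical sketch (not the cbor-dcbor gem's code): pick the shorter of
# a plain CBOR integer and a tag 2/3 bignum; ties go to the plain integer.

# CBOR initial byte plus argument, in the shortest form.
def cbor_head(major, arg)
  mt = major << 5
  if arg < 24 then [mt | arg].pack("C")
  elsif arg < 2**8 then [mt | 24, arg].pack("CC")
  elsif arg < 2**16 then [mt | 25, arg].pack("Cn")
  elsif arg < 2**32 then [mt | 26, arg].pack("CN")
  else [mt | 27, arg].pack("CQ>")
  end
end

# Plain integer (major type 0 or 1), or nil if n does not fit in 64 bits.
def plain_int(n)
  return nil unless (-(2**64)..(2**64 - 1)).cover?(n)
  n >= 0 ? cbor_head(0, n) : cbor_head(1, -1 - n)
end

# Tag 2 (non-negative) or tag 3 (negative) bignum: big-endian bytes, no leading zeros.
def bignum(n)
  tag, value = n >= 0 ? [2, n] : [3, -1 - n]
  digits = value.to_s(16)
  digits = "0" + digits if digits.length.odd?
  bytes = [digits].pack("H*")
  cbor_head(6, tag) + cbor_head(2, bytes.bytesize) + bytes
end

def encode_reduced_int(n)
  int = plain_int(n)
  big = bignum(n)
  (int.nil? || big.bytesize < int.bytesize) ? big : int
end

def hex(s)
  s.bytes.map { |x| "%02x" % x }.join(" ")
end

hex(encode_reduced_int(2**48 - 1)) # => "c2 46 ff ff ff ff ff ff"
hex(encode_reduced_int(2**48))     # => "1b 00 01 00 00 00 00 00 00"
```

The same comparison also covers the 1099511627775 (2**40-1) case: its bignum takes 7 bytes against 9 for the plain integer.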

Now there is an interesting interaction with floating point numbers here:

>> (2**31).to_f.to_dcbor.hexs
=> "1a 80 00 00 00"
>> (2**31-1).to_f.to_dcbor.hexs
=> "1a 7f ff ff ff"
>> (2**32-1).to_f.to_dcbor.hexs
=> "1a ff ff ff ff"
>> (2**32).to_f.to_dcbor.hexs
=> "fa 4f 80 00 00"
>> (2**33-1).to_f.to_dcbor.hexs
=> "c2 45 01 ff ff ff ff"
>> (2**33).to_f.to_dcbor.hexs
=> "fa 50 00 00 00"
>> (2**40-1).to_f.to_dcbor.hexs
=> "c2 45 ff ff ff ff ff"
>> (2**40).to_f.to_dcbor.hexs
=> "fa 53 80 00 00"
>> (2**48-1).to_f.to_dcbor.hexs
=> "c2 46 ff ff ff ff ff ff"
>> (2**48).to_f.to_dcbor.hexs
=> "fa 57 80 00 00"
>> (2**48+1).to_f.to_dcbor.hexs
=> "1b 00 01 00 00 00 00 00 01"
>> (2**48+2**25).to_f.to_dcbor.hexs
=> "fa 57 80 00 01"
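A rough Ruby sketch of the float interaction, under the assumption (read off the examples above, not taken from the gem) that an integral float is also offered as an integer or bignum, that float16/float32 are offered only when they are exact, and that the shortest candidate wins with ties going to the integer form. All method names are hypothetical. NaN and Infinity are rejected, and -0.0 is deliberately kept as a float so that its sign survives; that is an assumption of this sketch too.

```ruby
# Hypothetical sketch (not the cbor-dcbor gem's code) of float reduction.

def head(major, arg)
  mt = major << 5
  if arg < 24 then [mt | arg].pack("C")
  elsif arg < 2**8 then [mt | 24, arg].pack("CC")
  elsif arg < 2**16 then [mt | 25, arg].pack("Cn")
  elsif arg < 2**32 then [mt | 26, arg].pack("CN")
  else [mt | 27, arg].pack("CQ>")
  end
end

# Shorter of plain integer and tag 2/3 bignum; ties go to the plain integer.
def int_encoding(n)
  major, tag, value = n >= 0 ? [0, 2, n] : [1, 3, -1 - n]
  digits = value.to_s(16)
  digits = "0" + digits if digits.length.odd?
  bytes = [digits].pack("H*")
  big = head(6, tag) + head(2, bytes.bytesize) + bytes
  return big if value >= 2**64
  plain = head(major, value)
  big.bytesize < plain.bytesize ? big : plain
end

# Two float16 bytes if f is exactly representable as binary16, else nil.
def half_bytes(f)
  single = [f].pack("g")
  return nil unless single.unpack1("g") == f # not even exact as float32
  bits = single.unpack1("N")
  sign = (bits >> 31) << 15
  exp = (bits >> 23) & 0xff
  mant = bits & 0x7fffff
  if exp.zero? # float32 zero/subnormal: only zero fits in float16
    return mant.zero? ? [sign].pack("n") : nil
  end
  e = exp - 127 + 15 # re-biased float16 exponent
  if e.between?(1, 30) # float16 normal: keep the top 10 fraction bits
    return nil unless (mant & 0x1fff).zero?
    [sign | (e << 10) | (mant >> 13)].pack("n")
  elsif e.between?(-9, 0) # float16 subnormal: value = m * 2**-24
    full = mant | 0x800000 # restore the implicit leading 1
    shift = 14 - e
    return nil unless (full & ((1 << shift) - 1)).zero?
    [sign | (full >> shift)].pack("n")
  end
end

def encode_reduced_float(f)
  raise ArgumentError, "NaN/Infinity not covered by this sketch" unless f.finite?
  negative_zero = f.zero? && (1.0 / f).negative?
  candidates = []
  candidates << int_encoding(f.to_i) if f == f.truncate && !negative_zero
  half = half_bytes(f)
  candidates << [0xf9].pack("C") + half if half
  single = [f].pack("g")
  candidates << [0xfa].pack("C") + single if single.unpack1("g") == f
  candidates << [0xfb].pack("C") + [f].pack("G")
  # Keep the first shortest candidate, so the integer form wins ties.
  candidates.reduce { |best, c| c.bytesize < best.bytesize ? c : best }
end

def hex(s)
  s.bytes.map { |x| "%02x" % x }.join(" ")
end

hex(encode_reduced_float((2**32).to_f)) # => "fa 4f 80 00 00"
```

Under these assumptions the sketch reproduces the irb outputs above, including the 2**31 case where the integer and the float32 both take five bytes and the integer is chosen.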

I think this needs pretty strong test vectors, but it was trivial to implement.

> * This includes the reference implementation at https://cbor.me 

(Another flag for this can easily be added.)

> * Some integer values (such as 1099511627775 [0xffffffffff]) can actually be larger than BIGNUM equivalents!

The proof-of-concept code above handles this correctly (see the 2**40-1 example).

> Alternative: The alternative would be to mandate that floats remain unchanged, which trends away from the suggestions in §4.2 of RFC 8949 for moving data to the shortest encoding possible and has the possibility of damaging determinism, but which matches some current implementations:
>    * Floats MUST be maintained as Floats, but otherwise all numbers MUST be reduced to the smallest possible representation. Failure to do so is a well-formness error.

This rule, too, creates tag 2/tag 3 bignums for unsigned values from 2**32 to 2**48-1.

> ## Adding dCBOR Support to Flagship Implementations and Tools
> 
> Proposal: Add a "deterministic" flag to existing implementations and tools such as https://cbor.me and QCBOR.

(The code behind cbor.me already has a “deterministic” flag which should be easy to export to the GUI.)

> * Requires work to implement, test, document, and maintain

I spent some 50 minutes on this so far (including this mail) :-)

> ## Bringing the dCBOR specification under purview of the CBOR Working Group and putting it on a track to RFC status.
> […]
> 
> * Would require active participation from the CBOR WG.

Indeed, we’d need to find out who else is interested in such a set of additional rules.

Note that there is a whole universe of potential additional rules for tags; e.g., should 0("2013-03-21T20:04:00.000Z") be converted to 0("2013-03-21T20:04:00Z")?  Maybe even to 0("2013-03-21T20:04Z")?
What about float vs. int for tag 1?
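Purely to illustrate how such a tag-specific rule could be expressed, here is a Ruby sketch of the first conversion only. It assumes a hypothetical rule that strips trailing zeros from the fractional seconds of a tag 0 string and drops the fraction entirely when nothing but zeros remains; the trailing-zero part (".500" to ".5") goes beyond the example in the question and is an assumption, and the method name is made up.

```ruby
# Hypothetical tag 0 normalization (not part of any existing rule set):
# strip trailing zeros from fractional seconds; drop an all-zero fraction.
def normalize_tag0_string(s)
  # Capture "." plus the shortest digit run, followed by the trailing zeros,
  # right before the "Z" or "+hh:mm"/"-hh:mm" offset at the end of the string.
  s.sub(/(\.\d*?)0+(?=(?:Z|[+-]\d{2}:\d{2})\z)/) { $1 == "." ? "" : $1 }
end

normalize_tag0_string("2013-03-21T20:04:00.000Z") # => "2013-03-21T20:04:00Z"
normalize_tag0_string("2013-03-21T20:04:00.500Z") # => "2013-03-21T20:04:00.5Z"
```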

Grüße, Carsten


(*)  Add this convenience function:

class String
  def hexs
    bytes.map{|x| "%02x" % x}.join(" ")
  end
end

to your .irbrc to run these examples.