Re: [Cbor] Proposal for Deterministic CBOR (dCBOR) discussion at May 17th meeting
Wolf McNally <wolf@wolfmcnally.com> Mon, 08 May 2023 22:09 UTC
From: Wolf McNally <wolf@wolfmcnally.com>
Message-Id: <79A70FC4-3A72-4482-8DA2-297657812789@wolfmcnally.com>
Content-Type: multipart/alternative; boundary="Apple-Mail=_7D77E92E-8ABE-42A0-9FDE-2333264FB28A"
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.500.231\))
Date: Mon, 08 May 2023 15:09:05 -0700
In-Reply-To: <4B379EEA-6A0A-4C35-894B-A4441D5A29E5@tzi.org>
Cc: Christopher Allen <christophera@lifewithalacrity.com>, cbor@ietf.org, Shannon.Appelcline@gmail.com
To: Carsten Bormann <cabo@tzi.org>
References: <CAAse2dGXh-NUvt1FpFXQk3G54vsPaEvZfrHk-YhXzYoMKO5FLA@mail.gmail.com> <4B379EEA-6A0A-4C35-894B-A4441D5A29E5@tzi.org>
X-Mailer: Apple Mail (2.3731.500.231)
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/irdmnQsLCNLunKM3MX1sE6CQgng>
Subject: Re: [Cbor] Proposal for Deterministic CBOR (dCBOR) discussion at May 17th meeting
Carsten,

Thank you for experimenting with this proposal. In recoding your tests in my Swift test harness, I realized that I left an ambiguity in the spec, which you fairly interpreted the way I wrote it, but not the way I meant it. I wrote:

> While there is no requirement that dCBOR codecs implement support for floating point numbers, dCBOR codecs that do support them MUST reduce floating point values with no fractional part to the smallest integer value that can accurately represent it.

This resulted in your tests choosing BIGNUM representations wherever they had a smaller encoded representation. What I actually meant was to choose the representation with the smallest *maximum* number of represented bits. Since BIGNUM is arbitrarily large, it is least-preferred regardless of its serialized size.

The downside is that some BIGNUM representations with smaller serializations will not be used. The upside is that implementers do not have to support every size of serialization; in particular they do not have to support BIGNUM (as my Swift implementation currently does not), and in fact implementations on constrained hardware don’t even have to support 64-bit integers if they don’t need to.

For the unsigned integers:

UInt8:  [0 ... 2^8 - 1]          [0 ... 255]
UInt16: [2^8 ... 2^16 - 1]       [256 ... 65535]
UInt32: [2^16 ... 2^32 - 1]      [65536 ... 4294967295]
UInt64: [2^32 ... 2^64 - 1]      [4294967296 ... 18446744073709551615]
BIGNUM: [2^64 ...]               [18446744073709551616 ...]

For the signed integers:

Int8:   [-2^7 ... 2^7 - 1]       [-128 ... 127]
Int16:  [-2^15 ... 2^15 - 1]     [-32768 ... 32767]
Int32:  [-2^31 ... 2^31 - 1]     [-2147483648 ... 2147483647]
Int64:  [-2^63 ... 2^63 - 1]     [-9223372036854775808 ... 9223372036854775807]
BIGNUM: [... -2^63 - 1 U 2^63 ...] [... -9223372036854775809 U 9223372036854775808 ...]

None of the examples you gave for your Ruby implementation actually requires BIGNUM, so according to the spec as I meant it, they MUST NOT use it.
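
The unsigned table above can be sketched in Ruby (to match the quoted cbor-dcbor examples). This is an illustration only, not code from either implementation: `uint_to_cbor_hex` is a hypothetical helper, it handles non-negative integers only, and raising an error for values of 2^64 and above is an assumption standing in for "BIGNUM is least-preferred and may be unsupported".

```ruby
# Hypothetical helper (not part of any gem): encode a non-negative Integer
# as a CBOR unsigned integer (major type 0), always using a fixed-width
# form and never falling back to a bignum just because it would be shorter.
def uint_to_cbor_hex(n)
  raise ArgumentError, "negative values are not handled in this sketch" if n.negative?

  bytes =
    if    n < 24    then [n].pack("C")           # value carried in the initial byte
    elsif n < 2**8  then [0x18, n].pack("CC")    # 1-byte argument
    elsif n < 2**16 then [0x19, n].pack("Cn")    # 2-byte argument, big-endian
    elsif n < 2**32 then [0x1a, n].pack("CN")    # 4-byte argument, big-endian
    elsif n < 2**64 then [0x1b, n].pack("CQ>")   # 8-byte argument, big-endian
    else
      # Assumption: this sketch has no BIGNUM (tag 2) support at all.
      raise RangeError, "value needs BIGNUM, which this sketch does not support"
    end

  bytes.unpack1("H*")
end

puts uint_to_cbor_hex(2**32 - 1)  # 1affffffff
puts uint_to_cbor_hex(2**32)      # 1b0000000100000000
puts uint_to_cbor_hex(2**48 - 1)  # 1b0000ffffffffffff
```

Under this reading, 2^32 through 2^48 - 1 always take the 1+8-byte form, even though a tag 2 bignum would serialize in fewer bytes.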
Here are the same test cases in Swift:

> XCTAssertEqual((0.0).cborData.hex, "00")
> XCTAssertEqual((1).cborData.hex, "01")
> XCTAssertEqual((1.0).cborData.hex, "01")
> XCTAssertEqual((1.1).cborData.hex, "fb3ff199999999999a")
> XCTAssertEqual((1.5).cborData.hex, "f93e00")
> XCTAssertEqual((1.099609375).cborData.hex, "f93c66")
>
> XCTAssertEqual(pow(2.0, 31).cborData.hex, "1a80000000")
> XCTAssertEqual(2147483648.cborData.hex, "1a80000000")
>
> XCTAssertEqual((pow(2.0, 32) - 1).cborData.hex, "1affffffff")
> XCTAssertEqual(4294967295.cborData.hex, "1affffffff")
>
> XCTAssertEqual(pow(2.0, 32).cborData.hex, "1b0000000100000000")
> XCTAssertEqual(UInt64(4294967296).cborData.hex, "1b0000000100000000")
>
> XCTAssertEqual((pow(2.0, 33) - 1).cborData.hex, "1b00000001ffffffff")
> XCTAssertEqual(UInt64(8589934591).cborData.hex, "1b00000001ffffffff")
>
> XCTAssertEqual(pow(2.0, 33).cborData.hex, "1b0000000200000000")
> XCTAssertEqual(UInt64(8589934592).cborData.hex, "1b0000000200000000")
>
> XCTAssertEqual((pow(2.0, 40) - 1).cborData.hex, "1b000000ffffffffff")
> XCTAssertEqual(UInt64(1099511627775).cborData.hex, "1b000000ffffffffff")
>
> XCTAssertEqual(pow(2.0, 40).cborData.hex, "1b0000010000000000")
> XCTAssertEqual(UInt64(1099511627776).cborData.hex, "1b0000010000000000")
>
> XCTAssertEqual((pow(2.0, 48) - 1).cborData.hex, "1b0000ffffffffffff")
> XCTAssertEqual(UInt64(281474976710655).cborData.hex, "1b0000ffffffffffff")
>
> XCTAssertEqual(pow(2.0, 48).cborData.hex, "1b0001000000000000")
> XCTAssertEqual(UInt64(281474976710656).cborData.hex, "1b0001000000000000")
>
> XCTAssertEqual((pow(2.0, 48) + 1).cborData.hex, "1b0001000000000001")
> XCTAssertEqual(UInt64(281474976710657).cborData.hex, "1b0001000000000001")
>
> XCTAssertEqual((pow(2.0, 48) + pow(2.0, 25)).cborData.hex, "1b0001000002000000")
> XCTAssertEqual(UInt64(281475010265088).cborData.hex, "1b0001000002000000")

A theoretical “big float” type would similarly be less-preferred than a 64-bit floating point encoding.
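
The float-to-integer reduction that these test cases exercise could look roughly like the following Ruby sketch. It is not taken from the Swift or Ruby implementations: `reduce_number` and `INT_RANGE` are hypothetical names, and leaving integral floats outside the 64-bit range as floats is an assumption for a codec without BIGNUM support.

```ruby
# Hypothetical sketch: reduce a Float with no fractional part to an Integer,
# but only when the result fits the fixed-width ranges from the tables above.
INT_RANGE = (-(2**63)..(2**64 - 1)).freeze

def reduce_number(x)
  # Integers pass through; NaN and infinities have no integer form.
  return x unless x.is_a?(Float) && x.finite?

  i = x.to_i                       # truncates toward zero
  return x unless i == x           # x has a fractional part: keep the float
  return x unless INT_RANGE.cover?(i)  # assumption: no BIGNUM, keep the float

  # Note: -0.0 compares equal to 0 and is reduced here too; the message
  # above does not say how that case should be treated.
  i
end

p reduce_number(1.0)          # 1
p reduce_number(1.5)          # 1.5
p reduce_number(2.0**48 + 1)  # 281474976710657
```

The reduced Integer would then be serialized with the narrowest fixed-width integer form, which is why `pow(2.0, 32)` and `UInt64(4294967296)` produce the same bytes in the tests.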
On another related topic, one potential downside of numerical reduction that I haven’t mentioned is that implementations that support floating point also need to have minimal support for 16-bit floating point values. By “minimal” I mean the ability to downcast 32-bit floats to 16-bit, retrieve the bit pattern, and upcast them again. Swift has a fully-featured native Float16 type, but it actually doesn’t work on the Intel family of processors, and since many Macs still use them I ended up coding a minimal implementation. Rust does not have a built-in `f16` type, but does have minimal support for it via a 3rd-party crate.

Of course, I think most CBOR implementations already support 16-bit float serialization as they expect to read it. dCBOR introduces the additional requirement that they be able to write it.

~ Wolf

> (2**32).to_dcbor.hexs
> => "c2 45 01 00 00 00 00"

> On May 8, 2023, at 1:09 PM, Carsten Bormann <cabo@tzi.org> wrote:
>
> Hi Christopher,
>
> Thank you for this proposal.
>
> I will be on vacation on the 17th (from this Friday), so I’ll send some comments ahead on the list.
>
>> ## Numerical Reduction
>>
>> Proposal: All numbers MUST be reduced to the smallest possible representation. Failure to do so is a well-formness error.
>
> (I don’t think we want to change the term “well-formedness”. I’d say this is a validity error.)
>
> I went ahead and implemented what I think this is in the cbor-dcbor gem.
>
> gem install cbor-dcbor
>
> and go play.
>
> Here are a few examples (*):
>
> $ irb -rcbor-dcbor
>
>>> 0.to_dcbor.hexs
> => "00"
>>> 1.to_dcbor.hexs
> => "01"
>>> 1.0.to_dcbor.hexs
> => "01"
>>> 1.1.to_dcbor.hexs
> => "fb 3f f1 99 99 99 99 99 9a"
>>> 1.5.to_dcbor.hexs
> => "f9 3e 00"
>
> So we see the usual problems that decimal fractions only hit the short binary numbers by accident.
> (Asking the application to find the closest number that is encodable as a float16, as in
>>> 1.099609375.to_dcbor.hexs
> => "f9 3c 66"
> would probably require the application to care about the encoding length plus have a good understanding of what precision is actually desired.)
>
>>> (2**31).to_dcbor.hexs
> => "1a 80 00 00 00"
>>> (2**32-1).to_dcbor.hexs
> => "1a ff ff ff ff"
>>> (2**32).to_dcbor.hexs
> => "c2 45 01 00 00 00 00"
>
> shows the transition from 1+4-byte ints (major type 0) to 1+1+4-byte bignums (tag 2, major type 2 inside).
>
>>> (2**48-1).to_dcbor.hexs
> => "c2 46 ff ff ff ff ff ff"
>>> (2**48).to_dcbor.hexs
> => "1b 00 01 00 00 00 00 00 00"
>
> shows the transition back from 1+1+6 bignums to 1+8 ints.
>
> Now there is an interesting interaction with floating point numbers here:
>
>>> (2**31).to_f.to_dcbor.hexs
> => "1a 80 00 00 00"
>>> (2**31-1).to_f.to_dcbor.hexs
> => "1a 7f ff ff ff"
>>> (2**32-1).to_f.to_dcbor.hexs
> => "1a ff ff ff ff"
>>> (2**32).to_f.to_dcbor.hexs
> => "fa 4f 80 00 00"
>>> (2**33-1).to_f.to_dcbor.hexs
> => "c2 45 01 ff ff ff ff"
>>> (2**33).to_f.to_dcbor.hexs
> => "fa 50 00 00 00"
>>> (2**40-1).to_f.to_dcbor.hexs
> => "c2 45 ff ff ff ff ff"
>>> (2**40).to_f.to_dcbor.hexs
> => "fa 53 80 00 00"
>>> (2**48-1).to_f.to_dcbor.hexs
> => "c2 46 ff ff ff ff ff ff"
>>> (2**48).to_f.to_dcbor.hexs
> => "fa 57 80 00 00"
>>> (2**48+1).to_f.to_dcbor.hexs
> => "1b 00 01 00 00 00 00 00 01"
>>> (2**48+2**25).to_f.to_dcbor.hexs
> => "fa 57 80 00 01"
>
> I think this needs pretty strong test vectors, but was trivial to implement.
>
>> * This includes the reference implementation at https://cbor.me
>
> (Another flag for this can easily be added.)
>
>> * Some integer values (such as 1099511627775 [0xffffffffff]) can actually be larger than BIGNUM equivalents!
>
> Which the above PoC code correctly handles.
>
>> Alternative: The alternative would be to mandate that floats remain unchanged, which trends away from the suggestions in §4.2 of RFC 8949 for moving data to the shortest encoding possible and has the possibility of damaging determinism, but which matches some current implementations:
>> * Floats MUST be maintained as Floats, but otherwise all numbers MUST be reduced to the smallest possible representation. Failure to do so is a well-formness error.
>
> This rule again creates tag2/tag3 numbers for unsigned values of 2**32 to 2**48-1.
>
>> ## Adding dCBOR Support to Flagship Implementations and Tools
>>
>> Proposal: Add a "deterministic" flag to existing implementations and tools such as https://cbor.me and QCBOR.
>
> (The code behind cbor.me already has a “deterministic” flag which should be easy to export to the GUI.)
>
>> * Requires work to implement, test, document, and maintain
>
> I spent some 50 minutes on this so far (including this mail) :-)
>
>> ## Bringing the dCBOR specification under purview of the CBOR Working Group and putting it on a track to RFC status.
>> […]
>>
>> * Would require active participation from the CBOR WG.
>
> Indeed, we’d need to find out who else is interested in such a set of additional rules.
>
> Note that there is a whole universe of potential additional rules in the tags, e.g. should 0("2013-03-21T20:04:00.000Z") be converted to 0("2013-03-21T20:04:00Z")? Maybe even 0("2013-03-21T20:04Z")?
> What about float vs. int for tag 1?
>
> Grüße, Carsten
>
>
> (*) Add this convenience function:
>
> class String
>   def hexs
>     bytes.map{|x| "%02x" % x}.join(" ")
>   end
> end
>
> to your .irbrc to play these examples.
>
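
To illustrate the "minimal" 16-bit float support described earlier in this message (downcast, retrieve the bit pattern, upcast again), here is a rough Ruby sketch. It is not the Swift implementation referred to above: `half_bits` and `half_to_f` are hypothetical names, the code works from Ruby's 64-bit `Float` rather than a 32-bit float, it returns `nil` instead of rounding when a value is not exactly representable, and NaN handling is left out.

```ruby
# Hypothetical sketch: return the IEEE 754 binary16 bit pattern for x if x is
# exactly representable as a half-precision float, otherwise nil.
def half_bits(x)
  return nil if x.nan?                       # NaN payloads omitted in this sketch
  sign = (x < 0 || (x.zero? && 1.0 / x < 0)) ? 0x8000 : 0
  mag = x.abs
  return sign if mag.zero?
  return sign | 0x7c00 if mag.infinite?

  frac, exp = Math.frexp(mag)                # mag == frac * 2**exp, 0.5 <= frac < 1
  e = exp - 1                                # mag == (2 * frac) * 2**e
  return nil if e > 15                       # too large for binary16

  if e >= -14                                # normal range: 10 fraction bits
    scaled = (2 * frac - 1) * 1024
    return nil unless scaled == scaled.floor
    sign | ((e + 15) << 10) | scaled.to_i
  else                                       # subnormals: multiples of 2**-24
    scaled = mag * 2**24
    return nil unless scaled == scaled.floor
    sign | scaled.to_i
  end
end

# Upcast: turn a binary16 bit pattern back into a Ruby Float.
def half_to_f(bits)
  sign = (bits & 0x8000).zero? ? 1.0 : -1.0
  e = (bits >> 10) & 0x1f
  f = bits & 0x3ff
  case e
  when 0  then sign * Math.ldexp(f, -24)
  when 31 then f.zero? ? sign * Float::INFINITY : Float::NAN
  else         sign * Math.ldexp(1024 + f, e - 25)
  end
end

printf("%04x\n", half_bits(1.5))          # 3e00
printf("%04x\n", half_bits(1.099609375))  # 3c66
p half_bits(1.1)                          # nil
```

A writer could emit the 0xf9 form whenever `half_bits` returns a pattern and fall back to a wider float otherwise, which agrees with the `f9 3e 00` and `f9 3c 66` outputs quoted above.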