Re: [Cbor] Proposal for Deterministic CBOR (dCBOR) discussion at May 17th meeting
Wolf McNally <wolf@wolfmcnally.com> Mon, 08 May 2023 22:23 UTC
From: Wolf McNally <wolf@wolfmcnally.com>
Message-Id: <12FDF055-9A1F-4084-A62E-8DE757FE76A5@wolfmcnally.com>
Content-Type: multipart/alternative; boundary="Apple-Mail=_D9D8C641-40CA-48B0-8B09-FE255410BFFF"
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.500.231\))
Date: Mon, 08 May 2023 15:23:32 -0700
In-Reply-To: <79A70FC4-3A72-4482-8DA2-297657812789@wolfmcnally.com>
Cc: Christopher Allen <christophera@lifewithalacrity.com>, cbor@ietf.org, Shannon.Appelcline@gmail.com
To: Carsten Bormann <cabo@tzi.org>
References: <CAAse2dGXh-NUvt1FpFXQk3G54vsPaEvZfrHk-YhXzYoMKO5FLA@mail.gmail.com> <4B379EEA-6A0A-4C35-894B-A4441D5A29E5@tzi.org> <79A70FC4-3A72-4482-8DA2-297657812789@wolfmcnally.com>
X-Mailer: Apple Mail (2.3731.500.231)
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/BxYgBnc4x9uBn-Eqz_Xnpvg2ZGo>
Subject: Re: [Cbor] Proposal for Deterministic CBOR (dCBOR) discussion at May 17th meeting
Carsten,

I should also add that between the integer representations and BIGNUM is the possibility of encoding large integer values as floats with no fractional part. So for example, 2^100, while too large to be represented as a 64-bit integer, is representable with no loss of accuracy as a floating point value (fa71800000).

This is my current edit:

## Reduction of Floating Point Values to Integers

While there is no requirement that dCBOR codecs implement support for floating point numbers, dCBOR codecs that do support them MUST reduce floating point values with no fractional part to the integer value that can accurately represent it in the fewest bits. If a numeric value has a fractional part or an exponent that takes it out of the range of representable integers, then it SHALL be encoded as a floating point value. If it cannot be represented as a floating point value, then it SHALL be encoded as a BIGNUM by encoders that support them.

For the unsigned integers, from most to least preferred:

~~~
UInt8:  [0 ... 2^8 - 1]     [0 ... 255]
UInt16: [2^8 ... 2^16 - 1]  [256 ... 65535]
UInt32: [2^16 ... 2^32 - 1] [65536 ... 4294967295]
UInt64: [2^32 ... 2^64 - 1] [4294967296 ... 18446744073709551615]
Float:  [2^64 ...]          [18446744073709551616 ...]
BIGNUM: [2^64 ...]          [18446744073709551616 ...]
~~~

For the signed integers, from most to least preferred:

~~~
Int8:   [-2^7 … 2^7 - 1]         [-128, 127]
Int16:  [-2^15 … 2^15 - 1]       [-32768, 32767]
Int32:  [-2^31 … 2^31 - 1]       [-2147483648, 2147483647]
Int64:  [-2^63 … 2^63 - 1]       [-9223372036854775808, 9223372036854775807]
Float:  [… -2^63 - 1 U 2^63 …]   [… -9223372036854775809 U 9223372036854775808 …]
BIGNUM: [… -2^63 - 1 U 2^63 …]   [… -9223372036854775809 U 9223372036854775808 …]
~~~

> On May 8, 2023, at 3:09 PM, Wolf McNally <wolf@wolfmcnally.com> wrote:
>
> Carsten,
>
> Thank you for experimenting with this proposal.
> In recoding your tests in my Swift test harness, I realize that I left an ambiguity in the spec that you interpreted fairly the way I wrote it, but not the way I meant.
>
> I wrote:
>
>> While there is no requirement that dCBOR codecs implement support for floating point numbers, dCBOR codecs that do support them MUST reduce floating point values with no fractional part to the smallest integer value that can accurately represent it.
>
> This resulted in your tests choosing BIGNUM representations with a smaller encoded representation. What I actually meant was to choose the representation with the smallest *maximum* number of represented bits. Since BIGNUM is arbitrarily large, it is least-preferred regardless of its serialized size.
>
> The downside is that some BIGNUM representations with smaller serializations will not be used. The upside is that implementers do not have to support every size of serialization; in particular they do not have to support BIGNUM (as my Swift implementation currently does not) and in fact implementations on constrained hardware don’t even have to support 64-bit integers if they don’t need to.
>
> For the unsigned integers:
>
> UInt8:  [0 ... 2^8 - 1]     [0 ... 255]
> UInt16: [2^8 ... 2^16 - 1]  [256 ... 65535]
> UInt32: [2^16 ... 2^32 - 1] [65536 ... 4294967295]
> UInt64: [2^32 ... 2^64 - 1] [4294967296 ... 18446744073709551615]
> BIGNUM: [2^64 ...]          [18446744073709551616 ...]
>
> For the signed integers:
>
> Int8:   [-2^7 … 2^7 - 1]         [-128, 127]
> Int16:  [-2^15 … 2^15 - 1]       [-32768, 32767]
> Int32:  [-2^31 … 2^31 - 1]       [-2147483648, 2147483647]
> Int64:  [-2^63 … 2^63 - 1]       [-9223372036854775808, 9223372036854775807]
> BIGNUM: [… -2^63 - 1 U 2^63 …]   [… -9223372036854775809 U 9223372036854775808 …]
>
> So none of the examples you gave for your Ruby implementation actually require BIGNUM, so according to the spec the way I meant it, they MUST NOT use it.
> Here are the same test cases in Swift:
>
>> XCTAssertEqual((0.0).cborData.hex, "00")
>> XCTAssertEqual((1).cborData.hex, "01")
>> XCTAssertEqual((1.0).cborData.hex, "01")
>> XCTAssertEqual((1.1).cborData.hex, "fb3ff199999999999a")
>> XCTAssertEqual((1.5).cborData.hex, "f93e00")
>> XCTAssertEqual((1.099609375).cborData.hex, "f93c66")
>>
>> XCTAssertEqual(pow(2.0, 31).cborData.hex, "1a80000000")
>> XCTAssertEqual(2147483648.cborData.hex, "1a80000000")
>>
>> XCTAssertEqual((pow(2.0, 32) - 1).cborData.hex, "1affffffff")
>> XCTAssertEqual(4294967295.cborData.hex, "1affffffff")
>>
>> XCTAssertEqual(pow(2.0, 32).cborData.hex, "1b0000000100000000")
>> XCTAssertEqual(UInt64(4294967296).cborData.hex, "1b0000000100000000")
>>
>> XCTAssertEqual((pow(2.0, 33) - 1).cborData.hex, "1b00000001ffffffff")
>> XCTAssertEqual(UInt64(8589934591).cborData.hex, "1b00000001ffffffff")
>>
>> XCTAssertEqual(pow(2.0, 33).cborData.hex, "1b0000000200000000")
>> XCTAssertEqual(UInt64(8589934592).cborData.hex, "1b0000000200000000")
>>
>> XCTAssertEqual((pow(2.0, 40) - 1).cborData.hex, "1b000000ffffffffff")
>> XCTAssertEqual(UInt64(1099511627775).cborData.hex, "1b000000ffffffffff")
>>
>> XCTAssertEqual(pow(2.0, 40).cborData.hex, "1b0000010000000000")
>> XCTAssertEqual(UInt64(1099511627776).cborData.hex, "1b0000010000000000")
>>
>> XCTAssertEqual((pow(2.0, 48) - 1).cborData.hex, "1b0000ffffffffffff")
>> XCTAssertEqual(UInt64(281474976710655).cborData.hex, "1b0000ffffffffffff")
>>
>> XCTAssertEqual(pow(2.0, 48).cborData.hex, "1b0001000000000000")
>> XCTAssertEqual(UInt64(281474976710656).cborData.hex, "1b0001000000000000")
>>
>> XCTAssertEqual((pow(2.0, 48) + 1).cborData.hex, "1b0001000000000001")
>> XCTAssertEqual(UInt64(281474976710657).cborData.hex, "1b0001000000000001")
>>
>> XCTAssertEqual((pow(2.0, 48) + pow(2.0, 25)).cborData.hex, "1b0001000002000000")
>> XCTAssertEqual(UInt64(281475010265088).cborData.hex, "1b0001000002000000")
>
>
> A theoretical “big float” type would similarly be less preferred than a 64-bit floating point encoding.
>
> On another related topic, one potential downside of numerical reduction that I haven’t mentioned is that implementations that support floating point also need to have minimal support for 16-bit floating point values. By “minimal” I mean the ability to downcast 32-bit floats to 16-bit, retrieve the bit pattern, and upcast them again. Swift has a fully-featured native Float16 type, but it actually doesn’t work on the Intel family of processors, and since many Macs still use them I ended up coding a minimal implementation. Rust does not have a built-in `f16` type, but does have minimal support for it via a 3rd-party crate. Of course, I think most CBOR implementations already support 16-bit float serialization as they expect to read it. dCBOR introduces the additional requirement that they be able to write it.
>
> ~ Wolf
>
>
>> (2**32).to_dcbor.hexs
>> => "c2 45 01 00 00 00 00"
>
>
>> On May 8, 2023, at 1:09 PM, Carsten Bormann <cabo@tzi.org> wrote:
>>
>> Hi Christopher,
>>
>> Thank you for this proposal.
>>
>> I will be on vacation on the 17th (from this Friday), so I’ll send some comments ahead on the list.
>>
>>> ## Numerical Reduction
>>>
>>> Proposal: All numbers MUST be reduced to the smallest possible representation. Failure to do so is a well-formness error.
>>
>> (I don’t think we want to change the term “well-formedness”. I’d say this is a validity error.)
>>
>> I went ahead and implemented what I think this is in the cbor-dcbor gem.
>>
>> gem install cbor-dcbor
>>
>> and go play.
>>
>> Here are a few examples (*):
>>
>> $ irb -rcbor-dcbor
>>
>> >> 0.to_dcbor.hexs
>> => "00"
>> >> 1.to_dcbor.hexs
>> => "01"
>> >> 1.0.to_dcbor.hexs
>> => "01"
>> >> 1.1.to_dcbor.hexs
>> => "fb 3f f1 99 99 99 99 99 9a"
>> >> 1.5.to_dcbor.hexs
>> => "f9 3e 00"
>>
>> So we see the usual problems that decimal fractions only hit the short binary numbers by accident.
>> (Asking the application to find the closest number that is encodable as a float16, as in
>> >> 1.099609375.to_dcbor.hexs
>> => "f9 3c 66"
>> would probably require the application to care about the encoding length plus have a good understanding of what precision is actually desired.)
>>
>> >> (2**31).to_dcbor.hexs
>> => "1a 80 00 00 00"
>> >> (2**32-1).to_dcbor.hexs
>> => "1a ff ff ff ff"
>> >> (2**32).to_dcbor.hexs
>> => "c2 45 01 00 00 00 00"
>>
>> shows the transition from 1+4-byte ints (major type 0) to 1+1+4-byte bignums (tag 2, major type 2 inside).
>>
>> >> (2**48-1).to_dcbor.hexs
>> => "c2 46 ff ff ff ff ff ff"
>> >> (2**48).to_dcbor.hexs
>> => "1b 00 01 00 00 00 00 00 00"
>>
>> shows the transition back from 1+1+6 bignums to 1+8 ints.
>>
>> Now there is an interesting interaction with floating point numbers here:
>>
>> >> (2**31).to_f.to_dcbor.hexs
>> => "1a 80 00 00 00"
>> >> (2**31-1).to_f.to_dcbor.hexs
>> => "1a 7f ff ff ff"
>> >> (2**32-1).to_f.to_dcbor.hexs
>> => "1a ff ff ff ff"
>> >> (2**32).to_f.to_dcbor.hexs
>> => "fa 4f 80 00 00"
>> >> (2**33-1).to_f.to_dcbor.hexs
>> => "c2 45 01 ff ff ff ff"
>> >> (2**33).to_f.to_dcbor.hexs
>> => "fa 50 00 00 00"
>> >> (2**40-1).to_f.to_dcbor.hexs
>> => "c2 45 ff ff ff ff ff"
>> >> (2**40).to_f.to_dcbor.hexs
>> => "fa 53 80 00 00"
>> >> (2**48-1).to_f.to_dcbor.hexs
>> => "c2 46 ff ff ff ff ff ff"
>> >> (2**48).to_f.to_dcbor.hexs
>> => "fa 57 80 00 00"
>> >> (2**48+1).to_f.to_dcbor.hexs
>> => "1b 00 01 00 00 00 00 00 01"
>> >> (2**48+2**25).to_f.to_dcbor.hexs
>> => "fa 57 80 00 01"
>>
>> I think this needs pretty strong test vectors, but was trivial to implement.
>>
>>> * This includes the reference implementation at https://cbor.me
>>
>> (Another flag for this can easily be added.)
>>
>>> * Some integer values (such as 1099511627775 [0xffffffffff]) can actually be larger than BIGNUM equivalents!
>>
>> Which the above PoC code correctly handles.
>>
>>> Alternative: The alternative would be to mandate that floats remain unchanged, which trends away from the suggestions in §4.2 of RFC 8949 for moving data to the shortest encoding possible and has the possibility of damaging determinism, but which matches some current implementations:
>>> * Floats MUST be maintained as Floats, but otherwise all numbers MUST be reduced to the smallest possible representation. Failure to do so is a well-formness error.
>>
>> This rule again creates tag2/tag3 numbers for unsigned values of 2**32 to 2**48-1.
>>
>>> ## Adding dCBOR Support to Flagship Implementations and Tools
>>>
>>> Proposal: Add a "deterministic" flag to existing implementations and tools such as https://cbor.me and QCBOR.
>>
>> (The code behind cbor.me already has a “deterministic” flag which should be easy to export to the GUI.)
>>
>>> * Requires work to implement, test, document, and maintain
>>
>> I spent some 50 minutes on this so far (including this mail) :-)
>>
>>> ## Bringing the dCBOR specification under purview of the CBOR Working Group and putting it on a track to RFC status.
>>> […]
>>>
>>> * Would require active participation from the CBOR WG.
>>
>> Indeed, we’d need to find out who else is interested in such a set of additional rules.
>>
>> Note that there is a whole universe of potential additional rules in the tags, e.g. should 0("2013-03-21T20:04:00.000Z") be converted to 0("2013-03-21T20:04:00Z")? Maybe even 0("2013-03-21T20:04Z")?
>> What about float vs. int for tag 1?
>>
>> Grüße, Carsten
>>
>>
>> (*) Add this convenience function:
>>
>> class String
>>   def hexs
>>     bytes.map{|x| "%02x" % x}.join(" ")
>>   end
>> end
>>
>> to your .irbrc to play these examples.
>>
>
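The preference order in the "Reduction of Floating Point Values to Integers" edit at the top of this message (64-bit integer first, then float, then BIGNUM) can be sketched in plain Ruby roughly as follows. This is only an illustration under stated assumptions: the helper names (`head`, `half_bits`, `encode_float`, `encode_bignum`, `dcbor_number`) are hypothetical and are not the API of the cbor-dcbor gem or of the Swift implementation, the integer range is taken from the UInt64 and Int64 rows of the tables, and non-finite floats are left out of scope.

```ruby
# Hypothetical sketch; none of these names come from an existing library.

# Integer range taken from the Int64 and UInt64 rows of the tables (an assumption).
INT_RANGE = (-(2**63)..(2**64 - 1))

# CBOR head: major type plus argument, in the shortest of the five forms.
def head(major, n)
  mt = major << 5
  if    n < 24    then [mt | n].pack("C")
  elsif n < 2**8  then [mt | 24, n].pack("CC")
  elsif n < 2**16 then [mt | 25, n].pack("Cn")
  elsif n < 2**32 then [mt | 26, n].pack("CN")
  else                 [mt | 27, n].pack("CQ>")
  end
end

# Major type 0 for non-negative integers, major type 1 (encoding -1 - n) otherwise.
def encode_int(i)
  i >= 0 ? head(0, i) : head(1, -1 - i)
end

# Tag 2 (or tag 3 for negatives) wrapping a big-endian byte string.
def encode_bignum(i)
  tag, mag = i >= 0 ? [2, i] : [3, -1 - i]
  hex = mag.to_s(16)
  hex = "0" + hex if hex.length.odd?
  bytes = [hex].pack("H*")
  head(6, tag) + head(2, bytes.bytesize) + bytes
end

# Returns the 16-bit pattern if a float32 bit pattern is exactly
# representable as a half-precision float, otherwise nil.
def half_bits(bits32)
  sign = (bits32 >> 16) & 0x8000
  exp  = ((bits32 >> 23) & 0xFF) - 127
  mant = bits32 & 0x7FFFFF
  return sign if (bits32 & 0x7FFFFFFF).zero?      # +0.0 or -0.0
  if exp.between?(-14, 15)                        # normal half
    return nil unless (mant & 0x1FFF).zero?
    sign | ((exp + 15) << 10) | (mant >> 13)
  elsif exp.between?(-24, -15)                    # subnormal half
    full  = mant | 0x800000
    shift = -exp - 1
    return nil unless (full & ((1 << shift) - 1)).zero?
    sign | (full >> shift)
  end
end

# Shortest of float16 / float32 / float64 that preserves the value exactly.
def encode_float(x)
  raise ArgumentError, "non-finite floats are out of scope here" unless x.finite?
  single = [x].pack("g")
  return [0xFB, x].pack("CG") unless single.unpack1("g") == x
  half = half_bits(single.unpack1("N"))
  half ? [0xF9, half].pack("Cn") : [0xFA].pack("C") + single
end

# Preference order: integer, then float, then BIGNUM.
def dcbor_number(value)
  if value.is_a?(Float)
    return encode_float(value) unless value.finite? && value == value.floor
    value = value.to_i                            # no fractional part: reduce
  end
  return encode_int(value) if INT_RANGE.cover?(value)
  f = value.to_f
  f.finite? && f.to_i == value ? encode_float(f) : encode_bignum(value)
end

puts dcbor_number(1.0).unpack1("H*")      # 01
puts dcbor_number(1.5).unpack1("H*")      # f93e00
puts dcbor_number(2.0**32).unpack1("H*")  # 1b0000000100000000
puts dcbor_number(2**100).unpack1("H*")   # fa71800000
```

With this ordering, the BIGNUM branch is reached only for integers outside the 64-bit range that no float represents exactly, such as 2**64 + 1, so an encoder that omits BIGNUM support still agrees on every other value.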