[Cbor] dCBOR moving from numerically-typeless systems

Wolf McNally <wolf@wolfmcnally.com> Sun, 12 March 2023 08:21 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/aiGvqw1-sQWJ4pXY3zzQuWwNVzE>

Consider the case of a dCBOR codec written in JavaScript, a language that has only a “number” type and therefore makes no programmer- or API-level distinction between integers and floating point values. Suppose I have the array:

var a = [10, 10.0, 0, -0.0, 0.0];

Let’s also say our application protocol specifies that positions 1, 3, and 4 in `a` (counting from zero) MUST be encoded as floating point values and the others MUST be encoded as integers. (Yes, this is a toy example, but someone working first in a language with a type distinction between integers and floating point values might very well make this kind of choice, which then becomes binding on its adopters.)

Remember: we’re dealing with *deterministic* CBOR, where different agents encoding the same data should automatically converge on the exact same serialized representation, so if the protocol specifier says “this element is floating point,” it MUST be floating point.

Now, how shall we serialize this to dCBOR in JavaScript? First of all, as soon as we execute the above line of code, we have already lost the distinction between “integers” and “floating point values with no fractional part” (although JavaScript does, apparently, preserve the sign bit for zero):

console.log(a);
[10, 10, 0, -0, 0]

What if we start with JSON?

JSON.stringify(a);
[10,10,0,0,0]

Now we’ve lost everything redundant and have the canonical numeric representation I’m proposing for dCBOR.
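
Both observations above can be checked in any JavaScript engine. This is only a small sketch, and the variable names are illustrative:

```javascript
// The literals 10 and 10.0 produce the same Number value.
const a = [10, 10.0, 0, -0.0, 0.0];

const sameTen = a[0] === a[1]; // true: no int/float distinction survives
const bothIntegers = Number.isInteger(a[0]) && Number.isInteger(a[1]); // true

// The sign of zero survives in memory, but === cannot see it.
const zerosCompareEqual = a[3] === a[4]; // true
const negativeZeroKept = Object.is(a[3], -0); // true

// JSON.stringify() drops the sign of zero as well.
const json = JSON.stringify(a);

console.log(sameTen, bothIntegers, zerosCompareEqual, negativeZeroKept, json);
```

So the only trace of the original spelling that a codec could even inspect is the sign of zero, and that is gone after a round trip through JSON.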

But back to the original question. Let’s say we've insisted that “official” dCBOR keeps CBOR’s type distinction and must therefore validate not just the numeric value but also the type of encoding (or, again, it’s not deterministic). The canonical serialization of `a` will then be:

85         # array(5)
   0A      # unsigned(10)
   F9 4900 # primitive(18688), i.e. half-precision 10.0
   00      # unsigned(0)
   F9 8000 # primitive(32768), i.e. half-precision -0.0
   F9 0000 # primitive(0), i.e. half-precision 0.0

(As an aside: given the input, this looks to me like the antithesis of “deterministic.” As a further aside, at 12 bytes it is only one byte smaller than the 13-byte JSON.stringify() version.)

A JavaScript dCBOR API that can output this serialization would need to offer something like this:

let cborA = new CBORArray();
cborA.appendInt(a[0]);
cborA.appendFloat(a[1]);
cborA.appendInt(a[2]);
cborA.appendFloat(a[3]);
cborA.appendFloat(a[4]);
let data = cborA.serialize();

Each element in such a `CBORArray` object would need to be a type that keeps track of whether it is to be serialized as a floating point value or an integer.
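
To make the bookkeeping concrete, here is a minimal sketch of what such a builder might look like. The `CBORArray` class and the `float16Bytes` helper are hypothetical, not an existing library, and the sketch assumes only the toy subset used here: arrays of up to 23 items, integers 0..23, and values exactly representable as normal half-precision floats.

```javascript
// Hypothetical helper: big-endian IEEE 754 half-precision bytes for a value
// that is zero or exactly representable as a normal half-precision float.
function float16Bytes(value) {
  if (!Number.isFinite(value)) throw new RangeError("sketch only handles finite values");
  if (value === 0) return [Object.is(value, -0) ? 0x80 : 0x00, 0x00];
  const sign = value < 0 ? 0x8000 : 0;
  let m = Math.abs(value);
  let e = 0;
  while (m >= 2) { m /= 2; e += 1; } // normalize the significand into [1, 2)
  while (m < 1) { m *= 2; e -= 1; }
  const fraction = (m - 1) * 1024; // 10 fraction bits
  if (!Number.isInteger(fraction) || e < -14 || e > 15) {
    throw new RangeError("sketch only handles normal half-precision values");
  }
  const bits = sign | ((e + 15) << 10) | fraction;
  return [bits >> 8, bits & 0xff];
}

// Hypothetical builder: every item remembers its intended encoding.
class CBORArray {
  constructor() {
    this.items = [];
  }
  appendInt(value) {
    if (!Number.isInteger(value) || value < 0 || value > 23) {
      throw new RangeError("sketch only handles integers 0..23");
    }
    this.items.push({ kind: "int", value });
  }
  appendFloat(value) {
    this.items.push({ kind: "float", value });
  }
  serialize() {
    if (this.items.length > 23) throw new RangeError("sketch only handles short arrays");
    const bytes = [0x80 | this.items.length]; // major type 4, length in the low bits
    for (const item of this.items) {
      if (item.kind === "int") {
        bytes.push(item.value); // major type 0, value in the low bits
      } else {
        bytes.push(0xf9, ...float16Bytes(item.value)); // 0xF9 marks a half-precision float
      }
    }
    return Uint8Array.from(bytes);
  }
}

const a = [10, 10.0, 0, -0.0, 0.0];
const cborA = new CBORArray();
cborA.appendInt(a[0]);
cborA.appendFloat(a[1]);
cborA.appendInt(a[2]);
cborA.appendFloat(a[3]);
cborA.appendFloat(a[4]);
const data = cborA.serialize();
const hex = Array.from(data, (b) => b.toString(16).padStart(2, "0")).join("");
console.log(hex); // 850af9490000f98000f90000
```

Note that the type information lives entirely in which method the caller chose; nothing in the values themselves could have supplied it.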

The problem is even worse on deserialization, as validation would require passing the decoder some sort of schema, or require manual validation that the elements of `a` were properly encoded as integers or floats.
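
As a rough illustration of that extra burden, the sketch below checks the encoding kind of each element against a schema. The `validateKinds` function and the schema format are hypothetical, and it assumes the same toy subset as the example: arrays of up to 23 items containing only integers 0..23 and half-precision floats.

```javascript
// Hypothetical validator: confirm each array element uses the encoding kind
// the protocol demands. Throws on any mismatch or unsupported item.
function validateKinds(bytes, schema) {
  if ((bytes[0] >> 5) !== 4) throw new TypeError("expected a CBOR array");
  const count = bytes[0] & 0x1f;
  if (count > 23 || count !== schema.length) {
    throw new RangeError("unexpected array length");
  }
  let offset = 1;
  for (let i = 0; i < count; i++) {
    const head = bytes[offset];
    let kind;
    if (head <= 0x17) {
      kind = "int"; // major type 0 with the value in the initial byte
      offset += 1;
    } else if (head === 0xf9) {
      kind = "float"; // half-precision float: initial byte plus two more
      offset += 3;
    } else {
      throw new TypeError(`unsupported item at position ${i}`);
    }
    if (kind !== schema[i]) {
      throw new TypeError(`position ${i}: expected ${schema[i]}, found ${kind}`);
    }
  }
  if (offset !== bytes.length) throw new RangeError("trailing or truncated data");
  return true;
}

const schema = ["int", "float", "int", "float", "float"];
const typed = Uint8Array.of(
  0x85, 0x0a, 0xf9, 0x49, 0x00, 0x00, 0xf9, 0x80, 0x00, 0xf9, 0x00, 0x00
);
console.log(validateKinds(typed, schema)); // true
```

Every application protocol would have to ship and maintain something equivalent to `schema`, purely to police a distinction the JavaScript values cannot carry.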

On the other hand, if we encode the canonical representation, which both JSON.stringify() and our I-D agree upon, we get:

85    # array(5)
   0A # unsigned(10)
   0A # unsigned(10)
   00 # unsigned(0)
   00 # unsigned(0)
   00 # unsigned(0)
 
And to produce this result in JavaScript, the API is minimal:

let data = CBOR.serialize(a);
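
A minimal sketch of what could sit behind that call, assuming the numeric reduction described above. The `CBOR` object here is illustrative rather than an existing library, and it handles only arrays of up to 23 numbers whose values are integers in 0..23; a real codec would also need the other integer widths and a float path for fractional values.

```javascript
// Hypothetical sketch: any Number with no fractional part is encoded as an
// integer, however it was originally written, and -0 collapses to 0.
const CBOR = {
  serialize(values) {
    if (!Array.isArray(values) || values.length > 23) {
      throw new RangeError("sketch only handles arrays of up to 23 items");
    }
    const bytes = [0x80 | values.length]; // major type 4, length in the low bits
    for (const v of values) {
      if (!Number.isInteger(v) || v < 0 || v > 23) {
        throw new RangeError("sketch only handles integral values 0..23");
      }
      bytes.push(v === 0 ? 0 : v); // major type 0; -0 and 0 both become 0x00
    }
    return Uint8Array.from(bytes);
  },
};

const a = [10, 10.0, 0, -0.0, 0.0];
const data = CBOR.serialize(a);
const hex = Array.from(data, (b) => b.toString(16).padStart(2, "0")).join("");
console.log(hex); // 850a0a000000
```

No per-element type tags and no schema are needed, because the encoding is a function of the numeric value alone.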

This would be portable to any other CBOR or conformant dCBOR implementation, where every element of `a` could be successfully extracted as a Double regardless of how it was serialized.

~ Wolf