Re: [Cbor] Decoding numbers and compliance verification in dCBOR

Wolf McNally <wolf@wolfmcnally.com> Sun, 12 March 2023 07:03 UTC

From: Wolf McNally <wolf@wolfmcnally.com>
In-Reply-To: <d0f82da7-77c6-86f9-f1b8-a9cd38dbc5ee@gmail.com>
Date: Sat, 11 Mar 2023 23:03:15 -0800
Cc: Laurence Lundblade <lgl@island-resort.com>, Carsten Bormann <cabo@tzi.org>, cbor@ietf.org
To: Anders Rundgren <anders.rundgren.net@gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/OYVJW7h96z6QIBlSCWKemZbcr00>

Anders,

> On Mar 11, 2023, at 10:09 PM, Anders Rundgren <anders.rundgren.net@gmail.com> wrote:
> 
> AFAICT, properly defined deterministic encoding rules can actually *simplify* decoders.

I don’t think my decoder implementations are more than fractionally larger or slower for the validation they do.

> Map sorting gets trivial, you just keep a copy of the preceding key and verify that the new key is bigger.

In fact, this is exactly how my implementations do it.
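As a rough sketch of the check described above (illustrative Python, not code from any of the implementations discussed; the function name `keys_strictly_ascending` is hypothetical), a decoder can remember only the previous key's encoded bytes and reject any key that is not strictly greater. This assumes the bytewise lexicographic ordering of encoded keys from RFC 8949 section 4.2.1; a strict comparison also rejects duplicate keys.

```python
def keys_strictly_ascending(encoded_keys):
    """Return True if each encoded map key sorts strictly after the one before it.

    `encoded_keys` is an iterable of `bytes`, each holding the CBOR encoding
    of one map key, in the order the keys appear in the input.
    """
    previous = None
    for key in encoded_keys:
        # Python compares bytes objects lexicographically, byte by byte.
        if previous is not None and key <= previous:
            return False  # out of order, or a duplicate key
        previous = key
    return True


# Encoded keys for 1, 10, 24 and the text string "a".
ordered = [bytes.fromhex(h) for h in ("01", "0a", "1818", "6161")]
print(keys_strictly_ascending(ordered))                  # True
print(keys_strictly_ascending([b"\x0a", b"\x01"]))       # False
```

Only one previous key is held at a time, so the check adds constant memory beyond the key itself.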

> However, not sticking to Rule 2 in section 4.2.2 of RFC 8949 is asking for trouble.  Really constrained systems probably do not even use floating point, making the space argument seem pretty weak.
> 
> Although not an IETF document the following specification holds a condensed description of what I consider a "reasonable" dCBOR scheme:
> https://cyberphone.github.io/javaapi/org/webpki/cbor/package-summary.html#deterministic-serialization

I’m still not sure why we should cling to the idea that floating point 0.0, -0.0, and integer 0 need distinct encodings when we’re specifically discussing *deterministic* CBOR. This essentially burdens developers with an implementation detail, and the distinction doesn’t even hold in practice for JSON, where “0” is treated as equal to “0.0” in both type and value. That is especially clear in existing JSON canonicalization practices such as RFC 8785 (https://www.rfc-editor.org/rfc/rfc8785).

In fact, giving a site such as https://codebeautify.org/jsonviewer the JSON `[10, 10.0, 0, -0.0, 0.0]` and telling it to “minify” produces `[10,10,0,0,0]`. I conclude that having more than one canonical representation for the same numeric value is a step backwards for anything wishing to call itself “deterministic CBOR”. I’m trying to do *better* than JSON here, not worse.
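To make the idea of a single representation per numeric value concrete, here is a minimal Python sketch. It is an illustration only, not the dCBOR specification or any existing implementation: the helper name `reduce_number` is hypothetical, and it ignores the integer range limits a real encoder would need to enforce. Any float with an integral value, including -0.0, is collapsed to the corresponding integer before serialization.

```python
import json


def reduce_number(value):
    """Collapse floats that hold an integral value (including -0.0) to int.

    NaN and the infinities are left untouched, because
    float.is_integer() returns False for them.
    """
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value


values = json.loads("[10, 10.0, 0, -0.0, 0.0]")
reduced = [reduce_number(v) for v in values]
print(json.dumps(reduced, separators=(",", ":")))  # [10,10,0,0,0]
```

With this reduction applied, 10 and 10.0 (or 0, 0.0 and -0.0) can no longer produce different encoded bytes, which is the property argued for above.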

Please, let me know if you think I’m missing some important benefit of having redundant encodings.

~ Wolf