[Cbor] draft-mcnally-deterministic-cbor: rationale for not going with pre-encoded data (was: Updated Drafts for dCBOR I-D and Gordian Envelope Structured Data Format I-D [...])

Christian Amsüss <christian@amsuess.com> Wed, 31 May 2023 15:57 UTC

Date: Wed, 31 May 2023 17:57:38 +0200
From: Christian Amsüss <christian@amsuess.com>
To: Christopher Allen <christophera@lifewithalacrity.com>
Cc: cbor@ietf.org, Wolf McNally <wolf@wolfmcnally.com>, Shannon.Appelcline@gmail.com
Message-ID: <ZHducp0jYdjNthbE@hephaistos.amsuess.com>
References: <CAAse2dEFB_FVP6_KkNANSYPW+yX4-M9pN3YkUq5=FTgLZnyWGw@mail.gmail.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg="pgp-sha256"; protocol="application/pgp-signature"; boundary="IhNtLYDLZY5qO0h7"
Content-Disposition: inline
In-Reply-To: <CAAse2dEFB_FVP6_KkNANSYPW+yX4-M9pN3YkUq5=FTgLZnyWGw@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/X4K-UjBZP6wxxZdWgBYEfCp2vH4>
Subject: [Cbor] draft-mcnally-deterministic-cbor: rationale for not going with pre-encoded data (was: Updated Drafts for dCBOR I-D and Gordian Envelope Structured Data Format I-D [...])

Hello Wolf, hello Christopher,

this got cut a bit short during today's interim, so taking it to the
list:

On Mon, May 08, 2023 at 10:40:21AM -0700, Christopher Allen wrote:
> The first IETF Internet-Draft, “Gordian dCBOR: Deterministic CBOR
> Implementation Practices,” offers specifications and guidelines for dCBOR
> implementers. You can find the updated version of the draft at the
> following link:
> https://www.ietf.org/archive/id/draft-mcnally-deterministic-cbor-01.html

I'd like to understand (and see explained in the document) why
deterministic encoding is this important for the purpose of signatures
-- given that any verifier of the signature will also at some point have
received the original data to verify. A common pattern (used e.g. in
COSE) is to encode the data to be signed in a byte string, and sign over
that. (In the case of partially erasable data, it'd probably be a
nesting of byte strings, but at the level where non-bstr data comes in,
it'd be encoded at least once).

You mentioned some points I didn't get completely:

* It's harder with multi-signatures.

  I don't quite understand how so; at least when reading
  multi-signatures as countersignatures, COSE seems not to have issues
  with it.

* It's known to cause problems with JWT.

  Yes. But JSON is not CBOR -- JSON data can't be used in-place, whereas
  CBOR data is unescaped and can be processed directly. To stick with
  the Rust examples you mentioned, decoding a CBOR data structure that's
  encoded with several levels of nesting will still allow you to get a
  str reference into data that's in the deepest item without copying.

* People pass data into their graph storage and export out again.

  That's an argument I can understand, but the point above on it not
  being JSON applies as well. While keeping the JSON around would
  roughly double the amount of RAM used, encoded CBOR can share
  much data between a triple store and the originally signed data.

  For example, looking at how CoRAL data might be encoded in such a
  triple store, the creation of the graph tables would primarily verify
  that the CBOR in the document is indeed valid, and then be a table of
  pointers into that data. (Currently CoRAL is described to be
  serialized as a tree. An encoding as triples is being considered, but
  the ordering of the triples would not be lexicographic but following
  some tree walk).
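
To make the "no copying" point concrete, here is a minimal sketch in Rust
using only the standard library. It is not a full CBOR decoder: it assumes
definite-length items only, and the function names (read_head,
string_payload, innermost_text) are made up for this illustration. It
unwraps a text string that sits inside several nested byte strings and
returns a str reference that borrows from the original buffer.

```rust
use std::convert::TryFrom; // already in the prelude from edition 2021 on

/// Read one CBOR item head (definite lengths only).
/// Returns (major type, argument, number of bytes the head occupies).
fn read_head(buf: &[u8]) -> Option<(u8, u64, usize)> {
    let first = *buf.first()?;
    let major = first >> 5;
    let info = first & 0x1f;
    let extra = match info {
        0..=23 => return Some((major, info as u64, 1)),
        24 => 1,
        25 => 2,
        26 => 4,
        27 => 8,
        _ => return None, // reserved values and indefinite lengths: not handled
    };
    let bytes = buf.get(1..1 + extra)?;
    // The argument is stored big-endian after the initial byte.
    let arg = bytes.iter().fold(0u64, |acc, b| (acc << 8) | *b as u64);
    Some((major, arg, 1 + extra))
}

/// Borrow the payload of the byte string (major type 2) or text string
/// (major type 3) that starts at the beginning of `buf`.
fn string_payload(buf: &[u8], want_major: u8) -> Option<&[u8]> {
    let (major, len, head) = read_head(buf)?;
    if major != want_major {
        return None;
    }
    let end = head.checked_add(usize::try_from(len).ok()?)?;
    buf.get(head..end)
}

/// Unwrap `depth` nested byte strings, then return the text string found
/// inside as a &str that points into the original buffer.
fn innermost_text(mut buf: &[u8], depth: usize) -> Option<&str> {
    for _ in 0..depth {
        buf = string_payload(buf, 2)?;
    }
    std::str::from_utf8(string_payload(buf, 3)?).ok()
}
```

Given the eight bytes 47 46 65 68 65 6c 6c 6f (the text "hello", wrapped
in two byte strings), innermost_text(&doc, 2) yields "hello", and the
returned reference points at byte 3 of the input rather than at a copy. A
table of pointers into a signed document, as in the triple-store case
above, could be built from the same kind of borrowed slices.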

If this was all well thought through, and deterministic encoding is
really the only way to go, I'm all for it -- I'd just like to make sure
the alternatives have been explored, and that that process is
documented.

BR
c

-- 
To use raw power is to make yourself infinitely vulnerable to greater powers.
  -- Bene Gesserit axiom