Re: [Cbor] Unusual map labels, dCBOR and interop

Wolf McNally <wolf@wolfmcnally.com> Thu, 28 March 2024 05:05 UTC

From: Wolf McNally <wolf@wolfmcnally.com>
Date: Wed, 27 Mar 2024 22:04:46 -0700
In-Reply-To: <CAN8C-_JJZ6uS5mBj_gNozC7H3+RG7ULBJ6gO55R9B7=gv=D_RA@mail.gmail.com>
Cc: "lgl island-resort.com" <lgl@island-resort.com>, Carsten Bormann <cabo@tzi.org>, cbor@ietf.org, Christopher Allen <christophera@lifewithalacrity.com>, Shannon Appelcline <shannon.appelcline@gmail.com>
To: Orie Steele <orie@transmute.industries>
References: <8C245824-1990-4616-AB70-FFD4FACB1AE9@island-resort.com> <11E8A8A5-D891-49FF-AF16-697C06F463B3@tzi.org> <9A0CE364-C141-4EBE-9703-292C416D12F5@island-resort.com> <3D62C4F0-D570-4EE4-AF6A-163C708AA6BE@tzi.org> <58BA8F8C-0C63-4534-9BF7-255C32D02C16@island-resort.com> <5F1E1133-4565-4D0A-98EE-A13C6F5F67AA@wolfmcnally.com> <CAN8C-_+72_H=mk6xGuSk72rZWVg9Ff0d_b_o8Rz+kRWn1FruCQ@mail.gmail.com> <E585B8F9-BA13-4018-8D50-3C7560183BC4@wolfmcnally.com> <CAN8C-_JJZ6uS5mBj_gNozC7H3+RG7ULBJ6gO55R9B7=gv=D_RA@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/WjEYbTg6ZVNepv9OyWcpDOlN6DQ>

Orie,

No, dCBOR does *not* serialize maps differently from CDE, and all dCBOR is absolutely `application/cbor`.

dCBOR is a “profile” of CDE, which is a profile of CBOR. Hence, all dCBOR is valid CDE and all CDE is valid CBOR. All CBOR decoders can decode dCBOR without even knowing it *is* dCBOR. All CBOR encoders can be used to encode dCBOR as long as the data presented to the encoder conforms to the dCBOR rules.

Now: not all CBOR is dCBOR. This is expected, because CBOR does not have determinism as a goal, while dCBOR does. CBOR lets you use non-preferred numeric encodings, put map keys in non-canonical order, carry NaN values with payloads, and do numerous other things that RFC 8949 CBOR decoders *must* accept as valid CBOR. dCBOR decoders, on the other hand, *must* enforce dCBOR validation: preferred numeric encoding (including numeric reduction of floating-point values) *must* be used; map keys *must* be in canonical order; there is exactly *one* valid value for NaN; the only simple values accepted are `true`, `false`, and `null`; and so on.
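
To make two of these rules concrete, here is a minimal Python sketch (not taken from any dCBOR implementation) of the kind of check a validating decoder might perform. The function names `encode_head`, `is_preferred_uint`, and `keys_in_canonical_order` are hypothetical, and the sketch assumes that "canonical order" means strictly ascending bytewise order of the encoded keys. It covers only unsigned integers and key ordering, not floating-point reduction, NaN, or simple values.

```python
import struct

def encode_head(major: int, value: int) -> bytes:
    """Encode a CBOR head (major type plus argument) in its shortest form."""
    if value < 24:
        return bytes([(major << 5) | value])
    for info, fmt, limit in ((24, ">B", 1 << 8), (25, ">H", 1 << 16),
                             (26, ">I", 1 << 32), (27, ">Q", 1 << 64)):
        if value < limit:
            return bytes([(major << 5) | info]) + struct.pack(fmt, value)
    raise ValueError("argument does not fit in 64 bits")

def is_preferred_uint(encoded: bytes) -> bool:
    """Return True if `encoded` is exactly one unsigned integer in shortest form."""
    if not encoded or encoded[0] >> 5 != 0:
        return False                      # empty input, or not major type 0
    info = encoded[0] & 0x1F
    if info < 24:
        return len(encoded) == 1          # value is stored in the initial byte
    if info > 27:
        return False                      # reserved or indefinite-length marker
    size = 1 << (info - 24)               # 1, 2, 4 or 8 argument bytes
    if len(encoded) != 1 + size:
        return False
    value = int.from_bytes(encoded[1:], "big")
    # Re-encode in shortest form; any longer form is non-preferred.
    return encode_head(0, value) == encoded

def keys_in_canonical_order(encoded_keys: list[bytes]) -> bool:
    """Return True if encoded map keys are strictly ascending bytewise.

    Strict comparison also rejects duplicate keys.
    """
    return all(a < b for a, b in zip(encoded_keys, encoded_keys[1:]))
```

For example, `is_preferred_uint(bytes.fromhex("1801"))` is false because the value 1 fits in the initial byte, so a validating decoder in this sketch would reject it even though a generic CBOR decoder would accept it.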

If you want a protocol that allows for either CBOR or dCBOR, then a tag can be assigned for that purpose, but you don’t even really need one: we already have a tag for Gordian Envelope, #6.200, and Gordian Envelope requires dCBOR. Therefore, in its simplest expression, you have 200(24("Hello")), where #6.200 is “Gordian Envelope”, #6.24 is “Encoded CBOR Data Item”, and "Hello" could be any dCBOR at all:

D8 C8               # tag(200) Envelope
   D8 18            # tag(24) CBOR Data Item
      65            # text(5) Any dCBOR
         48656C6C6F # "Hello"

Hence there is absolutely no ambiguity.
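
As an illustration, the byte layout of that example can be reproduced with a few lines of Python. This is only a sketch: `tag_head` and `short_text` are hypothetical helpers written for this message, they handle only tag numbers below 256 and strings under 24 bytes, and the inner tag is 24, matching the "Encoded CBOR Data Item" tag named above.

```python
def tag_head(tag: int) -> bytes:
    """Encode a CBOR tag number (major type 6); sketch handles tags below 256."""
    if tag < 24:
        return bytes([0xC0 | tag])        # tag number fits in the initial byte
    if tag < 256:
        return bytes([0xD8, tag])         # 0xD8 = major type 6, one-byte argument
    raise ValueError("sketch only handles tag numbers below 256")

def short_text(s: str) -> bytes:
    """Encode a text string (major type 3) shorter than 24 UTF-8 bytes."""
    data = s.encode("utf-8")
    if len(data) >= 24:
        raise ValueError("sketch only handles strings under 24 bytes")
    return bytes([0x60 | len(data)]) + data

# 200(24("Hello")): Envelope tag, then tag 24, then the text item.
envelope = tag_head(200) + tag_head(24) + short_text("Hello")
print(envelope.hex().upper())  # D8C8D8186548656C6C6F
```

A receiver that sees the leading `D8 C8` knows it is looking at an Envelope and can apply dCBOR validation to everything inside it.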

~ Wolf

> On Mar 27, 2024, at 8:36 PM, Orie Steele <orie@transmute.industries> wrote:
> 
> (with no hat)
> 
> Set size comparison and equivalence are key to distinguishing types in mathematics.
> 
> The set of valid application/cbor instances is strictly larger than the set of application/dcbor (or whatever) instances.
> 
> This exact property is what makes dcbor useful especially in the context of unique digests for a given serialized data structure.
> 
> If undefined were encountered when deserializing dcbor to JavaScript, an error should be thrown?
> 
> SCITT Receipts and CWTs are built on the assumption that no error would be thrown.
> 
> I must not have understood the comment made regarding map labels, because I thought you were saying that dcbor serializes maps differently from vanilla CDE... If it doesn't, why the need for any new serialization?
> 
> I don't think dcbor can be called application/cbor any more than json without booleans can be called application/json.
> 
> Processors not expecting Booleans in JSON would explode, and processors of dcbor expecting it to be meaningfully different from application/cbor will also explode, right?
> 
> Feeding unexpected content to a parser is the normal expectation for security formats.
> 
> We've seen real problems arise from JSON-LD, being valid json, but not quite valid "JSON-LD"... It can lead to ambiguous processing, where some middle boxes throw errors and others don't.
> 
> Distinguishing dcbor from cbor seems a prerequisite to making any progress on its application in a security context.
> 
> OS
> 
> 
> On Wed, Mar 27, 2024, 9:42 PM Wolf McNally <wolf@wolfmcnally.com> wrote:
>> Orie,
>> 
>> > On Mar 27, 2024, at 6:57 PM, Orie Steele <orie@transmute.industries> wrote:
>> > 
>> > Using deterministic encoding to serialize map keys is interesting.
>> > 
>> > I wonder about security issues with distinguishability.
>> > 
>> > Consider the case where a map has keys that are serialized with and without deterministic encoding.
>> 
>> Map keys in dCBOR are themselves part of the dCBOR serialization and must therefore also conform to dCBOR.
>> 
>> > 
>> > Restricting map keys, reminds me of another flavor of CBOR, I think it was called dag-cbor, it required all map keys to be tstr... Whereas dCBOR requires them all to be bstr?
>> 
>> No, the handling of map keys as serialized dCBOR strings internally is a detail of our reference implementation. dCBOR map keys are simply serialized as dCBOR items like every other dCBOR item; they are not segregated into typed byte strings. So the map:
>> 
>> {[1, 2, 3]: "Hello"}
>> 
>> Is serialized as:
>> 
>> A1               # map(1)
>>    83            # array(3)
>>       01         # unsigned(1)
>>       02         # unsigned(2)
>>       03         # unsigned(3)
>>    65            # text(5)
>>       48656C6C6F # "Hello"
>> 
>> Not as:
>> 
>> {h'83010203': "Hello"}
>> 
>> A1               # map(1)
>>    44            # bytes(4)
>>       83010203   # "\x83\u0001\u0002\u0003"
>>    65            # text(5)
>>       48656C6C6F # "Hello"
>> 
>> > Is it critical to assign unique media types in order for deserialization to succeed?
>> 
>> No, for the reasons stated above.
>> 
>> > I also wonder about interactions with COSE and related protocols especially regarding unprotected headers.
>> 
>> COSE is not necessarily dCBOR, although you can certainly encode COSE in compliance with dCBOR. And for particular applications you can certainly state that any of the signed byte strings in the protected header or payload must conform to dCBOR, which would help ensure determinism. For example, the payload could be a serialized Gordian Envelope, which is a format that supports verifiable claims based on dCBOR. We at Blockchain Commons are considering proposing Gordian Envelope as a Verifiable Data Structure for inclusion in the COSE Receipts registries.
>> 
>> https://datatracker.ietf.org/doc/draft-mcnally-envelope/
>> 
>> ~ Wolf