Re: [Cbor] Updated Drafts for dCBOR I-D and Gordian Envelope Structured Data Format I-D & IANA Tag Registration

Wolf McNally <wolf@wolfmcnally.com> Wed, 31 May 2023 23:39 UTC

From: Wolf McNally <wolf@wolfmcnally.com>
Message-Id: <7D31AA12-5D9F-47C1-AAC5-98A802A60163@wolfmcnally.com>
Content-Type: multipart/alternative; boundary="Apple-Mail=_2E4EB37D-9129-4B20-8812-A8BACCBE5CB0"
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.600.7\))
Date: Wed, 31 May 2023 16:39:38 -0700
In-Reply-To: <42FDA88A-7D6A-4AA7-986A-C94EBC1B0999@wolfmcnally.com>
Cc: Christopher Allen <ChristopherA@lifewithalacrity.com>, cbor@ietf.org, Shannon.Appelcline@gmail.com
To: Carsten Bormann <cabo@tzi.org>
References: <CAAse2dEFB_FVP6_KkNANSYPW+yX4-M9pN3YkUq5=FTgLZnyWGw@mail.gmail.com> <FFD1AFCB-45F7-4893-B4B8-F2F093FEE6E5@tzi.org> <42FDA88A-7D6A-4AA7-986A-C94EBC1B0999@wolfmcnally.com>
X-Mailer: Apple Mail (2.3731.600.7)
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/V85bYzmGkilIvd2heBq1vLSsM0Q>
Subject: Re: [Cbor] Updated Drafts for dCBOR I-D and Gordian Envelope Structured Data Format I-D & IANA Tag Registration

Errata:

#202(1) in our current implementations represents the predicate `id`: a unique identifier of some kind for an object. The predicate in this case doesn’t specify the form of the object; it simply declares that, when it appears as the predicate of an assertion, the assertion’s object is a unique identifier of the subject, for example:

> On May 31, 2023, at 4:33 PM, Wolf McNally <wolf@wolfmcnally.com> wrote:
> 
> Carsten,
> 
> First just to reiterate our tag requests in this thread:
> 
> | code point | data item | semantics |
> |:----|:-----|:-----|
> | #6.200 | multiple | envelope; see section 3.1 |
> | #6.201 | array | assertion; see section 3.3.3 |
> | #6.202 | uint | known-value; see section 3.2.2 |
> | #6.203 | multiple | wrapped-envelope; see section 3.3.2 |
> | #6.204 | bytes | elided; see section 3.2.5 |
> | #6.205 | array | encrypted; see section 3.2.3 |
> | #6.206 | array | compressed; see section 3.2.4 |
> 
> (1) These tags represent the top-level envelope #200 and six of envelope’s eight case arms. The two arms that do not require new tags are `leaf`, which is tagged #6.24 because it is just wrapped CBOR, and `node`, which is encoded directly as a CBOR array. The main function of the top-level tag is to direct parsing from a single, universally recognized tag for “envelope”. Inner tags distinguish the other seven case arms of envelopes, which are not sufficiently disambiguated by their CBOR data item: for instance, three of the case arms (`assertion`, `encrypted`, and `compressed`) are also conveyed as CBOR arrays, and `leaf` could be *any* dCBOR encoding, including arrays, ints, or anything else. Relying purely on internal structure for disambiguation would therefore be a semantic layer violation.
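
The tag-based disambiguation described above can be sketched roughly as follows. This is only an illustration: the tag numbers come from the request table, while the names `CASE_ARM_BY_TAG` and `case_arm` are hypothetical and not taken from any envelope implementation.

```python
# Tag numbers are from the request table above; all names here are
# hypothetical and exist only for this illustration.
ENVELOPE_TAG = 200
LEAF_TAG = 24  # existing CBOR tag for an embedded (wrapped) CBOR data item

CASE_ARM_BY_TAG = {
    LEAF_TAG: "leaf",
    201: "assertion",
    202: "known-value",
    203: "wrapped-envelope",
    204: "elided",
    205: "encrypted",
    206: "compressed",
}


def case_arm(tag, is_array=False):
    """Name the case arm of the item found inside #6.200.

    `tag` is the item's tag number, or None when the item is untagged.
    """
    if tag is None:
        if is_array:
            return "node"  # `node` is encoded directly as a CBOR array
        raise ValueError("untagged non-array item is not an envelope case arm")
    try:
        return CASE_ARM_BY_TAG[tag]
    except KeyError:
        raise ValueError(f"tag {tag} is not an envelope case arm") from None
```

Because the decision depends only on the tag, the parser never has to inspect the shape of an array to tell `assertion`, `encrypted`, and `compressed` apart.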
> 
> Once #200 has been parsed, an envelope parser knows that its contents are *some* kind of envelope content. For the other six case arms I considered using existing very short tags, like #6.1 through #6.6, which would save a byte, but that would interfere with envelope-naive CBOR parsers that expect tags like #6.1 to represent a date or to carry other tag-specific semantics. I also considered using much larger values that are purely first-come-first-served, but we want envelopes to introduce minimal overhead, so requesting #200-#206 seems to us to strike a balance.
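
To make the byte-count trade-off concrete, here is a small sketch of how a CBOR tag head is encoded under RFC 8949 (major type 6). The function name `encode_tag_head` is an assumption made for this example; 40000 is an arbitrary stand-in for a tag in the first-come-first-served range (32768 and above).

```python
import struct


def encode_tag_head(tag):
    """Encode the head of a CBOR tag (major type 6) as in RFC 8949."""
    major = 6 << 5  # 0xC0
    if tag < 0:
        raise ValueError("tag numbers are unsigned")
    if tag < 24:
        return bytes([major | tag])                          # 1 byte
    if tag < 0x100:
        return bytes([major | 24, tag])                      # 2 bytes
    if tag < 0x10000:
        return bytes([major | 25]) + struct.pack(">H", tag)  # 3 bytes
    if tag < 0x100000000:
        return bytes([major | 26]) + struct.pack(">I", tag)  # 5 bytes
    if tag < 0x10000000000000000:
        return bytes([major | 27]) + struct.pack(">Q", tag)  # 9 bytes
    raise ValueError("tag number does not fit in 64 bits")


for tag in (1, 200, 206, 40000):
    head = encode_tag_head(tag)
    print(tag, head.hex(), len(head))
```

Tags below 24 take one byte, #200 through #206 take two, and anything from 256 up to 65535 takes three, which is the overhead the paragraph above is weighing.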
> 
> Several of these tags are valuable in their stand-alone form, including `known-value` (see discussion below), `elided` (which is just a SHA-256 digest, and for which I found no existing tag), `encrypted` and `compressed`, which are informatively defined in the I-D and implemented in our reference implementations. Our implementations include `EncryptedMessage` and `Compressed` types that directly encode to tagged CBOR as specified without any dependence on the envelope environment; the main additional requirement being that they afford the encoding and verification of a hash, which envelope uses. Hence they and their tags are more generally useful than purely with envelope.
> 
> So we believe these six tags to be a minimal set we’re requesting in the IANA registry, where we have considered the space utility of the tags themselves, respected the limited real-estate of the tag namespace, and considered their usefulness both within the context of envelope as well as their possible utility to others outside the envelope context.
> 
> (2) These are typographical errors: they are holdovers from our previous numbering scheme before they were renumbered in sequence starting with #200, in preparation for our IANA request. Thank you for calling this to my attention, and they will be fixed in the next draft version.
> 
> #221 -> #201 `assertion`
> #224 -> #203 `wrapped-envelope`.
> 
> (3) A “known value” is a member of a namespace of uints that we have declared primarily for use with envelope, but which could have uses outside it. Known values would be administered through a separate registry. A known value tagged #202 can be used by itself, or can be wrapped with #200 and then exist as the subject of an envelope, or as the predicate or object of an assertion. We originally called this namespace `known-predicate` because, when defining semantic structures, a common set of predicates is used frequently, and we realized that a concise encoding for predicates would be desirable. We later realized that this namespace also has other potential use cases as a space of generalized concepts, so we now call it `known-value`. We believe this use case does not overlap with that of CBOR tags. We hope popular ontologies may eventually choose to map their defined predicates and other concepts to known values rather than bloat envelopes with repeated text strings for the concepts they use. Of course, nothing in the envelope spec stops them from using strings as predicates, or any other envelope for that matter, including envelopes containing their own assertions.
> 
> This also brings us to a simple example of why the double-tagging you asked about in (1) above is useful:
> 
> #203(1) in our current implementations represents the predicate `id`: a unique identifier of some kind for an object. The predicate in this case doesn’t specify the form of the object; it simply declares that, when it appears as the predicate of an assertion, the assertion’s object is a unique identifier of the subject, for example:
> 
> "Alice" [
>     id: 1234
> ]
> 
> Known values by themselves are not envelopes and may have other uses outside of envelopes, or when passed to other subsystems such as ontology catalogs. On the other hand #200(#203(1)) is a stand-alone envelope with only the subject `id` and no other information, and can be processed by any software that handles envelopes. One can imagine taking such an envelope and then adding a number of assertions to it describing the semantics of the specific predicate, such as the URL at which to access its full schema:
> 
> id [
>     isA: "predicate"
>     dereferenceVia: "https://example.com/ontology/id"
> ]
> 
> In this example, `id`, `isA` and `dereferenceVia` are all known values (note the lack of quotes used for strings in envelope notation), and especially in cases like `dereferenceVia` are going to take far fewer bytes to encode than the full string name, much less a URI.
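
The size difference can be illustrated with a rough sketch that encodes a known value and a text string by hand, following the RFC 8949 head encoding. It assumes known values are tagged #202, as in the request table and the erratum at the top of this message. The helper names are hypothetical, and the number 9 used for `dereferenceVia` is a placeholder assumption rather than a value stated in this thread; only `id` = 1 comes from the text.

```python
import struct


def encode_head(major, value):
    """Encode a CBOR head (major type plus unsigned argument), RFC 8949."""
    prefix = major << 5
    if value < 24:
        return bytes([prefix | value])
    if value < 0x100:
        return bytes([prefix | 24, value])
    if value < 0x10000:
        return bytes([prefix | 25]) + struct.pack(">H", value)
    if value < 0x100000000:
        return bytes([prefix | 26]) + struct.pack(">I", value)
    return bytes([prefix | 27]) + struct.pack(">Q", value)


def encode_known_value(value):
    """Tag 202 (major type 6) wrapping an unsigned integer (major type 0)."""
    return encode_head(6, 202) + encode_head(0, value)


def encode_text(text):
    """CBOR text string (major type 3)."""
    data = text.encode("utf-8")
    return encode_head(3, len(data)) + data


ID = 1               # from the text: the known value for `id`
DEREFERENCE_VIA = 9  # placeholder assumption, not given in this thread

print(encode_known_value(ID).hex())               # d8ca01
print(len(encode_known_value(DEREFERENCE_VIA)))   # 3
print(len(encode_text("dereferenceVia")))         # 15
print(len(encode_text("https://example.com/ontology/id")))
```

Any known value below 24 costs three bytes in this encoding, regardless of how long its human-readable name is.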
> 
> (4) I think there may be a misunderstanding here. Crypto-agility is a non-goal of Gordian Envelope; in fact, we feel that crypto-agility is often a huge barrier to understanding and adoption. I think Christopher will have more to say about this. Our main promise, as described in the section on Futureproofing, is that as the crypto landscape changes we are sure that we can update the spec to incorporate additional cryptographic constructs as they become necessary. For now, the only actual normative construct is the hash algorithm: SHA-256. This is due to the intractability of handling envelopes (and hence hash trees) with hashes created by a multiplicity of algorithms. Nothing in the spec requires that a particular construct for encryption (or compression) be used; it only informatively states what we’re using now. As indicated in the Futureproofing section, we don’t believe that we need algorithm specifiers in envelope, because the only normative algorithm is defined by the spec, and other normative documents will define standards for other constructs as well as ways to disambiguate their envelope encodings. To be clear: the `encrypted` arm of the envelope is *not* a specification for IETF-ChaCha20Poly1305; it is a specification that the envelope is *encrypted* by some means, and the only requirement is that it publishes the hash of the plaintext in a verifiable way: nothing more. Likewise, the `compressed` arm of the envelope is *not* a specification for DEFLATE; it is a specification that the envelope is *compressed* and that it publishes the hash of the uncompressed contents.
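
As a rough illustration of the one stated requirement on the `compressed` arm (publishing the hash of the uncompressed contents), the sketch below pairs a DEFLATE stream with a SHA-256 digest and verifies it on the way back. The function names and the returned pair are assumptions for this example; the actual array layout of #6.206 is defined in the I-D and is not reproduced here.

```python
import hashlib
import zlib


def compress_with_digest(payload: bytes):
    """Return (raw DEFLATE stream, SHA-256 digest of the uncompressed payload)."""
    digest = hashlib.sha256(payload).digest()
    # wbits=-15 selects a raw DEFLATE stream without a zlib header or trailer.
    compressor = zlib.compressobj(level=9, wbits=-15)
    compressed = compressor.compress(payload) + compressor.flush()
    return compressed, digest


def decompress_and_verify(compressed: bytes, digest: bytes) -> bytes:
    """Inflate the stream and check it against the published digest."""
    payload = zlib.decompress(compressed, wbits=-15)
    if hashlib.sha256(payload).digest() != digest:
        raise ValueError("digest does not match the uncompressed contents")
    return payload


original = b"envelope content " * 20
compressed, digest = compress_with_digest(original)
assert decompress_and_verify(compressed, digest) == original
```

Swapping DEFLATE for another compressor would leave the digest check untouched, which is the sense in which the arm specifies "compressed" rather than a particular algorithm.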
> 
> Incidentally, I did a comparison between your packed CBOR scheme and the DEFLATE scheme used in the `compressed` arm of Envelope:
> 
>         // This CBOR is encoded based on:
>         // https://datatracker.ietf.org/doc/html/draft-ietf-cbor-packed#appendix-A
>         // Figures 2-3.
>         
>         // Before packing: 400 bytes
>         // After packing:  310 bytes 76%
>         // After DEFLATE:  259 bytes 64%
> 
>         // This CBOR is encoded based on:
>         // https://datatracker.ietf.org/doc/html/draft-ietf-cbor-packed#appendix-A
>         // Figures 4-5.
>         
>         // Before packing: 1210 bytes
>         // After packing:   504 bytes 41%
>         // After DEFLATE:   321 bytes 26%
> 
> (a) I added the backslashes as line-continuation characters in order to keep to the I-D linter’s line-length requirements for pre-formatted text. Perhaps they aren’t necessary for understanding, and I can remove them.
> 
> (b) For many people in the communities we deal with, Gordian Envelope will be their first exposure to CBOR. CBOR’s conciseness is one of its strengths, and I primarily included the CBOR hex examples out of a desire to be abundantly clear about the binary form of the encoding; particularly for people who may be new to CBOR and thinking of adopting envelope. As we approach RFC status, this may be deemed unnecessary and removed.
> 
> ~ Wolf
> 
>> On May 31, 2023, at 10:00 AM, Carsten Bormann <cabo@tzi.org> wrote:
>> 
>> On 2023-05-08, at 19:40, Christopher Allen <ChristopherA@lifewithalacrity.com> wrote:
>>> 
>>> Gordian Envelope Internet-Drafts, which have been recently submitted to the Internet Engineering Task Force (IETF).
>> 
>> I did a quick skim after today’s meeting.
>> 
>> First a few comprehension questions:
>> 
>> (1) Why do things that already identify themselves as Gordian Elements (tag 201 to 206) need to be tagged again with 200?  Is there any function of this I’m missing?
>> 
>> (2) What is 221? 224?
>> 
>> (3) I don’t understand the way “known values” (Section 6) mix with the things they are predicates over.
>> 
>> Editorial comments:
>> 
>> (a) Some forward slashes probably want to be backslashes.
>> (You don’t need backslashes inside hex values in CBOR diagnostic notation.)
>> Did you check those examples?
>> 
>> (b) I don’t know that you need “CBOR hex” for all the examples, but if you do, can I recommend "CBOR pretty” format?  (Generate with diag2pretty.rb, which is part of the cbor-diag gem.)
>> 
>> Technical comments:
>> 
>> (4) I don’t find the crypto agility that was promised.
>> You probably want to use algorithm identifiers from the COSE algorithms registry (RFC 9053).
>> 
>> Grüße, Carsten
>> 
>