[Cbor] Limits of .feature and JC<> generic for CBOR/JSON

Laurence Lundblade <lgl@island-resort.com> Mon, 25 April 2022 19:50 UTC


Here’s an example top-level EAT definition that can be either JSON or CBOR, to illustrate my point. I’m using UCCS and UJCS only because they make the example work well:

EAT = JC<UJCS, UCCS>
UJCS = Claims-Set ; Unsigned JSON Claims Set
UCCS = Claims-Set ; Unsigned CBOR Claims Set

The definition of Claims-Set and JC<> are from UCCS Appendix A <https://datatracker.ietf.org/doc/html/draft-ietf-rats-uccs-02#appendix-A>. 

The definition of Claims-Set uses the JC<> generic for claim labels and other things that vary between JSON and CBOR.
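
For anyone without the draft at hand, the generic is defined there roughly as follows. I am reproducing this from memory, so treat it as a sketch and check Appendix A for the exact text:

JSON-ONLY<J> = J .feature "json"   ; annotates J as the JSON variant
CBOR-ONLY<C> = C .feature "cbor"   ; annotates C as the CBOR variant
JC<J,C> = JSON-ONLY<J> / CBOR-ONLY<C>  ; JSON alternative is listed first

With that, EAT = JC<UJCS, UCCS> expands to (UJCS .feature "json") / (UCCS .feature "cbor"). Note that .feature only annotates which alternative was used; it does not restrict what data the alternative accepts.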

If I feed a valid __CBOR__ token into the cddl validation tool with the above CDDL, it will think it’s __JSON__. The PEG will cause it to match UJCS rather than UCCS, because there is no difference between the definitions of UJCS and UCCS.

The diagnostic output from the validation will say the input is JSON, which is definitely not the desired result because the input is correct CBOR. This happens because JSON comes first in JC<> and PEG choice is ordered and greedy: the first alternative that matches wins. So, what to do?

(This is a bit of a degenerate case, kept simple to illustrate the issue. In the larger definition of EAT with nested tokens it is not so degenerate.)

My takeaways are:

- I need to define the top-level EAT for CBOR separately from JSON, even though they will share most of their actual definition. The JC<> generic won’t work at the top level, even though it works well for labels and such.

- The use of .feature to separate JSON and CBOR has limitations
   - Can’t distinguish structures like the above
   - Results in a lot of diagnostic output: there is output for every claim label and for many claim values, all of which has to be sorted through manually to determine whether the validation succeeded
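
The first takeaway might look something like this. EAT-CBOR and EAT-JSON are names I am making up for illustration; they are not taken from any draft:

EAT-CBOR = UCCS   ; hypothetical top-level rule for validating CBOR input
EAT-JSON = UJCS   ; hypothetical top-level rule for validating JSON input
UJCS = Claims-Set ; Unsigned JSON Claims Set
UCCS = Claims-Set ; Unsigned CBOR Claims Set

Since the first rule in a CDDL specification is the one the input is matched against, in practice this probably means two small top-level files that each start with one of these rules and share the rest of the definitions. JC<> would stay in use inside Claims-Set for labels and the like.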


Maybe a .serialization control would be better? It would bind hard to the serialization type of the input, so the PEG matching rules would not come into play: any CDDL alternative that does not match the input serialization type would be disregarded entirely.
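
To make the idea concrete, here is one way it might read, reusing the shape of the JC<> generic. This is purely hypothetical syntax: no .serialization control exists in CDDL today, and the control name and its "json" / "cbor" arguments are placeholders of mine:

JSON-ONLY<J> = J .serialization "json"  ; hypothetical: considered only when the input is JSON
CBOR-ONLY<C> = C .serialization "cbor"  ; hypothetical: considered only when the input is CBOR
JC<J,C> = JSON-ONLY<J> / CBOR-ONLY<C>

EAT = JC<UJCS, UCCS>

With CBOR input, the UJCS alternative would be dropped before any matching is attempted, leaving UCCS as the only candidate, so the order of the alternatives would no longer matter.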


To go a bit further: the input being matched is usually entirely CBOR or entirely JSON, but there are cases where it might be a mix of the two. This happens in EAT when you nest a CBOR token inside a JSON token or vice versa. It only happens at one well-defined point in EAT, since arbitrarily mixing serialization formats would be silly and chaotic.

In EAT, when JSON appears in CBOR it is a text string, and when CBOR appears in JSON it is also a text string, one that is base64 encoded.

I don’t think it is necessary for CDDL validation to deal with this mixed input case. It seems rare and it doesn’t seem necessary to the general function of validation, so it seems like .serialization as I mentioned above is an OK solution.

LL