Re: [Cbor] tag 24 and 55799 (was Re: my (WGLC re-)views on error processing in RFC7049bis and future-proofing)

Laurence Lundblade <lgl@island-resort.com> Fri, 29 May 2020 19:51 UTC

From: Laurence Lundblade <lgl@island-resort.com>
Date: Fri, 29 May 2020 12:51:41 -0700
In-Reply-To: <AD183B67-2B49-4CB3-B81D-BB024B4317E7@tzi.org>
Cc: Michael Richardson <mcr+ietf@sandelman.ca>, cbor@ietf.org
To: Carsten Bormann <cabo@tzi.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/yK3IGh7OlYSpKVEMqk4ULlC3MCg>

Really appreciate the discussion here (despite my incorrect statements).

> On May 26, 2020, at 2:43 PM, Carsten Bormann <cabo@tzi.org> wrote:
> 
>> On 2020-05-26, at 23:04, Laurence Lundblade <lgl@island-resort.com> wrote:
>> 
>>> On May 15, 2020, at 6:40 PM, Michael Richardson <mcr+ietf@sandelman.ca> wrote:
>>> 
>>>>> I note that AFAIK, we do not use tag#24 (Encoded CBOR data item) for
>>>>> the signed object, in COSE.  Should we?  What's the difference between
>>>>> #24 and #55799.
>>> 
>>>> 55799 is a tag that can have any CBOR data item as tag content; 24 is a
>>>> tag that can only be on byte strings.  The byte string then *encodes*
>>>> another CBOR data item.  (The main use here is to keep the decoder from
>>>> decoding, to provide easy skip-ability or because we need exact bytes
>>>> as in COSE.)  As often with tags, there is no need for tag 24 on a byte
>>>> string when it is clear from context that the byte string contains
>>>> encoded CBOR; this is the case in COSE.
>>> 
>>> Understood.
>> 
>> My answer on the difference is that you use 55799 when the surrounding data / file / protocol is not CBOR and 24 when it is CBOR. 55799 is intended to work as a magic number, 24 is not because it is not unique enough.
>> 
>> From a decoder point of view, they should be handled exactly the same
> 
> Actually, no.
> 
> 55799 has any CBOR data item as tag content and essentially has the semantics of that CBOR data item.

This is well-formed and valid CBOR by RFC 7049 & CBORbis:

D9 D9F7          # tag(55799)
   81            # array(1)
      D9 D9F7    # tag(55799)
         D9 D9F7 # tag(55799)
            01   # unsigned(1)
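
To make the nesting concrete, here is a minimal Python sketch that decodes the bytes above. It is not taken from any CBOR library: `read_head` and `decode_item` are hypothetical helpers, tags are represented as `("tag", number, content)` tuples purely for illustration, and only definite-length unsigned integers, byte strings, text strings, arrays and tags are handled.

```python
def read_head(data, pos):
    """Return (major type, argument, next position) for the item head at pos."""
    initial = data[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if info < 24:
        return major, info, pos            # argument is in the initial byte
    if info <= 27:
        size = 1 << (info - 24)            # 1, 2, 4 or 8 argument bytes follow
        return major, int.from_bytes(data[pos:pos + size], "big"), pos + size
    raise ValueError("indefinite/reserved encodings are not handled in this sketch")


def decode_item(data, pos=0):
    """Decode one data item; return (value, next position)."""
    major, arg, pos = read_head(data, pos)
    if major == 0:                          # unsigned integer
        return arg, pos
    if major == 2:                          # byte string
        return bytes(data[pos:pos + arg]), pos + arg
    if major == 3:                          # text string
        return data[pos:pos + arg].decode("utf-8"), pos + arg
    if major == 4:                          # array of arg items
        items = []
        for _ in range(arg):
            item, pos = decode_item(data, pos)
            items.append(item)
        return items, pos
    if major == 6:                          # tag: arg is the tag number
        content, pos = decode_item(data, pos)
        return ("tag", arg, content), pos
    raise ValueError("major type %d is not handled in this sketch" % major)


encoded = bytes.fromhex("D9D9F7" "81" "D9D9F7" "D9D9F7" "01")
item, end = decode_item(encoded)
print(item)  # ('tag', 55799, [('tag', 55799, ('tag', 55799, 1))])
```

A decoder written this way reports every 55799 it meets; whether the application then tolerates them is a separate decision.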


One is tempted to say that 55799, as a true CBOR no-op, must always be ignored by generic decoders. However, I think most implementations of CBOR protocols (e.g., CWT, COSE) today would not tolerate 55799 showing up at arbitrary points in the encoded CBOR, as in my example above, because ignoring it universally has never been the rule.

So you really need to mean it when you put tag 55799 in, the same as any other tag. Its use needs to be mandated as part of the protocol specification so implementors know it will be occurring and are ready to ignore it.

It is probably not even the CBOR protocol specification that says to use 55799. Rather, it is the protocol or file format carrying the encoded CBOR that would indicate it is in use.

Probably 55799 should never occur except at the beginning of some encoded CBOR.
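
As a rough illustration of the "only at the beginning" usage, the carrying file format or protocol could check for the tag as a magic number before handing the rest to a CBOR decoder. This Python sketch assumes the usual three-byte encoding `d9 d9f7` of tag 55799; `strip_magic` is a hypothetical helper, and longer non-preferred encodings of the tag number are deliberately not recognized.

```python
# The three bytes that encode tag 55799 with a 2-byte argument.
SELF_DESCRIBED_CBOR = bytes.fromhex("d9d9f7")


def strip_magic(data):
    """Return (had_magic, remaining_bytes) for a blob that may start with tag 55799."""
    if data.startswith(SELF_DESCRIBED_CBOR):
        return True, data[len(SELF_DESCRIBED_CBOR):]
    return False, data


had_magic, body = strip_magic(bytes.fromhex("d9d9f7" "01"))
print(had_magic, body.hex())  # True 01
```

Only one leading occurrence is removed, so a 55799 anywhere else still reaches the decoder and can be rejected there if the protocol does not expect it.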


> 
> 24 has a byte string as tag content.  That byte string is identified by the tag as encoded CBOR. 
> A byte string with embedded encoded CBOR is a data item, but it is different from the data item that was encoded and embedded. Decoders will differ in their tag 24 handling: they might go ahead and decode that CBOR or just hand the byte string and tag to the application.  In either case, the decoded CBOR is not simply “in place”, making the tag and the byte string vanish.

Yes of course (I should have known that).

A generic decoder could have a mode that skips the tag 24 plus byte string the same way it skips 55799, but that shouldn’t be the default, because protocols usually assign some semantics to tag 24, such as having the content hashed or distinguishing it from JSON (protocols should never assign such semantics to 55799).
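
A small Python sketch of the non-default-decoding behaviour described above: the tag 24 wrapper is recognized, but the embedded bytes are handed over untouched so the application can hash them or decode them later. `read_head` and `split_tag24` are hypothetical helpers, not any library's API, and indefinite-length byte strings are not handled.

```python
def read_head(data, pos):
    """Return (major type, argument, next position) for the item head at pos."""
    initial = data[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if info < 24:
        return major, info, pos
    if info <= 27:
        size = 1 << (info - 24)            # 1, 2, 4 or 8 argument bytes follow
        return major, int.from_bytes(data[pos:pos + size], "big"), pos + size
    raise ValueError("indefinite/reserved encodings are not handled in this sketch")


def split_tag24(data):
    """Return the still-encoded CBOR bytes carried inside a tag 24 item."""
    major, tag_number, pos = read_head(data, 0)
    if major != 6 or tag_number != 24:
        raise ValueError("item is not tag 24")
    major, length, pos = read_head(data, pos)
    if major != 2:
        raise ValueError("tag 24 content must be a byte string")
    if pos + length > len(data):
        raise ValueError("byte string is truncated")
    return data[pos:pos + length]          # exact bytes, not decoded


# tag(24) around a one-byte byte string that holds the encoding of unsigned(1)
wrapped = bytes.fromhex("d818" "41" "01")
embedded = split_tag24(wrapped)
print(embedded.hex())  # 01
```

Unlike 55799, removing the wrapper here changes the data item: what comes out is a byte string that happens to hold encoded CBOR, not the decoded value in place.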


> This is kind of twisted, but seems legal.
>>  Encoding: 
>>    - start with an RFC 3339 date string
>>    - base 64 encode it and tag it as so with tag 34
> 
> Why would one want to do that?
> 
>>    - tag it with tag 24 (or 55799) to say it is CBOR
> 
> Ignoring the 55799 case, you are missing one encoding step (tag 24 requires a byte string with encoded CBOR, not a tagged text string).
> 
>>    - tag it with tag 0 to say it is a date string
> 
> Not allowed.  This only takes major type 3, no tags.
> 
>>    - tag it with 22 to say it should be b64’d if re-encoded later

I’m after a deeper understanding of how base64, tag 24 and 55799 interact or don’t interact. My date string example was poor (sorry).

Base64 is really a content transfer encoding, not a data type. The original data before base64 encoding is of some type or other. Whoever processes it after the base64 encoding is removed has to know what type it is.

However, CBOR generally treats base64 as a data type in itself and, as it stands, has no means of indicating the type of the original data before the base64 encoding was added (by contrast, MIME can describe both type and transfer encoding).

So how can the receiver of base64-tagged data know the type of the original data? It pretty much has to be in the definition of the CBOR protocol. It might say it is always an X.509 cert or an ELF executable or such. It might say it is looked up by magic number (in which case 55799 might apply). Lots more options...

It is also possible to define tags whose content type can be bstr, base64, base64-url. If it was really common to base64 encode X.509 certs (maybe because of JOSE) and a tag was defined to indicate an X.509 cert, it might be useful to say the X.509 tag content could be bstr, base64 or base64-url.

No suggestion for change and nothing at issue so far, just thinking through how it fits together.

What would really be helpful to me is some detailed example of when you’d use the base64 tags. Why would you ever use them when CBOR carries binary data just fine? Why wouldn’t every CBOR-based protocol just say you always strip the base64 before CBOR encoding? And why wouldn’t every CBOR bstr always be base64 encoded when translating to JSON?
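
For reference, the conversion side of these questions can be sketched in Python. This assumes the advice in the CBOR-to-JSON conversion section of RFC 7049 (untagged byte strings become base64url without padding) and treats tags 21 to 23 as hints for the byte strings they enclose. `to_json_ready` and the `("tag", number, content)` tuple representation are hypothetical, and every tag is simply dropped after use, which a real converter may not want.

```python
import base64
import json

# Expected-conversion tags and the encoding each one hints at.
HINTS = {21: "base64url", 22: "base64", 23: "base16"}


def to_json_ready(item, encoding="base64url"):
    """Replace byte strings with text so the structure can be serialized as JSON."""
    if isinstance(item, tuple) and len(item) == 3 and item[0] == "tag":
        _, number, content = item
        # Tags 21-23 change the encoding for enclosed byte strings; other tags are dropped.
        return to_json_ready(content, HINTS.get(number, encoding))
    if isinstance(item, bytes):
        if encoding == "base64":
            return base64.b64encode(item).decode("ascii")
        if encoding == "base16":
            return base64.b16encode(item).decode("ascii")
        return base64.urlsafe_b64encode(item).rstrip(b"=").decode("ascii")
    if isinstance(item, list):
        return [to_json_ready(x, encoding) for x in item]
    if isinstance(item, dict):
        return {k: to_json_ready(v, encoding) for k, v in item.items()}
    return item


print(json.dumps(to_json_ready([b"\xfb\xff", ("tag", 22, b"\xfb\xff")])))
# ["-_8", "+/8="]
```

In this reading, tag 22 never puts base64 text into the CBOR itself; it only says how a byte string should be rendered if it is later converted to a text-based format.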


> 
> Well, not “it”, but any byte string in the CBOR data item.
> So, here it would base-64 encode for JSON conversion the byte string in tag 24 (the tag 24 itself would presumably be stripped in any JSON conversion, but I’m already confused — you can do “JSON in JSON” as well, just not tag it).
> 
>> Decoding:
>>    - remove the base 64 encoding because of the tag 34
> 
> That is inside, so you see it last.
> 
>>    - feed it back to the CBOR encoder because of tag 24
>>    - interpret it as an RFC 3339 date because of tag 0
> 
> Lost track.
> 
>> Base64 encoding / decoding is not that much code or that difficult, so a generic decoder might actually do this.
> 
> I wouldn’t touch that with a ten-foot pole — it might convert that pole into base64 without asking me.

I wasn’t suggesting any base64 decoding that wasn’t indicated by a tag in the input stream, or any base64 encoding that wasn’t requested through an explicit API call. Section 3.4.5.3 seems to suggest the decoding is a good thing to do:

3.4.5.3 <https://tools.ietf.org/html/draft-ietf-cbor-7049bis-13#section-3.4.5.3>.  Encoded Text

   Some text strings hold data that have formats widely used on the
   Internet, and sometimes those formats can be validated and presented
   to the application in appropriate form by the decoder.  There are
   tags for some of these formats.  As with tag numbers 21 to 23, if
   these tags are applied to an item other than a text string, they
   apply to all text string data items it contains.

Is there anything wrong with a generic decoder validating and removing base64 encoding when encountering tag 34?
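
For concreteness, the kind of validate-and-remove step in question might look like the following Python sketch. `decode_tag34` is a hypothetical helper; it assumes the tag content has already been decoded into a Python `str` and that classic base64 with padding is expected, which is what the standard library's strict mode checks.

```python
import base64
import binascii


def decode_tag34(text):
    """Validate the text string carried by tag 34 and return the raw bytes."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        return base64.b64decode(text, validate=True)
    except binascii.Error as exc:
        raise ValueError("tag 34 content is not valid base64") from exc


print(decode_tag34("aGVsbG8="))  # b'hello'
```

A decoder that does this still has to tell the application that the bytes arrived under tag 34, since the type of the original data is not carried by the tag.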

Also see this issue <https://github.com/cbor-wg/CBORbis/issues/194> I filed against this text and Table 5.

Thanks very much.

LL