Re: [Cbor] Tagging requirement

Carsten Bormann <cabo@tzi.org> Mon, 17 August 2020 07:41 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/4KGWvY5HUYVqKjSLcbqy47P5v5E>

On 2020-08-17, at 07:09, Laurence Lundblade <lgl@island-resort.com> wrote:
> 
> My understanding is that the surrounding protocol that uses a tagged type dictates whether the tag should be present or not. For example, CWT dictates that tag 1 (epoch time) must not be used for the “exp”, “nbf” and “iat” claims.

Yes.  More specifically, the CWT protocol does not use Tag 1 for exp, nbf, iat, but instead borrows that Tag’s specification for its internal tag content.
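A minimal Python sketch of what “borrowing” means here. It assumes a decoder that hands back plain Python values and represents tags with a hypothetical `Tag` dataclass (real CBOR libraries have their own type for this); `check_numeric_date` is also an invented name. The claim position accepts a number, which happens to be what Tag 1 content is; a Tag 1 item is rejected for the same reason an array of four byte strings is: it is not a number.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Tag:
    """Hypothetical stand-in for a decoder's representation of a CBOR tag."""
    number: int
    content: Any


def check_numeric_date(value: Any) -> float:
    """Validate a CWT exp/nbf/iat claim value.

    The claim reuses the *content* definition of Tag 1 (seconds since the
    epoch), but the protocol says the value is a number, not a Tag 1 item.
    """
    # Python's bool is a subclass of int, but CBOR true/false are not numbers.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(f"expected a number, got {value!r}")
    return value


for candidate in (1444064944, Tag(1, 1444064944), [b"a", b"b", b"c", b"d"]):
    try:
        print("accepted:", check_numeric_date(candidate))
    except ValueError as err:
        print("rejected:", err)
```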

> So the surrounding protocol can pick one of these three when it uses a tagged type.

Yes.  The application protocol needs to define the application protocol.

> - Tag is FORBIDDEN (e.g., date fields in CWT). Decode error if tag is present.

That is not what the protocol says — the protocol says it’s a number.  The Tag is FORBIDDEN as much as an array of four byte strings would be FORBIDDEN.

(I’m going with your upper-case language here, though I’m not sure it adds anything, because it makes these words seem like a menu to choose from.  They aren’t.  FORBIDDEN really means “does not use the Tag”.)

> - Tag is REQUIRED (e.g., COSE type tag when CWT tag is present). Error occurs if tag is absent, probably because the item is not of the expected type.

This case is similar: The protocol says what it is.  The Tag may not be “REQUIRED” (there may be other choices for that position), but to get the semantics of the Tag, you have to provide the Tag.  (I.e., you only get the Tag if you do use the Tag.)
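A sketch of the REQUIRED case in Python. Tags are represented by a hypothetical `Tag` dataclass, and the function name is made up. Several COSE message tags (RFC 8152) are valid choices at the position inside a CWT tag, so none of them is individually required; but an untagged array carries no COSE semantics and cannot be identified.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Tag:
    """Hypothetical stand-in for a decoder's representation of a CBOR tag."""
    number: int
    content: Any


# COSE message tags (RFC 8152); any one of them may appear in this position.
COSE_TAGS = {
    98: "COSE_Sign", 18: "COSE_Sign1",
    96: "COSE_Encrypt", 16: "COSE_Encrypt0",
    97: "COSE_Mac", 17: "COSE_Mac0",
}


def cose_message_type(item: Any) -> str:
    """Name the COSE structure found inside a CWT tag.

    The semantics come only from the tag: an untagged array is just an array.
    """
    if isinstance(item, Tag) and item.number in COSE_TAGS:
        return COSE_TAGS[item.number]
    raise ValueError("untagged item: no COSE semantics without a COSE tag")


sign1_body = [b"", {}, b"payload", b"signature"]
print(cose_message_type(Tag(18, sign1_body)))   # COSE_Sign1
try:
    cose_message_type(sign1_body)
except ValueError as err:
    print("rejected:", err)
```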

> - Tag is OPTIONAL (e.g., application/cwt and the CWT tag). Maybe an error, maybe not. 

That is indeed a special case that does not make a lot of sense *within* an application protocol, but is useful at its outer boundary: If you have context (such as a media type), you can use the unadorned data structure for this protocol, otherwise you use the tag.
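At the outer boundary the choice can look like this. This is a sketch: `unwrap_cwt` and the `Tag` dataclass are hypothetical. Context (here, the media type) identifies the unadorned structure; without context, only Tag 61 does.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Tag:
    """Hypothetical stand-in for a decoder's representation of a CBOR tag."""
    number: int
    content: Any


CWT_TAG = 61  # the CWT tag registered by RFC 8392


def unwrap_cwt(item: Any, media_type: Optional[str] = None) -> Any:
    """Identify a CWT at the outer boundary of a protocol.

    With context saying "this is a CWT", the unadorned structure suffices;
    the application/cwt media type also admits the tagged form.
    Without context, only Tag 61 identifies the item.
    """
    if isinstance(item, Tag) and item.number == CWT_TAG:
        return item.content
    if media_type == "application/cwt":
        return item
    raise ValueError("no context and no CWT tag: cannot tell this is a CWT")


cose = Tag(18, [b"", {}, b"claims", b"sig"])
unwrap_cwt(cose, "application/cwt")   # identified by context
unwrap_cwt(Tag(61, cose))             # identified by the tag
```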

> As far as I can see there isn’t a clear rule about which of these it is when the surrounding protocol doesn’t explicitly pick.

There is no rule about arrays of four byte strings, either.
The application protocol has to define the application protocol.

> Deterministic encoding does disallow OPTIONAL.

It says that this decision must be deterministic.  There is nothing wrong about OPTIONAL if all senders arrive at the same decision.
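A sketch of why this fits deterministic encoding: the tagging decision is a fixed sender policy rather than a per-message choice, so equally configured senders produce identical bytes. `encode_head` follows the CBOR shortest-form head encoding; `encode_cwt` and the placeholder COSE bytes are assumptions made for illustration.

```python
def encode_head(major: int, value: int) -> bytes:
    """Encode a CBOR initial byte plus argument in shortest form."""
    if value < 24:
        return bytes([major << 5 | value])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * size):
            return bytes([major << 5 | additional_info]) + value.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_cwt(cose_message: bytes, with_cwt_tag: bool) -> bytes:
    """Emit an already-encoded COSE message, prefixed by Tag 61 if the
    (hypothetical) sender policy says so."""
    prefix = encode_head(6, 61) if with_cwt_tag else b""
    return prefix + cose_message


# Placeholder: Tag 18 around [h'', {}, h'', h''] (COSE_Sign1 shape, empty fields).
cose = encode_head(6, 18) + encode_head(4, 4) + b"\x40\xa0\x40\x40"

sender_a = encode_cwt(cose, with_cwt_tag=True)
sender_b = encode_cwt(cose, with_cwt_tag=True)
assert sender_a == sender_b   # same decision, same bytes
print(sender_a.hex())         # d83dd28440a04040
```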

> If I were to guess as to which is the default, I would say it is OPTIONAL.

I have no idea what a “default” is here.  When would you get to exercise this default?  With all the language that argues against making Tags optional in an application protocol, why or how would this be the default for anything?

> Big floats and decimal fractions make use of the bignum type, but the definition of big floats and decimal fractions doesn’t explicitly say which of the three above it selects,

No, because Section 3.4.4 defines a couple of Tags.  So for the definition of the Tag, the Tag is “REQUIRED”, because you only have a tag if you have the tag.  Whether other protocols borrow the specification of the tag content (array of e, m) is not relevant for the definition of the Tag.

> so one might think it is OPTIONAL if one assumes that is the default of the three and that you could distinguish a bignum mantissa because it is a byte string not an integer.

Ah, that’s the confusion.

RFC 7049 predates RFC 8610, so we are not formally describing the tag content in RFC 7049.  RFC 7049 is clear that m and e are integers, but does not say which kind of integer (that may be a remnant of semantic validity).  Nowhere does it say that if the mantissa is something else (like an array of four byte strings), you should try to interpret it by guessing tags that could convert the mantissa into an integer.

RFC 8610 actually does the work of fully defining the structure of decfracs and bigfloats:

                  decfrac = #6.4([e10: int, m: integer])
                  bigfloat = #6.5([e2: int, m: integer])

(integer is defined here:

                  integer = int / bigint
                  bigint = biguint / bignint
                  biguint = #6.2(bstr)
                  bignint = #6.3(bstr)
                  int = uint / nint
                  uint = #0
                  nint = #1
)
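The CDDL above translates almost directly into a validator. In this sketch (hypothetical function names; tags represented by an assumed `Tag` dataclass), a mantissa must be an `int` or a Tag 2/3 bignum; a bare byte string is rejected rather than guessed into an integer. Values are returned as exact fractions.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Any


@dataclass(frozen=True)
class Tag:
    """Hypothetical stand-in for a decoder's representation of a CBOR tag."""
    number: int
    content: Any


def is_int(item: Any) -> bool:
    # CDDL int = uint / nint; Python's bool is an int subclass, so exclude it.
    return isinstance(item, int) and not isinstance(item, bool)


def integer_value(item: Any) -> int:
    """CDDL integer = int / bigint, with bigint = #6.2(bstr) / #6.3(bstr)."""
    if is_int(item):
        return item
    if isinstance(item, Tag) and item.number in (2, 3) and isinstance(item.content, bytes):
        n = int.from_bytes(item.content, "big")
        return n if item.number == 2 else -1 - n
    # A bare byte string is not an integer: no tag says how to read it.
    raise ValueError(f"mantissa is not an integer: {item!r}")


def decfrac_or_bigfloat(item: Any) -> Fraction:
    """Exact value of #6.4([e10: int, m: integer]) or #6.5([e2: int, m: integer])."""
    if not (isinstance(item, Tag) and item.number in (4, 5)):
        raise ValueError("not a Tag 4 or Tag 5 item")
    content = item.content
    if not (isinstance(content, list) and len(content) == 2 and is_int(content[0])):
        raise ValueError("tag content must be [exponent: int, mantissa: integer]")
    exponent, mantissa = content[0], integer_value(content[1])
    base = 10 if item.number == 4 else 2
    return mantissa * Fraction(base) ** exponent


# 273.15 as a decimal fraction, 1.5 as a bigfloat, and a bignum mantissa (256).
assert decfrac_or_bigfloat(Tag(4, [-2, 27315])) == Fraction(27315, 100)
assert decfrac_or_bigfloat(Tag(5, [-1, 3])) == Fraction(3, 2)
assert decfrac_or_bigfloat(Tag(4, [-1, Tag(2, b"\x01\x00")])) == Fraction(128, 5)
```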

> On second thought, you realize you can’t tell the sign of the bignum unless there is a tag, so it has to be REQUIRED to work.

Yes.
(More specifically, there are no Tag 2/3 bignums without tags, just as there are no Tag 1 datetimes without Tag 1.  In both cases, an application protocol could borrow the definition of the structure of the tag content — I could define a protocol that accepts a byte string in a specific position that is then interpreted as an integer by the application like it would be if it carried Tag 2.  Note that this borrowing is common enough that the CDDL operator ~ (unwrap) directly supports this, but it is entirely distinct from using the Tag.)
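The sign point, sketched in Python with an assumed `Tag` dataclass standing in for a decoder's tag type: the same byte string means 256 under Tag 2 and -257 under Tag 3. The second function illustrates a hypothetical protocol field specified as `~biguint`, which borrows the Tag 2 content definition without ever carrying the tag.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Tag:
    """Hypothetical stand-in for a decoder's representation of a CBOR tag."""
    number: int
    content: Any


def bignum(item: Any) -> int:
    """Tag 2 / Tag 3 bignum: the tag, not the byte string, supplies the sign."""
    if isinstance(item, Tag) and isinstance(item.content, bytes):
        n = int.from_bytes(item.content, "big")
        if item.number == 2:
            return n          # biguint: n
        if item.number == 3:
            return -1 - n     # bignint: -1 - n
    raise ValueError("not a Tag 2 or Tag 3 bignum")


def borrowed_biguint(item: Any) -> int:
    """Hypothetical protocol field written in CDDL as ~biguint.

    The application reads a bare byte string the way Tag 2 content would be
    read; no tag is present, and none is accepted.
    """
    if not isinstance(item, bytes):
        raise ValueError("expected a bare byte string")
    return int.from_bytes(item, "big")


content = b"\x01\x00"
print(bignum(Tag(2, content)))     # 256
print(bignum(Tag(3, content)))     # -257
print(borrowed_biguint(content))   # 256
```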

> My thought is that OPTIONAL adds no value.  It is a form of “be liberal in what you receive” that seems to be less in favor these days. When you nest tagged type usage with OPTIONAL, you can get a larger fan out.  

Don’t do that then [1].
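To put a number on the fan-out: if each of n nested tags is optional, a receiver has to accept 2^n tag sequences. The sketch below enumerates them for a hypothetical stack of Tag 55799, Tag 61 and a COSE_Sign1 tag (making the COSE tag optional is itself part of the hypothetical); the function name is invented.

```python
from itertools import product
from typing import List, Tuple


def tag_variants(nested_tags: List[int]) -> List[Tuple[int, ...]]:
    """Every tag sequence a receiver must accept when each nested tag is
    optional (tags listed outermost first)."""
    variants = []
    for keep in product((True, False), repeat=len(nested_tags)):
        variants.append(tuple(t for t, k in zip(nested_tags, keep) if k))
    return variants


# Self-described CBOR (55799) around a CWT (61) around a COSE_Sign1 (18).
for variant in tag_variants([55799, 61, 18]):
    print(variant)
```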

> However, OPTIONAL is allowed in 7049 and things like the application/cwt media type registry use it.

What do you mean by “is allowed”?
The application protocol can indeed define (or use) a Tag that can be used to identify a data item if the identification does not come from the context.
If it does come from the context, the tag doesn’t add value.  Application/cwt could have been defined such that it never includes the CWT Tag (i.e., the COSE tag is outermost); I don’t have a strong opinion on whether it should have done that.

None of this is something that could be defined for Tags in general; the application protocol (here: RFC 8392) needs to make the call.  The media type application/cwt now includes the unneeded choice; an application might still make further restrictions (always tag 61, never tag 61).
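Such a further restriction might be expressed as a receiver policy. This is a sketch: the enum and function names are invented, and `Tag` is an assumed stand-in for a decoder's tag type.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


@dataclass(frozen=True)
class Tag:
    """Hypothetical stand-in for a decoder's representation of a CBOR tag."""
    number: int
    content: Any


class CwtTagPolicy(Enum):
    EITHER = "either"   # what the application/cwt media type itself permits
    ALWAYS = "always"   # an application that further requires Tag 61
    NEVER = "never"     # an application that further forbids Tag 61


def accept_cwt(item: Any, policy: CwtTagPolicy) -> Any:
    """Apply an application's own restriction on Tag 61 to an application/cwt body."""
    tagged = isinstance(item, Tag) and item.number == 61
    if policy is CwtTagPolicy.ALWAYS and not tagged:
        raise ValueError("this application requires the CWT tag")
    if policy is CwtTagPolicy.NEVER and tagged:
        raise ValueError("this application does not use the CWT tag")
    return item.content if tagged else item
```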

> My thought is that it would be helpful to say somewhere that users of tagged types should explicitly say what the tagging requirement is and should avoid OPTIONAL tagging.

There is nothing wrong with what you call “OPTIONAL”.  You wouldn’t use it on the inside of an application protocol.  You also would specify whether you want a Tag or not in all cases where the application protocol is embedded into another application protocol.

Tag 55799 is the prototype of all these “signature tags” — you only use it if you don’t have other context to detect that the byte sequence you have in hand is an encoded CBOR data item.
An application protocol does not allow or specify this tag unless it allows or specifies it.

Indeed, maybe we should say so in notable-tags.
I’ll draft some text.
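Detecting Tag 55799 is a plain byte-prefix check, since its head always encodes as d9 d9 f7. A sketch (the function name is hypothetical) of what a receiver with no other context might do:

```python
from typing import Tuple

# Tag 55799 head: major type 6 (0xC0) with additional info 25 (two-byte
# argument), followed by 55799 = 0xD9F7, giving the bytes d9 d9 f7.
SELF_DESCRIBED_CBOR = bytes([0xC0 | 25]) + (55799).to_bytes(2, "big")


def strip_self_described(data: bytes) -> Tuple[bool, bytes]:
    """Report whether *data* starts with Tag 55799 and return the remainder."""
    if data.startswith(SELF_DESCRIBED_CBOR):
        return True, data[len(SELF_DESCRIBED_CBOR):]
    return False, data


print(SELF_DESCRIBED_CBOR.hex())                         # d9d9f7
print(strip_self_described(bytes.fromhex("d9d9f7a0")))   # (True, b'\xa0')
```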

> (I came to this trying to write code for tag handling for big float, EAT, CWT, COSE and such. I’m trying not to be annoying, but also think this is an issue :-) )

It becomes an issue if you somehow associate each Tag with your three labels FORBIDDEN, REQUIRED, OPTIONAL.  Don’t do that then...

Grüße, Carsten

[1]: http://www.catb.org/~esr/jargon/html/D/Don-t-do-that-then-.html