Re: [Cbor] Tagging requirement

Laurence Lundblade <lgl@island-resort.com> Mon, 17 August 2020 17:56 UTC

From: Laurence Lundblade <lgl@island-resort.com>
Message-Id: <D8A304BA-897A-46D2-9B67-4FF458883478@island-resort.com>
Content-Type: multipart/alternative; boundary="Apple-Mail=_66686154-B009-4DF4-B27C-737B5BBEC44B"
Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.80.23.2.2\))
Date: Mon, 17 Aug 2020 10:56:21 -0700
In-Reply-To: <895A3DF8-DF11-479F-9DC6-9EF98465A7E0@tzi.org>
Cc: cbor@ietf.org
To: Carsten Bormann <cabo@tzi.org>
References: <5F695632-CF27-40FF-BC23-E731AAA95771@island-resort.com> <895A3DF8-DF11-479F-9DC6-9EF98465A7E0@tzi.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/A1aOWu11aI1xTxC7G9mC8Z1_t0w>
Subject: Re: [Cbor] Tagging requirement

I have been thinking of the standardized tag definitions as data type definitions first and the assignment and use of a tag to indicate the type as second.

So a protocol designer first decides to use a type, say an epoch date, a CWT or such. Then, second, the designer decides how to identify it as that type in the protocol. The second step is the choice between FORBIDDEN, REQUIRED or OPTIONAL. The “default” I’m talking about is when the protocol designer says the type is in use, but doesn’t say how its type is indicated.

RFC 8392 more or less proceeds like this in the definition of NumericDate. First it references Section 2.4.1 of RFC 7049. Second, it says the tag MUST be omitted (FORBIDDEN).
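
To make the NumericDate case concrete, here is a rough Python sketch of the two encodings. It is only an illustration: the helper names (encode_head, encode_uint, encode_tagged) are made up for this message and are not from any CBOR library. The epoch value is the one used for tag 1 in the examples appendix of RFC 7049.

```python
def encode_head(major_type: int, argument: int) -> bytes:
    """Encode a CBOR initial byte plus its argument in the shortest form."""
    if argument < 0:
        raise ValueError("argument must be non-negative")
    if argument < 24:
        # Small arguments live directly in the low 5 bits of the initial byte.
        return bytes([(major_type << 5) | argument])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < (1 << (8 * size)):
            head = bytes([(major_type << 5) | additional_info])
            return head + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_uint(value: int) -> bytes:
    """Major type 0: unsigned integer."""
    return encode_head(0, value)


def encode_tagged(tag_number: int, content: bytes) -> bytes:
    """Major type 6: tag head followed by the already-encoded tag content."""
    return encode_head(6, tag_number) + content


epoch = 1363896240

# What a CWT "exp"/"nbf"/"iat" claim carries: just the number (tag omitted).
untagged = encode_uint(epoch)

# What "using Tag 1" produces: the tag head plus the same number.
tagged = encode_tagged(1, untagged)

print(untagged.hex())  # 1a514b67b0
print(tagged.hex())    # c11a514b67b0
```

The only difference on the wire is the leading 0xc1 byte, which is why a protocol has to say which of the two it expects.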


You used the term “borrow” below to say what CWT does for NumericDate.  We could describe it like this:
   BORROW means use the data type without the type 6 tag
   USE means a type 6 tag and the content as defined

Then USE maps to my REQUIRED and BORROW maps to my FORBIDDEN and OPTIONAL is when the protocol permits either BORROW or USE.

BORROW and USE seem like an OK way to characterize this too, but...


At this point the word “tag” is confusing to me.  When you say “use Tag 1” you seem to mean encode a type 6 integer with value 1 followed by the content which is a number. Unless explicitly allowed, the type 6 integer and number content are inseparable. When you say USE tag 1, it means both the type 6 integer and the number content. By contrast, when CWT refers to the “CWT tag” it is just referring to the type 6 integer.


Maybe this is it?  CWT is defined first as a *protocol*, not a tag. It is not necessarily part of a CBOR structure. CWT then says it can be tagged as one means to identify it. You never need to BORROW the definition of it. By contrast, bignum and such are defined as tags first, so you have to explicitly BORROW if you want to use them without the type 6 indicating integer. Bignums and friends are assumed to be part of a CBOR structure.

With this distinction made manifest, it starts to make sense.

Unless explicitly stated otherwise, bignums, big floats and such always have the type 6 integer because they are defined in terms of it.

Protocols like CWT each state how they wish to be identified and tagging may be one way.

You have to know the nature of a particular tag definition to know whether the type 6 integer is required or not.

Section 6 of RFC 8392 implies some mutual exclusion between the use of application/cwt and #6.61 to identify a CWT, but it is only implied. In another thread Jim thought it was OK to have both application/cwt and #6.61 on the same CWT even though they are redundant.

A few more comments below.


> On Aug 17, 2020, at 12:40 AM, Carsten Bormann <cabo@tzi.org> wrote:
> 
> On 2020-08-17, at 07:09, Laurence Lundblade <lgl@island-resort.com> wrote:
>> 
>> My understanding is that the surrounding protocol that uses a tagged type dictates whether the tag should be present or not. For example, CWT dictates that tag 1 (epoch time) must not be used for the “exp”, “nbf” and “iat” claims.
> 
> Yes.  More specifically, the CWT protocol does not use Tag 1 for exp, nbf, iat, but instead borrows that Tag’s specification for its internal tag content.
> 
>> So the surrounding protocol can pick one of these three when it uses a tagged type.
> 
> Yes.  The application protocol needs to define the application protocol.
> 
>> - Tag is FORBIDDEN (e.g., date fields in CWT). Decode error if tag is present.
> 
> That is not what the protocol says — the protocol says it’s a number.  The Tag is FORBIDDEN as much as an array of four byte strings would be FORBIDDEN.
> 
> (I’m going with your upper case language here; not sure that adds anything because it makes it seem these words are a menu to choose from.  They aren’t.  FORBIDDEN really means “does not use the Tag”.)
> 
>> - Tag is REQUIRED (e.g., COSE type tag when CWT tag is present). Error occurs if tag is absent, probably because item is not of the expected type.
> 
> This case is similar: The protocol says what it is.  The Tag may not be “REQUIRED” (there may be other choices for that position), but to get the semantics of the Tag, you have to provide the Tag.  (I.e., you only get the Tag if you do use the Tag.)
> 
>> - Tag is OPTIONAL (e.g., application/cwt and the CWT tag). Maybe an error, maybe not. 
> 
> That is indeed a special case that does not make a lot of sense *within* an application protocol, but is useful at its outer boundary: If you have context (such as a media type), you can use the unadorned data structure for this protocol, otherwise you use the tag.
> 
>> As far as I can see there isn’t a clear rule about which of these it is when the surrounding protocol doesn’t explicitly pick.
> 
> There is no rule about arrays of four byte strings, either.
> The application protocol has to define the application protocol.
> 
>> Deterministic encoding does disallow OPTIONAL.
> 
> It says that this decision must be deterministic.  There is nothing wrong about OPTIONAL if all senders arrive at the same decision.
> 
>> If I were to guess as to which is the default, I would say it is OPTIONAL.
> 
> I have no idea what a “default” is here.  When would you get to exercise this default?  With all the language that argues against making Tags optional in an application protocol, why or how would this be the default for anything?
> 
>> Big floats and decimal fractions make use of the bignum type, but the definition of big floats and decimal fractions doesn’t explicitly say which of the three above it selects,
> 
> No, because Section 3.4.4 defines a couple of Tags.  So for the definition of the Tag, the Tag is “REQUIRED”, because you only have a tag if you have the tag.  Whether other protocols borrow the specification of the tag content (array of e, m) is not relevant for the definition of the Tag.
> 
>> so one might think it is OPTIONAL if one assumes that is the default of the three and that you could distinguish a bignum mantissa because it is a byte string not an integer.
> 
> Ah, that’s the confusion.
> 
> RFC 7049 predates RFC 8610, so we are not formally describing the tag content in RFC 7049.  RFC 7049 is clear that m and e are integers, but does not say which kind of integer (that may be a remnant of semantic validity).  Nowhere does it say that, if the mantissa is something else (like an array of four byte strings), you should try to interpret it by guessing tags that could be used to convert the mantissa into an integer.
> 
> RFC 8610 actually does the work of fully defining the structure of decfracs and bigfloats:
> 
>                  decfrac = #6.4([e10: int, m: integer])
>                  bigfloat = #6.5([e2: int, m: integer])
> 
> (integer is defined here:
> 
>                  integer = int / bigint
>                  bigint = biguint / bignint
>                  biguint = #6.2(bstr)
>                  bignint = #6.3(bstr)
>                  int = uint / nint
>                  uint = #0
>                  nint = #1
> )

But the definitions of big floats and decimal fractions should stand on their own without RFC 8610.
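
For what it is worth, here is how I read the CDDL quoted above when writing a decoder, as a small Python sketch. The function names are hypothetical and the sketch assumes the tag number and byte-string content have already been pulled out of the encoded item. It follows the RFC 7049 rules that tag 2 content is the unsigned value n and tag 3 content stands for -1 - n.

```python
from fractions import Fraction


def bignum_value(tag_number: int, content: bytes) -> int:
    """Turn the byte-string content of a tag 2 or tag 3 item into an integer."""
    n = int.from_bytes(content, "big")
    if tag_number == 2:
        return n          # positive bignum
    if tag_number == 3:
        return -1 - n     # negative bignum
    raise ValueError("not a bignum tag: %d" % tag_number)


def bigfloat_value(e2: int, m: int) -> Fraction:
    """Tag 5 content [e2, m] means m * 2**e2; m may have come from a bignum."""
    return Fraction(m) * Fraction(2) ** e2


# The same nine content bytes give two different integers depending on
# which tag precedes them, so the byte string alone is ambiguous.
content = bytes.fromhex("010000000000000000")
print(bignum_value(2, content))   # 18446744073709551616
print(bignum_value(3, content))   # -18446744073709551617

# The bigfloat 1.5 from the RFC 7049 examples is [-1, 3].
print(bigfloat_value(-1, 3))      # 3/2
```

A mantissa that arrives as a bare byte string, with neither tag 2 nor tag 3 in front of it, cannot be turned into a signed value by this kind of code without an extra rule from the application protocol.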


> 
>> On second thought, you realize you can’t tell the sign of the bignum unless there is a tag so it has to be REQUIRED to work.
> 
> Yes.
> (More specifically, there are no Tag 2/3 bignums without tags, just as there are no Tag 1 datetimes without Tag 1.  In both cases, an application protocol could borrow the definition of the structure of the tag content — I could define a protocol that accepts a byte string in a specific position that is then interpreted as an integer by the application like it would be if it carried Tag 2.  Note that this borrowing is common enough that the CDDL operator ~ (unwrap) directly supports this, but it is entirely distinct from using the Tag.)
> 
>> My thought is that OPTIONAL adds no value.  It is a form of “be liberal in what you receive” that seems to be less in favor these days. When you nest tagged type usage with OPTIONAL, you can get a larger fan out.  
> 
> Don’t do that then [1].

The point is to help others not do that.

> 
>> However, OPTIONAL is allowed in 7049 and things like the application/cwt media type registry use it.
> 
> What do you mean by “is allowed”?
> The application protocol can indeed define (or use) a Tag that can be used to identify a data item if the identification does not come from the context.
> If it does, the tag doesn’t add value.  Application/cwt could have been defined such that it never includes the CWT Tag (i.e., the COSE tag is outermost); I don’t have a strong opinion on whether it should have done that.
> 
> None of this is something that could be defined for Tags in general; the application protocol (here: RFC 8392) needs to make the call.  The media type application/cwt now includes the unneeded choice; an application might still make further restrictions (always tag 61, never tag 61).
> 
>> My thought is that it would be helpful to say somewhere that users of tagged types should explicitly say what the tagging requirement is and should avoid OPTIONAL tagging.
> 
> There is nothing wrong with what you call “OPTIONAL”.  You wouldn’t use it on the inside of an application protocol.  You also would specify whether you want a Tag or not in all cases where the application protocol is embedded into another application protocol.
> 
> Tag 55799 is the prototype of all these “signature tags” — you only use it if you don’t have other context to detect that the byte sequence you have in hand is an encoded CBOR data item.
> An application protocol does not allow or specify this tag unless it allows or specifies it.
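
A rough Python sketch of how I picture this on the receiving side. The names strip_self_described and allowed are invented for illustration; the three-byte prefix 0xd9d9f7 is the encoding of tag 55799 given in RFC 7049. The sketch assumes the caller, not the decoder, knows whether the application protocol permits the tag.

```python
# Tag 55799 encodes as major type 6 with a two-byte argument: d9 d9 f7.
SELF_DESCRIBED_PREFIX = bytes.fromhex("d9d9f7")


def strip_self_described(data: bytes, allowed: bool) -> bytes:
    """Remove a leading tag 55799 if the application protocol allows it.

    Only the usual three-byte encoding of the tag is recognized here; a
    longer, non-preferred encoding of the same tag number is not matched.
    """
    if data.startswith(SELF_DESCRIBED_PREFIX):
        if not allowed:
            raise ValueError("tag 55799 is not permitted by this protocol")
        return data[len(SELF_DESCRIBED_PREFIX):]
    return data


item = bytes.fromhex("1a514b67b0")  # an unsigned integer
print(strip_self_described(SELF_DESCRIBED_PREFIX + item, allowed=True).hex())
print(strip_self_described(item, allowed=True).hex())
```

Both calls print 1a514b67b0; with allowed=False the first input is rejected instead.
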
> 
> Indeed, maybe we should say so in notable-tags.
> I’ll draft some text.
> 
>> (I came to this trying to write code for tag handling for big float, EAT, CWT, COSE and such. I’m trying not to be annoying, but also think this is an issue :-) )
> 
> It becomes an issue if you somehow associate each Tag with your three labels FORBIDDEN, REQUIRED, OPTIONAL.  Don’t do that then...

The association is not with a tag, it is with an item in a protocol. The APIs are to decode a protocol item as a bignum, an epoch date, a CWT, a COSE and such and the API wants to know if the tag must be present, must be absent or can be either.
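
Roughly, the shape of API I have in mind looks like the Python sketch below. This is an illustration only, not the actual code I am writing: TagRequirement, split_tag and check_tag are hypothetical names, and the sketch handles a single leading tag on an encoded item rather than nested tags.

```python
from enum import Enum
from typing import Optional, Tuple


class TagRequirement(Enum):
    FORBIDDEN = "forbidden"  # tag must be absent (content is borrowed)
    REQUIRED = "required"    # tag must be present
    OPTIONAL = "optional"    # either form is accepted


def split_tag(data: bytes) -> Tuple[Optional[int], bytes]:
    """Return (tag_number, rest) if data starts with a tag head, else (None, data)."""
    if not data:
        raise ValueError("empty input")
    initial = data[0]
    if initial >> 5 != 6:
        return None, data
    info = initial & 0x1F
    if info < 24:
        return info, data[1:]
    if info > 27:
        raise ValueError("invalid additional information for a tag")
    size = 1 << (info - 24)  # 1, 2, 4 or 8 argument bytes
    if len(data) < 1 + size:
        raise ValueError("truncated tag head")
    return int.from_bytes(data[1:1 + size], "big"), data[1 + size:]


def check_tag(data: bytes, expected_tag: int, requirement: TagRequirement) -> bytes:
    """Enforce the protocol item's tagging rule and return the encoded tag content."""
    tag_number, rest = split_tag(data)
    if tag_number is None:
        if requirement is TagRequirement.REQUIRED:
            raise ValueError("tag %d is required but absent" % expected_tag)
        return data
    if tag_number != expected_tag:
        raise ValueError("unexpected tag %d" % tag_number)
    if requirement is TagRequirement.FORBIDDEN:
        raise ValueError("tag %d must be absent" % expected_tag)
    return rest


tagged_date = bytes.fromhex("c11a514b67b0")  # tag 1 followed by an integer
print(check_tag(tagged_date, 1, TagRequirement.REQUIRED).hex())  # 1a514b67b0
```

The requirement is an argument of the call that decodes one protocol item, so the same epoch date can be decoded with FORBIDDEN for a CWT claim and with REQUIRED somewhere else.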

LL


> 
> Grüße, Carsten
> 
> [1]: http://www.catb.org/~esr/jargon/html/D/Don-t-do-that-then-.html
>