Re: [Cbor] RFC7049bis processing of unknown tags

Laurence Lundblade <lgl@island-resort.com> Thu, 07 May 2020 17:57 UTC

From: Laurence Lundblade <lgl@island-resort.com>
In-Reply-To: <38BB6FFF-737F-4C11-AD7A-DA3F28A9F570@tzi.org>
Date: Thu, 07 May 2020 10:57:02 -0700
Cc: Michael Richardson <mcr+ietf@sandelman.ca>, cbor@ietf.org
To: Carsten Bormann <cabo@tzi.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/IFSrEhwSrYrvhMd2Kx1POd6eRW8>

The upshot seems to be that if you put a tag into encoded CBOR, you had better really mean it, because you can’t count on it being ignored by apps or decoders that don’t understand it. Tags are not like HTTP or RFC 822 headers, where unknown ones are explicitly to be ignored.

Looking through the registered tags, most define a new data type. In most cases a decoder can’t ignore the tag and still correctly implement the protocol that uses the new type.
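
To make the point about new data types concrete, here is a rough Python sketch using tag 2 (unsigned bignum) as the example. The helper decode_head is a made-up name for illustration, not taken from any CBOR library, and it only handles the head forms needed here. It shows that dropping the tag leaves the application with a byte string rather than the integer the encoder meant.

```python
def decode_head(data, pos=0):
    """Decode one CBOR head; return (major_type, argument, next_pos).

    Hypothetical helper for illustration: handles only definite-length heads.
    """
    initial = data[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if info < 24:
        # The argument is carried directly in the initial byte.
        return major, info, pos
    if info in (24, 25, 26, 27):
        # The argument follows in 1, 2, 4 or 8 bytes, big-endian.
        size = 1 << (info - 24)
        return major, int.from_bytes(data[pos:pos + size], "big"), pos + size
    raise ValueError("unsupported additional information")


# Tag 2 wrapping a 9-byte string: the standard encoding of 2**64.
encoded = bytes.fromhex("c249010000000000000000")

major, tag_number, pos = decode_head(encoded)
assert major == 6 and tag_number == 2      # major type 6 is a tag

major, length, pos = decode_head(encoded, pos)
assert major == 2                          # major type 2 is a byte string
content = encoded[pos:pos + length]

ignoring_tag = content                         # what the app gets if the tag is dropped
honoring_tag = int.from_bytes(content, "big")  # what the encoder actually meant
```

With the tag ignored, the application sees nine opaque bytes. Only a decoder or app that understands tag 2 recovers the integer.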

Also, only a few tags in the registry are conversion hints. These seem like the ones a decoder could ignore, because 1) they are only hints, 2) the decoder might not be doing the conversion at all, and 3) they apply only in certain operating environments.

However, there is no generic way to know whether a tag is a hint, particularly for newly defined tags, so decoders and apps can’t be counted on to ignore even hints, and it comes back to the same conclusion: one can’t expect tags to be ignored, so don’t put them in unless they are really necessary. That is probably a good thing, since it discourages rampant use of gratuitous tagging.
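
As a rough illustration of why an encoder can’t predict what happens to its tag, here is a Python sketch of a decoder’s unknown-tag handling with a configurable policy. All names here (UnknownTagPolicy, Tagged, KNOWN_TAGS, handle_tag) are hypothetical and not from any particular CBOR library. The three policies correspond to choices that RFC 7049 Section 3.5 leaves open: stop processing, present the tag together with its content, or ignore the tag.

```python
from dataclasses import dataclass
from enum import Enum


class UnknownTagPolicy(Enum):
    REJECT = "reject"    # stop processing altogether
    PRESENT = "present"  # hand tag number and content to the application
    IGNORE = "ignore"    # drop the tag, hand over only the content


@dataclass
class Tagged:
    """An uninterpreted tag: the tag number plus the enclosed content."""
    number: int
    content: object


# Hypothetical table of tags this decoder understands (tag 1 is epoch time).
KNOWN_TAGS = {1: lambda content: ("epoch-time", content)}


def handle_tag(number, content, policy=UnknownTagPolicy.PRESENT):
    """Apply a known tag's handler, else follow the configured policy."""
    handler = KNOWN_TAGS.get(number)
    if handler is not None:
        return handler(content)
    if policy is UnknownTagPolicy.REJECT:
        raise ValueError(f"unknown tag {number}")
    if policy is UnknownTagPolicy.IGNORE:
        return content
    return Tagged(number, content)
```

The same encoded item gives three different outcomes depending on a setting the encoder never sees, which is why a tag can’t be treated as something receivers will safely skip.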

LL




> On May 6, 2020, at 9:06 AM, Carsten Bormann <cabo@tzi.org> wrote:
> 
> Hi Michael,
> 
> Thank you for your interjection at the CBOR interim.
> I think this is a good issue to roll up.
> 
> On 2020-05-06, at 17:32, Michael Richardson <mcr+ietf@sandelman.ca> wrote:
>> 
>> 
>> After discussion about #176/#181 I submitted #182:
>> 
>> https://github.com/cbor-wg/CBORbis/issues/182
>> 
>> RFC7049 specified that CBOR tags which were not recognized should be ignored.
> 
> That is not exactly what it says:
> 
> (3.5)
>   A decoder that comes across a tag (Section 2.4) that it does not
>   recognize, such as a tag that was added to the IANA registry after
>   the decoder was deployed or a tag that the decoder chose not to
>   implement, might issue a warning, might stop processing altogether,
>   might handle the error and present the unknown tag value together
>   with the contained data item to the application (as is expected of
>   generic decoders), might ignore the tag and simply present the
>   contained data item only to the application, or take some other type
>   of action.
> 
> So there always was a choice.
> Note that there always was a preference to present the unknown tag to the application, for “generic decoders”; i.e., the choice is more for application-specific decoders.
> 
>> RFC7049bis wishes to change this behaviour such that unknown tags would not
>> be ignored, but would at least, be presented to the application for further
>> determination. This is a change that would render existing CBOR parsers
>> instantly invalid.
> 
> A change that would remove the choice is not what was intended so far, just increased emphasis on the options:
> 
> — 1 might issue a warning, 
> — 2 might stop processing altogether,
> — 3 might handle the error and present the unknown tag value together
>   with the contained data item to the application (as is expected of
>   generic decoders), 
> 
> as opposed to
> 
> — 4 might ignore the tag and simply present the
>   contained data item only to the application, or 
> — 5 take some other type
>   of action.
> 
> Generally, it is a good idea if users of a library know what they will get, so the behavior to be expected needs to be documented.  A generic decoder that only does 3 (as is “expected”) will be the most interoperable one.
> 
> There are two pieces of text in 7049bis that may not entirely be aligned:
> 
> (3.4:)
>   Decoders do not need to understand tags of every tag number, and tags
>   may be of little value in applications where the implementation
>   creating a particular CBOR data item and the implementation decoding
>   that stream know the semantic meaning of each item in the data flow.
>   Their primary purpose in this specification is to define common data
>   types such as dates.  A secondary purpose is to provide conversion
>   hints when it is foreseen that the CBOR data item needs to be
>   translated into a different format, requiring hints about the content
>   of items.  Understanding the semantics of tags is optional for a
>   decoder; it can simply present both the tag number and the tag
>   content to the application, without interpreting the additional
>   semantics of the tag.
> 
> But also:
> 
> (7.1:)
>   CBOR has three major extension points:
> 
>   […]
> 
>   *  the "tag" space (values in major type 6).  Again, only a small
>      part of the codepoint space has been allocated, and the space is
>      abundant (although the early numbers are more efficient than the
>      later ones).  Implementations receiving an unknown tag number can
>      choose to simply ignore it (process just the enclosed tag content)
>      or to process it as an unknown tag number wrapping the tag
>      content.  The IANA registry in Section 9.2 is the appropriate way
>      to address the extensibility of this codepoint space.
> 
>> The suggestion is that parsers should be in RFC7049 mode by default,
> 
> But there is no RFC 7049 mode — there is a choice.
> 
>> and
>> applications that want RFC7049bis behaviour should initialize the parser with
>> an option that enables it [or use a new parser with awareness].
> 
> It definitely is a good idea to either always get behavior 3, or to provide flags that control the behavior.
> 
>> Applications that want to make use of tags defined in RFC7049bis need to put
>> the parser in RFC7049bis mode.
> 
> RFC7049bis does not define any new tags (so far).
> 
>> I think that Carsten does not agree with my suggested solution, but I'm not
>> attached to it.
> 
> I hope I have explained how the situation is a bit more nuanced than might have come over in the meeting.
> 
> Grüße, Carsten