Re: [Cbor] RFC7049bis processing of unknown tags

Carsten Bormann <cabo@tzi.org> Wed, 06 May 2020 16:06 UTC

From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <17300.1588779159@localhost>
Date: Wed, 06 May 2020 18:06:37 +0200
Cc: cbor@ietf.org
Message-Id: <38BB6FFF-737F-4C11-AD7A-DA3F28A9F570@tzi.org>
References: <17300.1588779159@localhost>
To: Michael Richardson <mcr+ietf@sandelman.ca>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/rxEiPJSYMtapNtMICgaDDsOCp_w>
Subject: Re: [Cbor] RFC7049bis processing of unknown tags

Hi Michael,

Thank you for your interjection at the CBOR interim.
I think this is a good issue to roll up.

On 2020-05-06, at 17:32, Michael Richardson <mcr+ietf@sandelman.ca> wrote:
> 
> 
> After discussion about #176/#181 I submitted #182:
> 
> https://github.com/cbor-wg/CBORbis/issues/182
> 
> RFC7049 specified that CBOR tags which were not recognized should be ignored.

That is not exactly what it says:

(3.5)
   A decoder that comes across a tag (Section 2.4) that it does not
   recognize, such as a tag that was added to the IANA registry after
   the decoder was deployed or a tag that the decoder chose not to
   implement, might issue a warning, might stop processing altogether,
   might handle the error and present the unknown tag value together
   with the contained data item to the application (as is expected of
   generic decoders), might ignore the tag and simply present the
   contained data item only to the application, or take some other type
   of action.

So there always was a choice.
Note that, for “generic decoders”, there always was a preference to present the unknown tag to the application; i.e., the choice matters more for application-specific decoders.

> RFC7049bis wishes to change this behaviour such that unknown tags would not
> be ignored, but would at least, be presented to the application for further
> determination. This is a change that would render existing CBOR parsers
> instantly invalid.

Removing that choice is not what has been intended so far; the intent is just to put more emphasis on these options:

— 1 might issue a warning, 
— 2 might stop processing altogether,
— 3 might handle the error and present the unknown tag value together
   with the contained data item to the application (as is expected of
   generic decoders), 

as opposed to

— 4 might ignore the tag and simply present the
   contained data item only to the application, or 
— 5 take some other type
   of action.

Generally, it is a good idea if users of a library know what they will get, so the behavior to be expected needs to be documented.  A generic decoder that only does 3 (as is “expected”) will be the most interoperable one.

There are two pieces of text in 7049bis that may not be entirely aligned:

(3.4:)
   Decoders do not need to understand tags of every tag number, and tags
   may be of little value in applications where the implementation
   creating a particular CBOR data item and the implementation decoding
   that stream know the semantic meaning of each item in the data flow.
   Their primary purpose in this specification is to define common data
   types such as dates.  A secondary purpose is to provide conversion
   hints when it is foreseen that the CBOR data item needs to be
   translated into a different format, requiring hints about the content
   of items.  Understanding the semantics of tags is optional for a
   decoder; it can simply present both the tag number and the tag
   content to the application, without interpreting the additional
   semantics of the tag.

But also:

(7.1:)
   CBOR has three major extension points:

   […]

   *  the "tag" space (values in major type 6).  Again, only a small
      part of the codepoint space has been allocated, and the space is
      abundant (although the early numbers are more efficient than the
      later ones).  Implementations receiving an unknown tag number can
      choose to simply ignore it (process just the enclosed tag content)
      or to process it as an unknown tag number wrapping the tag
      content.  The IANA registry in Section 9.2 is the appropriate way
      to address the extensibility of this codepoint space.
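The parenthetical about early numbers being more efficient follows from how a tag number is carried: it is the argument in the head of the major-type-6 item, encoded like any other CBOR argument, so numbers 0..23 fit into the initial byte, while larger numbers need 1, 2, 4, or 8 additional bytes. A minimal Python sketch of the shortest ("preferred") encoding of a tag head; the function name encode_tag_head is made up for illustration:

```python
def encode_tag_head(tag_number: int) -> bytes:
    """Encode the head of a CBOR tag (major type 6) in its shortest form."""
    if not 0 <= tag_number <= 0xFFFFFFFFFFFFFFFF:
        raise ValueError("tag number must fit in 64 bits")
    major = 6 << 5  # 0xC0: major type 6 in the top three bits
    if tag_number < 24:
        # Tag numbers 0..23 live directly in the initial byte (0xC0..0xD7).
        return bytes([major | tag_number])
    # Additional information 24..27 means 1, 2, 4 or 8 big-endian bytes follow.
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if tag_number < (1 << (8 * size)):
            return bytes([major | info]) + tag_number.to_bytes(size, "big")
    raise AssertionError("unreachable: range checked above")
```

For example, [len(encode_tag_head(n)) for n in (1, 24, 256, 65536, 2**32)] evaluates to [1, 2, 3, 5, 9], i.e., the head grows as the tag number leaves each range.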

> The suggestion is that parsers should be in RFC7049 mode by default,

But there is no RFC 7049 mode — there is a choice.

> and
> applications that want RFC7049bis behaviour should initialize the parser with
> an option that enables it [or use a new parser with awareness].

It definitely is a good idea to either always get behavior 3, or to provide flags that control the behavior.
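To illustrate what such flags could look like, here is a minimal Python sketch of a decoder that defaults to behavior 3 and lets the application choose options 1, 2, or 4 instead. Everything in it is hypothetical (UnknownTagPolicy, Tag, and decode are not taken from any existing library); it only handles unsigned integers, text strings, arrays, and tags, and it treats every tag number as unknown. What happens after the warning in option 1 is an assumption; here it continues as in 3.

```python
import warnings
from dataclasses import dataclass
from enum import Enum
from typing import Any, Tuple


class UnknownTagPolicy(Enum):
    """Hypothetical flag mirroring options 1-4 from this thread."""
    WARN = 1     # 1: issue a warning (this sketch then continues as in 3)
    FAIL = 2     # 2: stop processing altogether
    PRESENT = 3  # 3: present the tag number together with the content
    IGNORE = 4   # 4: drop the tag, present only the content


@dataclass(frozen=True)
class Tag:
    """An unrecognized tag handed through to the application."""
    number: int
    content: Any


class UnknownTagError(ValueError):
    """Raised under UnknownTagPolicy.FAIL."""


def _read_argument(data: bytes, pos: int, info: int) -> Tuple[int, int]:
    """Return (argument, new position) for the additional information `info`."""
    if info < 24:
        return info, pos
    if info <= 27:
        size = 1 << (info - 24)  # 1, 2, 4 or 8 following bytes
        if pos + size > len(data):
            raise ValueError("truncated head")
        return int.from_bytes(data[pos:pos + size], "big"), pos + size
    raise ValueError("reserved or indefinite-length encoding not handled here")


def _decode_item(data: bytes, pos: int, policy: UnknownTagPolicy) -> Tuple[Any, int]:
    if pos >= len(data):
        raise ValueError("unexpected end of input")
    major, info = data[pos] >> 5, data[pos] & 0x1F
    arg, pos = _read_argument(data, pos + 1, info)
    if major == 0:  # unsigned integer
        return arg, pos
    if major == 3:  # UTF-8 text string of `arg` bytes
        if pos + arg > len(data):
            raise ValueError("truncated text string")
        return data[pos:pos + arg].decode("utf-8"), pos + arg
    if major == 4:  # array of `arg` data items
        items = []
        for _ in range(arg):
            item, pos = _decode_item(data, pos, policy)
            items.append(item)
        return items, pos
    if major == 6:  # tag: `arg` is the tag number; every tag is unknown here
        if policy is UnknownTagPolicy.FAIL:
            raise UnknownTagError(f"unknown tag {arg}")
        content, pos = _decode_item(data, pos, policy)
        if policy is UnknownTagPolicy.IGNORE:
            return content, pos
        if policy is UnknownTagPolicy.WARN:
            warnings.warn(f"unknown tag {arg}")
        return Tag(arg, content), pos
    raise NotImplementedError(f"major type {major} not handled in this sketch")


def decode(data: bytes, policy: UnknownTagPolicy = UnknownTagPolicy.PRESENT) -> Any:
    """Decode one CBOR data item; the default policy is behavior 3."""
    value, pos = _decode_item(data, 0, policy)
    if pos != len(data):
        raise ValueError("trailing bytes after data item")
    return value
```

With the default, decode(bytes.fromhex("d91267626869")) yields Tag(number=4711, content='hi'), while passing UnknownTagPolicy.IGNORE yields just 'hi'. Tag number 4711 is only an example of a number this sketch does not implement. Making 3 the default keeps a library that is used without any flags in the most interoperable mode, and the flag makes the chosen behavior explicit and documentable.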

> Applications that want to make use of tags defined in RFC7049bis need to put
> the parser in RFC7049bis mode.

RFC7049bis does not define any new tags (so far).

> I think that Carsten does not agree with my suggested solution, but I'm not
> attached to it.

I hope I have explained how the situation is a bit more nuanced than might have come across in the meeting.

Regards, Carsten