Re: [Cbor] 7049bis: The concept of "optional tagging" is not really used in practice #126

Carsten Bormann <cabo@tzi.org> Sun, 03 November 2019 21:04 UTC

From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <06592ab7-f3cb-1a59-1b32-ffba3194162c@imt-atlantique.fr>
Date: Sun, 03 Nov 2019 22:03:59 +0100
Cc: cbor@ietf.org
To: Christophe Lohr <christophe.lohr@imt-atlantique.fr>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/TnylwXS0MJb-Mw2i_NKmEmLSJvw>

> 
> However, I read "schema description" from the /semantic/ point of view:
> a description which explains the meaning of data items.  

Right.  “Schema” probably is one of the most misused terms in this space.
(That’s why CDDL is called a “data definition language”.)

> The IP RFC not
> only tells that the first 4bits are an unsigned int, it also tells that
> this number is the protocol version. 
> CBOR (like JSON) can't tell this by itself, unless one defines a
> TAG for this.

Tags (we prefer to write simple English words in lower case) tell you how to interpret an enclosed item with different (additional) data semantics.  So tag number 1 tells you that the enclosed number is really to be interpreted as a POSIX epoch-based date/time.
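
To make the tag 1 case concrete, here is a minimal hand-rolled sketch in Python (standard library only).  The helper names encode_head and encode_epoch_time are made up for this illustration and are not part of any CBOR library; the sketch only covers arguments that fit into 64 bits.

```python
import struct
from datetime import datetime, timezone


def encode_head(major, value):
    """Encode a CBOR initial byte plus argument (hypothetical helper)."""
    if value < 24:
        # Small arguments live directly in the initial byte.
        return bytes([(major << 5) | value])
    for info, fmt, limit in ((24, ">B", 1 << 8), (25, ">H", 1 << 16),
                             (26, ">I", 1 << 32), (27, ">Q", 1 << 64)):
        if value < limit:
            return bytes([(major << 5) | info]) + struct.pack(fmt, value)
    raise ValueError("argument does not fit into 64 bits")


def encode_epoch_time(seconds):
    """Wrap an unsigned integer (major type 0) in tag number 1 (major type 6)."""
    return encode_head(6, 1) + encode_head(0, seconds)


encoded = encode_epoch_time(1363896240)
print(encoded.hex())  # c11a514b67b0
print(datetime.fromtimestamp(1363896240, tz=timezone.utc).isoformat())
# 2013-03-21T20:04:00+00:00
```

Without the leading 0xc1 byte, a receiver would just see the number 1363896240; the tag is what says "this is a point in time".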

> So, the next question is: "are there some guidelines for using TAGs?"
> 
> Well, it's probably too early. One may have to wait that CBOR usages
> grow in maturity.

CBOR has been around for half a decade now, so I think we have a pretty good understanding of when to use tags.

> What should I decide for my system design regarding CBOR TAGs?

➔ Use tags when they are useful.

There are no general guidelines like the ones you propose below, because the usefulness depends on the specific context.

> Shall I:
> - prohibit TAGs since this is redundant with other parts of my design
> specifications (which already make explicit the meaning of each field); or

If you have a relatively rigid data shape (“schema” in the usual structural sense), you may indeed not need tags, because you can infer an alternative interpretation from structure (e.g., field names in a map used as a struct, position in a record, etc.).  Tags may still be useful when you want to express a choice, e.g., if you want to support both text-based and epoch-based dates, use tag 0 or tag 1.  Another example is integers: if you expect to interchange integers that might not fit into 64 bits, use a choice between a built-in integer (major types 0 and 1) and a bignum (tag 2 or 3):

                  uint = #0
                  nint = #1
                  int = uint / nint
                  biguint = #6.2(bstr)
                  bignint = #6.3(bstr)
                  bigint = biguint / bignint
                  integer = int / bigint
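
On the encoding side, the choice in the CDDL above could look roughly like the following Python sketch.  It is an illustration under assumptions, not a reference implementation: encode_head and encode_integer are hypothetical helper names, and only the integer case is handled.

```python
import struct


def encode_head(major, value):
    """Encode a CBOR initial byte plus argument (hypothetical helper)."""
    if value < 24:
        return bytes([(major << 5) | value])
    for info, fmt, limit in ((24, ">B", 1 << 8), (25, ">H", 1 << 16),
                             (26, ">I", 1 << 32), (27, ">Q", 1 << 64)):
        if value < limit:
            return bytes([(major << 5) | info]) + struct.pack(fmt, value)
    raise ValueError("argument does not fit into 64 bits")


def encode_integer(n):
    """Pick the 'int' branch when n fits, the 'bigint' branch otherwise."""
    if 0 <= n < 2**64:
        return encode_head(0, n)          # uint = #0
    if -(2**64) <= n < 0:
        return encode_head(1, -1 - n)     # nint = #1
    # Bignum: tag 2 carries n, tag 3 carries -1 - n, as a big-endian bstr.
    tag, magnitude = (2, n) if n >= 0 else (3, -1 - n)
    payload = magnitude.to_bytes((magnitude.bit_length() + 7) // 8, "big")
    return encode_head(6, tag) + encode_head(2, len(payload)) + payload


print(encode_integer(2**64 - 1).hex())  # 1bffffffffffffffff
print(encode_integer(2**64).hex())      # c249010000000000000000
print(encode_integer(-(2**64) - 1).hex())  # c349010000000000000000
```

The receiver sees from the initial byte which branch of the choice was taken, which is exactly the information that structure alone would not provide here.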


> - put TAGs everywhere for everything because TAGs bring semantics to data; or

“Everywhere” I don’t know.  But if you expect your implementations to rely on generic decoders/encoders doing the work, using tags may be a labor-saving device.  This is particularly useful when CBOR is used for general serialization in a programming environment (where you may not have a hard and fast data definition with your data).

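As a rough picture of what "the generic decoder doing the work" can mean, here is a deliberately tiny Python sketch.  The function decode_item and the TAG_HANDLERS table are invented for this example; the sketch only handles definite-length items of major types 0 to 4 and 6, and real generic decoders do considerably more.

```python
from datetime import datetime, timezone

# Hypothetical handler table: tag number -> conversion of the enclosed item.
TAG_HANDLERS = {
    1: lambda secs: datetime.fromtimestamp(secs, tz=timezone.utc),
    2: lambda b: int.from_bytes(b, "big"),
    3: lambda b: -1 - int.from_bytes(b, "big"),
}


def decode_item(data, pos=0, handlers=TAG_HANDLERS):
    """Decode one data item at data[pos]; return (value, next position)."""
    major, info = data[pos] >> 5, data[pos] & 0x1F
    pos += 1
    if info < 24:
        arg = info
    elif info <= 27:
        size = 1 << (info - 24)           # 1, 2, 4, or 8 argument bytes
        arg = int.from_bytes(data[pos:pos + size], "big")
        pos += size
    else:
        raise ValueError("indefinite lengths are not handled in this sketch")
    if major == 0:
        return arg, pos
    if major == 1:
        return -1 - arg, pos
    if major in (2, 3):
        raw = data[pos:pos + arg]
        return (raw if major == 2 else raw.decode("utf-8")), pos + arg
    if major == 4:
        items = []
        for _ in range(arg):
            item, pos = decode_item(data, pos, handlers)
            items.append(item)
        return items, pos
    if major == 6:
        inner, pos = decode_item(data, pos, handlers)
        handler = handlers.get(arg)
        # Unknown tags are passed through as (tag number, enclosed item).
        return (handler(inner) if handler else (arg, inner)), pos
    raise ValueError("major type not handled in this sketch")


value, _ = decode_item(bytes.fromhex("83c11a514b67b0c24901000000000000000017"))
print(value)
```

The application receives a datetime and a large integer without any application-specific decoding code, which is the labor-saving part.
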
> - add TAGs to some fields and not to others (which ones and why?)

Yes.  Only add them when they are useful: to express a choice, and/or to have the generic codec do the work.

Grüße, Carsten