Re: [Cbor] 7049bis: The concept of "optional tagging" is not really used in practice #126

Laurence Lundblade <lgl@island-resort.com> Sun, 03 November 2019 16:45 UTC

From: Laurence Lundblade <lgl@island-resort.com>
Message-Id: <87889E65-0152-455A-A6B7-C5F336DC97A4@island-resort.com>
Content-Type: multipart/alternative; boundary="Apple-Mail=_AB45DAC2-3C57-4534-8595-68F9EF3AFDFE"
Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.11\))
Date: Sun, 03 Nov 2019 08:45:33 -0800
In-Reply-To: <ed45e995-1858-3169-1be6-0cce5ce37ce3@imt-atlantique.fr>
Cc: cbor@ietf.org
To: Christophe Lohr <christophe.lohr@imt-atlantique.fr>
References: <92400DAA-A713-4905-A721-34B138E25192@tzi.org> <ed45e995-1858-3169-1be6-0cce5ce37ce3@imt-atlantique.fr>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/Hz7VjeBab9DxPas9E5_KfOmZwN0>
Subject: Re: [Cbor] 7049bis: The concept of "optional tagging" is not really used in practice #126

I’m not really a data structure scientist or such, but I think I can see Christophe’s point. 

Maybe CBOR-based (and JSON-based) protocols don’t have a formal schema language, but these protocols still rely on conventions such as ordering. For example, in a COSE_Sign1 it is expected that the first data item is the protected headers, the second the unprotected headers, the third the payload, and the fourth the signature. Nothing in the encoded data says which is which, so I don’t think you can call them self-describing.
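
To make the ordering point concrete, here is a minimal Python sketch. It is not taken from any COSE library: the names CoseSign1 and interpret_cose_sign1 are hypothetical, and the sample values are placeholders rather than a valid signed message. It assumes a generic CBOR decoder has already turned the bytes into a plain list.

```python
from typing import Any, NamedTuple, Sequence


class CoseSign1(NamedTuple):
    """Illustrative field names; the wire format carries none of them."""
    protected: bytes
    unprotected: dict
    payload: Any
    signature: bytes


def interpret_cose_sign1(items: Sequence[Any]) -> CoseSign1:
    # 'items' stands for an already-decoded CBOR array (hypothetical input
    # that any generic CBOR decoder could have produced).
    if len(items) != 4:
        raise ValueError("COSE_Sign1 is expected to have exactly four elements")
    # Meaning comes purely from position, which is prior knowledge taken
    # from the protocol spec and not from the data itself.
    return CoseSign1(*items)


# Placeholder values only: an encoded header map, an empty map, a payload,
# and 64 zero bytes standing in for a signature.
generic = [b"\xa1\x01\x26", {}, b"hello", b"\x00" * 64]
msg = interpret_cose_sign1(generic)
print(msg.payload)
```

A generic decoder gets as far as the list; only the protocol spec lets the receiver attach the four names.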

It seems like CBOR and JSON say “no schema” to distance themselves from the horror of XML schemas, but in reality CDDL and prose protocol specs are schemas in spirit.

Maybe a key question here is whether you can say in CDDL “this next item must always be interpreted as a date even though it will never have a date tag”. If CDDL doesn’t have that, then you can’t describe some CBOR protocols with it. CWT would be one of those protocols, as it forbids adding the tag to dates.
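
As a rough illustration of what “interpreted as a date without a tag” means for a receiver, here is a small Python sketch. The CLAIM_TYPES table and the "epoch-date" label are assumptions made up for this example and are not CDDL; the only fact borrowed from CWT is that claim key 4 is "exp" and carries an untagged numeric date. The sample claim values are placeholders.

```python
from datetime import datetime, timezone

# Hypothetical mini-schema: CWT claim key -> how to interpret the value.
# Key 4 is "exp" in CWT; the "epoch-date" label is an assumption of this sketch.
CLAIM_TYPES = {4: "epoch-date"}


def interpret_claims(claims: dict) -> dict:
    """Apply out-of-band type knowledge to a generically decoded claims map."""
    out = {}
    for key, value in claims.items():
        if CLAIM_TYPES.get(key) == "epoch-date":
            # No tag 1 is present on the wire; the schema alone says "date".
            out[key] = datetime.fromtimestamp(value, tz=timezone.utc)
        else:
            out[key] = value
    return out


# What a generic decoder would hand back: key 4 is just an integer.
decoded = {1: "issuer.example", 4: 1572799533}
typed = interpret_claims(decoded)
print(typed[4].isoformat())
```

Without the table, the value under key 4 is indistinguishable from any other integer, which is the gap a schema language would have to be able to express.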

To summarize what I understand about tagging:

The designer of a new CBOR data item type, like a date format, will generally register a tag for it. These new data types can be really simple, like epoch dates, or really complex, like COSE_Sign1.

The designer of a protocol using a new data type will indicate in their protocol for each occurrence of it whether the tag must be present or not (never saying the tag may or may not be present). The designer will typically require the tag only when necessary to disambiguate the type of the data item.

The implementor of a general purpose library to generate one of these new data item types must give the caller the option to include or not include the tag. Maybe this is just by never automatically outputting the tag and having a distinct output tag function.

The implementor of a general purpose library to decode one of these new data types must allow the caller to say that the next data item should be decoded as this new data type whether or not it is tagged. Maybe it even errors out if it is tagged for the cases where the protocol document says no tag should be used.
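
The encoder and decoder behavior summarized above could look roughly like the following Python sketch. It is hand-rolled for illustration and is not the API of any existing CBOR library: TagPolicy, encode_epoch_date and decode_epoch_date are hypothetical names, and it only handles epoch dates carried as unsigned integers (tag number 1 when tagged).

```python
import struct
from enum import Enum

EPOCH_DATE_TAG = 1  # CBOR tag number for epoch-based date/time


class TagPolicy(Enum):
    REQUIRED = "required"    # protocol says the tag must be present
    FORBIDDEN = "forbidden"  # protocol says the tag must be absent


def _head(major: int, value: int) -> bytes:
    # CBOR initial byte plus argument, shortest form.
    if value < 24:
        return bytes([(major << 5) | value])
    for info, fmt, limit in ((24, ">B", 1 << 8), (25, ">H", 1 << 16),
                             (26, ">I", 1 << 32), (27, ">Q", 1 << 64)):
        if value < limit:
            return bytes([(major << 5) | info]) + struct.pack(fmt, value)
    raise ValueError("value too large for this sketch")


def _read_head(data: bytes, pos: int):
    first = data[pos]
    major, info = first >> 5, first & 0x1F
    pos += 1
    if info < 24:
        return major, info, pos
    if info > 27:
        raise ValueError("unsupported additional information")
    size = 1 << (info - 24)  # 24 -> 1 byte, 25 -> 2, 26 -> 4, 27 -> 8
    if pos + size > len(data):
        raise ValueError("truncated input")
    return major, int.from_bytes(data[pos:pos + size], "big"), pos + size


def encode_epoch_date(seconds: int, with_tag: bool) -> bytes:
    # The caller, not the library, decides whether the tag is emitted.
    body = _head(0, seconds)  # major type 0: unsigned integer
    return _head(6, EPOCH_DATE_TAG) + body if with_tag else body


def decode_epoch_date(data: bytes, policy: TagPolicy) -> int:
    # The caller states that the next item is a date and what the
    # protocol says about the tag; any mismatch is an error.
    major, value, pos = _read_head(data, 0)
    if major == 6:
        if policy is TagPolicy.FORBIDDEN:
            raise ValueError("tag present where the protocol forbids it")
        if value != EPOCH_DATE_TAG:
            raise ValueError("unexpected tag number")
        major, value, pos = _read_head(data, pos)
    elif policy is TagPolicy.REQUIRED:
        raise ValueError("tag required by the protocol but missing")
    if major != 0:
        raise ValueError("this sketch only handles unsigned integer dates")
    return value


tagged = encode_epoch_date(1572799533, with_tag=True)
untagged = encode_epoch_date(1572799533, with_tag=False)
print(tagged.hex(), untagged.hex())
print(decode_epoch_date(untagged, TagPolicy.FORBIDDEN))
```

There is deliberately no “tag may or may not be present” policy here, matching the observation that protocol designers pick one or the other for each occurrence.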

What I don’t know is whether CDDL can describe all this desired behavior.

LL




> On Oct 24, 2019, at 1:50 AM, Christophe Lohr <christophe.lohr@imt-atlantique.fr> wrote:
> 
> On 23/10/2019 13:38, Carsten Bormann wrote:
>> Section 3.4 talks about "optional tagging" as a secondary purpose of tags. But in today's CBOR protocols, tags are rarely "optional" in the sense that they can simply be left out without a change in semantics, as 3.4 para 3 implies.
>> 
>> This concept comes up again in 4.2.2, where "optional tagging" is outlawed in deterministic encoding (but then the text goes on to explain that protocols might choose to retain tags, but doesn't say why).
> 
> To be honest, I don't really understand how optional tags are.
> 
> A CDDL rule with tags matches CBOR items with tags and rejects CBOR items
> without tags. Tags are not optional from the data-model point of view.
> 
> 
> Moreover, please consider this CBOR objective:
> (https://tools.ietf.org/html/rfc7049#section-1.1)
> 
>    3.  Data must be able to be decoded without a schema description.
>        *  Similar to JSON, encoded data should be self-describing so
>           that a generic decoder can be written.
> 
> 
> Well, how to do this without putting tags everywhere for everything?
> (Or I need more explanation about what is "self-describing" and what is
> a "schema description")
> 
> Let's say I receive data. How may I know that this number is a temperature
> and not a distance, and that this byte string is a UUID and not a small
> picture?
> 
> The first way is to have a schema (written or not): that is to say, a
> certain preliminary knowledge of the expected data which tells me that this
> number at this place or associated with this map key is a temperature.
> The second way is to decorate data with tags, all data.
> A third way is a compromise between the first two: I have a certain
> level of preliminary knowledge of what the data are (a kind of schema
> description), with possibly some missing parts that are filled by tags.
> 
> But the only way to decode data _without_ a schema description is to
> have tags everywhere for everything.
> Surprisingly, JSON has no tags and is claimed to be self-describing. Is
> it really? I'm lost.
> 
> My feeling is that this CBOR objective should not be so demanding.
> 
> Best regards,
> Christophe
> 