Re: [Cbor] 7049bis: The concept of "optional tagging" is not really used in practice #126

From: CBOR <cbor-bounces@ietf.org> On Behalf Of Laurence Lundblade
Sent: Sunday, November 3, 2019 8:46 AM
To: Christophe Lohr <christophe.lohr@imt-atlantique.fr>
Cc: cbor@ietf.org
Subject: Re: [Cbor] 7049bis: The concept of "optional tagging" is not really used in practice #126

I’m not really a data structure scientist or such, but I think I can see Christophe’s point. 

Maybe CBOR-based (and JSON-based) protocols don’t have a formal schema language, but these protocols rely on ordering and such. For example in a COSE_Sign1 it is expected that the first data item is the protected headers, the second the unprotected headers, the third the payload and the fourth the signature. I don’t think you can call them self-describing.
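
To make the positional point concrete, here is a minimal Python sketch. It assumes some generic CBOR decoder has already turned the message into plain Python values (bytes, dict, None). The class and function names are hypothetical, and the four-element layout follows the COSE_Sign1 structure in RFC 8152. Nothing in the decoded array says which element is the signature. Only the protocol's positional rules do.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CoseSign1:
    protected: bytes          # position 0: serialized protected header map (bstr)
    unprotected: dict         # position 1: unprotected header map
    payload: Optional[bytes]  # position 2: payload bstr, or nil (None) if detached
    signature: bytes          # position 3: signature bstr


def interpret_cose_sign1(item: list) -> CoseSign1:
    """Give meaning to a generically decoded 4-element array purely by position."""
    if not isinstance(item, list) or len(item) != 4:
        raise ValueError("COSE_Sign1 must be a 4-element array")
    protected, unprotected, payload, signature = item
    if not isinstance(protected, bytes):
        raise ValueError("element 0 (protected headers) must be a byte string")
    if not isinstance(unprotected, dict):
        raise ValueError("element 1 (unprotected headers) must be a map")
    if payload is not None and not isinstance(payload, bytes):
        raise ValueError("element 2 (payload) must be a byte string or nil")
    if not isinstance(signature, bytes):
        raise ValueError("element 3 (signature) must be a byte string")
    return CoseSign1(protected, unprotected, payload, signature)
```

A generic decoder would accept the same four elements in any order. Only this protocol-level check notices when they are swapped.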

[JLS] I consider the term self-describing to be a completely different beast: the data can be parsed without knowledge of the schema. That is not true of ASN.1. There, a data description label (such as "this is a SEQUENCE") can be replaced with a tag, and a parser without the schema has no idea what the data type is or how it is structured. As far as I know, this is not an issue for XML either.

Jim
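
Jim's notion of self-describing can be illustrated with a deliberately small generic decoder. It recovers maps, arrays, strings, integers and tags with no schema at all. This is a sketch covering only a subset of RFC 7049 (no floats, no indefinite-length items). The names `Tag`, `decode_item` and `decode` are hypothetical. What it cannot tell you is what the recovered values mean.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Tag:
    """A tagged item: the tag number plus the item it wraps."""
    number: int
    content: Any


def _read_argument(data: bytes, pos: int, info: int) -> Tuple[int, int]:
    """Decode the argument selected by the low 5 bits of the initial byte."""
    if info < 24:
        return info, pos                     # value stored in the initial byte
    if 24 <= info <= 27:
        size = 1 << (info - 24)              # 1, 2, 4 or 8 following bytes
        if pos + size > len(data):
            raise ValueError("truncated argument")
        return int.from_bytes(data[pos:pos + size], "big"), pos + size
    raise ValueError("indefinite lengths and reserved values are not supported")


def decode_item(data: bytes, pos: int = 0) -> Tuple[Any, int]:
    """Decode one data item at pos and return (value, position after it)."""
    if pos >= len(data):
        raise ValueError("unexpected end of input")
    major, info = data[pos] >> 5, data[pos] & 0x1F
    pos += 1
    if major == 7:                           # only false, true and null here
        simple = {20: False, 21: True, 22: None}
        if info not in simple:
            raise ValueError("floats and other simple values are not supported")
        return simple[info], pos
    arg, pos = _read_argument(data, pos, info)
    if major == 0:                           # unsigned integer
        return arg, pos
    if major == 1:                           # negative integer
        return -1 - arg, pos
    if major in (2, 3):                      # byte string / text string
        if pos + arg > len(data):
            raise ValueError("truncated string")
        chunk = data[pos:pos + arg]
        return (chunk if major == 2 else chunk.decode("utf-8")), pos + arg
    if major == 4:                           # array of arg items
        items = []
        for _ in range(arg):
            item, pos = decode_item(data, pos)
            items.append(item)
        return items, pos
    if major == 5:                           # map of arg key/value pairs
        result = {}
        for _ in range(arg):
            key, pos = decode_item(data, pos)
            value, pos = decode_item(data, pos)
            result[key] = value
        return result, pos
    content, pos = decode_item(data, pos)    # major type 6: tag number, then content
    return Tag(arg, content), pos


def decode(data: bytes) -> Any:
    """Decode a complete encoded item and reject trailing bytes."""
    value, end = decode_item(data)
    if end != len(data):
        raise ValueError("trailing bytes after the data item")
    return value
```

Decoding `a2616101616282c11a5dbe8b8004` yields `{"a": 1, "b": [Tag(1, 1572768640), 4]}`. The structure is fully recovered, but the meaning of `"a"` and of the bare `4` still has to come from the protocol.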

It seems like CBOR and JSON say “no schema” to distance themselves from the horror of XML schemas, but in reality CDDL and prose protocol specs are schemas in spirit.

Maybe a key question here is whether you can say in CDDL “this next item must always be interpreted as a date even though it will never have a date tag”. If CDDL can’t express that, then you can’t describe some CBOR protocols with it. CWT would be one of those protocols, as it forbids adding the tag to dates.

To summarize what I understand about tagging:

The designer of a new CBOR data item type, like a date format, will generally register a tag for it. These new data types can be really simple, like epoch dates, or really complex, like COSE_Sign1.

The designer of a protocol using a new data type will indicate in their protocol for each occurrence of it whether the tag must be present or not (never saying the tag may or may not be present). The designer will typically require the tag only when necessary to disambiguate the type of the data item.

The implementor of a general-purpose library that generates one of these new data item types must give the caller the option to include or omit the tag. Perhaps this is done simply by never outputting the tag automatically and providing a separate function that outputs a tag.
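
Here is one way a library could follow the "never tag automatically" suggestion, sketched in Python. It assumes the date is an integer epoch date (tag 1, RFC 7049 section 2.4.1). The date encoder emits only the integer. A separate, hypothetical `encode_tag` function emits a tag head when the protocol calls for it.

```python
EPOCH_DATE_TAG = 1  # tag 1: epoch-based date/time (RFC 7049 section 2.4.1)


def encode_head(major: int, value: int) -> bytes:
    """Encode a CBOR initial byte plus argument, using the shortest form."""
    if value < 24:
        return bytes([(major << 5) | value])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < (1 << (8 * size)):
            return bytes([(major << 5) | info]) + value.to_bytes(size, "big")
    raise ValueError("argument too large for CBOR")


def encode_int(n: int) -> bytes:
    """Major type 0 for n >= 0, major type 1 (encoding -1 - n) otherwise."""
    return encode_head(0, n) if n >= 0 else encode_head(1, -1 - n)


def encode_tag(number: int) -> bytes:
    """Emit only a tag head (major type 6); the next item written becomes its content."""
    return encode_head(6, number)


def encode_epoch_date(seconds: int) -> bytes:
    """Encode an epoch date as a bare integer; this never adds tag 1 by itself."""
    return encode_int(seconds)
```

With this split, `encode_epoch_date(1572768640)` produces `1a5dbe8b80`, and prefixing `encode_tag(EPOCH_DATE_TAG)` produces `c11a5dbe8b80`. The protocol code, not the library, decides which form goes on the wire.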

The implementor of a general-purpose library that decodes one of these new data types must let the caller say that the next data item should be decoded as this new data type, whether or not it is tagged. The library might even error out if the item is tagged where the protocol document says no tag should be used.
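
Here is a decoding counterpart, again a hedged sketch limited to integer epoch dates. The caller states the protocol's tag rule through a hypothetical `TagPolicy`. The decoder then accepts the tagged or untagged form accordingly, and errors out when the protocol forbids or requires the tag.

```python
from enum import Enum
from typing import Tuple


class TagPolicy(Enum):
    REQUIRED = "required"     # the protocol says tag 1 must be present
    FORBIDDEN = "forbidden"   # the protocol says tag 1 must be absent
    EITHER = "either"         # lenient: accept both forms


def _read_head(data: bytes, pos: int) -> Tuple[int, int, int]:
    """Return (major type, argument, next position) for the head at pos."""
    if pos >= len(data):
        raise ValueError("unexpected end of input")
    major, info = data[pos] >> 5, data[pos] & 0x1F
    pos += 1
    if info < 24:
        return major, info, pos
    if info > 27:
        raise ValueError("indefinite lengths and reserved values are not supported")
    size = 1 << (info - 24)
    if pos + size > len(data):
        raise ValueError("truncated head")
    return major, int.from_bytes(data[pos:pos + size], "big"), pos + size


def decode_epoch_date(data: bytes, pos: int, policy: TagPolicy) -> Tuple[int, int]:
    """Decode the item at pos as an integer epoch date, with or without tag 1."""
    major, arg, after = _read_head(data, pos)
    if major == 6:                              # a tag head precedes the content
        if arg != 1:
            raise ValueError(f"expected tag 1, found tag {arg}")
        if policy is TagPolicy.FORBIDDEN:
            raise ValueError("the protocol forbids tagging this date")
        major, arg, after = _read_head(data, after)
    elif policy is TagPolicy.REQUIRED:
        raise ValueError("the protocol requires tag 1 on this date")
    if major == 0:
        return arg, after                       # non-negative seconds
    if major == 1:
        return -1 - arg, after                  # seconds before 1970
    raise ValueError("this sketch only accepts integer epoch dates")
```

`TagPolicy.FORBIDDEN` corresponds to the case where the protocol document says no tag should be used. `TagPolicy.EITHER` models the "may or may not be present" rule that, per the summary above, protocol designers avoid.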

What I don’t know is whether CDDL can describe all this desired behavior.

LL

On Oct 24, 2019, at 1:50 AM, Christophe Lohr <christophe.lohr@imt-atlantique.fr> wrote:

On 23/10/2019 13:38, Carsten Bormann wrote:

Section 3.4 talks about "optional tagging" as a secondary purpose of tags. But in today's CBOR protocols, tags are rarely "optional" in the sense that they can simply be left out without a change in semantics, as 3.4 para 3 implies.

This concept comes up again in 4.2.2, where "optional tagging" is outlawed in deterministic encoding (but then the text goes on to explain that protocols might choose to retain tags, but doesn't say why).

To be honest, I don't really understand how optional tags are.

A CDDL rule with tags matches CBOR items with tags and rejects CBOR items
without tags. Tags are not optional from the data-model point of view.
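
This point can be modelled with a toy matcher. A rule written as `#6.1(uint)` in CDDL treats the tag as part of the type, so a bare integer does not match. The Python below is a hypothetical model for illustration, not the API of any CDDL tool.

```python
from dataclasses import dataclass
from typing import Any, Callable

Rule = Callable[[Any], bool]


@dataclass(frozen=True)
class Tag:
    number: int
    content: Any


def is_uint(item: Any) -> bool:
    """Model of CDDL `uint` (bool is excluded because it subclasses int in Python)."""
    return isinstance(item, int) and not isinstance(item, bool) and item >= 0


def tagged(number: int, inner: Rule) -> Rule:
    """Model of a CDDL rule like #6.1(uint): the tag is part of the type."""
    return lambda item: isinstance(item, Tag) and item.number == number and inner(item.content)


def choice(*alternatives: Rule) -> Rule:
    """Model of a CDDL type choice such as `a / b`."""
    return lambda item: any(alt(item) for alt in alternatives)


epoch_tagged = tagged(1, is_uint)                   # #6.1(uint)
epoch_either = choice(tagged(1, is_uint), is_uint)  # #6.1(uint) / uint
```

Accepting both forms has to be written out explicitly as a type choice, such as `#6.1(uint) / uint` in CDDL. That is what `epoch_either` models.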

Moreover, please consider this CBOR objective:
(https://tools.ietf.org/html/rfc7049#section-1.1)

   3.  Data must be able to be decoded without a schema description.
       *  Similar to JSON, encoded data should be self-describing so
          that a generic decoder can be written.

Well, how to do this without putting tags everywhere for everything?
(Or I need more explanation about what is "self-describing" and what is
a "schema description")

Let's say I receive data. How can I know that this number is a temperature
and not a distance, and that this byte string is a UUID and not a small
picture?
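
The ambiguity shows up in what a generic decoder learns from the first bytes of an item. In this sketch (function names hypothetical), tag 37 is the only thing that separates a UUID from any other 16-byte string. The IANA CBOR tags registry lists tag 37 as a binary UUID. Nothing at all separates a temperature of 21 from a distance of 21.

```python
from typing import Optional, Tuple

UUID_TAG = 37  # listed as "Binary UUID" in the IANA CBOR tags registry

MAJOR_NAMES = ["uint", "negint", "bytes", "text", "array", "map", "tag", "simple/float"]


def _head(data: bytes, pos: int) -> Tuple[int, int, int]:
    """Return (major type, argument, next position) for the head at pos."""
    if pos >= len(data):
        raise ValueError("unexpected end of input")
    major, info = data[pos] >> 5, data[pos] & 0x1F
    pos += 1
    if info < 24:
        return major, info, pos
    if info > 27:
        raise ValueError("indefinite lengths and reserved values are not supported")
    size = 1 << (info - 24)
    if pos + size > len(data):
        raise ValueError("truncated head")
    return major, int.from_bytes(data[pos:pos + size], "big"), pos + size


def describe(data: bytes) -> Tuple[Optional[int], str, int]:
    """What a generic decoder can say: optional tag number, major type, argument.

    The argument is the value for integers and the length for strings.
    """
    major, arg, pos = _head(data, 0)
    tag = None
    if major == 6:
        tag = arg
        major, arg, pos = _head(data, pos)
    return tag, MAJOR_NAMES[major], arg
```

`describe(bytes([0x50]) + bytes(16))` and `describe(bytes([0xd8, 0x25, 0x50]) + bytes(16))` differ only in the tag number: `None` versus `37`.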

The first way is to have a schema (written or not): that is to say, some
prior knowledge of the expected data which tells me that this number at
this position, or associated with this map key, is a temperature.
The second way is to decorate all data with tags.
A third way is a compromise between the first two: I have a certain
level of prior knowledge of what the data are (a kind of schema
description), possibly with some missing parts that are filled in by tags.
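
The third, hybrid approach might look like this in Python. Prior knowledge (a schema) covers the map keys it knows about, and tags fill in the rest. The schema contents and key names are hypothetical. The tag meanings (1 for epoch date/time, 37 for UUID) come from the IANA CBOR tags registry. The input is assumed to be already decoded, with tagged items represented by a small `Tag` class.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Tag:
    number: int
    content: Any


# Hypothetical prior knowledge: what the values under certain map keys mean.
SCHEMA: Dict[str, str] = {"temp": "temperature (degrees C)", "dist": "distance (m)"}

# Meanings of a few tag numbers, as registered in the IANA CBOR tags registry.
TAG_MEANINGS: Dict[int, str] = {1: "epoch date/time", 37: "UUID"}


def interpret(message: Dict[str, Any]) -> Dict[str, str]:
    """Describe each entry using the schema first, then a known tag, else 'unknown'."""
    result = {}
    for key, value in message.items():
        if key in SCHEMA:
            result[key] = SCHEMA[key]                 # first way: prior knowledge
        elif isinstance(value, Tag) and value.number in TAG_MEANINGS:
            result[key] = TAG_MEANINGS[value.number]  # second way: the tag says it
        else:
            result[key] = "unknown"                   # only the generic type is known
    return result
```

Here the schema wins over a tag when both are present. A real protocol would have to decide that precedence itself.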

But the only way to decode data _without_ a schema description is to
have tags everywhere for everything.
Surprisingly, JSON has no tags and is claimed to be self-describing. Is
it really? I'm lost.

My feeling is that this CBOR objective should not be so demanding.

Best regards,
Christophe

_______________________________________________
CBOR mailing list
CBOR@ietf.org
https://www.ietf.org/mailman/listinfo/cbor