[Cbor] 7049bis: Diagnostic notation gaps

Carsten Bormann <cabo@tzi.org> Fri, 11 September 2020 20:46 UTC

From: Carsten Bormann <cabo@tzi.org>
Date: Fri, 11 Sep 2020 22:46:10 +0200
Message-Id: <2766F4E6-0E67-472B-8BFA-75C529F4EE80@tzi.org>
To: cbor@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/7K5f0rJ-08MTg4C8s8rIwKUldOU>
Subject: [Cbor] 7049bis: Diagnostic notation gaps

Over at https://github.com/cbor-wg/CBORbis/issues/204, we have had an interesting discussion about closing some gaps in diagnostic notation coverage (there is also a proposal in https://github.com/cbor-wg/CBORbis/pull/205, which I’m not sure how to handle).

Basically, #204 stipulates that there should be diagnostic notation for all well-formed encoded CBOR data items, which makes a lot of sense to me.  
It then identifies two gaps:

* (_ ) is ambiguous: it could refer to either 0x5fff (an indefinite-length byte string with no chunks) or 0x7fff (an indefinite-length text string with no chunks).  The proposal in #205 is to disambiguate this by adding a character: (_b ) vs. (_t ).

It occurred to me that a better way to represent these two encoded CBOR data items would be to use ‘’_ and “”_ (please excuse the smart quotes).  This is already allowed by the text in RFC 7049; we would just need to point it out by adding a sentence to the last paragraph of § 8.1.  Note that in RFC 7049 these notations are in principle ambiguous, i.e., ‘’_ could be (_ ) (0x5fff) or (_ ‘’) or (_ ‘’ ‘’) and so on; we would probably just clarify that these are to be used for (_ ) only, as the other cases already have good diagnostic notation.
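
To make the two encodings concrete, here is a minimal Python sketch that maps the two-byte items to the notation suggested above. The function name `diagnose_empty_indefinite` is hypothetical, and the output uses plain ASCII quotes in place of the smart quotes; this is an illustration, not text from RFC 7049 or from #205.

```python
def diagnose_empty_indefinite(encoded):
    """Return the suggested notation for 0x5fff / 0x7fff, else None.

    Hypothetical helper for illustration only.
    """
    # Expect exactly: an initial byte, then the 0xff "break" stop code.
    if len(encoded) != 2 or encoded[1] != 0xFF:
        return None
    major_type = encoded[0] >> 5      # top three bits of the initial byte
    additional = encoded[0] & 0x1F    # low five bits; 31 = indefinite length
    if additional != 31:
        return None
    # Major type 2 is a byte string, major type 3 is a text string.
    return {2: "''_", 3: '""_'}.get(major_type)


print(diagnose_empty_indefinite(bytes.fromhex("5fff")))
print(diagnose_empty_indefinite(bytes.fromhex("7fff")))
```

The initial byte alone tells the two cases apart, which is exactly the information that (_ ) drops.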

* There is only one NaN in diagnostic notation, but there are many NaN values in IEEE 754.  (This is a bit like JavaScript, where the core language only exposes one NaN value, even if extensions such as array buffers give the programmer full access to all of them.)  #205 suggests a notation:

> The IEEE 754 representation of NaN carries a "payload" of up to 54 bits,
> not all of which may be zero, so we allow an encoding indicator to specify
> the exact hex representation. Thus the standard half-precision NaN may be
> represented as `NaN`, `NaN_1`, or `NaN_1_x7E00`, while a single-precision
> NaN with an all-ones payload is represented as `NaN_2_x7FFFFFFF`. Items
> like `NaN_1_x0000` or `NaN_1_x7C00` do not encode NaN in the hex
> representation and so are not valid diagnostic notation.

(The payload here is the sign bit and the 11/24/53 significand [sic!] bits; in the binaryxx formats these are separated by an exponent field of 5/8/11 bits, which must be all ones, and the significand may not be all zeroes, since those are the values for the two infinities; so only ~ 53.9999999999999998398 bits are actually available.)
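
The validity rule in the quoted text (exponent field all ones, remaining significand bits not all zero) can be sketched in Python as follows. The function name `is_valid_nan_notation` is hypothetical, and treating `_3` as the double-precision indicator is an assumption by analogy with `_1` and `_2`; the quoted proposal only shows the half- and single-precision forms.

```python
import re

# Encoding indicator -> (total bits, exponent bits).
# _1 and _2 follow the quoted examples; _3 (double) is an assumption.
FORMATS = {1: (16, 5), 2: (32, 8), 3: (64, 11)}


def is_valid_nan_notation(text):
    """Return True if text such as 'NaN_1_x7E00' names a NaN bit pattern."""
    m = re.fullmatch(r"NaN_([123])_x([0-9A-Fa-f]+)", text)
    if m is None:
        return False
    total_bits, exp_bits = FORMATS[int(m.group(1))]
    hex_digits = m.group(2)
    # The hex string must cover the whole encoded float, no more, no less.
    if len(hex_digits) * 4 != total_bits:
        return False
    value = int(hex_digits, 16)
    frac_bits = total_bits - 1 - exp_bits   # stored bits below the exponent
    exponent = (value >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = value & ((1 << frac_bits) - 1)
    # NaN: exponent all ones and fraction non-zero (zero would be infinity).
    return exponent == (1 << exp_bits) - 1 and fraction != 0


for item in ("NaN_1_x7E00", "NaN_2_x7FFFFFFF", "NaN_1_x0000", "NaN_1_x7C00"):
    print(item, is_valid_nan_notation(item))
```

This accepts the two valid examples from the quoted text and rejects `NaN_1_x0000` (zero) and `NaN_1_x7C00` (positive infinity).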

I think this is ugly as hell, but it also is the best proposal so far.


My questions to the WG are:

1. Are we ready to go for ‘’_ and “”_, with the clarification to be added?

2. Do we believe that this is the right way to handle NaN payloads?
   Should we add this to 7049bis (which would be new functionality) or should we do this in a separate document?
   (We did this for other diagnostic notation extensions, which are listed in RFC 8610.)

Note that diagnostic notation is qualified as “not for interchange”, but its use for testing in tools and for communication in specification documents makes it rather desirable to nail it down at a good level of specificity.

Grüße, Carsten
