[Cbor] 0x5fff/0x7fff (Re: 7049bis: Diagnostic notation gaps)

Carsten Bormann <cabo@tzi.org> Wed, 16 September 2020 11:32 UTC

From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <1973898.N1gx0QA8IB@tjmaciei-mobl1>
Date: Wed, 16 Sep 2020 13:32:08 +0200
Cc: cbor@ietf.org
Message-Id: <8B90392E-6DA2-428E-BB96-BDB2FEC1CE01@tzi.org>
References: <2766F4E6-0E67-472B-8BFA-75C529F4EE80@tzi.org> <1973898.N1gx0QA8IB@tjmaciei-mobl1>
To: Thiago Macieira <thiago.macieira@intel.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/o9VPH8CcGWwCw4BSxJPdiv3Z_40>
Subject: [Cbor] 0x5fff/0x7fff (Re: 7049bis: Diagnostic notation gaps)

Hi Thiago,

Let me try to recap the two proposals for 0x5fff/0x7fff:

* Add b/t.  The minimum change would be to allow this only for empty indefinite-length strings (no chunks).  In that case, a generator would still need some logic to emit b/t only when the indefinite-length string closes right away.  If we go beyond the minimum change, we would have interop issues with all implementations that don’t know about this.

* Use ‘’_/“”_.  This is indeed the minimum change, because it is no change from RFC 7049; it just points out that these stand for 0x5fff and 0x7fff, respectively.  (Well, there is a small change in that this is clarified to stand for no chunks, so you would need to say (_ ‘’) etc. if you have empty chunks, but that is already natural.)  A generator of diagnostic notation would indeed need some look-ahead (one byte) to do this on the fly.  We don’t know how many consumers already decode ‘’_/“”_, so it is hard to say what interop issues we’d have, but the alternative (_ ) cannot be decoded unambiguously anyway.
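
The one-byte look-ahead mentioned in the second proposal could be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not taken from any existing implementation: the function names `diag_indefinite_string` and `_read_chunk` are hypothetical, plain ASCII quotes are used instead of smart quotes, and chunks are rendered in the h'...' and "..." forms of RFC 7049 diagnostic notation.

```python
import json


def _read_chunk(data: bytes, pos: int, is_text: bool):
    """Read one definite-length chunk; return (diagnostic text, next position)."""
    head = data[pos]
    major, info = head >> 5, head & 0x1F
    if major != (3 if is_text else 2) or info > 27:
        raise ValueError("chunk must be a definite-length string of the same type")
    pos += 1
    if info < 24:
        length = info                    # length is stored in the head byte
    else:
        width = 1 << (info - 24)         # 1, 2, 4 or 8 length bytes follow
        length = int.from_bytes(data[pos:pos + width], "big")
        pos += width
    payload = data[pos:pos + length]
    if len(payload) != length:
        raise ValueError("truncated chunk")
    if is_text:
        text = json.dumps(payload.decode("utf-8"), ensure_ascii=False)
    else:
        text = "h'" + payload.hex() + "'"
    return text, pos + length


def diag_indefinite_string(data: bytes, pos: int = 0):
    """Render the indefinite-length string at data[pos].

    Returns (diagnostic text, position after the break byte).
    """
    head = data[pos]
    if head not in (0x5F, 0x7F):
        raise ValueError("not an indefinite-length string head")
    is_text = head == 0x7F
    pos += 1
    # One byte of look-ahead: a break (0xff) directly after the head
    # means "no chunks", which gets the short ''_ or ""_ form.
    if data[pos] == 0xFF:
        return ('""_' if is_text else "''_"), pos + 1
    chunks = []
    while data[pos] != 0xFF:
        chunk, pos = _read_chunk(data, pos, is_text)
        chunks.append(chunk)
    return "(_ " + ", ".join(chunks) + ")", pos + 1
```

With this sketch, the bytes 5fff render as ''_ and 7fff as ""_, while 5f40ff (one empty chunk) still renders as (_ h''), so the two cases stay distinguishable.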

I have a preference for not adding mechanism where it is not needed (even though this requires some code in the implementation — but that is true for both cases).

Let’s discuss this some more today at the CBOR interim meeting.

Grüße, Carsten


> On 2020-09-11, at 23:30, Thiago Macieira <thiago.macieira@intel.com> wrote:
> 
> On Friday, 11 September 2020 13:46:10 PDT Carsten Bormann wrote:
>> * (_ ) is ambiguous whether it refers to 0x5fff (an indefinite length byte
>> string with no chunks) or 0x7fff (an indefinite length text string with no
>> chunks).  The proposal in #205 is to disambiguate this by adding
>> characters, (_b ) vs. (_t ).
>> 
>> It occurred to me that a better way to represent these two encoded CBOR data
>> items would be to use ‘’_ and “”_ (please excuse the smart quotes).  This
>> is already allowed by the text in RFC 7049; we would just need to point it
>> out by adding a sentence to the last paragraph of § 8.1.  Note that in RFC
>> 7049 these notations are in principle ambiguous, i.e. ‘’_ could be a (_ )
>> (0x5fff) or (_ ‘’) or (_ ‘’ ‘’) and so on; we would probably just clarify
>> that these are to be used for (_ ) only as the other ones already have good
>> diagnostic notation.
> 
> That would mean needing to keep track of whether the payload was empty or not 
> and backtrack. A diagnostic printer currently can see the 0x5f or 0x7f and 
> print "(_ ", then continue on. In order to print ""_ for 0x7fff, it needs to 
> forgo printing the opening parenthesis.
> 
> The simplest solution is to add the "b" or "t" unconditionally.
> 
>> (The payload here is the sign bit and the 11/24/53 significand [sic!] bits;
>> in the binaryxx formats there is an exponent intervening of 5/8/11 bits
>> which must be all ones, and the significand may not be all zeroes — those
>> are the values for the two infinities, so only ~ 53.9999999999999998398
>> bits are actually available.)
>> 
>> I think this is ugly as hell, but it also is the best proposal so far.
> 
> It would be far easier to dump the binary representation of the entire 
> floating point number, including the exponent bits. If this format were 
> allowed, a diagnostic printer with no support for stringifying floating point 
> numbers could use it for other values too.
> 
> Second option would be to use it like gdb prints:
> 
> 	-nan(0x20000)
> 
> The type information can be encoded as either "nanf" (float) or "nanf16" (for 
> _Float16).
> 
> If parentheses are not a good idea, then a third option is to print only the 
> significand bits, dropping the "0x" too. It's clearly hexadecimal, we don't 
> need the "x". In that case, we'd use the _ modification to indicate encoding 
> length, as "-nan_001_1" for a negative 16-bit signalling NaN.
> 
> Either way, the document should advise what to do when it comes to signalling 
> / quiet NaNs. The CBOR spec recommends that only the IEEE-recommended 
> non-signalling form be used:
> 
> "If NaN is an allowed value, it must always be represented as 0xf97e00."
> 
> The table in Appendix A also lists 0xfa7fc00000 and 0xfb7ff8000000000000, 
> which match the IEEE recommendations for QNaN. But the Wikipedia page warns 
> that some (older?) machines invert it and encode signalling NaN with the 
> topmost mantissa bit set.
> 
> I would recommend that it print "nan" only if it's one of those three. And if 
> it's distinguishing the payload length, "nan" is only for the double-precision 
> case, the others requiring "nanf" or "nanf16" (if using the variant with 
> parentheses), or "nan_1" for the 16-bit one, with "nan_2" and "nan_3" for the 
> single- and double-precision ones respectively, if not using parentheses.
> 
> -- 
> Thiago Macieira - thiago.macieira (AT) intel.com
>  Software Architect - Intel DPG Cloud Engineering
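
As an illustration of the gdb-style option in the quoted message, the following Python sketch shows one possible reading of it. Everything here is an assumption for illustration: the function name `describe_nan` is hypothetical, the value in parentheses is taken to be the raw fraction field (the bits below the exponent), and the suffixes "f" and "f16" follow the "nanf"/"nanf16" suggestion. Plain "nan" is printed only for the three encodings named above.

```python
# Initial byte -> (exponent bits, fraction bits, suffix used in this sketch)
FORMATS = {
    0xF9: (5, 10, "f16"),   # binary16
    0xFA: (8, 23, "f"),     # binary32
    0xFB: (11, 52, ""),     # binary64
}

# The three quiet-NaN encodings mentioned in the quoted message.
CANONICAL_QUIET_NANS = {
    bytes.fromhex("f97e00"),
    bytes.fromhex("fa7fc00000"),
    bytes.fromhex("fb7ff8000000000000"),
}


def describe_nan(encoded: bytes):
    """Return a diagnostic string for an encoded CBOR float if it is a NaN.

    Returns None for finite values and infinities.
    """
    if not encoded or encoded[0] not in FORMATS:
        raise ValueError("not a CBOR float head")
    exp_bits, frac_bits, suffix = FORMATS[encoded[0]]
    if len(encoded) != 1 + (1 + exp_bits + frac_bits) // 8:
        raise ValueError("wrong length for this float width")
    bits = int.from_bytes(encoded[1:], "big")
    sign = bits >> (exp_bits + frac_bits)
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = bits & ((1 << frac_bits) - 1)
    # NaN: exponent all ones and a non-zero fraction (zero would be infinity).
    if exponent != (1 << exp_bits) - 1 or fraction == 0:
        return None
    if encoded in CANONICAL_QUIET_NANS:
        return "nan"
    return f"{'-' if sign else ''}nan{suffix}({fraction:#x})"
```

For instance, the negative 16-bit signalling NaN f9fc01 comes out as -nanf16(0x1) under these assumptions, while f97e00 is printed as plain nan.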