[Cbor] NaN payload notation (Re: 7049bis: Diagnostic notation gaps)
Carsten Bormann <cabo@tzi.org> Wed, 16 September 2020 12:08 UTC
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/bKaZKPV0tBcatfSJ8o2ppxdcJ6I>
Obviously, we need to add something to enable diagnostic notation to represent NaN payloads.

First of all, I wouldn’t mind doing this in a separate document, so we don’t tie getting experience with such a notation into the “real soon now” schedule for 7049bis.

Discussing the technical merits of the various proposals:

We should make sure that the parsing doesn’t become complicated; that was one reason why having the “x” in there might be helpful.

Also, I would like the values that do not carry a _ to be independent of length, so “nan” would continue to stand for 0x7e00, 0x7fc00000, etc. (Note that some people would like those to stand for 0x7fff/0x7fffffff; one reason is that the negative NaN becomes 0xffffffff, which is also the result of some SIMD operations.)

I like the approach of enabling the dumping/loading of floating point values without understanding floating point at all. Hex floats (part of EDN in RFC 8610) go a long way, but specifically do not address NaNs. If we use something that looks like a number for that (0x…), we need to require length indicators (_1 _2 _3).

Maybe dumping the whole item in hex, including the head (f9/fa/fb), is the most versatile extension of DN/EDN that we can make. If we go this way, we probably should make up a somewhat jarring syntax so this becomes readily visible and is visually *very* distinct from hexadecimal CBOR-in-a-byte-string (h’f97c00’).

Grüße, Carsten

> On 2020-09-11, at 23:30, Thiago Macieira <thiago.macieira@intel.com> wrote:
>
> On Friday, 11 September 2020 13:46:10 PDT Carsten Bormann wrote:
>> * (_ ) is ambiguous whether it refers to 0x5fff (an indefinite length byte
>> string with no chunks) or 0x7fff (an indefinite length text string with no
>> chunks). The proposal in #205 is to disambiguate this by adding
>> characters, (_b ) vs. (_t ).
>>
>> It occurred to me that a better way to represent these two encoded CBOR data
>> items would be to use ‘’_ and “”_ (please excuse the smart quotes). This
>> is already allowed by the text in RFC 7049; we would just need to point it
>> out by adding a sentence to the last paragraph of § 8.1. Note that in RFC
>> 7049 these notations are in principle ambiguous, i.e. ‘’_ could be a (_ )
>> (0x5fff) or (_ ‘’) or (_ ‘’ ‘’) and so on; we would probably just clarify
>> that these are to be used for (_ ) only as the other ones already have good
>> diagnostic notation.
>
> That would mean needing to keep a state whether the payload was empty or not
> and backtrack. A diagnostic printer currently can see the 0x5f or 0x7f and
> print "(_ ", then continue on. In order to print ""_ for 0x7fff, it needs to
> forgo printing the parenthesis opening.
>
> The simplest solution is to add the "b" or "t" unconditionally.
>
>> (The payload here is the sign bit and the 11/24/53 significand [sic!] bits;
>> in the binaryxx formats there is an exponent intervening of 5/8/11 bits
>> which must be all ones, and the significand may not be all zeroes; those
>> are the values for the two infinities, so only ~ 53.9999999999999998398
>> bits are actually available.)
>>
>> I think this is ugly as hell, but it also is the best proposal so far.
>
> It would be far easier to dump the binary representation of the entire
> floating point number, including the exponent bits. If this format were
> allowed, a diagnostic printer with no support for stringifying floating point
> numbers could use it for other values too.
>
> Second option would be to use it like gdb prints:
>
>   -nan(0x20000)
>
> The type information can be encoded as either "nanf" (float) or "nanf16" (for
> _Float16).
>
> If parentheses are not a good idea, then a third option is to print only the
> significand bits, dropping the "0x" too. It's clearly hexadecimal, we don't
> need the "x". In that case, we'd use the _ modification to indicate encoding
> length, as "-nan_001_1" for a negative 16-bit signalling NaN.
>
> Either way, the document should advise what to do when it comes to signalling
> / quiet NaNs. The CBOR spec recommends that only the IEEE-recommended
> non-signalling form be used:
>
>   "If NaN is an allowed value, it must always be represented as 0xf97e00."
>
> The table in Appendix A also lists 0xfa7fc00000 and 0xfb7ff8000000000000,
> which match the IEEE recommendations for QNaN. But the Wikipedia page warns
> that some (older?) machines invert it and encode signalling NaN with the
> topmost mantissa bit set.
>
> I would recommend that it print "nan" only if it's one of those three. And if
> it's distinguishing the payload length, "nan" is only for the double-precision
> case, the others requiring "nanf" or "nanf16" (if using the variant with
> parentheses), or for the 16-bit with "nan_2" and "nan_3" for the single- and
> double-precision ones respectively if not using parentheses.
>
> --
> Thiago Macieira - thiago.macieira (AT) intel.com
>   Software Architect - Intel DPG Cloud Engineering
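
To make the options in this exchange concrete, here is a minimal Python sketch of a printer that classifies an encoded CBOR float item without interpreting its numeric value. It prints plain "nan" only for the three encodings named in the quoted message and otherwise falls back to the gdb-like form with the "nanf"/"nanf16" suffixes. The function name `render_nan` and the exact output strings are assumptions made for illustration; no notation has been agreed on in this thread, and this is not taken from any CBOR library.

```python
from typing import Optional

# Hypothetical helper for illustration only.
# CBOR head byte -> (exponent bits, significand bits, gdb-style type suffix)
FLOAT_HEADS = {
    0xF9: (5, 10, "f16"),   # binary16
    0xFA: (8, 23, "f"),     # binary32
    0xFB: (11, 52, ""),     # binary64
}

# The three quiet-NaN encodings named in the quoted message.
PLAIN_NANS = {
    bytes.fromhex(h)
    for h in ("f97e00", "fa7fc00000", "fb7ff8000000000000")
}


def render_nan(item: bytes) -> Optional[str]:
    """Return a textual form for an encoded NaN, or None if the item is not a NaN."""
    item = bytes(item)
    if not item or item[0] not in FLOAT_HEADS:
        raise ValueError("not a CBOR floating-point item")
    exp_bits, sig_bits, suffix = FLOAT_HEADS[item[0]]
    width = 1 + exp_bits + sig_bits          # total bits: 16, 32 or 64
    if len(item) != 1 + width // 8:
        raise ValueError("wrong length for this head byte")

    # Split the raw bit pattern; no floating-point arithmetic is involved.
    bits = int.from_bytes(item[1:], "big")
    sign = bits >> (width - 1)
    exponent = (bits >> sig_bits) & ((1 << exp_bits) - 1)
    significand = bits & ((1 << sig_bits) - 1)

    # NaN: exponent all ones and a non-zero significand (zero means infinity).
    if exponent != (1 << exp_bits) - 1 or significand == 0:
        return None
    if item in PLAIN_NANS:
        return "nan"
    return f"{'-' if sign else ''}nan{suffix}(0x{significand:x})"
```

With this sketch, `render_nan(bytes.fromhex("f9fc01"))` yields `-nanf16(0x1)`, the negative 16-bit signalling NaN used as an example above, while `bytes.fromhex("f97c00")` (infinity) yields `None`. The whole-item alternative needs no field splitting at all: it would emit `item.hex()` wrapped in whatever distinctive syntax is eventually chosen.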