Re: [Cbor] NaN payload notation (Re: 7049bis: Diagnostic notation gaps)

Thiago Macieira <thiago.macieira@intel.com> Wed, 16 September 2020 16:05 UTC

From: Thiago Macieira <thiago.macieira@intel.com>
To: Carsten Bormann <cabo@tzi.org>
CC: cbor@ietf.org
Date: Wed, 16 Sep 2020 09:05:52 -0700
Message-ID: <1686854.WtuvBSIOmm@tjmaciei-mobl1>
Organization: Intel Corporation
In-Reply-To: <4933A00D-CD85-405D-BDEB-10F06C6E4673@tzi.org>
References: <2766F4E6-0E67-472B-8BFA-75C529F4EE80@tzi.org> <1973898.N1gx0QA8IB@tjmaciei-mobl1> <4933A00D-CD85-405D-BDEB-10F06C6E4673@tzi.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/-IMl7iPdarEFlB2Gc5HuQFem4Ik>
Subject: Re: [Cbor] NaN payload notation (Re: 7049bis: Diagnostic notation gaps)

On Wednesday, 16 September 2020 05:08:03 PDT Carsten Bormann wrote:
> Discussing the technical merits of the various proposals:
> 
> We should make sure that the parsing doesn’t become complicated; that was
> one reason why having the “x” in there might be helpful.
> 
> Also, I would like the values that do not carry a _ to be independent of
> length, so “nan” would continue to stand for 0x7e00, 0x7f800000, etc. (Note
> that there are some people that like those to stand for 0x7fff/0x7fffffff —
> one reason is that the negative NaN becomes 0xffffffff, which is also the
> result of some SIMD operations.)

To clarify: do you mean that "nan" stands for all three possibilities, or 
that it is one of them, with the others represented by nan_2 and nan_3? I 
presume the same answer would apply to "inf".

As for the all-bits-set... I would prefer not to. The current 7049 says that 
the preferred forms are 7e00, 7fc00000, 7ff8000000000000, which match the 
quiet NaN that CPUs like x86 generate when they need to create a NaN out of 
non-NaN parameters. The other way around is important and is detailed in the 
Intel manual: the SIMD all-bits-set value has the NaN's quiet bit set.
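For illustration only, here is a minimal Python sketch of that quiet-bit check. It assumes the common convention (the one x86 follows) that the most significant bit of the stored fraction is the quiet bit. The names LAYOUTS, is_nan and is_quiet_nan are made up for this example and do not come from any library.

```python
# Hypothetical helpers, not part of any CBOR library.
# Map total width in bits -> (exponent bits, stored fraction bits).
LAYOUTS = {16: (5, 10), 32: (8, 23), 64: (11, 52)}

def is_nan(bits: int, width: int) -> bool:
    """True if the bit pattern has an all-ones exponent and a non-zero fraction."""
    exp_bits, frac_bits = LAYOUTS[width]
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = bits & ((1 << frac_bits) - 1)
    return exponent == (1 << exp_bits) - 1 and fraction != 0

def is_quiet_nan(bits: int, width: int) -> bool:
    """True if the pattern is a NaN whose top fraction bit (assumed quiet bit) is set."""
    _, frac_bits = LAYOUTS[width]
    quiet_bit = 1 << (frac_bits - 1)
    return is_nan(bits, width) and bool(bits & quiet_bit)

# The three preferred forms, plus the SIMD all-bits-set binary32 value.
for bits, width in [(0x7E00, 16), (0x7FC00000, 32),
                    (0x7FF8000000000000, 64), (0xFFFFFFFF, 32)]:
    print(f"{bits:#x}: quiet NaN = {is_quiet_nan(bits, width)}")
```

All four patterns report as quiet NaNs, whereas 0x7f800000 is not a NaN at all because its fraction is zero.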

> I like the approach of enabling the dumping/loading of floating point values
> without understanding floating point at all.  Hex floats (part of EDN in
> RFC 8610) go a long way, but specifically do not address NaNs.  If we use
> something that looks like a number for that (0x…), we need to require
> length indicators (_1 _2 _3).  Maybe dumping the whole item in hex,
> including the head (f9/fa/fb), is the most versatile extension of DN/EDN
> that we can make.  If we go this way, we probably should make up a somewhat
> jarring syntax so this becomes readily visible and is visually *very*
> distinct from hexadecimal CBOR-in-a-byte-string (h’f97c00’).

We can just use the 'f' prefix instead of 'h', unless you mean that that is 
not distinct enough. So, is f'7eff'_1 a sufficient output?

Personally, I'd prefer never to generate that. Printing as hexfloat isn't too 
difficult, so whenever possible I'd like to have my own code do that. For this 
reason, I'd like an alternative for NaN that has the letters "nan" somewhere 
in it, so I can see from the dump output that it is a NaN.
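As a rough sketch of what such a dumper could look like in Python: finite values and infinities go through float.hex(), and NaNs get a label that starts with "nan". The "nan[0x...]" spelling is purely a placeholder I made up for this example, not a proposed notation, and dump_float is a hypothetical name.

```python
import math
import struct

# struct format codes for big-endian binary16, binary32 and binary64.
FORMATS = {16: ">e", 32: ">f", 64: ">d"}

def dump_float(bits: int, width: int) -> str:
    """Render a raw IEEE 754 bit pattern as a hex float, or a visibly-NaN label."""
    raw = bits.to_bytes(width // 8, "big")
    value = struct.unpack(FORMATS[width], raw)[0]
    if math.isnan(value):
        # Placeholder spelling only: keeps "nan" visible and preserves the raw bits.
        return f"nan[0x{bits:0{width // 4}x}]"
    # float.hex() yields e.g. '0x1.8000000000000p+0', or 'inf' / '-inf'.
    return value.hex()

print(dump_float(0x3FC00000, 32))  # binary32 encoding of 1.5
print(dump_float(0x7F800000, 32))  # positive infinity
print(dump_float(0x7FC00000, 32))  # quiet NaN
```

The point is only that the reader of the dump never has to decode the exponent by hand to tell a NaN from an infinity.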

Especially as it gets bigger. Is f'7ff8000000000000'_3 or f'7f800000'_2 a NaN, 
infinity or just a very large number? 

"binary32 has 24 bits of precision, but one is implicit, so that is 23 bits 
dedicated to the mantissa; that leaves 32 - 23 = 9 bits for the sign and 
exponent; 9 bits need two hexadecimal digits plus one bit, so that "8" in 
"7f8" is still part of the exponent, which means the mantissa is zero, which 
means this number is positive infinity. Gah, just kill me now!"
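The same reasoning can be written down mechanically. The following Python sketch splits a bit pattern into sign, exponent and stored fraction and names what it encodes. FIELDS and classify are hypothetical names for this example; the field widths are the standard IEEE 754 ones.

```python
# Map total width in bits -> (exponent bits, stored fraction bits).
FIELDS = {16: (5, 10), 32: (8, 23), 64: (11, 52)}

def classify(bits: int, width: int) -> str:
    """Name the kind of value encoded by a raw IEEE 754 bit pattern."""
    exp_bits, frac_bits = FIELDS[width]
    sign = bits >> (width - 1)
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = bits & ((1 << frac_bits) - 1)
    if exponent != (1 << exp_bits) - 1:
        return "finite"          # exponent not all ones: ordinary number
    if fraction != 0:
        return "nan"             # all-ones exponent, non-zero fraction
    return "-infinity" if sign else "+infinity"

print(classify(0x7F800000, 32))          # +infinity
print(classify(0x7FF8000000000000, 64))  # nan
print(classify(0x7EFF, 16))              # nan
```

That this takes a dozen lines of bit twiddling is exactly why I would rather not have to do it in my head when reading a dump.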

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering