Re: [Cbor] NaN payload notation (Re: 7049bis: Diagnostic notation gaps)

Carsten Bormann <cabo@tzi.org> Wed, 16 September 2020 16:17 UTC

From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <1686854.WtuvBSIOmm@tjmaciei-mobl1>
Date: Wed, 16 Sep 2020 18:17:23 +0200
Cc: cbor@ietf.org
Message-Id: <B5903EB7-8030-4A79-B73B-AF96B4F8E342@tzi.org>
References: <2766F4E6-0E67-472B-8BFA-75C529F4EE80@tzi.org> <1973898.N1gx0QA8IB@tjmaciei-mobl1> <4933A00D-CD85-405D-BDEB-10F06C6E4673@tzi.org> <1686854.WtuvBSIOmm@tjmaciei-mobl1>
To: Thiago Macieira <thiago.macieira@intel.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/OHmdgngSdXQYE2e2u1lKyzOkZXE>


> On 2020-09-16, at 18:05, Thiago Macieira <thiago.macieira@intel.com> wrote:
> 
> On Wednesday, 16 September 2020 05:08:03 PDT Carsten Bormann wrote:
>> Discussing the technical merits of the various proposals:
>> 
>> We should make sure that the parsing doesn’t become complicated; that was
>> one reason why having the “x” in there might be helpful.
>> 
>> Also, I would like the values that do not carry a _ to be independent of
>> length, so “nan” would continue to stand for 0x7e00, 0x7fc00000, etc. (Note
>> that there are some people that like those to stand for 0x7fff/0x7fffffff —
>> one reason is that the negative NaN becomes 0xffffffff, which is also the
>> result of some SIMD operations.)
> 
> Clarification, do you mean that "nan" stands for all three possibilities?

Yes, like 1.0 does. (With f93c00 as the preferred encoding of 1.0.)
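
As a rough illustration of the 1.0 analogy (a sketch added for clarity, not part of the original exchange): the same value can be carried in any of the three CBOR float widths, and the shortest one that preserves it is the preferred encoding. The helper name `cbor_float_encodings` is hypothetical; f9/fa/fb are the CBOR initial bytes for 16-, 32- and 64-bit floats.

```python
import struct

def cbor_float_encodings(value):
    """Return the half, single and double CBOR encodings of value as hex.

    Hypothetical helper for illustration only. Each result is the initial
    byte (f9/fa/fb) followed by the big-endian IEEE 754 bytes.
    """
    return [
        "f9" + struct.pack(">e", value).hex(),  # 16-bit float
        "fa" + struct.pack(">f", value).hex(),  # 32-bit float
        "fb" + struct.pack(">d", value).hex(),  # 64-bit float
    ]

print(cbor_float_encodings(1.0))
# ['f93c00', 'fa3f800000', 'fb3ff0000000000000']
```

All three byte strings decode to 1.0; f93c00 is simply the shortest.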

> Or 
> that it's one of them, the others being represented by nan_2 and nan_3? The 
> same answer would apply for "inf", I presume.

In diagnostic notation, the spellings are actually Infinity, -Infinity, and NaN.

> As for the all-bits-set... I would prefer not to. The current 7049 says that 
> the preferred forms are 7e00, 7fc00000, 7ff8000000000000, which match the 
> quiet NaN that CPUs like x86 generate when they need to create a NaN out of 
> non-NaN parameters. The other way around is important and is detailed in the 
> Intel manual: the SIMD all-bits-set value has the NaN's quiet bit set.

Some diagnostic notation writers will put out 0xffffffff as NaN (as defined in RFC 7049).  Some may make use of new notation we come up with.
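
To make the bit patterns under discussion concrete, here is a minimal sketch that classifies a raw IEEE 754 bit pattern. It assumes the common convention (used by x86 and recommended by IEEE 754-2008) that the most significant fraction bit is the quiet bit; the names `FORMATS` and `classify` are made up for this example.

```python
# width in bits -> (exponent bits, fraction bits)
FORMATS = {16: (5, 10), 32: (8, 23), 64: (11, 52)}

def classify(bits, width):
    """Classify a raw IEEE 754 bit pattern of the given width.

    Assumption: the top fraction bit is the quiet bit.
    """
    exp_bits, frac_bits = FORMATS[width]
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = bits & ((1 << frac_bits) - 1)
    if exponent != (1 << exp_bits) - 1:
        return "finite"          # exponent not all ones
    if fraction == 0:
        return "infinity"        # all-ones exponent, zero fraction
    quiet = (fraction >> (frac_bits - 1)) & 1
    return "quiet NaN" if quiet else "signaling NaN"

for bits, width in [(0x7E00, 16), (0x7FC00000, 32),
                    (0x7FF8000000000000, 64), (0xFFFFFFFF, 32),
                    (0x7F800000, 32)]:
    print(hex(bits), classify(bits, width))
```

Under that assumption, the three preferred forms and the all-bits-set value all come out as quiet NaNs, while 0x7f800000 is Infinity.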

>> I like the approach of enabling the dumping/loading of floating point values
>> without understanding floating point at all.  Hex floats (part of EDN in
>> RFC 8610) go a long way, but specifically do not address NaNs.  If we use
>> something that looks like a number for that (0x…), we need to require
>> length indicators (_1 _2 _3).  Maybe dumping the whole item in hex,
>> including the head (f9/fa/fb), is the most versatile extension of DN/EDN
>> that we can make.  If we go this way, we probably should make up a somewhat
>> jarring syntax so this becomes readily visible and is visually *very*
>> distinct from hexadecimal CBOR-in-a-byte-string (h’f97c00’).
> 
> We can just use the 'f' prefix instead of 'h', unless you mean that that is 
> not distinct enough. So, is f'7eff'_1 a sufficient output?

Well, that is rather close.  The foo’baaa’ pattern (a prefix followed by a quoted string) is used for byte strings, so this would be a clear deviation.

> Personally, I'd prefer never to generate that. Printing as hexfloat isn't too 
> difficult, so whenever possible I'd like to have my own code do that. For this 
> reason, I'd like an alternative for NaN that has the letters "nan" somewhere 
> in them, so I can see from the dump output that it is a NaN.

OK, sticking with your proposal:  NaN’7e00’_1 (or maybe NaN’7e00’ as we don’t really have leading zero bytes in a NaN?)
This of course means NaN’7c00’ (or NaN’3c00’) would be invalid, as these are not NaNs (in half precision, 7c00 is Infinity and 3c00 is 1.0).
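
A small sketch of how a reader of such a notation might check it. This is an assumption-laden illustration: the NaN'...' syntax is only a proposal in this thread, ASCII apostrophes are used, the float width is inferred from the number of hex digits, and the name `parse_nan_literal` is hypothetical.

```python
import re

# Proposed (not standardized) syntax: NaN'<hex>' with optional _1/_2/_3.
_NAN_LITERAL = re.compile(r"NaN'([0-9a-fA-F]+)'(?:_([123]))?")
# number of hex digits -> (exponent bits, fraction bits)
_LAYOUT = {4: (5, 10), 8: (8, 23), 16: (11, 52)}

def parse_nan_literal(text):
    """Return (width_in_bits, bit_pattern) or raise ValueError."""
    m = _NAN_LITERAL.fullmatch(text)
    if m is None:
        raise ValueError("not a NaN literal: %r" % text)
    digits, suffix = m.group(1), m.group(2)
    if len(digits) not in _LAYOUT:
        raise ValueError("need 4, 8 or 16 hex digits")
    # _1/_2/_3 mean half/single/double, i.e. 4/8/16 hex digits.
    if suffix is not None and len(digits) != 2 ** (int(suffix) + 1):
        raise ValueError("length indicator does not match hex length")
    exp_bits, frac_bits = _LAYOUT[len(digits)]
    bits = int(digits, 16)
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = bits & ((1 << frac_bits) - 1)
    # A NaN has an all-ones exponent and a nonzero fraction.
    if exponent != (1 << exp_bits) - 1 or fraction == 0:
        raise ValueError("bit pattern is not a NaN: %s" % digits)
    return 4 * len(digits), bits

print(parse_nan_literal("NaN'7e00'_1"))  # (16, 32256)
```

With this reading, NaN'7e00' and NaN'7e00'_1 are equivalent, and NaN'7c00' and NaN'3c00' are rejected.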

Grüße, Carsten