Re: [Cbor] 7049bis: Diagnostic notation gaps
Thiago Macieira <thiago.macieira@intel.com> Fri, 11 September 2020 21:30 UTC
From: Thiago Macieira <thiago.macieira@intel.com>
To: cbor@ietf.org
Date: Fri, 11 Sep 2020 14:30:24 -0700
Message-ID: <1973898.N1gx0QA8IB@tjmaciei-mobl1>
Organization: Intel Corporation
In-Reply-To: <2766F4E6-0E67-472B-8BFA-75C529F4EE80@tzi.org>
References: <2766F4E6-0E67-472B-8BFA-75C529F4EE80@tzi.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/HJOv0r6IcVe1yHO5BMMd1D6Xqck>
Subject: Re: [Cbor] 7049bis: Diagnostic notation gaps
On Friday, 11 September 2020 13:46:10 PDT Carsten Bormann wrote:
> * (_ ) is ambiguous whether it refers to 0x5fff (an indefinite length byte
> string with no chunks) or 0x7fff (an indefinite length text string with no
> chunks). The proposal in #205 is to disambiguate this by adding
> characters, (_b ) vs. (_t ).
>
> It occurred to me that a better way to represent these two encoded CBOR data
> items would be to use ‘’_ and “”_ (please excuse the smart quotes). This
> is already allowed by the text in RFC 7049; we would just need to point it
> out by adding a sentence to the last paragraph of § 8.1. Note that in RFC
> 7049 these notations are in principle ambiguous, i.e. ‘’_ could be a (_ )
> (0x5fff) or (_ ‘’) or (_ ‘’ ‘’) and so on; we would probably just clarify
> that these are to be used for (_ ) only as the other ones already have good
> diagnostic notation.

That would mean needing to keep a state whether the payload was empty or not
and backtrack. A diagnostic printer currently can see the 0x5f or 0x7f and
print "(_ ", then continue on. In order to print ""_ for 0x7fff, it needs to
forego printing the parenthesis opening.

The simplest solution is to add the "b" or "t" unconditionally.

> (The payload here is the sign bit and the 11/24/53 significand [sic!] bits;
> in the binaryxx formats there is an exponent intervening of 5/8/11 bits
> which must be all ones, and the significand may not be all zeroes — those
> are the values for the two infinities, so only ~ 53.9999999999999998398
> bits are actually available.)
>
> I think this is ugly as hell, but it also is the best proposal so far.

It would be far easier to dump the binary representation of the entire
floating point number, including the exponent bits. If this format were
allowed, a diagnostic printer with no support for stringifying floating point
numbers could use it for other values too.
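
As a rough illustration of the "dump the whole binary representation" idea, here is a minimal Python sketch. The function names (`split_float`, `dump_float`) and the "0x..." output form are assumptions made for this example only; nothing here is notation defined by RFC 7049 or proposed in the thread. It only shows that a printer can emit the bits without any float-to-string support.

```python
# Initial byte of a CBOR float -> (total bits, exponent bits, significand bits)
FLOAT_LAYOUT = {
    0xF9: (16, 5, 10),   # half precision
    0xFA: (32, 8, 23),   # single precision
    0xFB: (64, 11, 52),  # double precision
}


def _raw_bits(encoded: bytes):
    """Return (width, exp_bits, sig_bits, raw integer) for an encoded float."""
    width, exp_bits, sig_bits = FLOAT_LAYOUT[encoded[0]]
    payload = encoded[1:]
    if len(payload) != width // 8:
        raise ValueError("truncated or overlong float")
    return width, exp_bits, sig_bits, int.from_bytes(payload, "big")


def split_float(encoded: bytes):
    """Split an encoded float into (sign, exponent, significand) bit fields."""
    width, exp_bits, sig_bits, raw = _raw_bits(encoded)
    sign = raw >> (width - 1)
    exponent = (raw >> sig_bits) & ((1 << exp_bits) - 1)
    significand = raw & ((1 << sig_bits) - 1)
    return sign, exponent, significand


def dump_float(encoded: bytes) -> str:
    """Hypothetical rendering: every bit of the number as zero-padded hex."""
    width, _, _, raw = _raw_bits(encoded)
    return f"0x{raw:0{width // 4}x}"
```

Under these assumptions, `dump_float(bytes.fromhex("f97e00"))` gives "0x7e00", and `split_float` on the same input gives a sign of 0, an all-ones exponent (31) and a significand of 0x200.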

Second option would be to use it like gdb prints:
	-nan(0x20000)

The type information can be encoded as either "nanf" (float) or "nanf16" (for
_Float16).

If parentheses are not a good idea, then a third option is to print only the
significand bits, dropping the "0x" too. It's clearly hexadecimal, we don't
need the "x". In that case, we'd use the _ modification to indicate encoding
length, as "-nan_001_1" for a negative 16-bit signalling NaN.

Either way, the document should advise what to do when it comes to signalling
/ quiet NaNs. The CBOR spec recommends that only the IEEE-recommended
non-signalling form be used
  "If NaN is an allowed value, it must always be represented as 0xf97e00."

The table in Appendix A also lists 0xfa7fc00000 and 0xfb7ff8000000000000,
which match the IEEE recommendations for QNaN. But the Wikipedia page warns
that some (older?) machines invert it and encode signalling NaN with the
topmost mantissa bit set.

I would recommend that it print "nan" only if it's one of those three. And if
it's distinguishing the payload length, "nan" is only for the double-precision
case, the others requiring "nanf" or "nanf16" (if using the variant with
parentheses), or for the 16-bit with "nan_2" and "nan_3" for the single- and
double-precision ones respectively if not using parentheses.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering
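
To make the recommendation above concrete, here is a minimal Python sketch of the gdb-style variant with parentheses. It is one reading of the proposal, not agreed notation: the function name `nan_notation`, the choice to combine the "nanf"/"nanf16" width suffixes with the "plain name only for the three canonical encodings" rule, and the error handling are all assumptions made for illustration.

```python
# The three quiet-NaN encodings named in the message (RFC 7049 Appendix A).
CANONICAL_NANS = {
    bytes.fromhex("f97e00"),
    bytes.fromhex("fa7fc00000"),
    bytes.fromhex("fb7ff8000000000000"),
}

# Initial byte -> (total bits, significand bits, assumed type suffix)
LAYOUT = {
    0xF9: (16, 10, "f16"),
    0xFA: (32, 23, "f"),
    0xFB: (64, 52, ""),
}


def nan_notation(encoded: bytes) -> str:
    """Render an encoded NaN in a hypothetical gdb-like diagnostic form."""
    width, sig_bits, suffix = LAYOUT[encoded[0]]
    if len(encoded) != 1 + width // 8:
        raise ValueError("truncated or overlong float")
    raw = int.from_bytes(encoded[1:], "big")
    exp_mask = (1 << (width - 1 - sig_bits)) - 1
    exponent = (raw >> sig_bits) & exp_mask
    significand = raw & ((1 << sig_bits) - 1)
    # A NaN has an all-ones exponent and a non-zero significand.
    if exponent != exp_mask or significand == 0:
        raise ValueError("not a NaN")
    name = "nan" + suffix
    if encoded in CANONICAL_NANS:
        return name  # plain name only for the canonical encodings
    sign = "-" if raw >> (width - 1) else ""
    return f"{sign}{name}(0x{significand:x})"
```

With these assumptions, 0xf97e00 prints as "nanf16", 0xfb7ff8000000000000 as "nan", and the negative 16-bit signalling NaN 0xf9fc01 as "-nanf16(0x1)".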