[Cbor] Number formats (Re: Reminder and call for agenda: CBOR WG Virtual Meeting on 2023-05-31)

Carsten Bormann <cabo@tzi.org> Sat, 27 May 2023 07:11 UTC

From: Carsten Bormann <cabo@tzi.org>
Date: Sat, 27 May 2023 09:10:57 +0200
Cc: cbor@ietf.org
To: Emile Cormier <emile.cormier.jr@gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/lFY2jvRQhBzog_EpUXsoor-x8FA>

Hi Emile,

Interesting observations.  It seems to me you would like XDR (RFC 4506), except that it requires following a data description for ingestion (and was developed when 32 bits were a canonical size).

I think it is important for the CBOR WG to keep an eye on where binary (as in not text-based) representation formats should be moving.  And that should involve skating where the puck will be, not necessarily Brownian motion on old discussions.

With respect to number representation, right now an important trend is being able to accommodate low-precision numbers.
In 2013, CBOR was one of the first representation formats to embrace fp16 (16-bit se5m10 floating point), while others generally only had the 32-bit and 64-bit floating point formats.
Low-precision numbers typically occur in large quantities, so in 2020 we added tags for typed arrays (RFC 8746).  By the way, these do provide for direct use of little-endian numbers as well as two’s complement formats for signed numbers: the conversion overhead into a preferred number format can be significant for arrays of numbers, and a single tag can control the format for the entire array.
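
To make the typed-array point concrete, here is a minimal Python sketch of how one tag can describe a whole array.  It assumes the RFC 8746 tag numbers 77 (sint16, little-endian) and 84 (IEEE 754 binary16, little-endian); the helper names cbor_head and typed_array are made up for illustration and are not taken from any CBOR library.

```python
import struct

# Tag numbers as assigned in RFC 8746 (typed arrays).
TAG_SINT16_LE = 77   # signed 16-bit integers, two's complement, little-endian
TAG_FP16_LE = 84     # IEEE 754 binary16 (se5m10), little-endian

def cbor_head(major, value):
    """Encode a CBOR initial byte plus argument (hypothetical helper)."""
    if value < 24:
        return bytes([(major << 5) | value])
    for additional_info, fmt in ((24, ">B"), (25, ">H"), (26, ">I"), (27, ">Q")):
        if value < (1 << (8 * struct.calcsize(fmt))):
            return bytes([(major << 5) | additional_info]) + struct.pack(fmt, value)
    raise ValueError("argument does not fit in 64 bits")

def typed_array(tag, fmt, values):
    """Wrap packed little-endian numbers in a tagged byte string (hypothetical helper)."""
    # The payload is the platform-style array; no per-element CBOR overhead.
    payload = struct.pack("<%d%s" % (len(values), fmt), *values)
    return cbor_head(6, tag) + cbor_head(2, len(payload)) + payload

encoded = typed_array(TAG_SINT16_LE, "h", [1, -2, 300])
print(encoded.hex())  # d84d460100feff2c01
```

The same helper covers fp16 by switching the tag and the struct format code ("e" is Python's binary16 code); the elements inside the byte string are not converted individually.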

The next step here might be to look for popular low-precision formats not yet covered in that tag set, e.g., Google’s bfloat16 or bf16 (se8m7 truncated fp32), the popular fp8 formats (se2m5 to se5m2), unsigned formats, weirder lengths (19-bit Nvidia se8m10 tensorfloat/“tf32”, 24-bit AMD se7m16 fp24), shorter floats (fp4, fp2, fp1!), integers (int4, int1), or the various new proposals to leave the mold of IEEE 754 entirely.
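
As an illustration of what "truncated fp32" means for bf16, the following Python sketch keeps the upper 16 bits of the fp32 bit pattern (sign, 8 exponent bits, 7 mantissa bits).  The function names are made up for this example, and plain truncation is an assumption made for simplicity; real converters often round to nearest even instead.

```python
import struct

def float_to_bf16_bits(x):
    """Return the 16-bit bf16 pattern for x by truncating its fp32 encoding."""
    bits32 = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits32 >> 16          # drop the low 16 mantissa bits (no rounding)

def bf16_bits_to_float(bits16):
    """Expand a bf16 pattern back to a float by zero-filling the low bits."""
    return struct.unpack(">f", struct.pack(">I", (bits16 & 0xFFFF) << 16))[0]

bits = float_to_bf16_bits(3.141592653589793)
print(hex(bits), bf16_bits_to_float(bits))  # 0x4049 3.140625
```

Because the exponent field is unchanged, bf16 covers the same range as fp32 and only gives up precision, which is why the conversion is a shift rather than a re-encoding.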

The tag system provides CBOR with a way to easily accommodate these formats as they become more popular in AI/ML applications.

Grüße, Carsten