[Cbor] Unsigned and negative

Carsten Bormann <cabo@tzi.org> Tue, 08 September 2020 16:44 UTC

From: Carsten Bormann <cabo@tzi.org>
Date: Tue, 8 Sep 2020 18:44:40 +0200
Message-Id: <71E87728-2381-4FF3-8134-A45D34D55983@tzi.org>
To: cbor@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/S_tZJkgMX1c9EOUcYZjUyuPcGp4>
Subject: [Cbor] Unsigned and negative

I see that we have several IESG comments that question the “unsigned or negative” language that we are using for describing integers in ℤ.

Most people are brought up with a trinity of positive, zero, and negative integers, as this has a certain symmetry.
In the math world, there is some confusion about “natural numbers”, which are sometimes the set of positive integers (ℕ₁), sometimes the set of non-negative integers (ℕ₀, which we call “unsigned” in CBOR).

In programming languages, integer numbers (ℤ restricted to some range) have been available since the 1950s.  When it became obvious that ℕ₀ was another interesting range as it could support twice as many positive numbers and machine-supported modulo arithmetic, it was much harder to agree on a name.  “Natural” was ambiguous.  Some 1970s languages (notably Modula-2) adopted “Cardinal”.  But these did not stick.  What did stick was “unsigned”, as introduced into the C language in the late 1970s.  Initially, this was intended as an attribute that could be added to a type name such as int or long, but the type name could be left off (defaulting to int), and thus “unsigned” became the name for ℕ₀ in many environments.

CBOR has non-negative and negative numbers, separated into two major types (“unsigned” and “negative”) so it is easier to support unsigned numbers, which are rather important in constrained environments.  Instead of creating overlapping types with “unsigned” and “signed”, CBOR opted for a separate “negative” type, inspired by protobuf’s slightly weird integer encoding.  Protobuf puts the sign into the LSB and shifts the integer one bit to the left (“ZigZag encoding”), which is relatively expensive to do.  CBOR instead encodes the sign bit into the major type, so that the combination of unsigned and negative integers creates 9-, 17-, 33-, and 65-bit representations of signed integers.  This does cause a little cognitive dissonance for advocates of two’s complement signed representations, but it avoids having to encode the representation data type for numbers that could be represented in both unsigned and signed platform types.
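
To make the contrast concrete, here is a minimal Python sketch of the two approaches.  The function names zigzag64 and cbor_split are made up for this illustration and come from neither protobuf nor any CBOR library; zigzag64 assumes a 64-bit signed input, and cbor_split only separates the sign from the argument without producing any bytes.

```python
from typing import Tuple


def zigzag64(n: int) -> int:
    """ZigZag mapping of a 64-bit signed integer: the sign ends up in the LSB."""
    if not -(1 << 63) <= n < (1 << 63):
        raise ValueError("outside the 64-bit signed range")
    # Shift the value one bit to the left, then fold the sign into bit 0.
    return ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF


def cbor_split(n: int) -> Tuple[int, int]:
    """Return (major type, argument) for an integer CBOR can carry natively."""
    if not -(1 << 64) <= n < (1 << 64):
        raise ValueError("outside the 65-bit range of major types 0 and 1")
    if n >= 0:
        return 0, n        # major type 0, "unsigned": argument is n itself
    return 1, -1 - n       # major type 1, "negative": argument is -1 - n


print(zigzag64(0), zigzag64(-1), zigzag64(1), zigzag64(-2))
print(cbor_split(0), cbor_split(-1), cbor_split(-(1 << 64)))
```

In this sketch the ZigZag path interleaves the two signs (0, -1, 1, -2 map to 0, 1, 2, 3), while the CBOR path leaves the argument of an unsigned number untouched and only needs a subtraction for the negative case.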

So when we talk about ℕ₀, we don’t have to use the clumsy “non-negative”; we can just say “unsigned” (the name we gave to major type 0).  With major type 1, the sign is implied negative, so we just call it “negative”.  The line of symmetry runs between 0 (0x00 in CBOR) and -1 (0x20 in CBOR); there is no zero in the “negative” ranges (as there should not be, if the name is taken at face value).
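
The boundary between 0x00 and 0x20 can be checked with a small encoder sketch in Python.  The name encode_int is hypothetical and this is not taken from any particular CBOR implementation; it only covers major types 0 and 1 and always picks the shortest argument size.

```python
import struct


def encode_int(n: int) -> bytes:
    """Encode an integer in -2**64 .. 2**64-1 as a CBOR major type 0 or 1 item."""
    # Major type 0 carries n itself; major type 1 carries -1 - n.
    mt, arg = (0, n) if n >= 0 else (1, -1 - n)
    if arg < 24:
        # Small arguments live directly in the low 5 bits of the initial byte.
        return bytes([(mt << 5) | arg])
    # Additional information 24..27 selects a 1-, 2-, 4-, or 8-byte argument.
    for info, fmt, limit in ((24, ">B", 1 << 8), (25, ">H", 1 << 16),
                             (26, ">I", 1 << 32), (27, ">Q", 1 << 64)):
        if arg < limit:
            return bytes([(mt << 5) | info]) + struct.pack(fmt, arg)
    raise ValueError("needs more than 64 bits of argument")


for value in (0, -1, 23, -24, 255, -256, -257):
    print(value, encode_int(value).hex())
```

With a one-byte argument, this sketch covers -256 through 255, which is the 9-bit signed range; -257 is the first value that needs the two-byte form.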

I propose we continue to use this clean terminology.  We can still say “positive integer” if we really want to restrict something to ℕ₁, but the CBOR types for integers are “unsigned” (mt=0, ℕ₀) and “negative” (mt=1, ℤ \ ℕ₀).  Tags 2 and 3 continue exactly this split for integers of more than 64 bits (65 bits if you think signed).
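
As an illustration of how tags 2 and 3 carry the same split beyond 64 bits, here is a hedged Python sketch.  The helper names head and encode_bignum are invented for this example; the sketch always emits a tagged byte string and does not decide whether a value should rather use major type 0 or 1.

```python
import struct


def head(major_type: int, arg: int) -> bytes:
    """Initial byte plus shortest-form argument for a CBOR data item head."""
    if arg < 24:
        return bytes([(major_type << 5) | arg])
    for info, fmt, limit in ((24, ">B", 1 << 8), (25, ">H", 1 << 16),
                             (26, ">I", 1 << 32), (27, ">Q", 1 << 64)):
        if arg < limit:
            return bytes([(major_type << 5) | info]) + struct.pack(fmt, arg)
    raise ValueError("argument does not fit in 64 bits")


def encode_bignum(n: int) -> bytes:
    """Tag 2 (unsigned bignum) for n >= 0, tag 3 (negative bignum) for n < 0."""
    # Same split as for major types 0 and 1: tag 3 carries -1 - n.
    tag, magnitude = (2, n) if n >= 0 else (3, -1 - n)
    payload = magnitude.to_bytes((magnitude.bit_length() + 7) // 8, "big")
    # Major type 6 is a tag; major type 2 is the byte string it encloses.
    return head(6, tag) + head(2, len(payload)) + payload


print(encode_bignum(1 << 64).hex())
print(encode_bignum(-(1 << 64) - 1).hex())
```

Both calls produce the same nine-byte payload (0x01 followed by eight zero bytes); only the tag number differs, just as only the major type differs between 0x00 and 0x20.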

Regards, Carsten