[Cbor] Challenges in supporting bignum tag
Emile Cormier <emile.cormier.jr@gmail.com> Wed, 01 December 2021 23:38 UTC
From: Emile Cormier <emile.cormier.jr@gmail.com>
Date: Wed, 01 Dec 2021 19:38:20 -0400
Message-ID: <CAM70yxDvL1HKZSz8FMwFb64Tvse6TrqUwNgTmeY7kbMYGazDjg@mail.gmail.com>
To: cbor@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/rBkvIe6h-RXqaswZ0qyz01AdcBs>
List-Id: "Concise Binary Object Representation \(CBOR\)" <cbor.ietf.org>
List-Archive: <https://mailarchive.ietf.org/arch/browse/cbor/>

Hi Everyone,

I'm in the process of supporting CBOR "bignums" (which I think should have been named "bigint") in my C++ serialization library, and have encountered the following challenges:

1. When transcoding to other binary formats (such as MessagePack), the sign of the bignum is lost because it is deduced from the tag value and is not part of the message payload. I have to resort to converting it to [sign, byte_array], which makes it no longer round-trippable to its original CBOR format.

2. In CBOR decoders that ignore tags, the sign of the bignum is lost when passed up to the application. This goes against the following passage in Section 3.4 of the spec:

*"Decoders do not need to understand tags of every tag number, and tags may be of little value in applications where the implementation creating a particular CBOR data item and the implementation decoding that stream know the semantic meaning of each item in the data flow."*

So even if the application knows the semantic meaning of the received byte array (it's expecting a bignum), it cannot reconstitute the sign of the received magnitude without knowing the tag. Note that the same sort of information loss occurs with typed arrays, although it is more likely that an application would know the endianness and element size of a received typed array.

3. Most arbitrary-precision libraries represent big integers in sign-magnitude format. CBOR wants negative values to be stored as (-1 - N). I therefore need to subtract one from the magnitude stored in a negative multiprecision integer. Because the multiprecision subtraction algorithm needs to start from the least significant byte (to propagate borrows), and CBOR wants the bytes in big-endian order, I have to perform the subtraction on the entire bigint before I can serialize it to CBOR. It's therefore not possible to process and serialize each source byte "on the fly" starting from the most significant byte.
If I'm serializing a huge bigint, then there's a space-doubling penalty in the temporary bigint needed for the subtraction.

4. Because of #3, my CBOR codec implementation now depends on multiprecision integer arithmetic. I'm now forced to do one of the following:

a) Add a dependency on a multiprecision arithmetic library, which may not be the same one that's used by the application. For example, my library depends on Boost.Multiprecision, but the application wants to serialize a GMP bignum. The application now ends up with dependencies on *two* multiprecision libraries - yuck!

b) Write the multiprecision subtraction by hand to avoid introducing a dependency on a multiprecision arithmetic library. While the algorithm isn't too complicated, what I write is likely to be slower than a multiprecision library.

c) Require that the intermediary byte format passed to the encoder be the same as what CBOR expects (which may not be the same format as what other binary serialization formats want). For example, the header in my library which adds GMP serialization support could use GMP itself to perform the -1 subtraction and pass those bytes to the encoder.

I should note here that the encoder interface in my library is agnostic of the serialization format. That is, the JSON encoder interface is the same as the CBOR encoder interface. This makes it easy for the application to serialize to different formats (JSON, CBOR, etc.) without having to change anything in the application's serialization code.

In short, my life would be much easier if CBOR bignums were different in the following ways:

- The sign is part of the message payload, and not "deduced" from the tag value.
- The bytes represent the magnitude of the bignum: no subtraction/addition by 1 for negative values.
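
For illustration, the hand-written subtraction from option 4(b) could look roughly like the sketch below. It is not taken from the library described above: the names Bytes, minus_one_big_endian, append_byte_string_head, and encode_bignum are hypothetical, the magnitude is assumed to be a non-zero big-endian byte vector without leading zeros, and preferred serialization (using major types 0 and 1 for values that fit in 64 bits) is ignored. The by-value copy in minus_one_big_endian is the space-doubling penalty mentioned in item 3.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>; // hypothetical alias

// Returns (magnitude - 1) as big-endian bytes with leading zeros stripped.
// Assumes `magnitude` is a non-zero, big-endian, unsigned magnitude.
// The argument is taken by value: this copy is the temporary buffer.
Bytes minus_one_big_endian(Bytes magnitude)
{
    assert(std::any_of(magnitude.begin(), magnitude.end(),
                       [](std::uint8_t b) { return b != 0; }));

    // The borrow propagates from the least significant byte (the back).
    std::size_t i = magnitude.size();
    while (i > 0) {
        --i;
        if (magnitude[i] != 0) {
            --magnitude[i];  // borrow absorbed, done
            break;
        }
        magnitude[i] = 0xFF; // 0x00 - 1 borrows from the next byte up
    }

    // Strip leading zero bytes left over after the subtraction.
    std::size_t first = 0;
    while (first < magnitude.size() && magnitude[first] == 0)
        ++first;
    magnitude.erase(magnitude.begin(),
                    magnitude.begin() + static_cast<std::ptrdiff_t>(first));
    return magnitude;
}

// Appends the head of a CBOR byte string (major type 2) of length `len`.
void append_byte_string_head(Bytes& out, std::uint64_t len)
{
    if (len < 24) {
        out.push_back(static_cast<std::uint8_t>(0x40 | len));
        return;
    }
    // Additional information 24..27 selects a 1, 2, 4 or 8 byte length.
    int extra = 1;
    std::uint8_t info = 24;
    if (len > 0xFFFFFFFFu)  { extra = 8; info = 27; }
    else if (len > 0xFFFFu) { extra = 4; info = 26; }
    else if (len > 0xFFu)   { extra = 2; info = 25; }
    out.push_back(static_cast<std::uint8_t>(0x40 | info));
    for (int shift = 8 * (extra - 1); shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>((len >> shift) & 0xFF));
}

// Encodes a sign-magnitude integer as a CBOR bignum:
// tag 2 (0xC2) for non-negative values, tag 3 (0xC3) for negative values,
// where the tag 3 payload holds n such that the value is -1 - n.
Bytes encode_bignum(bool negative, const Bytes& magnitude)
{
    const Bytes payload =
        negative ? minus_one_big_endian(magnitude) : magnitude;
    Bytes out;
    out.push_back(negative ? 0xC3 : 0xC2);
    append_byte_string_head(out, payload.size());
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}
```

For example, calling encode_bignum(true, ...) with the 9-byte magnitude 01 00 00 00 00 00 00 00 01 (that is, 2^64 + 1) should produce c3 49 01 00 00 00 00 00 00 00 00, the tag 3 encoding of -18446744073709551617. Under option 4(c), the application-side GMP or Boost.Multiprecision adapter would do the subtraction itself and only the last two functions would remain in the codec.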

There are other tags that currently depend on the existing bignum tag (bigdecimal, bigfloat, extended time), so I don't expect there to be much interest in a new "sane bignum" tag that addresses the "flaws" I brought up here. I'm just documenting the challenges I'm facing with the existing bignum tag so that they're considered if there's ever a fork of this standard, or if some other binary protocol wants to implement bignums.

Overall I feel there's too much emphasis in CBOR on cramming everything into as few bytes as possible, and not enough on decoding simplicity and speed. I also think some tags have been "abused" for storing information that should otherwise be part of the message payload (all in the name of saving a few bytes). This makes tags go beyond their purpose of providing semantic meaning.

Sorry if this comes across as a rant. It's not my intention to belittle anyone. I just wanted to share my challenges in implementing CBOR for my C++ serialization library. Perhaps someone here will have advice on how I can overcome these challenges.

Cheers,
Emile Cormier