Re: [Cbor] Decoding numbers and compliance verification in dCBOR

Anders Rundgren <anders.rundgren.net@gmail.com> Sun, 12 March 2023 06:09 UTC

To: Laurence Lundblade <lgl@island-resort.com>, Wolf McNally <wolf@wolfmcnally.com>
Cc: Carsten Bormann <cabo@tzi.org>, cbor@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/A8z0gNJbiwWigRT42zH165c6ArA>

AFAICT, properly defined deterministic encoding rules can actually *simplify* decoders.

Checking map sort order becomes trivial: you just keep a copy of the preceding key and verify that the new key is greater.
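A minimal sketch of that decoder-side check, in Ruby. It assumes the keys are available as their raw encoded byte strings in wire order and that the ordering rule is the bytewise lexicographic one from RFC 8949, section 4.2.1. The method name is hypothetical and not taken from any CBOR library.

```ruby
# Hypothetical helper: `encoded_keys` is assumed to hold each map key's raw
# encoded bytes, in the order the keys appeared on the wire.
def keys_strictly_ascending?(encoded_keys)
  previous = nil
  encoded_keys.each do |key|
    bytes = key.b # compare as raw bytes, independent of the string's encoding
    # String#<=> on binary strings is a bytewise lexicographic comparison.
    # Requiring a strictly greater key also rejects duplicate keys.
    return false if previous && (bytes <=> previous) <= 0
    previous = bytes
  end
  true
end

# Encoded keys: 1, 100, "a", "b"
sorted   = ["\x01", "\x18\x64", "\x61\x61", "\x61\x62"].map(&:b)
unsorted = ["\x61\x62", "\x61\x61"].map(&:b)

puts keys_strictly_ascending?(sorted)
puts keys_strictly_ascending?(unsorted)
```

Only one previous key is held at a time, so the check adds no per-map storage beyond that copy.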

However, not sticking to Rule 2 in section 4.2.2 of RFC 8949 is asking for trouble.  Really constrained systems probably do not even use floating point, which makes the space argument seem pretty weak.
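For reference, Rule 2 encodes every floating-point value, including integral ones, in the shortest of the 16-, 32- and 64-bit representations that preserves it exactly. A rough Ruby sketch follows. The method names are hypothetical, the half-precision test is hand-written because Array#pack has no 16-bit float directive, and NaN is simply mapped to f9 7e 00.

```ruby
# Hypothetical helper: returns the binary16 bit pattern for a binary32 bit
# pattern if the value fits exactly, otherwise nil. NaN is handled by the caller.
def half_bits(bits32)
  sign     = (bits32 >> 16) & 0x8000
  exponent = ((bits32 >> 23) & 0xff) - 127 # unbiased binary32 exponent
  mantissa = bits32 & 0x7fffff
  if exponent == 128                        # infinity
    sign | 0x7c00
  elsif exponent == -127                    # zero, or a binary32 subnormal
    mantissa.zero? ? sign : nil
  elsif exponent.between?(-14, 15)          # binary16 normal range
    return nil unless (mantissa & 0x1fff).zero?
    sign | ((exponent + 15) << 10) | (mantissa >> 13)
  elsif exponent.between?(-24, -15)         # binary16 subnormal range
    shift = -exponent - 1
    full  = mantissa | 0x800000             # restore the implicit leading 1
    return nil unless (full & ((1 << shift) - 1)).zero?
    sign | (full >> shift)
  end
end

# Hypothetical encoder: shortest float encoding that round-trips exactly.
def encode_float_preferred(value)
  return [0xf9, 0x7e00].pack('Cn') if value.nan?
  single = [value].pack('g')                # big-endian binary32
  # If narrowing to binary32 changes the value, only binary64 is exact.
  return [0xfb].pack('C') + [value].pack('G') unless single.unpack1('g') == value
  half = half_bits(single.unpack1('N'))
  half ? [0xf9, half].pack('Cn') : [0xfa].pack('C') + single
end

puts encode_float_preferred(1.0).unpack1('H*')
puts encode_float_preferred(65536000000.0).unpack1('H*')
puts encode_float_preferred(1.1).unpack1('H*')
```

Under this rule a float is never turned into an integer and vice versa, so a decoder without floating-point support can still rely on a single serialization for every integer in major types 0 and 1.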

Although it is not an IETF document, the following specification holds a condensed description of what I consider a "reasonable" dCBOR scheme:
https://cyberphone.github.io/javaapi/org/webpki/cbor/package-summary.html#deterministic-serialization

Anders

On 2023-03-11 21:47, Laurence Lundblade wrote:
> On Mar 11, 2023, at 12:27 AM, Wolf McNally <wolf@wolfmcnally.com> wrote:
>>
>> For us, minimizing absolute size of serialization of a numeric value is less of a goal than having a single, deterministic serialization for a given value. In addition, dCBOR codec implementors should be able to forego floating point or bignum support, and still be able to expect the same canonical serialization for all the integers representable by major types 0 and 1.
>>
> 
> Yes, that generally makes sense, but a few comments.
> 
> You have a requirement that all decoders MUST validate compliance with the dCBOR spec — check sort order, check that 0 is not encoded as a float or bignum. This check is there to enforce hygiene in the dCBOR ecosystem; it is not a necessity for correct functioning of the decoder.
> 
> This results in extra code on the decode side, not the encode side. The decode side is always bigger and more complex already just for dealing with regular input validation.
> 
> I can see that for highly constrained protocols this hygiene check is much smaller, maybe zero for integers only, no maps….
> 
> So is it really a MUST, rather than a really strong SHOULD? Something like “while dCBOR doesn’t absolutely require full compliance (e.g. map is sorted correctly) on decode, this exception is only intended for use in very constrained environments that just can’t afford the extra code size for the check. These are only environments where a few hundred bytes of extra code would affect the cost of the device.”
> 
> This is not a critical issue for me in the end. I mostly want to make sure we’re clear about the implications of the requirement, because it is essentially a prophylaxis for the ecosystem, not something needed for correct function of the decoder.
> 
> LL
> 
> 
> 
>> ~ Wolf
>> Lead Researcher, Blockchain Commons
>>
>>> On Mar 10, 2023, at 11:06 PM, Anders Rundgren <anders.rundgren.net@gmail.com> wrote:
>>>
>>> If the shortest possible number representation is an absolute goal, you already have a problem with "pure" integers.  An integer value of 1099511627775 (0xffffffffff) would actually take two bytes fewer(!) when encoded as a bignum: 7 bytes instead of 9.
>>>
>>> It would (IMO) be unwise trying to fix this in dCBOR.
>>>
>>> Anders
>>>
>>> On 2023-03-10 20:14, Carsten Bormann wrote:
>>>> Hi Laurence,
>>>> I think your arguments are important.
>>>> But for a receiver of information, there are also benefits from knowing what to expect.
>>>> In particular map processing can be simpler if only a specific sequence of the entries needs to be accepted.
>>>> Whether floating point values are important for an application may also influence whether it is worth expending some additional processing.  The simplest devices often get by without any floating point operations.  Once floating point becomes important for the functioning of a device, processors such as those of ARM’s Cortex M4 series can help reduce the power-hungry on-time of the device by efficiently performing the floating point computations.  With these processors (and certainly with processors of the smartphone, laptop, and server classes), the requirements of dCBOR become almost trivial.
>>>> I don’t want to take a particular side here, just point out that not all applications are the same.  Having a go-to profile of CBOR that covers an interesting subset of applications appears to be a net win to me.
>>>> Additional CDDL support such as that provided in draft-bormann-cbor-cddl-more-control with .cbordet and .cborseqdet may be desirable.  There is currently no way in CDDL to express the rules about number representation that dCBOR has adopted; that may be one interesting gap.
>>>> Little aside: When I started to think about this, I started to wonder: What is the dCBOR way to represent 65536000000.0?
>>>> 5 bytes:
>>>>>> 65536000000.0.to_cbor.hexs
>>>> => "fa 51 74 24 00"
>>>> 9 bytes:
>>>>>> 65536000000.to_cbor.hexs
>>>> => "1b 00 00 00 0f 42 40 00 00"
>>>> Here, the floating point representation is shorter…
>>>> But wait, even if float32 cannot be exact (e.g., for 65536000001), try:
>>>> 7 bytes:
>>>>>> CBOR.decode("c2 45 0f 42 40 00 01".xeh)
>>>> => 65536000001
>>>> 9 bytes:
>>>>>> CBOR.decode("c2 45 0f 42 40 00 01".xeh).to_cbor.hexs
>>>> => "1b 00 00 00 0f 42 40 00 01"
>>>> (Number systems are always an unwieldy part of representation formats.)
>>>> Grüße, Carsten
>>>
>>
>