Re: [Cbor] Decoding numbers and compliance verification in dCBOR

Wolf McNally <wolf@wolfmcnally.com> Sun, 12 March 2023 01:41 UTC

From: Wolf McNally <wolf@wolfmcnally.com>
Date: Sat, 11 Mar 2023 17:41:02 -0800
Cc: Anders Rundgren <anders.rundgren.net@gmail.com>, Carsten Bormann <cabo@tzi.org>, cbor@ietf.org
To: Laurence Lundblade <lgl@island-resort.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/8NVpVtVyeml8NqJqJaadgOzZBEw>

Laurence,

Thanks for your comments. Blockchain Commons is targeting dCBOR at a number of applications where security is a high priority. So if you’re in an environment where you *know* that your agent will never decode a document that is malformed, or into which malicious content may have been smuggled, then you could relax the validation at decode time considerably both for the codec and at the application level. But this situation is not the default case we are considering, hence the use of MUST.
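
To make "validation at decode time" concrete for the integer case, here is a minimal Ruby sketch of a strict check on major types 0 and 1: it rejects any integer whose argument is not in shortest form, which is the rule deterministic encodings rely on (RFC 8949, Section 4.2.1). The names decode_strict_int and NonCanonicalError are hypothetical and are not taken from the I-D or from any existing codec; this is an illustration under those assumptions, not a reference implementation.

```ruby
# Hypothetical error type for input that is well-formed CBOR but not in
# the single canonical form (name invented for this sketch).
class NonCanonicalError < StandardError; end

# Decodes exactly one CBOR integer (major type 0 or 1) from a binary
# string and raises NonCanonicalError if a shorter encoding exists.
def decode_strict_int(bytes)
  data = bytes.b
  raise ArgumentError, "empty input" if data.empty?

  initial = data.getbyte(0)
  major = initial >> 5      # top 3 bits: major type
  info = initial & 0x1f     # low 5 bits: additional information
  raise ArgumentError, "not an integer (major type #{major})" unless major <= 1

  if info < 24
    # Values 0..23 live directly in the initial byte.
    raise ArgumentError, "trailing bytes" unless data.bytesize == 1
    value = info
  else
    # Additional information 24..27 selects a 1-, 2-, 4-, or 8-byte argument.
    width = { 24 => 1, 25 => 2, 26 => 4, 27 => 8 }[info]
    raise ArgumentError, "invalid additional information #{info}" if width.nil?
    raise ArgumentError, "wrong input length" unless data.bytesize == 1 + width

    value = data.byteslice(1, width).bytes.inject(0) { |acc, b| (acc << 8) | b }

    # Smallest value that genuinely needs an argument of this width.
    minimum = { 1 => 24, 2 => 0x100, 4 => 0x1_0000, 8 => 0x1_0000_0000 }[width]
    raise NonCanonicalError, "argument not in shortest form" if value < minimum
  end

  major.zero? ? value : -1 - value
end
```

With this sketch, the nine-byte encoding "1b 00 00 00 0f 42 40 00 00" from Carsten's example decodes to 65536000000, while "18 00" (zero padded out to a one-byte argument) is rejected even though a lax decoder would happily return 0.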

If performance becomes an issue then it will of course be up to the developer to determine whether they are increasing the potential attack surface by reducing validation requirements. And if you want to use a decoder that is lax where we specify stringency, then pretty much any other existing decoder will do. It is a non-goal of our specification to have the tightest, smallest codec.

In addition, throughout the I-D we’re emphasizing that while as much validation work as possible must be placed on the codec, it can’t get you all the way there: the protocol specifier/application developer MUST perform further validations, for example to ensure that a map contains no additional unauthorized data in undocumented entries. This is something the next version of the I-D will make even clearer, and is why I am treating codec-level validation, API best practices, and application developer responsibilities with equal emphasis: determinism isn’t something that can be achieved in a completely automated way (at least not with the tools at hand.)
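
As an illustration of the application-level half of that split, the following Ruby sketch checks an already-decoded map against the set of keys a protocol documents. Everything here is an assumption made for the example: the key names, the ALLOWED_KEYS constant, and the unexpected_keys helper are hypothetical and do not come from the I-D.

```ruby
require "set"

# Hypothetical set of keys documented by some protocol; the names are
# invented for this sketch.
ALLOWED_KEYS = Set["id", "name", "amount"].freeze

# Returns the keys of a decoded map (a Ruby Hash) that the protocol does
# not document. An empty result means no undocumented entries were found.
def unexpected_keys(decoded)
  decoded.keys.reject { |key| ALLOWED_KEYS.include?(key) }
end

# Raises if the decoded map carries any undocumented entry.
def validate_map!(decoded)
  extra = unexpected_keys(decoded)
  raise ArgumentError, "undocumented map entries: #{extra.inspect}" unless extra.empty?
  decoded
end
```

A codec can guarantee that the bytes are canonical, but only a check of this kind, written by whoever defines the protocol, can say whether an extra entry such as "memo" is allowed to be there at all.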

BTW, if you’re interested I recommend following along with the editor’s copy of the I-D; it’s where I’m putting all the incremental changes that will lead to the 01 revision:

https://blockchaincommons.github.io/WIPs-IETF-draft-deterministic-cbor/draft-mcnally-deterministic-cbor.html

Of course, we also welcome issues and PRs in the repo itself:

https://github.com/BlockchainCommons/WIPs-IETF-draft-deterministic-cbor

~ Wolf

> On Mar 11, 2023, at 12:47 PM, Laurence Lundblade <lgl@island-resort.com> wrote:
> 
> On Mar 11, 2023, at 12:27 AM, Wolf McNally <wolf@wolfmcnally.com> wrote:
>> 
>> For us, minimizing absolute size of serialization of a numeric value is less of a goal than having a single, deterministic serialization for a given value. In addition, dCBOR codec implementors should be able to forego floating point or bignum support, and still be able to expect the same canonical serialization for all the integers representable by major types 0 and 1.
>> 
> 
> Yes, that generally makes sense, but a few comments.
> 
> You have a requirement that all decoders MUST validate compliance with the dCBOR spec: check sort order, check that 0 is not encoded as a float or bignum. This check is there to enforce hygiene in the dCBOR ecosystem; it is not a necessity for correct functioning of the decoder.
> 
> This results in extra code on the decode side, not the encode side. The decode side is always bigger and more complex already just for dealing with regular input validation.
> 
> I can see that for highly constrained protocols this hygiene check is much smaller, maybe zero for integers only, no maps….
> 
> So is it really a MUST, not a really strong SHOULD? Something like “while dCBOR doesn’t absolutely require full compliance (e.g. map is sorted correctly) on decode, this exception is only intended for use in very constrained environments that just can’t afford the extra code size for the check. These are only environments where a few hundred bytes of extra code would affect the cost of the device.”
> 
> This is not a critical issue for me in the end. I mostly want to make sure we’re clear about the implications of the requirement, because it is essentially a prophylaxis for the ecosystem, not something needed for correct function of the decoder.
> 
> LL
> 
> 
> 
>> ~ Wolf
>> Lead Researcher, Blockchain Commons
>> 
>>> On Mar 10, 2023, at 11:06 PM, Anders Rundgren <anders.rundgren.net@gmail.com> wrote:
>>> 
>>> If shortest possible number representation is an absolute goal, you already have a problem with "pure" integers.  An integer value of 1099511627775 (0xffffffffff) would actually yield two bytes less(!) using the Bignums type.
>>> 
>>> It would (IMO) be unwise trying to fix this in dCBOR.
>>> 
>>> Anders
>>> 
>>> On 2023-03-10 20:14, Carsten Bormann wrote:
>>>> Hi Laurence,
>>>> I think your arguments are important.
>>>> But for a receiver of information, there are also benefits from knowing what to expect.
>>>> In particular map processing can be simpler if only a specific sequence of the entries needs to be accepted.
>>>> Whether floating point values are important for an application may also influence whether it is worth expending some additional processing.  The simplest devices often get by without any floating point operations.  Once floating point becomes important for the functioning of a device, processors such as those of ARM’s Cortex M4 series can help reduce the power-hungry on-time of the device by efficiently performing the floating point computations.  With these processors (and certainly with processors of the smartphone, laptop, and server classes), the requirements of dCBOR become almost trivial.
>>>> I don’t want to take a particular side here, just point out that not all applications are the same.  Having a go-to profile of CBOR that covers an interesting subset of applications appears to be a net win to me.
>>>> Additional CDDL support such as that provided in draft-bormann-cbor-cddl-more-control with .cbordet and .cborseqdet may be desirable.  There is currently no way in CDDL to express the rules about number representation that dCBOR has adopted; that may be one interesting gap.
>>>> Little aside: When I started to think about this, I started to wonder: What is the dCBOR way to represent 65536000000.0?
>>>> 5 bytes:
>>>>>> 65536000000.0.to_cbor.hexs
>>>> => "fa 51 74 24 00"
>>>> 9 bytes:
>>>>>> 65536000000.to_cbor.hexs
>>>> => "1b 00 00 00 0f 42 40 00 00"
>>>> Here, the floating point representation is shorter…
>>>> But wait, even if float32 cannot be exact (e.g., for 65536000001), try:
>>>> 7 bytes:
>>>>>> CBOR.decode("c2 45 0f 42 40 00 01".xeh)
>>>> => 65536000001
>>>> 9 bytes:
>>>>>> CBOR.decode("c2 45 0f 42 40 00 01".xeh).to_cbor.hexs
>>>> => "1b 00 00 00 0f 42 40 00 01"
>>>> (Number systems are always an unwieldy part of representation formats.)
>>>> Grüße, Carsten
>>> 
>> 
>