Re: [Cbor] Handling duplicate map keys

Laurence Lundblade <lgl@island-resort.com> Sat, 23 November 2019 15:21 UTC

Return-Path: <lgl@island-resort.com>
X-Original-To: cbor@ietfa.amsl.com
Delivered-To: cbor@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 91E93120116 for <cbor@ietfa.amsl.com>; Sat, 23 Nov 2019 07:21:53 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.895
X-Spam-Level:
X-Spam-Status: No, score=-1.895 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_NONE=0.001] autolearn=unavailable autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id kwlNqwWYRjCy for <cbor@ietfa.amsl.com>; Sat, 23 Nov 2019 07:21:51 -0800 (PST)
Received: from p3plsmtpa06-09.prod.phx3.secureserver.net (p3plsmtpa06-09.prod.phx3.secureserver.net [173.201.192.110]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 6F0EF1200B9 for <cbor@ietf.org>; Sat, 23 Nov 2019 07:21:51 -0800 (PST)
Received: from [10.86.0.74] ([45.56.150.43]) by :SMTPAUTH: with ESMTPA id YXELi8ky2Xf0IYXELicbL1; Sat, 23 Nov 2019 08:21:50 -0700
From: Laurence Lundblade <lgl@island-resort.com>
Message-Id: <F81E1A57-6072-44FA-A148-8F3ED7520791@island-resort.com>
Content-Type: multipart/alternative; boundary="Apple-Mail=_B634E199-74B2-4A14-AFEE-0FF99865E609"
Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.11\))
Date: Sat, 23 Nov 2019 07:21:49 -0800
In-Reply-To: <CANh-dX=1gkAtfSG-yzCsVAkk=oLM-=dN_JLCr1kQK3d6Jb0fSw@mail.gmail.com>
Cc: Jeffrey Yasskin <jyasskin=40google.com@dmarc.ietf.org>, cbor@ietf.org
To: Jeffrey Yasskin <jyasskin@chromium.org>
References: <CANh-dX=EVDa4EwrgWKof4kCD3nkfV3BvH0Cg5ZKOivmXJ_Dm8g@mail.gmail.com> <E5C74E1F-A5F4-4AB8-9787-3999C4697C3B@island-resort.com> <CANh-dX=1gkAtfSG-yzCsVAkk=oLM-=dN_JLCr1kQK3d6Jb0fSw@mail.gmail.com>
X-Mailer: Apple Mail (2.3445.104.11)
X-CMAE-Envelope: MS4wfNaX6II8ldZcz/TGAMVkA5R9shhcbtMSducBkF5d874PpyHf+jo9Ut4LcNhVVzEKkMNqN6/wi8onvZQLMWTlvataL2HPofnt7Qhmz1bVEol/vkImn+4f o3uVfkgMbGK6XmT889gaPW/r5QdeiErJp/8oYhdHY/ZwmyIVgaFGtrrxtdZNduNQ0cdX3T6y4KdzcMyGNQiX+djiUfNDAFK0R5CK7try41bH0iUokfdgvtBl bPNksb88iLb58nxEI0nF8w==
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/bxBrfblqFsyCmHPlneN6unJMJNY>
Subject: Re: [Cbor] Handling duplicate map keys
X-BeenThere: cbor@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Concise Binary Object Representation \(CBOR\)" <cbor.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/cbor>, <mailto:cbor-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/cbor/>
List-Post: <mailto:cbor@ietf.org>
List-Help: <mailto:cbor-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/cbor>, <mailto:cbor-request@ietf.org?subject=subscribe>
X-List-Received-Date: Sat, 23 Nov 2019 15:21:53 -0000


> On Nov 23, 2019, at 1:11 AM, Jeffrey Yasskin <jyasskin@chromium.org> wrote:
> 
> On Sat, Nov 23, 2019 at 10:00 AM Laurence Lundblade <lgl@island-resort.com <mailto:lgl@island-resort.com>> wrote:
> Here’s a rough proposed text:
> 
> The protocol designer should make a choice for maps as to whether duplicates are allowed or not, particularly as to whether duplicates would cause security or functionality problems.
> 
> The protocol designer should only require duplicate detection when necessary as it can have the following implications:
> Some generic decoders do not support duplicate detection because the underlying facilities in their programming environment to represent maps can’t detect duplicates
> Some generic decoders do not support duplicate detection because it requires more code and is not required
> It requires more resources to implement: 1) memory to store all the keys in the map, 2) more code, 3) more CPU time, as much as O(n^2).. This is particularly an issue with big maps in constrained environments.
> It is suggested that protocol designers require duplicate detection only no the particular maps for which there is an issue.
> 
> Decoders will typically fall into one of these categories:
> Full duplicate detection
> Pass all items up to caller, allowing the caller to implement duplicate detection or not
> No duplicate detection
> A generic decoder should identify which it is. Some may support more than one, selectable by configuration.
> 
> I don't think generic decoders are relevant here. Protocol implementations often accumulate their input into a data structure even if they use a streaming decoder to read the input. Any such protocol can detect duplicates (of the keys it uses at all) for at most the cost of an extra bit per field.

My intention was that “pass all items up” covers your case. 

My QCBOR + t_cose implementations work exactly as you describe to implement dup detection in COSE header parameters. It’s the data structures that hold the COSE header params that are used to detect the duplicates, not the generic decoder (QCBOR).  This seems like a good way to handle this problem for lots of use cases. 


> 
> Here’s why.
> 
> I see no value in take first or take last over duplicate detection. In all three all, the keys encountered must be recorded and processed. If the map is large this will take a lot of memory. In all cases there is some extra code and CPU. CPU time is worse than O(n), perhaps O(n^2). It doesn’t matter if the use case is streaming or not. It is the memory, code and CPU time that matter.  (Take last can only be implemented by decoding the whole map, as it is only after decoding the whole map that it can know it has encountered the last; in a way you cannot do the last for a streaming decoder).
> 
> This is a guess — I don’t think map libraries that don’t support duplicate detection commonly support  take first. I suspect some do take first, some do take last and some do take random. To me that makes any mention of take first or take last in the specification not very valuable especially as it doesn’t have much memory, CPU or code advantage over duplicate detection.
> 
> I think you make a good point that if a protocol is going to specify "take first", it can probably specify "error on duplicate key" for the same cost. That might imply some much simpler text:
> 
> "Protocols based on CBOR SHOULD fail with an error if a map contains a duplicate key, except that if the key isn't used at all, they MAY ignore it instead. Protocols that do not reject duplicate keys MUST (?) document why the cost of rejecting duplicates is too high and why accepting them will not lead to security vulnerabilities. An array might be a better choice for such protocols.”

I think I’d invert it and say that protocols that require duplicate detection for security reasons should describe that requirement in security considerations so the implementor gets a good solid hint that they need to worry about it.

There is lot of text implying that duplicates are bad, but I think it would be worth being explicit.

You are an idiot if you design a protocol that considers duplicate input valid or necessary. Duplicate map keys are always an error in CBOR and will not work with many generic decoders.

> 
> Do we have any concrete examples of protocols or formats built on CBOR that don't store anything while parsing their input? One of those might provide a good example of a protocol that shouldn't reject duplicates.


Duplicates will / must only occur in error situations. It is always right to reject encoded CBOR that has duplicates.

The main thing is that when duplicates have security implications for the protocol, that must be called out and the implementor told to do something about them. 

To say it more in prose: They should either use a generic CBOR decoder that errors out on duplicates, or a CBOR decoder that passes duplicates up so the protocol implementation on top of CBOR can do the detection. What they must not do is use a decoder that does take first, take last or randomly takes one of the occurrences.

LL