Re: [Cbor] Handling duplicate map keys

Laurence Lundblade <lgl@island-resort.com> Sat, 23 November 2019 02:00 UTC

From: Laurence Lundblade <lgl@island-resort.com>
Message-Id: <E5C74E1F-A5F4-4AB8-9787-3999C4697C3B@island-resort.com>
Date: Sat, 23 Nov 2019 09:59:52 +0800
In-Reply-To: <CANh-dX=EVDa4EwrgWKof4kCD3nkfV3BvH0Cg5ZKOivmXJ_Dm8g@mail.gmail.com>
Cc: cbor@ietf.org
To: Jeffrey Yasskin <jyasskin=40google.com@dmarc.ietf.org>
References: <CANh-dX=EVDa4EwrgWKof4kCD3nkfV3BvH0Cg5ZKOivmXJ_Dm8g@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/y9EwKVL9Sz0GuuRl6rZ5GAplDxk>
Subject: Re: [Cbor] Handling duplicate map keys

Here’s a rough proposed text:

The protocol designer should decide, for each map, whether duplicate keys are allowed, with particular attention to whether duplicates would cause security or functionality problems.

The protocol designer should require duplicate detection only when necessary, as it can have the following implications:
- Some generic decoders do not support duplicate detection because the underlying facilities in their programming environment for representing maps can’t detect duplicates.
- Some generic decoders do not support duplicate detection because it requires more code and is not required.
- Duplicate detection requires more resources to implement: 1) memory to store all the keys in the map, 2) more code, 3) more CPU time, as much as O(n^2). This is particularly an issue with big maps in constrained environments.

It is suggested that protocol designers require duplicate detection only on the particular maps for which there is an issue.
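To make the resource trade-off concrete, here is a minimal Python sketch of two ways a decoder might check a map for duplicates. It assumes a hypothetical representation in which the decoded map is an ordered list of (key, value) pairs and keys are limited to int, str or bytes (so Python equality matches CBOR key equality); the function names are illustrative, not from any real CBOR library. The pairwise version needs no memory beyond the decoded map but does O(n^2) comparisons; the set-based version does fewer comparisons but holds one extra entry per key, and a constrained C decoder may not have a hash table available at all.

```python
from typing import Hashable, List, Tuple

# Hypothetical representation: a decoded CBOR map as an ordered list of
# (key, value) pairs. Keys are assumed to be int, str or bytes, for which
# Python equality matches CBOR key equality.
Pair = Tuple[Hashable, object]


def has_duplicates_pairwise(pairs: List[Pair]) -> bool:
    """Compare every key with every later key: no extra memory, O(n^2) comparisons."""
    n = len(pairs)
    for i in range(n):
        for j in range(i + 1, n):
            if pairs[i][0] == pairs[j][0]:
                return True
    return False


def has_duplicates_hashed(pairs: List[Pair]) -> bool:
    """Remember every key in a set: fewer comparisons, but one set entry per key."""
    seen = set()
    for key, _ in pairs:
        if key in seen:
            return True
        seen.add(key)
    return False


example = [(1, "a"), ("label", 2), (1, "b")]
print(has_duplicates_pairwise(example), has_duplicates_hashed(example))  # True True
```

Either way, all the keys of the map have to be available at once, which is the memory cost listed above.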

Decoders will typically fall into one of these categories:
- Full duplicate detection
- Pass all items up to the caller, allowing the caller to implement duplicate detection or not
- No duplicate detection

A generic decoder should identify which category it falls into. Some may support more than one, selectable by configuration.
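As an illustration of a decoder that supports more than one category, selectable by configuration, here is a minimal Python sketch. `DuplicateMode`, `DuplicateKeyError` and `build_map` are hypothetical names, the input is assumed to be an ordered list of already-decoded (key, value) pairs, and keys are assumed to be int, str or bytes.

```python
from enum import Enum
from typing import Dict, Hashable, List, Tuple, Union

# Hypothetical types; keys assumed to be int, str or bytes.
Pair = Tuple[Hashable, object]


class DuplicateMode(Enum):
    """Hypothetical configuration knob, one value per decoder category."""
    DETECT = "detect"    # full duplicate detection
    PASS_UP = "pass_up"  # pass all items up; the caller decides
    NONE = "none"        # no duplicate detection


class DuplicateKeyError(ValueError):
    """Raised in DETECT mode when a key occurs more than once."""


def build_map(pairs: List[Pair], mode: DuplicateMode) -> Union[Dict[Hashable, object], List[Pair]]:
    if mode is DuplicateMode.PASS_UP:
        # Hand the caller every (key, value) pair in wire order, duplicates included.
        return list(pairs)
    if mode is DuplicateMode.DETECT:
        seen = set()
        for key, _ in pairs:
            if key in seen:
                raise DuplicateKeyError(f"duplicate map key: {key!r}")
            seen.add(key)
        return dict(pairs)
    # DuplicateMode.NONE: rely on the host map type. Python's dict silently keeps
    # the last value for a repeated key; other environments may keep the first.
    return dict(pairs)
```

With the pairs `[(1, "a"), (2, "b"), (1, "c")]`, PASS_UP returns the list unchanged, NONE returns `{1: "c", 2: "b"}`, and DETECT raises `DuplicateKeyError`. Note that the NONE behavior is simply whatever the host map type happens to do.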

Here’s why.

I see no value in take first or take last over duplicate detection. In all three cases, the keys encountered must be recorded and processed. If the map is large, this will take a lot of memory. In all cases there is some extra code and CPU. CPU time is worse than O(n), perhaps O(n^2). It doesn’t matter whether the use case is streaming or not; it is the memory, code and CPU time that matter. (Take last can only be implemented by decoding the whole map, since only then does the decoder know it has seen the last occurrence of each key; in that sense, a streaming decoder cannot do take last.)
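The point about take first and take last can be sketched in Python as follows. The function names are hypothetical and the input is assumed to be a stream of already-decoded (key, value) pairs with int, str or bytes keys. Take first can emit a pair as soon as it arrives but must still remember every key seen so far; take last cannot emit anything until the whole map has been read. Both keep a record of all keys, just as duplicate detection does.

```python
from typing import Dict, Hashable, Iterable, Iterator, Tuple

# Hypothetical representation: a stream of decoded (key, value) pairs.
Pair = Tuple[Hashable, object]


def take_first_streaming(pairs: Iterable[Pair]) -> Iterator[Pair]:
    """Yield each key's first occurrence as soon as it arrives.

    It can emit early, but it still has to remember every key seen so far
    in order to drop later duplicates.
    """
    seen = set()
    for key, value in pairs:
        if key not in seen:
            seen.add(key)
            yield key, value


def take_last(pairs: Iterable[Pair]) -> Dict[Hashable, object]:
    """Return each key's last occurrence.

    Nothing can be emitted until the whole map has been read, because a later
    pair might still override any earlier one.
    """
    result: Dict[Hashable, object] = {}
    for key, value in pairs:
        result[key] = value
    return result
```

Given `[("a", 1), ("b", 2), ("a", 3)]`, `take_first_streaming` yields `("a", 1)` before the rest of the input is read, while `take_last` can only return `{"a": 3, "b": 2}` after reading all three pairs.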

This is a guess: I don’t think map libraries that don’t support duplicate detection commonly support take first. I suspect some take first, some take last and some take an arbitrary one. To me, that makes any mention of take first or take last in the specification not very valuable, especially as neither has much memory, CPU or code advantage over duplicate detection.