Re: [Cbor] Handling duplicate map keys

Carsten Bormann <cabo@tzi.org> Wed, 26 February 2020 15:17 UTC

From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <CANh-dX=EVDa4EwrgWKof4kCD3nkfV3BvH0Cg5ZKOivmXJ_Dm8g@mail.gmail.com>
Date: Wed, 26 Feb 2020 16:17:25 +0100
Cc: cbor@ietf.org
Message-Id: <43D4F474-609D-4B25-981B-F98463DD1957@tzi.org>
References: <CANh-dX=EVDa4EwrgWKof4kCD3nkfV3BvH0Cg5ZKOivmXJ_Dm8g@mail.gmail.com>
To: Jeffrey Yasskin <jyasskin=40google.com@dmarc.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/JkyW2y-Y26kdWcXklTTof5nxuHQ>
Subject: Re: [Cbor] Handling duplicate map keys

I finally generated a pull request that tries to summarize the mailing list discussion and states which approaches can be used under which circumstances.

https://github.com/cbor-wg/CBORbis/pull/171

Options B and C were pretty much excluded by the discussion, and I think we also have consensus to address this, i.e., not to do A. So the text has a bit of D, but is also explicit about choices and consequences.

I hope this is close enough to the (rough) consensus we had in the Singapore meeting.

Comments welcome (e.g., in the interim we’ll have in 42 minutes).

Grüße, Carsten



> On 2019-11-21, at 13:52, Jeffrey Yasskin <jyasskin=40google.com@dmarc.ietf.org> wrote:
> 
> Re https://github.com/cbor-wg/CBORbis/issues/63, we had a long discussion today at IETF106 about what CBORbis should say about how protocols handle the (invalid) situation in which a CBOR map contains two entries with the same key.
> 
> Clearly basic-validating parsers will return an error. The disagreement comes from what, if anything, to require from non-validating parsers. An interesting piece of background information comes from CVE-2013-4787 and the ZIP format: A ZIP central directory can list a single filename multiple times (with no mention of this possibility in the specification). The Android signature checker (in Java) checked the last occurrence, while the package loader (in C) used the first. It takes two mistakes for that kind of vulnerability to happen: 1) The implementer has to parse the format twice with two different parsers instead of parsing once and operating over that parse result twice. 2) The format has to fail to say how to deal with invalid input. We have control over (2).
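
[Illustration, not part of the original message.] The first-versus-last mismatch described above can be shown with a minimal sketch. It assumes the decoded entries are available as a list of (key, value) pairs in wire order; the entry names and the two lookup helpers are hypothetical and only mimic the behaviour attributed to the checker and the loader.

```python
# Hypothetical decoded entries in wire order, with one duplicated key.
entries = [
    ("classes.dex", "malicious"),
    ("classes.dex", "signed original"),
]


def lookup_first(entries, key):
    """Return the value of the first entry with this key."""
    for k, v in entries:
        if k == key:
            return v
    raise KeyError(key)


def lookup_last(entries, key):
    """Return the value of the last entry with this key."""
    # dict() keeps the last value seen for each key.
    return dict(entries)[key]


# The "checker" looks at the last occurrence, the "loader" at the first.
checked = lookup_last(entries, "classes.dex")
loaded = lookup_first(entries, "classes.dex")
```

Here `checked` and `loaded` differ, so the content that was verified is not the content that is used. Parsing once and sharing the result, or defining the handling of duplicates, each remove the mismatch.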
> 
> CBORbis currently says:
> 
> "A CBOR-based protocol MUST define what to do when a receiving application does see multiple identical keys in a map. The resulting rule in the protocol MUST respect the CBOR data model: it cannot prescribe a specific handling of the entries with the identical keys, except that it might have a rule that having identical keys in a map indicates a malformed map and that the decoder has to stop with an error. Duplicate keys are also prohibited by CBOR decoders that enforce validity (Section 5.4)."
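
[Illustration, not part of the original message.] The one rule the quoted text explicitly permits, treating a duplicate key as an error and stopping, might look like the sketch below. It assumes entries arrive as (key, value) pairs with hashable keys; `DuplicateKeyError` and `build_map_strict` are hypothetical names, not an API of any CBOR library.

```python
class DuplicateKeyError(ValueError):
    """Raised when a map contains the same key more than once."""


def build_map_strict(entries):
    """Build a dict from (key, value) pairs, rejecting duplicate keys."""
    result = {}
    for key, value in entries:
        if key in result:
            # Stop with an error instead of picking one of the entries.
            raise DuplicateKeyError(f"duplicate map key: {key!r}")
        result[key] = value
    return result
```

One caveat under these assumptions: Python treats `1` and `1.0` as the same dict key, which may not match a protocol's notion of key equivalence, so a real decoder would need its own comparison.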
> 
> A) Leaving this alone is one of the possibilities.
> 
> B) CBOR itself could say that a protocol implementation with a non-validating decoder MUST use the *first* entry with a particular key and discard the rest. (Note that this works even with a streaming decoder, since it's a requirement at the protocol level.) This would ban three kinds of implementations:
> 1) Ones that build up a native map from the full content of the CBOR map, and unconditionally overwrite any existing entry with the next entry read from the CBOR map. These can be fixed by checking whether an entry with the key is already present before writing to it. Some native map APIs might not be able to do this without a redundant lookup.
> 2) Ones that initialize a native structure with default values (as opposed to "missing") and then overwrite those values with the entries read from the CBOR map. These can be fixed by adding a "missing" value to all the native structure fields, which might increase the storage needed.
> 3) Ones that perform some real-world action as they stream each CBOR map entry and never store the entries at all. I don't see a way to get those to ignore duplicates. I also worry about the real-world effect of performing one of those actions twice if it was important to use a map instead of an array.
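
[Illustration, not part of the original message.] The fixes suggested for cases 1) and 2) above can be sketched as follows, again assuming (key, value) pairs in wire order. `build_map_first_wins`, `Settings`, its field names, and its default values are all hypothetical. Case 3) is not covered here.

```python
def build_map_first_wins(entries):
    """Case 1: keep the first entry per key when building a native map."""
    result = {}
    for key, value in entries:
        # setdefault only stores the value if the key is absent,
        # so later duplicates are ignored with a single lookup.
        result.setdefault(key, value)
    return result


_MISSING = object()  # sentinel meaning "no entry seen yet"


class Settings:
    """Case 2: fields start as 'missing' rather than as default values."""

    DEFAULTS = {"timeout": 30, "retries": 3}

    def __init__(self):
        self.timeout = _MISSING
        self.retries = _MISSING

    @classmethod
    def from_entries(cls, entries):
        settings = cls()
        for key, value in entries:
            # Only fill a known field that has not been set yet.
            if key in cls.DEFAULTS and getattr(settings, key) is _MISSING:
                setattr(settings, key, value)
        # Apply defaults only to fields that never appeared in the map.
        for key, default in cls.DEFAULTS.items():
            if getattr(settings, key) is _MISSING:
                setattr(settings, key, default)
        return settings
```

The sentinel makes it possible to tell "set to the default value by the sender" apart from "not sent at all", which is what the first-entry rule needs; the cost is one extra state per field.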
> 
> C) CBOR could require protocols to use the *last* entry with a given key. This requires any implementation to scan to the end of a map before using any of the content, so it's probably a bad idea. It is potentially easier to implement in a decoder that builds up a native map of the content.
> 
> D) CBOR could allow protocols to leave the behavior of duplicate keys unspecified if they explicitly declare that the fields are not security-sensitive.
> 
> 
> I prefer B and then D. What do other people think?
> 
> Jeffrey