[Cbor] Handling duplicate map keys

Jeffrey Yasskin <jyasskin@google.com> Thu, 21 November 2019 12:52 UTC

Re https://github.com/cbor-wg/CBORbis/issues/63, we had a long discussion
today at IETF106 about what CBORbis should say about how protocols handle
the (invalid) situation in which a CBOR map contains two entries with the
same key.

Clearly basic-validating
<https://cbor-wg.github.io/CBORbis/draft-ietf-cbor-7049bis.html#basic-validity>
parsers will return an error. The disagreement is over what, if anything,
to require from non-validating parsers. An interesting piece of background
information comes from CVE-2013-4787
<https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2013-4787> and the ZIP
format: a ZIP central directory can list the same filename more than once
(with no mention of this possibility in the specification
<https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT>). The Android
signature checker (in Java) checked the last occurrence, while the package
loader (in C) used the first. It takes two mistakes for that kind of
vulnerability to happen: 1) The implementer has to parse the format twice
with two different parsers instead of parsing once and operating over that
parse result twice. 2) The format has to fail to say how to deal with
invalid input. We have control over (2).
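To make the failure mode concrete, here is a minimal Python sketch of two lookups that disagree on the same input. It models a container's entries as a list of (name, data) pairs in wire order; the entry name and contents are made up, and this is an illustration of the mechanism, not the actual ZIP or Android code.

```python
from typing import Any, Hashable, List, Tuple

# Raw entries in wire order, before any duplicate handling.
Entries = List[Tuple[Hashable, Any]]


def lookup_first(entries: Entries, key: Hashable) -> Any:
    """Return the value of the first entry with `key` (the loader's policy)."""
    for k, v in entries:
        if k == key:
            return v
    raise KeyError(key)


def lookup_last(entries: Entries, key: Hashable) -> Any:
    """Return the value of the last entry with `key` (the checker's policy)."""
    for k, v in reversed(entries):
        if k == key:
            return v
    raise KeyError(key)


# Hypothetical container that lists the same name twice.
entries = [("payload", "attacker code"), ("payload", "signed code")]

checked = lookup_last(entries, "payload")  # what the signature checker verifies
loaded = lookup_first(entries, "payload")  # what the loader actually uses
print(checked, "/", loaded)  # signed code / attacker code
```

Either policy alone is harmless; the bug is having both, which is why a format-level rule for duplicates matters.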

CBORbis currently says:

> "A CBOR-based protocol MUST define what to do when a receiving application
> does see multiple identical keys in a map. The resulting rule in the
> protocol MUST respect the CBOR data model: it cannot prescribe a specific
> handling of the entries with the identical keys, except that it might have
> a rule that having identical keys in a map indicates a malformed map and
> that the decoder has to stop with an error. Duplicate keys are also
> prohibited by CBOR decoders that enforce validity (Section 5.4)."


A) CBORbis could leave this text as it is.

B) CBOR itself could say that a protocol implementation with a
non-validating decoder MUST use the *first* entry with a particular key and
discard the rest. (Note that this works even with a streaming decoder,
since it's a requirement at the protocol level.) This would ban three kinds
of implementations:
1) Ones that build up a native map from the full content of the CBOR map,
and unconditionally overwrite any existing entry with the next entry read
from the CBOR map. These can be fixed by checking whether an entry with the
key is already present before writing to it. Some native map APIs might not
be able to do this without a redundant lookup.
2) Ones that initialize a native structure with default values (as opposed
to "missing") and then overwrite those values with the entries read from
the CBOR map. These can be fixed by adding a "missing" value to all the
native structure fields, which might increase the storage needed.
3) Ones that perform some real-world action as they stream each CBOR map
entry and never store the entries at all. I don't see a way to get those to
ignore duplicates. I also worry about the real-world effect of performing
one of those actions twice in a protocol where it was important to use a
map (with its unique keys) instead of an array.
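The fixes suggested for kinds 1) and 2) might look roughly like the following Python sketch. It assumes a decoded map is available as a list of (key, value) entries in wire order; `Config`, its fields, and the default values are hypothetical. In a language like C, the "missing" state would typically be an extra presence flag per field, which is where the extra storage comes from.

```python
from dataclasses import dataclass
from typing import Any, Dict, Hashable, Iterable, Tuple

Entries = Iterable[Tuple[Hashable, Any]]


# Fix for kind 1): check for an existing key before writing, instead of
# unconditionally overwriting (which would silently keep the *last* entry).
def build_first_wins(entries: Entries) -> Dict[Hashable, Any]:
    result: Dict[Hashable, Any] = {}
    for key, value in entries:
        if key not in result:      # first lookup
            result[key] = value    # second, redundant lookup
    return result


# Same behaviour in one call per entry: dict.setdefault only inserts
# when the key is absent.
def build_first_wins_single_lookup(entries: Entries) -> Dict[Hashable, Any]:
    result: Dict[Hashable, Any] = {}
    for key, value in entries:
        result.setdefault(key, value)
    return result


# Fix for kind 2): give every field an explicit "missing" state that no
# decoded value (including CBOR null, decoded here as None) can collide with.
_MISSING: Any = object()


@dataclass
class Config:  # hypothetical protocol structure
    timeout: Any = _MISSING
    name: Any = _MISSING


def decode_config(entries: Entries) -> Config:
    cfg = Config()
    for key, value in entries:
        if key in ("timeout", "name") and getattr(cfg, key) is _MISSING:
            setattr(cfg, key, value)  # first entry wins; duplicates ignored
    return cfg


def with_defaults(cfg: Config) -> Config:
    # Hypothetical protocol defaults, applied only after decoding finishes.
    return Config(
        timeout=30 if cfg.timeout is _MISSING else cfg.timeout,
        name="" if cfg.name is _MISSING else cfg.name,
    )
```

For comparison, `dict([(1, "a"), (2, "b"), (1, "c")])` yields `{1: "c", 2: "b"}`: that is exactly the unconditional-overwrite behaviour of kind 1).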

C) CBOR could require protocols to use the *last* entry with a given key.
This would require every implementation to scan to the end of a map before using
any of the content, so it's probably a bad idea. It is potentially easier
to implement in a decoder that builds up a native map of the content.
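The scan-to-the-end cost of C) shows up clearly in a streaming consumer. Below is a minimal Python sketch, assuming entries arrive one at a time from an iterator; `stream_first_wins`, `stream_last_wins`, and the `act` callback are hypothetical names. The first-wins version still has to remember which keys it has seen (though not their values), so it does not rescue the store-nothing implementations in item 3) of B.

```python
from typing import Any, Callable, Dict, Hashable, Iterable, Set, Tuple

Entry = Tuple[Hashable, Any]
Action = Callable[[Hashable, Any], None]


def stream_first_wins(entries: Iterable[Entry], act: Action) -> None:
    """Act on each key as soon as its first entry arrives."""
    seen: Set[Hashable] = set()
    for key, value in entries:
        if key not in seen:
            seen.add(key)
            act(key, value)


def stream_last_wins(entries: Iterable[Entry], act: Action) -> None:
    """Cannot act on any key until the whole map has been read, because a
    later duplicate could still replace the value."""
    buffered: Dict[Hashable, Any] = {}
    for key, value in entries:
        buffered[key] = value  # overwrite: last entry wins
    for key, value in buffered.items():
        act(key, value)


# Demo: record when each entry is read and when it is acted upon.
log = []


def demo_entries():
    for entry in [("a", 1), ("b", 2), ("a", 3)]:
        log.append(("read", entry[0]))
        yield entry


def record(key, value):
    log.append(("act", key, value))


stream_first_wins(demo_entries(), record)
first_log = list(log)
log.clear()
stream_last_wins(demo_entries(), record)
last_log = list(log)
```

In the demo, `first_log` interleaves reads and actions, while in `last_log` every action waits until all three entries have been read.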

D) CBOR could allow protocols to leave the behavior of duplicate keys
unspecified if they explicitly declare that the fields are not
security-sensitive.


I prefer B and then D. What do other people think?

Jeffrey