Re: [Cbor] Handling duplicate map keys

Jeffrey Yasskin <jyasskin@chromium.org> Sun, 24 November 2019 10:49 UTC

References: <CANh-dX=EVDa4EwrgWKof4kCD3nkfV3BvH0Cg5ZKOivmXJ_Dm8g@mail.gmail.com> <E5C74E1F-A5F4-4AB8-9787-3999C4697C3B@island-resort.com> <CANh-dX=1gkAtfSG-yzCsVAkk=oLM-=dN_JLCr1kQK3d6Jb0fSw@mail.gmail.com> <F81E1A57-6072-44FA-A148-8F3ED7520791@island-resort.com>
In-Reply-To: <F81E1A57-6072-44FA-A148-8F3ED7520791@island-resort.com>
From: Jeffrey Yasskin <jyasskin@chromium.org>
Date: Sun, 24 Nov 2019 02:48:56 -0800
Message-ID: <CANh-dXmxzoEjb-yhL-R1p-xvBF8kZwOpzoS_fziPfuxFhbykhA@mail.gmail.com>
To: Laurence Lundblade <lgl@island-resort.com>
Cc: Jeffrey Yasskin <jyasskin@chromium.org>, Jeffrey Yasskin <jyasskin=40google.com@dmarc.ietf.org>, cbor@ietf.org
Content-Type: multipart/alternative; boundary="0000000000007b97c90598156577"
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/7sf2pKY7hrKlBINMUcu8eV9herc>
Subject: Re: [Cbor] Handling duplicate map keys

Comments inline:

On Sat, Nov 23, 2019 at 7:21 AM Laurence Lundblade <lgl@island-resort.com>
wrote:

> On Nov 23, 2019, at 1:11 AM, Jeffrey Yasskin <jyasskin@chromium.org>
> wrote:
>
> On Sat, Nov 23, 2019 at 10:00 AM Laurence Lundblade <lgl@island-resort.com>
> wrote:
>
>> Here’s a rough proposed text:
>>
>> The protocol designer should make a choice for maps as to whether
>> duplicates are allowed or not, particularly as to whether duplicates would
>> cause security or functionality problems.
>>
>> The protocol designer should only require duplicate detection when
>> necessary as it can have the following implications:
>>
>>    - Some generic decoders do not support duplicate detection because
>>    the underlying facilities in their programming environment to represent
>>    maps can’t detect duplicates
>>    - Some generic decoders do not support duplicate detection because it
>>    requires more code and is not required
>>
>>
>>    - It requires more resources to implement: 1) memory to store all the
>>    keys in the map, 2) more code, 3) more CPU time, as much as O(n^2). This
>>    is particularly an issue with big maps in constrained environments.
>>
>> It is suggested that protocol designers require duplicate detection only
>> on the particular maps for which there is an issue.
>>
>> Decoders will typically fall into one of these categories:
>>
>>    - Full duplicate detection
>>    - Pass all items up to caller, allowing the caller to implement
>>    duplicate detection or not
>>    - No duplicate detection
>>
>> A generic decoder should identify which it is. Some may support more than
>> one, selectable by configuration.
>>
>>
> I don't think generic decoders are relevant here. Protocol implementations
> often accumulate their input into a data structure even if they use a
> streaming decoder to read the input. Any such protocol can detect
> duplicates (of the keys it uses at all) for at most the cost of an extra
> bit per field.
>
>
> My intention was that “pass all items up” covers your case.
>
> My QCBOR + t_cose implementations work exactly as you describe to
> implement dup detection in COSE header parameters. It’s the data structures
> that hold the COSE header params that are used to detect the duplicates,
> not the generic decoder (QCBOR).  This seems like a good way to handle this
> problem for lots of use cases.
>

Yep, that's the model I'm thinking of. I'm clearly struggling to put it
into understandable spec language.
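
To make the "extra bit per field" idea concrete, here is a minimal sketch in
Python. It is not taken from QCBOR or t_cose. It assumes a streaming decoder
that hands the protocol code each (key, value) pair of a map in order, and the
key numbers, field names, and the parse_header helper are all hypothetical.
The protocol's own data structure carries one "seen" bit per field it uses, so
duplicates of used keys are rejected and unused keys are skipped without
storing them.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Optional, Tuple

# Hypothetical protocol fields: map key -> (field name, bit in the "seen" mask).
# Keys are assumed to be hashable (e.g. ints or text strings).
KNOWN_KEYS = {1: ("alg", 1 << 0), 4: ("kid", 1 << 1), 5: ("iv", 1 << 2)}


class DuplicateKeyError(ValueError):
    """Raised when a key the protocol uses appears twice in one map."""


@dataclass
class Header:
    alg: Optional[Any] = None
    kid: Optional[Any] = None
    iv: Optional[Any] = None
    seen: int = 0  # one bit per known field


def parse_header(pairs: Iterable[Tuple[Any, Any]]) -> Header:
    """Accumulate decoded (key, value) pairs, rejecting duplicate known keys."""
    header = Header()
    for key, value in pairs:
        entry = KNOWN_KEYS.get(key)
        if entry is None:
            continue  # key not used by this protocol: ignored, even if repeated
        name, bit = entry
        if header.seen & bit:
            raise DuplicateKeyError(f"duplicate map key {key!r}")
        header.seen |= bit
        setattr(header, name, value)
    return header
```

With this shape, parse_header([(1, -7), (1, -8)]) raises DuplicateKeyError,
while a repeated key the protocol never reads, such as 99 here, is skipped.
The per-map cost is the single integer mask, regardless of how large the
input map is.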

> I think you make a good point that if a protocol is going to specify "take
> first", it can probably specify "error on duplicate key" for the same cost.
> That might imply some much simpler text:
>
> "Protocols based on CBOR SHOULD fail with an error if a map contains a
> duplicate key, except that if the key isn't used at all, they MAY ignore it
> instead. Protocols that do not reject duplicate keys MUST (?) document why
> the cost of rejecting duplicates is too high and why accepting them will
> not lead to security vulnerabilities. An array might be a better choice for
> such protocols.”
>
>
> I think I’d invert it and say that protocols that require duplicate
> detection for security reasons should describe that requirement in security
> considerations so the implementor gets a good solid hint that they need to
> worry about it.
>

There's an interesting difference of approach here. It's plausible to say
that protocol designers should pick the secure design only when they
realize their design has security implications, but I prefer to say that
they should pick the secure design unless they think about it and realize
the design doesn't have security implications. I think the second will get
us noticeably more security in a world where designers don't have time to
think about every aspect of every piece of their design.

> There is a lot of text implying that duplicates are bad, but I think it would
> be worth being explicit.
>
> You are an idiot if you design a protocol that considers duplicate input
> valid or necessary. Duplicate map keys are always an error in CBOR and will
> not work with many generic decoders.
>
>
I wouldn't say "idiot", just "wrong". The spec already says you "MUST NOT"
do this, but I'm not opposed to making that statement more direct.

Thanks,
Jeffrey