Re: [Cbor] Maps: Need for additional compactness?
Emile Cormier <emile.cormier.jr@gmail.com> Mon, 14 June 2021 19:51 UTC
Return-Path: <emile.cormier.jr@gmail.com>
X-Original-To: cbor@ietfa.amsl.com
Delivered-To: cbor@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 69D153A083C for <cbor@ietfa.amsl.com>; Mon, 14 Jun 2021 12:51:54 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.097
X-Spam-Level:
X-Spam-Status: No, score=-2.097 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id QinmXRz5OYdI for <cbor@ietfa.amsl.com>; Mon, 14 Jun 2021 12:51:50 -0700 (PDT)
Received: from mail-yb1-xb33.google.com (mail-yb1-xb33.google.com [IPv6:2607:f8b0:4864:20::b33]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 016C43A0853 for <cbor@ietf.org>; Mon, 14 Jun 2021 12:51:49 -0700 (PDT)
Received: by mail-yb1-xb33.google.com with SMTP id p15so1946178ybe.6 for <cbor@ietf.org>; Mon, 14 Jun 2021 12:51:49 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=NrZih3bpTT3HhOGzE+PMFwyTPiOHj5CBh2RR5o7FuJI=; b=sATyX3AiI7vMZkm92bEoB3D1z+bcu8Ejo0iiiDFzn6KpsgPa/gHc0B4rui7+gNc0UN mL9oE1bGv31kAp8lhz8Rx3okxvzcac8W2TwcCdjaGevjYzjycHGYrV8h0wNyReA2QnuF /GR8nXpcJnHKtROVNrn1/m/YCnLdnZ1oFOOQwX04gNLivVnn6+Y+3RK0sqqW+eNy2Rc9 LrjL2MN0gPhcvb7E2umURP5TWBLjj9hq1nVl+APerp4kWluuJjkxlJg8S8paTlZmF/d+ AZ4IB55dOuLX8k83bJZ5QBFgsZZzSCH6DgIPxGEjz5TMtzArePIbqJtylaXwWySiny98 Q/5w==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=NrZih3bpTT3HhOGzE+PMFwyTPiOHj5CBh2RR5o7FuJI=; b=Y4kj1asy2psBk80ZqhEMxOChnKVd+yvmALS640X58kgwtE1A1NN8CTdv4KTG0/Fm2P 4KMdL7bSs9Og3uQLfRYBeXnO/g29Px7nu9Tr/T/fDGctO0o4ssV7OUKZDmRVu5Tw0jS+ QOqhcmprrDEvF4exROcj7gDUBW3JbiXjWy42kZBaZNdKu9/jCfgfZhSVrGjHIrfXMW+E H38g4WiBVlR/eqRiCGD2LW4Bv6w8PnMiwt3Qdplk6qyVMwn+NRpj7T9E0hmeAijSnDG4 h1GR4wK7ojXQQH3OG2ULnMgM3nCtpTgHudyqXcBVqckHAWPAVAXmzImoR3TIZ2znUjYH 4O5A==
X-Gm-Message-State: AOAM5305SMvZvlOLVH1nJB87ZfFKDYnVKM5r40wIOvWk59RYV7yDdTxB Dm6oF6Vl5CAYMzwIDj43moQy8G4gA5az+yfCmlDIfWXmrrB4hg==
X-Google-Smtp-Source: ABdhPJwVeyCklKBziSgL1hEVN2AvYTCY8aCKmVjXLiRBlNAGyWLLX1rnEqBU+UWWNWgFuBugcKt0iv7iG1B7lIrpvA4=
X-Received: by 2002:a25:19d7:: with SMTP id 206mr24794191ybz.483.1623700307923; Mon, 14 Jun 2021 12:51:47 -0700 (PDT)
MIME-Version: 1.0
References: <YMDmJcrm9OwrKDk9@hephaistos.amsuess.com> <CAM70yxAPW4xFqP5tXx=s5HCm-m6tYyi+XsLTxkk58o=P6NRRCw@mail.gmail.com>
In-Reply-To: <CAM70yxAPW4xFqP5tXx=s5HCm-m6tYyi+XsLTxkk58o=P6NRRCw@mail.gmail.com>
From: Emile Cormier <emile.cormier.jr@gmail.com>
Date: Mon, 14 Jun 2021 16:51:37 -0300
Message-ID: <CAM70yxDHiMy0=Hpmb1rJghXXtJ37n-KQf07LrFRYk3w_xOqH5g@mail.gmail.com>
To: Christian Amsüss <christian@amsuess.com>
Cc: Kio Smallwood <kio@mothers-arms.co.uk>, cbor@ietf.org, Carsten Bormann <cabo@tzi.org>
Content-Type: multipart/alternative; boundary="0000000000001357b305c4bf3075"
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/ZlrPZ4_HFjRRg2qRmdW676NBapI>
Subject: Re: [Cbor] Maps: Need for additional compactness?
X-BeenThere: cbor@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Concise Binary Object Representation \(CBOR\)" <cbor.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/cbor>, <mailto:cbor-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/cbor/>
List-Post: <mailto:cbor@ietf.org>
List-Help: <mailto:cbor-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/cbor>, <mailto:cbor-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 14 Jun 2021 19:52:03 -0000
Something else that needs to be discussed/decided is whether we include tags for list-like collections, like what I proposed in https://github.com/ecorm/cbor-map-like/tree/with-lists In that alternate proposal, I had included arbitrary key + uniform value maps in order to maintain "one-hot" bit encoding for both the maps and lists. If we don't want to include arbitrary key + uniform value maps to save on 4 tags, the interpretation of the list tag bits will be more complicated, but not terribly so. On Fri, Jun 11, 2021 at 7:19 PM Emile Cormier <emile.cormier.jr@gmail.com> wrote: > Hi Everyone, > > I've watched the recording of the meeting starting from the Map-like > slides. Here are my comments... > > The MList encoding becomes a pessimization if there are few duplicate keys > and a large number of key-value pairs. It might not be so bad to assume > that keys tend to be generally short - they are often things like database > IDs. > > The OMList encoding not allowing array values seems like a showstopper to > me. > > For OEMList, I'm not clear on this part: > > So [[“aaa”, 1], [“aaa”, 2], [“bbb”, 3], [“ccc”, [4]]] becomes [“aaa”, [1, >> 2], “bbb”, 3, [“ccc”, [[4]]]] >> > > Is the resulting singleton array [“ccc”, [[4]]] used as an indicator so > that the decoder interprets [[4]] as the array [4] to be associated with > the key "ccc"? If so, how is the decoder supposed to distinguish this from > an array key in heterogeneous maps? This problem is more obvious if you > change the example to this: > > [[“aaa”, 1], [“aaa”, 2], [“bbb”, [3]], [“ccc”, 4]] becomes [“aaa”, [1, 2], > [“bbb”, [[3]]], “ccc”, 4] > > and a decoder interprets this as: > Key, Value > "aaa", 1 > "aaa", 2 > [“bbb”, [[3]]], "ccc" > 4, <missing value> > > It seems to me the only way to avoid the possible pessimization of MList, > while also avoiding ambiguities with keys/values that can themselves be > arrays/maps, is to introduce a new CBOR special value (or "abuse" CBOR > break) to indicate the "shared key" special case. This has the disadvantage > that it can't be directly transcoded to JSON. > > Another possibility would be to use a special tag so that [[“aaa”, 1], > [“aaa”, 2], [“bbb”, [3]], [“ccc”, 4]] becomes [“aaa”, <MultiValuesTag>[1, > 2], [“bbb”, [3]], “ccc”, 4]. While this can be transcoded to JSON by > dropping the tag, an application consuming the resulting JSON would still > be vexed by the ambiguity of the (“aaa”, [1, 2]) pair even if it knows it's > decoding a multimap. > > While the desire to avoid repeating duplicate keys is understandable, > these compression schemes lead to the following complications for a generic > CBOR encoding/decoding library: > > 1. For something like OEMList, when encoding a multimap, an encoder needs > to look ahead to see if there are multiple values associated with the same > key. > > 2. A simple SAX-like decoder can no longer emit alternating (key, value) > events without having to memorize the shared key, which can be a complex, > nested entity. This can be overcome by emitting a MultiMapValuesStart > "meta" event after emitting the shared key, and then the SAX handler knows > there will be multiple value events associated with the same key while > populating the multimap. MultiMapValuesStart then has to be treated as a > special state (along with the usual ArrayStart and MapStart) in the > decoder's state stack. > > If there ends up being a viable compression scheme, the specification > should still allow an application to use the "naive" scheme with repeated > keys. If everyone agrees with this, then we should be able to proceed > without the optimized encodings and add them later, while preserving > backward compatibility with the naive scheme. > > Cheers, > Emile Cormier > > On Wed, Jun 9, 2021 at 1:03 PM Christian Amsüss <christian@amsuess.com> > wrote: > >> Hello Emile, hello Kio, >> and hello CBOR group in general, >> >> during the last CBOR interim meeting[1], the question as to how far we'd >> like to go in map-like data structure optimization came up. These do not >> relate to the information model (these variations are covered in tags), >> but only to how data is put on the wire, so it'd be about variations of >> not placing items into inner lists when avoidable, delta encoding for >> ordered lists or other compression steps. >> >> An idea floated during the meeting was go with the obvious encoding now, >> and only go there later if use cases really need those. >> >> What are your needs with respect to compactness? Does it suffice to >> address the high-level (homogenity) aspects of map data structures? >> >> BR >> Christian >> >> [1] >> minutes: >> https://datatracker.ietf.org/meeting/interim-2021-cbor-10/materials/minutes-interim-2021-cbor-10-202106021600-00 >> relevant slides: >> https://datatracker.ietf.org/meeting/interim-2021-cbor-10/materials/slides-interim-2021-cbor-10-sessa-map-like-data-draft-bormann-cbor-cddl-map-like-data-00 >> video: https://www.youtube.com/watch?v=IfrxbspRcQo >> ... and also from me: sorry for the time mismatch. >> >> -- >> There's always a bigger fish. >> -- Qui-Gon Jinn >> _______________________________________________ >> CBOR mailing list >> CBOR@ietf.org >> https://www.ietf.org/mailman/listinfo/cbor >> >
- [Cbor] Maps: Need for additional compactness? Christian Amsüss
- Re: [Cbor] Maps: Need for additional compactness? Emile Cormier
- Re: [Cbor] Maps: Need for additional compactness? Carsten Bormann
- Re: [Cbor] Maps: Need for additional compactness? Emile Cormier
- Re: [Cbor] Maps: Need for additional compactness? Emile Cormier
- Re: [Cbor] Maps: Need for additional compactness? Emile Cormier