Re: [Cbor] Maps: Need for additional compactness?

Emile Cormier <emile.cormier.jr@gmail.com> Mon, 14 June 2021 19:51 UTC

Return-Path: <emile.cormier.jr@gmail.com>
X-Original-To: cbor@ietfa.amsl.com
Delivered-To: cbor@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 69D153A083C for <cbor@ietfa.amsl.com>; Mon, 14 Jun 2021 12:51:54 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.097
X-Spam-Level:
X-Spam-Status: No, score=-2.097 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id QinmXRz5OYdI for <cbor@ietfa.amsl.com>; Mon, 14 Jun 2021 12:51:50 -0700 (PDT)
Received: from mail-yb1-xb33.google.com (mail-yb1-xb33.google.com [IPv6:2607:f8b0:4864:20::b33]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 016C43A0853 for <cbor@ietf.org>; Mon, 14 Jun 2021 12:51:49 -0700 (PDT)
Received: by mail-yb1-xb33.google.com with SMTP id p15so1946178ybe.6 for <cbor@ietf.org>; Mon, 14 Jun 2021 12:51:49 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=NrZih3bpTT3HhOGzE+PMFwyTPiOHj5CBh2RR5o7FuJI=; b=sATyX3AiI7vMZkm92bEoB3D1z+bcu8Ejo0iiiDFzn6KpsgPa/gHc0B4rui7+gNc0UN mL9oE1bGv31kAp8lhz8Rx3okxvzcac8W2TwcCdjaGevjYzjycHGYrV8h0wNyReA2QnuF /GR8nXpcJnHKtROVNrn1/m/YCnLdnZ1oFOOQwX04gNLivVnn6+Y+3RK0sqqW+eNy2Rc9 LrjL2MN0gPhcvb7E2umURP5TWBLjj9hq1nVl+APerp4kWluuJjkxlJg8S8paTlZmF/d+ AZ4IB55dOuLX8k83bJZ5QBFgsZZzSCH6DgIPxGEjz5TMtzArePIbqJtylaXwWySiny98 Q/5w==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=NrZih3bpTT3HhOGzE+PMFwyTPiOHj5CBh2RR5o7FuJI=; b=Y4kj1asy2psBk80ZqhEMxOChnKVd+yvmALS640X58kgwtE1A1NN8CTdv4KTG0/Fm2P 4KMdL7bSs9Og3uQLfRYBeXnO/g29Px7nu9Tr/T/fDGctO0o4ssV7OUKZDmRVu5Tw0jS+ QOqhcmprrDEvF4exROcj7gDUBW3JbiXjWy42kZBaZNdKu9/jCfgfZhSVrGjHIrfXMW+E H38g4WiBVlR/eqRiCGD2LW4Bv6w8PnMiwt3Qdplk6qyVMwn+NRpj7T9E0hmeAijSnDG4 h1GR4wK7ojXQQH3OG2ULnMgM3nCtpTgHudyqXcBVqckHAWPAVAXmzImoR3TIZ2znUjYH 4O5A==
X-Gm-Message-State: AOAM5305SMvZvlOLVH1nJB87ZfFKDYnVKM5r40wIOvWk59RYV7yDdTxB Dm6oF6Vl5CAYMzwIDj43moQy8G4gA5az+yfCmlDIfWXmrrB4hg==
X-Google-Smtp-Source: ABdhPJwVeyCklKBziSgL1hEVN2AvYTCY8aCKmVjXLiRBlNAGyWLLX1rnEqBU+UWWNWgFuBugcKt0iv7iG1B7lIrpvA4=
X-Received: by 2002:a25:19d7:: with SMTP id 206mr24794191ybz.483.1623700307923; Mon, 14 Jun 2021 12:51:47 -0700 (PDT)
MIME-Version: 1.0
References: <YMDmJcrm9OwrKDk9@hephaistos.amsuess.com> <CAM70yxAPW4xFqP5tXx=s5HCm-m6tYyi+XsLTxkk58o=P6NRRCw@mail.gmail.com>
In-Reply-To: <CAM70yxAPW4xFqP5tXx=s5HCm-m6tYyi+XsLTxkk58o=P6NRRCw@mail.gmail.com>
From: Emile Cormier <emile.cormier.jr@gmail.com>
Date: Mon, 14 Jun 2021 16:51:37 -0300
Message-ID: <CAM70yxDHiMy0=Hpmb1rJghXXtJ37n-KQf07LrFRYk3w_xOqH5g@mail.gmail.com>
To: =?UTF-8?Q?Christian_Ams=C3=BCss?= <christian@amsuess.com>
Cc: Kio Smallwood <kio@mothers-arms.co.uk>, cbor@ietf.org, Carsten Bormann <cabo@tzi.org>
Content-Type: multipart/alternative; boundary="0000000000001357b305c4bf3075"
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/ZlrPZ4_HFjRRg2qRmdW676NBapI>
Subject: Re: [Cbor] Maps: Need for additional compactness?
X-BeenThere: cbor@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Concise Binary Object Representation \(CBOR\)" <cbor.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/cbor>, <mailto:cbor-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/cbor/>
List-Post: <mailto:cbor@ietf.org>
List-Help: <mailto:cbor-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/cbor>, <mailto:cbor-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 14 Jun 2021 19:52:03 -0000

Something else that needs to be discussed/decided is whether we include
tags for list-like collections, like what I proposed in
https://github.com/ecorm/cbor-map-like/tree/with-lists

In that alternate proposal, I had included arbitrary key + uniform value
maps in order to maintain "one-hot" bit encoding for both the maps and
lists. If we don't want to include arbitrary key + uniform value maps to
save on 4 tags, the interpretation of the list tag bits will be more
complicated, but not terribly so.

On Fri, Jun 11, 2021 at 7:19 PM Emile Cormier <emile.cormier.jr@gmail.com>
wrote:

> Hi Everyone,
>
> I've watched the recording of the meeting starting from the Map-like
> slides. Here are my comments...
>
> The MList encoding becomes a pessimization if there are few duplicate keys
> and a large number of key-value pairs. It might not be so bad to assume
> that keys tend to be generally short - they are often things like database
> IDs.
>
> The OMList encoding not allowing array values seems like a showstopper to
> me.
>
> For OEMList, I'm not clear on this part:
>
> So [[“aaa”, 1], [“aaa”, 2], [“bbb”, 3], [“ccc”, [4]]] becomes [“aaa”, [1,
>> 2], “bbb”, 3, [“ccc”, [[4]]]]
>>
>
> Is the resulting singleton array [“ccc”, [[4]]] used as an indicator so
> that the decoder interprets [[4]] as the array [4] to be associated with
> the key "ccc"? If so, how is the decoder supposed to distinguish this from
> an array key in heterogeneous maps? This problem is more obvious if you
> change the example to this:
>
> [[“aaa”, 1], [“aaa”, 2], [“bbb”, [3]], [“ccc”, 4]] becomes [“aaa”, [1, 2],
> [“bbb”, [[3]]], “ccc”, 4]
>
> and a decoder interprets this as:
> Key,     Value
> "aaa",    1
> "aaa",    2
> [“bbb”, [[3]]],    "ccc"
> 4,    <missing value>
>
> It seems to me the only way to avoid the possible pessimization of MList,
> while also avoiding ambiguities with keys/values that can themselves be
> arrays/maps, is to introduce a new CBOR special value (or "abuse" CBOR
> break) to indicate the "shared key" special case. This has the disadvantage
> that it can't be directly transcoded to JSON.
>
> Another possibility would be to use a special tag so that [[“aaa”, 1],
> [“aaa”, 2], [“bbb”, [3]], [“ccc”, 4]] becomes [“aaa”, <MultiValuesTag>[1,
> 2], [“bbb”, [3]], “ccc”, 4]. While this can be transcoded to JSON by
> dropping the tag, an application consuming the resulting JSON would still
> be vexed by the ambiguity of the (“aaa”, [1, 2]) pair even if it knows it's
> decoding a multimap.
>
> While the desire to avoid repeating duplicate keys is understandable,
> these compression schemes lead to the following complications for a generic
> CBOR encoding/decoding library:
>
> 1. For something like OEMList, when encoding a multimap, an encoder needs
> to look ahead to see if there are multiple values associated with the same
> key.
>
> 2. A simple SAX-like decoder can no longer emit alternating (key, value)
> events without having to memorize the shared key, which can be a complex,
> nested entity. This can be overcome by emitting a MultiMapValuesStart
> "meta" event after emitting the shared key, and then the SAX handler knows
> there will be multiple value events associated with the same key while
> populating the multimap. MultiMapValuesStart then has to be treated as a
> special state (along with the usual ArrayStart and MapStart) in the
> decoder's state stack.
>
> If there ends up being a viable compression scheme, the specification
> should still allow an application to use the "naive" scheme with repeated
> keys. If everyone agrees with this, then we should be able to proceed
> without the optimized encodings and add them later, while preserving
> backward compatibility with the naive scheme.
>
> Cheers,
> Emile Cormier
>
> On Wed, Jun 9, 2021 at 1:03 PM Christian Amsüss <christian@amsuess.com>
> wrote:
>
>> Hello Emile, hello Kio,
>> and hello CBOR group in general,
>>
>> during the last CBOR interim meeting[1], the question as to how far we'd
>> like to go in map-like data structure optimization came up. These do not
>> relate to the information model (these variations are covered in tags),
>> but only to how data is put on the wire, so it'd be about variations of
>> not placing items into inner lists when avoidable, delta encoding for
>> ordered lists or other compression steps.
>>
>> An idea floated during the meeting was go with the obvious encoding now,
>> and only go there later if use cases really need those.
>>
>> What are your needs with respect to compactness? Does it suffice to
>> address the high-level (homogenity) aspects of map data structures?
>>
>> BR
>> Christian
>>
>> [1]
>>   minutes:
>> https://datatracker.ietf.org/meeting/interim-2021-cbor-10/materials/minutes-interim-2021-cbor-10-202106021600-00
>>   relevant slides:
>> https://datatracker.ietf.org/meeting/interim-2021-cbor-10/materials/slides-interim-2021-cbor-10-sessa-map-like-data-draft-bormann-cbor-cddl-map-like-data-00
>>   video: https://www.youtube.com/watch?v=IfrxbspRcQo
>>   ... and also from me: sorry for the time mismatch.
>>
>> --
>> There's always a bigger fish.
>>   -- Qui-Gon Jinn
>> _______________________________________________
>> CBOR mailing list
>> CBOR@ietf.org
>> https://www.ietf.org/mailman/listinfo/cbor
>>
>