Re: [Cbor] Maps: Need for additional compactness?

Emile Cormier <emile.cormier.jr@gmail.com> Fri, 11 June 2021 22:19 UTC

Return-Path: <emile.cormier.jr@gmail.com>
X-Original-To: cbor@ietfa.amsl.com
Delivered-To: cbor@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 40E3F3A0E39 for <cbor@ietfa.amsl.com>; Fri, 11 Jun 2021 15:19:54 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.096
X-Spam-Level:
X-Spam-Status: No, score=-2.096 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_BLOCKED=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id CEi7FQRZ-JeX for <cbor@ietfa.amsl.com>; Fri, 11 Jun 2021 15:19:49 -0700 (PDT)
Received: from mail-yb1-xb33.google.com (mail-yb1-xb33.google.com [IPv6:2607:f8b0:4864:20::b33]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 6198A3A0E3A for <cbor@ietf.org>; Fri, 11 Jun 2021 15:19:49 -0700 (PDT)
Received: by mail-yb1-xb33.google.com with SMTP id c8so4662568ybq.1 for <cbor@ietf.org>; Fri, 11 Jun 2021 15:19:49 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=BMaUzVuU53HmMU3CnI7rH1rsGfShVccIjIwANsjn22o=; b=YUPct15PdMI4zAfU9L/0WB7gb/XXhcDFQCGe/Rskw6e9xHb4MH8J7rFot+RmdRMGx/ SKzVLCG0gnyg3ywG7/JMKfgw/PJI5yriG+lARnNrxF3KTRY9ecOYNhPIHjLbcHpvXQbH 1mCktSoszucPByYHDtD6Q9l5z6z3PFNgNq03OKEIRIGvGh7KUl0X9+IGgPD69nGJpZn0 5XxTinyvEvsfQKSaqyti+zYCBtf1QJs2MtpxVz/lSCa+V7ygvGFaUWayjGt4m3Ymgdjm vwSIGsDh9yFuYWMgpu3SURuHCCgkx6VSEwGlP++jp5WOifGxjggLo6kM5++FfeyLzm1F 8cNQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=BMaUzVuU53HmMU3CnI7rH1rsGfShVccIjIwANsjn22o=; b=R+UTzkNJzGOrRJZ97D32LrG4aVS7K/Iqu0YqRV75utuK5R0RuJE+VoNun00yw/qK4W G3FHL0lYQcmBF1LQXkABlgngzbOf44KBO8LblHX4V5HKqvIoPgABWCPmLjt8+3Tv7IGI 12/4fyOuulgFePg0xMhJ1nW8vn1D68/p7cI8GDOPGWnehy8FR8yRbSOSvsBg/N9iIS4F L6E9ktBWbBqqTouvvxNf4hDntIXOCpnU7lhWje78EivhbVaj387H83Di1iBElmnVTk0k mfqfRiEnoF8R11Iy2nNGtZ/xwgjsvyBElbL5uudt5ad9Q1kdExRlaHW3WbiUAs9sVEdF 4EjA==
X-Gm-Message-State: AOAM531zztWXKasX2Ij/hjcqVHXiloxA4hWP4og5yNk0yNKDsp2LeWUe 2RovK7ik1oLus7bz/Yd8+hZOmaR9sWOGPyk9N2gOl1wj1Tv89g==
X-Google-Smtp-Source: ABdhPJzjcZ3WXWp0a2UDdrSfd2hyKTUREB1zjiIsmTx0IrUaWm2AWpkGLgRdLM2ozsRvE1QePy4C58u4CHuiShl9sK0=
X-Received: by 2002:a25:447:: with SMTP id 68mr8414509ybe.481.1623449987994; Fri, 11 Jun 2021 15:19:47 -0700 (PDT)
MIME-Version: 1.0
References: <YMDmJcrm9OwrKDk9@hephaistos.amsuess.com>
In-Reply-To: <YMDmJcrm9OwrKDk9@hephaistos.amsuess.com>
From: Emile Cormier <emile.cormier.jr@gmail.com>
Date: Fri, 11 Jun 2021 19:19:37 -0300
Message-ID: <CAM70yxAPW4xFqP5tXx=s5HCm-m6tYyi+XsLTxkk58o=P6NRRCw@mail.gmail.com>
To: Christian Amsüss <christian@amsuess.com>
Cc: Kio Smallwood <kio@mothers-arms.co.uk>, cbor@ietf.org, Carsten Bormann <cabo@tzi.org>
Content-Type: multipart/alternative; boundary="000000000000d856ce05c484e701"
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/XO2cwLzPuQHSpsqjYBBACVqlJaY>
Subject: Re: [Cbor] Maps: Need for additional compactness?
X-BeenThere: cbor@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Concise Binary Object Representation \(CBOR\)" <cbor.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/cbor>, <mailto:cbor-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/cbor/>
List-Post: <mailto:cbor@ietf.org>
List-Help: <mailto:cbor-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/cbor>, <mailto:cbor-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 11 Jun 2021 22:19:54 -0000

Hi Everyone,

I've watched the recording of the meeting starting from the Map-like
slides. Here are my comments...

The MList encoding becomes a pessimization if there are few duplicate keys
and a large number of key-value pairs. It might not be so bad to assume
that keys tend to be generally short - they are often things like database
IDs.

The OMList encoding not allowing array values seems like a showstopper to
me.

For OEMList, I'm not clear on this part:

So [[“aaa”, 1], [“aaa”, 2], [“bbb”, 3], [“ccc”, [4]]] becomes [“aaa”, [1,
> 2], “bbb”, 3, [“ccc”, [[4]]]]
>

Is the resulting singleton array [“ccc”, [[4]]] used as an indicator so
that the decoder interprets [[4]] as the array [4] to be associated with
the key "ccc"? If so, how is the decoder supposed to distinguish this from
an array key in heterogeneous maps? This problem is more obvious if you
change the example to this:

[[“aaa”, 1], [“aaa”, 2], [“bbb”, [3]], [“ccc”, 4]] becomes [“aaa”, [1, 2],
[“bbb”, [[3]]], “ccc”, 4]

and a decoder interprets this as:
Key,     Value
"aaa",    1
"aaa",    2
[“bbb”, [[3]]],    "ccc"
4,    <missing value>

It seems to me the only way to avoid the possible pessimization of MList,
while also avoiding ambiguities with keys/values that can themselves be
arrays/maps, is to introduce a new CBOR special value (or "abuse" CBOR
break) to indicate the "shared key" special case. This has the disadvantage
that it can't be directly transcoded to JSON.

Another possibility would be to use a special tag so that [[“aaa”, 1],
[“aaa”, 2], [“bbb”, [3]], [“ccc”, 4]] becomes [“aaa”, <MultiValuesTag>[1,
2], [“bbb”, [3]], “ccc”, 4]. While this can be transcoded to JSON by
dropping the tag, an application consuming the resulting JSON would still
be vexed by the ambiguity of the (“aaa”, [1, 2]) pair even if it knows it's
decoding a multimap.

While the desire to avoid repeating duplicate keys is understandable, these
compression schemes lead to the following complications for a generic CBOR
encoding/decoding library:

1. For something like OEMList, when encoding a multimap, an encoder needs
to look ahead to see if there are multiple values associated with the same
key.

2. A simple SAX-like decoder can no longer emit alternating (key, value)
events without having to memorize the shared key, which can be a complex,
nested entity. This can be overcome by emitting a MultiMapValuesStart
"meta" event after emitting the shared key, and then the SAX handler knows
there will be multiple value events associated with the same key while
populating the multimap. MultiMapValuesStart then has to be treated as a
special state (along with the usual ArrayStart and MapStart) in the
decoder's state stack.

If there ends up being a viable compression scheme, the specification
should still allow an application to use the "naive" scheme with repeated
keys. If everyone agrees with this, then we should be able to proceed
without the optimized encodings and add them later, while preserving
backward compatibility with the naive scheme.

Cheers,
Emile Cormier

On Wed, Jun 9, 2021 at 1:03 PM Christian Amsüss <christian@amsuess.com>
wrote:

> Hello Emile, hello Kio,
> and hello CBOR group in general,
>
> during the last CBOR interim meeting[1], the question as to how far we'd
> like to go in map-like data structure optimization came up. These do not
> relate to the information model (these variations are covered in tags),
> but only to how data is put on the wire, so it'd be about variations of
> not placing items into inner lists when avoidable, delta encoding for
> ordered lists or other compression steps.
>
> An idea floated during the meeting was go with the obvious encoding now,
> and only go there later if use cases really need those.
>
> What are your needs with respect to compactness? Does it suffice to
> address the high-level (homogenity) aspects of map data structures?
>
> BR
> Christian
>
> [1]
>   minutes:
> https://datatracker.ietf.org/meeting/interim-2021-cbor-10/materials/minutes-interim-2021-cbor-10-202106021600-00
>   relevant slides:
> https://datatracker.ietf.org/meeting/interim-2021-cbor-10/materials/slides-interim-2021-cbor-10-sessa-map-like-data-draft-bormann-cbor-cddl-map-like-data-00
>   video: https://www.youtube.com/watch?v=IfrxbspRcQo
>   ... and also from me: sorry for the time mismatch.
>
> --
> There's always a bigger fish.
>   -- Qui-Gon Jinn
> _______________________________________________
> CBOR mailing list
> CBOR@ietf.org
> https://www.ietf.org/mailman/listinfo/cbor
>