Re: [Cbor] "ordered hash"

Carsten Bormann <cabo@tzi.org> Thu, 30 July 2020 04:08 UTC

Return-Path: <cabo@tzi.org>
X-Original-To: cbor@ietfa.amsl.com
Delivered-To: cbor@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 0E7633A0CBB for <cbor@ietfa.amsl.com>; Wed, 29 Jul 2020 21:08:34 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.919
X-Spam-Level:
X-Spam-Status: No, score=-1.919 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_MSPIKE_H4=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id HT6UUbc6WcCP for <cbor@ietfa.amsl.com>; Wed, 29 Jul 2020 21:08:31 -0700 (PDT)
Received: from gabriel-vm-2.zfn.uni-bremen.de (gabriel-vm-2.zfn.uni-bremen.de [134.102.50.17]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 333F73A0CB9 for <cbor@ietf.org>; Wed, 29 Jul 2020 21:08:31 -0700 (PDT)
Received: from [192.168.217.116] (p5089ae91.dip0.t-ipconnect.de [80.137.174.145]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by gabriel-vm-2.zfn.uni-bremen.de (Postfix) with ESMTPSA id 4BHH2h4MFYzyYv; Thu, 30 Jul 2020 06:08:28 +0200 (CEST)
Content-Type: text/plain; charset="utf-8"
Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.120.23.2.1\))
From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <0A0FEABF-ADF2-4342-9AC3-42F768F14CCF@mothers-arms.co.uk>
Date: Thu, 30 Jul 2020 06:08:26 +0200
Cc: cbor@ietf.org, Brendan Moran <Brendan.Moran@arm.com>, Michael Richardson <mcr+ietf@sandelman.ca>
X-Mao-Original-Outgoing-Id: 617774906.826647-fb17c0cafc449f1f01479df25a87ec2c
Content-Transfer-Encoding: quoted-printable
Message-Id: <CB724CD0-52AC-4036-B5BA-70D7F056E4B6@tzi.org>
References: <25163.1596045376@localhost> <DA6A1A7E-59E8-42C9-A294-7D23BFD4BF0A@arm.com> <0A0FEABF-ADF2-4342-9AC3-42F768F14CCF@mothers-arms.co.uk>
To: Kio Smallwood <kio@mothers-arms.co.uk>
X-Mailer: Apple Mail (2.3608.120.23.2.1)
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/5MuDSyPivZ7JfPhsfwCaW2usFHQ>
Subject: Re: [Cbor] "ordered hash"
X-BeenThere: cbor@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Concise Binary Object Representation \(CBOR\)" <cbor.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/cbor>, <mailto:cbor-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/cbor/>
List-Post: <mailto:cbor@ietf.org>
List-Help: <mailto:cbor-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/cbor>, <mailto:cbor-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Jul 2020 04:08:34 -0000

On 2020-07-30, at 01:44, Kio Smallwood <kio@mothers-arms.co.uk> wrote:
> 
> I wrote a proposal for tag 272 ordered map.
> 
> https://github.com/Sekenre/cbor-ordered-map-spec

(272 is already taken: Non-UTF-8 CESU-8 string.)

> Unfortunately no one has approved the tag allocation or given me any guidance on how to proceed.

Let’s look into this.

I think this idiom is frequent enough that it should get a 1+1 tag.

The difference from a map is that this is both order-preserving and allowing duplicate keys; whether you want one or two of these properties on the application data model level may differ between applications.

The traditional alist (association list) is

AList<K, V> = [* [K, V]]

By using the exact structure of a map (except that the array head will have double the argument that the map head has), we can save 1 byte per k/v pair (“flattened alist”):

FAList<K, V> = [* (K, V)]

So [[“a”, 1], [“b”, 2]] becomes [“a”, 1, “b”, 2]

I’m not sure which of these MCR was inquiring on.  The first is definitely an alist, and there is no need for a new term.  The second can be called an ordered multimap, but there are other structures that also deserve this name.  

If adjacent duplicate keys of non-trivial size are likely in an ordered multimap, a better version of the multimap is:

MList<K, V> = [* (K, [* V])]

So [[“aaa”, 1], [“aaa”, 2], [“bbb”, 3]] becomes [“aaa”, [1, 2], “bbb”, [3]]

If V can never be an array (*), this can be further optimized into:

OMList<K, V> = [* (K, [2* V] / V)]

So [[“aaa”, 1], [“aaa”, 2], [“bbb”, 3]] becomes [“aaa”, [1, 2], “bbb”, 3]

Further refine by using delta coding for the keys (works best with sorted keys), etc.  Also, a map can be used for the outer structure if order preservation is not actually needed:

MMap<K, V> = {* K => [* V]}
OMMap<K, V> = {* K => [2* V] / V}

There are many idioms to choose from; it would probably be best to define tags for at least each of the ones listed here.  I think FAList (what Kio proposed) is the one that will see the most use.

Grüße, Carsten

(*) Or extra-wrap just arrays:
OEMList<K, V> = [* (K, [2* V] / [V .and [* any]] / V)]