Re: [Cbor] "ordered hash"

Kio Smallwood <kio@mothers-arms.co.uk> Thu, 30 July 2020 09:00 UTC

Return-Path: <kio@mothers-arms.co.uk>
X-Original-To: cbor@ietfa.amsl.com
Delivered-To: cbor@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 861443A0FDC for <cbor@ietfa.amsl.com>; Thu, 30 Jul 2020 02:00:08 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.895
X-Spam-Level:
X-Spam-Status: No, score=-1.895 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_BLOCKED=0.001, SPF_HELO_NONE=0.001, SPF_NONE=0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id uXGMsVPkCYhr for <cbor@ietfa.amsl.com>; Thu, 30 Jul 2020 02:00:06 -0700 (PDT)
Received: from authenticated.a-painless.mh.aa.net.uk (a-painless.mh.aa.net.uk [IPv6:2001:8b0:0:30::51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id DD9F93A0F9F for <cbor@ietf.org>; Thu, 30 Jul 2020 02:00:02 -0700 (PDT)
Received: from a-webmail.thn.aa.net.uk ([2001:8b0:62::22] helo=webmail.aa.net.uk) by a-painless.mh.aa.net.uk with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.92) (envelope-from <kio@mothers-arms.co.uk>) id 1k14Pw-0003uL-Av; Thu, 30 Jul 2020 10:00:00 +0100
Received: from cpc104502-sgyl40-2-0-cust104.18-2.cable.virginm.net ([82.34.80.105]) by webmail.aa.net.uk with HTTP (HTTP/1.1 POST); Thu, 30 Jul 2020 09:59:52 +0100
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="=_40c9ab4a3b40d4888ec8e029dee3278d"
Date: Thu, 30 Jul 2020 09:59:52 +0100
From: Kio Smallwood <kio@mothers-arms.co.uk>
To: Carsten Bormann <cabo@tzi.org>
Cc: cbor@ietf.org, Brendan Moran <Brendan.Moran@arm.com>, Michael Richardson <mcr+ietf@sandelman.ca>
In-Reply-To: <CB724CD0-52AC-4036-B5BA-70D7F056E4B6@tzi.org>
References: <25163.1596045376@localhost> <DA6A1A7E-59E8-42C9-A294-7D23BFD4BF0A@arm.com> <0A0FEABF-ADF2-4342-9AC3-42F768F14CCF@mothers-arms.co.uk> <CB724CD0-52AC-4036-B5BA-70D7F056E4B6@tzi.org>
Message-ID: <99a918d0f4cdabc88071a82056efe621@mothers-arms.co.uk>
X-Sender: kio@mothers-arms.co.uk
User-Agent: Roundcube Webmail/1.3.14
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/ccUtqH_0OEzCfU0hi-DG_gl7Hdg>
Subject: Re: [Cbor] "ordered hash"
X-BeenThere: cbor@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Concise Binary Object Representation \(CBOR\)" <cbor.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/cbor>, <mailto:cbor-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/cbor/>
List-Post: <mailto:cbor@ietf.org>
List-Help: <mailto:cbor-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/cbor>, <mailto:cbor-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Jul 2020 09:00:09 -0000

Thanks Carsten!

> I think this idiom is frequent enough that it should get a 1+1 tag.

Yes I agree.

> The difference from a map is that this is both order-preserving and allowing duplicate keys; whether you want one or two of these properties on the application data model level may differ between applications.

I've always felt that allowing duplicate keys is unnecessary, does it
need to be bundled with order-preserving? Your other example MList<K, V>
= [* (K, [* V])] covers this use case nicely.

> FAList<K, V> = [* (K, V)]

Yes a flattened AList is the main use case. Allocating tags for the
other structures at the same time makes good sense. 

Thanks, 

Kio 

On 2020-07-30 5:08, Carsten Bormann wrote:

> On 2020-07-30, at 01:44, Kio Smallwood <kio@mothers-arms.co.uk> wrote: 
> 
>> I wrote a proposal for tag 272 ordered map.
>> 
>> https://github.com/Sekenre/cbor-ordered-map-spec
> 
> (272 is already taken: Non-UTF-8 CESU-8 string.)
> 
>> Unfortunately no one has approved the tag allocation or given me any guidance on how to proceed.
> 
> Let's look into this.
> 
> I think this idiom is frequent enough that it should get a 1+1 tag.
> 
> The difference from a map is that this is both order-preserving and allowing duplicate keys; whether you want one or two of these properties on the application data model level may differ between applications.
> 
> The traditional alist (association list) is
> 
> AList<K, V> = [* [K, V]]
> 
> By using the exact structure of a map (except that the array head will have double the argument that the map head has), we can save 1 byte per k/v pair ("flattened alist"):
> 
> FAList<K, V> = [* (K, V)]
> 
> So [["a", 1], ["b", 2]] becomes ["a", 1, "b", 2]
> 
> I'm not sure which of these MCR was inquiring on.  The first is definitely an alist, and there is no need for a new term.  The second can be called an ordered multimap, but there are other structures that also deserve this name.  
> 
> If adjacent duplicate keys of non-trivial size are likely in an ordered multimap, a better version of the multimap is:
> 
> MList<K, V> = [* (K, [* V])]
> 
> So [["aaa", 1], ["aaa", 2], ["bbb", 3]] becomes ["aaa", [1, 2], "bbb", [3]]
> 
> If V can never be an array (*), this can be further optimized into:
> 
> OMList<K, V> = [* (K, [2* V] / V)]
> 
> So [["aaa", 1], ["aaa", 2], ["bbb", 3]] becomes ["aaa", [1, 2], "bbb", 3]
> 
> Further refine by using delta coding for the keys (works best with sorted keys), etc.  Also, a map can be used for the outer structure if order preservation is not actually needed:
> 
> MMap<K, V> = {* K => [* V]}
> OMMap<K, V> = {* K => [2* V] / V}
> 
> There are many idioms to choose from; it would probably be best to define tags for at least each of the ones listed here.  I think FAList (what Kio proposed) is the one that will see the most use.
> 
> Grüße, Carsten
> 
> (*) Or extra-wrap just arrays:
> OEMList<K, V> = [* (K, [2* V] / [V .and [* any]] / V)]