Re: [Cbor] A CBOR tag for alternatives/unions, request for comments

Carsten Bormann <cabo@tzi.org> Thu, 24 February 2022 00:06 UTC

From: Carsten Bormann <cabo@tzi.org>
Date: Thu, 24 Feb 2022 01:06:01 +0100
Cc: Michael Peyton Jones <michael.peyton-jones@iohk.io>, Duncan Coutts <duncan@well-typed.com>, cbor@ietf.org, Jared Corduan <jared.corduan@iohk.io>, Alexander Byaly <alexander.byaly@iohk.io>
To: Michael Richardson <mcr+ietf@sandelman.ca>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/E5ZrqQoV0EzlMwF1riEduJvdAdA>

On 2022-02-24, at 00:30, Michael Richardson <mcr+ietf@sandelman.ca> wrote:
> 
> 
> {why haven't we adopted this document yet?}

So its current author can munge it more freely, instead of waiting for a WG direction to form.
(You can always do a hostile adoption, though ;-)

> Hi, I've reviewed section 9 of notable-tags-06.
> I raised a minor PR: https://github.com/cabo/notable-tags/pull/1
> because I had to stare at the [uint,value] part really hard before I
> understood things.  Part of the issue is that the "value" in the previous two
> cases was just "obvious", and then I couldn't understand what it meant in the
> third case.

I just commented on the specific words.

> I understood from the meeting this morning that tag 101 'e', is not actually
> available.  But, actually it looks like it is according to IANA.org.
> Or was it that another draft wanted it?
> How about 'x' (NOPE) 'X' YES. 'u' YES.
> https://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml#cbor-tags

(NOPE = tag 120 ‘x’ is already in use.)
I’ll leave that choice to the proposers…

> I think that the document might explain why we optimize the first 7 (0..6),
> rather than the first 3 or 5 or 9.  Are the next 128 worth it?

We could skip the whole 1+2 tag range if we think the 24 1+1+1 cases from was-102 are enough of an expansion.
But then we still have lots of 1+2 tags (0.07 % in use right now), if these actually are useful.
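
For readers not used to the "1+1" / "1+2" shorthand: it counts the initial byte plus the bytes that follow it in the tag head, which depends only on the tag number (RFC 8949 head encoding). A minimal Python sketch of that rule follows; the function names are made up for illustration and are not taken from any CBOR library.

```python
import struct


def tag_head_size(tag_number: int) -> int:
    """Number of bytes in the head of a CBOR tag (major type 6)."""
    if not 0 <= tag_number < 1 << 64:
        raise ValueError("tag number out of range")
    if tag_number < 24:
        return 1          # "1+0": tag number sits in the initial byte
    if tag_number < 1 << 8:
        return 2          # "1+1"
    if tag_number < 1 << 16:
        return 3          # "1+2"
    if tag_number < 1 << 32:
        return 5          # "1+4"
    return 9              # "1+8"


def encode_tag_head(tag_number: int) -> bytes:
    """Encode only the tag head; the tagged data item would follow it."""
    size = tag_head_size(tag_number)
    if size == 1:
        return bytes([0xC0 | tag_number])
    # Additional information 24..27 selects a 1, 2, 4 or 8 byte argument.
    info, fmt = {2: (24, ">B"), 3: (25, ">H"), 5: (26, ">I"), 9: (27, ">Q")}[size]
    return bytes([0xC0 | info]) + struct.pack(fmt, tag_number)


# Size of the whole 1+2 tag range (tag numbers 256..65535).
ONE_PLUS_TWO_RANGE = (1 << 16) - (1 << 8)
```

With this, `encode_tag_head(101)` ends in the byte 0x65, which is ASCII 'e', and tag 120 ends in 'x'; that is where the letter nicknames in this thread come from.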

I can’t really imagine more than 31 cases in one alternative being used very often, but then I’d like to hear from the people who actually use this.

> I wonder if there are more than 7, are there typically more 100s?
> If it turns out that 9 is the sweet spot, I'd be okay with more 1+1
> and no 1+2.  I don't have any data on this.
> I wonder if Michael, Duncan, Jared or Alexander do?

Yes, we need a PDF (probability distribution function) :-)
(Well, really, some intuition about that.)

My hunch is that this is going to be approximately Zipf’s law, and that means the first 4 or 5 are really important, and then it goes down, but slowly. 
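
To put rough numbers on such a hunch, here is a small Python sketch. It rests on pure assumptions: usage follows a Zipf distribution with exponent 1, and a type has 128 alternatives (both figures are hypothetical, no measured data is behind them). It computes what share of encoded values would fall on the first few alternatives.

```python
def zipf_pmf(n: int, s: float = 1.0) -> list[float]:
    """Zipf probabilities for ranks 1..n: p(k) proportional to 1 / k**s."""
    weights = [1.0 / k**s for k in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def share_of_first(pmf: list[float], count: int) -> float:
    """Probability mass covered by the `count` most frequent alternatives."""
    return sum(pmf[:count])


if __name__ == "__main__":
    pmf = zipf_pmf(128)  # hypothetical: 128 alternatives, exponent 1
    for cutoff in (3, 5, 7, 9, 31):
        print(f"first {cutoff:2d} alternatives: {share_of_first(pmf, cutoff):.1%}")
```

Under these assumptions the head of the distribution carries a large share but the tail stays heavy, which is the "goes down, but slowly" behaviour; real usage data could look quite different.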

> I agree with the discussion that tag 101 could just encode 0..x
> rather than uint+128.  I think that the simpler encoder might be worth it
> here.  Particularly in some cases where I suspect which ones are the lower
> numbered ones might not be known until later, and some code wants to come
> back and fill in the alternatives afterwards, having left 1+4 for the uint or
> something like that.

See my previous message.

Grüße, Carsten