Re: [Cbor] A CBOR tag for alternatives/unions, request for comments

Michael Peyton Jones <michael.peyton-jones@iohk.io> Thu, 24 February 2022 11:31 UTC

To: Carsten Bormann <cabo@tzi.org>
Cc: Michael Richardson <mcr+ietf@sandelman.ca>, Duncan Coutts <duncan@well-typed.com>, cbor@ietf.org, Jared Corduan <jared.corduan@iohk.io>, Alexander Byaly <alexander.byaly@iohk.io>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/JZWaMyptJNpDq1WRqgK92vbvJ5Q>

I don't think we have any data on how common different numbers of
alternatives are, but we do have a lot of experience writing code in
functional programming languages, which are one of the target use cases
for these tags.

The argument for going up to 8 alternatives in the most compact range is
simple:
- As computer scientists, we like to round things to powers of 2 :)
- Data types with more than 4 alternatives are common, so 4 would be too
few.
- Data types with more than 8 alternatives are rare.

Ergo: 8 should be enough for most purposes.
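To make the byte cost concrete, here is a rough Python sketch (not taken from any draft) of how long the CBOR tag head is for each alternative index. `head_len` follows the argument-size rules of RFC 8949, Section 3, which is where the "1+1" and "1+2" shorthand in the quoted message comes from. The three-range layout in `tag_for` and its base tag numbers are assumptions chosen for illustration, and the function names are hypothetical.

```python
# Rough sketch: CBOR tag-head size per alternative index under a
# hypothetical three-range layout. Base tag numbers are illustrative only.


def head_len(argument: int) -> int:
    """Bytes for a CBOR initial byte plus argument (RFC 8949, Section 3)."""
    if argument < 24:
        return 1   # argument fits in the initial byte ("1+0")
    if argument < 2**8:
        return 2   # "1+1"
    if argument < 2**16:
        return 3   # "1+2"
    if argument < 2**32:
        return 5   # "1+4"
    return 9       # "1+8"


COMPACT_BASE = 121   # assumed start of the 1+1 range (must lie in 24..255)
SECOND_BASE = 1280   # assumed start of the 1+2 range (must lie in 256..65535)
FALLBACK_TAG = 101   # tag wrapping [uint, value] beyond both ranges


def tag_for(index: int, compact_count: int = 8, second_count: int = 121):
    """Return (tag number, needs [uint, value] wrapper) for an alternative index."""
    if index < compact_count:
        return COMPACT_BASE + index, False
    if index < compact_count + second_count:
        return SECOND_BASE + (index - compact_count), False
    return FALLBACK_TAG, True


if __name__ == "__main__":
    for n in (4, 8):
        costs = [head_len(tag_for(i, compact_count=n)[0]) for i in range(10)]
        print(n, costs)
    # 4 [2, 2, 2, 2, 3, 3, 3, 3, 3, 3]
    # 8 [2, 2, 2, 2, 2, 2, 2, 2, 3, 3]
```

With a compact range of 4, alternatives 4 and up already pay the extra byte of a 1+2 tag; with 8, only the comparatively rare larger types do.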

The argument for the second range is perhaps less clear. I could see the
argument for not including it at all. However, since that part of the tag
space is comparatively plentiful, it seemed cheap to use a modest number of
them for this purpose. Again, the argument for the number of tags comes from
experience: data types with more than 100 alternatives are *vanishingly*
rare, and likely machine-generated if they exist at all.
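For the rare (and likely machine-generated) types that run past both ranges, the quoted message below discusses a tag 101 fallback that wraps [uint, value]. Below is a minimal hand-rolled Python sketch of that wrapping, assuming `offset=128` models the "uint+128" reading and `offset=0` the plain 0..x alternative; `fixed_width=True` reserves a 1+4 uint so the number can be patched in later, as suggested there. All function names are hypothetical, and a real implementation would use an existing CBOR library.

```python
# Minimal sketch of the tag-101 fallback: tag 101 wrapping [uint, value].
# Hypothetical helpers for illustration, not any library's API.
import struct


def encode_head(major: int, argument: int, fixed_width: bool = False) -> bytes:
    """Encode a CBOR initial byte plus argument (RFC 8949, Section 3)."""
    mt = major << 5
    if fixed_width:                       # always 1+4, even if shorter would fit
        return struct.pack(">BI", mt | 26, argument)
    if argument < 24:
        return bytes([mt | argument])
    if argument < 2**8:
        return struct.pack(">BB", mt | 24, argument)
    if argument < 2**16:
        return struct.pack(">BH", mt | 25, argument)
    if argument < 2**32:
        return struct.pack(">BI", mt | 26, argument)
    return struct.pack(">BQ", mt | 27, argument)


def encode_fallback(index: int, value_cbor: bytes, offset: int = 128,
                    fixed_width: bool = False) -> bytes:
    """Tag 101, then [uint, value]; `value_cbor` must already be encoded CBOR."""
    if index < offset:
        raise ValueError("index is below the fallback range")
    return (encode_head(6, 101)                            # tag 101 (1+1)
            + encode_head(4, 2)                            # array of length 2
            + encode_head(0, index - offset, fixed_width)  # alternative number
            + value_cbor)


def patch_index(buf: bytearray, pos: int, index: int, offset: int = 128) -> None:
    """Overwrite a previously reserved 1+4 uint whose initial byte is at `pos`."""
    struct.pack_into(">I", buf, pos + 1, index - offset)


if __name__ == "__main__":
    print(encode_fallback(130, b"\xf6").hex())            # d8658202f6
    print(encode_fallback(130, b"\xf6", offset=0).hex())  # d865821882f6
```

A fixed-width 1+4 uint is still valid CBOR, but it is not the preferred (shortest) serialization, so an encoder that reserves space this way would not satisfy the deterministic encoding rules of RFC 8949, Section 4.2.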

Best wishes,
Michael

On Thu, 24 Feb 2022 at 00:06, Carsten Bormann <cabo@tzi.org> wrote:

> On 2022-02-24, at 00:30, Michael Richardson <mcr+ietf@sandelman.ca> wrote:
> >
> >
> > {why haven't we adopted this document yet?}
>
> So its current author can munge it more freely, instead of waiting for a
> WG direction to form.
> (You can always do a hostile adoption, though ;-)
>
> > Hi, I've reviewed section 9 of notable-tags-06.
> > I raised a minor PR: https://github.com/cabo/notable-tags/pull/1
> > because I had to stare at the [uint,value] part really hard before I
> > understood things.  Part of the issue is that the "value" in the previous
> > two cases was just "obvious", and then I couldn't understand what it meant
> > in the third case.
>
> I just commented on the specific words.
>
> > I understood from the meeting this morning that tag 101 ('e') is not
> > actually available.  But it looks like it is, according to IANA.org.
> > Or was it that another draft wanted it?
> > How about 'x' (NOPE) 'X' YES. 'u' YES.
> > https://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml#cbor-tags
>
> (NOPE = tag 120 ‘x’ is already in use.)
> I’ll leave that choice to the proposers…
>
> > I think that the document might explain why we optimize the first 7 (0..6),
> > rather than the first 3 or 5 or 9.  Are the next 128 worth it?
>
> We could skip the whole 1+2 tag range if we think the 24 1+1+1 cases from
> was-102 are enough of an expansion.
> But then we still have lots of 1+2 tags (0.07 % in use right now), if
> these actually are useful.
>
> I can’t really imagine more than 31 cases in one alternative being used
> very often, but then I’d like to hear from the people who actually use this.
>
> > I wonder: when there are more than 7, are there typically 100s?
> > If it turns out that 9 is the sweet spot, I'd be okay with more 1+1
> > and no 1+2.  I don't have any data on this.
> > I wonder if Michael, Duncan, Jared or Alexander do?
>
> Yes, we need a PDF (probability distribution function) :-)
> (Well, really, some intuition about that.)
>
> My hunch is that this is going to be approximately Zipf’s law, and that
> means the first 4 or 5 are really important, and then it goes down, but
> slowly.
>
> > I agree with the discussion that tag 101 could just encode 0..x
> > rather than uint+128.  I think that the simpler encoder might be worth it
> > here, particularly in cases where I suspect which alternatives get the
> > lower numbers might not be known until later, and some code wants to come
> > back and fill in the alternatives afterwards, having left 1+4 for the uint
> > or something like that.
>
> See my previous message.
>
> Grüße, Carsten
>
>
>

-- 

*Michael Peyton Jones*
Software Engineering Lead | London, UK

Website: www.iohk.io <http://iohk.io>
Skype: michael.s.pj
Twitter: @mpeytonjones
PGP Key ID: 29F64616
