Re: [Cbor] Bignums and the generic data models (Re: 🔔 WGLC on draft-ietf-cbor-7049bis-09)

Jeffrey Yasskin <jyasskin@chromium.org> Wed, 15 January 2020 20:22 UTC

References: <293AFF31-D0EF-45D6-9B9D-E8136481C404@ericsson.com> <CANh-dXnPRd7w_z2LA0gYD0GHVbmych4BGA5_-vmJz+Zn1qBh_w@mail.gmail.com> <A4756143-2E47-474C-8EBD-0632DD3B659D@tzi.org>
In-Reply-To: <A4756143-2E47-474C-8EBD-0632DD3B659D@tzi.org>
From: Jeffrey Yasskin <jyasskin@chromium.org>
Date: Wed, 15 Jan 2020 12:22:29 -0800
Message-ID: <CANh-dXnfhoDO_Q9X+B73EtsHH89MjayeturjNOiBkKFaLL3NiQ@mail.gmail.com>
To: Carsten Bormann <cabo@tzi.org>
Cc: Jeffrey Yasskin <jyasskin@chromium.org>, "cbor@ietf.org" <cbor@ietf.org>, "draft-ietf-cbor-7049bis@ietf.org" <draft-ietf-cbor-7049bis@ietf.org>, "cbor-chairs@ietf.org" <cbor-chairs@ietf.org>, Francesca Palombini <francesca.palombini=40ericsson.com@dmarc.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/jPaXoeenOdprFfwvS7DUJ-YcuH4>
Subject: Re: [Cbor] Bignums and the generic data models (Re: 🔔 WGLC on draft-ietf-cbor-7049bis-09)

On Tue, Jan 14, 2020 at 1:52 PM Carsten Bormann <cabo@tzi.org> wrote:
>
> On 2019-11-24, at 12:23, Jeffrey Yasskin <jyasskin@chromium.org> wrote:
> >
> > 3.4.4.  Bignums
> >       […]
> >       • "and preferred encoding never makes use of bignums that also can be expressed as basic integers (see below)." <- This seems inconsistent with "In the generic data model, bignum values are not equal to integers from the basic data model". If they're not the same value at the data model level, they can't be alternate encodings of each other.
>
> Hi Jeffrey,
>
> of your many useful comments in that message, let me pick this one because answering it is a prerequisite to getting the map validity text right.
>
> Indeed, in the basic (unextended) generic data model, bignums are different from (mt0/1) integers.
>
> The extended generic data model for tag 2/3 changes this.  It has to.  Consider:
>
> 2(h'00112233445566778899')
> and
> 2(h'112233445566778899')
>
> These are different in the basic generic data model, but both mean 316059037807746189465 in the extended generic data model, so that has to be different from the basic one.
>
> So given that we already had to extend the generic data model with the values (types) specific to the tag semantics, we also can (and should!) say that
>
> 2(h'12')
>
> means 18 in the extended generic data model, the preferred encoding of which is 0x12 and not 0xc24112.
>
> For a generic codec, both options are available for a specific tag number:
>
> - offer the basic generic data model.  This leaves it to the application to build the tag content in the way that fits to the tag definition and to interpret it this way.  It is then also up to the application to take care of map validity (e.g., by not using both 2(h'00112233445566778899') and 2(h'112233445566778899') as keys in the same map), and to issue preferred serialization if desired.
> - process the tag, i.e., offer the extended generic data model.  Now the generic codec can do all that work; for the example at hand, at the codec API you only see integers.
>
> This is the direction in which I think we need to go if we want to enable tag processing in generic codecs.
>
> (The cognitive dissonance here is maybe with ECMAscript bigints, which are a separate type from numbers.  But sending a bignum is not on its own a good way to distinguish a number as a ES bigint; probably a language-specific tag is a better way to make that language-specific distinction.)
>

I think either of the following situations would be plausible:

1. 2(h'12') is a different value from major-type-0 "18", in the same
way major-type-0 18 is a different value from floating "18.0".
2. 2(h'12') is the same value as major-type-0 "18", but both are
distinct from floating "18.0".

(It would also have been plausible to declare that integral 18 is the
same value as floating 18.0, but we discussed that a while ago and
decided to keep them separate.)
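
The two situations can be sketched as two equality predicates over
decoded values. This is only an illustration: the class names
BasicInt, Bignum and Float and the functions equal_option1 and
equal_option2 are hypothetical and do not come from the draft or from
any CBOR library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BasicInt:
    """A major type 0/1 integer (hypothetical model type)."""
    value: int


@dataclass(frozen=True)
class Bignum:
    """Tag 2 content, already interpreted as an unsigned integer."""
    value: int


@dataclass(frozen=True)
class Float:
    """A floating-point value."""
    value: float


def equal_option1(a, b) -> bool:
    """Option 1: values of different kinds are never equal."""
    return type(a) is type(b) and a.value == b.value


def equal_option2(a, b) -> bool:
    """Option 2: bignums and basic integers share one integer domain,
    while floats remain a separate domain."""
    def kind(x) -> str:
        return "float" if isinstance(x, Float) else "int"
    return kind(a) == kind(b) and a.value == b.value
```

Under equal_option1, BasicInt(18) and Bignum(18) compare unequal;
under equal_option2 they compare equal, and both predicates keep
Float(18.0) distinct from either.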

I don't think the wording in
https://cbor-wg.github.io/CBORbis/draft-ietf-cbor-7049bis.html#extended-generic-data-models
nails down whether tags can offer alternate representations for
elements of the basic generic data model.

It's true that a tag can define a non-injective mapping into its
domain, but I don't think that by itself requires that its domain be
able to overlap the domain of the basic generic data model.
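
A small sketch of that distinction, using the two tag 2 byte strings
from the quoted message. The helper names tag2_value and
tag2_disjoint are hypothetical; the second one shows one way a tag
could map non-injectively into a domain that still stays disjoint
from basic integers.

```python
def tag2_value(content: bytes) -> int:
    """Interpret tag 2 content as an unsigned big-endian integer."""
    return int.from_bytes(content, "big")


def tag2_disjoint(content: bytes) -> tuple:
    """Same mapping, but the result is labelled so that it can never
    collide with a basic integer modelled as ("int", n)."""
    return ("bignum", tag2_value(content))


a = bytes.fromhex("00112233445566778899")
b = bytes.fromhex("112233445566778899")

# Non-injective: two different byte strings map to one value.
assert a != b
assert tag2_value(a) == tag2_value(b)

# Still disjoint from the basic data model, if we choose it to be.
assert tag2_disjoint(bytes.fromhex("12")) != ("int", 18)
```

Leading zeros collapse either way; whether the resulting value is
also identified with a major type 0 integer is a separate decision.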

The extended generic data model *does* specify that particular tag
values in the basic generic data model are equal to other values in
the extended generic data model, but in that case the identified
values have the same encoding, so I don't think it helps much in
deciding whether they can be equal to a value in the basic generic
data model.

My preference is to follow the decision we made to distinguish
integers from floating point, and also distinguish bignums from
basic (major type 0/1) integers. However, if we decide not to do that, we need to remove
the sentence in
https://cbor-wg.github.io/CBORbis/draft-ietf-cbor-7049bis.html#bignums
saying "In the generic data model, bignum values are not equal to
integers from the basic data model" and the sentence in
https://cbor-wg.github.io/CBORbis/draft-ietf-cbor-7049bis.html#fractions
saying "As with bignums, values of different types are not equal in
the generic data model."
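
For comparison, here is a rough sketch of what preferred
serialization of a non-negative integer might look like if bignums
and basic integers are treated as the same value, so that tag 2 is
used only when the value does not fit in major type 0. The function
names encode_head and encode_nonneg_int are hypothetical, and
negative integers (major type 1, tag 3) are left out.

```python
def encode_head(major: int, n: int) -> bytes:
    """Encode a CBOR initial byte plus the shortest argument for n."""
    if n < 24:
        return bytes([(major << 5) | n])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([(major << 5) | additional_info]) + n.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_nonneg_int(n: int) -> bytes:
    """Use major type 0 when n fits in 64 bits, else tag 2 with the
    shortest byte string (no leading zeros)."""
    if n < 0:
        raise ValueError("sketch covers non-negative integers only")
    if n < 1 << 64:
        return encode_head(0, n)
    content = n.to_bytes((n.bit_length() + 7) // 8, "big")
    # 0xc2 is tag 2; the content is a major type 2 byte string.
    return b"\xc2" + encode_head(2, len(content)) + content
```

With this encoder the value 18 is always emitted as 0x12 and never
as 0xc24112, which only makes sense if the two are the same value at
the data model level.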

Jeffrey