Re: [Cbor] Bignums and the generic data models (Re: 🔔 WGLC on draft-ietf-cbor-7049bis-09)

I’m in favor of bignums being quite separate:
- There are very few use cases for them, far fewer than for floating point, as we just don’t need the range and the precision at the same time
- CPUs and programming languages don’t generally or generically support them — if you are writing code for bignums you are writing very specific code

For example, I don’t think it would be useful for a generic decoder to convert bignums up to 0xffffffff to uint32_t and return larger ones to the caller as a byte array.
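
To illustrate what keeping them separate could look like at a decoder API, here is a minimal Python sketch. It is not taken from any existing library: the Bignum wrapper and the read_argument and decode_item helpers are hypothetical names, and the decoder only handles major type 0 integers and tag 2 over a definite-length byte string. The only point is that a tag 2 item comes back as its own type rather than being folded into the native integer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bignum:
    """Hypothetical wrapper: a tag 2 bignum kept apart from plain integers."""
    magnitude: bytes  # big-endian content of the tagged byte string


def read_argument(data: bytes, pos: int):
    """Return (argument, next_pos) for the CBOR head starting at data[pos]."""
    info = data[pos] & 0x1F
    if info < 24:
        return info, pos + 1
    if info > 27:
        raise ValueError("indefinite or reserved length not handled in this sketch")
    size = 1 << (info - 24)  # 1, 2, 4 or 8 following bytes
    end = pos + 1 + size
    if end > len(data):
        raise ValueError("truncated input")
    return int.from_bytes(data[pos + 1:end], "big"), end


def decode_item(data: bytes):
    """Decode one item: major type 0, or tag 2 wrapping a byte string."""
    major = data[0] >> 5
    if major == 0:
        value, _ = read_argument(data, 0)
        return value
    if data[0] == 0xC2:  # tag 2, unsigned bignum
        if data[1] >> 5 != 2:
            raise ValueError("tag 2 must wrap a byte string")
        length, start = read_argument(data, 1)
        if start + length > len(data):
            raise ValueError("truncated input")
        return Bignum(bytes(data[start:start + length]))
    raise ValueError("item type not handled in this sketch")


plain = decode_item(bytes.fromhex("12"))       # major type 0, value 18
tagged = decode_item(bytes.fromhex("c24112"))  # 2(h'12')
```

With this shape, the caller that actually wants bignums writes the specific code for them, and everyone else just never sees a Bignum.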

I don’t think anyone is going to send smaller integer values as the bignum type (some integer values like 0xffffffffff encode smaller as a bignum type, 7 bytes, than they would as type 0 and 1, 9 bytes, but I don’t think we want to encourage that).
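
For anyone who wants to check the sizes, a rough Python sketch follows. The helper names uint_size and bignum_size are made up for illustration, and bignum_size assumes the shortest byte string with no leading zeros and a content length below 24 bytes, so the byte string head is a single byte.

```python
def uint_size(n: int) -> int:
    """Encoded size in bytes of n as a major type 0 integer (shortest form)."""
    if n < 0:
        raise ValueError("unsigned values only")
    if n < 24:
        return 1  # value fits in the initial byte
    for extra, limit in ((1, 1 << 8), (2, 1 << 16), (4, 1 << 32), (8, 1 << 64)):
        if n < limit:
            return 1 + extra
    raise ValueError("does not fit in major type 0")


def bignum_size(n: int) -> int:
    """Encoded size in bytes of n as tag 2 over a minimal byte string."""
    if n < 0:
        raise ValueError("unsigned values only")
    content = (n.bit_length() + 7) // 8
    if content >= 24:
        raise ValueError("sketch only covers byte strings shorter than 24 bytes")
    return 1 + 1 + content  # tag byte + byte string head + content


for value in (0x12, 0xFFFFFF, 0xFFFFFFFFFF, 0xFFFFFFFFFFFF, 0xFFFFFFFFFFFFFF):
    print(hex(value), uint_size(value), bignum_size(value))
```

Under those assumptions the bignum form only wins for values that need five or six bytes; 18 costs 3 bytes as a bignum against 1 byte as a plain integer.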

LL

Bignum spelling auto correct: big numb lignum bingo bigness!

> On Jan 15, 2020, at 12:22 PM, Jeffrey Yasskin <jyasskin@chromium.org> wrote:
> 
> On Tue, Jan 14, 2020 at 1:52 PM Carsten Bormann <cabo@tzi.org> wrote:
>> 
>> On 2019-11-24, at 12:23, Jeffrey Yasskin <jyasskin@chromium.org> wrote:
>>> 
>>> 3.4.4.  Bignums
>>>      […]
>>>      • "and preferred encoding never makes use of bignums that also can be expressed as basic integers (see below)." <- This seems inconsistent with "In the generic data model, bignum values are not equal to integers from the basic data model". If they're not the same value at the data model level, they can't be alternate encodings of each other.
>> 
>> Hi Jeffrey,
>> 
>> of your many useful comments in that message, let me pick this one because answering it is a prerequisite to getting the map validity text right.
>> 
>> Indeed, in the basic (unextended) generic data model, bignums are different from (mt0/1) integers.
>> 
>> The extended generic data model for tag 2/3 changes this.  It has to.  Consider:
>> 
>> 2(h'00112233445566778899')
>> and
>> 2(h'112233445566778899')
>> 
>> These are different in the basic generic data model, but both mean 316059037807746189465 in the extended generic data model, so that has to be different from the basic one.
>> 
>> So given that we already had to extend the generic data model with the values (types) specific to the tag semantics, we also can (and should!) say that
>> 
>> 2(h'12')
>> 
>> means 18 in the extended generic data model, the preferred encoding of which is 0x12 and not 0xc24112.
>> 
>> For a generic codec, both options are available for a specific tag number:
>> 
>> - offer the basic generic data model.  This leaves it to the application to build the tag content in the way that fits the tag definition and to interpret it this way.  It is then also up to the application to take care of map validity (e.g., by not using both 2(h'00112233445566778899') and 2(h'112233445566778899') as keys in the same map), and to issue preferred serialization if desired.
>> - process the tag, i.e., offer the extended generic data model.  Now the generic codec can do all that work; for the example at hand, at the codec API you only see integers.
>> 
>> This is the direction in which I think we need to go if we want to enable tag processing in generic codecs.
>> 
>> (The cognitive dissonance here is maybe with ECMAScript bigints, which are a separate type from numbers.  But sending a bignum is not on its own a good way to mark a number as an ES bigint; a language-specific tag is probably a better way to make that language-specific distinction.)
>> 
> 
> I think either of the following situations would be plausible:
> 
> 1. 2(h'12') is a different value from major-type-0 "18", in the same
> way major-type-0 18 is a different value from floating "18.0".
> 2. 2(h'12') is the same value as major-type-0 "18", but both are
> distinct from floating "18.0".
> 
> (It would also have been plausible to declare that integral 18 is the
> same value as floating 18.0, but we discussed that a while ago and
> decided to keep them separate.)
> 
> I don't think the wording in
> https://cbor-wg.github.io/CBORbis/draft-ietf-cbor-7049bis.html#extended-generic-data-models
> nails down whether tags can offer alternate representations for
> elements of the basic generic data model.
> 
> It's true that a tag can define a non-injective mapping into its
> domain, but I don't think that by itself requires that its domain be
> able to overlap the domain of the basic generic data model.
> 
> The extended generic data model *does* specify that particular tag
> values in the basic generic data model are equal to other values in
> the extended generic data model, but in that case the identified
> values have the same encoding, so I don't think it helps much in
> deciding whether they can be equal to a value in the basic generic
> data model.
> 
> My preference is to follow the decision we made to distinguish
> integers from floating point, and also distinguish bignums from
> primitives. However, if we decide not to do that, we need to remove
> the sentence in
> https://cbor-wg.github.io/CBORbis/draft-ietf-cbor-7049bis.html#bignums
> saying "In the generic data model, bignum values are not equal to
> integers from the basic data model" and the sentence in
> https://cbor-wg.github.io/CBORbis/draft-ietf-cbor-7049bis.html#fractions
> saying "As with bignums, values of different types are not equal in
> the generic data model."
> 
> Jeffrey