Re: [Json] JSON Schema Language is nearly done: int53

Anders Rundgren <anders.rundgren.net@gmail.com> Thu, 01 August 2019 05:11 UTC

To: Ulysse Carion <ulysse@segment.com>
Cc: "Manger, James" <James.H.Manger@team.telstra.com>, JSON WG <json@ietf.org>
References: <SY2PR01MB27642C6983E387C397B11581E5DD0@SY2PR01MB2764.ausprd01.prod.outlook.com> <CAJK=1RjhuCYJe4-BSB++8+-dHG3LV8TdqsnFEPAoAkfJ1mOE3A@mail.gmail.com> <SY2PR01MB2764AD4523625006B1F3DFEBE5DC0@SY2PR01MB2764.ausprd01.prod.outlook.com> <aeb4dfcc-4227-2d8e-d1dc-914d078450fe@gmail.com> <SY2PR01MB2764600E16BA7A19025964EFE5DF0@SY2PR01MB2764.ausprd01.prod.outlook.com> <7f663d84-eb38-271f-12c3-a0f4a2261090@gmail.com> <CAJK=1RjqqtZvdWBJNXR6ebKFT1KSkNJHQjiydQ7RJX6aPyB+9g@mail.gmail.com>
From: Anders Rundgren <anders.rundgren.net@gmail.com>
Message-ID: <7004bfd3-ad9d-20c0-fd45-4d03049edbb9@gmail.com>
Date: Thu, 01 Aug 2019 07:11:36 +0200
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/gvDLjURg-brXhqVdIh89SD4EFc4>
Subject: Re: [Json] JSON Schema Language is nearly done: int53

On 2019-08-01 05:55, Ulysse Carion wrote:
> Replying to Anders and James at once:
> 
>  > The purpose of fixed sized integers in data formats is not to save bytes, it is about valid ranges and compatibility with languages like Java and C++.
> 
> Agreed. The numerical types are about writing correct software more easily in modern programming languages, not about optimizing bytes.
> 
>  > Dropping int64 or BigInteger from *programming languages* is not an option. But dropping the ability for a JSON schema to suggest that arbitrary values of those types should be serialized to JSON numbers would be sensible.
> 
> This is my view as well.

I don't understand what exactly would be dropped here...

> 
>  > This is where we (as a community of users) have a problem.  Jackson (now adopted by MSFT) indeed serializes BigInteger as JSON Number.  Other parties like Oracle have taken another (presumably unique) approach by using JSON number for int53-compliant BigIntegers and string notation for BigIntegers that don't fit int53.  I believe Oracle are considering a redesign though.
> 
> It's precisely this sort of complication that motivates dropping numbers above int32/uint32 from the JSL spec. The spec remains useful without them.
> 
> The same goes for monetary data types. There are too many options, and it's better for the spec to stick to relatively uncontroversial things.

I have found only two open standards using monetary types, and both used the same notion.  There is always the possibility of creating a unique solution using JSON Number or JSON String.

> 
>  > Mind you, I suspect 1.23E8 will break many implementations that expect an integer so perhaps we should require the no-fraction-no-exponent form.
> 
> This has been discussed in this thread before. Rough consensus suggests that it's better to simply require a zero fractional part in the number value, rather than impose constraints at the JSON parser level.

Right, JSL *seems* to presume that you build it on top of a standard JSON parser.  By doing that you inherit platform-specific quirks that do not have a natural place in a universal JSON schema processor.

For documentation and code-gen purposes this idea should work just fine, but for run-time data validation it does not, and that is the only case where the integer syntax issue would show up.
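
To make the run-time case concrete, here is a minimal sketch of a "zero fractional part" check layered on a standard parser. The helper name isIntegerInRange and the int32 bounds are my own assumptions for illustration, not anything taken from the JSL draft:

```javascript
// Hypothetical helper (not part of JSL): accepts any parsed JSON number
// whose value has a zero fractional part and lies within [min, max].
function isIntegerInRange(value, min, max) {
  return typeof value === 'number' &&
         Number.isInteger(value) &&
         value >= min && value <= max;
}

// Assumed bounds for an int32-style schema type.
const INT32_MIN = -2147483648;
const INT32_MAX = 2147483647;

// Three different textual spellings of the same number.
const samples = ['123', '123.0', '1.23E2'].map((text) => JSON.parse(text));

// The check only sees the parsed values, never the original text.
const results = samples.map((v) => isIntegerInRange(v, INT32_MIN, INT32_MAX));
console.log(samples, results);
```

Since all three spellings parse to the same value, a validator working this way cannot reject the fraction or exponent forms even if a schema wanted it to.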

AFAICT you cannot even do range checking in a reliable manner using "any" JSON parser. Chrome:

JSON.parse('{"g":2e400}')
 > {g: Infinity}

I don't think mathematicians in general consider 2e400 to be infinity :)
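
A rough sketch of what a validator sitting on top of JSON.parse would have to do before any range test can be trusted. The function name checkParsedInteger and its result strings are hypothetical, chosen only for this illustration:

```javascript
// Hypothetical check (not from JSL): classify a value produced by
// JSON.parse before applying an integer range test to it.
function checkParsedInteger(value) {
  if (typeof value !== 'number') return 'not a number';
  // Values too large for a double, such as 2e400, arrive as Infinity.
  if (!Number.isFinite(value)) return 'overflow';
  if (!Number.isInteger(value)) return 'fraction';
  // Beyond +/-(2^53 - 1) the parser may already have rounded the value.
  if (!Number.isSafeInteger(value)) return 'beyond int53';
  return 'ok';
}

const overflow = checkParsedInteger(JSON.parse('2e400'));
const tooBig = checkParsedInteger(JSON.parse('9007199254740993'));
const fine = checkParsedInteger(JSON.parse('1564636299'));
console.log(overflow, tooBig, fine);
```

Note that this can only detect that something went wrong; the original digits are gone once the parser has produced a double, which is why int64 or BigInteger ranges cannot be checked this way.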

Before I became a JSON convert, I used XSD (XML Schema Definitions) extensively.  XSD didn't have the limitations of JSL (with respect to numbers), although it was designed 20 years ago. Unfortunately, numbers in JSON are more difficult than in XML since there are two entirely different ways to textually represent numbers.

-- Anders



> 
> On Wed, Jul 31, 2019 at 12:14 AM Anders Rundgren <anders.rundgren.net@gmail.com> wrote:
> 
>     On 2019-07-31 08:56, Manger, James wrote:
>      >> Dropping int64 is hardly an option, neither is BigInteger.   int53 is a special case that, for example, fits UNIX "epoch".
>      >
>      > Dropping int64 or BigInteger from *programming languages* is not an option. But dropping the ability for a JSON schema to suggest that arbitrary values of those types should be serialized to JSON numbers would be sensible.
>      >
>      >> Arbitrarily sized integers are used by tons of applications based on JSON messaging and are available for most platforms including JS.
>      >
>      > Sure, but they are not serialized to a JSON number. They are serialized to a JSON string of some form (eg "65537.00" or "AQAB").
> 
>     This is where we (as a community of users) have a problem.  Jackson (now adopted by MSFT) indeed serializes BigInteger as JSON Number.  Other parties like Oracle have taken another (presumably unique) approach by using JSON number for int53-compliant BigIntegers and string notation for BigIntegers that don't fit int53.  I believe Oracle are considering a redesign though.
> 
>     The following lines in Chrome show this division clearly:
> 
>     var big = 5n;
>     JSON.stringify(big)
> 
>       > VM887:1 Uncaught TypeError: Do not know how to serialize a BigInt
>           at Object.stringify (<anonymous>)
>           at <anonymous>:1:6
> 
>     Anders
> 
>      >
>      >> That there are multiple ways of formatting integers in JSON is unfortunately a reality all JSON tool makers must (in some way) address.
>      >
>      > Are you referring to 123, 123.0, 12.3E+1 and 1.23e2 as the multiple ways? Or 123, "123", "123.00", and "ew==" as the multiple ways?
>      >
>      >> I would reverse the integer syntax and claim that integers SHOULD conform to standard integer (not JSON) syntax since a standard should not break the multitude of JSON tools that already implement this.
>      >
>      > It sounds like you want JSON serializers to never use the fraction or exponent notations when a schema says a field is an integer. That's adding a distinction that doesn't exist in JSON so it will break things. A tool parsing JSON *without knowing the schema* may well use a double for a number, which it may then serialize in exponent notation even if it is an integer. For instance, in Java Double.toString(123000000) returns "1.23E8".
>      > Mind you, I suspect 1.23E8 will break many implementations that expect an integer so perhaps we should require the no-fraction-no-exponent form.
>      >
>      > This is a separate issue to this thread, however, which was whether a JSON schema should offer "int53" as a type, and if it should offer "{u}int{8|16|32}" as types.
>      >
>      > --
>      > James Manger
>      >
> 
>