Re: [Json] JSON Schema Language

Anders Rundgren <anders.rundgren.net@gmail.com> Sun, 05 May 2019 06:49 UTC

To: Austin Wright <aaa@bzfx.net>
Cc: Carsten Bormann <cabo@tzi.org>, json@ietf.org, Ulysse Carion <ulysse@segment.com>
References: <CAJK=1RjV1uv0eOdtFZ8cKn-FfCwCiGP5r2hOz1UamiM6YV4H1A@mail.gmail.com> <39682ec8-f993-a44c-d3e2-1638d2c1608f@gmail.com> <29CAE1CE-D6CB-4796-B2F2-2095BE921385@tzi.org> <AD5ABD9C-F5F2-477D-B862-529C890D5472@bzfx.net> <DA1767B8-22D6-4EA9-8112-4B36B79E9039@tzi.org> <D21B379B-23CC-48B3-BE10-D2777308E2E0@bzfx.net> <40f80ea0-d130-3f3b-39fa-2c84e802ed55@gmail.com> <35E2623E-753D-4918-8AF4-BF0BC5DE4868@bzfx.net>
From: Anders Rundgren <anders.rundgren.net@gmail.com>
Message-ID: <6260354b-aca2-e001-7145-148b32658416@gmail.com>
Date: Sun, 05 May 2019 08:49:43 +0200
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1
MIME-Version: 1.0
In-Reply-To: <35E2623E-753D-4918-8AF4-BF0BC5DE4868@bzfx.net>
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Language: en-US
Content-Transfer-Encoding: 8bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/2nh2cyS9sUbO31tpiGWk1IyP3hk>
Subject: Re: [Json] JSON Schema Language

On 2019-05-05 08:16, Austin Wright wrote:
> 
> 
>> On May 4, 2019, at 22:07, Anders Rundgren <anders.rundgren.net@gmail.com> wrote:
>>
>> On 2019-05-05 04:51, Austin Wright wrote:
>>>> On May 4, 2019, at 02:42, Carsten Bormann <cabo@tzi.org> wrote:
>>>>
>>>> Curious:
>>>>
>>>> On that web page, it also says “For consistency, integer JSON numbers SHOULD NOT be encoded with a fractional part.”
>>>>
>>>> What does that mean?
>>> It’s a non-normative suggestion with the aim of enhancing performance: Many programming languages distinguish between integers and IEEE floats by the presence of a decimal point. While JSON makes no such distinction (all numbers are arbitrary precision decimal), some parsers do make that distinction, and it’s slightly easier to determine if an int32_t is an integer than if a double is an integer.
>>
>> In the C# example I provided before, this is (by default) not non-normative, since it threw an exception. Here is a little more detail:
>>
>> class MyObject {
>>   int Counter;
>>    .
>>    .
>> }
>>
>> Deserializing JSON into this type (which BTW works as a "schema") REQUIRES "Counter" data to adhere to normal integer notation, including value span.
>>
>> A schema language should align with the actual use and interpretation of JSON.  This involves alternative serialization formats as well.  As an example, monetary values are (probably without exception) expressed as JSON Strings, since floating point is unsuited for decimal arithmetic.
>>
>> Based on the RFC, one might come to the conclusion that JSON is an inferior information transfer format, but aided by external mapping it actually works extremely well, albeit being slightly verbose :)
> 
> The Newtonsoft.Json parser behavior would be incompatible with JSON Schema, according to what you provided. But I can’t imagine it’s much of an issue: If the encoder knows the value will always be an integer, why add a fractional part?

Right, my guess is that not a single mainstream serializer adds a fractional part to a number that can be exactly represented as an integer, regardless of whether the underlying number type is an integer or not.


> That behavior still seems arbitrary to me, though. Instead of erroring on the first period, all you have to do is error on the first nonzero after the period. Scientific notation is another matter, but presumably it allows 1.9e4 as a valid integer? (I’m not sure, off-hand.)

To me the whole idea of having a schema is to create a *clean* description of an object.  Predecessors like XML Schema supported this in (IMO) a pretty good way.  However, this was straightforward since XML doesn't come with a built-in data model.

JSON was derived from JavaScript (anno 2003 or so), which didn't have an explicit integer type, only an unconstrained Number type.  This obviously creates certain issues when communicating with platforms that handle numeric data in a completely different way.  The new proposal doesn't take this into consideration.
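
To make the mismatch concrete, here is a minimal sketch in Python (not C#) of a reader that maps "Counter" to a 32-bit integer and rejects anything not written in plain integer notation. The names load_counter, reject_float and checked_int are hypothetical, and this is not a description of how Newtonsoft.Json or any other parser is implemented. It relies only on the parse_float and parse_int hooks of the standard json module.

```python
import json

# Range of a 32-bit signed integer, matching a C# "int" field.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def reject_float(token: str):
    # Called by json.loads for every number token that contains a
    # fraction or an exponent, e.g. "10.0" or "1e4".
    raise ValueError(f"expected integer notation, got {token!r}")


def checked_int(token: str) -> int:
    # Called by json.loads for every number token in plain integer notation.
    value = int(token)
    if not INT32_MIN <= value <= INT32_MAX:
        raise ValueError(f"{token} does not fit in a 32-bit integer")
    return value


def load_counter(text: str) -> int:
    # Hypothetical helper: read {"Counter": <int>} strictly.
    obj = json.loads(text, parse_float=reject_float, parse_int=checked_int)
    value = obj["Counter"]
    if type(value) is not int:
        raise ValueError("Counter must be a JSON number in integer notation")
    return value
```

With this sketch, {"Counter": 10} is accepted, while {"Counter": 10.0}, {"Counter": 1e4} and {"Counter": 2147483648} are all rejected, although each is a valid JSON Number.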


> Probably a better way to approach JSON parsers is to raise an error if it can’t preserve all the information in the source. For example, you can preserve 1e10, 0.5, and even 0.1 (if you permit a small error, such that if you converted the float back to decimal, 0.1 would still be the closest decimal representation). But you can’t preserve 9007199254740993.5 as a double: it’s too many significant figures, and above that value, the precision is below 1. In this case, you would throw.

A schema should not try to change how parsers work.
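
For reference, the round-trip check described in the quoted paragraph could be sketched roughly as follows. This is Python, the helper name parse_lossless is hypothetical, and it illustrates the proposal only, not the behavior of any existing parser. It assumes that "preserved" means the shortest decimal representation of the resulting double is numerically equal to the source token.

```python
from decimal import Decimal


def parse_lossless(token: str) -> float:
    # Convert the JSON number token to an IEEE 754 double.
    value = float(token)
    # repr() gives the shortest decimal string that round-trips to the
    # same double; compare it numerically with the original token.
    if Decimal(repr(value)) != Decimal(token):
        raise ValueError(f"{token} cannot be preserved as a double")
    return value
```

Under that assumption 1e10, 0.5 and 0.1 pass, while 9007199254740993.5 is rejected because the nearest double is 9007199254740994.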


> 
> Parsers that ensure the preservation of the encoded data like this might encourage better use of the JSON types, like for monetary values, even if you are reading into an IEEE floating point.

See previous point.


> Still, now that we bring it up, that SHOULD NOT seems suspicious. It’s stating the obvious.
> 
> Austin.
> 
>>
>> Cheers,
>> Anders
>>
>>> Cheers,
>>> Austin.
>>>>
>>>> Grüße, Carsten
>>>>
>>>>
>>>>> On May 4, 2019, at 11:36, Austin Wright <aaa@bzfx.net> wrote:
>>>>>
>>>>>
>>>>>
>>>>>> On May 4, 2019, at 00:58, Carsten Bormann <cabo@tzi.org> wrote:
>>>>>>
>>>>>> On May 4, 2019, at 06:47, Anders Rundgren <anders.rundgren.net@gmail.com> wrote:
>>>>>>>
>>>>>>> Example: although 10.0 is a valid JSON Number, in system where you expect
>>>>>>> an integer, this should be flagged as a syntax error.
>>>>>>
>>>>>> 10.0 is an integer number.
>>>>>>
>>>>>> “Schema Languages” operate at the data model level.  In the JSON data model, there is only one kind of number.
>>>>>> Of course, the JSON data model is not actually defined in a standard, which is one of the major shortcomings of JSON.
>>>>>>
>>>>>
>>>>> JSON Schema handles this exactly the same way, defining a data model [1]. The lexical representation is surjective onto the data model: as you point out, 10.0 is the same as 10, which is an integer.
>>>>>
>>>>> The one case where this might fall apart is if significant digits are important, such that 10.0 is different than 10.00 (i.e., 10.00 is more precise by an order of magnitude). However, I’m not aware of any JSON parsers that keep track of the precision of numbers, even ones that support arbitrary precision. I imagine scientific applications would want to store an explicit precision, since they’re not always powers-of-10 (e.g. {value: 32.0, precision: 0.5}).
>>>>>
>>>>> [1] http://json-schema.org/latest/json-schema-core.html#rfc.section.4.2.1 "4.2.1. Instance Data Model"
>>>>>
>>>>> Austin.
>>>>
>>
>