Re: [Json] JSON Schema Language

Austin Wright <aaa@bzfx.net> Mon, 06 May 2019 07:23 UTC

From: Austin Wright <aaa@bzfx.net>
Message-Id: <9D90C1F1-6747-4373-93B0-8D51C5B25F1C@bzfx.net>
Content-Type: multipart/alternative; boundary="Apple-Mail=_5C673D6C-1380-4479-AA66-B3B312D5B2F5"
Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.8\))
Date: Mon, 06 May 2019 00:22:45 -0700
In-Reply-To: <6260354b-aca2-e001-7145-148b32658416@gmail.com>
Cc: Carsten Bormann <cabo@tzi.org>, json@ietf.org, Ulysse Carion <ulysse@segment.com>
To: Anders Rundgren <anders.rundgren.net@gmail.com>
References: <CAJK=1RjV1uv0eOdtFZ8cKn-FfCwCiGP5r2hOz1UamiM6YV4H1A@mail.gmail.com> <39682ec8-f993-a44c-d3e2-1638d2c1608f@gmail.com> <29CAE1CE-D6CB-4796-B2F2-2095BE921385@tzi.org> <AD5ABD9C-F5F2-477D-B862-529C890D5472@bzfx.net> <DA1767B8-22D6-4EA9-8112-4B36B79E9039@tzi.org> <D21B379B-23CC-48B3-BE10-D2777308E2E0@bzfx.net> <40f80ea0-d130-3f3b-39fa-2c84e802ed55@gmail.com> <35E2623E-753D-4918-8AF4-BF0BC5DE4868@bzfx.net> <6260354b-aca2-e001-7145-148b32658416@gmail.com>
X-Mailer: Apple Mail (2.3445.104.8)
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/jaCHIiv2QFdLhDsoUXCFozrdREI>
Subject: Re: [Json] JSON Schema Language


> On May 4, 2019, at 23:49, Anders Rundgren <anders.rundgren.net@gmail.com> wrote:
> 
> On 2019-05-05 08:16, Austin Wright wrote:
>>> On May 4, 2019, at 22:07, Anders Rundgren <anders.rundgren.net@gmail.com> wrote:
>>> 
>>> On 2019-05-05 04:51, Austin Wright wrote:
>>>>> On May 4, 2019, at 02:42, Carsten Bormann <cabo@tzi.org> wrote:
>>>>> 
>>>>> Curious:
>>>>> 
>>>>> On that web page, it also says “For consistency, integer JSON numbers SHOULD NOT be encoded with a fractional part.”
>>>>> 
>>>>> What does that mean?
>>>> It’s a non-normative suggestion with the aim of enhancing performance: Many programming languages distinguish between integers and IEEE floats by the presence of a decimal point. While JSON makes no such distinction (all numbers are arbitrary precision decimal), some parsers do make that distinction, and it’s slightly easier to determine if an int32_t is an integer than if a double is an integer.
>>> 
>>> In the C# example I provided before, this is (by default) not non-normative, since it threw an exception. Here is a little more detail:
>>> 
>>> class MyObject {
>>>  int Counter;
>>>   .
>>>   .
>>> }
>>> 
>>> Deserializing JSON into this type (which BTW works as a "schema") REQUIRES "Counter" data to adhere to normal integer notation, including value span.
>>> 
>>> A schema language should align with the actual use and interpretation of JSON.  This involves alternative serialization formats as well.  As an example, monetary values are (probably without exception) expressed as JSON Strings, since floating point is unsuited for decimal arithmetic.
>>> 
>>> Based on the RFC, one might come to the conclusion that JSON is an inferior information transfer format, but aided by external mapping it actually works extremely well, albeit being slightly verbose :)
>> The Newtonsoft.Json parser behavior would be incompatible with JSON Schema, according to what you provided. But I can’t imagine it’s much of an issue: If the encoder knows the value will always be an integer, why add a fractional part?
> 
> Right, my guess is that not a single mainstream serializer adds a fractional part to a number that can be exactly represented as an integer, regardless of whether the underlying number type is an integer or not.
> 
> 
>> That behavior still seems arbitrary to me, though. Instead of erroring on the first period, all you have to do is error on the first nonzero after the period. Scientific notation is another matter, but presumably it allows 1.9e4 as a valid integer? (I’m not sure, off-hand.)
> 
> To me the whole idea of having a schema is to create a *clean* description of an object.  Predecessors like XML Schema supported this in (IMO) a pretty good way.  However, this was straightforward since XML doesn't come with a built-in data model.
> 
> JSON was derived from JavaScript (anno 2003 or so), which didn't have an explicit integer type, only an unconstrained Number type.  This obviously creates certain issues when communicating with platforms that handle numeric data in a completely different way.  The new proposal doesn't take this into consideration.
> 
> 
>> Probably a better way to approach JSON parsers is to have the parser raise an error if it can’t preserve all the information in the source. For example, you can preserve 1e10, 0.5, and even 0.1 (if you permit a small error, such that if you converted the float back to decimal, 0.1 would still be the closest decimal representation). But you can’t preserve 9007199254740993.5 as a double: it has too many significant figures, and at that magnitude adjacent doubles are more than 1 apart. In this case, you would throw.
> 
> A schema should not try to change how parsers work.
> 
> 
>> Parsers that ensure the preservation of the encoded data like this might encourage better use of the JSON types, like for monetary values, even if you are reading into an IEEE floating point.
> 
> See previous point.

Can you elaborate on this? My point is primarily concerned with JSON parsers, not JSON the media type.

Suppose a string were too large to store in memory: most ECMAScript engines would bail with an uncatchable error. The suggestion is to handle this more robustly, e.g. throw an error if a string would be too long or if a number has too many significant figures, instead of either crashing or silently approximating the number.
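
To make the number case concrete, here is a rough sketch in Python (my choice for illustration, not something any existing parser is claimed to do). It assumes the "small error" rule from earlier in the thread: a number is accepted only if converting the resulting double back to its shortest decimal form gives the same value as the source text. The names strict_float, strict_loads and LossyNumberError are made up for this example.

```python
import json
from decimal import Decimal


class LossyNumberError(ValueError):
    """Hypothetical error: a JSON number cannot be kept in a double."""


def strict_float(text: str) -> float:
    """Convert a JSON number token to a double, or raise if that loses data."""
    value = float(text)
    # repr() yields the shortest decimal string that maps back to this
    # double. If it is not numerically equal to the source token, the
    # double only approximates what the sender wrote.
    if Decimal(repr(value)) != Decimal(text):
        raise LossyNumberError(f"cannot preserve {text} as a double")
    return value


def strict_loads(text: str):
    # json.loads passes every token with a fraction or an exponent to
    # parse_float. Plain integer tokens become Python ints, which are
    # arbitrary precision and therefore never lossy here.
    return json.loads(text, parse_float=strict_float)


print(strict_loads("[1e10, 0.5, 0.1, 10.0]"))
try:
    strict_loads("9007199254740993.5")
except LossyNumberError as err:
    print("rejected:", err)
```

A parser targeting a language without big integers would presumably need the same check on integer tokens as well.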

But also, why not build a parser that uses a schema to influence how values are parsed? Suppose you have a dynamically typed language, and want to parse some value, e.g. sometimes as an int, sometimes as a bignum, depending on the property. You might provide a JSON Schema to specify this behavior.
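
As a rough sketch of what I mean (again in Python, and again only an illustration): the keywords "properties", "type", "minimum" and "maximum" are real JSON Schema keywords, but using them to pick the in-memory type is my assumption here, not behavior that JSON Schema defines. The helper names coerce and parse_with_schema are hypothetical.

```python
import json
from decimal import Decimal

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

# Example schema: "counter" must fit an int32, "amount" stays a decimal.
SCHEMA = {
    "properties": {
        "counter": {"type": "integer", "minimum": INT32_MIN, "maximum": INT32_MAX},
        "amount": {"type": "number"},
    }
}


def coerce(value: Decimal, schema: dict):
    """Pick the in-memory type for one number based on its subschema."""
    if schema.get("type") == "integer":
        # 10.0 and 1.9e4 are integers in the data model; 10.5 is not.
        if value != value.to_integral_value():
            raise ValueError(f"{value} is not an integer")
        number = int(value)
        low, high = schema.get("minimum"), schema.get("maximum")
        if (low is not None and number < low) or (high is not None and number > high):
            raise ValueError(f"{number} is out of range")
        return number
    # Anything else keeps the exact decimal value from the source text.
    return value


def parse_with_schema(text: str, schema: dict) -> dict:
    # First keep every number token exact, then narrow per property.
    raw = json.loads(text, parse_float=Decimal, parse_int=Decimal)
    props = schema.get("properties", {})
    return {
        key: coerce(val, props.get(key, {})) if isinstance(val, Decimal) else val
        for key, val in raw.items()
    }


print(parse_with_schema('{"counter": 10.0, "amount": 0.10}', SCHEMA))
```

This only handles a flat object; nested objects and arrays would need the same walk applied recursively.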

>> Still, now that we bring it up, that SHOULD NOT seems suspicious. It’s stating the obvious.
>> Austin.
>>> 
>>> Cheers,
>>> Anders
>>> 
>>>> Cheers,
>>>> Austin.
>>>>> 
>>>>> Grüße, Carsten
>>>>> 
>>>>> 
>>>>>> On May 4, 2019, at 11:36, Austin Wright <aaa@bzfx.net> wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On May 4, 2019, at 00:58, Carsten Bormann <cabo@tzi.org> wrote:
>>>>>>> 
>>>>>>> On May 4, 2019, at 06:47, Anders Rundgren <anders.rundgren.net@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Example: although 10.0 is a valid JSON Number, in system where you expect
>>>>>>>> an integer, this should be flagged as a syntax error.
>>>>>>> 
>>>>>>> 10.0 is an integer number.
>>>>>>> 
>>>>>>> “Schema Languages” operate at the data model level.  In the JSON data model, there is only one kind of number.
>>>>>>> Of course, the JSON data model is not actually defined in a standard, which is one of the major shortcomings of JSON.
>>>>>>> 
>>>>>> 
>>>>>> JSON Schema handles this exactly the same way, defining a data model [1]. The lexical representation is surjective onto the data model: as you point out, 10.0 is the same as 10, which is an integer.
>>>>>> 
>>>>>> The one case where this might fall apart is if significant digits are important, such that 10.0 is different than 10.00 (i.e., 10.00 is more precise by an order of magnitude). However, I’m not aware of any JSON parsers that keep track of the precision of numbers, even ones that support arbitrary precision. I imagine scientific applications would want to store an explicit precision, since they’re not always powers-of-10 (e.g. {value: 32.0, precision: 0.5}).
>>>>>> 
>>>>>> [1] http://json-schema.org/latest/json-schema-core.html#rfc.section.4.2.1 "4.2.1. Instance Data Model"
>>>>>> 
>>>>>> Austin.