Re: [Json] JSON Schema Language

Austin Wright <aaa@bzfx.net> Mon, 06 May 2019 21:01 UTC

From: Austin Wright <aaa@bzfx.net>
In-Reply-To: <20190506192453.GK21049@localhost>
Date: Mon, 06 May 2019 14:00:54 -0700
Cc: Carsten Bormann <cabo@tzi.org>, json@ietf.org
Content-Transfer-Encoding: quoted-printable
Message-Id: <753A412B-299F-400F-9D19-A9688068D842@bzfx.net>
References: <39682ec8-f993-a44c-d3e2-1638d2c1608f@gmail.com> <29CAE1CE-D6CB-4796-B2F2-2095BE921385@tzi.org> <AD5ABD9C-F5F2-477D-B862-529C890D5472@bzfx.net> <DA1767B8-22D6-4EA9-8112-4B36B79E9039@tzi.org> <D21B379B-23CC-48B3-BE10-D2777308E2E0@bzfx.net> <40f80ea0-d130-3f3b-39fa-2c84e802ed55@gmail.com> <35E2623E-753D-4918-8AF4-BF0BC5DE4868@bzfx.net> <6260354b-aca2-e001-7145-148b32658416@gmail.com> <9D90C1F1-6747-4373-93B0-8D51C5B25F1C@bzfx.net> <751DAC92-D70C-4C5E-9C61-954D6E300A1F@tzi.org> <20190506192453.GK21049@localhost>
To: Nico Williams <nico@cryptonector.com>
X-Mailer: Apple Mail (2.3445.104.8)
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/Ph1H2OgcTQvG9t78ZvGNxkc8O5g>
Subject: Re: [Json] JSON Schema Language


> On May 6, 2019, at 12:24, Nico Williams <nico@cryptonector.com> wrote:
> 
> On Mon, May 06, 2019 at 09:32:13AM +0200, Carsten Bormann wrote:
>> On May 6, 2019, at 09:22, Austin Wright <aaa@bzfx.net> wrote:
>>> But also, why not build a parser that uses a schema to influence how
>>> values are parsed?
> 
> Because some of us have generic parsers with associated DSLs and we
> don't want to be left out.
> 
> Specifically, in my case, jq https://stedolan.github.io/jq -- something
> of an XSLT/XPath for JSON.
> 
> It's perfectly reasonable to build a jq application that adheres to a
> JSON schema, but it's not reasonable to expect the jq encoder to know to
> do encoding in context-specific ways -- the jq encoder is too generic
> and divorced from the rest of the jq machinery to possibly be able to do
> that.

Left out of what, exactly? See below.

> 
> By pursuing a schema design that requires context-specific parsing or
> encoding behaviors, though you could say that the result looks like
> JSON, it precludes use of existing JSON tooling such as jq.
> 
> Substitute ECMAScript for jq if you like -- the result is the same.
> 
> I strenuously object to this sort of schema.
> 
> As a maintainer of jq, I wouldn't know what to say to users who then
> complain about non-interop with such a schema other than to point to
> this thread and blame the people who allowed it to happen.  But the
> damage to jq would be real and unavoidable.  To avoid that damage, if I
> cannot convince you, then I assure you that I would appeal any finding
> of consensus for such a schema design.
> 
> I'm sorry, but 10.0 is perfectly acceptable as an integer, and MUST be
> accepted as an integer when you expect an integer even if the JSON RFC
> says encoders SHOULD NOT include zero fractional parts.
> 
> I would suggest that if you have a schema that says "and this is an
> integer", the schema MUST accept zero fractional parts, and SHOULD let
> you specify what to do if a non-integer real number is parsed for that
> field (e.g., truncate the fractional part, floor, round).

I’m not suggesting a variance from the JSON semantics.

My suggestion is an alternative to parsers that lose data when they parse JSON documents, such as ECMAScript's JSON.parse, which converts every number to a double. You can, of course, always parse a JSON document according to the generic semantics.
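
To make the data loss concrete, here is a minimal sketch in Python. Python's json module keeps integers exact, so the example imitates a double-only parser by forcing integer literals through float(); the field names "id" and "price" are made up for illustration.

```python
import json

# One integer just above 2**53 and one decimal with more digits than a
# double can hold. Both are valid JSON numbers.
text = '{"id": 9007199254740993, "price": 0.1000000000000000055511151231257827}'

# Assumption: we imitate a parser that stores every number as an
# IEEE 754 double by routing integer literals through float().
# Non-integer literals already go through float() by default.
lossy = json.loads(text, parse_int=float)

print(lossy["id"])     # 9007199254740992.0 (the trailing 3 became a 2)
print(lossy["price"])  # 0.1 (the extra digits are gone)
```

The document was valid JSON in both cases; only the choice of native type discarded information.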

The issue here is: how does a program parse a JSON document that has a wider range of values than the program has room for? For example, a string that’s too long, an array with too many items, or a number with too many significant figures?

Normally you would have to write your own parser, or use a tokenizer that preserves the lexical values without casting them to native types. Then you perform your own validation in code, and decide how to convert a lexical JSON number into a native number, depending on context.
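
As a rough sketch of that approach, Python's json.loads accepts parse_int and parse_float hooks that receive each number exactly as written. The RawNumber class and the field names below are hypothetical, chosen only for this example.

```python
import json
from decimal import Decimal


class RawNumber(str):
    """Hypothetical marker type: a JSON number kept as its source text."""


def parse_preserving_numbers(text):
    # The hooks are called with the literal characters of each number,
    # so nothing is rounded or truncated at parse time.
    return json.loads(text, parse_int=RawNumber, parse_float=RawNumber)


doc = parse_preserving_numbers(
    '{"count": 10.0, "balance": 12345678901234567890.12}'
)

# The application now decides, per field, how to convert.
count_value = Decimal(doc["count"])
if count_value != count_value.to_integral_value():
    raise ValueError("count must be an integer")
count = int(count_value)            # 10.0 is accepted as the integer 10

balance = Decimal(doc["balance"])   # every digit is kept
```

Here the validation and conversion rules still live in code, one field at a time.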

Right now, programs usually do this in code. My suggestion is to write these rules as a document. The idea of JSON Schema is that you make assertions about the document (“this field must be an RFC3339 date string”) and let the parser or application make additional optimizations. In this case, an RFC3339 date-time string without fractional seconds will never be more than 25 characters.
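
A minimal sketch of what "rules as a document" could look like, again in Python. The keywords "properties", "type", "minimum" and "maximum" are taken from JSON Schema, but the conversion policy, the RawNumber class, the convert and load helpers, and the field names are all assumptions made for this example, not anything a JSON Schema draft specifies.

```python
import json
from decimal import Decimal


class RawNumber(str):
    """Hypothetical marker type: a JSON number kept as its source text."""


# The rules are data, not code.
schema = json.loads("""{
  "properties": {
    "retries": {"type": "integer", "minimum": 0, "maximum": 255},
    "amount":  {"type": "number"}
  }
}""")


def convert(raw, rule):
    value = Decimal(raw)
    if rule.get("type") == "integer":
        # A zero fractional part (3.0) still counts as an integer.
        if value != value.to_integral_value():
            raise ValueError(f"{raw} is not an integer")
        number = int(value)
        low, high = rule.get("minimum"), rule.get("maximum")
        if (low is not None and number < low) or (
            high is not None and number > high
        ):
            raise ValueError(f"{raw} is out of range")
        return number   # bounded, so a small native int is safe
    return value        # no bounds asserted: keep full precision


def load(text, schema):
    # Only top-level properties are handled in this sketch.
    doc = json.loads(text, parse_int=RawNumber, parse_float=RawNumber)
    rules = schema.get("properties", {})
    return {
        key: convert(val, rules.get(key, {}))
        if isinstance(val, RawNumber) else val
        for key, val in doc.items()
    }


result = load(
    '{"retries": 3.0, "amount": 0.1000000000000000055511151231257827}',
    schema,
)
```

The input is parsed with ordinary JSON semantics; the schema only tells the application which native type is safe for each field.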

> 
>> That is exactly what most of the members of this WG are afraid of when
>> the talk comes to “schemas”.
> 
> +1
> 
>> XML (and later JSON) learned one thing from the problems of SGML:
>> schema-oblivious decoders are more robust than schema-depending ones.
>> (They are, in the first place, and protocol evolution only amplifies
>> this.)
> 
> Not only that!
> 
> Can you imagine using XSLT/XPath if some XML schema requires different
> parsing behavior for different elements?!  It wouldn't be possible to do
> that *and* still be able to use XSLT/XPath because XSLT/XPath processors
> generally have a built-in generic XML parser.  It would destroy the
> utility of XSLT/XPath!
> 
> And that proves that an XML schema of that sort... would yield something
> that is NOT XML, but merely looks like XML.
> 
> The same exact reasoning applies in the case of JSON.  The only
> difference being that JSON doesn't [yet] have a standard equivalent to
> XSLT/XPath, but since generic parsers and encoders exist, and since
> node.js, JavaScript, ECMAScript, jq, and such exist, the point stands.
> 
>>> Suppose you have a dynamically typed language, and want to parse
>>> some value, e.g. sometimes as an int, sometimes as a bignum,
>>> depending on the property. You might provide a JSON Schema to
>>> specify this behavior.
>> 
>> There goes the generic decoder.
>> 
>> (There is nothing wrong with using a “schema” to specify upconversion
>> from the JSON data model to your application data model.  What I’m
>> objecting to is getting rid of JSON and replacing it with a gazillion
>> dialects that happen to be syntactically compatible with JSON but the
>> semantics of which are controlled by a “schema” mechanism.)

There’s scarcely a completely generic decoder in existence. Decoders should be generic, but their authors have typically decided it’s easier to make assumptions of one sort or another, especially around numbers.

> 
> +1.
> 
> Nico
> --