Re: [Json] JSON Schema Language

Phillip Hallam-Baker <ietf@hallambaker.com> Tue, 07 May 2019 16:46 UTC

From: Phillip Hallam-Baker <ietf@hallambaker.com>
Date: Tue, 07 May 2019 12:46:03 -0400
To: Tim Bray <tbray@textuality.com>
Cc: Ulysse Carion <ulysse@segment.com>, JSON WG <json@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/jQvdNKK-iQTigVJa9GHp2xRoeWk>

On Mon, May 6, 2019 at 6:38 PM Tim Bray <tbray@textuality.com> wrote:

> 1. I'm pretty sure that we need something better than what we have in the
> area of JSON schemas.  At least, I'm 100% sure that my job at Amazon Web
> Services would be easier, and our customer experiences would be more
> pleasant, if we had something.
>

I agree. And I think most others here agree. The problem is that when we go
down the 'schema' road, we tend to end up with schema languages that become
baroque and more hassle than they are worth.



> 2. One thing schemas are useful for is to syntax-check JSON texts that
> claim to conform to some language specification or another. Obviously no
> schema can ever completely satisfy this requirement - there are always
> things in specifications which are semantic and not addressable by schemas
> - but they can still be super useful.
>
> 3. Another thing they are useful for is for providing help to developers
> working in strongly typed programming languages. With a well-built schema
> it is reasonably straightforward to auto-generate nice idiomatic class
> declarations in modern programming languages, and also to build
> serializers/deserializers that will move data back and forth between JSON
> blobs and programming-language constructs, or fail in a clean deterministic
> way if the JSON fails to match the schema.
>

That is one reason I need a schema language. The other is that I want to
document the protocol design so that I can generate the reference manual
and the reference code from the same source.

This is NOT something I have seen in any proposal other than mine to date.
But it is the one I think most relevant to IETF purposes. For example, this
is a fragment of a schema I am working with right now:

Section 1 "Shared Classes"
Description
|The following classes are used as common elements in
|Mesh profile specifications.


Structure HostEntry
Description
|Describes a current or pending connection to a Mesh account
String ID
Description
|Unique object instance identifier.

The description sections flow straight into my Internet Drafts.

This is not how I would do the same thing now. That would be more like:

Shared: Section 1 "Shared Classes"
// The following classes are used as common elements in
// Mesh profile specifications.

HostEntry: Class
// Describes a current or pending connection to a Mesh account
ID: String
// Unique object instance identifier.

In fact, I would probably get rid of the colons as they aren't needed by
the parser and are therefore clutter.

The point is that documentation and code should be integrated.
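
A minimal sketch of that idea in Python, assuming a hypothetical in-memory
form of the HostEntry fragment above. The names `SchemaField`, `Structure`,
`to_manual`, `to_code` and the `PY_TYPES` mapping are illustrative
assumptions, not part of any actual tool. Both the manual text and the
class stub are rendered from the same source, so the descriptions cannot
drift apart:

```python
from dataclasses import dataclass, field


@dataclass
class SchemaField:
    name: str
    type: str
    description: str


@dataclass
class Structure:
    name: str
    description: str
    fields: list = field(default_factory=list)


# Hypothetical in-memory form of the HostEntry fragment.
HOST_ENTRY = Structure(
    name="HostEntry",
    description="Describes a current or pending connection to a Mesh account",
    fields=[SchemaField("ID", "String", "Unique object instance identifier.")],
)

# Assumed mapping from schema types to Python types.
PY_TYPES = {"String": "str"}


def to_manual(s):
    """Render reference-manual text for one structure."""
    lines = ["Structure: " + s.name, "", s.description, ""]
    for f in s.fields:
        lines.append(f"{f.name}: {f.type}")
        lines.append("    " + f.description)
    return "\n".join(lines)


def to_code(s):
    """Render a Python class stub for the same structure."""
    lines = [f"class {s.name}:", f'    """{s.description}"""']
    for f in s.fields:
        lines.append(f"    {f.name}: {PY_TYPES[f.type]}  # {f.description}")
    return "\n".join(lines)


manual_text = to_manual(HOST_ENTRY)
code_text = to_code(HOST_ENTRY)
```

Printing `manual_text` gives the documentation fragment and printing
`code_text` gives the matching class declaration.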


> I mostly fail to understand the debate about jq and integers and so on.
> Clearly, the following is a valid JSON text and will be parsed successfully
> by any JSON parser.
>
> {
>   "foo": 3.0
> }
>

It will be parsed successfully, but the problem that comes up is that a
lexical analyzer may legitimately interpret 1.0 as a float and 1 as an
integer, and so the data ends up being rejected as invalid when the parse
tree is traversed. But that is only an issue if your schema validator
doesn't know how the parse tree handles numbers.
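
A small sketch of the mismatch in Python, whose standard `json` module
happens to make exactly this lexical distinction. The helper
`is_integer_valued` is a hypothetical name used only for illustration; it
shows one way a validator can take the parser's number handling into
account:

```python
import json


def is_integer_valued(value):
    """Accept any JSON number whose value is integral, however it was written."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return True
    return isinstance(value, float) and value.is_integer()


as_float = json.loads('{"foo": 3.0}')["foo"]
as_int = json.loads('{"foo": 3}')["foo"]

# A naive type check sees two different types for the same number.
naive = (isinstance(as_float, int), isinstance(as_int, int))

# A check that knows how the parser represents numbers accepts both.
aware = (is_integer_valued(as_float), is_integer_valued(as_int))
```

Here `naive` rejects the `3.0` form while `aware` accepts both spellings.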


> I imagine that most schema-driven software would first deserialize it into
> a tree,
>

That isn't what I do. I parse directly to the memory data structures. I
don't need the tree structure.

That said, I am thinking of rewriting the code so that it does both at the
same time.


> probably something like Jackson ObjectMapper's JsonNode, and then apply
> schema constructs to the tree.   I would hope that a sane schema would
> accept this whether a top-level "foo" was required to be an integer or
> double or most other flavors of number, and reject it if "foo" was required
> to be a string or boolean.
>
> Put another way, no JSON schema spec can change the definition of what
> JSON is, or make the built-in type system anything but what it is.
>

But do we actually want a JSON schema spec or a general data schema spec
that supports JSON?

JSON meets pretty much all the requirements I have for writing protocols
except for representing binary data. The application that JSON does not
currently support is data representation because as things stand, floats do
not round trip.

One option would be to write a profile of JSON which ensures floats round
trip. But I can't see that being adopted consistently enough to get
traction. A better solution would be to introduce new float encodings which
do round trip. It wouldn't be JSON but it could use 95% of JSON, be 100%
backwards compatible reading old data and not corrupt data when writing the
new.
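
To illustrate what "round trip" means here, a minimal sketch in Python. The
hexadecimal form is only an assumed example of an exact encoding; nothing
above says which encoding would be chosen. The function names
`encode_decimal` and `encode_hex` are hypothetical. Python's own `repr` of a
float does round trip, so the lossy case is modelled with an encoder that
limits the number of significant digits, as some writers do:

```python
def encode_decimal(x, digits=6):
    # Short decimal form, as an encoder that limits precision might write it.
    return format(x, f".{digits}g")


def encode_hex(x):
    # Hexadecimal form: an exact rendering of the binary64 value.
    return x.hex()


value = 0.1 + 0.2  # not exactly 0.3 in binary floating point

# Reading back the short decimal form yields a different float.
lossy = float(encode_decimal(value))

# Reading back the hexadecimal form yields the identical float.
exact = float.fromhex(encode_hex(value))
```

A reader that understands only the decimal form still reads old data, while
a writer that emits the exact form does not corrupt the value.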

Point is that any spec that attempts to solve interop issues by declaring
it can never ever change will fail. Instead of there being one extension,
there will be many and we will end up in the Markdown situation where it
has taken ten years for the market to converge on a common set of tags. And
that was mainly because GitHub chose one particular flavor and so
IPython/Jupyter did too, etc.