Re: [Json] A minimal examplotron-style JSON validation language.

Nico Williams <nico@cryptonector.com> Thu, 30 May 2019 18:28 UTC

Date: Thu, 30 May 2019 13:16:51 -0500
From: Nico Williams <nico@cryptonector.com>
To: John Cowan <cowan@ccil.org>
Cc: Tim Bray <tbray@textuality.com>, JSON WG <json@ietf.org>, Carsten Bormann <cabo@tzi.org>, Ulysse Carion <ulysse@segment.com>, Rob Sayre <sayrer@gmail.com>
Message-ID: <20190530181650.GE11773@localhost>
References: <11CDA7F6-30BB-40E4-8926-2EDCBCFD785B@tzi.org> <CAHBU6iv8ZsFM5yco5gi+gcyU8d=u3bOSgiKaF6-hv-GARgNh9w@mail.gmail.com> <CAChr6SwNvG4Z7TKUxAVeH7HMVWiPsEBNb12K9zVkjaGt2_v0fw@mail.gmail.com> <CAHBU6ivTD_v7L-wQ+P9TmSfBY=5N+k-caaZ0TZhg6yZ_SWR_aA@mail.gmail.com> <CAChr6SzD8qdETafQKKU41BcYayTWf+C4GENd9FNzy5JYOv5jRQ@mail.gmail.com> <CAHBU6isx5aB94U-vn_t6GGoQ9W+ATDNYR6_+CtXgOhFho5Qh-g@mail.gmail.com> <20190529144005.GC11773@localhost> <CAD2gp_QELt-3=wqA1gRafNim8Y6fsxZ6hcQmTsoOxCSxU8eM1Q@mail.gmail.com> <20190529201716.GD11773@localhost> <CAD2gp_RmOgea63LeVr36=++nkALa34cvqHbMKXqR7HyzEioBcg@mail.gmail.com>
In-Reply-To: <CAD2gp_RmOgea63LeVr36=++nkALa34cvqHbMKXqR7HyzEioBcg@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/4AzUJUi16EctneAZLQ-Cf1c0RrY>
Subject: Re: [Json] A minimal examplotron-style JSON validation language.

On Thu, May 30, 2019 at 08:53:50AM -0400, John Cowan wrote:
> On Wed, May 29, 2019 at 4:24 PM Nico Williams <nico@cryptonector.com> wrote:
> >  - if, where you expect an array, you see a boolean, that's an error
> >
> 
> Agreed, because statically typed programming languages can't cope.

And dynamic typing sucks.

> The question is, where do you stop?  Range constraints are all very well,
> but suppose you want to constrain a number value to be a U.S. shoe size,
> which is either an integer (in a certain range) or an integer plus 0.5?

You start with what you need and add new constraints as you discover
they would be useful.
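As an illustration of adding a constraint only once it proves useful, the shoe-size rule quoted above is short to write as a plain predicate. This is a minimal Python sketch; the function name `is_us_shoe_size` and the default range 1..16 are hypothetical, chosen only for the example.

```python
from numbers import Real


def is_us_shoe_size(value, lo=1, hi=16):
    """Hypothetical constraint: a whole size in [lo, hi], or a whole size
    plus 0.5, still within [lo, hi]. The default bounds are illustrative."""
    # JSON true/false decode to bool, which Python treats as an int subclass.
    if isinstance(value, bool) or not isinstance(value, Real):
        return False
    # Range check first; NaN fails every comparison and is rejected here.
    if not lo <= value <= hi:
        return False
    # Whole and half sizes double to a whole number (9.5 * 2 == 19.0).
    return float(value * 2).is_integer()
```

The range test runs before the float conversion so that very large JSON integers are rejected without overflowing `float`.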

> What is more, the valid values of one field often depend on the actual
> value of one or more other fields, and so you end up designing a whole
> (functional) programming language.

ASN.1 has SDL, but no one uses it.  It turns out that SEQUENCE OF,
SEQUENCE, and CHOICE are enough that you don't need an actual
programming language for further validation; the "business logic" gets
written up as natural-language text in the spec.
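A rough sketch of how those three ASN.1 constructors might map onto JSON validators, written as Python combinators. The mapping is an assumption made for illustration: SEQUENCE becomes an object with exactly the named members (no OPTIONAL components), SEQUENCE OF becomes a homogeneous array, and CHOICE is untagged (ASN.1 CHOICE alternatives are tagged). The helper names `prim`, `sequence`, `sequence_of`, `choice`, `Point`, and `Shape` are hypothetical.

```python
from typing import Any, Callable, Dict

Validator = Callable[[Any], bool]


def prim(*types: type) -> Validator:
    """Primitive JSON type check. bool is rejected unless listed explicitly,
    because in Python bool is a subclass of int."""
    def check(v: Any) -> bool:
        if isinstance(v, bool) and bool not in types:
            return False
        return isinstance(v, types)
    return check


def sequence(fields: Dict[str, Validator]) -> Validator:
    """Loosely like ASN.1 SEQUENCE: an object with exactly these members."""
    def check(v: Any) -> bool:
        return (isinstance(v, dict) and set(v) == set(fields)
                and all(fields[k](v[k]) for k in fields))
    return check


def sequence_of(elem: Validator) -> Validator:
    """Loosely like ASN.1 SEQUENCE OF: an array whose elements all match."""
    def check(v: Any) -> bool:
        return isinstance(v, list) and all(elem(x) for x in v)
    return check


def choice(*alts: Validator) -> Validator:
    """Loosely like ASN.1 CHOICE, but untagged: any alternative may match."""
    def check(v: Any) -> bool:
        return any(a(v) for a in alts)
    return check


# Hypothetical example types built from the combinators.
Point = sequence({"x": prim(int, float), "y": prim(int, float)})
Shape = choice(sequence_of(Point), prim(str))
```

With these, `Shape(True)` is rejected, which is the "expected an array, got a boolean" case from earlier in the thread.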

Mind you, I wouldn't mind a language for this, but it's a lot of work to
expect the IETF to do, so I wouldn't bother.  If there were enough
interest (there won't be), I suppose we could take inspiration from jq
(which is a functional programming language built around JSON) and write
an RFC for it or a language inspired by it.

> But in general we expect JSON to be consumed by a program written
> in some such language anyway.  So we might as well decode all numbers

Exactly.

> as double floats, since you can only count on that much precision and
> range anyway, and leave subtypes of number to general-purpose validation.

I wasn't proposing that numeric values get decoded as whatever the
schema says.  Rather, if your stack uses a generic JSON decoder, then a
validation pass could validate numeric values and coerce them to
integers (or whatever) as needed.
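A minimal sketch of such a validate-and-coerce pass, assuming Python's generic `json` decoder (which yields a `float` for `3.0` and an `int` for `3`). The schema forms (`"integer"`, `"number"`, `"string"`, `[T]` for arrays, `{name: T}` for objects with all members required) and the names `validate_and_coerce` and `ValidationError` are hypothetical.

```python
import json
from typing import Any


class ValidationError(ValueError):
    """Raised when a decoded JSON value does not match the schema."""


def _is_number(v: Any) -> bool:
    # JSON true/false decode to bool, which Python treats as an int subclass.
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def validate_and_coerce(value: Any, schema: Any, path: str = "$") -> Any:
    """Check a generically decoded value against a hypothetical schema and
    return a copy in which "integer" fields are Python ints."""
    if schema == "integer":
        if not _is_number(value):
            raise ValidationError(f"{path}: expected an integer")
        if isinstance(value, float):
            if not value.is_integer():
                raise ValidationError(f"{path}: {value!r} is not integral")
            return int(value)  # coerce e.g. 3.0 -> 3
        return value
    if schema == "number":
        if not _is_number(value):
            raise ValidationError(f"{path}: expected a number")
        return value
    if schema == "string":
        if not isinstance(value, str):
            raise ValidationError(f"{path}: expected a string")
        return value
    if isinstance(schema, list):
        if not isinstance(value, list):
            raise ValidationError(f"{path}: expected an array")
        return [validate_and_coerce(v, schema[0], f"{path}[{i}]")
                for i, v in enumerate(value)]
    if isinstance(schema, dict):
        if not isinstance(value, dict) or set(value) != set(schema):
            raise ValidationError(f"{path}: expected members {sorted(schema)}")
        return {k: validate_and_coerce(value[k], s, f"{path}.{k}")
                for k, s in schema.items()}
    raise ValueError(f"unsupported schema form: {schema!r}")


doc = json.loads('{"count": 3.0, "ratio": 0.25, "tags": ["a", "b"]}')
schema = {"count": "integer", "ratio": "number", "tags": ["string"]}
result = validate_and_coerce(doc, schema)
```

The decoder stays generic; only the validation pass knows that `count` should end up as an integer.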

> By the same token, I don't see that it makes sense to put a whole
> regular expression subsystem into the validator when it can often
> be expressed much more clearly in code (especially in languages
> that have a combinator rather than a string representation of
> regular expressions).

It's perfectly fine for us to decide we don't need that feature.

> Actually, I'll amend my proposal to include integers, given how
> important they are.  This is easily accomplished by using, say,
> 0 (equivalently 0.0) to designate an integer and 0.5 to designate a float.

Again, if we have consensus for such a constraint, we should include it,
and if not, not.  I was not making a detailed proposal, just proposing
that a schema language should be built around types and user-defined
types -- just like ASN.1.
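For comparison, a minimal Python sketch of the examplotron-style convention quoted above, in which the schema is itself a sample JSON value and can therefore be read with `json.loads`. Beyond the 0 / 0.0 / 0.5 rule from the quoted proposal, the conventions here are assumptions: a float designator accepts any number, any string designates "string", true/false designate "boolean", null designates null, `[e]` designates an array of `e`, and `{k: e}` an object with exactly those members. The function name `matches` is hypothetical.

```python
import json
from typing import Any


def _is_number(v: Any) -> bool:
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def _is_integral(v: Any) -> bool:
    # Ints are integral; floats only when they have no fractional part.
    return isinstance(v, int) or v.is_integer()


def matches(instance: Any, example: Any) -> bool:
    """Return True if `instance` has the shape designated by `example`."""
    if example is None:
        return instance is None
    if isinstance(example, bool):  # checked before numbers: bool is an int
        return isinstance(instance, bool)
    if isinstance(example, (int, float)):
        if not _is_number(instance):
            return False
        # 0 or 0.0 designates "integer"; 0.5 designates any number.
        return _is_integral(instance) if _is_integral(example) else True
    if isinstance(example, str):
        return isinstance(instance, str)
    if isinstance(example, list):
        if len(example) != 1:
            raise ValueError("an array example must hold exactly one element")
        return (isinstance(instance, list)
                and all(matches(x, example[0]) for x in instance))
    if isinstance(example, dict):
        return (isinstance(instance, dict) and set(instance) == set(example)
                and all(matches(instance[k], example[k]) for k in example))
    raise ValueError(f"unsupported example: {example!r}")


# The schema is ordinary JSON, so no separate schema parser is needed.
schema = json.loads('{"name": "", "size": 0.5, "count": 0, "tags": [""]}')
```

This keeps the appeal of needing no schema parser, while the type-based approach argued for here trades that for named, reusable types.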

> > Using JSON itself as the language for the schema is nice because
> > you don't need to build a parser for the schema.  But it's not very
> > nice for users, so I'd prefer to have some non-JSON syntax for the
> > schema.
> 
> I don't care about the syntax, but examplotron-style (in which a
> type designation is an instance of what it designates) is a strong
> design constraint, which is why I adopted it.  I have no problem
> with a BNF-style syntax for practical use, like RNG compact syntax.

The point about defining types is that you can express recursive
nesting, which is harder to do with by-example schemas.
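To make the recursion point concrete, here is a minimal Python sketch of a type-based schema in which a user-defined type refers to itself by name. The type-definition forms (`"integer"`, `"string"`, `("array", T)`, `("object", {member: T})`, or the name of a type in `defs`) and the names `check`, `PRIMITIVES`, and `DEFS` are hypothetical.

```python
from typing import Any, Callable, Dict

Defs = Dict[str, Any]

PRIMITIVES: Dict[str, Callable[[Any], bool]] = {
    # JSON true/false decode to bool, which Python treats as an int subclass.
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
}


def check(value: Any, t: Any, defs: Defs) -> bool:
    """Return True if `value` matches type `t`, resolving names via `defs`."""
    if isinstance(t, str):
        if t in PRIMITIVES:
            return PRIMITIVES[t](value)
        if t in defs:
            # Names are resolved only when a value is checked, so a type may
            # refer to itself; recursion follows the nesting of the value.
            return check(value, defs[t], defs)
        raise ValueError(f"undefined type {t!r}")
    kind, arg = t
    if kind == "array":
        return isinstance(value, list) and all(check(x, arg, defs) for x in value)
    if kind == "object":
        return (isinstance(value, dict) and set(value) == set(arg)
                and all(check(value[k], arg[k], defs) for k in arg))
    raise ValueError(f"unknown type constructor {kind!r}")


DEFS: Defs = {
    # Tree ::= { "value": integer, "children": array of Tree }
    "Tree": ("object", {"value": "integer", "children": ("array", "Tree")}),
}
```

A by-example schema would need some reference mechanism to say "children are more of the same"; with named types it falls out directly. This sketch does not guard against degenerate definitions such as `{"A": "A"}`, which would recurse without bound.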

Nico
--