Re: [Json] A minimal examplotron-style JSON validation language.

Nico Williams <nico@cryptonector.com> Wed, 29 May 2019 20:24 UTC

Date: Wed, 29 May 2019 15:17:17 -0500
X-DH-BACKEND: pdx1-sub0-mail-a68
From: Nico Williams <nico@cryptonector.com>
To: John Cowan <cowan@ccil.org>
Cc: Tim Bray <tbray@textuality.com>, JSON WG <json@ietf.org>, Carsten Bormann <cabo@tzi.org>, Ulysse Carion <ulysse@segment.com>, Rob Sayre <sayrer@gmail.com>
Message-ID: <20190529201716.GD11773@localhost>
References: <8224451C-F21B-41E5-A834-A9005050CB1F@tzi.org> <CAJK=1RjdYD6TZCNrw=H3d9ZLKLxZZOwVCOYYPwfbP+1ETDDz1Q@mail.gmail.com> <11CDA7F6-30BB-40E4-8926-2EDCBCFD785B@tzi.org> <CAHBU6iv8ZsFM5yco5gi+gcyU8d=u3bOSgiKaF6-hv-GARgNh9w@mail.gmail.com> <CAChr6SwNvG4Z7TKUxAVeH7HMVWiPsEBNb12K9zVkjaGt2_v0fw@mail.gmail.com> <CAHBU6ivTD_v7L-wQ+P9TmSfBY=5N+k-caaZ0TZhg6yZ_SWR_aA@mail.gmail.com> <CAChr6SzD8qdETafQKKU41BcYayTWf+C4GENd9FNzy5JYOv5jRQ@mail.gmail.com> <CAHBU6isx5aB94U-vn_t6GGoQ9W+ATDNYR6_+CtXgOhFho5Qh-g@mail.gmail.com> <20190529144005.GC11773@localhost> <CAD2gp_QELt-3=wqA1gRafNim8Y6fsxZ6hcQmTsoOxCSxU8eM1Q@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <CAD2gp_QELt-3=wqA1gRafNim8Y6fsxZ6hcQmTsoOxCSxU8eM1Q@mail.gmail.com>
User-Agent: Mutt/1.9.4 (2018-02-28)
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/8rhE-xf1uWjJkV6VT1GwGdsrdAE>
Subject: Re: [Json] A minimal examplotron-style JSON validation language.

On Wed, May 29, 2019 at 03:14:43PM -0400, John Cowan wrote:
> The trouble is that "validation" is a very vague term.  I would be happy
> with a language that can only express simple type validation, minimally as
> follows:

Validation, to me, means making sure that the "shape" of a value adheres
to the "shape" given by the schema.

Examples:

 - if you see a boolean where you expect an array, that's an error

 - where you expect a numeric value you might have some constraints
   (e.g., range constraints) that, again, should raise an error if not
   met

 - ditto strings ('must be base64', 'must be one of these (enum)', 'must
   match some regexp', etc.).

That's it for validation.
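
The three kinds of checks listed above could look roughly like this in
Python.  This is only a sketch: the names ShapeError, check_array,
check_number and check_string are hypothetical and not taken from any
proposal in this thread.

```python
import base64
import binascii
import re


class ShapeError(ValueError):
    """Raised when a value does not match the expected shape."""


def check_array(value):
    # A boolean (or anything else) where an array is expected is an error.
    if not isinstance(value, list):
        raise ShapeError(f"expected array, got {type(value).__name__}")
    return value


def check_number(value, minimum=None, maximum=None):
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ShapeError(f"expected number, got {type(value).__name__}")
    # Optional range constraints.
    if minimum is not None and value < minimum:
        raise ShapeError(f"{value} is below minimum {minimum}")
    if maximum is not None and value > maximum:
        raise ShapeError(f"{value} is above maximum {maximum}")
    return value


def check_string(value, enum=None, pattern=None, is_base64=False):
    if not isinstance(value, str):
        raise ShapeError(f"expected string, got {type(value).__name__}")
    # 'must be one of these (enum)'
    if enum is not None and value not in enum:
        raise ShapeError(f"{value!r} is not one of the allowed values")
    # 'must match some regexp' (the whole string must match)
    if pattern is not None and re.fullmatch(pattern, value) is None:
        raise ShapeError(f"{value!r} does not match {pattern!r}")
    # 'must be base64'
    if is_base64:
        try:
            base64.b64decode(value, validate=True)
        except (binascii.Error, ValueError) as exc:
            raise ShapeError("not valid base64") from exc
    return value
```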

Now, validation should mostly happen only at decode time.

In a language like JS (or jq) you might first parse the whole JSON text
and then validate its "shape".
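
A rough Python sketch of that parse-then-validate order.  The names
decode_and_validate and require_array_of_numbers are made up for
illustration, and the shape being checked is an arbitrary example.

```python
import json


def decode_and_validate(text, validate):
    # Step 1: parse the whole JSON text.  json.loads raises
    # json.JSONDecodeError (a ValueError) on malformed input.
    value = json.loads(text)
    # Step 2: only then check the "shape" of the decoded value.
    validate(value)
    return value


def require_array_of_numbers(value):
    # Hypothetical shape: an array whose items are all numbers.
    if not isinstance(value, list):
        raise ValueError("expected array")
    for item in value:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(item, bool) or not isinstance(item, (int, float)):
            raise ValueError("expected number")
```

Both syntax errors and shape errors surface as ValueError here, so a
caller can treat "failed to decode" as a single condition.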

> 0) JSON values are divided into types, namely null, numbers, strings,
> booleans, object types, and arrays of any of these.
> 
> 1) An object type specifies what attributes the corresponding object has
> and what the types of the corresponding values are, or else specifies that
> it may have arbitrary keys and values.   It may have a name in the schema
> language
> 
> 2) When an array is expected, the validator will check that all its
> contents have the same specified type.
> 
> 3) There is a way to specify for a given object type whether it must
> contain exactly the specified keys, may contain a superset of the specified
> keys where the additional keys are not checked, or may contain a subset of
> the specified keys.

So far it sounds a lot like ASN.1.
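
For concreteness, the three key-set policies in the quoted point (3)
might be checked like this in Python.  The function name check_keys and
the mode names 'exact', 'superset' and 'subset' are my own labels, not
part of the quoted proposal.

```python
def check_keys(obj, specified, mode):
    """Return True if the keys of obj satisfy the given policy.

    mode is 'exact', 'superset' or 'subset' (hypothetical names).
    """
    present = set(obj)
    specified = set(specified)
    if mode == "exact":
        # Must contain exactly the specified keys.
        return present == specified
    if mode == "superset":
        # All specified keys present; additional keys are not checked.
        return specified <= present
    if mode == "subset":
        # Only specified keys may appear, but any of them may be absent.
        return present <= specified
    raise ValueError(f"unknown mode: {mode}")
```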

> Here is a very simple examplotron language:
> 
> Sample document from json.org:
> 
> {"menu": {
>   "id": "file",
>   "value": "File",
>   "popup": {
>     "menuitem": [
>       {"value": "New", "onclick": "CreateNewDoc()"},
>       {"value": "Open", "onclick": "OpenDoc()"},
>       {"value": "Close", "onclick": "CloseDoc()"}
>     ]
>   }
> }}
> 
> Corresponding schema (constructed by hand, may have errors):

I would prefer a schema language built around defining types, where
user-defined types have to be a) JSON types with b) optional
constraints.

Regarding (a), we're going to need a discriminated union, something JSON
doesn't have as such but that is obviously trivial to build by convention.
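
One possible convention, sketched in Python: a single-key object whose
key is the tag and whose value is the payload.  This is just one of
several conventions one could pick, and decode_tagged and require_object
are hypothetical names.

```python
def require_object(value):
    # Payload checker: accept any JSON object.
    if not isinstance(value, dict):
        raise ValueError("expected object")
    return value


def decode_tagged(value, variants):
    """Decode {tag: payload} against a mapping of tag -> payload checker."""
    if not isinstance(value, dict) or len(value) != 1:
        raise ValueError("expected an object with exactly one key (the tag)")
    tag, payload = next(iter(value.items()))
    if tag not in variants:
        raise ValueError(f"unknown tag: {tag!r}")
    # The tag selects which checker applies to the payload.
    return tag, variants[tag](payload)
```

For example, decode_tagged(doc, {"menu": require_object, "edit_menu":
require_object}) accepts either variant and tells the caller which one
it got.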

Also, using JSON itself as the language for the schema is nice because
you don't need to build a parser for the schema.  But it's not very
nice for users, so I'd prefer to have some non-JSON syntax for the
schema.

My counter:

  top_level : choice(file_menu, edit_menu);
  file_menu : object{"menu"->file_menu_contents};
  edit_menu : object{"edit_menu"->edit_menu_contents};
  file_menu_contents := object{
    "id":"file", /* this means this must be present with this value */
    "value":"File",
    "popup": object { ... }
  };

You can see the above has a hint of ASN.1 flavor, greatly simplified and
somewhat transformed to be sure.
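
To show how little machinery the types in the sketch above would need,
here is a rough Python rendering using checker functions.  Several
things are assumptions on my part: the helper names (literal, obj,
choice, any_object), the decision to reject keys that are not listed,
and the use of "any object" wherever the sketch elides or does not
define something ("popup" and edit_menu_contents).

```python
def literal(expected):
    # "id":"file" in the sketch: the value must be exactly this one.
    def check(value):
        return type(value) is type(expected) and value == expected
    return check


def any_object(value):
    # Placeholder for parts the sketch leaves elided or undefined.
    return isinstance(value, dict)


def obj(fields):
    # Every listed key must be present and pass its checker.
    # Rejecting unlisted keys is an assumption; the sketch does not say.
    def check(value):
        return (isinstance(value, dict)
                and set(value) == set(fields)
                and all(fields[k](value[k]) for k in fields))
    return check


def choice(*alternatives):
    # Accept a value if any one alternative accepts it.
    def check(value):
        return any(alt(value) for alt in alternatives)
    return check


file_menu_contents = obj({
    "id": literal("file"),
    "value": literal("File"),
    "popup": any_object,  # elided in the sketch
})
file_menu = obj({"menu": file_menu_contents})
# edit_menu_contents is not defined in the sketch; any object is assumed.
edit_menu = obj({"edit_menu": any_object})
top_level = choice(file_menu, edit_menu)
```

With these definitions, top_level(doc) returns True or False for an
already-parsed document.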

I don't care one bit about the specifics of the syntax other than: it
should be easy to build parsers for it.

But I do very much like the idea of a schema language built around
defining types.

Nico
--