Re: [Json] Another problematic JSON Schema use-case

John Cowan <cowan@mercury.ccil.org> Thu, 26 May 2016 17:37 UTC

Date: Thu, 26 May 2016 13:37:20 -0400
From: John Cowan <cowan@mercury.ccil.org>
To: Phillip Hallam-Baker <ietf@hallambaker.com>
Message-ID: <20160526173719.GA19074@mercury.ccil.org>
References: <CAMm+Lwg2rWh0_gjXnSAEAvWtsMO1U3UiA8jsBzc+rRR6fiKcJg@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <CAMm+Lwg2rWh0_gjXnSAEAvWtsMO1U3UiA8jsBzc+rRR6fiKcJg@mail.gmail.com>
User-Agent: Mutt/1.5.20 (2009-06-14)
Sender: John Cowan <cowan@ccil.org>
Archived-At: <http://mailarchive.ietf.org/arch/msg/json/0RkkLRedUZQsN5hF2GXfzMHx7XI>
Cc: Tim Bray <tbray@textuality.com>, "json@ietf.org" <json@ietf.org>, Austin William Wright <aaa@bzfx.net>, Andrew Newton <andy@hxr.us>
Subject: Re: [Json] Another problematic JSON Schema use-case

Phillip Hallam-Baker scripsit:

> I have some very strong opinions on the advantages of strong typing.

Even if you accept the principle of strong typing, that is not conclusive
as to how many atomic types you have.  Most programming languages have
only one string type, or at most have a family of types varying only by
length or maximum length.  But it would be possible to declare the type
of a string using a regular expression, and Cobol approximates this with
its PICTURE types.

On the other hand, statically typed languages typically have multiple
types of numbers: C has 14, Java has 7, Pascal has 2, and Algol 68 has
a denumerably infinite number (short int, short short int, short short
short int, etc.), all statically distinct, though an implementation
normally provides only a finite number of distinct run-time
representations.

If we stick to JSON's mutually exclusive atomic types of null, boolean,
number, string, we can devise a very simple schema language for static
typing only.  Most of the complexity comes from wanting to provide more
complex static types such as integer vs. non-integer, exact vs. inexact,
or highly constrained strings.

===========

Here's a sketch of a simple schema-by-example language:

The type 'any JSON number' is encoded by any JSON number.

The type 'any JSON boolean' is encoded by any JSON boolean.

The type 'any JSON string' is encoded by any JSON string not specially
mentioned here.

The type 'JSON array of fixed size with fixed types' is encoded by
a JSON array whose elements are the encodings of the fixed types.
This corresponds to a tuple in some statically typed languages.

The type 'JSON array of variable size with fixed element type' is encoded
by a JSON array whose first element encodes the element type and whose
second element is the string "...".  This corresponds to a list in some
statically typed languages.  It can be combined with the previous rule
(fixed-size arrays) to allow arrays whose first few elements have their
own types and whose remaining elements all share one type; in that case
the shared element type and the "..." come last.

The type 'arbitrary JSON array' is encoded by the JSON string "[]".

The type 'JSON object with fixed keys' is encoded by a JSON object
whose keys are the fixed keys and whose values encode the types of the
values associated with those keys.

The type 'JSON object with arbitrary keys' is encoded by the JSON string
"{}".

The type 'arbitrary JSON value' is encoded by the JSON string "*".
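
The rules above can be sketched as a small recursive checker.  This is
only an illustration, not a definitive implementation: the function name
`matches` is made up, and the sketch has to pick answers to points the
rules leave open.  It assumes that a variable-size array may be empty,
that "{}" places no constraint on the values of the object, that a
fixed-key object rejects additional keys, and that null is its own
atomic type encoded by a JSON null.

```python
def matches(schema, value):
    """Return True if `value` conforms to the schema-by-example `schema`.

    Both arguments are Python objects as produced by json.loads().
    """
    if isinstance(schema, str):
        if schema == "*":                  # arbitrary JSON value
            return True
        if schema == "[]":                 # arbitrary JSON array
            return isinstance(value, list)
        if schema == "{}":                 # object with arbitrary keys
            return isinstance(value, dict)
        return isinstance(value, str)      # any other string: 'any string'

    # Test bool before number: in Python, bool is a subclass of int.
    if isinstance(schema, bool):
        return isinstance(value, bool)
    if isinstance(schema, (int, float)):
        return isinstance(value, (int, float)) and not isinstance(value, bool)

    if isinstance(schema, list):
        if not isinstance(value, list):
            return False
        if len(schema) >= 2 and schema[-1] == "...":
            # Fixed prefix, then zero or more elements of one type
            # (allowing zero is an assumption of this sketch).
            prefix, rest_type = schema[:-2], schema[-2]
            if len(value) < len(prefix):
                return False
            return (all(matches(s, v) for s, v in zip(prefix, value))
                    and all(matches(rest_type, v)
                            for v in value[len(prefix):]))
        # Fixed size with fixed types (a tuple).
        return (len(value) == len(schema)
                and all(matches(s, v) for s, v in zip(schema, value)))

    if isinstance(schema, dict):
        # Assumption: exactly the fixed keys, no extras.
        return (isinstance(value, dict)
                and value.keys() == schema.keys()
                and all(matches(schema[k], value[k]) for k in schema))

    if schema is None:
        # Assumption: null is the sole inhabitant of its own type.
        return value is None
    return False
```

For example, under these assumptions the schema `["", 0, "..."]` accepts
`["id", 1, 2]` (a string followed by any number of numbers) and rejects
`["id", "x"]`.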

Open questions:

1) What about a JSON object with fixed keys that also allows arbitrary
additional keys?

2) What about null?  It can be treated as the sole inhabitant of a unique
atomic type encoded by a JSON null, or as something not mentioned in
schemas that is always allowed in place of any JSON value (like SQL null),
or as a replacement for arrays and objects only.

-- 
John Cowan          http://www.ccil.org/~cowan        cowan@ccil.org
                if if = then then then = else else else = if;