[apps-discuss] JSON Schema considered harmful

Phillip Hallam-Baker <hallam@gmail.com> Wed, 19 September 2012 17:30 UTC

Date: Wed, 19 Sep 2012 13:30:33 -0400
Message-ID: <CAMm+LwjYj0gd3Cxjj8WFcLy-zgBwfVDCPaRGcNSgOHD9m_07yw@mail.gmail.com>
From: Phillip Hallam-Baker <hallam@gmail.com>
To: Francis Galiegue <fgaliegue@gmail.com>
Cc: "apps-discuss@ietf.org" <apps-discuss@ietf.org>
Subject: [apps-discuss] JSON Schema considered harmful

One of the biggest mistakes that was made in the XML world was to allow one
group of authors to develop a specification and call it 'XML Schema'. I
think we are at risk of making the same mistake here.

The more I hear about this effort, the less I like it. The approach seems
to be needlessly complex for a start. There should be no need to have a
pointer structure in a modern data modelling language, none. C# and Java
don't use pointers, so why are they needed in JSON Schema?

I think people should be allowed to go off and develop a spec but they
should not be allowed to claim a false sense of authority through an
inappropriate name grab. That had bad results in the XML world and I can't
see this effort working out any better.

Using a schema to develop a protocol is a very good idea but each and every
schema language I have had to use has been botched. XML Schema has two
separate type systems. ASN.1 schema is arcane. We must not let that happen
to JSON.

It should be possible to have a schema language that is essentially
language neutral and encoding neutral. Pretty much every programming
language in current use has the same set of intrinsic types (bytes, int16,
int32, int64, strings (unicode), chars (unicode), real32, real64, boolean),
and at least one form of enumerated collection (arrays, lists, sets). For
protocol design it is very useful to add in standard representations for
Binary and DateTime intrinsic types: base64-encoded strings are easier to
deal with than arrays of decimal bytes, and we already have an IETF string
representation for time.
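
To make the Binary and DateTime point concrete, here is a minimal sketch
in Python. The message layout and the field names (key_id, issued) are
hypothetical, and I am assuming RFC 3339 is the IETF time format meant
above. It shows the same four bytes as a base64 string and as an array of
decimal bytes.

```python
import base64
import json
from datetime import datetime, timezone


def encode_value(value):
    """Map Binary and DateTime values to JSON-friendly strings."""
    if isinstance(value, (bytes, bytearray)):
        # Binary -> base64 text (standard alphabet, with padding).
        return base64.b64encode(bytes(value)).decode("ascii")
    if isinstance(value, datetime):
        # DateTime -> RFC 3339 style timestamp, normalised to UTC.
        return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Hypothetical protocol message; the field names are illustrative only.
message = {
    "key_id": b"\x00\x01\xfe\xff",
    "issued": datetime(2012, 9, 19, 17, 30, 33, tzinfo=timezone.utc),
}

# Binary as a base64 string, time as an RFC 3339 string.
as_strings = json.dumps(message, default=encode_value, sort_keys=True)

# The alternative for Binary: an array of decimal bytes.
as_byte_array = json.dumps({"key_id": list(message["key_id"])})

print(as_strings)
print(as_byte_array)
```

The byte-array form grows by several characters per byte and has to be
range-checked element by element on input, which is the inconvenience
being described.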

It should be possible to use a schema to map a data structure that can be
represented conveniently in any sensible modern programming language (i.e.
anything from Perl to Java, C# but not necessarily FORTRAN, COBOL or the
like) to at least a subset of any sensible encoding (JSON, XML, ASN.1).
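
As a rough illustration of that mapping, the sketch below describes one
data structure once and derives both a JSON and an XML encoding from it.
The Contact type and the to_json/to_xml helpers are hypothetical names
made up for this example, and the XML layout (one child element per field,
repeated for list items) is just one assumed convention, not a proposal.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields
from typing import List


@dataclass
class Contact:
    """Hypothetical protocol structure, declared once."""
    name: str
    age: int
    emails: List[str]


def to_json(obj) -> str:
    """Encode the structure as a JSON object keyed by field name."""
    data = {f.name: getattr(obj, f.name) for f in fields(obj)}
    return json.dumps(data, sort_keys=True)


def to_xml(obj) -> str:
    """Encode the structure as XML: one child element per field value."""
    root = ET.Element(type(obj).__name__)
    for f in fields(obj):
        value = getattr(obj, f.name)
        # A list field becomes a repeated element; a scalar becomes one.
        items = value if isinstance(value, list) else [value]
        for item in items:
            ET.SubElement(root, f.name).text = str(item)
    return ET.tostring(root, encoding="unicode")


contact = Contact(name="Alice", age=30,
                  emails=["a@example.com", "alice@example.org"])
json_text = to_json(contact)
xml_text = to_xml(contact)

print(json_text)
print(xml_text)
```

Note that each encoding gets exactly one fixed mapping here; the person
declaring Contact has no knob for choosing between alternative
serializations.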

Where the XML Schema effort went wrong was that they tried to support every
feature of DTDs and they gave the schema designer the ability to map a data
structure onto multiple different encodings, which is pretty weird. Making
standards is the process of making design choices that don't matter.
Providing multiple ways to serialize the same data structure is harmful.

At any rate I suggest that either

1) the authors of JSON Schema start using a different name or
2) Everyone else with ideas for a schema for JSON busily pollute the name
space by introducing their own plans named JSON Schema until the authors
back down.

Since there is no consensus that JSON needs a schema, I can't see how there
can be a consensus for one particular proposal.

In fact I don't think JSON does need a schema either. We need a schema for
describing protocol data structures that can be reified as JSON or other
data formats.

On Wed, Sep 19, 2012 at 1:05 PM, Francis Galiegue <fgaliegue@gmail.com> wrote:

> On Wed, Sep 19, 2012 at 6:59 PM, Phillip Hallam-Baker <hallam@gmail.com> wrote:
> [...]
> >
> >
> > Just where is JSON Pointer intended to be used? Who is using it?
> >
> JSON Schema, for instance, even though only implicitly for now. The
> next version will mandate JSON Reference support and, as such, JSON
> Pointer (which is the fragment part of a JSON Reference).
> >
> > Pretty much every modern programming language has adopted a notation that
> > uses dots for member extraction and square braces for array indices. I don't
> > see a particularly good reason to invent another way to extract information
> > from a data structure.
> >
> > In C# and Java the notation would be x.y[1].
> >
> Again, programming languages do not enter the picture. What is wanted
> is an unambiguous way to address any part of a JSON value.
> Oh, and "." is a valid object member name. And so is "/" for that
> matter. And so is "", and so is "\x00".
> [...]
> >
> > I can't see why I would want to have a different syntax to extract elements
> > after a data object has been JSON encoded. What is the value of such a
> > standard meant to be?
> >
> Language-proof, future-proof, covers all corner cases. If these are
> not arguments, I don't know what is.
> Oh, and the fact that it can be used in URI fragments.
> > This looks like an attempt to reinvent a wheel that has already been tried
> > and tested.
> >
> Tried and tested _by a few subset of programming languages_.
> --
> Francis Galiegue, fgaliegue@gmail.com
> JSON Schema: https://github.com/json-schema
> "It seems obvious [...] that at least some 'business intelligence'
> tools invest so much intelligence on the business side that they have
> nothing left for generating SQL queries" (Stéphane Faroult, in "The
> Art of SQL", ISBN 0-596-00894-5)

Website: http://hallambaker.com/