Re: [Json] Nudging the English-language vs. formalisms discussion forward

Phillip Hallam-Baker <> Wed, 19 February 2014 18:43 UTC

Date: Wed, 19 Feb 2014 13:43:33 -0500
From: Phillip Hallam-Baker <>
To: John Cowan <>
Cc: Tim Bray <>, Paul Hoffman <>, JSON WG <>
Subject: Re: [Json] Nudging the English-language vs. formalisms discussion forward

On Wed, Feb 19, 2014 at 12:25 PM, John Cowan <> wrote:

> Tim Bray scripsit:
> > > The point is to focus the discussion on the data going over the wire
> > > rather than the syntax.
> >
> > Here is where we disagree absolutely. The point is to specify the syntax
> > clearly and unambiguously, and the semantics of its payload.  Trying to
> > come at it from the other direction (specifying data structures and then
> > extracting syntax) leads to huge mistakes like CORBA and WS-*.

They are only mistakes because they are huge. Microsoft and IBM wanted to
build a consulting business which meant that they liked a specification
that was overly complex.

The whole ProtoGen system is less than about 5,000 lines of code and it has
equivalent functionality to the WS-* stack.

XML isn't the farce that SGML was and JSON is even simpler than XML without
loss of power (though it does not make a very good document markup).

C# and Java are not the screwups that C++ was either. And C is not Ada.

The fact that the B-crew has botched a job in the past does not mean the
approach is wrong. The collapse of Ada did not prove that high-level
languages were a bad idea, though there were folk on Usenet making that
argument at the time. My college tutor was known for having walked out of
the Ada design meetings, but he still worked on many other programming
languages afterwards.

> I agree absolutely with Tim, if it wasn't clear before.  Semantics is vague
> and fuzzy (my semantics of your JSON may consist solely in counting the
> number of fields in the top-level object, for example).  Syntax is not.
> Agreement on syntax promotes interoperation; agreement on semantics
> takes forever, pushing syntax to the rear, thus tending to create bad
> syntax.

JSON has no semantics. All the JSON syntax has is a mapping to an implicit
data model.
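
A small illustration of that mapping, as a sketch using Python's standard
json module (the sample texts and variable names are made up for this
example): two texts that differ in whitespace, member order and string
escaping come out as the same value in the data model.

```python
import json

# Two syntactically different JSON texts: one compact and using a \u escape,
# the other spaced out with the members in a different order.
compact = json.loads('{"name":"\\u0041da","n":[1,2]}')
spaced = json.loads('{ "n" : [ 1 , 2 ] , "name" : "Ada" }')

# Both map to the same value in the implicit data model.
same_value = compact == spaced
```

Nothing in either text says what "name" or "n" means; that is left
entirely to the application.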

Semantics can be defined very precisely, and most modern computer science
courses teach methods of doing that. Z, VDM and the rest are all very
precise ways of specifying the behavior of a protocol, with as high a degree
of precision as LR(0) parsers allow for syntax.

What I have found is that making use of syntax to encode semantics is
always a mistake. For example, we might have a specification that says that
a Date field can be either an RFC 3339 DateTime or an integer giving the
offset in seconds from the current time. That is certainly possible, but we
now require the parser to recognize the format of the data and select the
representation appropriately. Maybe 95% of people will do that right, but at
least 5% will do it wrong, and if one of those implementations becomes
popular we end up with a specification that now has three different cases
to code: the two in the spec and the real-life situation that isn't.

Using different tags for "DateTime" (=String) and "DeltaTime" (=Integer)
avoids that whole business.
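
A minimal sketch of the tagged approach in Python, assuming a message that
carries exactly one of the two fields. The function name resolve_time and
the error handling are hypothetical; only the "DateTime" and "DeltaTime"
tags come from the text above. The reader dispatches on the tag and never
has to guess the representation from the shape of the value.

```python
import json
from datetime import datetime, timedelta, timezone


def resolve_time(message: str, now: datetime) -> datetime:
    """Return the absolute time named by a tagged message (hypothetical helper)."""
    obj = json.loads(message)
    if "DateTime" in obj:
        value = obj["DateTime"]
        if not isinstance(value, str):
            raise ValueError("DateTime must be a string")
        # Timestamp with an explicit numeric offset, e.g. 2014-02-19T18:43:33+00:00
        return datetime.fromisoformat(value)
    if "DeltaTime" in obj:
        value = obj["DeltaTime"]
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError("DeltaTime must be an integer")
        # Offset in seconds from the current time
        return now + timedelta(seconds=value)
    raise ValueError("expected a DateTime or DeltaTime field")


now = datetime(2014, 2, 19, 18, 43, 33, tzinfo=timezone.utc)
absolute = resolve_time('{"DateTime": "2014-02-19T18:43:33+00:00"}', now)
relative = resolve_time('{"DeltaTime": 60}', now)
```

Each tag has one type, so a value of the wrong type is rejected outright
instead of being reinterpreted as the other representation.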

When people reach for Regular Expressions they are almost always doing
something that is better done without.