Re: [Json] JSON by example

Phillip Hallam-Baker <ietf@hallambaker.com> Thu, 26 May 2016 21:57 UTC

In-Reply-To: <20160526205033.GD19074@mercury.ccil.org>
References: <CAMm+Lwg2rWh0_gjXnSAEAvWtsMO1U3UiA8jsBzc+rRR6fiKcJg@mail.gmail.com> <20160526173719.GA19074@mercury.ccil.org> <CAMm+Lwgzdw5gBdBvmA6aHQ55mcHXnQzabrfdA4DObtyReauuWA@mail.gmail.com> <20160526205033.GD19074@mercury.ccil.org>
Date: Thu, 26 May 2016 17:57:51 -0400
Message-ID: <CAMm+LwgqS4kac_0M4c+E2+SmsSBM+bekMcTiTMb-pVD_G=OFDg@mail.gmail.com>
From: Phillip Hallam-Baker <ietf@hallambaker.com>
To: John Cowan <cowan@mercury.ccil.org>
Archived-At: <http://mailarchive.ietf.org/arch/msg/json/fJ8kgffvV32dJk1VGqHqvdxuwcY>
Cc: Tim Bray <tbray@textuality.com>, "json@ietf.org" <json@ietf.org>, Austin William Wright <aaa@bzfx.net>, Andrew Newton <andy@hxr.us>

On Thu, May 26, 2016 at 4:50 PM, John Cowan <cowan@mercury.ccil.org> wrote:

> Phillip Hallam-Baker scripsit:
>
> > 'Wirthless'
>
> "You may call me by value, and call me 'Worth', or you may call me by
> name, and call me 'Veert'."  --Wirth on how to say his name.
>

The nickname I used was given him by someone just as well known in that
crowd :)



> > Yes, real numbers and integers are probably things you want to be able
> > to treat differently in code because it is very very rare to need or
> > want a real64 number in a protocol.
>
> Okay, an integer encodes the type "JSON number which is an integer",
> and a non-integer encodes the type "JSON number".  Note that this might
> be represented as a Java BigDecimal or equivalent.
>

Yes, there are cases where distinguishing representations is useful. In my
view this does not include bignums for crypto; I would prefer those to be
in hex or base64.

I do think it is useful to distinguish integers and reals at the schema
level, as the treatment is very different. In fact I do that, but I have
never ever used that code. If you were doing scientific stuff, being able to
represent numbers as fractions and the like, as Wolfram does, would be very
desirable.
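
A minimal Python sketch of telling the two apart at decode time. It is
only an illustration: the helper name classify_number is made up here, and it
relies on the standard json module returning int for number tokens without a
fraction or exponent. Note that under this rule "1e3" counts as a real even
though its value is whole.

```python
import json
from decimal import Decimal

def classify_number(text):
    """Return 'integer' or 'real' for a JSON text holding a single number."""
    # parse_float=Decimal keeps non-integers exact instead of rounding to binary64.
    value = json.loads(text, parse_float=Decimal)
    # bool is a subclass of int in Python, so rule it out explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, Decimal)):
        raise ValueError("not a JSON number")
    return "integer" if isinstance(value, int) else "real"

for sample in ("42", "42.5", "1e3"):
    print(sample, classify_number(sample))
```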



> > But I think that you absolutely want to be able to put binary blobs of
> > data in a stream and distinguish them from strings.
>
> Good idea.  If the string is "deadbeef", it encodes the type of binary
> blobs represented in JSON as a string. :-)
>

Yes, there are a small number of intrinsic types that we use in protocols.
It is slightly larger than JSON gives us but not very much larger.
Specifically:

* Boolean, Integer, String
* Binary (Base64-encoded data in a string)
* DateTime (IETF time format in a string)
* URI (in a string)
* Label (string)
* Reference (string)
* Token (basically a Reference, but does not require the label to be defined)

I have parser generator tools for pretty much every IETF format (ASN.1, TLS
Schema, XML, DNS, JSON, RFC822). The types listed above are the only ones I
have ever found a need for (counting OIDs as a subset of URI).
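
As a rough sketch of how a decoder might recover two of these types from JSON
strings, here is some Python. The function names and the DECODERS table are
hypothetical, and the timestamp format is assumed to be RFC 3339 style, which
the list above does not actually specify.

```python
import base64
from datetime import datetime

def decode_binary(s):
    # validate=True rejects characters outside the Base64 alphabet.
    return base64.b64decode(s, validate=True)

def decode_datetime(s):
    # Assumption: an RFC 3339 style timestamp. A trailing "Z" is rewritten
    # because older Python versions do not accept it in fromisoformat().
    if s.endswith("Z"):
        s = s[:-1] + "+00:00"
    return datetime.fromisoformat(s)

# Hypothetical table: schema type name -> decoder for the string form.
DECODERS = {"Binary": decode_binary, "DateTime": decode_datetime}

def decode_field(kind, s):
    return DECODERS[kind](s)

print(decode_field("Binary", "3q2+7w==").hex())
print(decode_field("DateTime", "2016-05-26T21:57:51Z").isoformat())
```

The point is only that the schema, not the JSON syntax, says which strings are
blobs or dates; the wire value is a plain string either way.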



> > That is an interesting one because it comes down to how do you want
> > extensibility to work. In particular, when do you want adding things
> > to be backwards compatible and when do you want to cause things to
> > halt rather than have an application try to act on data it did not
> > fully understand?
>
> I think you want to be able to choose, but I'm not sure how to represent
> it within the JSON-by-example format.  The obvious way is to reserve a
> key like "...", but I haven't wanted to do that.
>

The way that I am doing it now is that I support both as follows:

* Fields can be added to any data structure. These are simply ignored if
the parser does not understand them.

* If the field label changes, the semantics are different and so the
decoder should abort.


My encodings always give the type of an object as a field name. So if I
have a message Hello, the C# and JSON encodings are:

class Hello {
    string Param1, Param2;
}

{ "Hello" : { "Param1" : "Value1", "Param2" : "Value2" } }

If you change the protocol so that you can say who you are saying hello to,
the change is backwards compatible:

{ "Hello" : { "Param1" : "Value1", "Param2" : "Value2", "To" : "Bob" } }

But if we want to add in a completely different message that replaces
Hello, we use a different tag:

{ "Hello2" : { "Param1" : "Value1", "Param2" : "Value2" } }

This approach makes it very easy to ensure that an interpreter that does
not understand Hello2 does not attempt to do the right thing and get it
wrong.
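
A minimal Python sketch of a decoder that follows these two rules. The names
HANDLERS, UnknownMessage and decode_hello are invented for illustration and
are not taken from my tools.

```python
import json

class UnknownMessage(Exception):
    """Raised when the type tag is not one this decoder understands."""

def decode_hello(body):
    # Keep only the fields this version knows; extras such as "To" are ignored.
    return {"Param1": body.get("Param1"), "Param2": body.get("Param2")}

# Hypothetical registry: type tag -> decoder for the message body.
HANDLERS = {"Hello": decode_hello}

def decode(text):
    message = json.loads(text)
    if not isinstance(message, dict) or len(message) != 1:
        raise ValueError("expected an object with exactly one type tag")
    (tag, body), = message.items()
    if tag not in HANDLERS:
        # A changed tag means changed semantics, so abort rather than guess.
        raise UnknownMessage(tag)
    if not isinstance(body, dict):
        raise ValueError("message body must be an object")
    return tag, HANDLERS[tag](body)

print(decode('{ "Hello" : { "Param1" : "Value1", "Param2" : "Value2", "To" : "Bob" } }'))
```

An old decoder accepts the message with the extra "To" field and drops that
field, while a "Hello2" message raises UnknownMessage before any field is
looked at.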


This works very well and avoids the problems that people get themselves
wrapped around the axle over when they try to use type fields:

{ "Param1" : "Value1", "Param2" : "Value2", "Type" : "Hello" }

Why should anyone expect "Type" to be more important than the other fields?
It really doesn't make any sense. JSON object members are unordered, so the
type field can also come last, which means that with a long message we have
to read the whole thing before we can try to make any sense of it.

*Object type should always be specified in a key, never a value.*



> > This is actually a little problematic as in JSON any slot can hold
> > null while many languages don't allow integers or booleans to be null.
>
> True.  We can say then that null is always permitted wherever the schema
> allows an array or object.
>

That isn't necessarily what you want though. If you are using a schema,
there are actually four possibilities for a boolean slot 'fred':

true   { "fred" : true }
false  { "fred" : false }
null   { "fred" : null }
empty  { }

Now you could in theory write a protocol that distinguished between all
four cases, but I don't think that is very useful at all. I don't think it
is practical to rely on a difference between empty and null.

You could also argue that omitting a value is equivalent to false. But that
is a case where I think you actually should be required to declare it.
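
To make the four cases concrete, here is a small Python sketch. The names
MISSING, slot_state and read_bool are made up for this example; read_bool
shows one possible policy, in which null and empty are treated alike and
neither is silently turned into false.

```python
import json

MISSING = object()  # sentinel meaning "the key is not present at all"

def slot_state(text, key="fred"):
    """Classify a boolean slot as 'true', 'false', 'null' or 'empty'."""
    value = json.loads(text).get(key, MISSING)
    if value is MISSING:
        return "empty"
    if value is None:
        return "null"
    if value is True:
        return "true"
    if value is False:
        return "false"
    raise ValueError("slot does not hold a boolean")

def read_bool(text, key="fred"):
    # Policy sketch: null and empty both become None, i.e. "not declared".
    return {"true": True, "false": False}.get(slot_state(text, key))

for doc in ('{ "fred" : true }', '{ "fred" : false }', '{ "fred" : null }', '{ }'):
    print(slot_state(doc), read_bool(doc))
```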