Re: [Json] Streaming JSON parsers

Stefan Drees <stefan@drees.name> Wed, 03 July 2013 05:44 UTC

On 2013-07-03 05:01 +02:00, John Cowan wrote:
> I was asked in private mail if there actually were any streaming JSON
> parsers.  I googled for that phrase, and immediately found that
> <http://stackoverflow.com/questions/444380/is-there-a-streaming-api-for-json>
> mentions at least eight of them:  Jackson, Json-simple, Gson, YAJL,
> Clarinet, LitJSON, the JSON Streaming Parser for PHP, and Software
> Monkey's parser (the last two appear not to have names).  I have not
> verified this claim.
>
> It was suggested to me that list members may not be aware of this.
>

Also, while developing version 4.0 of the Open Data Protocol (OData) 
specification at OASIS, extra care is being taken to make life easier 
for streaming JSON parsers. The main notion of streaming there implies 
at least a stable ordering of names within objects, and preferably the 
semantically most useful ordering for processing on the client side.
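To make the benefit of a stable order concrete, here is a minimal Python sketch; it is not taken from the OData draft. `iter_members` decodes a top-level object one member at a time using the standard library's `json.JSONDecoder.raw_decode`, and the hypothetical `entity_type` helper stops as soon as it knows the type, assuming a producer that promises `odata.type` can only be preceded by `odata.context`. It works on an in-memory string, so it is far simpler than the real streaming parsers John lists, and the member values (`#NS.Customer` and so on) are made up.

```python
import json
from typing import Iterator, Optional, Tuple

_decoder = json.JSONDecoder()


def _skip_ws(text: str, i: int) -> int:
    """Advance past JSON whitespace (space, tab, CR, LF)."""
    while i < len(text) and text[i] in " \t\r\n":
        i += 1
    return i


def iter_members(text: str) -> Iterator[Tuple[str, object]]:
    """Yield the (name, value) members of a top-level JSON object one at a time.

    A value is decoded only when the consumer asks for the next member, so a
    consumer that stops early never looks at the rest of the text.
    """
    i = _skip_ws(text, 0)
    if text[i:i + 1] != "{":
        raise ValueError("expected '{'")
    i = _skip_ws(text, i + 1)
    if text[i:i + 1] == "}":
        return
    while True:
        name, i = _decoder.raw_decode(text, i)
        if not isinstance(name, str):
            raise ValueError("member name must be a string")
        i = _skip_ws(text, i)
        if text[i:i + 1] != ":":
            raise ValueError("expected ':'")
        value, i = _decoder.raw_decode(text, _skip_ws(text, i + 1))
        yield name, value
        i = _skip_ws(text, i)
        if text[i:i + 1] == "}":
            return
        if text[i:i + 1] != ",":
            raise ValueError("expected ',' or '}'")
        i = _skip_ws(text, i + 1)


def entity_type(text: str) -> Optional[str]:
    """Return odata.type, trusting an assumed producer guarantee that only
    odata.context may come before it."""
    for name, value in iter_members(text):
        if name == "odata.type":
            return value
        if name != "odata.context":
            return None  # past the only place odata.type may appear
    return None


# The input is cut off mid-member, but the type is found before that point.
truncated = '{"odata.context": "$metadata#Customers/$entity", "odata.type": "#NS.Customer", "Name": '
print(entity_type(truncated))  # #NS.Customer
```

Because `entity_type` returns before reaching the truncated `Name` member, it succeeds on partial input. Fed a payload where `odata.type` comes after `Name`, the same early exit wrongly reports no type, which is why a consumer needs the producer's explicit ordering promise before relying on it.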

As OData defines a base protocol for delivering even high-volume data 
(including navigable metadata), this matters for most usage scenarios 
out in the wild.

The current second committee specification draft (going into public 
review this week) states, for example, in section 2 "JSON Format Design":

"""
JSON, as described in [RFC4627], defines a text format for serializing 
structured data. Objects are serialized as an unordered collection of 
name-value pairs.

JSON does not define any semantics around the name/value pairs that make 
up an object, nor does it define an extensibility mechanism for adding 
control information to a payload.

OData’s JSON format extends JSON by defining general conventions for 
name-value pairs that annotate a JSON object, property or array. OData 
defines a set of canonical annotations for control information such as 
ids, types, and links, and custom annotations MAY be used to add 
domain-specific information to the payload.

[...]

To optimize streaming scenarios, there are a few restrictions that MAY 
be imposed on the sequence in which name/value pairs appear within JSON 
objects.
"""

And especially this, from section 4.4 "Payload Ordering Constraints":
"""
Ordering constraints MAY be imposed on the JSON payload in order to 
support streaming scenarios. These ordering constraints MUST only be 
assumed if explicitly specified as some clients (and services) might not 
be able to control, or might not care about, the order of the JSON 
properties in the payload.

Clients can request that a JSON response conform to these ordering 
constraints by specifying a media type of application/json with the 
odata.streaming=true parameter in the Accept header or $format query 
option. Services MUST return 406 Not Acceptable if the client only 
requests streaming and the service does not support it.

Processors MUST only assume streaming support if it is explicitly 
indicated in the Content-Type header via the odata.streaming=true 
parameter.

   Example 3: a payload with

   Content-Type: application/json;odata.metadata=minimal;odata.streaming=true

   can be assumed to support streaming, whereas a payload with

   Content-Type: application/json;odata.metadata=minimal

   cannot be assumed to support streaming.

JSON producers are encouraged to follow the payload ordering constraints 
whenever possible (and include the odata.streaming=true content type 
parameter) to support the maximum set of client scenarios.

To support streaming scenarios the following payload ordering 
constraints have to be met:

* If present, the odata.context annotation MUST be the first property in 
the JSON object.

* The odata.type annotation, if present, MUST appear next in the JSON 
object.

* The odata.id and odata.etag annotations MUST appear before any 
property or property annotation.

* All annotations for a structural or navigation property MUST appear as 
a group immediately before the property they annotate. The one exception 
is the odata.nextlink annotation of an expanded collection which MAY 
appear after the navigation property it annotates.

* All other odata annotations can appear anywhere in the payload as long 
as they do not violate any of the above rules.

* Annotations for navigation properties MUST appear after all structural 
properties.
"""

URL: https://www.oasis-open.org/committees/document.php?document_id=49674&wg_abbrev=odata

I quoted generously because the URL points to a zip file containing a 
document in OfficeXML format, which may not be the most accessible 
format for list members who just want to skim through it.

{"Stefan":true}