[Json] JSON text sequences and semi-online parsing (Re: Regarding JSON text sequence ambiguities (...)

Nico Williams <nico@cryptonector.com> Thu, 13 March 2014 23:14 UTC

Date: Thu, 13 Mar 2014 18:14:03 -0500
Message-ID: <CAK3OfOjT5YW85G9j_JYz=-X=xxoPj_CNj5=mwpKDNr_ypbmQ5w@mail.gmail.com>
From: Nico Williams <nico@cryptonector.com>
To: Tim Bray <tbray@textuality.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Archived-At: http://mailarchive.ietf.org/arch/msg/json/0oS47212lqAEAIFLB_AXPouCPqk
Cc: Paul Hoffman <paul.hoffman@vpnc.org>, "json@ietf.org" <json@ietf.org>, Matthew Morley <matt@mpcm.com>, Larry Masinter <masinter@adobe.com>
Subject: [Json] JSON text sequences and semi-online parsing (Re: Regarding JSON text sequence ambiguities (...)

On Thu, Mar 13, 2014 at 4:43 PM, Tim Bray <tbray@textuality.com> wrote:
> I’m halfway convinced that it might be useful to say that if you are
> persisting something as chunks of JSON, use X as a separator.  I wouldn’t be

But they aren't chunks.  They are proper, fully-formed JSON texts.

> surprised to see the advent of standardized log-as-JSON libraries. I’d go

Neither would I.

> further and say that if you are doing this, use objects too.  Having said

I'm going to write a separate reply to this.  Your insistence on
top-level objectness deserves a complete treatment.

> that, it’s not obvious that this is the kind of format that you’d
> interchange much, so the case for a media type is weak.

Let's talk about not-online, online, and what I'll call semi-online parsing.

Consider jq once more.  At this time jq's parser handles incremental
parsing, but is not online.  This means that it can parse a top-level
array of 10e6 values, say, incrementally but it won't produce any
values until the final ']' is read.  But jq gladly reads JSON text
sequences, producing each top-level value as it is read.  Therefore jq
can handle JSON text sequences in a semi-online way: online for the
sequence, not online for each element.

A semi-online parsing approach actually happens to be a very useful
way to deal with logs and databases, where you know you're dealing
with a sequence of things, possibly of indeterminate size, possibly
infinite (tail -f ...), but where each element of the sequence is
generally expected to be small.

Not-at-all-online parsing simply does not work in these cases.  But
fully-online parsing is a real PITA, since the parser then produces
outputs like [<path>, <leaf value>] or worse, and one is likely to
aggregate results that form discrete entities, usually one or two
levels down from the top-level.
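
To make the [<path>, <leaf value>] shape concrete, here is a rough
Python sketch.  It is not an online parser: it walks a value that has
already been fully parsed, purely to show what kind of events a
consumer of a fully-online parser would have to reassemble.  The name
path_leaf_events is made up for illustration.

```python
import json
from typing import Any, Iterator, List, Tuple


def path_leaf_events(value: Any, path: Tuple = ()) -> Iterator[List[Any]]:
    """Yield [path, leaf] pairs for an already-parsed JSON value.

    Illustration only: a real online parser would emit these while
    reading input, without ever holding the whole value in memory.
    """
    if isinstance(value, dict) and value:
        for key, child in value.items():
            yield from path_leaf_events(child, path + (key,))
    elif isinstance(value, list) and value:
        for index, child in enumerate(value):
            yield from path_leaf_events(child, path + (index,))
    else:
        # Scalars (and empty containers) are treated as leaves.
        yield [list(path), value]


record = json.loads('{"a": [1, {"b": 2}], "c": "x"}')
for event in path_leaf_events(record):
    print(event)
```

A consumer that wants whole records back has to group these events by
path prefix, which is the aggregation burden described above.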

Now, it should be possible to build a semi-online JSON parser that is
online only for top-level arrays.  But if you have a not-online JSON
parser you can just use JSON text sequences to build a semi-online
JSON text sequence parser.

I.e., it's trivial to implement semi-online parsing of JSON text
sequences with any off-the-shelf JSON text parser.

That's a *huge* win because not-online JSON parsers are a dime a dozen,
and a semi-online parse is pretty much the sweet spot for online vs.
simple.  This is exactly what jq does, as described above.
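
A minimal sketch of that idea in Python, using the standard json
module as the off-the-shelf not-online parser.  This is not how jq is
implemented.  It assumes, purely for illustration, that each JSON text
sits on its own line; the separator is precisely what this thread has
not settled.  The name iter_json_texts is hypothetical.

```python
import io
import json
from typing import Any, Iterator, TextIO


def iter_json_texts(stream: TextIO) -> Iterator[Any]:
    """Yield each top-level JSON value as soon as its line is read.

    Online for the sequence, not online for each element: json.loads
    parses a whole text at once, but only one text is held at a time.
    Assumes one JSON text per line (an assumption, not a standard).
    """
    for line in stream:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between texts
        yield json.loads(line)


log = io.StringIO('{"event": "start"}\n[1, 2, 3]\n"done"\n')
for value in iter_json_texts(log):
    print(value)
```

Because the function is a generator over the stream, it also works on
input that never ends (the tail -f case), as long as each element is
small enough to parse in one go.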