[Json] JSON text sequences and semi-online parsing (Re: Regarding JSON text sequence ambiguities (...)
Nico Williams <nico@cryptonector.com> Thu, 13 March 2014 23:14 UTC
Date: Thu, 13 Mar 2014 18:14:03 -0500
Message-ID: <CAK3OfOjT5YW85G9j_JYz=-X=xxoPj_CNj5=mwpKDNr_ypbmQ5w@mail.gmail.com>
From: Nico Williams <nico@cryptonector.com>
To: Tim Bray <tbray@textuality.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Archived-At: http://mailarchive.ietf.org/arch/msg/json/0oS47212lqAEAIFLB_AXPouCPqk
Cc: Paul Hoffman <paul.hoffman@vpnc.org>, "json@ietf.org" <json@ietf.org>,
Matthew Morley <matt@mpcm.com>, Larry Masinter <masinter@adobe.com>
Subject: [Json] JSON text sequences and semi-online parsing (Re: Regarding
JSON text sequence ambiguities (...)

On Thu, Mar 13, 2014 at 4:43 PM, Tim Bray <tbray@textuality.com> wrote:
> I’m halfway convinced that it might be useful to say that if you are
> persisting something as chunks of JSON, use X as a separator.  I wouldn’t be

But they aren't chunks.  They are proper, fully-formed JSON texts.

> surprised to see the advent of standardized log-as-JSON libraries.  I’d go

Neither would I.

> further and say that if you are doing this, use objects too.  Having said

I'm going to write a separate reply to this.  Your insistence on
top-level objectness deserves a complete treatment.

> that, it’s not obvious that this is the kind of format that you’d
> interchange much, so the case for a media type is weak.

Let's talk about not-online, online, and what I'll call semi-online
parsing.

Consider jq once more.  At this time jq's parser handles incremental
parsing, but is not online.  This means that it can parse a top-level
array of 10e6 values, say, incrementally, but it won't produce any
values until the final ']' is read.

But jq gladly reads JSON text sequences, producing each top-level value
as it is read.  Therefore jq can handle JSON text sequences in a
semi-online way: online for the sequence, not online for each element.

A semi-online parsing approach actually happens to be a very useful way
to deal with logs and databases, where you know you're dealing with a
sequence of things, possibly of indeterminate size, possibly infinite
(tail -f ...), but where each element of the sequence is generally
expected to be small.

Not-at-all-online parsing simply does not work in these cases.  But
fully-online parsing is a real PITA, since the parser then produces
outputs like [<path>, <leaf value>] or worse, and one is likely to
aggregate results that form discrete entities, usually one or two
levels down from the top-level.

Now, it should be possible to build a semi-online JSON parser that is
online only for top-level arrays.  But if you have a not-online JSON
parser you can just use JSON text sequences to build a semi-online JSON
text sequence parser.  I.e., it's trivial to implement semi-online
parsing of JSON text sequences with any off-the-shelf JSON text parser.

That's a *huge* win because not-online JSON parsers are a dime a dozen,
and a semi-online parse is pretty much the sweet spot for online vs.
simple.  This is exactly what jq does, as described above.

Nico
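
To illustrate the point above about building a semi-online sequence
parser from a not-online one, here is a minimal sketch in Python.  It
is not how jq does it.  It assumes, purely for illustration, that the
separator is a newline and that each JSON text sits on a single line;
the thread has not settled on a separator.  The name iter_json_texts is
hypothetical.  The only parser call is the standard library's
json.loads, which is not online.

```python
import json
from typing import Any, Iterable, Iterator


def iter_json_texts(chunks: Iterable[str]) -> Iterator[Any]:
    """Yield each top-level value as soon as its whole text has arrived.

    Assumption (not from the thread): texts are newline-separated and
    no text contains a raw newline.
    """
    buf = ""
    for chunk in chunks:
        buf += chunk
        # Online for the sequence: emit every complete line right away.
        while "\n" in buf:
            line, buf = buf.split("\n", 1)
            if line.strip():
                # Not online for the element: parse the whole text at once.
                yield json.loads(line)
    # End of input also terminates the last text; a truncated final
    # text makes json.loads raise json.JSONDecodeError here.
    if buf.strip():
        yield json.loads(buf)
```

Passing sys.stdin as chunks gives the tail -f style of use: memory is
bounded by the largest single element, not by the length of the
sequence.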