[Json] Regarding JSON text sequence ambiguities (Re: serializing sequences of JSON values)

Nico Williams <nico@cryptonector.com> Mon, 10 March 2014 22:06 UTC

Date: Mon, 10 Mar 2014 17:06:43 -0500
Message-ID: <CAK3OfOj_XQJq-JKAjNdH-GuH0_UwZfeWntgyyizMpTLmSaWQoA@mail.gmail.com>
From: Nico Williams <nico@cryptonector.com>
To: Paul Hoffman <paul.hoffman@vpnc.org>
Archived-At: http://mailarchive.ietf.org/arch/msg/json/4BrFdVdXhhGeNAwSdFoav6uPzcY
Cc: "json@ietf.org" <json@ietf.org>
Subject: [Json] Regarding JSON text sequence ambiguities (Re: serializing sequences of JSON values)

On Mon, Mar 10, 2014 at 3:19 PM, Paul Hoffman <paul.hoffman@vpnc.org> wrote:
> [ No hat ]
> On Mar 10, 2014, at 7:35 PM, Paul E. Jones <paulej@packetizer.com> wrote:
>> Why would sequences of objects not be preferred for logging?
> Because the log is always open for appending, so the sequence never terminates.


Other use cases involve transferring large sequences of texts.  For
example, when serializing a database it may be easier to encode the
serialization as multiple top-level values than as one large array
(because, e.g., the encoder may not support online, i.e. incremental,
output).  Yes, yes, the application could emit a '[' to start, ','
between top-level values, and a ']' at the end.  But if the sequence
is of indeterminate length (because it is an event log that includes
future events as they happen), then the parser on the other end had
better be an online parser (at least as to top-level array elements)!
Sure, the application could specially handle top-level arrays to
"explode" them in an online manner, but this still requires the parser
to identify exactly where in the input stream each array element ends.

IOW, some use cases are easier to code as sequences of top-level values.
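The "explode the top-level array" workaround can be sketched in Python as follows.  This is a minimal illustration under stated assumptions, not code from this thread: the function name `explode_array_prefix` is hypothetical, and it assumes all text received so far is available as one string.  It relies on the standard library's `json.JSONDecoder.raw_decode`, which decodes one value starting at a given index and returns the index just past its end, which is exactly the information the parser has to supply.

```python
import json

_WS = " \t\n\r"  # JSON's four insignificant-whitespace characters


def _skip_ws(text, i):
    """Return the first index at or after i that is not JSON whitespace."""
    while i < len(text) and text[i] in _WS:
        i += 1
    return i


def explode_array_prefix(text):
    """Pull complete elements out of a possibly unterminated top-level
    array such as '[{"id": 1}, {"id": 2}, {"id"' (hypothetical helper).

    Returns (elements, closed): the elements whose end is certain so far,
    and whether the closing ']' has arrived.  An element is accepted only
    once the ',' or ']' after it has been read, since a trailing "12" may
    still turn out to be "123".  Malformed input is not detected; this
    sketch simply waits for more data.
    """
    decoder = json.JSONDecoder()
    elements = []
    pos = _skip_ws(text, 0)
    if pos == len(text):
        return elements, False
    if text[pos] != "[":
        raise ValueError("expected a top-level array")
    pos = _skip_ws(text, pos + 1)
    if pos < len(text) and text[pos] == "]":
        return elements, True  # empty array
    while pos < len(text):
        try:
            value, end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            break  # element still incomplete
        after = _skip_ws(text, end)
        if after == len(text) or text[after] not in ",]":
            break  # cannot yet tell where this element ends
        elements.append(value)
        if text[after] == "]":
            return elements, True
        pos = _skip_ws(text, after + 1)
    return elements, False
```

For the prefix '[{"id": 1}, {"id": 2}, {"id"', `json.loads` raises an error, while this helper returns the first two objects with `closed` set to False.  Note that even with arrays, the parser still has to hold back a trailing bare number until the next byte shows where it ends.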

> This, and many other related topics, were discussed at length earlier on the mailing list and the rough consensus was that the current wording was sufficient.

Yes, though it's worth going into this a bit further.

There are two ambiguities, then.  First, the values true, false,
null, and numeric values are all ambiguous unless separated by
whitespace or by unambiguous values (strings, arrays, or objects):
e.g., the numbers 1 and 2 written back to back read as the single
number 12.  The second ambiguity is between true/false/null/numeric
values and EOF: a parser fed 'true' cannot produce that value (or a
parse error) until it reads one more byte _or_ detects EOF.
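Both ambiguities can be observed with Python's standard `json` module, as in the small sketch below.  `first_value` is a hypothetical helper wrapping `json.JSONDecoder.raw_decode`, which decodes one value starting at a given index and returns it together with the index just past its end.

```python
import json

_decoder = json.JSONDecoder()


def first_value(text, pos=0):
    """Decode the JSON value starting at text[pos]; return (value, end)."""
    return _decoder.raw_decode(text, pos)


# Ambiguity 1: two scalars written back to back with no separator.
run_together = json.dumps(1) + json.dumps(2)       # the text "12"
print(first_value(run_together))                   # (12, 2): one number, not two

# Whitespace between them keeps the values distinct.
separated = json.dumps(1) + " " + json.dumps(2)    # the text "1 2"
print(first_value(separated))                      # (1, 1)
print(first_value(separated, 2))                   # (2, 3)

# Objects, arrays, and strings are self-delimiting, so they may abut.
abutting = '{"a":1}{"b":2}'
print(first_value(abutting))                       # ({'a': 1}, 7)

# Ambiguity 2: a streaming reader that has so far received only "12"
# cannot tell whether the sender will still write "3".  Decoding the
# prefix now yields 12; decoding once one more byte arrives yields 123.
print(first_value("12"))                           # (12, 2)
print(first_value("123"))                          # (123, 3)
```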

This means that top-level values other than arrays, objects, and
strings should be immediately _followed_ by whitespace when sequences
of such top-level values are emitted.  Note: _followed_, not
_preceded_; it's OK to also precede them with whitespace, but it is
necessary to follow them with whitespace in order to avoid the
EOF/error ambiguity.
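A minimal sketch of both sides of this rule, assuming Python's standard `json` module and a text (not bytes) stream; `encode_sequence` and `SequenceReader` are hypothetical names, not an existing API.  The writer follows every value with a newline.  The reader treats strings, arrays, and objects as self-delimiting, but yields a bare scalar only once whitespace after it has arrived or EOF has been signaled via `close()`.

```python
import json

WS = " \t\n\r"  # JSON's four insignificant-whitespace characters


def encode_sequence(values):
    """Serialize each value as its own top-level JSON text, *followed*
    by a newline, so no scalar is left dangling at the end."""
    return "".join(json.dumps(v) + "\n" for v in values)


class SequenceReader:
    """Incremental reader for a sequence of top-level JSON texts
    (hypothetical sketch, not a library API)."""

    def __init__(self):
        self._buf = ""
        self._decoder = json.JSONDecoder()

    def feed(self, chunk):
        """Add newly received text; return the values now known complete."""
        self._buf += chunk
        return self._drain(eof=False)

    def close(self):
        """Signal EOF; return any final values, or raise on leftovers."""
        return self._drain(eof=True)

    def _drain(self, eof):
        out = []
        buf = self._buf
        pos = 0
        while True:
            while pos < len(buf) and buf[pos] in WS:
                pos += 1
            if pos == len(buf):
                break
            if buf[pos] in '{["':
                # Self-delimiting value: the decoder finds its end.
                try:
                    value, end = self._decoder.raw_decode(buf, pos)
                except json.JSONDecodeError:
                    if eof:
                        raise
                    break  # assume truncated; wait for more input
            else:
                # Bare scalar: its extent is known only once whitespace
                # (or EOF) follows it.  A scalar glued to the next value,
                # e.g. '1{...}', fails in json.loads and is rejected.
                end = pos
                while end < len(buf) and buf[end] not in WS:
                    end += 1
                if end == len(buf) and not eof:
                    break  # "12" might still become "123"
                value = json.loads(buf[pos:end])
            out.append(value)
            pos = end
        self._buf = buf[pos:]
        return out
```

Feeding '{"id": 1}\n12' yields only the object, because "12" may still grow; feeding '3\ntrue' then yields 123, and `close()` finally yields True.  One simplification to keep in mind: a malformed object or array is indistinguishable from a truncated one until `close()` is called.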