Re: [Json] Regarding JSON text sequence ambiguities (Re: serializing sequences of JSON values)

Nico Williams <> Mon, 10 March 2014 22:12 UTC

From: Nico Williams <>
To: Paul Hoffman <>

Consider jq as an example.

jq can and will read a sequence of JSON texts from stdin.  That
sequence isn't an array.  jq applies its program to each input and
writes out, encoded as JSON texts, all the values that the program
produces.  This is exceedingly convenient, including for log files
(for the reason that you give: a log file is an unending, append-only
sequence of indeterminate length).

jq handles the ambiguity by requiring top-level values to be
"terminated" by any of: unambiguous top-level values, whitespace, or
EOF.  Thus the sequence null, true, false, 1, 2 requires whitespace
between every value, while the sequence true, "foo", false, "bar", 1,
"foobar", 2 doesn't.  An input that looks like "truefalsenull12"
elicits a parse error.

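To make the termination rule above concrete, here is a minimal Python
sketch.  It is not jq's code.  The name split_json_texts is
hypothetical, and I am assuming that "unambiguous top-level values"
means values that start with a quote, '[' or '{'.  Note also that
Python's json module is more lenient than the JSON grammar (it accepts
NaN, for instance).

```python
import json

_DECODER = json.JSONDecoder()
_WS = " \t\r\n"
_SELF_DELIMITING = '"[{'  # assumption: strings, arrays, objects


def split_json_texts(text):
    """Split a string holding a sequence of JSON texts into values.

    A bare token (null, true, false, or a number) must be followed by
    whitespace, EOF, or the start of a self-delimiting value.
    """
    values = []
    pos = 0
    n = len(text)
    while True:
        # Skip whitespace between top-level values.
        while pos < n and text[pos] in _WS:
            pos += 1
        if pos == n:
            return values
        start = pos
        # raw_decode raises json.JSONDecodeError (a ValueError) on bad input.
        value, pos = _DECODER.raw_decode(text, pos)
        bare = text[start] not in _SELF_DELIMITING
        if bare and pos < n and text[pos] not in _WS + _SELF_DELIMITING:
            raise ValueError(
                "ambiguous input after %r at offset %d" % (text[start:pos], pos)
            )
        values.append(value)
```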
Note that the jq parser does NOT in fact parse multiple top-level
values; it parses at most one top-level value.  Instead, the program
using the parser feeds input bytes to the parser until the parser
finds the end of a top-level value and outputs it.  Remaining unparsed
bytes are buffered, of course, so that when the program adds more
input bytes the parser can be restarted to parse the next top-level
value.

This does mean that the jq parser will not output null when fed 'null'
until one more byte _or_ EOF is fed to it.  But if fed '[1,2]', the
parser emits the parsed array value immediately when the closing
bracket is parsed, without waiting for further input.
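
The feed-and-buffer behaviour described above can be sketched in
Python roughly as follows.  This is an illustration only, not jq's
parser: the class IncrementalReader and its feed/close methods are
hypothetical names, it works on str rather than bytes, and it
re-parses the buffer on every call.  It also treats any failure to
parse a string, array or object as "need more input", so malformed
input of that kind is only reported at close().

```python
import json

_DECODER = json.JSONDecoder()
_WS = " \t\r\n"
_SELF_DELIMITING = '"[{'  # assumption: strings, arrays, objects


class IncrementalReader:
    """Hypothetical push-style reader for a sequence of JSON texts."""

    def __init__(self):
        self._buf = ""

    def feed(self, chunk):
        """Add input; return the top-level values completed by it."""
        self._buf += chunk
        return self._drain(at_eof=False)

    def close(self):
        """Signal EOF; return any value that was waiting for it."""
        values = self._drain(at_eof=True)
        if self._buf.strip(_WS):
            raise ValueError("truncated JSON text at EOF")
        return values

    def _drain(self, at_eof):
        values = []
        while True:
            buf = self._buf.lstrip(_WS)
            if not buf:
                self._buf = ""
                return values
            if buf[0] in _SELF_DELIMITING:
                # Complete as soon as the closing quote/bracket arrives.
                try:
                    value, end = _DECODER.raw_decode(buf)
                except json.JSONDecodeError:
                    self._buf = buf  # assume incomplete; wait for more
                    return values
            else:
                # Bare token: its end is known only once a terminator
                # (whitespace or a self-delimiting value) or EOF is seen.
                end = 0
                while end < len(buf) and buf[end] not in _WS + _SELF_DELIMITING:
                    end += 1
                if end == len(buf) and not at_eof:
                    self._buf = buf
                    return values
                value = json.loads(buf[:end])  # ValueError if not one value
            values.append(value)
            self._buf = buf[end:]
```

With this sketch, feed('null') returns nothing until a later feed(' ')
or close(), whereas feed('[1,2]') returns the array at once.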

jq always outputs a newline (though it could be space or tab) after
outputting any top-level value's JSON text encoding.  It does so
precisely to avoid these ambiguities.

(This is also why jq must continue to output a newline or other
whitespace after every output text, except, perhaps, when in raw
output mode, in which case the outputs aren't JSON texts.)
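
On the output side, the same idea is a one-liner per value.  A minimal
Python sketch, again with a hypothetical name (write_json_texts) and
not taken from jq:

```python
import json


def write_json_texts(values, out):
    """Write each value as a JSON text followed by a newline."""
    for value in values:
        out.write(json.dumps(value))
        # The newline terminates bare tokens such as null, true or 12.
        out.write("\n")
```

Without the newline, the values null, true, false, 1, 2 would come out
as nulltruefalse12, which is exactly the ambiguous case.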