Re: [Json] Regarding JSON text sequence ambiguities (Re: serializing sequences of JSON values)

Nico Williams <> Tue, 11 March 2014 00:15 UTC

Date: Mon, 10 Mar 2014 19:15:06 -0500
Message-ID: <>
From: Nico Williams <>
To: John Cowan <>
Content-Type: text/plain; charset=UTF-8
Cc: Paul Hoffman <>, "" <>

On Mon, Mar 10, 2014 at 6:59 PM, John Cowan <> wrote:
> Nico Williams scripsit:
>> jq handles the ambiguity by requiring top-level values to be
>> "terminated" by any of: unambiguous top-level values, whitespace, or
>> EOF.  Thus the sequence of null, true, false, 1, and 2, requires
>> whitespace between every value.
> Why is that?  The input "nulltruefalse1 2" would be unambiguous.

The input "[nulltrue]" is clearly an error.  The input "nulltrue"
could be either an error, or null followed by true.  Since there are
two possible interpretations of "nulltrue", it is ambiguous.  But you're
right that one could take the position that the "e" in "true" and
"false" terminates those values, and the second "l" in "null"
terminates that one, and that resolves that ambiguity.  But there's
still an ambiguity in an online parser: is online parsing of
"[nulltrue]" to produce an array element value of null followed by an
error, or is it to produce just the one error?
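
To make the two readings of "nulltrue" concrete, here is a rough
Python sketch.  It is not jq's code; parse_sequence and its "strict"
flag are names I made up for illustration, and the strict rule is only
my reading of "terminated by unambiguous top-level values, whitespace,
or EOF".

```python
import json

WHITESPACE = " \t\r\n"
# Characters that open a self-delimiting value (array, object, string).
SELF_DELIMITED_START = '[{"'


def parse_sequence(text, strict=True):
    """Parse a concatenation of top-level JSON values (hypothetical helper).

    strict=True:  null/true/false/numbers must be followed by whitespace,
                  EOF, or the start of an array, object, or string.
    strict=False: a value ends wherever its grammar ends, so "nulltrue"
                  is read as null followed by true.
    """
    decoder = json.JSONDecoder()
    values = []
    pos = 0
    end = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and text[pos] in WHITESPACE:
            pos += 1
        if pos == end:
            return values
        start = pos
        value, pos = decoder.raw_decode(text, pos)
        is_scalar = text[start] not in SELF_DELIMITED_START
        if strict and is_scalar and pos < end:
            following = text[pos]
            if following not in WHITESPACE and following not in SELF_DELIMITED_START:
                raise ValueError(
                    "top-level value ending at offset %d is not terminated" % pos
                )
        values.append(value)
```

With strict=False the input "nulltruefalse1 2" parses as five values;
with strict=True it is rejected, while "null true false 1 2" is
accepted.  "[nulltrue]" is an error either way, since the error is
raised inside the array.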

Given that we have to separate top-level numbers with whitespace
anyway, I think the safe position is that "nulltrue" and so on are
invalid, and that when emitting top-level values other than
arrays/objects/strings (and even for those) the encoding
application should follow the encoded value with whitespace (a
newline, preferably).
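
On the emitting side that amounts to something like the following
Python sketch.  encode_sequence is a hypothetical name; the point is
only that the application, not the encoder, writes the newline after
each value.

```python
import json


def encode_sequence(values):
    """Encode each value as its own JSON text, each followed by a newline.

    json.dumps without an indent argument escapes newlines inside strings,
    so every encoded value stays on one line.
    """
    return "".join(json.dumps(value) + "\n" for value in values)
```

For the values null, true, false, 1, 2 this gives five lines, one
value per line, so a reader never sees "nulltrue" or "12" run together.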

(I say encoding application because I believe an encoder is not
defined in a way that permits it to emit a sequence of more than
one top-level value in a single octet stream output.)