Re: [Json] Regarding JSON text sequence ambiguities (Re: serializing sequences of JSON values)

Nico Williams <nico@cryptonector.com> Tue, 11 March 2014 18:59 UTC

In-Reply-To: <CAHBU6itdCdJE3t8gE=AOcOofaFORfJxZxo3ZqbpF9nTMv_CYaQ@mail.gmail.com>
References: <CAK3OfOj_XQJq-JKAjNdH-GuH0_UwZfeWntgyyizMpTLmSaWQoA@mail.gmail.com> <CAK3OfOio58+1yuxQOcvWep1CADMfE1PVC48XDid0dWvd8=SVjA@mail.gmail.com> <CAOXDeqoYb=NXz4ikMxAg3EHFA+903bFgdpR_BL-K18U2oYriXQ@mail.gmail.com> <CAK3OfOiPDfWpOZgExTmwwq6WFcuVbyi_z3C0=M9RhQveBhV_+w@mail.gmail.com> <CAHBU6iuRyRd95Wa_omGS1_T52t+s0AKjWPUW21EAh2ySHuFp=A@mail.gmail.com> <CAMm+LwjRA8x0=zXGRVDy0BqYvyOcEp7=gnUiG4vYOb1RScoyrA@mail.gmail.com> <CAK3OfOj1g_sbnhw9FBCCZtLWsFS5F+aoPX0d5AMkRxQ2fHQi0A@mail.gmail.com> <CAOXDeqpbSmEicxq_JzJa2iQDn8uJp3XkWp3FGbsbpg-_vgOiaQ@mail.gmail.com> <CAHBU6itdCdJE3t8gE=AOcOofaFORfJxZxo3ZqbpF9nTMv_CYaQ@mail.gmail.com>
Date: Tue, 11 Mar 2014 13:59:22 -0500
Message-ID: <CAK3OfOjSmofMAbXNEY5s-=wK0-7Av5=dOJd8Pc+uqKgt8oP4MA@mail.gmail.com>
From: Nico Williams <nico@cryptonector.com>
To: Tim Bray <tbray@textuality.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/json/Vm9Eg9-uHdrirOqth3Bj8xSvOO0
Cc: Paul Hoffman <paul.hoffman@vpnc.org>, Phillip Hallam-Baker <hallam@gmail.com>, Matthew Morley <matt@mpcm.com>, "json@ietf.org" <json@ietf.org>
Subject: Re: [Json] Regarding JSON text sequence ambiguities (Re: serializing sequences of JSON values)

On Tue, Mar 11, 2014 at 1:35 PM, Tim Bray <tbray@textuality.com> wrote:
> My assumption is that many (most? all?) JSON parsers can be altered to, when
> they encounter a JSON text containing an object, just stop reading when they
> hit the trailing “}”.  So my notion is that using such a parser you have a
> loop like

That's true of arrays as well.

It's also true of strings for any parser that permits them at the top-level.

It's only true, false, null, and numeric top-level values that require
a disambiguation separator: without one, the text 12 followed by the
text 34 reads as the single text 1234.
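A small Python sketch of this point, assuming the standard library's json.JSONDecoder.raw_decode, which parses one value starting at a given offset and returns where it stopped. The helper name split_concatenated is hypothetical. Back-to-back objects, arrays and strings split cleanly at their closing delimiters, while two adjacent numbers cannot be told apart from one longer number.

```python
import json

_decoder = json.JSONDecoder()


def split_concatenated(s):
    """Parse back-to-back JSON texts with no separator (hypothetical helper).

    raw_decode does not skip leading whitespace, so this sketch only handles
    texts written with nothing at all between them.
    """
    values = []
    pos = 0
    while pos < len(s):
        # Parse one value starting at pos; raw_decode returns (value, end).
        value, pos = _decoder.raw_decode(s, pos)
        values.append(value)
    return values


# The closing "}", "]" and '"' mark where each text ends.
print(split_concatenated('{"a":1}[2,3]"x"'))  # [{'a': 1}, [2, 3], 'x']

# With no separator, 12 followed by 34 comes back as the single number 1234.
print(split_concatenated("1234"))  # [1234]
```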

Using a newline (or any other separator character or string) permits
parsing any JSON text sequence with unmodified parsers.  This is
especially true when the JSON texts are encoded with no newlines and
the separator is, or includes, a newline.

> while !eof
>    obj = parser.ReadObjectAndStop() // parse error handling left as an exercise for the reader

The key is being able to parse one text, then the next.  If the texts
self-delimit (e.g., they contain no embedded newlines and are
terminated by a newline) then any JSON parser can be used to build a
JSON text sequence parser.
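A minimal Python sketch of that idea, assuming each text is written on a single line and terminated by a newline: a parser that accepts exactly one complete text (here the standard library's json.loads) is all it takes to read the sequence. The function name read_json_seq is hypothetical. The newline also separates top-level numbers, so 12 and 34 come back as two values.

```python
import io
import json


def read_json_seq(stream):
    """Yield one value per newline-terminated JSON text (hypothetical helper)."""
    for lineno, line in enumerate(stream, start=1):
        try:
            # json.loads tolerates surrounding whitespace, including the "\n".
            yield json.loads(line)
        except json.JSONDecodeError as e:
            raise ValueError(f"line {lineno}: invalid JSON text: {e}") from e


stream = io.StringIO('{"a": [1, 2]}\n12\n34\n"text"\nnull\n')
print(list(read_json_seq(stream)))  # [{'a': [1, 2]}, 12, 34, 'text', None]
```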

Even if a text contains newlines (e.g., after every element of an
array or after every name or value in an object), a JSON text sequence
can still be parsed with the simplest JSON parsers (ones that have no
incremental parsing capability).

Thus I'd RECOMMEND that JSON text sequences consist of JSON texts
encoded with no embedded newlines, and I'd REQUIRE that each text be
followed by a newline.
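A writer following that recommendation might look like this Python sketch (the name write_json_seq is hypothetical). With the default indent=None, json.dumps emits no raw newlines: its separators contain none, and a newline inside a string is escaped as \n. So each text occupies exactly one line and is terminated by one.

```python
import io
import json


def write_json_seq(stream, values):
    """Write each value as a one-line JSON text followed by a newline."""
    for value in values:
        text = json.dumps(value)  # indent=None: no embedded newlines
        if "\n" in text:
            # Guard the invariant that line-at-a-time readers rely on.
            raise ValueError("encoded text contains a newline")
        stream.write(text + "\n")


out = io.StringIO()
write_json_seq(out, [{"msg": "two\nlines"}, 12, 34])
print(out.getvalue(), end="")
```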

Then the general case (any parser, incremental or not) is as simple as:

    while IFS= read -r line; do    # -r: keep backslash escapes intact
        parse "$line" ...
        ...
    done

> So yeah, requiring a SINGLE NEWLINE AND NOTHING ELSE would be simpler.  But

Indeed, it's the simplest, because every language/runtime has a "read
a line at a time" primitive.  If your texts have embedded newlines and
you have an incremental parser, you're still OK.

Heck, even if you don't have an incremental parser, you'd be OK,
though the result would be inefficient, since the accumulated lines
are re-parsed from scratch after each new line:

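    buffer=
    last_is_partial=false
    while IFS= read -r line; do
        if [[ -z $buffer ]]; then
            buffer=$line
        else
            # Keep the newline: it is insignificant whitespace in JSON.
            buffer+=$'\n'$line
        fi
        if ! parse "$buffer" ...; then
            # Not yet a complete text (or malformed): read another line.
            last_is_partial=true
            continue
        fi
        buffer=            # start the next text afresh
        last_is_partial=false
        ...
    done
    if $last_is_partial; then
        # Report error: input ended in the middle of a text
        ...
    fi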
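The same accumulate-and-retry strategy as the shell loop, sketched in Python with the standard library's json.loads standing in for the non-incremental parser (the function name read_multiline_seq is hypothetical). Like the shell version, it cannot tell a malformed text from an incomplete one until the input ends.

```python
import io
import json


def read_multiline_seq(stream):
    """Yield JSON texts that may span lines, using only a whole-text parser.

    The buffer is re-parsed after every line, so a text of n lines costs
    O(n) parse attempts: correct, but inefficient.
    """
    buffer = ""
    for line in stream:
        buffer += line  # keep the newline: insignificant whitespace in JSON
        if not buffer.strip():
            buffer = ""  # skip blank lines between texts
            continue
        try:
            value = json.loads(buffer)
        except json.JSONDecodeError:
            continue  # assume the text is incomplete; read another line
        buffer = ""
        yield value
    if buffer.strip():
        raise ValueError("input ended in the middle of a JSON text")


pretty = '{\n  "a": [\n    1,\n    2\n  ]\n}\n"x"\n[\n]\n'
print(list(read_multiline_seq(io.StringIO(pretty))))  # [{'a': [1, 2]}, 'x', []]
```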

> if you’re going to allow any other non-significant white space characters,
> then why bother requiring that one of them be a newline?

Because reading lines is a common (universal?) primitive.  Sure, if
you have an evented app and an incremental JSON parser, then you'll be
reading bytes until you get a complete text, newline or not.
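For the evented case, here is a minimal Python sketch of a push-style reader: the event loop hands it byte chunks as they arrive and gets back whichever texts those chunks completed. The class JsonSeqReader and its feed/close methods are hypothetical names. It is not a true incremental parser: it re-runs the standard library's json.JSONDecoder.raw_decode over the buffered text on each call, so it assumes texts are reasonably small, and a malformed text is only reported at close().

```python
import codecs
import json

JSON_WS = " \t\n\r"  # the only whitespace characters JSON allows between tokens


class JsonSeqReader:
    """Push-style reader for a byte stream of JSON texts (hypothetical sketch).

    Chunks may split a text, or even a UTF-8 character, anywhere. Objects,
    arrays and strings are emitted at their closing delimiter, newline or not.
    A top-level number, true, false or null is held until whitespace follows
    it, because "12" might still grow into "123".
    """

    def __init__(self):
        self._utf8 = codecs.getincrementaldecoder("utf-8")()
        self._decoder = json.JSONDecoder()
        self._buf = ""

    def feed(self, chunk):
        """Add bytes; return the list of texts completed by this chunk."""
        self._buf += self._utf8.decode(chunk)
        values = []
        while True:
            text = self._buf.lstrip(JSON_WS)
            self._buf = text
            if not text:
                break
            try:
                # Re-parses the buffered text from its start on every call.
                value, end = self._decoder.raw_decode(text)
            except json.JSONDecodeError:
                break  # assume the text is incomplete; wait for more bytes
            if text[0] not in '{["' and (end == len(text) or text[end] not in JSON_WS):
                break  # unterminated scalar: more characters may follow
            values.append(value)
            self._buf = text[end:]
        return values

    def close(self):
        """End of input: whatever is left must be one complete text."""
        rest = (self._buf + self._utf8.decode(b"", final=True)).strip(JSON_WS)
        self._buf = ""
        return [json.loads(rest)] if rest else []


reader = JsonSeqReader()
out = []
for chunk in [b'{"a": 1}[1,', b"2] 1", b'2\n"caf\xc3', b'\xa9" 7']:
    out += reader.feed(chunk)
out += reader.close()
print(out)  # [{'a': 1}, [1, 2], 12, 'café', 7]
```

Note that the final 7 only comes out at close(): with no whitespace after it, the reader cannot know the number is finished, which is the top-level scalar ambiguity again.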

Nico