Re: [Json] Using a non-whitespace separator (Re: Working Group Last Call on draft-ietf-json-text-sequence)

Nico Williams <nico@cryptonector.com> Wed, 04 June 2014 20:23 UTC

In-Reply-To: <CAMm+LwjoeC1R4O2iCPo+RfUFn4Qca4zyytqa817ayH60mNaWLg@mail.gmail.com>
References: <CAK3OfOidgk13ShPzpF-cxBHeg34s99CHs=bpY1rW-yBwnpPC-g@mail.gmail.com> <CAHBU6itr=ogxP4uoj57goEUSOCpsRx1AXVnW1NQwSTPxbbttkw@mail.gmail.com> <CAK3OfOhft+XJeMrg5rdY9E6fxAkJ2qsT3UHwu7zt=NEz2Q3XOQ@mail.gmail.com> <CAK3OfOhy-N0zjCVxtOMB8SqZEKceVvBz9Y6i0fo2W8i+gHKm4Q@mail.gmail.com> <CAK3OfOiQnLq29cv+kas3B8it-+82VmXvL3Rq1C5_767FDhBjRg@mail.gmail.com> <03CFAB3E-F4C6-4AE8-A501-8525376C4AA7@vpnc.org> <CAK3OfOja-17V391tTK91R98X8XQzd0iPnur2=oo4ii+MCOt+Rg@mail.gmail.com> <CFB42410.4EDDC%jhildebr@cisco.com> <CAMm+Lwime-=UQPu3t2ty05CZLb7xUMi9KGi31Xi2B7RNF5S3Og@mail.gmail.com> <CAK3OfOg_k4Ngq+z1pn4b+XRf0M1Hqx8qZ9BtW0sa8QQ+bjKJyA@mail.gmail.com> <084664DB-A55D-465E-8888-97BA0BB59637@vpnc.org> <CAHBU6itEph5GzB-P8bUUvUMopRNxcCE-16qys7ofhdmsDvpN4w@mail.gmail.com> <CAMm+LwjoeC1R4O2iCPo+RfUFn4Qca4zyytqa817ayH60mNaWLg@mail.gmail.com>
Date: Wed, 04 Jun 2014 15:23:51 -0500
Message-ID: <CAK3OfOhjPZUXK6C0qSsQQZvOgR3Sv3SWpyH=qTuihuDC9uvXrA@mail.gmail.com>
From: Nico Williams <nico@cryptonector.com>
To: Phillip Hallam-Baker <ietf@hallambaker.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/json/mOnkg1DtTE1g_mUHUWueWcG1_yY
Cc: Tim Bray <tbray@textuality.com>, Paul Hoffman <paul.hoffman@vpnc.org>, Joe Hildebrand Hildebrand <jhildebr@cisco.com>, IETF JSON WG <json@ietf.org>
Subject: Re: [Json] Using a non-whitespace separator (Re: Working Group Last Call on draft-ietf-json-text-sequence)

On Wed, Jun 4, 2014 at 1:01 PM, Phillip Hallam-Baker
<ietf@hallambaker.com> wrote:
> On Wed, Jun 4, 2014 at 1:54 PM, Tim Bray <tbray@textuality.com> wrote:
>> Hah, I hadn’t realized that RS (U+001E, INFORMATION SEPARATOR TWO) was
>> excluded.   OK, so the abnf for JSON-sequence becomes one of these two:

Someone (Carsten?) proposed it earlier.  RS is, indeed, perfect for this.

>>
>> JSON-sequence = JSON-text *( %x1E JSON-text )
>> JSON-sequence = *( ws %x1E JSON-text )
>>
>> Depending on whether you see the RS as an initiator or a separator.  I think
>> I very slightly prefer the second.

It has to be an initiator for append-write logfiles, as it then marks
the end of a possibly incompletely-written text.  Otherwise you might
lose the first text following an incompletely-written text.
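A minimal Python sketch of that failure mode, assuming a reader that splits the log on RS and hands each piece to the standard json module. parse_rs_sequence is a hypothetical helper, not something defined by the draft or by jq. In the first log the second record was cut off mid-write; because the third record begins with its own RS, it survives. If RS instead terminated each text, the truncated write would fuse with the next record and both would be lost.

```python
import json

RS = "\x1e"  # U+001E INFORMATION SEPARATOR TWO (RS)

def parse_rs_sequence(data: str) -> list:
    """Treat each RS as the start of a text; skip pieces that fail to parse."""
    texts = []
    for piece in data.split(RS):
        if not piece.strip():
            continue  # nothing before the first RS, or an empty record
        try:
            texts.append(json.loads(piece))  # trailing LF is just JSON whitespace
        except json.JSONDecodeError:
            continue  # truncated or corrupt text: drop it and move on
    return texts

# RS as initiator: record 2 was cut off mid-write, yet record 3 survives.
log = RS + '{"id": 1}\n' + RS + '{"id": 2, "amou' + RS + '{"id": 3}\n'
print(parse_rs_sequence(log))  # [{'id': 1}, {'id': 3}]

# RS as terminator: the truncated write has no RS of its own, so it fuses
# with the following record and both are lost.
log_terminated = '{"id": 1}' + RS + '{"id": 2, "amou' + '{"id": 3}' + RS
print(parse_rs_sequence(log_terminated))  # [{'id': 1}]
```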

RS could also follow a text, but I prefer LF for this because it's
friendly to line-oriented text tools.  It's harmless to have a
trailing LF (since the JSON-text ABNF allows extra trailing whitespace
anyways).
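As a rough illustration of the RS JSON-text LF layout, here is a hypothetical writer in Python (write_text is an illustrative name, not an API from any spec). It relies on json.dumps, with its default indent=None, escaping control characters inside strings, so the only raw LF in a record is the trailing one and line-oriented tools see exactly one record per line.

```python
import io
import json

RS = "\x1e"

def write_text(out, obj) -> None:
    """Append one record as RS JSON-text LF (hypothetical helper)."""
    # With the default indent=None, json.dumps puts no raw newline in its
    # output (an LF inside a string becomes the two characters \n), so the
    # trailing LF is the only line break in the record.
    out.write(RS + json.dumps(obj) + "\n")

buf = io.StringIO()
write_text(buf, {"msg": "line one\nline two"})
write_text(buf, [1, 2, 3])
data = buf.getvalue()

# Split on "\n" explicitly: str.splitlines() also breaks on U+001E.
lines = data.rstrip("\n").split("\n")
print(len(lines))  # 2

# The trailing LF is ordinary JSON whitespace, so a plain parser accepts it.
first = data.split(RS)[1]
print(json.loads(first))  # {'msg': 'line one\nline two'}
```

One Python-specific trap shows up here: str.splitlines() treats U+001E as a line boundary, which is why the sketch splits on "\n" directly.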

> +1
>
> I prefer the 'strict writer, loose reader' approach here. And that is
> not my usual stance.

Is that in reference to my (2) or Tim's point?  Since you top-posted,
and since Tim didn't propose a loose reader, I can't tell :)

> The reason that I think readers need to be tolerant is that they
> should be able to read log files after they have been 'damaged' by
> tools that strip out the RS characters.

Yes.  If you see something like:

<text> <text> <text>

it should parse, even though it should have been:

RS<text>RS<text>RS<text>

because, why not parse it?

Thus my (2).

To repeat myself, jq doesn't insist on any separator at parse time,
except where the separator is needed to disambiguate.  E.g., if you
have two texts consisting of numbers, or booleans, or null, then the
parser can't parse them without a separator of some sort.
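A sketch of a liberal reader in this spirit, in Python, assuming RS is simply treated as extra whitespace; parse_liberal is hypothetical and is not how jq is implemented. It uses JSONDecoder.raw_decode, which returns the decoded value and the index where it stopped (the optional idx argument exists in CPython's json.decoder, though the main docs only show raw_decode(s)). Objects, arrays and strings delimit themselves, so they need no separator, but two adjacent numbers collapse into one.

```python
import json

# JSON's four whitespace characters plus RS (U+001E); this liberal reader
# skips all of them, so input with or without RS parses the same way.
SKIP = " \t\r\n\x1e"
_decoder = json.JSONDecoder()

def parse_liberal(data: str) -> list:
    """Parse back-to-back JSON texts (hypothetical helper, not jq's parser)."""
    texts, pos, end = [], 0, len(data)
    while True:
        # Skip any whitespace/RS between texts.
        while pos < end and data[pos] in SKIP:
            pos += 1
        if pos == end:
            return texts
        # raw_decode parses one value starting at pos and returns
        # (value, index just past it); it raises JSONDecodeError on bad input.
        value, pos = _decoder.raw_decode(data, pos)
        texts.append(value)

print(parse_liberal('\x1e{"a": 1}\n\x1e[2]\n'))    # [{'a': 1}, [2]]
print(parse_liberal('{"a": 1} [2] "x"'))          # [{'a': 1}, [2], 'x']
print(parse_liberal('{"a":1}{"b":2}'))            # [{'a': 1}, {'b': 2}]
print(parse_liberal('1 2'), parse_liberal('12'))  # [1, 2] [12]
```

Unlike splitting on RS, this reader cannot resynchronize after a malformed text: raw_decode raises json.JSONDecodeError and parsing stops there.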

> For example, let's say that I have some program that records every
> transaction to a logfile and it is discovered that one of the
> transactions was wrong and is corrupting the database. The simplest
> solution is usually to take the log file, find the broken transaction,
> edit it out and rebuild the database.

Or leave it in and just skip past it when you parse.

> Given the quality of editing tools available on many machines, I don't
> trust them to preserve non printing ASCII characters. Heck, the editor
> on Ubuntu can't even start in a root account without writing garbage
> to the terminal.
>
> So readers should be tolerant.

Right, which is one reason that I want the ABNF for writers to be:

    sequence = *( RS JSON-text LF )

but parsers should be more liberal.

I'll grant that if there's an RS present then one can use any kind of
JSON parser to parse the sequence whereas otherwise only incremental
and streaming JSON parsers can be used.  This is the one reason to
require that RS always be written.

Nico
--