Re: [Json] Using a non-whitespace separator (Re: Working Group Last Call on draft-ietf-json-text-sequence)

Tim Bray <tbray@textuality.com> Wed, 04 June 2014 20:36 UTC

From: Tim Bray <tbray@textuality.com>
Date: Wed, 04 Jun 2014 13:36:19 -0700
Message-ID: <CAHBU6iu_Mrnd+yYpBcg9Yy7Xw4s9s-LCfk0f9_TJGXxz76c9wg@mail.gmail.com>
To: Nico Williams <nico@cryptonector.com>
Content-Type: multipart/alternative; boundary="089e0115f0a2dcf0b304fb089502"
Archived-At: http://mailarchive.ietf.org/arch/msg/json/zxPlsq0RmCy1wZZ1H1n5cy8z7Vw
Cc: Phillip Hallam-Baker <ietf@hallambaker.com>, Paul Hoffman <paul.hoffman@vpnc.org>, Joe Hildebrand Hildebrand <jhildebr@cisco.com>, IETF JSON WG <json@ietf.org>
Subject: Re: [Json] Using a non-whitespace separator (Re: Working Group Last Call on draft-ietf-json-text-sequence)

I totally don’t get why you want the LF involved.  I can’t see how it would
make any difference.  Could you explain more?


On Wed, Jun 4, 2014 at 1:23 PM, Nico Williams <nico@cryptonector.com> wrote:

> On Wed, Jun 4, 2014 at 1:01 PM, Phillip Hallam-Baker
> <ietf@hallambaker.com> wrote:
> > On Wed, Jun 4, 2014 at 1:54 PM, Tim Bray <tbray@textuality.com> wrote:
> >> Hah, I hadn’t realized that RS (U+001E, INFORMATION SEPARATOR TWO) was
> >> excluded.  OK, so the ABNF for JSON-sequence becomes one of these two:
>
> Someone (Carsten?) proposed it earlier.  RS is, indeed, perfect for this.
>
> >>
> >> JSON-sequence = JSON-text *( %x1E JSON-text )
> >> JSON-sequence = *( ws %x1E JSON-text )
> >>
> >> Depending on whether you see the RS as an initiator or a separator.  I
> >> think I very slightly prefer the second.
>
> It has to be an initiator for append-write logfiles, as it then marks
> the end of a possibly incompletely-written text.  Otherwise you might
> lose the first text following an incompletely-written text.
>
> RS could also follow a text, but I prefer LF for this because it's
> friendly to line-oriented text tools.  It's harmless to have a
> trailing LF (since the JSON-text ABNF allows extra trailing whitespace
> anyways).
>
> > +1
> >
> > I prefer the 'strict writer, loose reader' approach here. And that is
> > not my usual stance.
>
> Is that in reference to my (2) or Tim's point?  Since you top-posted,
> and since Tim didn't propose a loose reader, I can't tell :)
>
> > The reason that I think readers need to be tolerant is that they
> > should be able to read log files after they have been 'damaged' by
> > tools that strip out the RS characters.
>
> Yes.  If you see something like:
>
> <text> <text> <text>
>
> it should parse, even though it should have been:
>
> RS<text>RS<text>RS<text>
>
> because, why not parse it?
>
> Thus my (2).
>
> To repeat myself, jq doesn't insist on any separator at parse time,
> except where the separator is needed to disambiguate.  E.g., if you
> have two texts consisting of numbers, or booleans, or null, then the
> parser can't parse them without a separator of some sort.
>
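
[Illustration] The separator-optional parsing described above can be sketched in Python. This is not jq's implementation, only a rough analogue under stated assumptions: the function name parse_loose is hypothetical, and the standard library's json.JSONDecoder.raw_decode stands in for an incremental parser. RS and whitespace are skipped wherever they appear between texts.

```python
import json

RS = "\x1e"  # U+001E, the record separator discussed in this thread


def parse_loose(stream_text):
    """Parse concatenated JSON texts; RS and whitespace are optional.

    Hypothetical helper for illustration only (not jq's algorithm).
    """
    decoder = json.JSONDecoder()
    skip = RS + " \t\r\n"
    values = []
    pos = 0
    while True:
        # Skip any RS or JSON whitespace before the next text.
        while pos < len(stream_text) and stream_text[pos] in skip:
            pos += 1
        if pos == len(stream_text):
            return values
        # raw_decode parses one JSON text starting at pos and
        # returns the value plus the index just past it.
        value, pos = decoder.raw_decode(stream_text, pos)
        values.append(value)
```

With this sketch, objects and arrays parse whether or not a separator is present, but adjacent numbers need one: the inputs "1 2" and "12" give different results, which is the ambiguity mentioned above.
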
> > For example, let's say that I have some program that records every
> > transaction to a logfile and it is discovered that one of the
> > transactions was wrong and is corrupting the database. The simplest
> > solution is usually to take the log file, find the broken transaction,
> > edit it out and rebuild the database.
>
> Or leave it in and just skip past it when you parse.
>
> > Given the quality of editing tools available on many machines, I don't
> > trust them to preserve non-printing ASCII characters. Heck, the editor
> > on Ubuntu can't even start in a root account without writing garbage
> > to the terminal.
> >
> > So readers should be tolerant.
>
> Right, which is one reason that I want the ABNF for writers to be:
>
>     sequence = RS JSON-text LF
>
> but parsers should be more liberal.
>
> I'll grant that if there's an RS present then one can use any kind of
> JSON parser to parse the sequence whereas otherwise only incremental
> and streaming JSON parsers can be used.  This is the one reason to
> require that RS always be written.
>
> Nico
> --
>
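
[Illustration] A minimal sketch of the writer rule quoted above (RS JSON-text LF) and of why RS works as an initiator for append-only logs. The names encode_record and read_records are hypothetical. The reader assumes the whole log fits in memory and uses an ordinary non-streaming parser (json.loads) on each RS-delimited chunk, which is the property mentioned in the last quoted paragraph.

```python
import json

RS = "\x1e"  # record separator, written before every text
LF = "\n"    # trailing newline, friendly to line-oriented tools


def encode_record(value):
    """Serialize one value as RS JSON-text LF.

    json.dumps escapes control characters inside strings, so a raw
    RS never appears within the serialized text itself.
    """
    return RS + json.dumps(value) + LF


def read_records(data):
    """Split on RS and parse each chunk with a plain JSON parser.

    Returns (good, bad): parsed values, and chunks that failed to
    parse, e.g. a text cut short by an interrupted write.
    """
    good, bad = [], []
    for chunk in data.split(RS):
        if not chunk.strip():
            continue  # nothing between two separators
        try:
            good.append(json.loads(chunk))
        except json.JSONDecodeError:
            bad.append(chunk)
    return good, bad


# A log where the second write was interrupted partway through.
log = (encode_record({"n": 1})
       + RS + '{"n": 2, "tru'
       + encode_record({"n": 3}))
good, bad = read_records(log)
```

Because the third record starts with its own RS, the truncated second text does not swallow it: good holds the first and third values, and bad holds the incomplete chunk.
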



-- 
- Tim Bray (If you’d like to send me a private message, see
https://keybase.io/timbray)