Re: [Json] Using a non-whitespace separator (Re: Working Group Last Call on draft-ietf-json-text-sequence)

"Manger, James" <James.H.Manger@team.telstra.com> Fri, 06 June 2014 01:34 UTC

From: "Manger, James" <James.H.Manger@team.telstra.com>
To: Nico Williams <nico@cryptonector.com>, John Cowan <cowan@mercury.ccil.org>
Date: Fri, 06 Jun 2014 11:34:05 +1000
Archived-At: http://mailarchive.ietf.org/arch/msg/json/PuxOdFPufxW5kcCWvjNWH2sNAWQ
Cc: Paul Hoffman <paul.hoffman@vpnc.org>, IETF JSON WG <json@ietf.org>

> Both require escaping in strings.  The difference is that LF can appear as
> insignificant whitespace, whereas RS cannot.

Correct.

>  So to achieve the same thing
> as RS achieves (restartable/correctable writes), it's simply necessary never
> to spit out an insignificant LF.

Incorrect.
It is not *necessary* to omit insignificant LF.
It is just *easier* for parsers if every LF terminates a single JSON value (and never appears as other insignificant whitespace).

With LF as the terminator but also allowed as insignificant whitespace, <endchar> LF <startchar> unambiguously separates JSON values.
This allows you to restart from anywhere in the middle of a sequence.
This allows you to restart after one value is corrupted or truncated.
If you have had some corruption (so we are already in a niche case) and you really don't want to miss the very next value, you are still OK when values are objects or arrays (as log entries will invariably be). Find the end of the next (non-corrupted) value as normal (<endchar> LF <startchar>), then look back for the matching { or [ that started it.
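
A rough sketch of that parsing rule in Python, in case it helps. The character classes are my assumption here (this message does not spell them out): <endchar> is taken to be any character that can end a JSON value (`}`, `]`, `"`, a digit, or the `e`/`l` that ends true/false/null), and <startchar> any character that can begin one. The function names are hypothetical, and the look-back simply tries each earlier `{` or `[` until the tail parses, which is one simple way to find the matching opener, not the only one.

```python
import json
import re

# Assumed classes: <endchar> = } ] " digit e l ; <startchar> = { [ " - digit t f n
# An LF inside a single valid JSON value is never both directly preceded by an
# <endchar> and directly followed by a <startchar>, so this marks a boundary.
BOUNDARY = re.compile(r'(?<=[}\]"0-9el])\n(?=[{\["\-0-9tfn])')


def recover_tail(chunk):
    """Look back from the end of a damaged chunk for the { or [ that starts
    the last complete object/array. Returns None if nothing can be recovered."""
    chunk = chunk.rstrip()
    for i in range(len(chunk) - 1, -1, -1):
        if chunk[i] in "{[":
            try:
                # json.loads must consume the whole tail, so success means
                # this opener matches the closer at the end of the chunk.
                return json.loads(chunk[i:])
            except ValueError:
                continue
    return None


def parse_sequence(text):
    """Split at <endchar> LF <startchar> and parse each piece independently."""
    values = []
    for chunk in BOUNDARY.split(text):
        if not chunk.strip():
            continue
        try:
            values.append(json.loads(chunk))
        except ValueError:
            # Corrupted or truncated value: salvage the value that follows it
            # (objects and arrays only), otherwise skip to the next boundary.
            tail = recover_tail(chunk)
            if tail is not None:
                values.append(tail)
    return values


if __name__ == "__main__":
    # Pretty-printed object, wrapped array, a truncated write followed
    # directly by a good object, then a plain string value.
    log = '{\n  "a": 1\n}\n[1,\n2]\n{"trunc": "ab{"b": 2}\n"last"\n'
    print(parse_sequence(log))
```

The look-back is quadratic in the worst case, but it only runs on chunks that already failed to parse.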

In my judgement it is worth accepting a little extra complexity for JSON sequence parsers (e.g. having to look for <endchar> LF <startchar>) to make things much easier for JSON sequence producers: no awkward control characters, and the output can be pretty-printed, cut and pasted, manually edited, concatenated from multiple files with cat, and wrapped when lines get long.

--
James Manger