Re: [Json] Using a non-whitespace separator (Re: Working Group Last Call on draft-ietf-json-text-sequence)

Paul Hoffman <paul.hoffman@vpnc.org> Thu, 05 June 2014 17:01 UTC

To: "Manger, James" <James.H.Manger@team.telstra.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/json/ASx4GnVq5mKmtRsJg7NwrRGpZFg
Cc: IETF JSON WG <json@ietf.org>

<no hat>

To summarize the points below: you prefer a more common character like NL, which has to be escaped in strings, over an obscure character like RS, which requires no escaping. Is that correct?

On Jun 4, 2014, at 11:25 PM, Manger, James <James.H.Manger@team.telstra.com> wrote:

>>> JSON-sequence = *( ws %1e JSON-text )
> 
> RS as a JSON sequence prefix or separator was a bad idea when discussed a month ago and still is.
> 
> * You cannot (easily) enter an RS in notepad.
> * You cannot (easily) enter an RS in vi.
> * You cannot see an RS.

It seems like the purpose of draft-ietf-json-text-sequence is to create a format that can be used for log files and other such files that are constantly added to. If so, then the above complaints are not really relevant, right?

> * An RS causes Chrome to treat a file as binary data, instead of text.

Ditto.

> * Cut-n-paste a JSON value with an invisible RS prefix and the result is NOT JSON, ie it will fail with a JSON parser as RS is not allowed in JSON.

That will be true of anything other than a character that doesn't need to be escaped, right? People asked for RS (or something like it) so that they didn't have to deal with escaping when the chosen separator also appears inside a string.
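
A minimal sketch of the RS-prefixed form from the quoted ABNF, assuming Python's standard json module. The function names encode_rs_sequence and decode_rs_sequence are hypothetical and come from no draft or library. It illustrates the no-escaping argument: because json.dumps writes U+001E inside strings as \u001e, a raw RS never occurs inside a JSON text, so the texts may contain raw newlines (pretty-printing) and the reader can still split on RS alone.

```python
import json

RS = "\x1e"  # U+001E, the %1e separator in the quoted ABNF

def encode_rs_sequence(values, indent=None):
    """Prefix each JSON text with RS; the text itself may contain newlines."""
    return "".join(RS + json.dumps(v, indent=indent) for v in values)

def decode_rs_sequence(data):
    """Split on RS and parse each chunk as one JSON text."""
    chunks = data.split(RS)
    # Only whitespace may precede the first RS (the `ws` in the ABNF).
    if chunks[0].strip(" \t\r\n"):
        raise ValueError("unexpected data before the first RS")
    return [json.loads(chunk) for chunk in chunks[1:]]

# One value has a newline in a string, one has an RS in a string.
values = [{"msg": "line one\nline two"}, {"sep": "a\x1eb"}, [1, 2, 3]]
stream = encode_rs_sequence(values, indent=2)  # pretty-printed output
roundtrip = decode_rs_sequence(stream)
```

Here stream contains raw newlines from the indentation, yet roundtrip equals values, since the only raw RS characters are the ones the writer placed before each text.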

> * No one uses RS.
> * RS is now labelled INFORMATION SEPARATOR TWO, not RECORD SEPARATOR.
> * We aren't using INFORMATION SEPARATOR ONE, THREE or FOUR.

All irrelevant. We are creating a new specification.

> * A newline as a JSON value terminator is sufficient to parse a JSON sequence unambiguously.

Sure. And it also causes the need to have escaping.
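
The constraint can be sketched as follows, again assuming Python's standard json module and hypothetical function names. With a newline as the terminator, the writer has to make sure no raw newline appears inside a value, which in practice means emitting compact JSON. A pretty-printed value written the same way no longer parses line by line.

```python
import json

def encode_nl_sequence(values):
    """Terminate each value with a newline; values must be on one line."""
    # json.dumps without `indent` emits no raw newlines, and a newline
    # inside a string is written as the two characters backslash and n.
    return "".join(json.dumps(v) + "\n" for v in values)

def decode_nl_sequence(data):
    """Parse one JSON value per line, skipping blank lines."""
    return [json.loads(line) for line in data.split("\n") if line.strip()]

values = [{"msg": "line one\nline two"}, [1, 2, 3]]
compact_roundtrip = decode_nl_sequence(encode_nl_sequence(values))

# The same reader applied to pretty-printed output sees '{' as a whole line.
pretty = json.dumps({"a": 1}, indent=2) + "\n"
try:
    decode_nl_sequence(pretty)
    pretty_parses = True
except json.JSONDecodeError:
    pretty_parses = False
```

In this sketch compact_roundtrip equals values, while pretty_parses ends up False: the reader is simple, but only because the writer accepted the one-line restriction.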

> * RS doesn't work well with APIs that read text by the line.

Are there JSON APIs that do that?

> * Detecting a newline that separates JSON values is more complex than detecting an RS character, but it is not that complex (eg handful of lines of code).

The people who asked for RS seemed more concerned about escaping newlines in the JSON being written than about detecting them on input. Do you agree that that is also a concern?

> * An RS prefix detects only slightly more cases of accidentally truncated writes (in the middle of a top-level number, in a top-level string in the middle of an escape sequence) -- not enough to be compelling.

That was not the major motivation, however.
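
For concreteness, the two truncation cases named in the quoted bullet can be reproduced with a short sketch. It assumes Python's standard json module, hypothetical parse_nl and parse_rs helpers, and a writer whose output was cut short before the next writer appended a complete record. It only shows where record boundaries end up; it does not try to decide whether a value that parses is itself complete.

```python
import json

RS = "\x1e"
BAD = object()  # sentinel for a record that fails to parse

def _load(text):
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return BAD

def parse_nl(data):
    """Newline-terminated reading: one value per non-blank line."""
    return [_load(line) for line in data.split("\n") if line.strip()]

def parse_rs(data):
    """RS-prefixed reading: one value per chunk after each RS."""
    return [_load(chunk) for chunk in data.split(RS)[1:]]

# Case 1: a top-level number 12345 truncated to 12, followed by 678.
nl_number = parse_nl("12" + "678\n")            # digits merge into one value
rs_number = parse_rs(RS + "12" + RS + "678\n")  # boundary is preserved

# Case 2: a top-level string cut off right after a backslash,
# followed by the complete string "z".
cut = '"x\\'                                    # the 3 characters " x \
nl_string = parse_nl(cut + '"z"\n')             # reads as one string: x"z
rs_string = parse_rs(RS + cut + RS + '"z"\n')   # first record is rejected
```

With the newline form, nl_number is [12678] and nl_string is ['x"z'], so the damage goes unnoticed. With the RS form, rs_number is [12, 678] and rs_string is [BAD, 'z']. Truncation in the middle of an object or array fails to parse in either form, which is consistent with the quoted point that the extra coverage is narrow.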

> * The awkwardness of RS will mean many implementations will be lenient, but leniency becomes "expected" which leads to interop problems.

That is a prediction of the future.

> "A JSON sequence is the concatenation of zero or more JSON values, where each JSON value is terminated with a newline."
> 
> Simple to understand. Simple to write. Simple enough to parse. Simple enough to resync from the middle of a sequence. Almost identical recovery from accidental corruption is possible in almost all the same instances regardless of whether an RS prefix or newline suffix is used.

Sure, but it ignores the issue many people had about escaping.

--Paul Hoffman