[Json] Using a non-whitespace separator (Re: Working Group Last Call on draft-ietf-json-text-sequence)

Nico Williams <nico@cryptonector.com> Sun, 25 May 2014 23:05 UTC

To: Carsten Bormann <cabo@tzi.org>
Archived-At: http://mailarchive.ietf.org/arch/msg/json/6JDiXvrxt0vSvl7nve64rozQvug
Cc: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, Paul Hoffman <paul.hoffman@vpnc.org>, IETF JSON WG <json@ietf.org>

I actually like the idea, but I'm concerned about backwards
compatibility for jq.  Help me reason through this.  Help sell me on
this.

Currently my thinking is that, for backwards compatibility reasons, I'd
want to make this RECOMMENDED rather than REQUIRED, except in
cases where incomplete writes are a potential problem.  In jq this
would be an option controlling whether to emit the new separator.

Another option is to say that encoders MUST use the new separator, but
parsers MAY/SHOULD/MUST handle sequences with a missing separator (as
jq does; see below).  jq would still have an encoding option, but when
not emitting the new separator the result just wouldn't be a JSON text
sequence.
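
As a rough illustration of the encoder-side option discussed above, here is a minimal Python sketch. It is not jq's implementation. The separator character (U+001E, the ASCII record separator), its placement before each text, and the names `SEP`, `encode_sequence`, and `use_separator` are all assumptions made for the example; the message itself does not fix any of them.

```python
import json

# Assumption for illustration only: the message does not name the control
# character, so U+001E (record separator) is used here as a stand-in.
SEP = "\x1e"


def encode_sequence(values, use_separator=True):
    """Encode each value as a compact JSON text followed by a newline.

    With use_separator=True the assumed separator precedes every text.
    With use_separator=False the output is plain newline-terminated JSON
    texts, as jq prints today.
    """
    parts = []
    for value in values:
        text = json.dumps(value, separators=(",", ":"))
        prefix = SEP if use_separator else ""
        parts.append(prefix + text + "\n")
    return "".join(parts)
```

Calling `encode_sequence(values, use_separator=False)` corresponds to the "option off" case, whose output would not count as a JSON text sequence under the MUST wording.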

FWIW, this is what the jq processor does to handle sequences: it reads
input bytes, feeds them to its parser (which works incrementally, but
isn't streaming), and passes each parsed output to the jq VM to use as
an input to the jq program.  Output values of the jq program are
encoded as JSON texts, printed, and then a newline is printed.

The jq processor has no special handling of newlines on input.  If
there are any bytes left over from parsing a previous text, they are
used in the next parse.  Whitespace is just whitespace.
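
The leftover-bytes behaviour can be sketched in Python as follows. This is only an approximation built on the standard `json` module, not jq's actual parser (which is written in C); the class `IncrementalReader`, its `feed` method, and the `final` flag are hypothetical names. One assumption worth flagging: a number or literal that touches the end of the buffer is held back, because the next chunk might extend it.

```python
import json

WS = " \t\n\r"  # the four JSON whitespace characters


class IncrementalReader:
    """Hypothetical helper: buffer input chunks, return complete JSON values."""

    def __init__(self):
        self._buf = ""
        self._dec = json.JSONDecoder()

    def feed(self, chunk, final=False):
        """Add a chunk; return the values that can be completed so far."""
        self._buf += chunk
        out = []
        while True:
            s = self._buf.lstrip(WS)
            if not s:
                self._buf = ""
                break
            try:
                value, end = self._dec.raw_decode(s)
            except json.JSONDecodeError:
                if final:
                    raise  # no more input coming: this is a real error
                self._buf = s  # treat as incomplete and keep the leftovers
                break
            if end == len(s) and not final and s[end - 1] not in '"]}':
                # A number or literal at the very end of the buffer may
                # continue in the next chunk ("4" then "2"), so wait.
                self._buf = s
                break
            out.append(value)
            self._buf = s[end:]  # leftover bytes feed the next parse
        return out
```

A caller would pass `final=True` with the last chunk so that a trailing number is flushed and a genuinely truncated text raises an error instead of waiting forever.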

The only special thing that the jq processor does is to print a
newline after each text on output.

This means that jq can handle JSON text sequences with any whitespace
separator, and even no separator when there would be no ambiguity:

% /jq -c .<<EOF
1 2 true false null"a string""another"[0,1,2
]{"foo":"bar"}
EOF
1
2
true
false
null
"a string"
"another"
[0,1,2]
{"foo":"bar"}
%
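
For readers without jq at hand, the same "whitespace is just whitespace" parsing can be approximated in Python with `json.JSONDecoder.raw_decode`. This is a sketch under the assumption that the whole input is already in memory; `iter_json_texts` is a hypothetical name, and Python's decoder is somewhat more lenient than a strict JSON parser (it accepts `NaN`, for instance).

```python
import json

WS = " \t\n\r"  # the four JSON whitespace characters


def iter_json_texts(data):
    """Yield each JSON value found in data, with or without whitespace between."""
    dec = json.JSONDecoder()
    pos = 0
    while True:
        # Skip any run of whitespace; an empty run is fine too.
        while pos < len(data) and data[pos] in WS:
            pos += 1
        if pos == len(data):
            return
        # raw_decode returns the value and the index just past it.
        value, pos = dec.raw_decode(data, pos)
        yield value


if __name__ == "__main__":
    sample = '1 2 true false null"a string""another"[0,1,2\n]{"foo":"bar"}\n'
    for value in iter_json_texts(sample):
        print(json.dumps(value, separators=(",", ":")))
```

Strings, arrays, and objects are self-delimiting, which is why they need no separator; adjacent numbers such as `1 2` do, since `12` would parse as a single value.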

I could teach jq how to parse a non-whitespace control character
separator; that's easy enough.  The question is: how to handle
backwards compatibility?  The obvious answer is: add an option.  But
which way should it default?
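
One possible shape for a parser that accepts the new separator while staying compatible with existing input is sketched below in Python. It is an illustration, not a proposal for jq's design: the separator character (U+001E), the function name `parse_sequence`, and the choice to collect damaged fragments rather than abort are all assumptions. The sketch also shows what the separator buys: after a truncated text, parsing can resume at the next separator.

```python
import json

SEP = "\x1e"    # assumed separator; the message does not name the character
WS = " \t\n\r"  # the four JSON whitespace characters


def parse_sequence(data, sep=SEP):
    """Return (values, damaged) for a sequence with or without separators.

    Input without any separator is parsed as plain concatenated JSON texts.
    When a text fails to parse, the rest of that record is reported in
    `damaged` and parsing resumes after the next separator.
    """
    dec = json.JSONDecoder()
    values, damaged = [], []
    for record in data.split(sep):
        pos = 0
        while True:
            while pos < len(record) and record[pos] in WS:
                pos += 1
            if pos == len(record):
                break
            try:
                value, pos = dec.raw_decode(record, pos)
            except json.JSONDecodeError:
                damaged.append(record[pos:])  # e.g. an incomplete write
                break
            values.append(value)
    return values, damaged
```

Because a record may hold several texts and the separator may be absent entirely, this corresponds to the lenient-parser variant; a strict mode that rejects separator-less input would be a further option on top.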

Nico
--