Re: [Json] Using a non-whitespace separator (Re: Working Group Last Call on draft-ietf-json-text-sequence)
Tim Bray <tbray@textuality.com> Sun, 01 June 2014 05:09 UTC
In-Reply-To: <CAK3OfOidgk13ShPzpF-cxBHeg34s99CHs=bpY1rW-yBwnpPC-g@mail.gmail.com>
References: <CAK3OfOidgk13ShPzpF-cxBHeg34s99CHs=bpY1rW-yBwnpPC-g@mail.gmail.com>
From: Tim Bray <tbray@textuality.com>
Date: Sat, 31 May 2014 22:08:49 -0700
Message-ID: <CAHBU6itr=ogxP4uoj57goEUSOCpsRx1AXVnW1NQwSTPxbbttkw@mail.gmail.com>
To: Nico Williams <nico@cryptonector.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/json/xVdQtDQ1KqC6NcbiQ4qHV-jQavw
Cc: Carsten Bormann <cabo@tzi.org>, IETF JSON WG <json@ietf.org>, "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, Paul Hoffman <paul.hoffman@vpnc.org>
Subject: Re: [Json] Using a non-whitespace separator (Re: Working Group Last Call on draft-ietf-json-text-sequence)
On Sun, May 25, 2014 at 4:05 PM, Nico Williams <nico@cryptonector.com> wrote:
> Currently my thinking is that for backwards compatibility reasons I'd
> want to make this RECOMMENDED though, not REQUIRED, except for
> cases where incomplete writes are a potential problem.

No. There should be only one way to do things.

OK, I propose that the code point U+FFFE be used as the separator in JSON sequences. (This is the byte-swapped form of U+FEFF ZERO WIDTH NO-BREAK SPACE, a.k.a. the Byte Order Mark; seeing it means that if you’re reading UTF-16 you have the endianness wrong.) Since presumably by the time you see a separator you’ve figured out your byte order, and especially since de facto everything is UTF-8, U+FFFE just can’t occur. Also, the Unicode spec is clear that it must never be interpreted as an abstract character nor interchanged, so it is suitable for use as a separator.

This makes the resync problem trivial: if you hit a busted JSON text, you drop into a loop like

    while (!eof() && nextCodepoint() != 0xFFFE) {
        // discard code points until the next separator (or EOF)
    }

So the top-level production is along the lines of

    JSON-sequence = JSON-text *( %xfffe JSON-text )

> In jq this
> would be an option to either use or maybe not use this new separator.
>
> Another option is to say that encoders MUST use the new separator, but
> parsers MAY/SHOULD/MUST handle sequences with a missing separator (as
> jq does; see below). jq would still have an encoding option, but when
> not emitting the new separator the result just wouldn't be a JSON text
> sequence.
>
> FWIW, this is what the jq processor does to handle sequences: it reads
> input bytes, feeds them to its parser (which works incrementally, but
> isn't streaming), and passes each parsed output to the jq VM to use as
> an input to the jq program. Output values of the jq program are
> encoded as JSON texts, printed, and then a newline is printed.
>
> The jq processor has no special handling of newlines on input. If
> there's any bytes left over from parsing a previous text, they are
> used in the next parse. Whitespace is just whitespace.
>
> The only special thing that the jq processor does is to print a
> newline after each text on output.
>
> This means that jq can handle JSON text sequences with any whitespace
> separator, and even no separator when there would be no ambiguity:
>
> % /jq -c .<<EOF
> 1 2 true false null"a string""another"[0,1,2
> ]{"foo":"bar"}
> EOF
> 1
> 2
> true
> false
> null
> "a string"
> "another"
> [0,1,2]
> {"foo":"bar"}
> %
>
> I could teach jq how to parse a non-whitespace control character
> separator; that's easy enough. The question is: how to handle
> backwards compatibility? The obvious answer is: add an option. But
> which way should it default?
>
> Nico

--
- Tim Bray (If you’d like to send me a private message, see https://keybase.io/timbray)
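A minimal Python sketch of the two approaches discussed in this message, for illustration only; all function names are hypothetical and Python's standard `json` module stands in for a real JSON-sequence parser. `encode_sequence` and `decode_sequence` follow the proposed production `JSON-text *( %xfffe JSON-text )` over UTF-8, where U+FFFE appears on the wire as the bytes EF BF BE. A text that fails to parse is skipped and decoding resumes after the next separator, which is the resync behavior the `while` loop in the message describes. `decode_concatenated` imitates the jq behavior Nico describes (back-to-back texts with optional whitespace) using `json.JSONDecoder.raw_decode`; it is not jq's actual implementation.

```python
import json

# U+FFFE as UTF-8 bytes (EF BF BE). In well-formed UTF-8 the byte 0xEF only
# ever starts a three-byte sequence, so a plain byte search finds exactly
# the U+FFFE code points.
SEP = "\ufffe".encode("utf-8")


def encode_sequence(values):
    """Write values as JSON-text *( %xfffe JSON-text ), UTF-8 encoded.

    json.dumps escapes non-ASCII by default (ensure_ascii=True), so a
    U+FFFE inside a string becomes the escape \\ufffe, never a raw separator.
    """
    return SEP.join(json.dumps(v).encode("utf-8") for v in values)


def decode_sequence(data):
    """Split bytes on U+FFFE and parse each JSON text.

    A text that fails to decode or parse is counted and skipped; parsing
    resumes after the next separator. Returns (values, skipped_count).
    """
    values, skipped = [], 0
    for chunk in data.split(SEP):
        try:
            values.append(json.loads(chunk.decode("utf-8")))
        except (UnicodeDecodeError, json.JSONDecodeError):
            skipped += 1
    return values, skipped


# JSON insignificant whitespace (RFC 7159, section 2).
_JSON_WS = " \t\n\r"


def decode_concatenated(text):
    """jq-like reading: back-to-back JSON texts, optional whitespace between.

    There is no separator to resync on: a malformed text makes raw_decode
    raise json.JSONDecodeError, which propagates to the caller.
    """
    decoder = json.JSONDecoder()
    values, idx = [], 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while idx < len(text) and text[idx] in _JSON_WS:
            idx += 1
        if idx == len(text):
            return values
        value, idx = decoder.raw_decode(text, idx)
        values.append(value)
```

Because `json.dumps` escapes non-ASCII characters by default, a string value containing U+FFFE is written as the six-character escape `\ufffe` and cannot be mistaken for a separator; passing `ensure_ascii=False` would drop that guarantee. The contrast with `decode_concatenated` is the point of the proposal: after a malformed text it simply fails, because nothing marks where the next text begins.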