Re: [Json] Using a non-whitespace separator (Re: Working Group Last Call on draft-ietf-json-text-sequence)

Nico Williams <nico@cryptonector.com> Mon, 02 June 2014 00:03 UTC

MIME-Version: 1.0
In-Reply-To: <03CFAB3E-F4C6-4AE8-A501-8525376C4AA7@vpnc.org>
References: <CAK3OfOidgk13ShPzpF-cxBHeg34s99CHs=bpY1rW-yBwnpPC-g@mail.gmail.com> <CAHBU6itr=ogxP4uoj57goEUSOCpsRx1AXVnW1NQwSTPxbbttkw@mail.gmail.com> <CAK3OfOhft+XJeMrg5rdY9E6fxAkJ2qsT3UHwu7zt=NEz2Q3XOQ@mail.gmail.com> <CAK3OfOhy-N0zjCVxtOMB8SqZEKceVvBz9Y6i0fo2W8i+gHKm4Q@mail.gmail.com> <CAK3OfOiQnLq29cv+kas3B8it-+82VmXvL3Rq1C5_767FDhBjRg@mail.gmail.com> <03CFAB3E-F4C6-4AE8-A501-8525376C4AA7@vpnc.org>
Date: Sun, 01 Jun 2014 19:03:18 -0500
Message-ID: <CAK3OfOja-17V391tTK91R98X8XQzd0iPnur2=oo4ii+MCOt+Rg@mail.gmail.com>
From: Nico Williams <nico@cryptonector.com>
To: Paul Hoffman <paul.hoffman@vpnc.org>
Content-Type: text/plain; charset="UTF-8"
Archived-At: http://mailarchive.ietf.org/arch/msg/json/Bi5TsaghK3WpS-NMdKECd07vzT4
Cc: IETF JSON WG <json@ietf.org>
Subject: Re: [Json] Using a non-whitespace separator (Re: Working Group Last Call on draft-ietf-json-text-sequence)
Precedence: list

On Sun, Jun 1, 2014 at 6:17 PM, Paul Hoffman <paul.hoffman@vpnc.org> wrote:
>>> Oh, right, the separator must be a character that must be escaped in
>>> strings.  That greatly limits the range of characters we can choose
>>> from.
>>
>> And it has to be a one-byte character (therefore an ASCII character,
>> and the texts must be encoded in UTF-8).
>
> Why is that?

The problem we're talking about is logfiles that applications
"append" to.  In some circumstances a write can be incomplete.  The
question then is: how to recover?  In particular: how to read past an
incomplete entry to the next complete entry?

The I-D currently describes a recovery heuristic, but some reviewers
have stated a preference for a stronger, more easily understood
recovery method.  One obvious approach is to separate texts with a
byte or byte sequence that could not normally happen in a JSON text.
A single byte is simpler to handle and understand than a byte
sequence, because a sequence can itself be written incompletely, for
the same reasons that a JSON text can.

We've considered all of these approaches (not in this order):

0) newline separator
1) #0 + JSON text boundary detection ABNF
2) #1 + removal of newlines from JSON texts (which does not require
re-encoding, FYI)
3) #2 + write something like a text consisting of a single null value
before every text
4) use some other separator (that isn't a JSON whitespace character)
5) #4 with ASCII RS as the separator
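
To illustrate the "does not require re-encoding" remark in #2, here is
a minimal Python sketch.  It assumes UTF-8 input that is already a
valid JSON text; strip_newlines is a hypothetical helper name, not
something from the I-D.  A raw newline in a valid JSON text can only be
whitespace between tokens, since a newline inside a string has to be
written as an escape, so the bytes can be dropped without parsing.

```python
import json

def strip_newlines(json_text: bytes) -> bytes:
    """Hypothetical helper: drop raw LF bytes from a valid UTF-8 JSON text.

    In a valid JSON text a raw LF is only ever inter-token whitespace,
    because a newline inside a string must be escaped as \\n.
    """
    return json_text.replace(b"\n", b"")

value = {"msg": "line one\nline two", "items": [1, 2, 3]}

# Pretty-printing puts raw newlines between tokens; the newline inside
# the string value is emitted as the two-character escape \n.
pretty = json.dumps(value, indent=2).encode("utf-8")
compact = strip_newlines(pretty)

print(b"\n" in pretty, b"\n" in compact)
print(json.loads(compact) == value)
```

The text parses to the same value before and after, with no decode and
re-encode step in between.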

Any separator for #4 has to be something that cannot happen in a JSON
text normally, and it has to be something amenable to recovery from
partial writes.  Even partial writes always at least write complete
_bytes_ (if they write any).  Therefore a one-byte separator is always
unambiguous.
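
As a rough sketch of what recovery with a one-byte separator could look
like (an illustration only, not the I-D's method): the reader below
assumes UTF-8 texts, each preceded by an RS byte, and simply resumes at
the next separator when an entry fails to parse.  read_sequence and the
sample log are made up for the example.

```python
import json

RS = b"\x1e"  # ASCII Record Separator, the one-byte separator of option #5

def read_sequence(data: bytes):
    """Yield each complete JSON text from an RS-separated byte string,
    skipping any entry that fails to decode or parse."""
    for chunk in data.split(RS):
        if not chunk.strip():
            continue  # nothing before the first separator, or an empty gap
        try:
            yield json.loads(chunk.decode("utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError):
            continue  # incomplete entry: resume at the next separator

# A log in which the second write was cut off mid-entry.
log = (RS + b'{"n": 1}\n' +
       RS + b'{"n": 2, "msg": "trunc' +
       RS + b'{"n": 3}\n')

print(list(read_sequence(log)))
```

Note that this sketch only catches truncation that leaves an unparsable
text.  A top-level number cut short (say 12 written out of 123) would
still parse, so a real format would need something more to cover that
case.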

In order for a separator to be utterly unambiguous in the face of
partial writes it has to involve a byte that can never occur in a JSON
text.  I suspect there's no such byte that works for UTF-8/16/32.  If
we limit JSON text sequences to UTF-8 then any ASCII character that
must be escaped in strings and is not valid in the encoding of any
values will do.  That's a very small set of characters.  RS (#5) was
objected to.  I don't see what other character can be used that won't
result in similar objections.
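
A small Python check of the reasoning above, offered as a sketch: it
uses Python's json module as a stand-in for "a JSON encoder/parser",
and the helper names are invented for the example.  It looks at whether
the byte 0x1E can show up in an encoded JSON text, first in UTF-8 and
then in UTF-16.

```python
import json

RS = "\x1e"  # ASCII Record Separator (option #5)

def raw_rs_in_encoding(value, encoding: str) -> bool:
    """Hypothetical helper: serialize value as JSON, encode the text,
    and report whether the byte 0x1E appears anywhere in the result."""
    text = json.dumps(value, ensure_ascii=False)
    return b"\x1e" in text.encode(encoding)

def accepts_raw_rs() -> bool:
    """Report whether the parser accepts an unescaped RS inside a string."""
    try:
        json.loads('"a' + RS + 'b"')
        return True
    except json.JSONDecodeError:
        return False

# In UTF-8 the RS inside the string is written as the escape \u001e,
# so no raw 0x1E byte reaches the output.
print(raw_rs_in_encoding({"note": "a" + RS + "b"}, "utf-8"))

# A raw, unescaped RS inside a string is not valid JSON.
print(accepts_raw_rs())

# In UTF-16BE the character U+1E00 is the byte pair 1E 00, so the byte
# 0x1E can occur in a perfectly valid text.
print(raw_rs_in_encoding("\u1e00", "utf-16-be"))
```

The first two checks come out false and the third true, which matches
the point that a single "never occurs" byte is available for UTF-8 but
not across UTF-16/32.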

The more I think about it the more I prefer the options in the I-D,
#0, #1, #2, and #3.  I don't see better alternatives, and I don't
think this is fatal.

Nico
--
