Re: [Json] Complete section 3 proposal

Nico Williams <nico@cryptonector.com> Tue, 18 June 2013 22:17 UTC

MIME-Version: 1.0
In-Reply-To: <A723FC6ECC552A4D8C8249D9E07425A70FC591EA@xmb-rcd-x10.cisco.com>
References: <CAK3OfOgFqwxkoZtv2t9XR4t-DLYRoBJeATtGhOHZ2ZACACW4Gg@mail.gmail.com> <A723FC6ECC552A4D8C8249D9E07425A70FC591EA@xmb-rcd-x10.cisco.com>
Date: Tue, 18 Jun 2013 17:17:29 -0500
Message-ID: <CAK3OfOhf7Ns3GyCuLAx-k7BK-7ofiQh9QtJ6ZbSvmG9T91xBuA@mail.gmail.com>
From: Nico Williams <nico@cryptonector.com>
To: "Joe Hildebrand (jhildebr)" <jhildebr@cisco.com>
Content-Type: text/plain; charset="UTF-8"
Cc: "json@ietf.org" <json@ietf.org>
Subject: Re: [Json] Complete section 3 proposal
Precedence: list

On Tue, Jun 18, 2013 at 5:05 PM, Joe Hildebrand (jhildebr)
<jhildebr@cisco.com> wrote:
> On 6/18/13 2:52 PM, "Nico Williams" <nico@cryptonector.com> wrote:
>
>>Note that if a JSON string in JSON data contains unescaped naked
>>surrogates then the encoding of that data will not be valid UTF-8,
>>UTF-16, nor, for that matter, CESU-8.  And some implementations
>>probably produce CESU-8-encoded data.
>
> I think the spirit of 4627 was that it literally be UTF-8, and that all of
> those other odd encodings are already non-conformant.  We could always add
> a note that says that there has been a history of encodings not being
> quite adequately specified, so old software may produce octet streams that
> this document doesn't describe.

We've been over this.  The spirit was that strings were sequences of
Unicode characters, but really they're sequences of code points.  And
so on.  And there's no consensus to break existing encoders.  So I
don't think we can get consensus for that MUST without that note.

>>I'm not sure whether that's
>>worth stating here or elsewhere, but the fact that there's
>>not-quite-UTF-8 JSON out there means this SHALL is either
>>interop-breaking or the matter must be mentioned nearby.
>
> I agree it might be interop-breaking, but I don't think that's necessarily
> the spec's fault.  People will write bad software, particularly when they
> don't have test vectors easily at hand for them to probe what they
> originally thought were edge cases.

Well, if you can get consensus for it...

It's not that people write bad code.  It's that bad *data* exists, and
why should encoders go looking for it (naked surrogates in
particular)?  I think in practice not much can be done about this
except to note the problem and encourage encoders not to add to it
(i.e., encoders should escape naked surrogates).
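
To make "escape naked surrogates" concrete, here is a rough sketch in
Python.  The function name dumps_utf8 is made up for illustration, and
this is only one way an encoder might do it, not a proposal for spec
text.  It relies on Python strings being sequences of code points, so a
surrogate that appears in one is treated as naked and written out as a
\uXXXX escape; everything else is emitted as literal UTF-8.

```python
import json
import re

# Any code point in the surrogate range U+D800..U+DFFF.  Python strings are
# sequences of code points, so this sketch treats every surrogate found in
# one as naked (assumption of this sketch).
_SURROGATE = re.compile(r"[\ud800-\udfff]")


def dumps_utf8(value) -> bytes:
    """Hypothetical helper: serialize to JSON as valid UTF-8 octets."""
    # ensure_ascii=False leaves non-ASCII characters, surrogates included,
    # unescaped in the output text.
    text = json.dumps(value, ensure_ascii=False)
    # Surrogates can only occur inside JSON strings at this point, so
    # replacing each with a \uXXXX escape keeps the text valid JSON.
    text = _SURROGATE.sub(lambda m: "\\u%04x" % ord(m.group()), text)
    return text.encode("utf-8")


def strict_utf8_fails(s: str) -> bool:
    """True if s cannot be encoded as strict UTF-8."""
    try:
        s.encode("utf-8")
        return False
    except UnicodeEncodeError:
        return True
```

Without the escaping step, the string with the naked surrogate simply
has no valid UTF-8 encoding, which is the interop problem noted above.
With it, the output is plain UTF-8 and the surrogate survives a round
trip through a parser that tolerates such escapes.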

Nico