Re: [Json] Unpaired surrogates in JSON strings

Douglas Crockford <douglas@crockford.com> Wed, 05 June 2013 18:20 UTC

Message-ID: <51AF8149.5090907@crockford.com>
Date: Wed, 05 Jun 2013 11:19:53 -0700
From: Douglas Crockford <douglas@crockford.com>
User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20130509 Thunderbird/17.0.6
MIME-Version: 1.0
To: Paul Hoffman <paul.hoffman@vpnc.org>
References: <20130605162246.GG3680@mercury.ccil.org> <51AF7988.6040009@crockford.com> <61407E6F-4178-471A-931C-D98E6F0C9756@vpnc.org>
In-Reply-To: <61407E6F-4178-471A-931C-D98E6F0C9756@vpnc.org>
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 7bit
Cc: "json@ietf.org" <json@ietf.org>
Subject: Re: [Json] Unpaired surrogates in JSON strings
Precedence: list

On 6/5/2013 11:01 AM, Paul Hoffman wrote:
> <definitely no hat>
>
> On Jun 5, 2013, at 10:46 AM, Douglas Crockford <douglas@crockford.com> wrote:
>
>> On 6/5/2013 9:22 AM, John Cowan wrote:
>>> RFC 4627 Section 1 says:
>>>
>>>      A string is a sequence of zero or more Unicode characters.
>>>
>>> However, the grammar of strings permits things like "foo\uDC00bar",
>>> which contains an escape sequence that does not correspond to any
>>> Unicode character.  This provides backward compatibility with JavaScript,
>>> where a string is not a sequence of characters but a sequence of UTF-16
>>> code units.
>>>
>>> If Section 1 is normative, then there is a contradiction with Section 4,
>>> which says:
>>>
>>>      A JSON parser MUST accept all texts that conform to the JSON
>>>      grammar.
>>>
>>> In my view, JSONbis processors should be REQUIRED to produce only strings
>>> that conform to Section 1.
>>>
>> Such a requirement will be breaking. Breaking changes are out of scope.
> How is that "breaking"? Section 1 has a definition of strings, and Section 4 says that the parser must accept all texts that conform to the grammar. Surrogate code points are not characters, according to the Unicode spec.
>
>> I like the suggestion that section 1 should be talking about code points instead of characters.
> That seems like a significant change that would cause parsers that currently follow Section 1 to fail.
>
It is not a change; it is a clarification. JavaScript, Java, and many
other languages have strings of code points, owing to their being set
before Unicode grew surrogate pairs. So strings in those languages are
composed of code points. JSON is tolerant of that reality.
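
A minimal sketch of the behavior under discussion, assuming CPython's standard json module as the parser (the thread does not name a specific implementation). It takes the "foo\uDC00bar" example quoted above, shows that the parser accepts it, and shows that the resulting string holds a lone surrogate that cannot be encoded as well-formed UTF-8. The variable names are illustrative only.

```python
import json

# The example text from the thread: a JSON string containing a lone
# low-surrogate escape (U+DC00) with no preceding high surrogate.
text = '"foo\\uDC00bar"'

# The JSON grammar permits this escape; CPython's json module accepts it
# and keeps the lone surrogate in the resulting str.
value = json.loads(text)

# Check whether any code point in the parsed string is in the surrogate range.
has_lone_surrogate = any(0xD800 <= ord(ch) <= 0xDFFF for ch in value)

# A lone surrogate is not a Unicode scalar value, so strict UTF-8 encoding fails.
try:
    value.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False

print(len(value), has_lone_surrogate, encodable)  # prints: 7 True False
```

Other parsers may behave differently here, for example by rejecting the text or substituting U+FFFD, which is the interoperability question this thread is about.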
