Re: [Json] On characters and code points

Tim Bray <tbray@textuality.com> Fri, 07 June 2013 17:26 UTC

MIME-Version: 1.0
In-Reply-To: <20130607171950.GD13569@mercury.ccil.org>
References: <A723FC6ECC552A4D8C8249D9E07425A70FC2E7E1@xmb-rcd-x10.cisco.com> <51B06F38.8050707@crockford.com> <CAHBU6iuFBuW-RfgBLQF5q4BnUOzs088QXW3uOQG1OjBFjZttkw@mail.gmail.com> <51B1B4E7.8090101@it.aoyama.ac.jp> <9ld3r8pc0tufif18dohb2fmi0ijna1vs4n@hive.bjoern.hoehrmann.de> <56A163E9-E7CD-46B3-9984-8F009EBFF500@vpnc.org> <CAHBU6ivG=ONc8roT7W=LdpKYNMqRH_d5BobZ=pHnk=mVaKZKaA@mail.gmail.com> <20130607171950.GD13569@mercury.ccil.org>
Date: Fri, 07 Jun 2013 10:25:59 -0700
Message-ID: <CAHBU6iuO=D5Vtyjb_FQKHpttrFRBzXcB-Jac_ixb41GQFYF-Fw@mail.gmail.com>
From: Tim Bray <tbray@textuality.com>
To: John Cowan <cowan@mercury.ccil.org>
Cc: Paul Hoffman <paul.hoffman@vpnc.org>, "json@ietf.org" <json@ietf.org>
Subject: Re: [Json] On characters and code points
Precedence: list

John is right and I am wrong.  What I really want to legislate against is:

{
 "Mis-use of UTF-16 surrogates" : "\udf46\ud800AB\udf11CD\ud812EF",
 "Mis-use of flipped BOM" : "AB\ufffeCD"
}
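
For anyone who wants to see what a parser does with the example above, here is a rough Python sketch. It assumes the stock `json` module, which accepts lone surrogate escapes and combines properly paired ones, so any surrogate code point left in a decoded string is unpaired. The helper name `find_problems` is made up for illustration.

```python
import json

def find_problems(value):
    """Return (index, description) pairs for unpaired surrogates and U+FFFE.

    Assumes `value` came from json.loads, which has already combined
    valid surrogate pairs into single code points.
    """
    problems = []
    for index, ch in enumerate(value):
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDFFF:
            problems.append((index, "unpaired surrogate U+%04X" % cp))
        elif cp == 0xFFFE:
            problems.append((index, "flipped BOM U+FFFE"))
    return problems

# The payload from the message, kept as a raw string so the \u escapes
# reach the JSON parser untouched.
payload = r'''{
 "Mis-use of UTF-16 surrogates" : "\udf46\ud800AB\udf11CD\ud812EF",
 "Mis-use of flipped BOM" : "AB\ufffeCD"
}'''

doc = json.loads(payload)
for key, value in doc.items():
    print(key, find_problems(value))
```

A properly paired escape such as `"\ud834\udd1e"` decodes to a single code point and is not flagged.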


On Fri, Jun 7, 2013 at 10:19 AM, John Cowan <cowan@mercury.ccil.org> wrote:

> Tim Bray scripsit:
>
> > >    { "End of data marker": "\uFFFF" }
> > >
> >
> > Yes, I *really* want to prohibit that. The one corner case it buys you is
> > outweighed by a factor of a thousand or so in not being able to use
> > general-purpose string processing software to deal with JSON payloads.
>
> Most general-purpose string processing software is perfectly happy
> with U+FFFF.  There are three different kinds of code points here,
> and it doesn't help to conflate them:
>
> 1) Surrogate code points.  These will never be assigned to any characters,
> and are reserved for use as UTF-16 code units.  There are exactly 2048 of
> these, from U+D800 to U+DFFF.
>
> 2) Non-character code points.  These will never be assigned to any
> characters, and are not meant to be interchanged, but internal software
> is expected to handle them.  There are exactly 66 of these, and U+FFFF
> is one.  See <http://www.unicode.org/faq/private_use.html#noncharacters>
> for more about this group.
>
> 3) Unassigned code points.  These are not assigned to any characters
> today, but may be assigned in future.  They may be interchanged.
> Internal libraries should process them.
>
> My view is that group 1 are and should be disallowed in JSON; others
> disagree.  Group 2 should be avoided by JSON creators, but accepted by
> JSON parsers, which may choose to change them to U+FFFD (replacement
> character).  Group 3 are and should be valid in JSON.
>
> --
> Values of beeta will give rise to dom!          John Cowan
> (5th/6th edition 'mv' said this if you tried    http://www.ccil.org/~cowan
> to rename '.' or '..' entries; see              cowan@ccil.org
> http://cm.bell-labs.com/cm/cs/who/dmr/odd.html)
>
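
The three groups in the quoted message can be told apart mechanically. The following is only a sketch: the function names are invented here, and the "unassigned" answer depends on the Unicode version shipped with the Python `unicodedata` module, so it will change over time. The noncharacter test relies on the 66 noncharacters being U+FDD0..U+FDEF plus the last two code points of each of the 17 planes.

```python
import unicodedata

def classify(cp):
    """Classify a code point as surrogate, noncharacter, unassigned, or assigned."""
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"
    # U+FDD0..U+FDEF, plus U+xFFFE and U+xFFFF in every plane.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "noncharacter"
    # General category Cn also covers noncharacters, which were handled above.
    if unicodedata.category(chr(cp)) == "Cn":
        return "unassigned"
    return "assigned"

def replace_noncharacters(text):
    """Swap each noncharacter for U+FFFD, as a lenient parser might choose to do."""
    return "".join(
        "\ufffd" if classify(ord(ch)) == "noncharacter" else ch for ch in text
    )

print(classify(0xD800))   # surrogate
print(classify(0xFFFF))   # noncharacter
print(classify(0x0041))   # assigned
```

Counting over the whole code space with `classify` gives 2048 surrogates and 66 noncharacters, matching the figures in the message.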
