Re: [Json] A possible summary of the discussion so far on code points and characters

Norbert Lindenberg <ietf@lindenbergsoftware.com> Wed, 12 June 2013 16:51 UTC

From: Norbert Lindenberg <ietf@lindenbergsoftware.com>
In-Reply-To: <CAHBU6it6Zf3gFkUjcBq+xJxPj=SBopD=RyA=B5r=243hYnQRWA@mail.gmail.com>
Date: Wed, 12 Jun 2013 09:51:22 -0700
Message-Id: <490F7536-10FB-4BF0-8AC4-ABE658D68FE6@lindenbergsoftware.com>
References: <AF793CAF-B30B-44A7-B864-82CEF79EA34D@vpnc.org> <CAChr6SwLDCUk0DC9pGTKqUu_V5vJHvs7Sgv4EneTJMryn1iKSA@mail.gmail.com> <D27EA9DC-9EFE-419B-BC34-3BF3FC8F5260@vpnc.org> <EF244D9B-29E2-40E4-99FF-810A28091106@tzi.org> <CAChr6Sxwhdn8CshU92y6fcoovzzhcayg3MECP7Hg=UXX390z=w@mail.gmail.com> <8C87F4D2-CABE-4F26-A5B1-6BC9C759C7CD@tzi.org> <CAChr6SzTHkbfXgUxYWLijyoYz0ug2TMjoVzFgDEF+Mz+idZ1Yg@mail.gmail.com> <CAHBU6it6Zf3gFkUjcBq+xJxPj=SBopD=RyA=B5r=243hYnQRWA@mail.gmail.com>
To: Tim Bray <tbray@textuality.com>, R S <sayrer@gmail.com>
Cc: Norbert Lindenberg <ietf@lindenbergsoftware.com>, Carsten Bormann <cabo@tzi.org>, json@ietf.org
Subject: Re: [Json] A possible summary of the discussion so far on code points and characters

On Jun 8, 2013, at 21:48 , R S wrote:

> If we must "improve" the current text, I have a suggested addition which borrows from your emails. I'm not sure where to add it, because it doesn't fit well with the current structure of the document.
> 
> "At their most basic level, JSON strings represent a vector of unconstrained 16-bit values which largely map to UCS-2. Implementations MAY apply more stringent Unicode validation."

JSON is in no way constrained to the Basic Multilingual Plane, so if we discuss 16-bit values in JSON, they're UTF-16 code units, not UCS-2.
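
A minimal sketch of that distinction, assuming Python's standard json module as the parser (the sample string is picked only for illustration). A code point outside the Basic Multilingual Plane is written in JSON's \u escape form as two 16-bit values, which a parser reads as one UTF-16 surrogate pair, something UCS-2 cannot express:

```python
import json

# U+1F600 lies outside the Basic Multilingual Plane, so the \uXXXX escape
# form needs two 16-bit escapes: a UTF-16 surrogate pair.
doc = '"\\ud83d\\ude00"'
text = json.loads(doc)

# The parser combines the pair into a single code point.
code_points = [ord(ch) for ch in text]

# Re-encoding as UTF-16 recovers the two 16-bit code units from the escapes.
utf16 = text.encode("utf-16-be")
code_units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]

print([hex(cp) for cp in code_points])
print([hex(cu) for cu in code_units])
```

Read as UCS-2, the same two values would be two separate (and invalid) characters rather than one supplementary code point.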


On Jun 9, 2013, at 0:08 , Tim Bray wrote:

> It seems clear that the intent of JSON, judging by the language in 4627, and the observed usage in a zillion RESTful protocols currently in production, is that JSON strings be used to interchange Unicode character sequences.

I agree.

> It seems clear that (at least partly as a side-effect of the JavaScript “character” model) there is no normative requirement to avoid Unicode abuse such as the use of non-character codepoints and naked surrogates, which will predictably lead to consequences such as Carsten’s exploding-python example.
> 
> So maybe just leave the spec more or less the way it is. Say in the introduction that strings are for interchanging Unicode characters, observe in the fine print that the specification does not forbid the use of things that cannot be useful in the Unicode context and will quite likely cause software breakage.

It might be better to say "Unicode code points" rather than "Unicode characters":

- This makes the spec independent of Unicode versions - the set of Unicode code points is fixed (U+0000 to U+10FFFF), while the set of assigned characters in Unicode keeps growing, and different systems communicating via JSON may not be based on the same Unicode version.

- It makes clear that noncharacters, unassigned code points, and surrogate code points are all allowed in JSON (although subject to the limitations imposed by parsers, communication channels, security systems, or the character encoding used).
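
To illustrate the two points above, here is a rough sketch in Python. The helper names (is_surrogate, is_noncharacter) are made up for this example; the ranges are the fixed ones from the Unicode code space, so they do not depend on the Unicode version. The last part shows one parser (Python's json module) accepting a lone surrogate that then cannot be encoded as UTF-8, which is the kind of downstream limitation mentioned above:

```python
import json

MAX_CODE_POINT = 0x10FFFF  # the code space is fixed: U+0000 to U+10FFFF


def is_surrogate(cp: int) -> bool:
    """Surrogate code points, U+D800 to U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


def is_noncharacter(cp: int) -> bool:
    """U+FDD0..U+FDEF plus the last two code points of each of the 17 planes."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


# Both kinds of code point get through this particular JSON parser.
lone = json.loads('"\\ud800"')
nonchar = json.loads('"\\uffff"')

# ...but a lone surrogate has no UTF-8 encoding, so it fails later.
try:
    lone.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False

noncharacter_count = sum(is_noncharacter(cp) for cp in range(MAX_CODE_POINT + 1))
print(is_surrogate(ord(lone)), is_noncharacter(ord(nonchar)), encodable, noncharacter_count)
```

Whether a code point is assigned is deliberately not checked here, since that answer changes from one Unicode version to the next.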

> And in the best-practices doc, say “Encode only Unicode codepoints, and use only UTF-8 to do it.”

UTF-8 is the right encoding to use over the wire or in files, but at runtime many systems (including all that implement the ECMAScript or DOM specifications) have to use UTF-16 semantics.
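
A small sketch of what that difference looks like in practice, again in Python. The utf16_length helper is hypothetical and only approximates the length an ECMAScript-style runtime would report by counting UTF-16 code units:

```python
import json


def utf16_length(s: str) -> int:
    """Number of UTF-16 code units needed for s (hypothetical helper)."""
    return len(s.encode("utf-16-le")) // 2


value = "a\U0001F600"

# On the wire: UTF-8 bytes, with the non-ASCII character written literally.
wire = json.dumps(value, ensure_ascii=False).encode("utf-8")

# At runtime: the same string is 2 code points but 3 UTF-16 code units.
decoded = json.loads(wire.decode("utf-8"))
print(len(wire), len(decoded), utf16_length(decoded))
```

The supplementary character takes four bytes in the UTF-8 form and two code units in the UTF-16 view, so lengths and indexes computed on one side do not carry over to the other.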

Norbert
