Re: [Json] Call for Consensus: Proposed Text for "8.1 Character Encoding"

"Matthew A. Miller" <linuxwolf+ietf@outer-planes.net> Mon, 17 April 2017 17:41 UTC

Sender: Matthew Miller <linuxwolf@outer-planes.net>
To: Tim Bray <tbray@textuality.com>
Cc: "json@ietf.org" <json@ietf.org>
References: <1fb5849e-8dbf-835d-65b7-2403686248f9@outer-planes.net> <0E32A94D-CE12-4F52-9ED6-8743C49751B4@vpnc.org> <4d2f0fb3-a729-0c17-2394-bc1e005dd612@gmx.de> <d09f9a59-2411-45a0-470c-ea95072fe4fd@outer-planes.net> <dad91b19-e774-e239-36d2-9d086cca8e0d@gmx.de> <ac432615-ee84-3cdf-6b37-480626bd18c1@gmx.de> <804f9930-26a5-a565-0607-452b386cfeb5@outer-planes.net> <D89BCFAA-B81F-4EEB-8B3A-180BAAB9D16C@att.com> <e69d7c21-85cb-45f4-c0c2-34c624e63049@outer-planes.net> <14252631-AD76-4537-89BF-6368F4A8CDF4@att.com> <7e6af21f-16ea-a3bc-9c01-595ae8acebba@gmx.de> <05100401-88D4-4158-A3FF-3EF144D85449@att.com> <CAD2gp_T0bfpnsCA_t4BAMtEhr7p8JkZggjnY4F+m9-M2hWLfmw@mail.gmail.com> <1e94516c-9c82-8b0e-0d2d-7dbaa83b21bd@outer-planes.net> <40e3207f-e047-c898-1f0c-4422de1d597a@it.aoyama.ac.jp> <1b3ec14a-927a-8d46-e3d3-9807a9588437@outer-planes.net> <CAHBU6ivsq8+Z=MMkUH+=Q0uwc5NCtaJLYw5cp0Qg8eX2hQQ6sA@mail.gmail.com>
From: "Matthew A. Miller" <linuxwolf+ietf@outer-planes.net>
Message-ID: <b74cb31b-8e04-17d0-548a-fc164ce07c05@outer-planes.net>
Date: Mon, 17 Apr 2017 11:41:29 -0600
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:52.0) Gecko/20100101 Thunderbird/52.0
MIME-Version: 1.0
In-Reply-To: <CAHBU6ivsq8+Z=MMkUH+=Q0uwc5NCtaJLYw5cp0Qg8eX2hQQ6sA@mail.gmail.com>
Content-Type: multipart/signed; micalg="pgp-sha512"; protocol="application/pgp-signature"; boundary="27d8Ibhnlj0TI78usrHRuGfoWJkqTBGaa"
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/rfyg77yOplcAWcuVfR28AX7VT3Q>
Subject: Re: [Json] Call for Consensus: Proposed Text for "8.1 Character Encoding"
Precedence: list

On 17/03/27 22:48, Tim Bray wrote:
> First of all, let me say that I’m delighted with, and fully support, the
> promotion of the status of UTF-8 in the JSON RFC to MUST.  I suspect
> this steps way outside the JSONbis charter, but that’s a problem for
> chairs and ADs, not yr humble editor.
> 
> Comments on Matt's proposed text:
> 
> 1. How about a very short historical note, along the lines of: “Previous
> specifications of JSON, including the predecessor RFCs, have not
> required the use of UTF-8 for use with the application/json media type. 
> However, implementors of JSON-based software have overwhelmingly chosen
> to use the UTF-8 encoding, to the extent that it is the only realistic
> way to achieve interoperability in software which generates or consumes
> JSON.”
> 
> ... moving on...
> 
> 
> On Mon, Mar 27, 2017 at 1:04 PM, Matthew A. Miller
> <linuxwolf+ietf@outer-planes.net
> <mailto:linuxwolf+ietf@outer-planes.net>> wrote:
> 
> 
>     JSON text SHOULD be encoded in UTF-8 (Section 3 of [UNICODE]); JSON
>     text MAY be encoded in UTF-16 or UTF-32 if the generator is certain
>     the intended recipients can process it. JSON text MUST NOT be encoded
>     in any encoding other than UTF-8, UTF-16, or UTF-32. When used with
>     media type "application/json" the JSON text MUST be encoded as UTF-8.
> 
> 
> 2. Seriously, why the “JSON text MAY be encoded in… can process it”
> phrase?  It’s a distraction, and if people want to do that, we can’t
> stop them, but we shouldn't waste RFC space talking about practices that
> are not remotely interoperable.  The I in IETF stands for Internet, and
> JSON on the Internet is UTF-8, end of story.
> 
> 
>     Recipients that wish to support Unicode encodings other than UTF-8
>     can do this using a detection mechanism that is based on the fact
>     that the first character will always have a Unicode code point
>     greater than 0 and less than 128, thus the UTF-16/32 variants can
>     be detected by inspecting the first octets for nulls.
> 
> 
> 3. Is it just me, or does it feel really dorky to talk mysteriously
> about this detection mechanism without providing details?  On top of
> which, anyone who's writing the kind of software that might lead one to
> consult an RFC first shouldn't bloody well use anything but UTF-8.  If
> people really want to have this, I think we owe the world an outline of
> the algorithm, maybe in an appendix. I'll volunteer to make my best
> effort to draft it and try to get consensus that it's correct.  If we
> can't, that's a powerful symbol that we shouldn't have this language. 
> But that's my fallback position; my real request to the group is that we
> just take this out.
> 

[ /me doffs hat ]

Thinking about this more, putting an encoding detection algorithm in an
appendix seems like a reasonable compromise to me.  To start, how about
removing the detection text from Section 8.1 and adding an appendix that
starts with that text plus the table?
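
For illustration only, here is a rough Python sketch of the kind of
null-octet detection the quoted text describes. It is not the appendix
text or the table from the earlier message; the function name
detect_json_encoding is hypothetical, and it assumes the input carries no
byte order mark and really is JSON text in UTF-8, UTF-16, or UTF-32, so
that the first character is ASCII and non-null.

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess the Unicode encoding of a JSON text from leading null octets.

    Assumption: no byte order mark, and the first character is a non-null
    ASCII character, as the quoted detection text states.
    """
    head = data[:4]

    # UTF-32 puts three null octets next to the first ASCII octet.
    if len(head) == 4:
        if head[0] == 0 and head[1] == 0 and head[2] == 0 and head[3] != 0:
            return "utf-32-be"
        if head[0] != 0 and head[1] == 0 and head[2] == 0 and head[3] == 0:
            return "utf-32-le"

    # UTF-16 puts a single null octet before or after the first ASCII octet.
    if len(head) >= 2:
        if head[0] == 0 and head[1] != 0:
            return "utf-16-be"
        if head[0] != 0 and head[1] == 0:
            return "utf-16-le"

    # No leading nulls: treat the text as UTF-8.
    return "utf-8"


if __name__ == "__main__":
    sample = '{"a": 1}'
    for codec in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"):
        print(codec, "->", detect_json_encoding(sample.encode(codec)))
```

The UTF-32 patterns are checked first because a UTF-32LE text also
matches the UTF-16LE pattern in its first two octets.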

Assuming the above, what does everyone think of the following for
Section 8.1?

"""
JSON text SHOULD be encoded in UTF-8 (Section 3 of [UNICODE]). JSON
text MUST NOT be encoded in any encoding other than UTF-8, UTF-16,
or UTF-32. When used with media type "application/json" the JSON
text MUST be encoded as UTF-8.

Previous specifications of JSON have not required the use of UTF-8
with the "application/json" media type. However, the vast majority
of JSON-based software implementations have chosen to use the UTF-8
encoding, to the extent that it is the only encoding that achieves
interoperability.

Implementations MUST NOT add a byte order mark (U+FEFF) to the
beginning of a JSON text.  In the interests of interoperability,
implementations that parse JSON texts MAY ignore the presence of a
byte order mark rather than treating it as an error.
"""
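
As a sketch of how the byte order mark paragraph might play out for
UTF-8 in practice (an illustration under stated assumptions, not
normative text): the generator never emits a BOM, and the parser takes
the MAY and skips one if present. The helper names parse_json_utf8 and
generate_json_utf8 are hypothetical.

```python
import json

# U+FEFF encoded as UTF-8.
UTF8_BOM = b"\xef\xbb\xbf"


def parse_json_utf8(data: bytes):
    """Parse UTF-8 JSON text, ignoring a leading byte order mark if present."""
    if data.startswith(UTF8_BOM):
        # Lenient choice: skip the BOM rather than treat it as an error.
        data = data[len(UTF8_BOM):]
    return json.loads(data.decode("utf-8"))


def generate_json_utf8(value) -> bytes:
    """Serialize a value as UTF-8 JSON text, never prepending a BOM."""
    return json.dumps(value, ensure_ascii=False).encode("utf-8")


if __name__ == "__main__":
    print(parse_json_utf8(UTF8_BOM + b'{"a": 1}'))
    print(generate_json_utf8({"name": "D\u00fcrst"}))
```

A stricter parser could reject the BOM instead; the proposed wording
allows either behavior.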


- m&m

Matthew A. Miller

Attachment: signature.asc
