Re: [Json] Call for Consensus: Proposed Text for "8.1 Character Encoding"

First of all, let me say that I’m delighted with, and fully support, the
promotion of the status of UTF-8 in the JSON RFC to MUST.  I suspect this
steps way outside the JSONbis charter, but that’s a problem for chairs and
ADs, not yr humble editor.

Comments on Matt's proposed text:

1. How about a very short historical note, along the lines of: “Previous
specifications of JSON, including the predecessor RFCs, have not required
the use of UTF-8 for use with the application/json media type.  However,
implementors of JSON-based software have overwhelmingly chosen to use the
UTF-8 encoding, to the extent that it is the only realistic way to achieve
interoperability in software which generates or consumes JSON.”

... moving on...

On Mon, Mar 27, 2017 at 1:04 PM, Matthew A. Miller <
linuxwolf+ietf@outer-planes.net> wrote:

> JSON text SHOULD be encoded in UTF-8 (Section 3 of [UNICODE]); JSON
> text MAY be encoded in UTF-16 or UTF-32 if the generator is certain
> the intended recipients can process it. JSON text MUST NOT be encoded
> in any encoding other than UTF-8, UTF-16, or UTF-32. When used with
> media type "application/json" the JSON text MUST be encoded as UTF-8.
>

2. Seriously, why the “JSON text MAY be encoded in… can process it”
phrase?  It’s a distraction, and if people want to do that, we can’t stop
them, but we shouldn't waste RFC space talking about practices that are not
remotely interoperable.  The I in IETF stands for Internet, and JSON on the
Internet is UTF-8, end of story.

> Recipients that wish to support Unicode encodings other than UTF-8
> can do this using a detection mechanism that is based on the fact
> that the first character will always have a Unicode code point
> greater than 0 and less than 128, thus the UTF-16/32 variants can
> be detected by inspecting the first octets for nulls.
>

3. Is it just me, or does it feel really dorky to talk mysteriously about
this detection mechanism without providing details?  On top of which,
anyone who's writing the kind of software that might lead one to consult
an RFC first shouldn't bloody well use anything but UTF-8.  If people
really want to have this, I think we owe the world an outline of the
algorithm, maybe in an appendix. I'll volunteer to make my best effort to
draft it and try to get consensus that it's correct.  If we can't, that's
a powerful symbol that we shouldn't have this language.  But that's my
fallback position; my real request to the group is that we just take this
out.
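
For concreteness, here is a rough sketch of what such an outline might
look like, in Python.  It is not proposed text and has not been checked
by the group.  It assumes only what the quoted paragraph states (the
first character is in the range U+0001 to U+007F), plus two assumptions
of mine: there is no byte order mark, and the second character is never
U+0000.  The function name and the returned labels are hypothetical.

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess the Unicode encoding of a JSON text from its first octets.

    Assumes (not taken from any spec text): no byte order mark, and the
    first character is ASCII and non-NUL, so null octets can only come
    from UTF-16 or UTF-32 padding of that first character.
    """
    # UTF-32 must be tested first: its null pattern overlaps UTF-16's.
    if len(data) >= 4:
        b0, b1, b2, b3 = data[0], data[1], data[2], data[3]
        if b0 == 0 and b1 == 0 and b2 == 0 and b3 != 0:
            return "utf-32-be"   # 00 00 00 xx
        if b0 != 0 and b1 == 0 and b2 == 0 and b3 == 0:
            return "utf-32-le"   # xx 00 00 00
    if len(data) >= 2:
        if data[0] == 0 and data[1] != 0:
            return "utf-16-be"   # 00 xx
        if data[0] != 0 and data[1] == 0:
            return "utf-16-le"   # xx 00
    # No nulls around the first character: treat it as UTF-8.
    return "utf-8"


if __name__ == "__main__":
    sample = '{"a": 1}'.encode("utf-16-le")
    print(detect_json_encoding(sample))
```

The returned labels happen to be valid Python codec names, so a
recipient could pass the result straight to bytes.decode().  Even this
small sketch needs an ordering rule and two side assumptions, which is
rather the point of the complaint above.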

> """
>
>
> - m&m
>
> Matthew A. Miller
> JSONBis Chair

-- 
- Tim Bray (If you’d like to send me a private message, see
https://keybase.io/timbray)