Re: [Json] Call for Consensus: Proposed Text for "8.1 Character Encoding"

Tim Bray, Tue, 28 March 2017 04:49 UTC

To: Matthew A. Miller

First of all, let me say that I’m delighted with, and fully support, the
promotion of the status of UTF-8 in the JSON RFC to MUST.  I suspect this
steps way outside the JSONbis charter, but that’s a problem for chairs and
ADs, not yr humble editor.

Comments on Matt's proposed text:

1. How about a very short historical note, along the lines of: “Previous
specifications of JSON, including the predecessor RFCs, have not required
UTF-8 for use with the application/json media type.  However,
implementors of JSON-based software have overwhelmingly chosen to use the
UTF-8 encoding, to the extent that it is the only realistic way to achieve
interoperability in software that generates or consumes JSON.”

... moving on...

On Mon, Mar 27, 2017 at 1:04 PM, Matthew A. Miller wrote:

> JSON text SHOULD be encoded in UTF-8 (Section 3 of [UNICODE]); JSON
> text MAY be encoded in UTF-16 or UTF-32 if the generator is certain
> the intended recipients can process it. JSON text MUST NOT be encoded
> in any encoding other than UTF-8, UTF-16, or UTF-32. When used with
> media type "application/json" the JSON text MUST be encoded as UTF-8.
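To make the last sentence of the proposed text concrete: a recipient of application/json content can decode the payload as strict UTF-8 and refuse anything else instead of guessing. Below is a minimal Python sketch under that assumption; `parse_application_json` is a hypothetical helper name, not part of any spec or library.

```python
import json


def parse_application_json(body: bytes):
    """Decode an application/json payload as strict UTF-8, then parse it.

    Hypothetical helper: it assumes the rule that application/json
    content MUST be UTF-8, so bytes that are not valid UTF-8 are
    rejected rather than reinterpreted in some other encoding.
    """
    try:
        # The default "strict" error handler raises on malformed UTF-8.
        text = body.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError("application/json body is not valid UTF-8") from exc
    # json.JSONDecodeError is a subclass of ValueError.
    return json.loads(text)
```

Note that a UTF-16 payload made up only of ASCII characters contains NUL octets, which are valid UTF-8, so such input gets past the decode step and is then rejected by `json.loads` instead.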

2. Seriously, why the “JSON text MAY be encoded in… can process it”
phrase?  It’s a distraction.  If people want to do that, we can’t stop
them, but we shouldn't waste RFC space talking about practices that are not
remotely interoperable.  The I in IETF stands for Internet, and JSON on the
Internet is UTF-8, end of story.

> Recipients that wish to support Unicode encodings other than UTF-8
> can do this using a detection mechanism that is based on the fact
> that the first character will always have a Unicode code point
> greater than 0 and less than 128, thus the UTF-16/32 variants can
> be detected by inspecting the first octets for nulls.
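The quoted paragraph only alludes to the detection mechanism, so here is one possible outline of it as a Python sketch. It assumes, as the quoted text does, that the first character is in the range U+0001 to U+007F, and additionally that no byte order mark is present. The null-octet patterns resemble the table in Section 3 of RFC 4627, but rely on the first character alone. `sniff_json_encoding` is a hypothetical name.

```python
def sniff_json_encoding(data: bytes) -> str:
    """Guess the Unicode encoding of a JSON text from its leading octets.

    Sketch only: assumes the first character is in U+0001..U+007F and
    that there is no byte order mark. Returns a Python codec name.
    """
    n = len(data)
    if n >= 4 and data[0] == 0 and data[1] == 0 and data[2] == 0:
        return "utf-32-be"  # 00 00 00 xx
    if n >= 2 and data[0] == 0:
        return "utf-16-be"  # 00 xx
    if n >= 4 and data[1] == 0 and data[2] == 0 and data[3] == 0:
        return "utf-32-le"  # xx 00 00 00
    if n >= 2 and data[1] == 0:
        return "utf-16-le"  # xx 00
    return "utf-8"  # no nulls among the leading octets
```

A recipient would then call `data.decode(sniff_json_encoding(data))` before handing the text to a JSON parser.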

3. Is it just me, or does it feel really dorky to talk mysteriously about
this detection mechanism without providing details?  On top of which,
anyone who's writing the kind of software that might lead one to consult
an RFC first shouldn't bloody well use anything but UTF-8.  If people
really want to have this, I think we owe the world an outline of the
algorithm, maybe in an appendix.  I'll volunteer to make my best effort to
draft it and try to get consensus that it's correct.  If we can't, that's
a powerful symbol that we shouldn't have this language.  But that's my
fallback position; my real request to the group is that we just take this

> """
> - m&m
> Matthew A. Miller
> JSONBis Chair

- Tim Bray (If you’d like to send me a private message, see