Re: [Json] Call for Consensus: Proposed Text for "8.1 Character Encoding"

Nico Williams <> Mon, 13 March 2017 21:51 UTC

Date: Mon, 13 Mar 2017 16:50:52 -0500
From: Nico Williams <>
To: Carsten Bormann <>
Message-ID: <20170313215051.GF543@localhost>
Cc:, Matthew Miller <>, "" <>
Subject: Re: [Json] Call for Consensus: Proposed Text for "8.1 Character Encoding"

On Mon, Mar 13, 2017 at 10:12:09PM +0100, Carsten Bormann wrote:
> > 1) Does the working group think adding any text on how to detect the
> > encoding worthwhile?
> No, that would be a regression into maintaining the fiction that
> UTF-16 and UTF-32 versions of JSON are being used in interchange.

I would support saying that JSON texts SHOULD be UTF-8.  I might even
support saying that they MUST be UTF-8, but I suspect there won't be
consensus for that.  If there is, then I agree with you.

> Not sure if I’m allowed to note that after saying no above, but not

Why wouldn't you be?

> all JSON documents have four bytes.

True!  The empty string at the top level, with no whitespace around it,
is a two-byte text when encoded in UTF-8.

Indeed, just two bytes will do in some cases; at most four are needed.
Byte order can be detected as well, also in two to four bytes, but byte
order would also be pointless to think about if UTF-8 were REQUIRED
(which it isn't [yet]).
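
For illustration only, here is a rough Python sketch of the kind of
detection being discussed, based on the NUL-byte patterns described in
RFC 4627 section 3.  It is not proposed spec text.  The function name
guess_json_encoding is made up, and the sketch assumes there is no BOM
and that the input is a valid JSON text (so the first character is
ASCII and no unescaped U+0000 appears).

```python
def guess_json_encoding(data: bytes) -> str:
    """Guess the encoding of a JSON text from where NUL bytes fall.

    Hypothetical helper: assumes no BOM and a valid JSON text, whose
    first character is always ASCII.
    """
    # UTF-32 needs four bytes to recognize: three NULs on one side
    # of the single non-NUL byte of the first (ASCII) character.
    if len(data) >= 4:
        b0, b1, b2, b3 = data[:4]
        if b0 == 0 and b1 == 0 and b2 == 0 and b3 != 0:
            return "utf-32-be"
        if b0 != 0 and b1 == 0 and b2 == 0 and b3 == 0:
            return "utf-32-le"
    # UTF-16 can be recognized from the first two bytes alone.
    if len(data) >= 2:
        b0, b1 = data[:2]
        if b0 == 0 and b1 != 0:
            return "utf-16-be"
        if b0 != 0 and b1 == 0:
            return "utf-16-le"
    # No NUL in the leading bytes (or too short to tell): UTF-8.
    return "utf-8"
```

The returned names are Python codec names, so something like
data.decode(guess_json_encoding(data)) would work on the result.  None
of this is needed if UTF-8 is the only encoding allowed.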