Re: [Json] Call for Consensus: Proposed Text for "8.1 Character Encoding"

"Matthew A. Miller" <> Wed, 10 May 2017 17:13 UTC

To: Pete Cordell <>, Julian Reschke <>, "" <>
From: "Matthew A. Miller" <>
Date: Wed, 10 May 2017 11:13:06 -0600
List-Id: "JavaScript Object Notation \(JSON\) WG mailing list" <>

On 5/10/17 8:04 AM, Pete Cordell wrote:
> On 10/05/2017 14:08, Julian Reschke wrote:
>> I believe we should have separate names for JSON represented as a
>> sequence of characters (such as in a string variable in a programming
>> language) and for a JSON-shaped octet sequence inside a
>> "application/json"-typed (HTTP) message. For the latter, enforcing UTF-8
>> IMHO is attractive.
>> I think it's ok for the spec to talk about both, but it really needs to
>> be clear what we are talking about in each section.
> It's an interesting thought on JSON in programs.  It would be strange to
> be able to say it was not valid JSON if it was encoded in a string
> inside a Shift-JIS encoded Ruby program for example.
> My ISO layers are rusty, but it looks like we can talk about JSON
> character sequences somewhere above the transport layer, and JSON
> encoded messages somewhere below the transport layer.  The latter
> possibly being transport specific.
> To me it would seem discussion of "JSON" (without any further
> refinement) ought to be independent of the lower layer transport
> encoding aspects.
> "application/json" would be one transport specific encoding (for which I
> think most are happy with only UTF-8).  JSON inside a Shift-JIS encoded
> Ruby program is in effect another form of transport for a JSON message.
> Cheers,
> Pete.

I believe in essence it is within our purview to set expectations of
what transits a wire protocol.

That phrasing is broad and likely vague.  To be more specific, I
believe it is within scope to cover instances where a media type is
specified and that media type is "application/json" (e.g., HTTP bodies),
but I think it is also within scope for this document to essentially say
"where a protocol says 'use JSON here' then it is encoded as".  I don't
believe it is within scope to dictate JSON be encoded in any particular
manner when placed into a storage medium, or when embedded within other
content (e.g., the Shift-JIS encoded Ruby program even if said program
were transmitted over a network protocol).
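To make that scope split concrete, here is a minimal Python sketch. The helper names `to_wire` and `from_wire` are hypothetical and not from any spec. Inside a program the JSON text is just a `str`, a sequence of characters, whatever encoding the source file itself uses. It only becomes octets at the point where it is handed to a network protocol, and that boundary is where the proposed UTF-8 requirement would apply.

```python
import json


def to_wire(value) -> bytes:
    """Hypothetical helper: serialize a value for a network protocol.

    Up to the encode() call the JSON text is a str (characters);
    only the wire form is an octet sequence, encoded as UTF-8.
    """
    text = json.dumps(value, ensure_ascii=False)  # JSON text as characters
    return text.encode("utf-8")                   # octets for the wire


def from_wire(body: bytes):
    """Hypothetical helper: decode a received JSON body.

    A strict UTF-8 decode raises UnicodeDecodeError on octets that are
    not valid UTF-8, e.g. a body that was sent as Shift-JIS.
    """
    text = body.decode("utf-8")
    return json.loads(text)


if __name__ == "__main__":
    payload = {"name": "日本語"}
    print(from_wire(to_wire(payload)) == payload)
```

How the text is stored or embedded before `to_wire` is called (a Shift-JIS Ruby source file, a database column, and so on) does not matter to the receiver, which only ever sees the UTF-8 octets.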

Assuming the Working Group finds that scope acceptable and finds UTF-8
only acceptable, here is a starting proposal for text:

8.1.  Character Encoding

When transmitting over a network protocol, JSON text MUST be
encoded in UTF-8 (Section 3 of [UNICODE]).

Previous specifications of JSON have not required the use of UTF-8
when transmitting JSON text. However, the vast majority of
JSON-based software implementations have chosen to use the UTF-8
encoding, to the extent that it is the only encoding that achieves
interoperability.

Implementations MUST NOT add a byte order mark (U+FEFF) to the
beginning of a JSON text.  In the interests of interoperability,
implementations that parse JSON texts MAY ignore the presence of a
byte order mark rather than treating it as an error.
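As a rough illustration of the two byte order mark rules above, here is a short Python sketch. The function names and the `ignore_bom` flag are hypothetical. The flag stands in for the MAY clause, so a parser can either tolerate a leading U+FEFF or reject it. It decodes with the plain "utf-8" codec, which keeps a leading BOM as U+FEFF instead of silently dropping it the way "utf-8-sig" does, so the choice stays explicit.

```python
import json

BOM = "\ufeff"  # U+FEFF, the byte order mark


def serialize(value) -> bytes:
    # json.dumps never emits U+FEFF, and the plain "utf-8" codec
    # (unlike "utf-8-sig") does not prepend a BOM when encoding.
    return json.dumps(value).encode("utf-8")


def parse(body: bytes, ignore_bom: bool = True):
    """Parse a UTF-8 JSON body; ignore_bom is a hypothetical policy knob."""
    text = body.decode("utf-8")  # a leading BOM survives as "\ufeff"
    if text.startswith(BOM):
        if not ignore_bom:
            raise ValueError("JSON text begins with a byte order mark")
        text = text[1:]  # tolerate the BOM rather than treating it as an error
    return json.loads(text)


if __name__ == "__main__":
    print(parse(b"\xef\xbb\xbf[1, 2]"))
```

Raising an explicit `ValueError` keeps the strict behavior independent of how a given JSON library happens to react to a stray U+FEFF.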

If you find this acceptable, please indicate that.  Otherwise, please
provide suggested changes.

- m&m

Matthew A. Miller