Re: [Json] Encoding Schemes

Nico Williams <> Tue, 18 June 2013 20:32 UTC

Date: Tue, 18 Jun 2013 15:32:31 -0500
Message-ID: <>
From: Nico Williams <>
To: John Cowan <>
Content-Type: text/plain; charset=UTF-8
Cc: Carsten Bormann <>, Paul Hoffman <>,
Subject: Re: [Json] Encoding Schemes

On Tue, Jun 18, 2013 at 3:05 PM, John Cowan <> wrote:
> Carsten Bormann scripsit:
>> Clearly, the text in section 3 does not work with the
>> BOM-based CESs.  But you might also read the text in 6 as asking for
>> UTF-16 CES and the text in 3 then excluding BOMs so the UTF-16 CES is
>> implicitly big-endian.  So much effort for something so theoretical.)
> Unfortunately, it isn't even clear that it means that.  It could simply
> mean "UTF-16, either big-endian or little-endian".

I read it as "UTF-16, big-endian or little-endian, without BOMs".  I
read the RFC as specifying (not normatively) that JSON texts are encoded
in UTF-8, UTF-16 (either endianness, no BOM), or UTF-32 (either
endianness).  It may well be that JSON is only ever sent as UTF-8...
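For concreteness, the BOM-free reading matches the four-octet pattern
table in RFC 4627, section 3, which works only because the first two
characters of a JSON text are always ASCII.  A rough Python sketch of
that table follows; the names detect_json_encoding and load_json_bytes
are hypothetical, and the sketch assumes no BOM is present (inputs
shorter than four bytes simply fall back to UTF-8):

```python
import json


def detect_json_encoding(data: bytes) -> str:
    """Guess the encoding of a BOM-less JSON text from its first four octets.

    Mirrors the RFC 4627 section 3 table (xx = any non-zero octet):
      00 00 00 xx -> UTF-32BE     00 xx 00 xx -> UTF-16BE
      xx 00 00 00 -> UTF-32LE     xx 00 xx 00 -> UTF-16LE
      xx xx xx xx -> UTF-8
    The UTF-32 checks come first because UTF-32 bytes also match the
    corresponding UTF-16 pattern.
    """
    if len(data) >= 4:
        b = data[:4]
        if b[0] == 0 and b[1] == 0 and b[2] == 0:
            return "utf-32-be"
        if b[0] == 0 and b[2] == 0:
            return "utf-16-be"
        if b[1] == 0 and b[2] == 0 and b[3] == 0:
            return "utf-32-le"
        if b[1] == 0 and b[3] == 0:
            return "utf-16-le"
    return "utf-8"


def load_json_bytes(data: bytes):
    """Decode BOM-less JSON bytes with the detected encoding, then parse."""
    return json.loads(data.decode(detect_json_encoding(data)))


if __name__ == "__main__":
    sample = '{"a": 1}'.encode("utf-16-le")
    print(detect_json_encoding(sample), load_json_bytes(sample))
```

Note that a text prefixed with a UTF-16 BOM (FF FE or FE FF) matches
none of the zero-octet patterns and falls through to UTF-8, which is
the sense in which the section 3 text does not work with BOM-based CESs.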

...But then, following the thread on what a JSON string is, I now
understand that JSON strings generally contain Unicode character data,
but they may really contain any code point from the BMP, and indeed any
16-bit value, with some values having to be escaped.  Given the whole
discussion re: surrogates, JSON uses neither UTF-8 nor CESU-8 but an
amalgam of sorts.  So, to wrap it up: JSON is intended to be encoded in
UTF-8 (and maybe UTF-16 and UTF-32), with the caveat that surrogates
can get smuggled in, because bad Unicode character data in JSON strings
bleeds into the encoding.
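To illustrate the surrogate point, here is a small sketch using
Python's standard json module.  It shows one parser's behavior, not
anything the RFC mandates; other parsers may reject or replace lone
surrogates instead:

```python
import json

# Two escapes forming a valid surrogate pair decode to one
# supplementary-plane character (U+1F600).
pair = json.loads('"\\ud83d\\ude00"')
print(hex(ord(pair)))

# A lone surrogate escape is accepted as well, yielding a code point
# that is not a Unicode scalar value.
lone = json.loads('"\\ud800"')
print(hex(ord(lone)))

# Strict UTF-8 refuses to encode that code point...
try:
    lone.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict UTF-8 encode failed:", exc.reason)

# ...whereas the 'surrogatepass' error handler writes the three-byte
# form (ED A0 80) that CESU-8 uses for a surrogate code unit; a strict
# UTF-8 decoder will reject these bytes.
raw = lone.encode("utf-8", "surrogatepass")
print(raw)
```

With the default ensure_ascii=True, json.dumps writes the lone
surrogate back out as the escape \ud800, so the problem stays invisible
until something re-encodes the decoded string.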