Re: [Json] Encoding Schemes

Nico Williams <nico@cryptonector.com> Tue, 18 June 2013 20:32 UTC

In-Reply-To: <20130618200541.GK12085@mercury.ccil.org>
References: <A723FC6ECC552A4D8C8249D9E07425A70FC57CF2@xmb-rcd-x10.cisco.com> <20130618183926.GG12085@mercury.ccil.org> <E9527431-1354-4755-8280-634B4A47BA25@tzi.org> <4626FCFD-90CE-4CE7-A123-ED3E12E7FF0A@vpnc.org> <4EC0C40B-CFEE-438C-A30F-1F43C017E24E@tzi.org> <20130618200541.GK12085@mercury.ccil.org>
Date: Tue, 18 Jun 2013 15:32:31 -0500
Message-ID: <CAK3OfOhCbk_qmart6X-+jnb5h92Pns1_J+dJGkHHyBkdwfWSmg@mail.gmail.com>
From: Nico Williams <nico@cryptonector.com>
To: John Cowan <cowan@mercury.ccil.org>
Content-Type: text/plain; charset="UTF-8"
Cc: Carsten Bormann <cabo@tzi.org>, Paul Hoffman <paul.hoffman@vpnc.org>, json@ietf.org
Subject: Re: [Json] Encoding Schemes

On Tue, Jun 18, 2013 at 3:05 PM, John Cowan <cowan@mercury.ccil.org> wrote:
> Carsten Bormann scripsit:
>
>> Clearly, the text in section 3 does not work with the
>> BOM-based CESs.  But you might also read the text in 6 as asking for
>> UTF-16 CES and the text in 3 then excluding BOMs so the UTF-16 CES is
>> implicitly big-endian.  So much effort for something so theoretical.)
>
> Unfortunately, it isn't even clear that it means that.  It could simply
> mean "UTF-16, either big-endian or little-endian".

I read it as "UTF-16, big-endian or little-endian, without BOMs".  More
broadly, I read the RFC as saying (non-normatively) that a JSON text is
encoded in UTF-8, UTF-16 (either endianness, no BOM), or UTF-32 (either
endianness).  It may well be that JSON is only ever sent as UTF-8...
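
To illustrate why BOM-less UTF-16/-32 in either endianness is workable at
all, here is a minimal sketch of the null-octet heuristic from RFC 4627
section 3.  It assumes the text starts with two ASCII characters (true
when the top level is an object or array); the function name and the
fallback for short inputs are my own choices, not anything the RFC
specifies.

```python
def guess_json_encoding(data: bytes) -> str:
    """Guess the encoding of a BOM-less JSON text from its first four octets.

    Assumes the first two characters are ASCII, so the position of the
    zero octets reveals both the code unit width and the byte order.
    """
    if len(data) < 4:
        # Assumption for this sketch: too short to tell, treat as UTF-8.
        return "utf-8"
    b0, b1, b2, b3 = data[:4]
    if b0 == 0 and b1 == 0 and b2 == 0 and b3 != 0:
        return "utf-32-be"  # 00 00 00 xx
    if b0 == 0 and b1 != 0 and b2 == 0 and b3 != 0:
        return "utf-16-be"  # 00 xx 00 xx
    if b0 != 0 and b1 == 0 and b2 == 0 and b3 == 0:
        return "utf-32-le"  # xx 00 00 00
    if b0 != 0 and b1 == 0 and b2 != 0 and b3 == 0:
        return "utf-16-le"  # xx 00 xx 00
    return "utf-8"          # xx xx xx xx
```

For example, guess_json_encoding('{"a": 1}'.encode("utf-16-le")) returns
"utf-16-le" without any BOM being present.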

...But then, following the thread on what a JSON string is, I now
understand that JSON strings generally contain Unicode character data
but may really contain any code points from the BMP, indeed any 16-bit
values, with some values having to be escaped.

Given the whole discussion re: surrogates, JSON then uses neither UTF-8
nor CESU-8, but an amalgam of sorts.  So, to wrap it up: JSON is intended
to be encoded in UTF-8 (and maybe -16 and -32), with all the caveats
about surrogates getting smuggled in, because bad Unicode character
data in JSON strings bleeds into the encoding.
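
A small sketch of the surrogate problem, using Python's standard json
module purely as an example parser (its leniency toward a lone \ud800
escape is that implementation's behaviour, not something the RFC
mandates).  The variable names are only for illustration.

```python
import json

# A lone surrogate escape is accepted by this parser and yields a
# one-element string holding the 16-bit value 0xD800.
lone = json.loads('"\\ud800"')

# Strict UTF-8 cannot represent it...
try:
    lone.encode("utf-8")
    strict_utf8_ok = True
except UnicodeEncodeError:
    strict_utf8_ok = False

# ...but it can be smuggled out as a three-octet, CESU-8-like sequence.
smuggled = lone.encode("utf-8", "surrogatepass")

# A properly paired escape, by contrast, becomes one non-BMP code point
# and encodes as ordinary four-octet UTF-8.
pair = json.loads('"\\ud83d\\ude00"')
pair_utf8 = pair.encode("utf-8")
```

Here strict_utf8_ok ends up False, smuggled is b'\xed\xa0\x80', and
pair_utf8 is b'\xf0\x9f\x98\x80'.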

Nico
--