Re: [Json] Encoding Schemes
Carsten Bormann <cabo@tzi.org> Tue, 18 June 2013 19:57 UTC
From: Carsten Bormann <cabo@tzi.org>
Date: Tue, 18 Jun 2013 21:57:28 +0200
To: Paul Hoffman <paul.hoffman@vpnc.org>
Cc: json@ietf.org
Subject: Re: [Json] Encoding Schemes

On Jun 18, 2013, at 21:34, Paul Hoffman <paul.hoffman@vpnc.org> wrote:

>> JSON the media type (application/json) is specifically limited to UTF-8 (and theoretically the two or possibly four other character encoding schemes listed in RFC 4627; the RFC isn't quite consistent here).
>
> Can you point to the text in the draft that supports that statement? I see the opposite:
>
>    Encoding considerations: 8bit if UTF-8; binary if UTF-16 or UTF-32
>
>       JSON may be represented using UTF-8, UTF-16, or UTF-32.  When JSON
>       is written in UTF-8, JSON is 8bit compatible.  When JSON is
>       written in UTF-16 or UTF-32, the binary content-transfer-encoding
>       must be used.

You listed two of the three places, the third is in section 3, which doesn't really list UTF-16 but UTF-16 "BE or LE" (same for UTF-32).
Note that these are three different character encoding schemes (CESs), so it is not clear which ones are actually meant.

   Since the first two characters of a JSON text will always be ASCII
   characters [RFC0020], it is possible to determine whether an octet
   stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
   at the pattern of nulls in the first four octets.

           00 00 00 xx  UTF-32BE
           00 xx 00 xx  UTF-16BE
           xx 00 00 00  UTF-32LE
           xx 00 xx 00  UTF-16LE
           xx xx xx xx  UTF-8

[Obligatory Unicode bashing: Giving one of the three encoding schemes (CESs) for the encoding form (CEF) UTF-16 the same name, i.e., "UTF-16", must have been decided by a very cruel person.]

(Strictly speaking, the other mentions of UTF-16/-32 might be the encoding form, not the scheme, but the RFC simply isn't completely specified here.
I think the only reasonable way to read this has the same result as what Joe is proposing, but some text interpretation is required.
Clearly, the text in section 3 does not work with the BOM-based CESs.
But you might also read the text in 6 as asking for UTF-16 CES and the text in 3 then excluding BOMs so the UTF-16 CES is implicitly big-endian.
So much effort for something so theoretical.)

Grüße, Carsten
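
For readers who want to see the section 3 table from RFC 4627 in action, here is a minimal Python sketch of the null-pattern check quoted above. The function name detect_json_encoding and the "unknown" return value are made up for illustration; neither comes from the RFC. Treating inputs shorter than four octets as UTF-8 is also an assumption (under RFC 4627 a JSON text has at least two characters, so any UTF-16 or UTF-32 text is at least four octets long).

```python
def detect_json_encoding(octets: bytes) -> str:
    """Guess the encoding scheme from the pattern of nulls in the first four octets.

    Hypothetical helper, not taken from RFC 4627 or any library.
    """
    head = octets[:4]
    if len(head) < 4:
        # Assumption: too short to be UTF-16 or UTF-32, so treat as UTF-8.
        return "UTF-8"

    # True where the octet is 00, False where it is non-null ("xx" in the table).
    nulls = tuple(b == 0 for b in head)

    patterns = {
        (True, True, True, False): "UTF-32BE",    # 00 00 00 xx
        (True, False, True, False): "UTF-16BE",   # 00 xx 00 xx
        (False, True, True, True): "UTF-32LE",    # xx 00 00 00
        (False, True, False, True): "UTF-16LE",   # xx 00 xx 00
        (False, False, False, False): "UTF-8",    # xx xx xx xx
    }
    # Any other pattern is not covered by the table.
    return patterns.get(nulls, "unknown")


if __name__ == "__main__":
    text = '{"a":1}'
    for codec in ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"):
        print(codec, "->", detect_json_encoding(text.encode(codec)))

    # A UTF-16 stream that starts with a byte order mark (FF FE) falls
    # outside the table, since the first two octets are both non-null.
    with_bom = b"\xff\xfe" + text.encode("utf-16-le")
    print("utf-16 with BOM ->", detect_json_encoding(with_bom))
```

The last case is the point made in the message: the table only covers the BOM-less BE and LE schemes, so a stream in the BOM-based "UTF-16" scheme does not match any row.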