Re: [Json] Proposed change: update the Unicode version

Carsten Bormann <> Wed, 05 June 2013 18:43 UTC


On Jun 5, 2013, at 18:56, Tim Bray <> wrote:

>> It would help me if you could briefly explain what the reference to 2.7 specifically adds here.
>> (To me that is a bit confusing, as 2.7 is about internal programming language representation, not about representation for interchange.  But maybe I just don't understand.)
> Hm? The first paragraph says “A Unicode string data type is simply an ordered sequence of code units. Thus a Unicode 8-bit string is an ordered sequence of 8-bit code units, a Unicode 16-bit string is an ordered sequence of 16-bit code units, and a Unicode 32-bit string is an ordered sequence of 32-bit code units.”

If you were trying to answer my question, I must admit that I am no further along in understanding why the reference is to 2.7.

RFC 4627 Section 3 is about encoding JSON on the wire (correct me if that impression is wrong).  (Actually, it doesn't say that much; the meat is in Section 6, which, with the title change, raises the question of whether application/json and the JSON format are the same thing or not.  But I digress.)  Unicode Section 2.7 is most emphatically NOT about Unicode on the wire; it is about data types in programming languages.  Much of it is about the problems of processing incomplete UTF-16 strings (unpaired surrogates) in programming languages and their libraries.
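
To illustrate the distinction between an in-memory string and its wire encoding, here is a minimal sketch. It assumes Python and its standard json module, neither of which is mentioned in the drafts; the helper name can_encode_utf8 is made up for the example. A parser can hand the program a string holding an unpaired surrogate, which is fine as a sequence of code units but cannot be serialized as UTF-8.

```python
import json


def can_encode_utf8(text: str) -> bool:
    """Return True if the in-memory string has a valid UTF-8 wire form."""
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


# A high surrogate escape with no following low surrogate.
lone = json.loads('"\\ud800"')

# A proper surrogate pair, which the parser combines into U+1F600.
paired = json.loads('"\\ud83d\\ude00"')

print(repr(lone), can_encode_utf8(lone))
print(repr(paired), can_encode_utf8(paired))
```

The first string exists only as a programming-language value; the second one has a well-defined representation for interchange.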

The section in the Unicode standard about encoding is 2.6.  This defines seven encoding schemes, some of which mainly differ in their treatment of BOMs.  BOMs are not allowed by the JSON grammar, but it is not entirely obvious that that applies to the sequence of characters obtained after decoding the Unicode encoding scheme.
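
The BOM question can be sketched as follows, again assuming Python's standard codecs and json modules (an illustration only; the helper name parses is hypothetical). With the UTF-16 encoding scheme the BOM is part of the scheme and is consumed by the decoder; with UTF-16LE the same two bytes decode to a U+FEFF character that then sits in front of the JSON text, where the grammar has no place for it.

```python
import codecs
import json

doc = '{"a": 1}'


def parses(text: str) -> bool:
    """Return True if the decoded character sequence is accepted as JSON."""
    try:
        json.loads(text)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


# UTF-16 encoding scheme: the encoder writes a BOM, the decoder removes it.
wire = doc.encode("utf-16")
decoded = wire.decode("utf-16")

# UTF-16LE encoding scheme: byte order is fixed, so a leading FF FE is not
# treated as a BOM and survives decoding as the character U+FEFF.
wire_le = codecs.BOM_UTF16_LE + doc.encode("utf-16-le")
decoded_le = wire_le.decode("utf-16-le")

print(parses(decoded), parses(decoded_le))
```

Whether the second case should be rejected is exactly the point that the JSON grammar alone does not make obvious; this particular parser happens to reject it.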

Now Douglas says:

> I think the section on encoding is not saying anything useful and should be completely removed.

Works for me (and makes this discussion moot).

Grüße, Carsten