Re: [Json] Naked surrogates already banned?

Carsten Bormann <> Fri, 18 October 2013 06:39 UTC

List-Id: "JavaScript Object Notation \(JSON\) WG mailing list" <>

On Oct 18, 2013, at 03:36, Tim Bray <> wrote:

> It says Any character may be escaped. If the character is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence: a reverse solidus, followed by the lowercase letter u,... etc

It already says in section 1:
A string is a sequence of zero or more Unicode characters [UNICODE].

I think we had that discussion already.
Count me on the side of the people who don't think UTF-16 artifacts are, or have ever been, a part of JSON.
ECMA-404 is on the side of "any Unicode code point", but that is just one of the extensions 404 makes over JSON.
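
To make the distinction concrete, here is a minimal Python sketch (an illustration only, not part of either spec). The helper name has_naked_surrogate is hypothetical. It relies on the observed behaviour of CPython's json module, which combines an escaped surrogate pair into one code point but passes a lone surrogate escape through as-is; other parsers may reject such input or replace it.

```python
import json


def has_naked_surrogate(text: str) -> bool:
    """Return True if text contains any code point in U+D800..U+DFFF.

    Hypothetical helper: a Python str is a sequence of code points, so a
    surrogate found here is "naked" (a properly escaped pair has already
    been combined into a single code point by the JSON decoder).
    """
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)


# An escaped surrogate pair decodes to one character outside the BMP.
paired = json.loads('"\\ud83d\\ude00"')

# A lone high surrogate escape: accepted by CPython's json module, but the
# result is a code point that is not a Unicode character.
naked = json.loads('"\\ud800"')

print(has_naked_surrogate(paired))
print(has_naked_surrogate(naked))
```

Under the "sequence of Unicode characters" reading, the second string is not a valid JSON string value; under the "any code point" reading, it is.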

Now, there is a problem: the definition in RFC 4627 ties JSON to a specific version of Unicode.
(The reference is nicely confusing in which version is meant, but that is an artifact of the way Unicode versions are documented.)
I think a robust interpretation of the intent here would also include all code points that are available to become characters in future versions of Unicode.
That is a change the WG SHOULD make, the predictable noise from the surrogate faction notwithstanding.
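
One way to sketch such a version-independent rule in Python is a pure range check that consults no table of assigned characters. The function name is hypothetical, and treating noncharacters such as U+FFFE as allowed is an assumption of this sketch; it is not something stated above.

```python
def is_allowed_code_point(cp: int) -> bool:
    """Version-independent check (hypothetical helper).

    Accepts every code point in the Unicode code space except the
    surrogate range U+D800..U+DFFF. No lookup of assigned characters is
    done, so code points that are unassigned today pass as well.
    Assumption: noncharacters (e.g. U+FFFE) are not treated specially.
    """
    in_code_space = 0 <= cp <= 0x10FFFF
    is_surrogate = 0xD800 <= cp <= 0xDFFF
    return in_code_space and not is_surrogate


for cp in (0x41, 0xD7FF, 0xD800, 0xDFFF, 0xE000, 0x10FFFF, 0x110000):
    print(f"U+{cp:04X}", is_allowed_code_point(cp))
```

Because the check depends only on ranges, its result does not change when a new Unicode version assigns more characters.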

Grüße, Carsten