Re: [Json] Unpaired surrogates in JSON strings

Carsten Bormann <cabo@tzi.org> Thu, 06 June 2013 04:17 UTC

On Jun 6, 2013, at 02:58, Paul Hoffman <paul.hoffman@vpnc.org> wrote:

> Code points

That is a confusing term here, and I have a hard time understanding what people are trying to say.

"Code points" can refer to those of the characters or to those of the code units (bytes for UTF-8, etc.).
Please be specific about which of the two you mean.
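
To illustrate why the distinction matters, here is a minimal Python sketch that counts the same single character in three ways. The choice of U+1F600 and the variable names are assumptions made for the illustration; they do not come from RFC 4627.

```python
# Illustration only: one character, counted three different ways.
ch = "\U0001F600"  # a single character outside the Basic Multilingual Plane

code_points = len(ch)                           # Unicode code points (characters)
utf8_units = len(ch.encode("utf-8"))            # UTF-8 code units (bytes)
utf16_units = len(ch.encode("utf-16-le")) // 2  # UTF-16 code units (16 bits each)

print(code_points, utf8_units, utf16_units)  # prints: 1 4 2
```

The two UTF-16 code units here are a surrogate pair. Neither of them is a character on its own.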

RFC 4627 normatively references RFC 4234, which makes very clear that it is about characters.
This is further reinforced by the grammar in RFC 4627, which addresses code points up to %x10FFFF -- clearly it is about Unicode characters (an assessment that also agrees with the rest of the text).

1) Surrogates are not characters, so they can't appear in the source character set.

2) Within strings, escape sequences can be used to construct surrogates from ASCII characters.  This contradicts the definition of strings in section 1.  However, the use of these escape sequences to represent arbitrary binary data appears to be common enough (JavaScript only recently grew support for binary data, and this support will not be retrofitted into JSON) that a clarification in the specification is required.
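
As a rough illustration of item 2, the sketch below shows how one implementation, the json module in CPython, handles paired and unpaired surrogate escapes. Other parsers may reject or replace the lone surrogate, so this describes one implementation's behaviour and not what the specification requires. The helper is_encodable is a hypothetical name introduced only for this example.

```python
import json

# A surrogate pair written as two escapes denotes one character (U+1F600).
paired = json.loads('"\\ud83d\\ude00"')

# A lone surrogate escape is accepted by CPython's json module and yields
# a string holding a surrogate code point, which is not a character.
lone = json.loads('"\\ud800"')


def is_encodable(s):
    """Return True if s can be encoded as UTF-8 (hypothetical helper)."""
    try:
        s.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


print(len(paired), is_encodable(paired))  # prints: 1 True
print(len(lone), is_encodable(lone))      # prints: 1 False
```

The text is pure ASCITII-free of surrogates at the source level in both cases: the input is plain ASCII, and the surrogate exists only after the escape sequence has been processed, which is why the two items need to be treated separately.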

Please separate these two items clearly.  Thanks.

Grüße, Carsten