Re: [Json] Unpaired surrogates in JSON strings

"Joe Hildebrand (jhildebr)" <jhildebr@cisco.com> Thu, 06 June 2013 07:17 UTC

From: "Joe Hildebrand (jhildebr)" <jhildebr@cisco.com>
To: Douglas Crockford <douglas@crockford.com>, Tim Bray <tbray@textuality.com>
Date: Thu, 06 Jun 2013 07:16:56 +0000
Message-ID: <A723FC6ECC552A4D8C8249D9E07425A70FC2E7E1@xmb-rcd-x10.cisco.com>
In-Reply-To: <51AFE107.7020301@crockford.com>
Cc: Paul Hoffman <paul.hoffman@vpnc.org>, "json@ietf.org" <json@ietf.org>

On 6/5/13 7:08 PM, "Douglas Crockford" <douglas@crockford.com> wrote:

>The application that does that is JavaScript. Any 16-bit value can be
>put next to any other 16-bit value and then JSON encoded.

Agreed.  This is a relatively outdated worldview that we're stuck with,
which is why I didn't say MUST in my suggested language.
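
A minimal sketch of the behavior being described, assuming any
reasonably modern ECMAScript engine. The variable names are only
illustrative. It shows that a lone surrogate is an ordinary string
element in JavaScript and survives a JSON round trip:

```javascript
// A lone high surrogate: a legal ECMAScript string value, even though
// it is not well-formed UTF-16 on its own.
const lone = String.fromCharCode(0xD834);

// Any 16-bit code unit can sit next to any other.
const mixed = lone + "A" + String.fromCharCode(0xDD1E);

// JSON.stringify followed by JSON.parse gives back the same code units.
const roundTripped = JSON.parse(JSON.stringify(mixed));

// The escaped form in JSON text parses to the same lone surrogate.
const parsed = JSON.parse('"\\uD834"');

console.log(mixed.length);                       // 3
console.log(roundTripped === mixed);             // true
console.log(parsed.charCodeAt(0).toString(16));  // d834
```

Whether JSON.stringify emits the lone surrogate raw or as a \u escape
depends on the engine version, so the sketch checks only the round
trip, not the exact serialized text.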

>The meaning of 
>'character' throughout the RFC is ECMAScript's, which is roughly the
>same as Unicode's code point.

I'm not convinced yet.

"\uD834\uDD1E".charCodeAt(0).toString(16);

Yields:

'd834'

That's not a code point.  That's the high half of the surrogate pair
that encodes the code point U+1D11E in UTF-16.  Code units and code
points are only the same in the BMP.
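
To make the distinction concrete, here is a small sketch comparing
code units with code points. It assumes an engine that provides
String.prototype.codePointAt (ES2015 or later), and the variable names
are only illustrative:

```javascript
const clef = "\uD834\uDD1E"; // U+1D11E MUSICAL SYMBOL G CLEF

// charCodeAt returns individual UTF-16 code units.
const hi = clef.charCodeAt(0); // 0xD834, high surrogate
const lo = clef.charCodeAt(1); // 0xDD1E, low surrogate

// Combine the two surrogates into the code point by hand.
const combined = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);

// codePointAt decodes the surrogate pair for us.
const cp = clef.codePointAt(0);

console.log(clef.length);            // 2 (code units, not characters)
console.log(combined.toString(16));  // 1d11e
console.log(cp === combined);        // true

// Inside the BMP the code unit and the code point coincide.
console.log("A".charCodeAt(0) === "A".codePointAt(0)); // true
```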

-- 
Joe Hildebrand