Re: [Json] Unpaired surrogates in JSON strings

Douglas Crockford <douglas@crockford.com> Thu, 06 June 2013 11:15 UTC

Message-ID: <51B06F38.8050707@crockford.com>
Date: Thu, 06 Jun 2013 04:15:04 -0700
From: Douglas Crockford <douglas@crockford.com>
To: "Joe Hildebrand (jhildebr)" <jhildebr@cisco.com>
References: <A723FC6ECC552A4D8C8249D9E07425A70FC2E7E1@xmb-rcd-x10.cisco.com>
In-Reply-To: <A723FC6ECC552A4D8C8249D9E07425A70FC2E7E1@xmb-rcd-x10.cisco.com>
Cc: Tim Bray <tbray@textuality.com>, Paul Hoffman <paul.hoffman@vpnc.org>, "json@ietf.org" <json@ietf.org>

On 6/6/2013 12:16 AM, Joe Hildebrand (jhildebr) wrote:
> On 6/5/13 7:08 PM, "Douglas Crockford" <douglas@crockford.com> wrote:
>
>> The application that does that is JavaScript. Any 16-bit value can be
>> put next to any other 16-bit value and then JSON encoded.
> Agree.  This is a relatively-outdated worldview that we're stuck with, and
> is why I didn't say MUST in my suggested language.
>
>> The meaning of
>> 'character' throughout the RFC is ECMAScript's, which is roughly the
>> same as Unicode's code point.
> I'm not convinced yet.
>
> "\uD834\uDD1E".charCodeAt(0).toString(16);
>
> Yields:
>
> 'd834'
>
> That's not a code point.  That's half a surrogate pair for a code point
> encoded in UTF16.  It's only the same in the BMP.
>
What, then, is the standard name for a 16-bit element of text? When
JavaScript was created, that word was "character". What is the word now?
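
The distinction in the quoted example can be sketched as follows. Unicode's term for the 16-bit element that charCodeAt returns is "code unit". This is only an illustration: the helper names (combineSurrogates, hasUnpairedSurrogate, and so on) are hypothetical and do not come from the RFC or from ECMAScript.

```javascript
// Illustrative sketch only; all helper names are hypothetical.
const clef = "\uD834\uDD1E"; // U+1D11E MUSICAL SYMBOL G CLEF, two 16-bit units

function isHighSurrogate(unit) {
  return unit >= 0xD800 && unit <= 0xDBFF;
}

function isLowSurrogate(unit) {
  return unit >= 0xDC00 && unit <= 0xDFFF;
}

// Combine a high/low surrogate pair into the code point it encodes in UTF-16.
function combineSurrogates(high, low) {
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

// Return true if the string contains a surrogate without its partner.
function hasUnpairedSurrogate(s) {
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (isHighSurrogate(unit)) {
      if (i + 1 < s.length && isLowSurrogate(s.charCodeAt(i + 1))) {
        i++; // skip the low half of a well-formed pair
        continue;
      }
      return true; // high surrogate with no low surrogate after it
    }
    if (isLowSurrogate(unit)) {
      return true; // low surrogate with no high surrogate before it
    }
  }
  return false;
}

const firstUnit = clef.charCodeAt(0);  // 16-bit unit, as in the quoted example
const secondUnit = clef.charCodeAt(1);
const codePoint = combineSurrogates(firstUnit, secondUnit);

console.log(firstUnit.toString(16));   // d834
console.log(codePoint.toString(16));   // 1d11e
console.log(hasUnpairedSurrogate(clef));      // false
console.log(hasUnpairedSurrogate("\uD834"));  // true
```

A string can hold "\uD834" on its own, which is the unpaired case this thread is about. The check above only detects that case and does not say what an encoder or parser should do with it.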