Re: [Json] Unpaired surrogates in JSON strings
"Joe Hildebrand (jhildebr)" <jhildebr@cisco.com> Thu, 06 June 2013 17:49 UTC
Return-Path: <jhildebr@cisco.com>
X-Original-To: json@ietfa.amsl.com
Delivered-To: json@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id E558521F9AE1 for <json@ietfa.amsl.com>; Thu, 6 Jun 2013 10:49:20 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -10.599
X-Spam-Level:
X-Spam-Status: No, score=-10.599 tagged_above=-999 required=5 tests=[BAYES_00=-2.599, RCVD_IN_DNSWL_HI=-8]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id ZBbiJABnEy4F for <json@ietfa.amsl.com>; Thu, 6 Jun 2013 10:49:15 -0700 (PDT)
Received: from rcdn-iport-6.cisco.com (rcdn-iport-6.cisco.com [173.37.86.77]) by ietfa.amsl.com (Postfix) with ESMTP id 50BD221F9AD7 for <json@ietf.org>; Thu, 6 Jun 2013 10:49:15 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=cisco.com; i=@cisco.com; l=1229; q=dns/txt; s=iport; t=1370540955; x=1371750555; h=from:to:cc:subject:date:message-id:in-reply-to: content-id:content-transfer-encoding:mime-version; bh=kv8yXmPyjUTITCzFVLMLYMywh03hbXW0poS3J8rtuy0=; b=Sr7jll8p9dh7oxopOTIbcSvm2HHqVm5DHGxoLDszC3UfMXgefM/krpkt Tljdk/JAL7DY0kGEDsQv7rFi+K5pPomdbvZl9Z11PgnwjkLSVeYEsZz2W BxN7tsmhwPHZHrTkMmLKHS6Euy7ZDTtC2WWuFCvpsDXNC4JAmKUXS/a1g I=;
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AggFADXLsFGtJV2c/2dsb2JhbABZgwm/d3kWdIIjAQEBAwE6PxIBCCIUQiUCBA4FCId/Brt9jwExB4J6YQOTbZUSgw+CJw
X-IronPort-AV: E=Sophos;i="4.87,816,1363132800"; d="scan'208";a="219702172"
Received: from rcdn-core-5.cisco.com ([173.37.93.156]) by rcdn-iport-6.cisco.com with ESMTP; 06 Jun 2013 17:49:05 +0000
Received: from xhc-aln-x04.cisco.com (xhc-aln-x04.cisco.com [173.36.12.78]) by rcdn-core-5.cisco.com (8.14.5/8.14.5) with ESMTP id r56Hn5N4011269 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=FAIL); Thu, 6 Jun 2013 17:49:05 GMT
Received: from xmb-rcd-x10.cisco.com ([169.254.15.56]) by xhc-aln-x04.cisco.com ([173.36.12.78]) with mapi id 14.02.0318.004; Thu, 6 Jun 2013 12:49:05 -0500
From: "Joe Hildebrand (jhildebr)" <jhildebr@cisco.com>
To: Douglas Crockford <douglas@crockford.com>
Thread-Topic: [Json] Unpaired surrogates in JSON strings
Thread-Index: AQHOYkg5aH+sWe/75UqTmsr195d0xpkoJhcAgAAN2ICAAAJhAIAApyAAgAAJfgA=
Date: Thu, 06 Jun 2013 17:49:04 +0000
Message-ID: <A723FC6ECC552A4D8C8249D9E07425A70FC30833@xmb-rcd-x10.cisco.com>
In-Reply-To: <51B06F38.8050707@crockford.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
user-agent: Microsoft-MacOutlook/14.3.4.130416
x-originating-ip: [10.21.88.234]
Content-Type: text/plain; charset="us-ascii"
Content-ID: <2312B8ABD1760946ABEDFF8AB4434E70@emea.cisco.com>
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Cc: Tim Bray <tbray@textuality.com>, Paul Hoffman <paul.hoffman@vpnc.org>, "json@ietf.org" <json@ietf.org>
Subject: Re: [Json] Unpaired surrogates in JSON strings
X-BeenThere: json@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: "JavaScript Object Notation \(JSON\) WG mailing list" <json.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/json>, <mailto:json-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/json>
List-Post: <mailto:json@ietf.org>
List-Help: <mailto:json-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/json>, <mailto:json-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 06 Jun 2013 17:49:21 -0000
On 6/6/13 5:15 AM, "Douglas Crockford" <douglas@crockford.com> wrote: >> That's not a code point. That's half a surrogate pair for a code point >> encoded in UTF16. It's only the same in the BMP. >> >What then is the standard name for a 16-bit element of text? When >JavaScript was created, that word was character. What is the word now? Let's say rather that we understand Unicode better than we did then. When we only cared about the Basic Multilingual Plane, it was easy to just say "wchar_t", and hope the compiler writer knew more than we did. Technically, wchar_t was supposed to be big enough to hold any supported code point, but in practice, it was usually 16 bits. This forced people to start using UTF16 in wchar_t's when they needed to access codepoints outside the BMP. This was a hack, but the Java, JavaScript, and .Net folks went with it in order to make their implementations a little easier. Some newer languages are using UTF8 as their internal String representation, which works ok. Note that counting code points correctly in either system requires a full scan of the string. UTF32 doesn't suffer from this, but always requires 4 bytes per code point. -- Joe Hildebrand
- [Json] Unpaired surrogates in JSON strings John Cowan
- Re: [Json] Unpaired surrogates in JSON strings Douglas Crockford
- Re: [Json] Unpaired surrogates in JSON strings Paul Hoffman
- Re: [Json] Unpaired surrogates in JSON strings Douglas Crockford
- Re: [Json] Unpaired surrogates in JSON strings John Cowan
- Re: [Json] Unpaired surrogates in JSON strings Douglas Crockford
- Re: [Json] Unpaired surrogates in JSON strings Joe Hildebrand (jhildebr)
- Re: [Json] Unpaired surrogates in JSON strings Paul Hoffman
- Re: [Json] Unpaired surrogates in JSON strings John Cowan
- Re: [Json] Unpaired surrogates in JSON strings Tim Bray
- Re: [Json] Unpaired surrogates in JSON strings Paul Hoffman
- Re: [Json] Unpaired surrogates in JSON strings Tim Bray
- Re: [Json] Unpaired surrogates in JSON strings Douglas Crockford
- Re: [Json] Unpaired surrogates in JSON strings John Cowan
- Re: [Json] Unpaired surrogates in JSON strings John Cowan
- Re: [Json] Unpaired surrogates in JSON strings R S
- Re: [Json] Unpaired surrogates in JSON strings Carsten Bormann
- Re: [Json] Unpaired surrogates in JSON strings John Cowan
- Re: [Json] Unpaired surrogates in JSON strings Tim Bray
- Re: [Json] Unpaired surrogates in JSON strings John Cowan
- Re: [Json] Unpaired surrogates in JSON strings Carsten Bormann
- Re: [Json] Unpaired surrogates in JSON strings Joe Hildebrand (jhildebr)
- Re: [Json] Unpaired surrogates in JSON strings John Cowan
- Re: [Json] Unpaired surrogates in JSON strings Joe Hildebrand (jhildebr)
- Re: [Json] Unpaired surrogates in JSON strings Joe Hildebrand (jhildebr)
- Re: [Json] Unpaired surrogates in JSON strings Douglas Crockford
- Re: [Json] Unpaired surrogates in JSON strings Douglas Crockford
- Re: [Json] Unpaired surrogates in JSON strings Douglas Crockford
- Re: [Json] Unpaired surrogates in JSON strings John Cowan
- Re: [Json] Unpaired surrogates in JSON strings Tim Bray
- Re: [Json] Unpaired surrogates in JSON strings John Cowan
- Re: [Json] Unpaired surrogates in JSON strings Paul Hoffman
- Re: [Json] Unpaired surrogates in JSON strings John Cowan
- Re: [Json] Unpaired surrogates in JSON strings Joe Hildebrand (jhildebr)
- Re: [Json] Unpaired surrogates in JSON strings Joe Hildebrand (jhildebr)
- Re: [Json] Unpaired surrogates in JSON strings Martin J. Dürst
- Re: [Json] Unpaired surrogates in JSON strings Bjoern Hoehrmann
- [Json] On characters and code points Paul Hoffman
- Re: [Json] On characters and code points Tim Bray
- Re: [Json] On characters and code points Stephen Dolan
- Re: [Json] On characters and code points Stefan Drees
- Re: [Json] On characters and code points Tim Bray
- Re: [Json] On characters and code points Stefan Drees
- Re: [Json] On characters and code points Tim Bray
- Re: [Json] Unpaired surrogates in JSON strings John Cowan
- Re: [Json] On characters and code points John Cowan
- Re: [Json] On characters and code points John Cowan
- Re: [Json] On characters and code points Tim Bray
- Re: [Json] On characters and code points John Cowan
- Re: [Json] Unpaired surrogates in JSON strings Nico Williams
- Re: [Json] Unpaired surrogates in JSON strings Nico Williams
- Re: [Json] Unpaired surrogates in JSON strings Tatu Saloranta
- Re: [Json] Unpaired surrogates in JSON strings Joe Hildebrand (jhildebr)
- Re: [Json] On characters and code points Bjoern Hoehrmann
- Re: [Json] On characters and code points Tim Bray
- Re: [Json] Unpaired surrogates in JSON strings John Cowan
- Re: [Json] On characters and code points Nico Williams
- Re: [Json] On characters and code points John Cowan
- Re: [Json] On characters and code points Bjoern Hoehrmann
- Re: [Json] On characters and code points Carsten Bormann
- Re: [Json] On characters and code points Stefan Drees
- Re: [Json] On characters and code points Paul Hoffman
- Re: [Json] On characters and code points Carsten Bormann
- Re: [Json] On characters and code points Nico Williams