Re: [Json] Unpaired surrogates in JSON strings

"Joe Hildebrand (jhildebr)" <jhildebr@cisco.com> Thu, 06 June 2013 06:59 UTC

From: "Joe Hildebrand (jhildebr)" <jhildebr@cisco.com>
To: John Cowan <cowan@mercury.ccil.org>, Carsten Bormann <cabo@tzi.org>
Date: Thu, 06 Jun 2013 06:59:12 +0000
Message-ID: <A723FC6ECC552A4D8C8249D9E07425A70FC2E753@xmb-rcd-x10.cisco.com>
In-Reply-To: <20130606042921.GC1362@mercury.ccil.org>
Cc: Paul Hoffman <paul.hoffman@vpnc.org>, "json@ietf.org" <json@ietf.org>
Subject: Re: [Json] Unpaired surrogates in JSON strings

On 6/5/13 10:29 PM, "John Cowan" <cowan@mercury.ccil.org> wrote:

>Carsten Bormann scripsit:
>
>> Code points can refer to those of the characters or those of the code
>> units (byte for UTF-8, etc.).
>
>Code points are (mathematical) integers corresponding to Unicode
>characters, though not all of them are assigned to characters.

The intro to the Unicode standard makes this pretty clear:

http://www.unicode.org/versions/Unicode6.2.0/ch01.pdf
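
To make the distinction concrete, here is a minimal Python sketch (my own
illustration, not text from the standard). It assumes CPython 3 and its
standard json module; the variable names are arbitrary. It shows one code
point as an integer, the different code-unit sequences that encode it, and
a lone surrogate, which is a code point but not an encodable character.

```python
import json

ch = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the BMP

# The code point is a single integer, independent of any encoding.
code_point = ord(ch)

# Code units depend on the encoding form: four 8-bit units in UTF-8 ...
utf8_units = list(ch.encode("utf-8"))

# ... and two 16-bit units (a surrogate pair) in UTF-16.
raw = ch.encode("utf-16-be")
utf16_units = [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]

# CPython's json module happens to accept an unpaired surrogate escape and
# yields a one-code-point string; other parsers may reject or replace it.
lone = json.loads('"\\ud834"')

# That string cannot be serialized as UTF-8.
try:
    lone.encode("utf-8")
    lone_is_encodable = True
except UnicodeEncodeError:
    lone_is_encodable = False

print(hex(code_point), [hex(u) for u in utf8_units], [hex(u) for u in utf16_units])
print(hex(ord(lone)), lone_is_encodable)
```

The last two lines are only there to inspect the values; the point is that
"code point" names the integer, while the byte or 16-bit sequences are code
units of a particular encoding form.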


This is why I wanted to decouple from a particular version of Unicode.  If
the reference remained at version 4, for example, the word "character"
would mean that any code point not assigned in that version of Unicode is
not technically legal JSON (although we know it will interop just fine in
practice, which is why it's pretty safe to do the update).
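
A rough way to see the versioning problem, sketched in Python under some
assumptions: CPython's unicodedata module ships a frozen Unicode 3.2.0
database as unicodedata.ucd_3_2_0, which I use here as a stand-in for "an
older pinned version" (it is not version 4), and U+1F600 is just one
example of a character assigned in a later version.

```python
import json
import unicodedata

ch = "\U0001F600"  # GRINNING FACE, assigned well after Unicode 3.2

# General category under the frozen 3.2.0 database: "Cn" means unassigned.
old_category = unicodedata.ucd_3_2_0.category(ch)

# General category under the database bundled with this Python build.
new_category = unicodedata.category(ch)

# JSON encoding and decoding never consult the character database, so the
# code point round-trips regardless of which version assigned it.
round_tripped = json.loads(json.dumps(ch))

print(old_category, new_category, round_tripped == ch)
```

Read strictly against the older database, the string would not consist of
"characters", yet the encoder and decoder handle it the same way either
way, which is the interop point above.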

-- 
Joe Hildebrand