Re: [Json] Unpaired surrogates in JSON strings

Paul Hoffman <paul.hoffman@vpnc.org> Thu, 06 June 2013 15:27 UTC

On Jun 5, 2013, at 11:59 PM, Joe Hildebrand (jhildebr) <jhildebr@cisco.com> wrote:

> On 6/5/13 10:29 PM, "John Cowan" <cowan@mercury.ccil.org> wrote:
> 
>> Carsten Bormann scripsit:
>> 
>>> Code points can refer to those of the characters or those of the code
>>> units (byte for UTF-8, etc.).
>> 
>> Code points are (mathematical) integers corresponding to Unicode
>> characters, though not all of them are assigned to characters.
> 
> The intro to the Unicode standard makes this pretty clear:
> 
> http://www.unicode.org/versions/Unicode6.2.0/ch01.pdf
> 
> 
> This is why I wanted to decouple from a particular version of Unicode.  If
> the reference remained at version 4, for example, the word "character"
> means that any code point not in that version of Unicode is not
> technically legal JSON (although we know it will interop just fine in
> practice, which is why it's pretty safe to do the update).

This is an *extremely* good point. The definition of "character" changes from version to version of Unicode, and it is now clear we need to deal with that in this document.
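
As a rough illustration of the distinction being discussed (not anything proposed for the document), here is a small Python sketch. It assumes Python's standard json and unicodedata modules; the helper names is_surrogate, is_unassigned, and unpaired_surrogates are made up for this example. It shows that code points are just integers, that whether one is "unassigned" depends on the Unicode version the runtime bundles, and that a parser can hand back an unpaired surrogate from an escape without complaint.

```python
import json
import unicodedata


def is_surrogate(cp: int) -> bool:
    """True if the code point lies in the surrogate range U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


def is_unassigned(cp: int) -> bool:
    """True if this runtime's Unicode data gives the code point category Cn.

    The answer is relative to unicodedata.unidata_version, so it can
    change when the runtime moves to a newer Unicode version.
    """
    return unicodedata.category(chr(cp)) == "Cn"


def unpaired_surrogates(s: str) -> list[int]:
    """Return the indexes of surrogate code points left in a decoded string.

    Python's json module combines a valid \\uD8xx\\uDCxx escape pair into
    one code point, so any surrogate still present after json.loads came
    from an unpaired escape.
    """
    return [i for i, ch in enumerate(s) if is_surrogate(ord(ch))]


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)

    lone = json.loads('"\\ud800"')          # unpaired high surrogate
    pair = json.loads('"\\ud83d\\ude00"')   # valid surrogate pair

    print(unpaired_surrogates(lone))
    print(unpaired_surrogates(pair))
    print(is_unassigned(0x10FFFF))
```

The string holding the lone surrogate parses, but it cannot be encoded as UTF-8 afterwards, which is one way this kind of input causes trouble further down the line.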

--Paul Hoffman