Re: [Json] Unpaired surrogates in JSON strings

John Cowan <cowan@mercury.ccil.org> Fri, 07 June 2013 17:09 UTC

Date: Fri, 07 Jun 2013 13:08:57 -0400
From: John Cowan <cowan@mercury.ccil.org>
To: "\"Martin J. Dürst\"" <duerst@it.aoyama.ac.jp>
Message-ID: <20130607170857.GC13569@mercury.ccil.org>
References: <A723FC6ECC552A4D8C8249D9E07425A70FC2E7E1@xmb-rcd-x10.cisco.com> <51B06F38.8050707@crockford.com> <CAHBU6iuFBuW-RfgBLQF5q4BnUOzs088QXW3uOQG1OjBFjZttkw@mail.gmail.com> <51B1B4E7.8090101@it.aoyama.ac.jp>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <51B1B4E7.8090101@it.aoyama.ac.jp>
User-Agent: Mutt/1.5.20 (2009-06-14)
Sender: John Cowan <cowan@ccil.org>
Cc: "Joe Hildebrand (jhildebr)" <jhildebr@cisco.com>, Tim Bray <tbray@textuality.com>, Paul Hoffman <paul.hoffman@vpnc.org>, Douglas Crockford <douglas@crockford.com>, "json@ietf.org" <json@ietf.org>
Subject: Re: [Json] Unpaired surrogates in JSON strings

"Martin J. Dürst" scripsit:

> But we should try to do whatever we can in the spec to make it perfectly
> clear that there are no good reasons whatsoever to actually do that.

That's a little stronger than the truth.  You can use Java(Script)
strings, which are sequences of arbitrary 16-bit code units, to
represent an immutable vector of unsigned integers bounded by 65535,
and I have actually done so.  However, it is not a good idea to do so
in open interchange.
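
To make the point above concrete, here is a minimal sketch in Python rather
than Java(Script). It is an analogue only: a Python str is a sequence of code
points, not of 16-bit code units, but it can likewise hold lone surrogates.
The helper names (pack, unpack, utf8_encodable) are hypothetical and chosen
for illustration. The sketch suggests why the trick works in memory and
through \uXXXX escapes, yet fails once the string has to be written out as
UTF-8 for interchange.

```python
import json


def pack(values):
    """Pack integers in 0..65535 into a string, one character per value."""
    for v in values:
        if not 0 <= v <= 0xFFFF:
            raise ValueError(f"value out of 16-bit range: {v}")
    return "".join(chr(v) for v in values)


def unpack(s):
    """Recover the integers from a string built by pack()."""
    return [ord(c) for c in s]


def utf8_encodable(s):
    """Return True if s can be serialized as UTF-8 (no lone surrogates)."""
    try:
        s.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


# 0xD800 is an unpaired high surrogate; the other values are harmless.
values = [0, 65, 0xD800, 0xFFFF]
packed = pack(values)

# In memory the vector survives unchanged.
assert unpack(packed) == values

# json.dumps escapes non-ASCII as \uXXXX by default, so this also round-trips
# here. Note that a high surrogate directly followed by a low surrogate would
# be merged into one code point by Python's decoder, so not every vector is safe.
assert unpack(json.loads(json.dumps(packed))) == values

# The string cannot be written as UTF-8, which is where interchange breaks.
assert not utf8_encodable(packed)
```

Other parsers may reject or replace the lone surrogate instead of preserving
it, so the round trip shown here should not be relied on between
implementations.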

> Any Unicode code point except high-surrogate and low-surrogate code
> points. In other words, the ranges of integers 0 to D7FF₁₆ and E000₁₆
> to 10FFFF₁₆ inclusive. (See definition D76 in Section 3.9, Unicode
> Encoding Forms.)

Yes, we should use either "Unicode scalar value" or "Unicode 16-bit code
unit", depending on what we decide on.
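
As a rough illustration of the "Unicode scalar value" option, the quoted
definition translates directly into a range check. This is a sketch in
Python, and the function names are hypothetical; a spec that chose "16-bit
code unit" instead would simply accept any integer from 0 to 65535 per unit.

```python
def is_scalar_value(cp):
    """True if cp is a Unicode scalar value: any code point except surrogates."""
    return 0 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF


def all_scalar_values(s):
    """True if every code point in the string is a Unicode scalar value."""
    return all(is_scalar_value(ord(c)) for c in s)


assert is_scalar_value(0x41)          # 'A'
assert is_scalar_value(0x10FFFF)      # last code point
assert not is_scalar_value(0xD800)    # first high surrogate
assert not is_scalar_value(0xDFFF)    # last low surrogate
assert not is_scalar_value(0x110000)  # beyond the Unicode code space
```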

-- 
That you can cover for the plentiful            John Cowan
and often gaping errors, misconstruals,         http://www.ccil.org/~cowan
and disinformation in your posts                cowan@ccil.org
through sheer volume -- that is another misconception.  --Mike to Peter