Re: [Json] On characters and code points
John Cowan <cowan@mercury.ccil.org> Fri, 07 June 2013 17:20 UTC
Return-Path: <cowan@ccil.org>
X-Original-To: json@ietfa.amsl.com
Delivered-To: json@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id DBAB521F91BC for <json@ietfa.amsl.com>; Fri, 7 Jun 2013 10:20:04 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.978
X-Spam-Level:
X-Spam-Status: No, score=-2.978 tagged_above=-999 required=5 tests=[AWL=0.021, BAYES_00=-2.599, J_CHICKENPOX_14=0.6, RCVD_IN_DNSWL_LOW=-1]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Oo09jLyBgtBV for <json@ietfa.amsl.com>; Fri, 7 Jun 2013 10:20:01 -0700 (PDT)
Received: from earth.ccil.org (earth.ccil.org [192.190.237.11]) by ietfa.amsl.com (Postfix) with ESMTP id 0817421F8EC3 for <json@ietf.org>; Fri, 7 Jun 2013 10:20:00 -0700 (PDT)
Received: from cowan by earth.ccil.org with local (Exim 4.72) (envelope-from <cowan@ccil.org>) id 1Ul0KN-0007uR-42; Fri, 07 Jun 2013 13:19:51 -0400
Date: Fri, 07 Jun 2013 13:19:51 -0400
From: John Cowan <cowan@mercury.ccil.org>
To: Tim Bray <tbray@textuality.com>
Message-ID: <20130607171950.GD13569@mercury.ccil.org>
References: <A723FC6ECC552A4D8C8249D9E07425A70FC2E7E1@xmb-rcd-x10.cisco.com> <51B06F38.8050707@crockford.com> <CAHBU6iuFBuW-RfgBLQF5q4BnUOzs088QXW3uOQG1OjBFjZttkw@mail.gmail.com> <51B1B4E7.8090101@it.aoyama.ac.jp> <9ld3r8pc0tufif18dohb2fmi0ijna1vs4n@hive.bjoern.hoehrmann.de> <56A163E9-E7CD-46B3-9984-8F009EBFF500@vpnc.org> <CAHBU6ivG=ONc8roT7W=LdpKYNMqRH_d5BobZ=pHnk=mVaKZKaA@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <CAHBU6ivG=ONc8roT7W=LdpKYNMqRH_d5BobZ=pHnk=mVaKZKaA@mail.gmail.com>
User-Agent: Mutt/1.5.20 (2009-06-14)
Sender: John Cowan <cowan@ccil.org>
Cc: Paul Hoffman <paul.hoffman@vpnc.org>, "json@ietf.org" <json@ietf.org>
Subject: Re: [Json] On characters and code points
X-BeenThere: json@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: "JavaScript Object Notation \(JSON\) WG mailing list" <json.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/json>, <mailto:json-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/json>
List-Post: <mailto:json@ietf.org>
List-Help: <mailto:json-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/json>, <mailto:json-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 07 Jun 2013 17:20:05 -0000
Tim Bray scripsit: > > { "End of data marker": "\uFFFF" } > > > > Yes, I *really* want to prohibit that. The one corner case it buys you is > outweighed by a factor of a thousand or so in not being able to use > general-purpose string processing software to deal with JSON payloads. Most general-purpose string processing software is perfectly happy with U+FFFF. There are three different kinds of code points here, and it doesn't help to conflate them: 1) Surrogate code points. These will never be assigned to any characters, and reserved for use as UTF-16 code units. There are exactly 2048 of these, from U+DC00 to U+DFFF. 2) Non-character code points. These will never be assigned to any characters, and are not meant to be interchanged, but internal software is expected to handle them. There are exactly 66 of these, and U+FFFF is one. See <http://www.unicode.org/faq/private_use.html#noncharacters> for more about this group. 3) Unassigned code points. These are not assigned to any characters today, but may be assigned in future. They may be interchanged. Internal libraries should process them. My view is that group 1 are and should be disallowed in JSON; others disagree. Group 2 should be avoided by JSON creators, but accepted by JSON parsers, which may choose to change them to U+FFFD (replacement character). Group 3 are and should be valid in JSON. -- Values of beeta will give rise to dom! John Cowan (5th/6th edition 'mv' said this if you tried http://www.ccil.org/~cowan to rename '.' or '..' entries; see cowan@ccil.org http://cm.bell-labs.com/cm/cs/who/dmr/odd.html)
- [Json] Unpaired surrogates in JSON strings John Cowan
- Re: [Json] Unpaired surrogates in JSON strings Douglas Crockford
- Re: [Json] Unpaired surrogates in JSON strings Paul Hoffman
- Re: [Json] Unpaired surrogates in JSON strings Douglas Crockford
- Re: [Json] Unpaired surrogates in JSON strings John Cowan
- Re: [Json] Unpaired surrogates in JSON strings Douglas Crockford
- Re: [Json] Unpaired surrogates in JSON strings Joe Hildebrand (jhildebr)
- Re: [Json] Unpaired surrogates in JSON strings Paul Hoffman
- Re: [Json] Unpaired surrogates in JSON strings John Cowan
- Re: [Json] Unpaired surrogates in JSON strings Tim Bray
- Re: [Json] Unpaired surrogates in JSON strings Paul Hoffman
- Re: [Json] Unpaired surrogates in JSON strings Tim Bray
- Re: [Json] Unpaired surrogates in JSON strings Douglas Crockford
- Re: [Json] Unpaired surrogates in JSON strings John Cowan
- Re: [Json] Unpaired surrogates in JSON strings John Cowan
- Re: [Json] Unpaired surrogates in JSON strings R S
- Re: [Json] Unpaired surrogates in JSON strings Carsten Bormann
- Re: [Json] Unpaired surrogates in JSON strings John Cowan
- Re: [Json] Unpaired surrogates in JSON strings Tim Bray
- Re: [Json] Unpaired surrogates in JSON strings John Cowan
- Re: [Json] Unpaired surrogates in JSON strings Carsten Bormann
- Re: [Json] Unpaired surrogates in JSON strings Joe Hildebrand (jhildebr)
- Re: [Json] Unpaired surrogates in JSON strings John Cowan
- Re: [Json] Unpaired surrogates in JSON strings Joe Hildebrand (jhildebr)
- Re: [Json] Unpaired surrogates in JSON strings Joe Hildebrand (jhildebr)
- Re: [Json] Unpaired surrogates in JSON strings Douglas Crockford
- Re: [Json] Unpaired surrogates in JSON strings Douglas Crockford
- Re: [Json] Unpaired surrogates in JSON strings Douglas Crockford
- Re: [Json] Unpaired surrogates in JSON strings John Cowan
- Re: [Json] Unpaired surrogates in JSON strings Tim Bray
- Re: [Json] Unpaired surrogates in JSON strings John Cowan
- Re: [Json] Unpaired surrogates in JSON strings Paul Hoffman
- Re: [Json] Unpaired surrogates in JSON strings John Cowan
- Re: [Json] Unpaired surrogates in JSON strings Joe Hildebrand (jhildebr)
- Re: [Json] Unpaired surrogates in JSON strings Joe Hildebrand (jhildebr)
- Re: [Json] Unpaired surrogates in JSON strings Martin J. Dürst
- Re: [Json] Unpaired surrogates in JSON strings Bjoern Hoehrmann
- [Json] On characters and code points Paul Hoffman
- Re: [Json] On characters and code points Tim Bray
- Re: [Json] On characters and code points Stephen Dolan
- Re: [Json] On characters and code points Stefan Drees
- Re: [Json] On characters and code points Tim Bray
- Re: [Json] On characters and code points Stefan Drees
- Re: [Json] On characters and code points Tim Bray
- Re: [Json] Unpaired surrogates in JSON strings John Cowan
- Re: [Json] On characters and code points John Cowan
- Re: [Json] On characters and code points John Cowan
- Re: [Json] On characters and code points Tim Bray
- Re: [Json] On characters and code points John Cowan
- Re: [Json] Unpaired surrogates in JSON strings Nico Williams
- Re: [Json] Unpaired surrogates in JSON strings Nico Williams
- Re: [Json] Unpaired surrogates in JSON strings Tatu Saloranta
- Re: [Json] Unpaired surrogates in JSON strings Joe Hildebrand (jhildebr)
- Re: [Json] On characters and code points Bjoern Hoehrmann
- Re: [Json] On characters and code points Tim Bray
- Re: [Json] Unpaired surrogates in JSON strings John Cowan
- Re: [Json] On characters and code points Nico Williams
- Re: [Json] On characters and code points John Cowan
- Re: [Json] On characters and code points Bjoern Hoehrmann
- Re: [Json] On characters and code points Carsten Bormann
- Re: [Json] On characters and code points Stefan Drees
- Re: [Json] On characters and code points Paul Hoffman
- Re: [Json] On characters and code points Carsten Bormann
- Re: [Json] On characters and code points Nico Williams