Re: [Json] On characters and code points
Tim Bray <tbray@textuality.com> Fri, 07 June 2013 17:26 UTC
In-Reply-To: <20130607171950.GD13569@mercury.ccil.org>
References: <A723FC6ECC552A4D8C8249D9E07425A70FC2E7E1@xmb-rcd-x10.cisco.com> <51B06F38.8050707@crockford.com> <CAHBU6iuFBuW-RfgBLQF5q4BnUOzs088QXW3uOQG1OjBFjZttkw@mail.gmail.com> <51B1B4E7.8090101@it.aoyama.ac.jp> <9ld3r8pc0tufif18dohb2fmi0ijna1vs4n@hive.bjoern.hoehrmann.de> <56A163E9-E7CD-46B3-9984-8F009EBFF500@vpnc.org> <CAHBU6ivG=ONc8roT7W=LdpKYNMqRH_d5BobZ=pHnk=mVaKZKaA@mail.gmail.com> <20130607171950.GD13569@mercury.ccil.org>
Date: Fri, 07 Jun 2013 10:25:59 -0700
Message-ID: <CAHBU6iuO=D5Vtyjb_FQKHpttrFRBzXcB-Jac_ixb41GQFYF-Fw@mail.gmail.com>
From: Tim Bray <tbray@textuality.com>
To: John Cowan <cowan@mercury.ccil.org>
Cc: Paul Hoffman <paul.hoffman@vpnc.org>, "json@ietf.org" <json@ietf.org>
John is right and I am wrong. What I really want to legislate against is:

{
  "Mis-use of UTF-16 surrogates" : "\udf46\ud800AB\udf11CD\ud812EF",
  "Mis-use of flipped BOM" : "AB\ufffeCD"
}

On Fri, Jun 7, 2013 at 10:19 AM, John Cowan <cowan@mercury.ccil.org> wrote:

> Tim Bray scripsit:
>
> > > { "End of data marker": "\uFFFF" }
> > >
> >
> > Yes, I *really* want to prohibit that. The one corner case it buys you is
> > outweighed by a factor of a thousand or so in not being able to use
> > general-purpose string processing software to deal with JSON payloads.
>
> Most general-purpose string processing software is perfectly happy
> with U+FFFF. There are three different kinds of code points here,
> and it doesn't help to conflate them:
>
> 1) Surrogate code points. These will never be assigned to any characters,
> and are reserved for use as UTF-16 code units. There are exactly 2048 of
> these, from U+D800 to U+DFFF.
>
> 2) Non-character code points. These will never be assigned to any
> characters, and are not meant to be interchanged, but internal software
> is expected to handle them. There are exactly 66 of these, and U+FFFF
> is one. See <http://www.unicode.org/faq/private_use.html#noncharacters>
> for more about this group.
>
> 3) Unassigned code points. These are not assigned to any characters
> today, but may be assigned in future. They may be interchanged.
> Internal libraries should process them.
>
> My view is that group 1 are and should be disallowed in JSON; others
> disagree. Group 2 should be avoided by JSON creators, but accepted by
> JSON parsers, which may choose to change them to U+FFFD (replacement
> character). Group 3 are and should be valid in JSON.
>
> --
> Values of beeta will give rise to dom!          John Cowan
> (5th/6th edition 'mv' said this if you tried    http://www.ccil.org/~cowan
> to rename '.' or '..' entries; see              cowan@ccil.org
> http://cm.bell-labs.com/cm/cs/who/dmr/odd.html)
>
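
For readers who want to experiment with the distinction drawn above, here is a minimal Python sketch. It is not from the thread. It assumes the surrogate range U+D800 to U+DFFF (2048 code points) and the 66 noncharacters defined by Unicode (U+FDD0 to U+FDEF plus the last two code points of each of the 17 planes). The function names classify and unpaired_surrogates are made up for illustration. Telling group 3 (unassigned) apart from assigned code points needs the Unicode Character Database, so the sketch lumps both together as "other". It also relies on Python's json module passing unpaired \u surrogate escapes through into the resulting str, which is the behaviour of current CPython but is not something the JSON grammar itself promises.

```python
import json
from typing import List


def classify(cp: int) -> str:
    """Place a code point in one of the groups named in the quoted message."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"      # group 1: 2048 code points
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "noncharacter"   # group 2: 32 + 2 * 17 = 66 code points
    return "other"              # assigned or unassigned (group 3 needs the UCD)


def unpaired_surrogates(s: str) -> List[int]:
    """Return the indexes of surrogates in s that are not part of a high+low pair."""
    bad = []
    i = 0
    while i < len(s):
        cp = ord(s[i])
        is_high = 0xD800 <= cp <= 0xDBFF
        if is_high and i + 1 < len(s) and 0xDC00 <= ord(s[i + 1]) <= 0xDFFF:
            i += 2              # well-formed pair, skip both halves
            continue
        if 0xD800 <= cp <= 0xDFFF:
            bad.append(i)       # lone high or lone low surrogate
        i += 1
    return bad


# The two strings from the message, parsed from their JSON escapes.
doc = json.loads(r'{"s": "\udf46\ud800AB\udf11CD\ud812EF", "b": "AB\ufffeCD"}')
print(unpaired_surrogates(doc["s"]))
print([classify(ord(ch)) for ch in doc["b"]])
```

A parser that followed the group 2 suggestion in the quoted message could use classify to decide when to substitute U+FFFD, and could reject input outright whenever unpaired_surrogates returns a non-empty list.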