Re: [Json] On characters and code points
Tim Bray <tbray@textuality.com> Fri, 07 June 2013 16:18 UTC
In-Reply-To: <CA+mHimO-bUvodjgM89Nskg+tqWrsTAfL8EWRx++fd16t1hFR_g@mail.gmail.com>
References: <A723FC6ECC552A4D8C8249D9E07425A70FC2E7E1@xmb-rcd-x10.cisco.com> <51B06F38.8050707@crockford.com> <CAHBU6iuFBuW-RfgBLQF5q4BnUOzs088QXW3uOQG1OjBFjZttkw@mail.gmail.com> <51B1B4E7.8090101@it.aoyama.ac.jp> <9ld3r8pc0tufif18dohb2fmi0ijna1vs4n@hive.bjoern.hoehrmann.de> <56A163E9-E7CD-46B3-9984-8F009EBFF500@vpnc.org> <CA+mHimO-bUvodjgM89Nskg+tqWrsTAfL8EWRx++fd16t1hFR_g@mail.gmail.com>
Date: Fri, 07 Jun 2013 09:18:07 -0700
Message-ID: <CAHBU6iu_Lex8A9w2Fd77tfud+2h9BLAEHgLR-ezBXgxROs-3xw@mail.gmail.com>
From: Tim Bray <tbray@textuality.com>
To: Stephen Dolan <stephen.dolan@cl.cam.ac.uk>
Cc: Paul Hoffman <paul.hoffman@vpnc.org>, "json@ietf.org" <json@ietf.org>
Subject: Re: [Json] On characters and code points
I agree 100% with Stephen Dolan.  -T

On Fri, Jun 7, 2013 at 9:09 AM, Stephen Dolan <stephen.dolan@cl.cam.ac.uk> wrote:

> I think it is useful to distinguish three cases of codepoint:
> (1) Those which are valid characters in a particular Unicode revision
> (2) Those which are unallocated codepoints which may become valid
> characters in a later Unicode revision
> (3) The noncharacter codepoints which will never be valid
>
> (3) includes such beasts as U+FFFE (which you can only get by reading
> a UTF16 byte order mark with the wrong byte order). The set (1)
> increases with every Unicode revision to include characters from (2),
> but (3) is stable (see
> http://unicode.org/policies/stability_policy.html).
>
> I think JSON should allow characters from (1) and (2) to avoid being
> dependent on a specific Unicode revision. I do not think (3) should be
> allowed - this would cause problems with many existing parsers which
> represent JSON strings using another system's native unicode
> representation.
>
> The argument about testsuites does not seem compelling, as any such
> testsuite testing behaviour of string functions with bad Unicode would
> also include invalidly-encoded Unicode (such as overlong UTF8
> sequences) which cannot be represented at all in JSON, even with
> escaping.
>
> Stephen
>
> On Fri, Jun 7, 2013 at 4:56 PM, Paul Hoffman <paul.hoffman@vpnc.org>
> wrote:
> > <no hat>
> >
> > This may be a part of the spec where some people have to hold their
> > noses. The Unicode definition of "character" does not include
> > non-characters, and the code points for some of those non-characters make
> > sense in JSON strings when those strings. Bjoern has pointed out a good
> > one: strings used for test cases of other code. The issue is not just
> > unpaired surrogates.
> >
> > Do we *really* want to prohibit:
> >
> > { "End of data marker": "\uFFFF" }
> >
> > Proposal:
> >
> > Remove the word "character" from the spec except in an explanatory
> > paragraph in Section 2.5 that says:
> >    All code points, even those that represent non-characters in the
> >    Unicode specification [UNICODE], are allowed in JSON strings.
> >
> > --Paul Hoffman
> > _______________________________________________
> > json mailing list
> > json@ietf.org
> > https://www.ietf.org/mailman/listinfo/json
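Stephen Dolan's three cases can be checked mechanically. The following is a minimal Python sketch, not part of the original thread. It assumes that the Unicode database bundled with the running Python (`unicodedata`) stands in for "a particular Unicode revision". The helper names `is_noncharacter` and `classify` are hypothetical. Case (3) is tested by its fixed definition (U+FDD0..U+FDEF plus the last two code points of each of the 17 planes), which is why it does not move between revisions. Surrogate code points fall outside all three cases, so the sketch reports them separately.

```python
import json
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """Case (3): the 66 permanent noncharacters, i.e. U+FDD0..U+FDEF plus
    U+xxFFFE and U+xxFFFF in each of the 17 planes."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def classify(cp: int) -> str:
    """Hypothetical helper: sort a code point into Stephen Dolan's cases,
    treating Python's bundled Unicode database as 'the' revision."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"{cp:#x} is not a Unicode code point")
    if is_noncharacter(cp):
        return "(3) noncharacter"      # stable: never becomes a character
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"             # outside all three cases
    if unicodedata.category(chr(cp)) == "Cn":
        return "(2) unassigned"        # may be assigned in a later revision
    return "(1) assigned"              # a character in this revision


if __name__ == "__main__":
    print("Unicode version:", unicodedata.unidata_version)
    for cp in (0x41, 0x50000, 0xFFFE, 0xFFFF, 0xD800):
        print(f"U+{cp:04X}: {classify(cp)}")

    # A byte order mark read with the wrong byte order yields U+FFFE.
    print(repr(b"\xff\xfe".decode("utf-16-be")))

    # Paul Hoffman's example parses; the escape names code point U+FFFF.
    doc = json.loads('{ "End of data marker": "\\uFFFF" }')
    print(classify(ord(doc["End of data marker"])))

    # An overlong UTF-8 sequence (0xC0 0xAF for "/") is rejected outright,
    # so there is no code point a JSON \u escape could name for it.
    try:
        b"\xc0\xaf".decode("utf-8")
    except UnicodeDecodeError as exc:
        print("overlong UTF-8 rejected:", exc.reason)
```

In this sketch, Python's `json` module accepts the "\uFFFF" escape without complaint. Rejecting case (3), as Stephen proposes, would therefore take an explicit check such as `is_noncharacter` on top of the parser.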