Re: [Json] On characters and code points

Paul Hoffman <paul.hoffman@vpnc.org> Sat, 08 June 2013 15:25 UTC

From: Paul Hoffman <paul.hoffman@vpnc.org>
Date: Sat, 08 Jun 2013 08:25:27 -0700
To: Carsten Bormann <cabo@tzi.org>
Cc: "json@ietf.org" <json@ietf.org>

On Jun 8, 2013, at 2:38 AM, Carsten Bormann <cabo@tzi.org> wrote:

> On Jun 7, 2013, at 17:56, Paul Hoffman <paul.hoffman@vpnc.org> wrote:
> 
>> Remove the word "character" from the spec except in an explanatory paragraph in Section 2.5 that says:
>>  All code points, even those that represent non-characters in the Unicode specification [UNICODE], are allowed in JSON strings.
> 
> A much better way to handle this problem would be
> 
> "For the purposes of this specification, the term "character" includes both Unicode characters and ______."
> 
> Fill in the blanks based on what we want to do here:
> 
> (1) I think it is now clear that we don't want to shun Unicode non-characters (see corrigendum #9).
> 
> (2) I think it is also clear that we want to include all Unicode code positions that could become Unicode characters in future versions of Unicode according to the compatibility spec.
> 
> (3) I also think that we don't want to change UTF-8, the UTF-16s, or the UTF-32s, so what's possible in "native" encoding will be controlled by those specifications, minus ", \, the C0 characters.
> 
> (4) The remaining question appears to be what we do with unpaired escaped surrogates.
> The answer will have to be a bit wishy-washy, because anything strong will invalidate half of the implementations.  It is probably a good idea not to "break" those applications that compensate for JavaScript's lack of a binary string type by using UTF-16 as a vector of unconstrained 16-bit values, but we also cannot mandate that everyone adopt this 1990s-style hack.

Ummm, how is that "much better"? "Code points minus THEONESWEHATE" seems a lot simpler than "characters plus ADDITIONAL1 plus ADDITIONAL2 plus ADDITIONAL3".
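
To make the "code points minus an exclusion list" framing concrete, here is a rough Python sketch. The names EXCLUDED_UNESCAPED, allowed_unescaped, and is_noncharacter are made up for illustration and come from no spec text. The exclusion list is only my reading of item (3) above (quotation mark, backslash, C0 controls, plus surrogates, which the UTFs cannot carry natively). The last lines show how one particular decoder, the json module in CPython, treats an unpaired escaped surrogate; other implementations may well differ.

```python
import json

# Assumption for illustration: code points that may not appear literally
# (unescaped) in a JSON string: quotation mark, backslash, C0 controls.
EXCLUDED_UNESCAPED = {0x22, 0x5C} | set(range(0x00, 0x20))


def is_noncharacter(cp: int) -> bool:
    """True for the 66 Unicode noncharacters (U+FDD0..U+FDEF, U+nFFFE, U+nFFFF)."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def allowed_unescaped(cp: int) -> bool:
    """'All code points minus an exclusion list', for literal use in a string."""
    if not 0 <= cp <= 0x10FFFF:
        return False  # outside the Unicode code space
    if 0xD800 <= cp <= 0xDFFF:
        return False  # surrogates cannot be encoded in well-formed UTF-8/16/32
    return cp not in EXCLUDED_UNESCAPED


# Noncharacters are not on the exclusion list, so they pass.
noncharacter_ok = is_noncharacter(0xFFFE) and allowed_unescaped(0xFFFE)

# CPython's json decoder accepts an unpaired escaped surrogate and
# returns a one-element string holding that surrogate code point.
lone = json.loads('"\\ud800"')

# A proper escaped surrogate pair is combined into one code point.
paired = json.loads('"\\ud83d\\ude00"')
```

With this framing the only decision left is what goes on the exclusion list. The unpaired escaped surrogate case is separate, since it concerns escapes rather than the native encoding.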

--Paul Hoffman