[Json] A possible summary of the discussion so far on code points and characters

Paul Hoffman <paul.hoffman@vpnc.org> Sat, 08 June 2013 20:15 UTC

No hat at all, just trying to get some focus on the current facts before trying to reach conclusions.

1) Some people have read the following statement from the RFC to mean "only Unicode characters are allowed in strings":
   A string is a sequence of zero or more Unicode characters [UNICODE].

2) The ABNF is more liberal about what can be in a string than that statement:
      char = unescaped /
          escape ( ...
              %x75 4HEXDIG )  ; uXXXX                U+XXXX
      ...
      unescaped = %x20-21 / %x23-5B / %x5D-10FFFF

3) Some JSON parsers enforce (1), rejecting JSON texts whose strings contain certain disallowed code points.

4) Some JSON parsers accept strings with all code points.

5) The definition of "Unicode character" has been surprising to some people on this list, and thus might be surprising to some developers and users of JSON.

6) Some people on the list consider some code points that are Unicode non-characters to be more objectionable than other code points.
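
To make the gap between (1) and (2) concrete, here is a rough Python sketch. It checks a code point against the "unescaped" rule quoted in (2) and, separately, flags surrogates and Unicode noncharacters. The function names are made up for illustration, and treating surrogates and noncharacters as the "questionable" groups is only an assumption about which code points people have in mind in (3) and (6), not a settled reading of the RFC.

```python
def is_unescaped(cp: int) -> bool:
    """True if cp matches the ABNF rule quoted in (2):
    unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
    """
    return 0x20 <= cp <= 0x21 or 0x23 <= cp <= 0x5B or 0x5D <= cp <= 0x10FFFF


def is_surrogate(cp: int) -> bool:
    """True for the surrogate code points U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


def is_noncharacter(cp: int) -> bool:
    """True for the Unicode noncharacters: U+FDD0..U+FDEF plus the
    last two code points of every plane (U+nFFFE and U+nFFFF)."""
    if not 0 <= cp <= 0x10FFFF:
        return False
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def describe(cp: int) -> str:
    """One-line summary of how a code point fares under each test."""
    return "U+%04X unescaped=%s surrogate=%s noncharacter=%s" % (
        cp, is_unescaped(cp), is_surrogate(cp), is_noncharacter(cp))


if __name__ == "__main__":
    # Hypothetical sample: a letter, the two excluded ASCII characters,
    # a lone surrogate, and two noncharacters.
    for cp in (0x41, 0x22, 0x5C, 0xD800, 0xFDD0, 0xFFFE):
        print(describe(cp))
```

Under these assumptions, the ABNF admits every surrogate and every noncharacter as "unescaped", which is the sense in which it is more liberal than the prose in (1). A parser in group (3) would add something like the last two checks on top of the first; a parser in group (4) would stop at the first.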

--Paul Hoffman