Re: [Json] A possible summary of the discussion so far on code points and characters

R S <sayrer@gmail.com> Sat, 08 June 2013 20:52 UTC

In-Reply-To: <AF793CAF-B30B-44A7-B864-82CEF79EA34D@vpnc.org>
References: <AF793CAF-B30B-44A7-B864-82CEF79EA34D@vpnc.org>
Date: Sat, 08 Jun 2013 13:52:25 -0700
Message-ID: <CAChr6SwLDCUk0DC9pGTKqUu_V5vJHvs7Sgv4EneTJMryn1iKSA@mail.gmail.com>
From: R S <sayrer@gmail.com>
To: Paul Hoffman <paul.hoffman@vpnc.org>
Content-Type: multipart/alternative; boundary="047d7ba97b948f59fb04deaab9b2"
Cc: "json@ietf.org" <json@ietf.org>
Subject: Re: [Json] A possible summary of the discussion so far on code points and characters

A seventh point of view, which I happen to agree with: a JSON string is a
sequence of code units.

This is similar to the definition of 'source text' in ECMAScript:

"ECMAScript source text is assumed to be a sequence of 16-bit code units
for the purposes of this specification. Such a source text may include
sequences of 16-bit code units that are not valid UTF-16 character
encodings."

http://es5.github.io/x6.html
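
To make the code-unit reading concrete, here is a minimal sketch using
Python's standard json module. The helper names (utf16_code_units,
is_encodable_utf8) are made up for illustration, and the behaviour shown is
that of one parser only, not something the RFC requires.

```python
import json

# A lone high surrogate written as a JSON escape: one 16-bit code unit
# that is not a valid UTF-16 sequence on its own.
lone = json.loads('"\\ud800"')

# A surrogate pair written as two escapes: two code units, one code point.
pair = json.loads('"\\ud83d\\ude00"')


def utf16_code_units(s):
    """Return the 16-bit code units of s, passing lone surrogates through."""
    data = s.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def is_encodable_utf8(s):
    """Report whether s can be written out as strict UTF-8."""
    try:
        s.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


print([hex(u) for u in utf16_code_units(lone)])
print([hex(u) for u in utf16_code_units(pair)])
print(is_encodable_utf8(lone), is_encodable_utf8(pair))
```

This parser accepts the lone surrogate and hands it back as a one-element
string, but that string cannot then be serialized as strict UTF-8. Treating
the string as a sequence of code units describes what came in without
claiming it is well-formed text.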

- Rob



On Sat, Jun 8, 2013 at 1:15 PM, Paul Hoffman <paul.hoffman@vpnc.org> wrote:

> No hat at all, just trying to get some focus on the current facts before
> trying to reach conclusions.
>
> 1) Some people have read the following statement from the RFC to mean
> "only Unicode characters are allowed in strings":
>    A string is a sequence of zero or more Unicode characters [UNICODE].
>
> 2) The ABNF is more liberal about what can be in a string than that
> statement:
>       char = unescaped /
>           escape ( ...
>               %x75 4HEXDIG )  ; uXXXX                U+XXXX
>       ...
>       unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
>
> 3) Some JSON parsers enforce (1), rejecting JSON texts that have strings
> that have some unallowed code points.
>
> 4) Some JSON parsers accept strings with all code points.
>
> 5) The definition of "Unicode character" has been surprising to some
> people on this list, and thus might be surprising to some developers and
> users of JSON.
>
> 6) Some people on the list consider some code points that are Unicode
> non-characters to be more objectionable than other code points.
>
> --Paul Hoffman
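
As a rough illustration of point 2 in the summary quoted above, the
"unescaped" rule can be transcribed directly as a range check on code
points. The function name is_unescaped is hypothetical; this is a sketch of
the ABNF as quoted, not of any particular parser.

```python
def is_unescaped(cp):
    """Check a code point against: %x20-21 / %x23-5B / %x5D-10FFFF."""
    return (0x20 <= cp <= 0x21
            or 0x23 <= cp <= 0x5B
            or 0x5D <= cp <= 0x10FFFF)


# Code points chosen to probe the edges of the rule.
samples = {
    "quotation mark": 0x22,       # excluded by the ABNF
    "reverse solidus": 0x5C,      # excluded by the ABNF
    "control character": 0x1F,    # excluded by the ABNF
    "lone high surrogate": 0xD800,
    "noncharacter U+FFFE": 0xFFFE,
    "letter A": 0x41,
}

for name, cp in samples.items():
    print(f"U+{cp:04X} {name}: {is_unescaped(cp)}")
```

Read literally, the ranges admit surrogate code points and noncharacters,
which is why the ABNF is more liberal than the "Unicode characters" sentence
in the prose.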