Re: [Json] Proposed minimal change for strings

Paul Hoffman <paul.hoffman@vpnc.org> Wed, 03 July 2013 15:28 UTC

<no hat>

On Jul 2, 2013, at 8:44 PM, Nico Williams <nico@cryptonector.com> wrote:

> Huh?  Do you mean that any code unit may be allowed if escaped?  

That is exactly what the current document says, I believe. Do you see anything in the grammar that says differently?
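
As a rough illustration of that reading (a sketch only: it uses Python's standard json module, whose behavior is one parser's choice and not something the document mandates, and the helper name parse_escaped_unit is hypothetical), the \uXXXX escape form takes any four hex digits, so an escaped surrogate code unit parses without error:

```python
import json

def parse_escaped_unit(hex4: str) -> str:
    # Parse a JSON text that is one string holding a single \uXXXX escape.
    # hex4 is four hex digits; this helper exists only for illustration.
    return json.loads('"\\u' + hex4 + '"')

# An escaped, unpaired surrogate code unit is accepted by this parser
# and comes back as a one-element string.
lone = parse_escaped_unit("D800")
print(len(lone), hex(ord(lone)))
```

Other parsers may reject or replace such a value; the sketch shows only that the escape syntax itself does not exclude it.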

>> In section 1 (Introduction):
>>  Change the sentence about Unicode characters to:
>>    A string is a sequence of zero or more Unicode code units [UNICODE].
> 
> These aren't Unicode things though.  

Yes, they are. See the Unicode Standard, definition D77. Do you see anything in the Unicode Standard that disagrees with D77?

>> In section 2.2 (Strings):
>>  Leave the production for "unescaped" as-is.
>> In section 3 (Encoding):
>>  Add "Some strings, notably those that have unescaped surrogate code units
>>  (value 0xD800 to 0xDFFF), cannot be encoded in UTF-8."
> 
> Unescaped and *unpaired*.

No, any surrogate code point. RFC 3629, the IETF's definition of UTF-8, says:
   The definition of UTF-8 prohibits encoding character numbers between
   U+D800 and U+DFFF, which are reserved for use with the UTF-16
   encoding form (as surrogate pairs) and do not directly represent
   characters.
Similar language appears in the Unicode Standard's definitions of UTF-8 (D92) and UTF-32 (D90).
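
To make the distinction concrete, here is a minimal sketch in Python, assuming Python 3's strict UTF-8 codec as a stand-in for a conforming encoder (the helper name utf8_encodable is hypothetical). A surrogate code point is refused whether or not it sits next to a matching partner; only the supplementary code point that a UTF-16 pair denotes can be encoded:

```python
def utf8_encodable(s: str) -> bool:
    # True if Python's strict UTF-8 codec can encode every code point in s.
    try:
        s.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False

lone = "\ud800"          # one unpaired surrogate code point
pair = "\ud83d\ude00"    # two surrogate code points, high then low
scalar = "\U0001F600"    # the code point that pair denotes in UTF-16

for label, value in (("lone", lone), ("pair", pair), ("scalar", scalar)):
    print(label, len(value), utf8_encodable(value))
```

In this sketch both surrogate strings fail to encode and only the single supplementary code point succeeds, which matches the RFC 3629 text quoted above.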

--Paul Hoffman