Re: [Json] Proposed minimal change for strings

Nico Williams <nico@cryptonector.com> Wed, 03 July 2013 03:44 UTC

On Tue, Jul 2, 2013 at 6:27 PM, Paul Hoffman <paul.hoffman@vpnc.org> wrote:
> <chair hats on>
>
> Do either or both of these proposals work for people in the WG?
>
> --Matt Miller and Paul Hoffman
>
> The following are proposed minimal changes to make the JSON spec interoperable with respect to
> which unescaped code units are legal in strings.
>
> Proposal 1 (allow all code units in their unescaped form):

Huh?  Do you mean that any code unit is allowed if escaped, or did
you really mean that any code unit may be sent unescaped?  The latter
would certainly be bad: double quotes and newlines, at the very
least, must be escaped.
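
To make the escaping point concrete, here is a small sketch using
Python's standard json module.  The choice of Python is only an
assumption for illustration; nothing in this thread names an
implementation.  It shows a generator escaping a double quote and a
newline, and a parser rejecting a raw newline inside a string.

```python
import json

# A generator has to escape the double quote and the newline.
encoded = json.dumps('say "hi"\n')
print(encoded)

# A raw (unescaped) newline inside a JSON string is rejected by the
# parser in its default strict mode.
try:
    json.loads('"line1\nline2"')
    raw_newline_accepted = True
except json.JSONDecodeError:
    raw_newline_accepted = False
print(raw_newline_accepted)
```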

> In section 1 (Introduction):
>   Change the sentence about Unicode characters to:
>     A string is a sequence of zero or more Unicode code units [UNICODE].

These aren't strictly Unicode constructs, though.  They are 16-bit
values that can usually be interpreted as part of UTF-16-encoded
Unicode text.
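
A rough sketch of what "16-bit values" means here, in Python (my
choice for illustration only).  The helper name utf16_code_units is
hypothetical.  It lists the 16-bit code units of a string encoded as
UTF-16, and then checks that a lone 0xD800 value, while a perfectly
good 16-bit number, does not decode as UTF-16 text.

```python
import struct

def utf16_code_units(text):
    """Return the 16-bit code units of text when encoded as UTF-16."""
    data = text.encode("utf-16-be")
    return list(struct.unpack(f">{len(data) // 2}H", data))

# A BMP character is one code unit; U+1F600 needs a surrogate pair.
units_bmp = utf16_code_units("A")
units_astral = utf16_code_units("\U0001F600")

# A lone high surrogate is a valid 16-bit value but not valid UTF-16.
try:
    struct.pack(">H", 0xD800).decode("utf-16-be")
    lone_decodes = True
except UnicodeDecodeError:
    lone_decodes = False
```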

> In section 2.2 (Strings):
>   Leave the production for "unescaped" as-is.
> In section 3 (Encoding):
>   Add "Some strings, notably those that have unescaped surrogate code units
>   (value 0xD800 to 0xDFFF), cannot be encoded in UTF-8."

Unescaped and *unpaired*.

Note that parsers can unescape escaped characters/code points/code
units, and many do.  Therefore banning unescaped unpaired surrogate
halves is not sufficient.  We must say something about what parsers
SHOULD do when they see these, whether escaped or unescaped.
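
For what it's worth, here is how one existing parser behaves, as a
sketch.  Python's json module is used only as an example of a parser
that unescapes; other parsers may well differ, which is rather the
point.  The helper encodable_as_utf8 is a hypothetical name.

```python
import json

# An escaped surrogate pair is combined into one code point.
paired = json.loads('"\\ud83d\\ude00"')

# An escaped unpaired surrogate is unescaped and kept as-is.
lone = json.loads('"\\ud800"')

def encodable_as_utf8(s):
    """Report whether the parsed string can be encoded as UTF-8."""
    try:
        s.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False

print(encodable_as_utf8(paired), encodable_as_utf8(lone))
```

So the unpaired surrogate gets through the parser even though it was
escaped on the wire, and the problem only shows up later when the
application tries to encode the string.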

> Proposal 2 (prohibit unescaped surrogates):
>
> In section 1 (Introduction):
>   A string is a sequence of zero or more Unicode scalar values [UNICODE].
> In section 2.2 (Strings):
>   Change the production for "unescaped" to be:
>     unescaped = %x20-21 / %x23-5B / %x5D-D7FF / %xE000-10FFFF
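
Read literally, the production above amounts to the following check.
This is a sketch in Python, and the function name
is_unescaped_proposal2 is hypothetical, not anything from the draft.

```python
def is_unescaped_proposal2(cp):
    """True if code point cp matches the Proposal 2 'unescaped' rule."""
    return (
        0x20 <= cp <= 0x21         # space and '!'
        or 0x23 <= cp <= 0x5B      # skips 0x22, the double quote
        or 0x5D <= cp <= 0xD7FF    # skips 0x5C, the backslash
        or 0xE000 <= cp <= 0x10FFFF  # skips surrogates D800-DFFF
    )
```

The only difference from a single %x5D-10FFFF range is the hole at
D800-DFFF.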

I don't think we need to make this change: parsers can be expected to
do something with unpaired surrogate halves, like either throw them
away or escape them (if it makes sense in whatever context the parser
is executing in) or even output a binary string instead of a text
string.  Certainly, if we don't make this change, then we should say
something about what parsers SHOULD do in the face of such things.
Anyway, I'd +1 this, but we still need to discuss what to do with
unpaired surrogates when they are escaped.
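
A minimal sketch of the three options mentioned above (throw them
away, escape them, or output a binary string), written in Python for
illustration.  The function name, the policy names, and the use of the
"surrogatepass" error handler for the binary case are all my
assumptions, not proposed spec text.  It also assumes the parser has
already combined valid pairs, so any surrogate left in the string is
unpaired.

```python
import json

def is_surrogate(ch):
    return 0xD800 <= ord(ch) <= 0xDFFF

def handle_lone_surrogates(text, policy):
    """Apply one of three illustrative policies to unpaired surrogates."""
    if policy == "drop":
        # Throw the unpaired surrogates away.
        return "".join(ch for ch in text if not is_surrogate(ch))
    if policy == "escape":
        # Put them back into \uXXXX form.
        return "".join(
            f"\\u{ord(ch):04x}" if is_surrogate(ch) else ch for ch in text
        )
    if policy == "binary":
        # Give up on text and hand back bytes instead.
        return text.encode("utf-8", "surrogatepass")
    raise ValueError(f"unknown policy: {policy}")

parsed = json.loads('"a\\ud800b"')
dropped = handle_lone_surrogates(parsed, "drop")
escaped = handle_lone_surrogates(parsed, "escape")
binary = handle_lone_surrogates(parsed, "binary")
```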

Nico
--