Re: [Json] Proposed minimal change for strings

"Joe Hildebrand (jhildebr)" <jhildebr@cisco.com> Fri, 05 July 2013 03:09 UTC

From: "Joe Hildebrand (jhildebr)" <jhildebr@cisco.com>
To: Paul Hoffman <paul.hoffman@vpnc.org>, "json@ietf.org WG" <json@ietf.org>
Thread-Topic: [Json] Proposed minimal change for strings
Thread-Index: AQHOd3uz+zMmutqy8ECJIa9yown6S5lVWk+A
Date: Fri, 05 Jul 2013 03:09:15 +0000
Message-ID: <A723FC6ECC552A4D8C8249D9E07425A70FC7E0AD@xmb-rcd-x10.cisco.com>
In-Reply-To: <9BACB3F2-F9BF-40C7-B4BA-C0C2F33E4278@vpnc.org>
Subject: Re: [Json] Proposed minimal change for strings

On 7/2/13 5:27 PM, "Paul Hoffman" <paul.hoffman@vpnc.org> wrote:

>Proposal 1 (allow all code units in their unescaped form):

vehemently -1

>Proposal 2 (prohibit unescaped surrogates):

> In section 1 (Introduction):
>   A string is a sequence of zero or more Unicode scalar values [UNICODE].
> In section 2.2 (Strings):
>   Change the production for "unescaped" to be:
>     unescaped = %x20-21 / %x23-5B / %x5D-D7FF / %xE000-10FFFF
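
For concreteness, the quoted production can be read as a predicate on
code points.  This is only a sketch in Python; the function name
is_unescaped_char is made up for illustration and is not from any draft
or library.

```python
def is_unescaped_char(cp: int) -> bool:
    """Hypothetical helper: True if code point cp matches the proposed
    production  %x20-21 / %x23-5B / %x5D-D7FF / %xE000-10FFFF."""
    return (
        0x20 <= cp <= 0x21          # space and "!"
        or 0x23 <= cp <= 0x5B       # skips %x22, the quotation mark
        or 0x5D <= cp <= 0xD7FF     # skips %x5C, the backslash
        or 0xE000 <= cp <= 0x10FFFF # skips the surrogate range D800-DFFF
    )
```

Under this reading, is_unescaped_char(0xD800) is False while
is_unescaped_char(0xD7FF) and is_unescaped_char(0xE000) are both True.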


That works for me.  I think that the current text implies this by limiting
the valid encodings to UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, and UTF-32LE,
none of which can legally encode unescaped surrogates.  The ABNF is
therefore a spec bug that needs to be cleaned up, and any existing code
that emits unescaped values in the range %xD800-%xDFFF when not producing
UTF-16, or invalid combinations of those values when producing UTF-16, has
a bug that needs to be fixed.  Note: I don't care if parsers choose to do
something heroic in the face of these buggy implementations instead of
rejecting the input (viva Postel!).
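
To illustrate what "invalid combinations" could mean for UTF-16 output,
here is a rough Python sketch that scans a sequence of 16-bit code units
for an unpaired surrogate.  The name first_unpaired_surrogate and the
convention of returning -1 are assumptions chosen for the example, not
anything the spec defines.

```python
def first_unpaired_surrogate(units):
    """Hypothetical helper: return the index of the first unpaired
    surrogate in a sequence of UTF-16 code units, or -1 if none."""
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate is valid only when a low surrogate follows.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return i
        if 0xDC00 <= u <= 0xDFFF:
            # A low surrogate that was not consumed as part of a pair.
            return i
        i += 1
    return -1
```

A generator could run a check like this before emitting UTF-16; a parser
is free to reject such input or to attempt recovery, as noted above.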

-- 
Joe Hildebrand