Re: [Json] Proposal for strings/Unicode text

"Manger, James H" <James.H.Manger@team.telstra.com> Mon, 17 June 2013 05:02 UTC

From: "Manger, James H" <James.H.Manger@team.telstra.com>
To: "Joe Hildebrand (jhildebr)" <jhildebr@cisco.com>, "json@ietf.org" <json@ietf.org>
Date: Mon, 17 Jun 2013 15:02:36 +1000
Thread-Topic: [Json] Proposal for strings/Unicode text
Thread-Index: AQHOZ51Gb76aHFDldEuIrE325m6ppJkzAdeAgAAdpYCAAKLpAIAAIdEA///3qYCABYBx4A==
Message-ID: <255B9BB34FB7D647A506DC292726F6E1151B931064@WSMSG3153V.srv.dir.telstra.com>
References: <20130613121620.GB11739@mercury.ccil.org> <A723FC6ECC552A4D8C8249D9E07425A70FC47B42@xmb-rcd-x10.cisco.com>
In-Reply-To: <A723FC6ECC552A4D8C8249D9E07425A70FC47B42@xmb-rcd-x10.cisco.com>

> >The point is that if JSON is encoded in UTF-8, any surrogate code
> >points MUST be escaped, even though the grammar does not say so.
> 
> What about changing the grammar to make that clear?
> 
> unescaped = %x20-21 / %x23-5B / %x5D-%xD7FF / %E000-%10FFFF  ; Any
> unicode code point except control characters, QUOTATION MARK,  ;
> REVERSE SOLIDUS, or code points reserved for UTF-16 surrogates

+1
Unpaired surrogates cannot be interchanged reliably, so they should be dropped from this ABNF. I don't mind a note saying how some implementations handle them (or their escaped form).
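
As a rough illustration of why unpaired surrogates do not interchange reliably, here is a small Python sketch. The helper name can_encode_utf8 is made up for this example; it only relies on the fact that Python's strict UTF-8 encoder rejects surrogate code points.

```python
def can_encode_utf8(text: str) -> bool:
    """Return True if text can be encoded as well-formed UTF-8.

    can_encode_utf8 is a hypothetical helper for illustration only.
    """
    try:
        # The default (strict) error handler raises on surrogate code points.
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


lone_surrogate = "\ud800"       # an unpaired high surrogate
scalar_value = "\U0001F600"     # a code point outside the BMP, not a surrogate

print(can_encode_utf8(lone_surrogate))
print(can_encode_utf8(scalar_value))
```

Other encoders may substitute U+FFFD or pass the bytes through instead of raising, which is the interoperability problem in a nutshell.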

Fixing typos and tweaking the comment:

  unescaped = %x20-21 / %x23-5B / %x5D-D7FF / %xE000-10FFFF
    ; any Unicode scalar value, except those that must be escaped
    ; (control characters, quotation mark, and reverse solidus)
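
For anyone who wants to sanity-check the ranges, here is a minimal Python sketch of the rule above. The function name is_unescaped is an assumption for illustration and is not taken from any draft or library; it simply transcribes the four ranges.

```python
def is_unescaped(ch: str) -> bool:
    """Return True if the single character ch matches the revised
    'unescaped' rule, i.e. it may appear literally inside a JSON string.

    is_unescaped is a hypothetical name used only for this sketch.
    """
    cp = ord(ch)
    return (
        0x20 <= cp <= 0x21           # space and '!'
        or 0x23 <= cp <= 0x5B        # skips 0x22, the quotation mark
        or 0x5D <= cp <= 0xD7FF      # skips 0x5C, the reverse solidus
        or 0xE000 <= cp <= 0x10FFFF  # skips the surrogate block D800-DFFF
    )


for sample in ["a", '"', "\\", "\n", "\ud7ff", "\ud800", "\ue000", "\U0010FFFF"]:
    print(f"U+{ord(sample):04X}", is_unescaped(sample))
```

The control characters, quotation mark, reverse solidus, and the surrogate code points should all come out as not allowed; everything else is a Unicode scalar value that may appear unescaped.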

--
James Manger