Re: [Json] Proposed minimal change for strings

"Manger, James H" <James.H.Manger@team.telstra.com> Sat, 06 July 2013 13:43 UTC

From: "Manger, James H" <James.H.Manger@team.telstra.com>
To: Bjoern Hoehrmann <derhoermi@gmx.net>, Paul Hoffman <paul.hoffman@vpnc.org>
Date: Sat, 06 Jul 2013 23:43:28 +1000
Cc: "json@ietf.org WG" <json@ietf.org>
Subject: Re: [Json] Proposed minimal change for strings

> * Paul Hoffman wrote:
> >Proposal 2 (prohibit unescaped surrogates):
> >
> >In section 1 (Introduction):
> >  A string is a sequence of zero or more Unicode scalar values [UNICODE].
> 
> Section 1 describes the JSON data model and the proposed change would
> prohibit escaped unpaired surrogates, implying that something like
> 
>   JSON.stringify([ JSON.parse('["..."]')[0].substring(0, 140) ]);
> 
> occasionally returns a string that is not JSON text, even if the input
> to `JSON.parse` is proper JSON text, when there is a non-BMP character
> in just the wrong place (`substring` counts UCS-2 code units).
> 
> With JSON.stringify this can be avoided using a replace function like
> 
>   JSON.stringify(obj, function(key, value) {
>     if (typeof value == 'string') {
>       return value.replace(/.../g, "\uFFFD");
>     }
>     return value;
>   })
> 
> but then you have dataloss, unnecessarily so because you might just be
> putting an object into a datastore for later use and ecmascript doesn't
> mind unpaired surrogates, and the code is slower and harder to read.


It is only "unnecessary dataloss" when you are certain a value will only be handled by ECMAScript -- in which case relying on ECMAScript's (reasonable) extension beyond JSON is ok.
On the other hand, what happens if the datastore understands JSON and tries to store the non-dataloss values as UTF-8? That is likely to fail. In this case "dataloss" that the app controls (and can avoid by not splitting a non-BMP character) is much better than unpredictable behaviour (works or fails) from different JSON-compliant datastores.
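
To make both points concrete, here is a minimal sketch. `truncateSafely` is a hypothetical helper name, not something from this thread or from any JSON specification. It cuts a string at a code-unit limit but backs off by one unit when the cut would land between the two halves of a surrogate pair. The second half shows what ECMAScript itself does when asked to produce UTF-8 from a lone surrogate: `TextEncoder` silently substitutes U+FFFD, while `encodeURIComponent` throws.

```javascript
// Hypothetical helper: truncate to at most `limit` UTF-16 code units
// without leaving an unpaired high surrogate at the end of the result.
function truncateSafely(str, limit) {
  if (str.length <= limit) return str;
  let end = limit;
  const last = str.charCodeAt(end - 1); // NaN when end is 0
  const next = str.charCodeAt(end);
  // If the cut falls between a high surrogate (D800-DBFF) and the low
  // surrogate (DC00-DFFF) that follows it, drop the high half as well.
  if (last >= 0xD800 && last <= 0xDBFF && next >= 0xDC00 && next <= 0xDFFF) {
    end -= 1;
  }
  return str.substring(0, end);
}

const text = "ab\uD83D\uDE00cd";       // "ab", U+1F600, "cd"
const naive = text.substring(0, 3);    // ends in a lone high surrogate
const safe = truncateSafely(text, 3);  // "ab"

// UTF-8 cannot represent a lone surrogate: TextEncoder replaces it
// with U+FFFD (bytes EF BF BD) rather than reporting an error.
const bytes = Array.from(new TextEncoder().encode(naive));

// encodeURIComponent, which also encodes to UTF-8, throws URIError instead.
let threw = false;
try {
  encodeURIComponent(naive);
} catch (e) {
  threw = e instanceof URIError;
}
```

Even within one runtime, two standard UTF-8 encoders disagree on what to do with the unpaired surrogate, which is the kind of works-or-fails behaviour described above.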


> It seems pretty clear to me that decoders are a better place to deal
> with unpaired surrogate code points. In environments where they pose no
> problem you probably want them to roundtrip, in others substituting is
> a better option, and in yet others you might want to fail hard on them.
> 
> Prohibiting unpaired surrogate escapes would place the burden to avoid
> them on senders, and I have not seen an argument why that is a better
> option, all things considered.

I think your previous paragraph provides the argument. If different decoders can legitimately roundtrip, substitute, or fail on parsing an unpaired surrogate escape then senders need to be warned (in big flashing lights) of this non-interoperable behaviour. The best way to do that is to exclude unpaired surrogate escapes from the specification of JSON.
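
As a rough illustration of that divergence, the sketch below applies the three behaviours to the same input. The function `applyPolicy` and the policy names "roundtrip", "substitute" and "fail" are hypothetical labels chosen for this example; they do not come from any parser's API. The only real behaviour relied on is that ECMAScript's `JSON.parse` accepts an unpaired surrogate escape and returns the lone code unit.

```javascript
// Hypothetical post-processing step a decoder might apply to each parsed
// string. policy is one of "roundtrip", "substitute" or "fail".
function applyPolicy(str, policy) {
  let out = "";
  for (let i = 0; i < str.length; i++) {
    const c = str.charCodeAt(i);
    const isHigh = c >= 0xD800 && c <= 0xDBFF;
    const isLow = c >= 0xDC00 && c <= 0xDFFF;
    if (isHigh && i + 1 < str.length) {
      const n = str.charCodeAt(i + 1);
      if (n >= 0xDC00 && n <= 0xDFFF) {
        // Well-formed surrogate pair: copy both halves unchanged.
        out += str[i] + str[i + 1];
        i++;
        continue;
      }
    }
    if (isHigh || isLow) {
      // Unpaired surrogate: the three policies diverge here.
      if (policy === "fail") {
        throw new SyntaxError("unpaired surrogate at index " + i);
      }
      out += policy === "substitute" ? "\uFFFD" : str[i];
    } else {
      out += str[i];
    }
  }
  return out;
}

const wire = '"\\uD83D"'; // JSON text containing an unpaired surrogate escape

const roundtripped = applyPolicy(JSON.parse(wire), "roundtrip");
const substituted = applyPolicy(JSON.parse(wire), "substitute");
let failed = false;
try {
  applyPolicy(JSON.parse(wire), "fail");
} catch (e) {
  failed = e instanceof SyntaxError;
}
```

A sender cannot tell from the JSON text alone which of the three outcomes a given receiver will produce.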

--
James Manger