Re: [Json] Proposed minimal change for strings

Bjoern Hoehrmann <derhoermi@gmx.net> Fri, 05 July 2013 17:15 UTC

From: Bjoern Hoehrmann <derhoermi@gmx.net>
To: Paul Hoffman <paul.hoffman@vpnc.org>
Date: Fri, 05 Jul 2013 19:15:53 +0200
Cc: "json@ietf.org WG" <json@ietf.org>
Subject: Re: [Json] Proposed minimal change for strings

* Paul Hoffman wrote:
>Proposal 2 (prohibit unescaped surrogates):
>
>In section 1 (Introduction):
>  A string is a sequence of zero or more Unicode scalar values [UNICODE].

Section 1 describes the JSON data model and the proposed change would
prohibit escaped unpaired surrogates, implying that something like

  JSON.stringify([ JSON.parse('["..."]')[0].substring(0, 140) ]);

occasionally returns a string that is not JSON text, even if the input
to `JSON.parse` is proper JSON text, when there is a non-BMP character
in just the wrong place (`substring` counts UTF-16 code units).
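
To make the failure mode concrete, here is a minimal sketch. The sample
string, the cut-off of 3 (standing in for 140) and the helper name
isSurrogate are assumptions chosen for illustration; they are not part
of the proposal or of any library.

```javascript
// U+1F600 lies outside the BMP, so it takes two UTF-16 code units.
const jsonText = '["ab\uD83D\uDE00"]';   // well-formed JSON text
const full = JSON.parse(jsonText)[0];    // code units: a, b, D83D, DE00
const cut = full.substring(0, 3);        // the limit falls inside the pair

// Hypothetical helper: is this code unit in the surrogate range?
function isSurrogate(unit) {
  return unit >= 0xD800 && unit <= 0xDFFF;
}

const lastUnit = cut.charCodeAt(cut.length - 1);
console.log(full.length, cut.length);    // 4 3
console.log(isSurrogate(lastUnit));      // true: a lone high surrogate

// Engines that implement the ES2019 "well-formed JSON.stringify" change
// emit the lone surrogate as a \ud83d escape; older engines emit it raw.
// Either way the value is not a sequence of Unicode scalar values.
const reencoded = JSON.stringify([cut]);
console.log(JSON.parse(reencoded)[0] === cut);  // true
```

The last line shows that ECMAScript itself round-trips the value without
complaint; only the proposed data model would reject it.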

With JSON.stringify this can be avoided using a replacer function like

  JSON.stringify(obj, function(key, value) {
    // Replace the offending code units in string values with U+FFFD.
    if (typeof value === 'string') {
      return value.replace(/.../g, "\uFFFD");
    }
    return value;
  });

but then you have data loss, and unnecessarily so: you might just be
putting an object into a datastore for later use, and ECMAScript doesn't
mind unpaired surrogates. The code is also slower and harder to read.
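
For illustration, a sketch of such a sender-side replacer with one
possible pattern filled in. The pattern and the names LONE_SURROGATE and
scrubbingReplacer are assumptions, not the pattern elided above; the
lookbehind needs an ES2018-capable engine.

```javascript
// Assumed pattern: a high surrogate not followed by a low one, or a low
// surrogate not preceded by a high one. No "u" flag, so the match works
// on UTF-16 code units.
const LONE_SURROGATE =
  /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g;

// Hypothetical replacer: scrubs string values only (object keys are
// not passed through a replacer as values, so they stay untouched).
function scrubbingReplacer(key, value) {
  if (typeof value === 'string') {
    return value.replace(LONE_SURROGATE, '\uFFFD');
  }
  return value;
}

const obj = { note: 'ab\uD83D', ok: 'ab\uD83D\uDE00' };
const text = JSON.stringify(obj, scrubbingReplacer);
const back = JSON.parse(text);

console.log(back.note === obj.note);  // false: the lone surrogate is gone
console.log(back.ok === obj.ok);      // true: the paired one survives
```

The first comparison is the data loss in question: once U+FFFD has been
substituted, the receiver cannot recover the original code unit.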

It seems pretty clear to me that decoders are a better place to deal
with unpaired surrogate code points. In environments where they pose no
problem you probably want them to round-trip; in others, substituting is
a better option; and in yet others you might want to fail hard on them.
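
A rough sketch of what leaving the choice to the decoder could look
like. The function parseWithPolicy, its policy names and the pattern are
assumptions made for illustration; nothing like this is defined by
ECMAScript or by the draft, and the lookbehind needs an ES2018-capable
engine.

```javascript
// Assumed pattern for unpaired surrogates, matched on UTF-16 code units.
const LONE_SURROGATE =
  /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g;

// Hypothetical decoder wrapper. policy is one of:
//   'preserve' - keep unpaired surrogates so they round-trip
//   'replace'  - substitute U+FFFD
//   'fail'     - reject the input
// Only string values are checked here, not object keys.
function parseWithPolicy(text, policy) {
  return JSON.parse(text, function (key, value) {
    if (typeof value !== 'string' || policy === 'preserve') {
      return value;
    }
    if (policy === 'replace') {
      return value.replace(LONE_SURROGATE, '\uFFFD');
    }
    if (policy === 'fail') {
      if (value.search(LONE_SURROGATE) !== -1) {
        throw new SyntaxError('unpaired surrogate in JSON string');
      }
      return value;
    }
    throw new TypeError('unknown policy: ' + policy);
  });
}

// JSON text carrying an escaped lone high surrogate.
const wire = '["ab\\ud83d"]';
const kept = parseWithPolicy(wire, 'preserve')[0];   // 'ab' + U+D83D
const fixed = parseWithPolicy(wire, 'replace')[0];   // 'ab' + U+FFFD
let rejected = false;
try {
  parseWithPolicy(wire, 'fail');
} catch (e) {
  rejected = e instanceof SyntaxError;
}
```

Each receiver picks the behaviour that suits its environment, and the
sender does not have to anticipate which one that will be.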

Prohibiting unpaired surrogate escapes would place the burden of avoiding
them on senders, and I have not seen an argument why that is a better
option, all things considered.
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/