[Json] "Generators SHOULD escape all Unicode whitespace characters"?

Jacob Davies <jacob@well.com> Mon, 10 June 2013 22:55 UTC

I'm curious whether anyone else thinks this is worth suggesting to
implementors. A number of non-ASCII Unicode whitespace and control
characters are currently not required to be escaped (only the
quotation mark, the backslash, and U+0000 through U+001F are).

I think generators SHOULD escape them. Obviously parsers must continue
to accept them unescaped regardless. The set is fairly small and could
be enumerated in the document (it might expand in future, but this
would be a good start):

http://en.wikipedia.org/wiki/Space_(punctuation)#Spaces_in_Unicode

"Whitespace smuggling" is a mild security concern and, in my
experience, problems caused by unescaped non-0x20 spaces can be quite
hard to debug. The cost of escaping is small: each escaped character
grows to a six-character \uXXXX sequence.
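
To make the idea concrete, here is a minimal Python sketch of a
generator that escapes such characters. It is only an illustration,
not a proposed reference implementation. The function name
dumps_escaping_whitespace and the exact character set are my own
assumptions: the set covers the C1 controls, the non-ASCII Unicode
spaces, the line and paragraph separators, and a few zero-width
characters, and a real specification would need to enumerate its own
list.

```python
import json
import re

# Illustrative set only (an assumption, not an authoritative list):
# C1 controls and NBSP (U+0080..U+00A0), Ogham space mark, Mongolian
# vowel separator, U+2000..U+200D, line/paragraph separators, narrow
# NBSP, medium mathematical space, word joiner, ideographic space, BOM.
_INVISIBLE = re.compile(
    "[\u0080-\u00a0\u1680\u180e\u2000-\u200d"
    "\u2028\u2029\u202f\u205f\u2060\u3000\ufeff]"
)


def dumps_escaping_whitespace(value):
    """Serialize value to JSON, escaping non-ASCII whitespace/controls.

    Hypothetical helper. Without indentation, json.dumps only emits
    ASCII spaces outside of strings, so any character matched here
    sits inside a JSON string, where a \\uXXXX escape is always valid.
    """
    # Keep ordinary printing non-ASCII characters unescaped.
    text = json.dumps(value, ensure_ascii=False)
    # Every character in the set is in the BMP, so four hex digits suffice.
    return _INVISIBLE.sub(lambda m: "\\u%04x" % ord(m.group()), text)


original = {"name": "a\u00a0b", "city": "Zürich"}
encoded = dumps_escaping_whitespace(original)
print(encoded)
```

The no-break space comes out as \u00a0 while the printing "ü" is left
alone, and any conforming parser reads the result back to the same
value, so only generators would need to change.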

Everything else in JSON's text serialization uses either printing
characters, insignificant ASCII whitespace between values, or plain
spaces in strings. Of course, some printing Unicode characters are
doppelgängers of one another, so perhaps people feel that whitespace
is not worth worrying about either.