Re: [Json] "Generators SHOULD escape all Unicode whitespace characters"?

Norbert Lindenberg <ietf@lindenbergsoftware.com> Wed, 12 June 2013 04:35 UTC

From: Norbert Lindenberg <ietf@lindenbergsoftware.com>
To: Jacob Davies <jacob@well.com>
Cc: Norbert Lindenberg <ietf@lindenbergsoftware.com>, "json@ietf.org" <json@ietf.org>
Date: Tue, 11 Jun 2013 21:35:40 -0700

On Jun 10, 2013, at 15:54, Jacob Davies wrote:

> I'm curious if anyone else thinks this is worth suggesting to
> implementors. There are a number of non-ASCII Unicode whitespace and
> control characters that are not required to be escaped right now.
> 
> I think generators SHOULD escape them. Obviously parsers must continue
> to accept them unescaped regardless. The set is fairly small and could
> be enumerated in the document (it might expand in future, but this
> would be a good start):
> 
> http://en.wikipedia.org/wiki/Space_(punctuation)#Spaces_in_Unicode

This list includes some, but not all, Unicode control characters in addition to space characters.
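One way to make the character set precise, instead of relying on a hand-picked list, is to select characters by Unicode general category. The Python sketch below assumes a hypothetical policy of escaping every non-ASCII character in categories Zs, Zl, Zp (separators), Cc (controls), and Cf (format characters); the policy and the function names are illustrative only and are not taken from any JSON specification or library.

```python
import json
import unicodedata

# Hypothetical policy (an assumption, not from any spec): escape non-ASCII
# separators, control characters, and format characters.
ESCAPE_CATEGORIES = {"Zs", "Zl", "Zp", "Cc", "Cf"}


def should_escape(ch: str) -> bool:
    """Return True if ch is non-ASCII and its category is in ESCAPE_CATEGORIES."""
    return ord(ch) > 0x7F and unicodedata.category(ch) in ESCAPE_CATEGORIES


def escape_char(ch: str) -> str:
    """Escape one character as \\uXXXX, using a surrogate pair above U+FFFF."""
    cp = ord(ch)
    if cp <= 0xFFFF:
        return "\\u%04x" % cp
    cp -= 0x10000
    return "\\u%04x\\u%04x" % (0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF))


def dumps_escaping_whitespace(value) -> str:
    """Serialize value as JSON, escaping the selected non-ASCII characters."""
    # ensure_ascii=False leaves other non-ASCII text (e.g. "é") unescaped.
    # Outside string literals json.dumps emits only ASCII, so every non-ASCII
    # character in the output is inside a string and may be replaced by \uXXXX.
    text = json.dumps(value, ensure_ascii=False)
    return "".join(escape_char(c) if should_escape(c) else c for c in text)


print(dumps_escaping_whitespace({"name": "a\u00a0b\u2028c", "k": "é"}))
```

Because the output is still ordinary JSON, any conforming parser reads back the original strings; the escaping only makes the unusual characters visible in the serialized text.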

> "Whitespace smuggling" is a mild security concern and, from
> experience, can be quite hard to debug if non-0x20 spaces are not
> escaped. There is a small overhead of a couple of characters in doing
> so.
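To put the cost in concrete terms: escaping a BMP character replaces one character with the six ASCII characters of `\uXXXX`, i.e. five extra characters, while in UTF-8 the byte overhead is three or four bytes depending on the code point. A quick Python check (the helper name is illustrative):

```python
def escape_overhead(ch: str) -> int:
    """Extra UTF-8 bytes needed to write a BMP character as \\uXXXX instead of raw."""
    if len(ch) != 1 or ord(ch) > 0xFFFF:
        raise ValueError("expected a single BMP character")
    raw = len(ch.encode("utf-8"))       # 1 to 3 bytes for a BMP code point
    escaped = len("\\u%04x" % ord(ch))  # always 6 ASCII bytes
    return escaped - raw


for ch in ["\u00a0", "\u2028", "\u3000"]:
    print(f"U+{ord(ch):04X}: {escape_overhead(ch)} extra bytes")
```

This prints 4 extra bytes for U+00A0 (two bytes in UTF-8) and 3 extra bytes each for U+2028 and U+3000 (three bytes in UTF-8).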

Can you describe in more detail the problem this proposal is intended to solve? Does the proposal actually solve it, given that generators aren't required to implement it (it would only be a SHOULD), that they cannot implement it for characters added in Unicode versions later than the one they're based on, and that parsers therefore cannot rely on generators having implemented it?
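The version-dependence point can be made concrete with Python's `unicodedata` module, which ships one fixed Unicode database (reported by `unicodedata.unidata_version`) plus a frozen Unicode 3.2 database (`unicodedata.ucd_3_2_0`). The sketch assumes a hypothetical category-based escaping policy (not something proposed in this thread) and uses U+061C ARABIC LETTER MARK, a format character added in Unicode 6.3: a generator whose database predates it sees it as unassigned (`Cn`) and passes it through unescaped.

```python
import unicodedata

# Hypothetical policy (an assumption): escape non-ASCII separators,
# control characters, and format characters.
ESCAPE_CATEGORIES = {"Zs", "Zl", "Zp", "Cc", "Cf"}


def should_escape(ch: str, db=unicodedata) -> bool:
    """Classify ch using the given Unicode database (module or UCD object)."""
    return ord(ch) > 0x7F and db.category(ch) in ESCAPE_CATEGORIES


ALM = "\u061c"  # ARABIC LETTER MARK, category Cf, added in Unicode 6.3

print(unicodedata.unidata_version)                # Unicode version bundled with this Python
print(should_escape(ALM))                         # True: the current database says Cf
print(should_escape(ALM, unicodedata.ucd_3_2_0))  # False: unassigned (Cn) in Unicode 3.2
```

The same generator code gives different answers depending only on the Unicode data it was built with, which is why a parser cannot assume such characters will always arrive escaped.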

Norbert