Re: [Json] "Generators SHOULD escape all Unicode whitespace characters"?

Stephen Dolan <stephen.dolan@cl.cam.ac.uk> Tue, 11 June 2013 10:58 UTC

On Mon, Jun 10, 2013 at 11:54 PM, Jacob Davies <jacob@well.com> wrote:
> I'm curious if anyone else thinks this is worth suggesting to
> implementors. There are a number of non-ASCII Unicode whitespace and
> control characters that are not required to be escaped right now.
>
> I think generators SHOULD escape them. Obviously parsers must continue
> to accept them unescaped regardless. The set is fairly small and could
> be enumerated in the document (it might expand in future, but this
> would be a good start):
>
> http://en.wikipedia.org/wiki/Space_(punctuation)#Spaces_in_Unicode
>
> "Whitespace smuggling" is a mild security concern and, from
> experience, can be quite hard to debug if non-0x20 spaces are not
> escaped. There is a small overhead of a couple of characters in doing
> so.
>
> Everything else in JSON's text serialization uses either printing
> characters, insignificant ASCII whitespace between values, or plain
> spaces in strings. Of course some printing Unicode characters are
> doppelgängers so perhaps people feel it is not worth worrying about
> whitespace either.

There is an interesting issue with unescaped Unicode whitespace:
ECMAScript does not allow U+2028 LINE SEPARATOR inside a string
literal, yet JSON does (as it is not one of the characters that
must always be escaped).

See http://timelessrepo.com/json-isnt-a-javascript-subset
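
To make the mismatch concrete, here is a minimal sketch of a generator
that escapes the problem character on output. It assumes Python's
standard json module; the helper name dumps_js_safe and the table
JS_LINE_TERMINATORS are hypothetical, made up for illustration. It also
escapes U+2029 PARAGRAPH SEPARATOR, which goes beyond what is described
above, on the assumption that ES5 treats it as a line terminator in the
same way.

```python
import json

# Hypothetical table, not part of any library: characters that JSON
# permits unescaped inside strings but that ECMAScript (as of ES5)
# treats as line terminators, so they cannot appear in a string literal.
JS_LINE_TERMINATORS = {
    "\u2028": "\\u2028",  # LINE SEPARATOR
    "\u2029": "\\u2029",  # PARAGRAPH SEPARATOR (assumed to behave the same)
}


def dumps_js_safe(value):
    """Serialize value to JSON, escaping U+2028 and U+2029."""
    # ensure_ascii=False leaves other non-ASCII characters unescaped, so
    # the only additional escaping is the replacement below.
    text = json.dumps(value, ensure_ascii=False)
    # json.dumps never emits these characters outside of strings, so a
    # plain textual replacement only ever touches string contents.
    for raw, escaped in JS_LINE_TERMINATORS.items():
        text = text.replace(raw, escaped)
    return text


encoded = dumps_js_safe({"note": "a\u2028b"})
```

The result is still valid JSON and parses back to the same value, so a
conforming parser sees no difference; only the text on the wire
changes.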

Stephen