Re: [Json] "Generators SHOULD escape all Unicode whitespace characters"?
Jacob Davies <jacob@well.com> Thu, 13 June 2013 23:48 UTC
From: Jacob Davies <jacob@well.com>
Date: Thu, 13 Jun 2013 16:47:47 -0700
To: Norbert Lindenberg <ietf@lindenbergsoftware.com>
Cc: "json@ietf.org" <json@ietf.org>
> This list includes some but not all Unicode control characters in addition
> to space characters.

Yes, and languages vary in what they actually consider "whitespace". I think in general we're concerned with "non-printing or whitespace characters other than a simple space".

> > "Whitespace smuggling" is a mild security concern and, from
> > experience, can be quite hard to debug if non-0x20 spaces are not
> > escaped. There is a small overhead of a couple of characters in doing
> > so.
>
> Can you provide more detail on the problem that this proposal is intended
> to solve?

Sure. What sometimes happens is that various parts of a system disagree over what counts as whitespace. For instance, a server may strip whitespace using Java's built-in check, Character.isWhitespace (http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/Character.html#isWhitespace(int)), which does not recognize all of the above-mentioned characters, when the intent was to remove all whitespace. Malicious users may use characters that evade this check to introduce whitespace into content in ways that are unwanted or misleading. In other cases a system tokenizes by whitespace, but differing definitions of whitespace produce different tokenizations and, with them, security concerns (as in the case Stephen Dolan mentioned earlier). Escaping may also assist in debugging seemingly identical JSON strings that differ only by invisible or indiscernible whitespace, whether malicious or accidental.

> Does the proposal really solve the problem, given that generators don't
> have to implement it, that they cannot implement it for characters added to
> Unicode in a Unicode version later than the one they're based on, and that
> parsers cannot rely on generators to have implemented it?

It certainly does not solve it. It mitigates it in the same way that escaping control characters and ASCII whitespace in strings mitigates similar concerns: it makes it easier to see exactly what a string is intended to contain, in human-readable characters.
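To make the disagreement over whitespace concrete, here is a minimal sketch. It is written in Python rather than Java purely for brevity, so it illustrates the general pattern and not the behavior of Character.isWhitespace itself. The input string and both helper functions are hypothetical names chosen for illustration; nothing here comes from a real system in this thread.

```python
import re
import string

# Hypothetical input: two words joined by U+00A0 (NO-BREAK SPACE).
sample = "admin\u00a0user"

def strip_ascii_whitespace(text: str) -> str:
    """Drop only the six ASCII whitespace characters."""
    return "".join(ch for ch in text if ch not in string.whitespace)

def strip_unicode_whitespace(text: str) -> str:
    """Drop every character that str.isspace() reports as whitespace."""
    return "".join(ch for ch in text if not ch.isspace())

# The two "strip whitespace" checks disagree about the same input.
ascii_stripped = strip_ascii_whitespace(sample)      # U+00A0 survives
unicode_stripped = strip_unicode_whitespace(sample)  # U+00A0 is removed

# So do two tokenizers: one splits on ASCII whitespace, one on Unicode whitespace.
ascii_tokens = re.split(r"[ \t\n\r\f\v]+", sample)   # one token
unicode_tokens = sample.split()                      # two tokens
```

A component using the ASCII-only view sees a single token and leaves the no-break space in place, while a component using the Unicode view sees two tokens. If the space had been written as \u00a0 in the JSON text, the difference would at least be visible to whoever is reading it.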
The case it helps mitigate is the common one where a non-malicious generator is sending the JSON you're trying to understand, for instance when a site's JavaScript is communicating with its own server. One of the nice things about JSON is that it is easy to debug problems in JSON data using primitive tools: dumping text into page content, or hitting a URL directly and looking at the JSON in the browser. As much as possible, implementations should assist with that kind of debugging.

The recommendation could list a specific set of current characters and additionally refer to the whitespace and control characters in the latest Unicode version. As a mitigation measure it helps even though it is partial. This may be a candidate for a best practice recommendation instead; I thought it was worth mentioning one way or another.
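As a rough sketch of what generator-side escaping along these lines could look like, the following Python post-processes the output of the standard json module. The policy in needs_escape (Unicode whitespace, separators, control and format characters, everything except U+0020) is an assumption made for illustration, not a proposed normative list, and the function names are hypothetical.

```python
import json
import unicodedata

# Assumed policy: general categories treated as "whitespace or non-printing".
FLAGGED_CATEGORIES = {"Zs", "Zl", "Zp", "Cc", "Cf"}

def needs_escape(ch: str) -> bool:
    """True for whitespace or non-printing characters other than a simple space."""
    if ch == " ":
        return False
    return ch.isspace() or unicodedata.category(ch) in FLAGGED_CATEGORIES

def escape_char(ch: str) -> str:
    """Render one character as a JSON \\uXXXX escape (surrogate pair if needed)."""
    cp = ord(ch)
    if cp > 0xFFFF:
        cp -= 0x10000
        return f"\\u{0xD800 + (cp >> 10):04x}\\u{0xDC00 + (cp & 0x3FF):04x}"
    return f"\\u{cp:04x}"

def dumps_escaping_whitespace(value) -> str:
    """Serialize compactly, then escape any flagged character left unescaped.

    json.dumps without indent emits only U+0020 between tokens, so every
    flagged character in the output sits inside a string literal.
    """
    text = json.dumps(value, ensure_ascii=False)
    return "".join(escape_char(ch) if needs_escape(ch) else ch for ch in text)
```

For example, dumps_escaping_whitespace("a\u00a0b") yields the JSON text "a\u00a0b" with the escape spelled out, and json.loads recovers the original string. A blunter route to similar visibility is json.dumps with its default ensure_ascii=True, which escapes every non-ASCII character and not only the whitespace-like ones.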