Re: p1: whitespace in request-target

Willy Tarreau <> Thu, 18 April 2013 05:58 UTC

Date: Thu, 18 Apr 2013 07:55:53 +0200
From: Willy Tarreau <>
To: Mark Nottingham <>
Cc: " Group" <>, Amos Jeffries <>, Roy Fielding <>


On Thu, Apr 18, 2013 at 10:49:10AM +1000, Mark Nottingham wrote:
> p1 3.1.1 says:
> > Unfortunately, some user agents fail to properly encode hypertext
> > references that have embedded whitespace, sending the characters directly
> > instead of properly encoding or excluding the disallowed characters.
> > Recipients of an invalid request-line SHOULD respond with either a 400 (Bad
> > Request) error or a 301 (Moved Permanently) redirect with the
> > request-target properly encoded. Recipients SHOULD NOT attempt to
> > autocorrect and then process the request without a redirect, since the
> > invalid request-line might be deliberately crafted to bypass security
> > filters along the request chain.
> I note that the practice of correcting this is fairly widespread; e.g., in
> Squid, the default is to strip the whitespace, and IIRC has been for some
> time:
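
(For concreteness, the stripping behaviour described above amounts to something like the following — an illustrative sketch only, not Squid's actual code:)

```python
def strip_target_whitespace(target: str) -> str:
    """Remove embedded whitespace from a request-target before processing,
    as a lenient intermediary might -- the behaviour the quoted spec text
    advises against, since it changes what downstream filters see."""
    return ''.join(target.split())

# An invalid target with an embedded space is silently "fixed":
assert strip_target_whitespace('/some file.html') == '/somefile.html'
```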

Does Squid log something that helps identify when it fixes this?

> I think that the Squid documentation needs to be corrected, because the text
> in RFC2396 (and later in 3986) is about URIs in contexts like books, e-mail
> and so forth, not protocol elements:
> My question is why this is a SHOULD / SHOULD NOT. We say that SHOULD-level
> requirements affect conformance unless there's a documented exception here:
> ... but these requirements don't mention any exceptions. Is the security risk
> here high enough to justify a MUST / MUST NOT? If not, they probably need to
> be downgraded to ought (or an exception needs to be highlighted).

Well, FWIW, haproxy is strict on this and rejects requests that don't
exactly match the expected format, which means that embedded spaces are
rejected with a 400 (Bad Request). At the same time, when such a bad
request happens, the whole request is captured. I must say that all the
ones I have seen to date (which are extremely rare) were made by poorly
written attack scripts, or by stupid web scraping tools that resolve
the URL-encoding in links found on web pages before making the request.
These are the same ones that append '">' at the end of a URL because they
failed to properly parse the "<a href=..." tag.
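
To illustrate the strict behaviour (a sketch in Python, not haproxy's actual parser, which is written in C): require the request-line to be exactly `method SP request-target SP HTTP-version`, so any embedded whitespace in the target fails to parse and gets a 400.

```python
import re

# Strict request-line check: exactly two single spaces, and no
# whitespace of any kind inside the request-target.
REQUEST_LINE = re.compile(
    r"^([!#$%&'*+\-.^_`|~0-9A-Za-z]+)"   # method: an HTTP "token"
    r" (\S+)"                            # request-target: no whitespace allowed
    r" (HTTP/\d\.\d)$"                   # HTTP-version
)

def parse_request_line(line: str):
    """Return (method, target, version), or None if the line is invalid
    and should be answered with 400 (Bad Request)."""
    m = REQUEST_LINE.match(line)
    return m.groups() if m else None

# A well-formed request-line parses...
assert parse_request_line('GET /index.html HTTP/1.1') == \
    ('GET', '/index.html', 'HTTP/1.1')
# ...while an embedded space in the target is rejected outright.
assert parse_request_line('GET /some file.html HTTP/1.1') is None
```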

So in my opinion, we could reasonably use a MUST here. Suggesting that
recipients fix this significantly increases the risk that they do it wrong
and become vulnerable to certain classes of attacks. And we're clearly not
contributing to cleaning up the web by accepting such erroneous behaviours,
especially since browsers have managed to get it right.
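
The 301 alternative in the quoted spec text — redirect with the request-target properly encoded — can be sketched like this (illustrative only; function name and the choice of `safe` characters are mine):

```python
from urllib.parse import quote

def redirect_location(raw_target: str) -> str:
    """Percent-encode the disallowed whitespace in an invalid
    request-target, producing a value usable in a 301 Location header."""
    # quote() leaves unreserved and the listed reserved URI characters
    # intact; embedded spaces become %20.
    return quote(raw_target, safe="/?#[]@!$&'()*+,;=:%")

assert redirect_location('/some file.html') == '/some%20file.html'
```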