Re: Misc review notes for draft-18 p1

Amos Jeffries <squid3@treenet.co.nz> Mon, 30 January 2012 00:18 UTC

Message-ID: <4F25E19C.9090700@treenet.co.nz>
Date: Mon, 30 Jan 2012 13:17:32 +1300
From: Amos Jeffries <squid3@treenet.co.nz>
To: ietf-http-wg@w3.org
References: <20120126155637.GA11227@1wt.eu> <4F219829.4000704@gmx.de> <20120126193619.GA13716@1wt.eu>
In-Reply-To: <20120126193619.GA13716@1wt.eu>
Archived-At: <http://www.w3.org/mid/4F25E19C.9090700@treenet.co.nz>

On 27/01/2012 8:36 a.m., Willy Tarreau wrote:
>
> (...)
>>>>    When a server listening only for HTTP request messages, or processing
>>>>    what appears from the start-line to be an HTTP request message,
>>>>    receives a sequence of octets that does not match the HTTP-message
>>> Wouldn't "does not *exactly* match" be better ? I'm used to find
>>> crappy requests in my logs which are blocked but which some not-so-lazy
>>> implementations would let pass (eg: multiple SP).
>> "match" means "match"; I don't think there's any ambiguity here...
> There's no ambiguity, it's just to emphasize on the need to perform
> strict matching. A large number of HTTP parsers are much too lazy,
> causing nightmares when trying to filter undesired communications,
> or even to define new protocol extensions. For instance on my old
> Apache 1.3 here :
>
>      $ telnet www 60080
>      Connected to www.
>      Escape character is '^]'.
>      HEAD     /           HTTP/1.1     ergeargoaejgoiejgaoeg
>      Host:   ,,,,
>      Invalid/header name: blah
>
>      HTTP/1.1 200 OK
>      Date: Thu, 26 Jan 2012 19:07:02 GMT
>      Server: Apache
>      Last-Modified: Mon, 01 Jun 2009 16:47:12 GMT
>      ETag: "47038-3ad7-46b4c2d81a400"
>      Accept-Ranges: bytes
>      Content-Length: 15063
>      Connection: close
>      Content-Type: text/html
>
>      Connection closed by foreign host.
>
> "SP" is *one* SP, still multiple SPs are accepted in the request
> line. Same for forbidden chars in the header name. And I'm not
> specifically targeting Apache here, I just took the first example
> I had handy, it's far from being alone. It looks like strchr(),
> strtok(), sscanf() or split() depending on the language and
> implementation are common ways to parse requests. This is part
> of what caused all the mess in the hybi WG, delaying it by one
> year trying to find solutions against various implementations.


FWIW: we argued this out in Squid a while back.
The conclusion was to accept any series of non-wrapping BWS before/after
the method and URL, and to ignore that BWS. All other formats and garbage
are treated as HTTP/0.9 mess, and the result is a 400 if the suspected
URL(+garbage) fails to parse as a usable URI in its entirety.
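
To make that rule concrete, here is a rough Python sketch of the behaviour
described above. It is not Squid's actual code: the names
parse_request_line and BadRequest are made up for illustration, and the
"usable URI" test is simplified to a check against the RFC 3986 character
set, which is an assumption on my part.

```python
import re

# Hypothetical names throughout; this is an illustration, not Squid's parser.
BWS = " \t"  # "non-wrapping" whitespace: SP and HTAB only, no line folding
TOKEN_RE = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")
# Simplified "usable URI" check (assumption): only characters that
# RFC 3986 allows in a URI, so any embedded whitespace is rejected.
URI_RE = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+")
# Anything up to a trailing HTTP-version, separated by a run of BWS.
TAIL_RE = re.compile(r"(.*?)[ \t]+(HTTP/\d\.\d)")


class BadRequest(Exception):
    """Signals that the caller should answer with a 400."""


def parse_request_line(line):
    # Drop the line terminator, then ignore BWS at both ends.
    line = line.rstrip("\r\n").strip(BWS)
    parts = re.split(r"[ \t]+", line, maxsplit=1)
    if len(parts) != 2 or not TOKEN_RE.fullmatch(parts[0]):
        raise BadRequest("no usable method")
    method, rest = parts

    tail = TAIL_RE.fullmatch(rest)
    if tail:
        # Well-formed tail: everything before the version is the URL.
        target, version = tail.group(1), tail.group(2)
    else:
        # No clean version at the end: treat the remainder as an
        # HTTP/0.9-style URL (+garbage).
        target, version = rest, "HTTP/0.9"

    # The suspected URL must parse in its entirety, or the result is a 400.
    if not URI_RE.fullmatch(target):
        raise BadRequest("suspected URL is not a usable URI")
    return method, target, version
```

With this sketch, a padded line such as "GET   /index.html   HTTP/1.1" is
accepted with the padding ignored, while the request line from the Apache
example quoted above is rejected: the trailing garbage means no clean
version is found, and the remainder contains whitespace, so it fails the
URI check.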

A few vendors have hit it with their SP padding practices so far, but by
and large it works.

AYJ