Re: HTTP URI in the form of "http://example.com?query"

Willy Tarreau <w@1wt.eu> Wed, 05 June 2013 05:38 UTC

Date: Wed, 05 Jun 2013 07:36:00 +0200
From: Willy Tarreau <w@1wt.eu>
To: Julian Reschke <julian.reschke@gmx.de>
Cc: Roberto Peon <grmocg@gmail.com>, Zhong Yu <zhong.j.yu@gmail.com>, HTTP Working Group <ietf-http-wg@w3.org>

Hi Julian,

On Tue, Jun 04, 2013 at 10:04:23AM +0200, Julian Reschke wrote:
> On 2013-06-04 09:50, Roberto Peon wrote:
> >A search for regular expression (or synonym) and url will bring up
> >numerous examples which would be broken by this change.
> >It is certainly not every one, but numerous, nonetheless.
> >
> >Here is one example:
> >http://net.tutsplus.com/tutorials/other/8-regular-expressions-you-should-know/
> >-=R
> >...
> 
> Yes, but what's the exact breakage except for one component not 
> processing that edge case? It's an edge case after all?

Interesting case. At the very least it breaks haproxy's path extraction,
which relies on RFC 2616. When you need to check the path of a request,
haproxy does this:

   1) skip the scheme and "://"
   2) skip user:pass@host:port
   3) look for the first "/"
   4) return everything from the first "/" to the first "?" or end of
      the string.

So "http://example.com?query=foo/bar" will return "/bar" as the path of
the request instead of an empty string or "/". BTW, is "/" supposed to
be the abs_path here, or just something empty? I'm asking because haproxy
returns the path as a pointer into the request plus a length. If the
answer is "/", that character is not present in this request, so probably
the best thing to do would be to "fix" the request by inserting the "/"
before the "?".
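
To make the failure concrete, here is a rough Python sketch of the four
steps above and of the "fix" that inserts the missing "/". It is only an
illustration and not haproxy's actual code (haproxy is written in C); the
function names naive_path and insert_root_path are made up for this
example.

```python
def naive_path(uri: str) -> str:
    """Path extraction following the four steps above (illustrative only)."""
    # 1) skip the scheme and "://"
    start = uri.find("://")
    start = start + 3 if start != -1 else 0
    # 2) + 3) skip user:pass@host:port by looking for the first "/"
    slash = uri.find("/", start)
    if slash == -1:
        return ""
    # 4) return everything from that "/" to the first "?" or end of string
    end = uri.find("?", slash)
    return uri[slash:] if end == -1 else uri[slash:end]


def insert_root_path(uri: str) -> str:
    """Insert "/" before "?" when the query directly follows the authority."""
    start = uri.find("://")
    start = start + 3 if start != -1 else 0
    query = uri.find("?", start)
    slash = uri.find("/", start)
    # Only rewrite when a "?" appears before any "/" after the authority.
    if query != -1 and (slash == -1 or query < slash):
        return uri[:query] + "/" + uri[query:]
    return uri


broken = "http://example.com?query=foo/bar"
print(naive_path(broken))                    # the "/" inside the query is picked up
print(insert_root_path(broken))              # request with the "/" inserted
print(naive_path(insert_root_path(broken)))  # path once the request is fixed
```

This sketch only covers the case where a query follows the authority; a
URI with neither path nor query is left untouched.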

> Me thinks it's better to be (a) consistent with generic URI parsing and 
> (b) what important components already do (UAs, http servers etc).

I agree. Using a generic parser is important in that it avoids future
incompatibilities such as the one in the example above, which was based
on RFC 2616.
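
As an illustration of what a generic parser does with this URI, here is a
small sketch using urlsplit from Python's standard library as a stand-in
for "a generic URI parser". The choice of Python and the helper name
generic_split are just assumptions for the example.

```python
from urllib.parse import urlsplit


def generic_split(uri: str) -> tuple[str, str, str]:
    """Return (authority, path, query) as split by a generic URI parser."""
    parts = urlsplit(uri)
    return parts.netloc, parts.path, parts.query


# The authority ends at the first "/", "?" or "#", so the "/" inside the
# query is never mistaken for the start of the path.
authority, path, query = generic_split("http://example.com?query=foo/bar")
print(repr(authority), repr(path), repr(query))
```

Here the path comes back empty and "foo/bar" stays in the query, which is
the behaviour the generic syntax calls for.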

Best regards,
Willy