Re: Header field-name token and leading spaces

Jeff Pinner <jpinner@twitter.com> Sun, 03 March 2013 19:18 UTC

In-Reply-To: <02D76B3C-BED4-4E6C-BF72-6ED327FF72E8@la-grange.net>
References: <02D76B3C-BED4-4E6C-BF72-6ED327FF72E8@la-grange.net>
Date: Sun, 03 Mar 2013 11:17:06 -0800
Message-ID: <CA+pLO_hwWfwHeXZ+kgf-0-i5E49jQeKFNEE7vB9-gLbt0dpXfg@mail.gmail.com>
From: Jeff Pinner <jpinner@twitter.com>
To: Karl Dubost <karl@la-grange.net>
Cc: HTTP Working Group <ietf-http-wg@w3.org>

The production rules forbid a leading space because it cannot be
distinguished from (obsolete) line folding. Nothing is said about
parsing it because, as long as the leading space isn't on the first
header line sent, the parser assumes it's part of the previous header value.
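
To make the ambiguity concrete, here is a minimal sketch in Python of how a parser that honors obs-fold might unfold header lines. The function name `unfold` and the sample lines are my own illustration, not taken from any particular library, and the single-space join is an assumption about how the fold is replaced.

```python
def unfold(raw_lines):
    """Join obs-fold continuation lines onto the preceding header line.

    Hypothetical helper: raw_lines are header lines already split on CRLF.
    """
    unfolded = []
    for line in raw_lines:
        if line[:1] in (" ", "\t") and unfolded:
            # Leading SP/HTAB after a previous line: indistinguishable from
            # obs-fold, so it is appended to the previous field value.
            unfolded[-1] += " " + line.strip(" \t")
        else:
            # First line, or a line without leading whitespace.
            unfolded.append(line)
    return unfolded


print(unfold(["Host: example.org", " foo:bar"]))
# ['Host: example.org foo:bar']
```

With this behavior " foo:bar" never reaches the field-name check at all unless it is the very first header line.
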

On Sun, Mar 3, 2013 at 11:03 AM, Karl Dubost <karl@la-grange.net> wrote:
> Hi,
>
> This is a bit long, but it came up because we were trying to fix a bug in a Python library's handling of HTTP headers and their production rules.
>
> request.add_header('foo', 'bar')
> → "foo:bar"
> request.add_header(' foo', 'bar')
> → " foo:bar"
> request.add_header('foo ', 'bar')
> → "foo :bar"
>
> What I gathered from the spec:
>
> In 3.2.  Header Fields,
> http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-22#section-3.2
>
> the header production rules are defined as:
>
>      header-field   = field-name ":" OWS field-value BWS
>      field-name     = token
>      field-value    = *( field-content / obs-fold )
>      field-content  = *( HTAB / SP / VCHAR / obs-text )
>      obs-fold       = CRLF ( SP / HTAB )
>                     ; obsolete line folding
>                     ; see Section 3.2.4
>
>    The field-name token labels the corresponding field-value as having
>    the semantics defined by that header field.
>
> So far so good, but we do not yet know the production rules for "field-name     = token". They might come later. Let's read a bit more.
>
> In 3.2.3, Whitespace, there are production rules for OWS, BWS, and RWS:
> http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-22#section-3.2.3
>
> This defines at least the rules for
>
>      header-field   = field-name ":" OWS field-value BWS
>
> which basically says:
>
> --------------------------------------------------
> OK    "Foo: bar"       (1 space or more)
> OK    "Foo:     bar"   (1 tab or more)
> OK    "Foo:bar"        (no space)
> --------------------------------------------------
> AVOID "Foo: bar "      (1 trailing space or more)
> AVOID "Foo: bar "      (1 trailing tab or more)
> --------------------------------------------------
>
> AVOID means:
>
> * senders SHOULD NOT generate it in messages.
> * recipients MUST accept such bad optional whitespace and remove it
>   before interpreting the field value or forwarding the message
>   downstream.
>
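
The recipient side of the whitespace rules quoted above could be sketched roughly as follows. `split_field` is a hypothetical name, and the sketch assumes the line has already been unfolded and stripped of its CRLF.

```python
def split_field(line):
    """Split 'name:value' and drop optional whitespace around the value.

    Hypothetical helper; does not validate the field-name.
    """
    name, sep, value = line.partition(":")
    if not sep:
        raise ValueError("no colon in header line")
    # OWS/BWS is SP or HTAB only, so strip exactly those two characters.
    return name, value.strip(" \t")


print(split_field("Foo: bar "))    # ('Foo', 'bar')
print(split_field("Foo:\tbar\t"))  # ('Foo', 'bar')
print(split_field("Foo:bar"))      # ('Foo', 'bar')
```
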
> ok cool. Let's go on.
>
> In 3.2.4 Field Parsing
> http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-22#section-3.2.4
>
>    No whitespace is allowed between the header field-name and colon.
>
> So under the production rules, we can't do:
>
> -------------------------------------------------------
> BAD  "Foo :bar"    (1 or more space/tab before the ":")
> -------------------------------------------------------
>
>    In
>    the past, differences in the handling of such whitespace have led to
>    security vulnerabilities in request routing and response handling.  A
>    server MUST reject any received request message that contains
>    whitespace between a header field-name and colon with a response code
>    of 400 (Bad Request).
>
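
A server-side check for the rule quoted above might look like this sketch. The function name and the convention of returning a status code (or None when the line passes) are assumptions for illustration only.

```python
def status_for_header_line(line):
    """Return 400 if whitespace sits between the field-name and the colon.

    Hypothetical helper: returns None when this particular check passes.
    """
    name, sep, _ = line.partition(":")
    if sep and name[-1:] in (" ", "\t"):
        # e.g. "foo :bar" -> the draft requires 400 (Bad Request).
        return 400
    return None


print(status_for_header_line("foo :bar"))  # 400
print(status_for_header_line("foo:bar"))   # None
```
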
>
> OK. This is clear too. Tested on W3C Server,
>
> → curl -I -H "foo :bar" --trace-ascii - http://www.w3.org/
>
> W3C server sent back
>
>     HTTP/1.0 400 Bad request
>
> Though not all servers do that:
>
> → curl -I -H "foo :bar" --trace-ascii - http://www.ietf.org/
> HTTP/1.1 200 OK
>
>
> There is a rule also for proxies:
>
>   A proxy MUST remove any such whitespace from a
>    response message before forwarding the message downstream.
>
> -------------------------------------------------------
> "foo :bar" → "foo:bar"
> -------------------------------------------------------
>
> MY QUESTION (finally) :)
>
> Nothing is said about
> -------------------------------------------------------
> " foo:bar"   (1 or more space/tab before the field-name)
> -------------------------------------------------------
>
> In Appendix C, the collected ABNF defines token:
> http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-22#appendix-C
>
> The spec says
>
>      field-name     = token
>
> with
>
>    token = 1*tchar
>
> and tchar as
>
>    tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." /
>     "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA
>
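
For the generating side (the add_header case at the top), the token rule quoted above can be checked directly. This is a sketch under the assumption that a library would want to validate names before sending; `is_token` and `TCHARS` are illustrative names, not an existing API.

```python
import string

# tchar set copied from the ABNF quoted above, plus DIGIT and ALPHA.
TCHARS = frozenset("!#$%&'*+-.^_`|~" + string.digits + string.ascii_letters)


def is_token(name):
    """True if name matches token = 1*tchar (hypothetical helper)."""
    return len(name) > 0 and all(ch in TCHARS for ch in name)


print(is_token("foo"))   # True
print(is_token(" foo"))  # False: leading space is not a tchar
print(is_token("foo "))  # False: trailing space is not a tchar
```
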
>
> So the production rules forbid a leading space, but nothing is said about parsing this leading space.
>
> * Should it say something?
> * If yes, what?
> * If not, why?
>
>
> --
> Karl Dubost
> http://www.la-grange.net/karl/
>
>