Header field-name token and leading spaces

Karl Dubost <karl@la-grange.net> Sun, 03 March 2013 19:06 UTC

From: Karl Dubost <karl@la-grange.net>
Date: Sun, 03 Mar 2013 14:03:56 -0500
To: HTTP Working Group <ietf-http-wg@w3.org>
Subject: Header field-name token and leading spaces

Hi,

This is a bit long, but it came up because we were trying to fix a bug in a Python library concerning HTTP headers and their production rules.

request.add_header('foo', 'bar')
→ "foo:bar"
request.add_header(' foo', 'bar')
→ " foo:bar"
request.add_header('foo ', 'bar')
→ "foo :bar"

What I gathered from the spec:

In 3.2.  Header Fields, 
http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-22#section-3.2 

the header production rules are defined as:

     header-field   = field-name ":" OWS field-value BWS
     field-name     = token
     field-value    = *( field-content / obs-fold )
     field-content  = *( HTAB / SP / VCHAR / obs-text )
     obs-fold       = CRLF ( SP / HTAB )
                    ; obsolete line folding
                    ; see Section 3.2.4

   The field-name token labels the corresponding field-value as having
   the semantics defined by that header field.

So far so good, but we do not yet know what the production rules are for "field-name     = token". They might come later. Let's read a bit more.

In 3.2.3, Whitespace, there are production rules for OWS, BWS, and RWS:
http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-22#section-3.2.3

This defines at least the rules for 

     header-field   = field-name ":" OWS field-value BWS

which basically says:

--------------------------------------------------
OK    "Foo: bar"       (1 space or more)
OK    "Foo:	bar"   (1 tab or more)
OK    "Foo:bar"        (no space)
--------------------------------------------------
AVOID "Foo: bar "      (1 trailing space or more)
AVOID "Foo: bar	"      (1 trailing tab or more)
--------------------------------------------------

AVOID means:

* senders SHOULD NOT generate it in messages.
* recipients MUST accept such bad optional whitespace and remove it
  before interpreting the field value or forwarding the message
  downstream.
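
As a rough illustration of the table above, here is a minimal Python sketch of how a recipient might strip the optional and bad whitespace around a field value. The function name `split_field` is hypothetical and not taken from any library; it only assumes the line contains a colon and does not validate the field-name.

```python
# OWS and BWS are both made of SP and HTAB characters.
WHITESPACE = " \t"

def split_field(line):
    """Split a header line at the first colon and strip the
    whitespace before and after the field value."""
    name, sep, value = line.partition(":")
    if not sep:
        raise ValueError("no colon in header line")
    return name, value.strip(WHITESPACE)

# The three OK forms and the AVOID form all yield the same pair.
for line in ("Foo: bar", "Foo:\tbar", "Foo:bar", "Foo: bar \t"):
    print(split_field(line))
```

Each of the four lines prints ('Foo', 'bar').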

OK, cool. Let's go on.

In 3.2.4 Field Parsing
http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-22#section-3.2.4

   No whitespace is allowed between the header field-name and colon.  

So in the production rules, we can't do:

-------------------------------------------------------
BAD  "Foo :bar"    (1 or more space/tab before the ":")
-------------------------------------------------------

   In the past, differences in the handling of such whitespace have led to
   security vulnerabilities in request routing and response handling.  A
   server MUST reject any received request message that contains
   whitespace between a header field-name and colon with a response code
   of 400 (Bad Request).
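
A server-side check for this rule could look like the following Python sketch. The name `status_for_field_line` is made up for illustration, and treating a line without any colon as a 400 is my own assumption, not something quoted from the draft.

```python
def status_for_field_line(line):
    """Return 400 when whitespace sits between the field-name and
    the colon of a request header line, otherwise None."""
    name, sep, _value = line.partition(":")
    if not sep:
        # Assumption: a line with no colon is also rejected.
        return 400
    if name.endswith((" ", "\t")):
        return 400
    return None

print(status_for_field_line("foo :bar"))   # whitespace before ":"
print(status_for_field_line("foo:bar"))    # well-formed
```

The first call prints 400 and the second prints None.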


OK. This is clear too. I tested it on the W3C server:

→ curl -I -H "foo :bar" --trace-ascii - http://www.w3.org/

The W3C server sent back:

    HTTP/1.0 400 Bad request

Though not all servers do that:

→ curl -I -H "foo :bar" --trace-ascii - http://www.ietf.org/
HTTP/1.1 200 OK


There is also a rule for proxies:

   A proxy MUST remove any such whitespace from a response message
   before forwarding the message downstream.

-------------------------------------------------------
"foo :bar" → "foo:bar"
-------------------------------------------------------
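
The proxy rewrite above can be sketched in a few lines of Python. The function name `normalize_response_field` is hypothetical; the sketch only removes whitespace directly before the first colon and leaves everything else, including lines without a colon, untouched.

```python
def normalize_response_field(line):
    """Drop any SP/HTAB found between the field-name and the colon
    of a response header line before forwarding it."""
    name, sep, value = line.partition(":")
    if not sep:
        return line  # nothing to normalize without a colon
    return name.rstrip(" \t") + ":" + value

print(normalize_response_field("foo :bar"))
```

This prints foo:bar. Note that a leading space, as in " foo:bar", passes through unchanged, which is exactly the case the question below is about.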

MY QUESTION (finally) :) 

Nothing is said about 
-------------------------------------------------------
" foo:bar"   (1 or more space/tab before the field-name)
-------------------------------------------------------
-------------------------------------------------------

In Appendix C, the collected ABNF defines token:
http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-22#appendix-C

The section of the spec says

     field-name     = token

with

   token = 1*tchar

and tchar as 

   tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." /
    "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA


So the production rules forbid a leading space, but nothing is said about parsing this leading space. 
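
That the grammar excludes a leading (or trailing) space can be checked mechanically. Here is a small Python sketch of the token rule built from the tchar list quoted above; `TCHAR` and `is_token` are names I made up for illustration, not part of any existing library.

```python
import string

# tchar as listed in the ABNF: punctuation plus DIGIT and ALPHA.
TCHAR = frozenset("!#$%&'*+-.^_`|~" + string.digits + string.ascii_letters)

def is_token(s):
    """True if s matches token = 1*tchar."""
    return len(s) > 0 and all(c in TCHAR for c in s)

for name in ("foo", " foo", "foo "):
    print(repr(name), is_token(name))
```

Only 'foo' is reported as a valid token; ' foo' and 'foo ' are both rejected. This shows what a sender must not produce, but it still says nothing about what a recipient should do when it receives such a line.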

* Should it say something? 
* If yes, what? 
* If not, why?


-- 
Karl Dubost
http://www.la-grange.net/karl/