Re: what constitutes an "invalid" content-length

Alex Rousskov <rousskov@measurement-factory.com> Tue, 12 July 2016 16:26 UTC

To: "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
References: <em19b7fba4-42bf-40e8-83a9-132dfdc92698@bodybag>
From: Alex Rousskov <rousskov@measurement-factory.com>
Cc: Adrien de Croy <adrien@qbik.com>
Message-ID: <578518CD.8070305@measurement-factory.com>
Date: Tue, 12 Jul 2016 10:20:29 -0600
In-Reply-To: <em19b7fba4-42bf-40e8-83a9-132dfdc92698@bodybag>
Subject: Re: what constitutes an "invalid" content-length
Archived-At: <http://www.w3.org/mid/578518CD.8070305@measurement-factory.com>

On 07/12/2016 07:31 AM, Adrien de Croy wrote:

> just dealing with a site that sends more payload data than is indicated
> in the Content-Length header.

From the standards point of view, that is _not_ what you are dealing
with. You are dealing with a site that sends two responses: the first
response is proper HTTP; the second response is garbage.


> RFC7230 sections 3.3.2 (Content-Length), 3.3.3 (Message body length),
> and 3.3.4 (Handling incomplete messages) only contemplate issues around
> Content-Length specifying more bytes than are received, not fewer.

From the standards point of view, it is impossible for the
Content-Length to specify fewer bytes than the message has. Setting aside
cases irrelevant to this discussion, the message end is defined by the
Content-Length header value. One cannot have more than what was promised,
because one stops assembling the message [body] once the promised
number of bytes has been added. Any "leftovers" are another message or
garbage, depending on Connection: close, pipelining, and similar factors.
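A minimal Python sketch of the framing rule just described, assuming the header section has already been parsed and the Content-Length value is known. The function name `split_by_content_length` is hypothetical; it cuts the received bytes at the promised length and returns whatever follows as "leftovers" instead of extending the body.

```python
from typing import Tuple


def split_by_content_length(buffer: bytes, content_length: int) -> Tuple[bytes, bytes]:
    """Split bytes received after the header section into (body, leftovers).

    The body ends exactly after content_length bytes; anything past that
    point is not part of this message, however "obviously" it seems to be.
    """
    if content_length < 0:
        raise ValueError("Content-Length cannot be negative")
    if len(buffer) < content_length:
        # Fewer bytes than promised: the message is incomplete so far.
        raise ValueError("incomplete body: more bytes are needed")
    body = buffer[:content_length]       # stop once the promised bytes are in
    leftovers = buffer[content_length:]  # another message, or garbage
    return body, leftovers


body, leftovers = split_by_content_length(b"hello</html>\r\n", 5)
# body == b"hello", leftovers == b"</html>\r\n"
```

Deciding whether the leftovers are the next response or garbage is a separate step that depends on the connection state, not on the body parser.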


> I guess one could argue that a wrong C-L value is "invalid", but it's
> not clear that invalid in this context simply means it doesn't parse, or
> is otherwise non-compliant with the ABNF.

It is valid from the protocol point of view. You know it is "wrong" only
because you can (or you think you can) distinguish garbage from the end
of the content.
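To make the distinction concrete: RFC 7230 section 3.3.2 defines the field as `Content-Length = 1*DIGIT`, so a syntax check can reject values such as `-1` or `12abc`, but it cannot tell that a well-formed `5` is "wrong" for the bytes that follow. A minimal sketch; the helper name `parse_content_length` is hypothetical, and it ignores the multiple-field and comma-list cases covered by section 3.3.3.

```python
import re
from typing import Optional

# RFC 7230, section 3.3.2: Content-Length = 1*DIGIT
# [0-9] rather than \d, because \d also matches non-ASCII digits in Python.
_CONTENT_LENGTH = re.compile(r"[0-9]+")


def parse_content_length(field_value: str) -> Optional[int]:
    """Return the length if the value is syntactically valid, else None."""
    value = field_value.strip(" \t")  # drop optional whitespace (OWS) around the value
    if _CONTENT_LENGTH.fullmatch(value) is None:
        return None
    return int(value)


print(parse_content_length("5"))      # 5: valid, even if the sender then sends 8 bytes
print(parse_content_length("-1"))     # None: not 1*DIGIT
print(parse_content_length("12abc"))  # None
```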


> So, it's not clear what the browser and/or proxy response should be.

There is no single right answer to that. A compliant client (including
proxies) ought to treat leftovers as post-message garbage or another
message. A real-world client may identify specific cases where leftovers
are likely to be the end of the message content and ignore
Content-Length in those cases. The cases where such behavior would be a
good idea will vary from agent to agent and from one deployment to another.
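A sketch of the compliant treatment of leftovers, under simplifying assumptions: the caller tracks whether the connection is persistent and how many pipelined requests still await responses. The `Leftovers` enum and `classify_leftovers` function are hypothetical names, and real clients must also handle special cases (such as responses to HEAD requests) that this sketch ignores.

```python
from enum import Enum
from typing import Optional


class Leftovers(Enum):
    """Hypothetical labels for bytes found after a Content-Length-delimited body."""
    NEXT_MESSAGE = "start of the next pipelined response"
    GARBAGE = "post-message garbage"


def classify_leftovers(
    leftovers: bytes, persistent: bool, pending_pipelined: int
) -> Optional[Leftovers]:
    """Classify extra bytes the way a compliant client would.

    The previous body is never extended to absorb them.
    """
    if not leftovers:
        return None  # nothing after the body: nothing to classify
    if persistent and pending_pipelined > 0:
        # Another response is expected on this connection, so these bytes
        # are parsed as the beginning of that response.
        return Leftovers.NEXT_MESSAGE
    # The connection is closing, or no response is expected: the bytes
    # belong to no message and must not be glued onto the previous body.
    return Leftovers.GARBAGE
```

The "real-world" tolerance described above would be an extra, deployment-specific rule layered on top of this function, not a change to it.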


> I would expect it's in everyone's best interest if sites that have
> broken framing are forced to be fixed.  This won't happen if browsers
> "just work" for the site.

The ever-popular "force sites to be fixed" approach rarely fixes enough
real-world sites to remove special treatment code. See Patrick's response
for a good illustration.


> Is there a special behaviour we should agree on for such cases?

We could agree to violate the standard in one or two special cases, but
any formal agreement would probably result in a few more broken sites
because more folks would tolerate them, decreasing the probability that
they will be fixed.

I can think of one special case where it is more or less safe to ignore
the response Content-Length:

* the HTTP/1 connection is not persistent,
* there are no additional outstanding pipelined requests on that connection,
* the unique Content-Length header field is syntactically valid, and
* more bytes were read during the last network read than C-L promises.

The combination of these conditions can trigger [optional] "robustness"
code that reads until connection closure and re-sends leftovers/garbage
to the next hop (or displays it to the user), opening a message
smuggling attack vector.

Needless to say, there are benign leftover cases that the above
conditions do not cover.
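The four conditions above can be expressed as a single predicate. This is a minimal Python sketch: `ConnectionState`, its fields, and `may_ignore_content_length` are hypothetical, and the last condition is interpreted literally as "the body bytes returned by the final network read exceed the Content-Length value". The predicate only checks the preconditions; as noted above, robustness code triggered by them can still open a message smuggling vector, and there are benign leftover cases it does not cover.

```python
import re
from dataclasses import dataclass, field
from typing import List

_DIGITS = re.compile(r"[0-9]+")  # Content-Length = 1*DIGIT (RFC 7230, 3.3.2)


@dataclass
class ConnectionState:
    # All names here are hypothetical, chosen for illustration only.
    persistent: bool                      # False if the connection closes after this response
    outstanding_pipelined_requests: int   # later requests still awaiting responses
    content_length_values: List[str] = field(default_factory=list)  # raw C-L field values
    last_read_size: int = 0               # body bytes returned by the most recent network read


def may_ignore_content_length(state: ConnectionState) -> bool:
    """True only when all four conditions from the message hold."""
    if state.persistent:
        return False  # condition 1: the connection must not be persistent
    if state.outstanding_pipelined_requests > 0:
        return False  # condition 2: no pipelined requests waiting
    if len(state.content_length_values) != 1:
        return False  # condition 3a: exactly one Content-Length field
    value = state.content_length_values[0].strip(" \t")
    if _DIGITS.fullmatch(value) is None:
        return False  # condition 3b: the field must be syntactically valid
    return state.last_read_size > int(value)  # condition 4: last read overshot C-L


state = ConnectionState(persistent=False, outstanding_pipelined_requests=0,
                        content_length_values=["5"], last_read_size=8)
print(may_ignore_content_length(state))  # True
```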


Cheers,

Alex.