Re: what constitutes an "invalid" content-length

"Adrien de Croy" <adrien@qbik.com> Tue, 12 July 2016 22:44 UTC

From: Adrien de Croy <adrien@qbik.com>
To: Alex Rousskov <rousskov@measurement-factory.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Date: Tue, 12 Jul 2016 22:38:47 +0000
Message-Id: <emec1ffa61-8cc9-4139-a25c-3704194a71e4@bodybag>
In-Reply-To: <578518CD.8070305@measurement-factory.com>
Reply-To: Adrien de Croy <adrien@qbik.com>
Subject: Re: what constitutes an "invalid" content-length
Archived-At: <http://www.w3.org/mid/emec1ffa61-8cc9-4139-a25c-3704194a71e4@bodybag>

Looks like this turns out to be a red herring, sorry all.

Improper header wrapping / bare linefeed in a response header value was
pushing our header byte count out by 2 bytes.
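For readers wondering how a bare LF can skew the count: the actual parser bug isn't shown in this thread, so what follows is a hypothetical reconstruction. A header parser that assumes every line ends in a 2-byte CRLF over-counts by one byte for each line that really ends in a bare LF. RFC 7230 section 3.5 allows a recipient to recognize a single LF as a line terminator, so a more robust parser counts the bytes it actually consumed. A minimal Python sketch (both function names are made up for illustration):

```python
def naive_header_length(buf: bytes) -> int:
    """Hypothetical buggy version: assumes every line ends in CRLF."""
    total = 0
    for line in buf.split(b"\n"):
        line = line.rstrip(b"\r")
        total += len(line) + 2          # wrong whenever the terminator was a bare LF
        if line == b"":
            return total
    raise ValueError("incomplete header section")


def header_section_length(buf: bytes) -> int:
    """Count the bytes actually consumed up to and including the empty line
    that ends the header section, accepting CRLF or bare LF terminators."""
    pos = 0
    while True:
        nl = buf.find(b"\n", pos)
        if nl == -1:
            raise ValueError("incomplete header section")
        line = buf[pos:nl]
        pos = nl + 1                    # real terminator length: 1 or 2 bytes
        if line in (b"", b"\r"):
            return pos


# Hypothetical response whose X-Foo value contains a bare LF.
resp = (b"HTTP/1.1 200 OK\r\n"
        b"X-Foo: a\nb\r\n"
        b"Content-Length: 2\r\n"
        b"\r\n"
        b"ok")
print(naive_header_length(resp))                # 51: one byte too many
print(header_section_length(resp))              # 50: the body starts here
print(resp[header_section_length(resp):])       # b'ok'
```

Whichever length a proxy uses decides where it thinks the body starts, so an error like this shifts the Content-Length boundary and makes a correct C-L look wrong.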



------ Original Message ------
From: "Alex Rousskov" <rousskov@measurement-factory.com>
To: "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Cc: "Adrien de Croy" <adrien@qbik.com>
Sent: 13/07/2016 4:20:29 a.m.
Subject: Re: what constitutes an "invalid" content-length

>On 07/12/2016 07:31 AM, Adrien de Croy wrote:
>
>>  just dealing with a site that sends more payload data than is indicated
>>  in the Content-Length header.
>
>From the standards' point of view, that is _not_ what you are dealing
>with. You are dealing with a site that sends two responses, the first
>response is proper HTTP. The second response is garbage.
>
>
>>  RFC7230 sections 3.3.2 (Content-Length), 3.3.3 (Message body length),
>>  and 3.3.4 (Handling incomplete messages) only contemplate issues around
>>  Content-Length specifying more bytes than are received, not fewer.
>
>From the standards' point of view, it is impossible for the
>Content-Length to specify fewer bytes than the message has. Setting aside
>cases irrelevant to this discussion, the message end is defined by the
>Content-Length header value. One cannot have more than what was promised
>because one stops assembling the message [body] after the promised
>number of bytes have been added. Any "leftovers" are another message or
>garbage, depending on Connection:close, pipelining, and similar factors.
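A minimal Python sketch of the framing rule described above: the body is exactly Content-Length bytes, and anything after that is not part of this message. The function names and the simplified leftover policy (treat leftovers as the next response only on a persistent connection with pipelined requests still outstanding, otherwise discard them as garbage) are illustrative assumptions, not rules the RFC spells out in this form.

```python
from typing import NamedTuple, Optional


class Framed(NamedTuple):
    body: bytes        # exactly Content-Length bytes
    leftover: bytes    # whatever followed; never part of this message


def frame_by_content_length(data: bytes, content_length: int) -> Optional[Framed]:
    """data: bytes received after the header section.
    Returns None while fewer than content_length bytes have arrived."""
    if len(data) < content_length:
        # Incomplete so far: keep reading. If the connection closes now,
        # RFC 7230 section 3.3.4 (incomplete messages) applies.
        return None
    return Framed(data[:content_length], data[content_length:])


def leftover_disposition(leftover: bytes, persistent: bool,
                         outstanding_pipelined: int) -> str:
    """Simplified, hypothetical policy for bytes past the end of the body."""
    if not leftover:
        return "none"
    if persistent and outstanding_pipelined > 0:
        return "next-response"          # parse as the start of the next response
    return "garbage"                    # discard; do not forward or render


framed = frame_by_content_length(b"hello, extra junk", 5)
print(framed.body, framed.leftover)     # b'hello' b', extra junk'
print(leftover_disposition(framed.leftover, persistent=False,
                           outstanding_pipelined=0))   # garbage
```

Note that the framing step never looks at the leftover bytes to decide where the body ends; only the connection state decides what happens to them.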
>
>
>>  I guess one could argue that a wrong C-L value is "invalid", but it's
>>  not clear whether invalid in this context simply means it doesn't parse,
>>  or is otherwise non-compliant with the ABNF.
>
>It is valid from the protocol point of view. You know it is "wrong" only
>because you can (or you think you can) distinguish garbage from the end
>of the content.
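On the question in the subject line: RFC 7230 section 3.3.2 defines Content-Length = 1*DIGIT, and section 3.3.3 treats multiple Content-Length fields with differing values, or a single field with an invalid value, as invalid framing (an unrecoverable error). The one tolerated variant is duplicated identical values (e.g. "Content-Length: 42, 42"), which a recipient must either reject or collapse to a single value. A sketch of that purely syntactic check in Python, assuming the caller passes every received Content-Length field value as a string; it says nothing about whether the value is "wrong" in the sense discussed above.

```python
import re

# ASCII digits only: \d and str.isdigit() also accept non-ASCII digits.
_DIGITS = re.compile(r"[0-9]+")


def parse_content_length(field_values: list[str]) -> int:
    """field_values: every Content-Length field value received, in order.
    Returns the body length, or raises ValueError if framing is invalid."""
    values = set()
    for fv in field_values:
        for item in fv.split(","):      # "42, 42" is a list of identical values
            item = item.strip(" \t")
            if not _DIGITS.fullmatch(item):
                raise ValueError(f"invalid Content-Length value: {item!r}")
            values.add(int(item))
    if len(values) != 1:                # none at all, or conflicting values
        raise ValueError("missing or conflicting Content-Length values")
    return values.pop()                 # duplicates collapsed to one value
```

This sketch takes the "collapse duplicates" option; a stricter implementation could equally reject any message with more than one value.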
>
>
>>  So, it's not clear what the browser and/or proxy response should be.
>
>There is no single right answer to that. A compliant client (including
>proxies) ought to treat leftovers as post-message garbage or another
>message. A real-world client may identify specific cases where leftovers
>are likely to be the end of the message content and ignore
>Content-Length in those cases. The cases where such behavior would be a
>good idea would vary from agent to agent, from one deployment to another.
>
>
>>  I would expect it's in everyone's best interest if sites that have
>>  broken framing are forced to be fixed.  This won't happen if browsers
>>  "just work" for the site.
>
>The ever-popular "force sites to be fixed" approach rarely fixes enough
>real-world sites to remove special treatment code. See Patrick's response
>for a good illustration.
>
>
>>  Is there a special behaviour we should agree on for such cases?
>
>We could agree to violate the standard in one or two special cases, but
>any formal agreement would probably result in a few more broken sites
>because more folks will tolerate them, decreasing the probability that
>they will be fixed.
>
>I can think of one special case where it is more-or-less safe to ignore
>response Content-Length:
>
>* the HTTP/1 connection is not persistent,
>* no additional outstanding pipelined requests on that connection,
>* the unique Content-Length header field is syntactically valid, and
>* more bytes were read during the last network read than C-L promises.
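The four conditions above can be written as one predicate. This is a hypothetical Python sketch: the parameter names are invented, and "more bytes were read during the last network read than C-L promises" is interpreted as "the body had not overrun Content-Length before the most recent read, but had after it". A True result only means the special case applies; acting on it still carries the smuggling risk described next.

```python
def may_ignore_content_length(persistent: bool,
                              outstanding_pipelined: int,
                              content_length_fields: list[str],
                              body_bytes_before_last_read: int,
                              body_bytes_after_last_read: int) -> bool:
    # 1. The HTTP/1 connection is not persistent.
    if persistent:
        return False
    # 2. No other pipelined requests are waiting for responses on it.
    if outstanding_pipelined > 0:
        return False
    # 3. Exactly one Content-Length field, and its value is 1*DIGIT (ASCII).
    if len(content_length_fields) != 1:
        return False
    value = content_length_fields[0].strip(" \t")
    if not (value.isascii() and value.isdigit()):
        return False
    promised = int(value)
    # 4. No overrun before the most recent read, an overrun after it.
    return body_bytes_before_last_read <= promised < body_bytes_after_last_read
```

For example, a Connection: close response with a single "Content-Length: 10" whose final read took the body from 4 to 15 bytes satisfies all four conditions.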
>
>The combination of these conditions can trigger [optional] "robustness"
>code that reads until connection closure and re-sends leftovers/garbage
>to the next hop (or displays it to the user), opening a message
>smuggling attack vector.
>
>Needless to say, there are benign leftover cases that the above
>conditions do not cover.
>
>
>Cheers,
>
>Alex.
>