Re: what constitutes an "invalid" content-length
Tim Bray <tbray@textuality.com> Tue, 12 July 2016 21:58 UTC
In-Reply-To: <em19b7fba4-42bf-40e8-83a9-132dfdc92698@bodybag>
References: <em19b7fba4-42bf-40e8-83a9-132dfdc92698@bodybag>
Date: Tue, 12 Jul 2016 14:53:48 -0700
Message-ID: <CAHBU6isPH3vu7Cq6dTOV-f0kb2g_Oc+iXi4kX1m7JRceWxiRVw@mail.gmail.com>
From: Tim Bray <tbray@textuality.com>
To: "Adrien W. de Croy" <adrien@qbik.com>
Cc: ietf-http-wg@w3.org
Subject: Re: what constitutes an "invalid" content-length
Archived-At: <http://www.w3.org/mid/CAHBU6isPH3vu7Cq6dTOV-f0kb2g_Oc+iXi4kX1m7JRceWxiRVw@mail.gmail.com>
I've written two large-scale web crawlers, processing billions of links, and since my principle was to err on the side of inclusiveness, I totally ignored Content-length and wrote the necessary code to deal with whatever I got, including the occasional GET returning an infinite stream of smiley-emoji or null bytes or whatever.

On Jul 12, 2016 6:36 AM, "Adrien de Croy" <adrien@qbik.com> wrote:
> Hi all
>
> just dealing with a site that sends more payload data than is indicated in
> the Content-Length header.
>
> If the browser connects directly, the page loads fine, if via the proxy,
> the proxy is truncating the length to that advertised and the client isn't
> displaying a page (of course this is the .css file).
>
> RFC7230 sections 3.3.2 (Content-Length), 3.3.3 (Message body length),
> and 3.3.4 (Handling incomplete messages) only contemplate issues around
> Content-Length specifying more bytes than are received, not fewer.
>
> I guess one could argue that a wrong C-L value is "invalid", but it's not
> clear that invalid in this context simply means it doesn't parse, or is
> otherwise non-compliant with the ABNF.
>
> So, it's not clear what the browser and/or proxy response should be. If
> we deem a wrong value to be "invalid" (s3.3.3 para 4), a client is supposed
> to discard the response. This isn't happening.
>
> For the proxy, it only sees that the content length is wrong once it
> receives too many bytes. By this stage, the horse has bolted so it cannot
> really comply either.
>
> I would expect it's in everyone's best interest if sites that have broken
> framing are forced to be fixed. This won't happen if browsers "just work"
> for the site.
>
> Is there a special behaviour we should agree on for such cases?
>
> Regards
>
> Adrien de Croy
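A minimal Python sketch of the two behaviours contrasted in this exchange, under stated assumptions. The strict path parses Content-Length against the `1*DIGIT` grammar of RFC 7230 section 3.3.2 and splits the received bytes at the declared length, so any surplus becomes visible only after the declared body has already been accepted (the proxy's "horse has bolted" problem). The lenient path, in the spirit of the crawler described above, ignores Content-Length and reads until the connection closes, with a size cap as a guard against endless responses. All function names, the `max_bytes` cap, and the sample payload are hypothetical illustrations, not code from any real proxy, browser, or crawler.

```python
import io
from typing import BinaryIO


class InvalidContentLength(ValueError):
    """Content-Length does not match 1*DIGIT, or duplicate values disagree."""


def parse_content_length(field_values: list[str]) -> int:
    """Parse the Content-Length field value(s) of one message.

    RFC 7230 section 3.3.2: the value is 1*DIGIT. A list of identical values
    (e.g. "42, 42") may be collapsed into one; differing values make the
    framing invalid (section 3.3.3, item 4).
    """
    seen: set[int] = set()
    for field_value in field_values:
        for item in field_value.split(","):
            item = item.strip()
            # isascii() rejects non-ASCII digits such as "²" that isdigit() accepts.
            if not (item.isascii() and item.isdigit()):
                raise InvalidContentLength(f"not 1*DIGIT: {item!r}")
            seen.add(int(item))
    if len(seen) != 1:
        raise InvalidContentLength(f"missing or conflicting values: {sorted(seen)}")
    return seen.pop()


def split_at_content_length(received: bytes, declared: int) -> tuple[bytes, bytes]:
    """Strict framing: the body is exactly `declared` bytes.

    Returns (body, surplus). A strict recipient treats the surplus as the start
    of the next message; it only shows up after the declared body was read.
    Fewer bytes than declared means an incomplete message (section 3.3.4).
    """
    if len(received) < declared:
        raise ValueError(f"incomplete message: {len(received)} of {declared} bytes")
    return received[:declared], received[declared:]


def read_until_close(stream: BinaryIO, max_bytes: int,
                     chunk_size: int = 65536) -> tuple[bytes, bool]:
    """Lenient reading: ignore Content-Length and read until EOF.

    Returns (body, hit_cap). `max_bytes` is a hypothetical safety limit so an
    endless response cannot exhaust memory.
    """
    buf = bytearray()
    while len(buf) < max_bytes:
        chunk = stream.read(min(chunk_size, max_bytes - len(buf)))
        if not chunk:  # EOF: the server closed the connection
            return bytes(buf), False
        buf.extend(chunk)
    # At the cap: probe one more byte to learn whether the stream continues.
    return bytes(buf), bool(stream.read(1))


if __name__ == "__main__":
    # Hypothetical server that declares 6 bytes of CSS but sends 12 before closing.
    wire = b"p{x:1}q{y:2}"
    declared = parse_content_length(["6"])
    body, surplus = split_at_content_length(wire, declared)
    print("strict:", body, "surplus:", surplus)
    lenient, capped = read_until_close(io.BytesIO(wire), max_bytes=1 << 20)
    print("lenient:", lenient, "hit cap:", capped)
```

The strict path corresponds to the proxy forwarding only the advertised bytes, which is what breaks the stylesheet; the lenient path corresponds to the direct browser connection and to the crawler. Which of the two a recipient ought to apply is the question the thread is asking, and the sketch does not answer it.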