Re: what constitutes an "invalid" content-length

Tim Bray <tbray@textuality.com> Tue, 12 July 2016 21:58 UTC

Date: Tue, 12 Jul 2016 14:53:48 -0700
Message-ID: <CAHBU6isPH3vu7Cq6dTOV-f0kb2g_Oc+iXi4kX1m7JRceWxiRVw@mail.gmail.com>
From: Tim Bray <tbray@textuality.com>
To: "Adrien W. de Croy" <adrien@qbik.com>
Cc: ietf-http-wg@w3.org
Subject: Re: what constitutes an "invalid" content-length
Archived-At: <http://www.w3.org/mid/CAHBU6isPH3vu7Cq6dTOV-f0kb2g_Oc+iXi4kX1m7JRceWxiRVw@mail.gmail.com>

I've written two large-scale web crawlers, processing billions of links,
and since my principle was to err on the side of inclusiveness, I totally
ignored Content-Length and wrote the necessary code to deal with whatever I
got, including the occasional GET returning an infinite stream of
smiley-emoji or null bytes or whatever.
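
To make "deal with whatever I got" concrete, here is a minimal sketch of a
body reader that never looks at Content-Length and instead enforces its own
byte cap. It is not the crawler code referred to above; the names
read_body_capped, NullStream, max_bytes and chunk_size are hypothetical, and
the stream is assumed to be any object with a blocking read(n) method.

```python
import io


def read_body_capped(stream, max_bytes=1_000_000, chunk_size=8192):
    """Read until EOF or a local cap, ignoring any declared Content-Length.

    Returns (body, truncated). `truncated` is True when the peer had more
    than max_bytes to send, e.g. an endless stream of null bytes.
    """
    limit = max_bytes + 1          # one extra byte tells us if more was pending
    buf = bytearray()
    while len(buf) < limit:
        chunk = stream.read(min(chunk_size, limit - len(buf)))
        if not chunk:              # EOF: the body ended or the peer closed
            break
        buf.extend(chunk)
    truncated = len(buf) > max_bytes
    return bytes(buf[:max_bytes]), truncated


class NullStream:
    """Hypothetical stand-in for a server that never stops sending."""

    def read(self, n):
        return b"\x00" * n


if __name__ == "__main__":
    print(read_body_capped(io.BytesIO(b"hello")))
    body, truncated = read_body_capped(NullStream(), max_bytes=100)
    print(len(body), truncated)
```

The cap is a policy choice made by the reader, so a wrong or missing
Content-Length cannot make it read forever or stop early.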

On Jul 12, 2016 6:36 AM, "Adrien de Croy" <adrien@qbik.com> wrote:

> Hi all
>
> just dealing with a site that sends more payload data than is indicated in
> the Content-Length header.
>
> If the browser connects directly, the page loads fine; if via the proxy,
> the proxy is truncating the length to that advertised and the client isn't
> displaying a page (of course this is the .css file).
>
> RFC 7230 sections 3.3.2 (Content-Length), 3.3.3 (Message body length),
> and 3.3.4 (Handling incomplete messages) only contemplate issues around
> Content-Length specifying more bytes than are received, not fewer.
>
> I guess one could argue that a wrong C-L value is "invalid", but it's not
> clear whether "invalid" in this context simply means that it doesn't
> parse or is otherwise non-compliant with the ABNF.
>
> So, it's not clear what the browser and/or proxy response should be.  If
> we deem a wrong value to be "invalid" (s3.3.3 para 4), a client is supposed
> to discard the response.  This isn't happening.
>
> For the proxy, it only sees that the content length is wrong once it
> receives too many bytes.  By this stage, the horse has bolted so it cannot
> really comply either.
>
> I would expect it's in everyone's best interest if sites that have broken
> framing are forced to be fixed.  This won't happen if browsers "just work"
> for the site.
>
> Is there a special behaviour we should agree on for such cases?
>
> Regards
>
> Adrien de Croy
>
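
The two readings of "invalid" in the quoted message can be told apart
mechanically. The sketch below is only an illustration, not behaviour that
RFC 7230 mandates: parse_content_length and classify_framing are
hypothetical names, the single-value 1*DIGIT check ignores the duplicate and
list forms of the header, and it assumes the caller knows that every counted
byte belongs to this one response (for example because the connection closes
afterwards; on a persistent connection the surplus would look like the start
of the next response).

```python
import re

# Content-Length = 1*DIGIT (RFC 7230, section 3.3.2)
_DIGITS = re.compile(rb"[0-9]+")


def parse_content_length(raw):
    """Return the declared length, or None if the value fails the ABNF."""
    value = raw.strip(b" \t")      # optional whitespace around the field value
    if not _DIGITS.fullmatch(value):
        return None
    return int(value)


def classify_framing(raw_content_length, bytes_seen, eof):
    """Separate a value that does not parse from one that is merely wrong."""
    declared = parse_content_length(raw_content_length)
    if declared is None:
        return "syntactically-invalid"   # knowable before any body arrives
    if bytes_seen > declared:
        return "excess-data"             # only knowable after the fact
    if bytes_seen < declared:
        return "incomplete" if eof else "pending"
    return "complete"


if __name__ == "__main__":
    print(classify_framing(b"abc", 0, False))
    print(classify_framing(b"100", 120, False))
    print(classify_framing(b"100", 60, True))
```

Note that "excess-data" can only be returned once more than the declared
number of bytes has been seen, which is the point the quoted message makes
about the proxy: by then the header has usually been forwarded already.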