Re: Range requests and content encoding

Matthew Kerwin <matthew@kerwin.net.au> Tue, 22 December 2015 04:58 UTC

On 22 December 2015 at 12:28, Martin Thomson <martin.thomson@gmail.com>
wrote:

> RFC 7233 does not mention content encoding at all.  Same for transfer
> encoding.  I assume that is because this is completely unspecified and
> therefore completely unreliable, however, for my sanity...
>
> My reading is that a 206 response includes ranges of the encoded
> message, and that the content-encoding applies to the complete message
> body prior to being split into ranges.  Thus, if I had a "x2" content
> encoding that turned "Hello World!" into "HHeelllloo  WWoorrlldd!!",
> asking for bytes 3-5 would get you "eel" and not "llo".
>
> The text in Section 4.1 suggests that you would not include a
> Content-Encoding header field if the client used If-Range on the
> expectation that they already know.  That seems pretty dangerous, but
> it's consistent with the idea that you are repairing a larger message.
>
> On the other hand, I have to assume that a Transfer-Encoding applies
> *after* the range request.
>
> p.s., I've opened https://github.com/httpwg/http11bis/issues/11 for this.
>
>
That's been my understanding. C-E can be used to send offline-encoded
files (like .gz archives), so a range request against that representation
of that resource should target a slice of gzip-encoded data. (IOW:
bytes=0-2 of all C-E:gzip resources on the web should be identical.)

And as you say, T-E applies *after* the slicing. That fact (and the absence
of T-E in H2) was what inspired Keith Morgan to start that big discussion a
while back about gzipping sliced-up log files, and led to
draft-kerwin-http2-encoded-data.

Regarding Section 4.1 and If-Range/Range/206: as I understand it, the
preconditions of a conditional request are tested *after* the representation
is selected (e.g. RFC 7232, Section 3.3 "...conditional on the selected
representation's modification date..."), which I assume means A-E/C-E has
already been resolved before looking at the If-Range/Range request headers.
If-Range uses the strong comparison function, which should be enough to
guarantee that the content encoding matches*. It might be nice if there were
some text in RFC 7233 that spelled that out a bit better (or even a more
explicit pointer), but I don't know how to word it.

* RFC 7232, Section 2.1: "For example, if the origin server sends the same
validator for a representation with a gzip content coding applied as it does
for a representation with no content coding, then that validator is weak."
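
For what it's worth, here is how I picture the If-Range check, as a small
Python sketch. It assumes the server gives each content coding its own strong
ETag; the function names and the ETag values are hypothetical, and only the
entity-tag form of If-Range is handled (not the HTTP-date form). The comparison
itself follows the strong comparison rule in RFC 7232, Section 2.3.2.

```python
def is_weak(etag: str) -> bool:
    """Weak entity-tags carry the W/ prefix."""
    return etag.startswith("W/")


def strong_match(a: str, b: str) -> bool:
    """Strong comparison: neither tag is weak and the opaque-tags are identical."""
    return not is_weak(a) and not is_weak(b) and a == b


def status_for_range_request(if_range: str, selected_etag: str) -> int:
    """Return 206 if the range can be honoured, else 200 (send everything).

    selected_etag is the validator of the representation chosen *after*
    content negotiation, i.e. after Accept-Encoding has been resolved.
    """
    return 206 if strong_match(if_range, selected_etag) else 200


# Hypothetical validators for two encodings of the same resource.
GZIP_ETAG = '"v1-gzip"'
IDENTITY_ETAG = '"v1"'

print(status_for_range_request(GZIP_ETAG, GZIP_ETAG))      # 206
print(status_for_range_request(GZIP_ETAG, IDENTITY_ETAG))  # 200
print(status_for_range_request('W/"v1"', 'W/"v1"'))        # 200
```

So a client holding part of the gzip representation only gets a 206 when the
selected representation is that same gzip representation; a server that shares
one validator across codings has to mark it weak, and then If-Range never
matches.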


Cheers
-- 
  Matthew Kerwin
  http://matthew.kerwin.net.au/