How to handle content-encoding

Daurnimator <quae@daurnimator.com> Tue, 31 May 2016 02:52 UTC

Date: Tue, 31 May 2016 12:47:38 +1000
Message-ID: <CAEnbY+fW_n4sFrFQSVcMWBoqxEWw3yoKnhCu1seRXj4GBr6wfA@mail.gmail.com>
From: Daurnimator <quae@daurnimator.com>
To: HTTP Working Group <ietf-http-wg@w3.org>
Subject: How to handle content-encoding
Archived-At: <http://www.w3.org/mid/CAEnbY+fW_n4sFrFQSVcMWBoqxEWw3yoKnhCu1seRXj4GBr6wfA@mail.gmail.com>

I'm thinking through how to add support for Content-Encoding to lua-http
https://github.com/daurnimator/lua-http/issues/22

A brief digression into lua-http's structure (library terminology is borrowed
from HTTP2):
  - a 'connection' encapsulates a socket, a connection has many streams
  - a 'stream' is a request/response pair (a request can have multiple
header blocks, and many data chunks)
      - The same stream structure is used for both client and server
      - You can implement an HTTP proxy by forwarding items from one stream
to another
  - a 'request' is a pre-prepared object consisting of a request header
block, a function to obtain body chunks, and a destination.
      - `request:go()` returns the 'main' response header block and a
stream (from which you can read the body one chunk at a time)

There is a desire to compress content to save bandwidth. HTTP has had two
main ways to do this: Transfer-Encoding and Content-Encoding.

For me it was simple to add support for Transfer-Encoding, without any
ambiguities or issues. For HTTP1, in the stream logic:
  - (if zlib is installed) we automatically add `TE: gzip, deflate`.
  - On reply, if Transfer-Encoding contains gzip or deflate, we decode it
before passing it on to the caller.
This is permitted as TE and Transfer-Encoding are hop-by-hop headers.
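
To make the two steps above concrete, here is a rough sketch in Python
(not lua-http's actual code). The function names and the plain-dict header
representation are made up for illustration. It assumes the whole body is
already in memory and that chunked framing has been removed by the HTTP1
layer.

```python
import zlib

# Codings this hop is able to undo (assumes zlib is available).
SUPPORTED = ("gzip", "deflate")

def add_te_header(headers):
    """Return a copy of the request headers that advertises SUPPORTED.

    TE is hop-by-hop, so it is also listed in Connection.
    """
    out = dict(headers)
    out["te"] = ", ".join(SUPPORTED)
    existing = out.get("connection")
    out["connection"] = existing + ", TE" if existing else "TE"
    return out

def decode_transfer_codings(headers, body):
    """Undo gzip/deflate transfer codings, last-applied coding first."""
    value = headers.get("transfer-encoding", "")
    codings = [c.strip().lower() for c in value.split(",") if c.strip()]
    for coding in reversed(codings):
        if coding == "chunked":
            continue  # assumption: framing already removed by the caller
        if coding == "gzip":
            body = zlib.decompress(body, 16 + zlib.MAX_WBITS)  # gzip wrapper
        elif coding == "deflate":
            body = zlib.decompress(body, zlib.MAX_WBITS)  # zlib wrapper
        else:
            raise ValueError("unsupported transfer coding: " + coding)
    return body
```

Because both functions live entirely inside one hop, the caller never sees
the TE or Transfer-Encoding headers or the encoded bytes.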

However, HTTP2 does not support Transfer-Encoding.
Furthermore, certain servers **stares at twitter.com** send
`Content-Encoding: gzip` even if you *don't* send `Accept-Encoding: gzip`.
This seems to demand that I support Content-Encoding.

As far as the specifications go, Content-Encoding is *meant* to be used
for end-to-end encoding that intermediate hops do not touch.
  - Intermediaries should cache Content-Encoded bodies in their encoded form
  - ETag is dependent on Content-Encoding

This makes it hard to find a place for it in lua-http's structure.
If I add it transparently in the stream (as done for Transfer-Encoding)
then it will be hop-by-hop (not end-to-end).
This seems to demand (at least for client requests) that it is switched
on/off at the request layer.
From there though, it seems it would need to add some sort of stream body
filter?
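
To illustrate what I mean by a stream body filter, here is a rough sketch
in Python (not lua-http code). The name `decoded_chunks` is made up, and it
only handles a single coding in Content-Encoding. It wraps whatever yields
body chunks and decodes incrementally, so the request layer could decide
whether to apply it while the stream underneath stays untouched.

```python
import zlib

def decoded_chunks(chunks, content_encoding):
    """Yield decoded body chunks from an iterable of encoded chunks.

    Bodies with no coding (or "identity") pass through unchanged.
    """
    coding = (content_encoding or "identity").strip().lower()
    if coding == "identity":
        yield from chunks
        return
    if coding in ("gzip", "x-gzip"):
        decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)  # gzip wrapper
    elif coding == "deflate":
        decoder = zlib.decompressobj(zlib.MAX_WBITS)  # zlib wrapper
    else:
        raise ValueError("unsupported content coding: " + coding)
    for chunk in chunks:
        out = decoder.decompress(chunk)
        if out:  # a chunk may not complete any output yet
            yield out
    tail = decoder.flush()
    if tail:
        yield tail
```

A proxy would simply not apply the filter and forward the encoded chunks
(plus Content-Encoding and ETag) as-is, which keeps the encoding end-to-end.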

How should I be adding this? What have other implementations done? (and
what do they wish they'd done differently?)
The current state of implementations seems to be *against* the spec: should
the spec be changed, or should implementations be updated?
HTTP2 has no transfer-encoding equivalent... why not?

Regards,
Daurn.


Links:
  - https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.11
Original content-encoding spec
  - https://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.5.1
Hop-by-hop headers
  - https://tools.ietf.org/html/rfc7231#section-3.1.2.1 Current spec
  - https://bugzilla.mozilla.org/show_bug.cgi?id=68517 Mozilla disregards
Content-Encoding spec
  - https://stackoverflow.com/questions/11641923/transfer-encoding-gzip-vs-content-encoding-gzip
  - https://daurnimator.github.io/lua-http/ lua-http documentation