Re: Comparing HTTP over QUIC header compression schemes

"Charles 'Buck' Krasic" <ckrasic@google.com> Tue, 06 June 2017 09:25 UTC

In-Reply-To: <MWHPR03MB294475C20C1E4937386D171E87CB0@MWHPR03MB2944.namprd03.prod.outlook.com>
References: <7241DB81-B9AD-44B6-9B03-902A8890F672@fb.com> <MWHPR03MB294475C20C1E4937386D171E87CB0@MWHPR03MB2944.namprd03.prod.outlook.com>
From: Charles 'Buck' Krasic <ckrasic@google.com>
Date: Tue, 06 Jun 2017 02:25:28 -0700
Message-ID: <CAD-iZUaG0T_T7Kxa+f3v6QqJzTJSJUwh579k2u0V1LNPrCTpFA@mail.gmail.com>
Subject: Re: Comparing HTTP over QUIC header compression schemes
To: Mike Bishop <Michael.Bishop@microsoft.com>
Cc: Alan Frindell <afrind@fb.com>, IETF QUIC WG <quic@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/quic/clTAw2VXzSwF1OfBmEf9PLWLByw>

On Tue, Jun 6, 2017 at 2:12 AM, Mike Bishop <Michael.Bishop@microsoft.com>
wrote:

> Again, thanks for putting this together, Alan.  My comments in more detail
> were:
>
>    - QCRAM forbids referencing a header until you’re sure it has
>    arrived.
>
Minor nit.  QCRAM allows references within the same packet if the "packet
epoch" is used.
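For illustration, that rule can be modeled roughly like this (a Python sketch
with hypothetical names, not the draft's wire format):

```python
def can_reference(entry_acked: bool, entry_packet: int,
                  current_packet: int) -> bool:
    """Model of QCRAM's reference rule: an encoder may reference a
    dynamic-table entry once it knows the entry arrived (acked), or,
    under the "packet epoch", when the reference travels in the same
    packet that inserted the entry."""
    return entry_acked or entry_packet == current_packet
```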


>
>    - That leads one to expect what you found below:  Never HOL-blocking,
>    more bytes.  QPACK leaves that choice to the encoder, which means you can
>    choose to take the risk and get better byte-efficiency or be more
>    conservative.  It’s just not mandated.
>    - An implementation that wanted to do blocking at the header set level
>    could do that – you’d add an initial pass of the encoded headers to see
>    whether all references can be satisfied.  If not, wait to decompress
>    anything until they can be.  That also saves the decoder buffering
>    uncompressed header data while waiting.
>
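That initial pass could look something like the following (a Python sketch
with hypothetical block structures, not proxygen's actual types):

```python
def references_satisfied(block_refs, inserted_count):
    """First pass over an encoded header block: do all of its
    dynamic-table references point at entries the decoder has
    already inserted?"""
    return all(ref < inserted_count for ref in block_refs)

def on_header_block(block_refs, inserted_count, pending):
    """Decode immediately if possible; otherwise buffer the *compressed*
    block, avoiding the cost of holding decompressed header data."""
    if references_satisfied(block_refs, inserted_count):
        return "decode"
    pending.append(block_refs)   # retry once more entries arrive
    return "blocked"
```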
>
>
> -----Original Message-----
> From: QUIC [mailto:quic-bounces@ietf.org] On Behalf Of Alan Frindell
> Sent: Tuesday, June 6, 2017 8:48 AM
> To: IETF QUIC WG <quic@ietf.org>
> Subject: Comparing HTTP over QUIC header compression schemes
>
>
>
> I built a QCRAM and QPACK implementation and attempted to simulate
> performance under varying network conditions.  Here’s a summary of the work
> so far.
>
>
>
> -Alan
>
>
>
> ===
>
>
>
> Problem
>
>
>
> HTTP/2 uses HPACK for header compression, which relies on in-order
> processing of header frames. The current QUIC draft retains HPACK and
> introduces a sequence number for header frames so the HTTP implementation
> can ensure in-order processing.
>
> There are two proposed compression schemes for HTTP over QUIC that
> facilitate out-of-order processing, QCRAM and QPACK.
>
> My goal was to understand the magnitude of the problem caused by HOL
> blocking on header frames under varying network conditions, and to quantify
> the degree to which QCRAM and QPACK alleviate it. I was also seeking to
> understand the implementation complexity of each scheme and to provide
> feedback to the authors on the specification drafts.
>
>
>
> Methodology
>
>
>
> I modified proxygen’s HPACK library (https://github.com/facebook/proxygen)
> to support both QCRAM and QPACK. I then wrote a simulator to test various
> network conditions. The simulator has two primary knobs:
>
> 1. Simulated RTT - round-trip time between the endpoints. There is a
>    minimum processing delay of rtt / 2 between encoding and decoding a
>    header block.
>
> 2. Simulated loss probability - probability that a given packet is
>    dropped. Drops are simulated by adding 1.1 - 2 RTTs between encoding
>    and decoding a header block.
>
> After decoding a header block, a simulated ack is generated and delivered
> to the encoder with the same RTT and loss model (e.g. acks can also be
> lost and retransmitted).
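A rough Python sketch of that delay model (the rtt/2 floor and the 1.1-2 RTT
drop penalty come from the description above; subjecting each retransmit to
the same loss probability is my assumption):

```python
import random

def one_way_delay(rtt_ms: float, loss_prob: float,
                  rng: random.Random) -> float:
    """Delay between encoding and decoding a header block: a floor of
    rtt/2, plus 1.1-2.0 RTTs for each simulated drop (each retransmit
    is assumed to be subject to the same loss probability)."""
    delay = rtt_ms / 2
    while rng.random() < loss_prob:   # this copy was dropped; retransmit
        delay += rng.uniform(1.1, 2.0) * rtt_ms
    return delay
```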
>
>
>
> I know this is a simplistic loss model.  If someone has a different model
> they prefer and can provide me with C or C++ code, I can re-run the
> simulations with a different loss engine.  I hope to open source the
> QPACK/QCRAM implementations and the simulator eventually, but they are not
> ready at present.
>
>
>
> Experiment Setup
>
>
>
> I loaded my Facebook feed in Chrome and captured request headers and
> timing in a HAR file. The simulator reads the HAR file and schedules the
> encodings based on the request start times. It contained 227 requests made
> to 5 origins.  I simulated as if all facebook.com and fbcdn.net origins
> were coalesced.  For the original test, I throttled the connection to 100ms
> RTT.
>
> I ran the simulation 5 times for each algorithm with RTT ranging from 10ms
> - 500ms and loss percentage from 0 to 10%.
>
>
>
> Metrics
>
>
>
> Cumulative HOL Delay - This is the sum of the time header blocks spent in
> a queue waiting for an earlier block to be decoded.  For example, if I send
> one request that adds indexed entries to the header table followed by two
> requests that reference it, but the first request is delayed 100ms, the
> cumulative delay would be 200ms (2 requests x 100ms).
>
>
>
> Compression Ratio - Ratio of bytes sent on the wire to the total
> uncompressed header size
>
>
>
> Maximum Buffer Size - If HOL blocking occurs, the decoder must buffer some
> header data.  This is the maximum size of the buffer in bytes over the
> lifetime of the session.
>
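The three metrics could be computed per session along these lines (a sketch;
the function and argument names are hypothetical, not the simulator's API):

```python
def session_metrics(hol_wait_ms, wire_bytes, uncompressed_bytes,
                    buffer_sizes):
    """The three metrics above, for one simulated session."""
    return {
        # time header blocks spent queued behind earlier blocks, summed
        "cumulative_hol_delay_ms": sum(hol_wait_ms),
        # bytes sent on the wire / total uncompressed header size
        "compression_ratio": wire_bytes / uncompressed_bytes,
        # peak decoder buffer occupancy over the session, in bytes
        "max_buffer_bytes": max(buffer_sizes, default=0),
    }

# The two-request example above: each blocked 100ms -> 200ms cumulative.
m = session_metrics([100, 100], 840, 1000, [0, 512, 256])
```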
> Results
>
>
>
> Data:
>
>
>
> Delay: https://www.dropbox.com/s/qdh310epry8sha1/HOL%20Delay.png?dl=0
>
> Compression Ratio:
> https://www.dropbox.com/s/qqwztqt8uwbqwjr/Compression%20Ratio.png?dl=0
>
> Buffer Size:
> https://www.dropbox.com/s/3b0c34upt11wmt7/HOL%20Buffering.png?dl=0
>
>
>
>
>
> High level observations
>
>
>
> 1. Head-of-line blocking is a very real problem for serialized HPACK that
> gets predictably worse as RTT and loss increase. At a fairly modest 100ms
> RTT + 2% loss, HOL blocking on headers added an average of 40ms / request.
> At 200ms/5% loss, it’s almost 100ms/request, and the p100 (max) was
> 497ms/request.
>
> 2. QPACK and QCRAM both drastically reduce HOL blocking. QCRAM’s design
> favors sacrificing compression ratio to reduce HOL blocking, while QPACK
> maintains HPACK’s compression ratio at the expense of additional latency.
> As such, QCRAM incurred zero head-of-line blocking delay in 100% of my
> simulations, and QPACK had zero in 80% of test runs. For QPACK, the
> average delay was 8ms/request, p95 was 63ms/request, and p100 was
> 242ms/request.  Mike Bishop suggested some QCRAM techniques could be
> applied to QPACK to modify the blocking/compression ratio tradeoff.
>
> 3. QCRAM sends up to 15% more header bytes on the wire even in moderate
> networks. Because of the nature of my simulation, the compression ratio
> data for QCRAM may be pessimistic for RTTs > 200ms, but it’s probably in
> the 84% ballpark.
>
> 4. Because QCRAM only infrequently induces HOL blocking, none of the
> simulations resulted in the decoder having to buffer frames. QPACK’s
> buffers were usually manageable (smaller than 1 KB), but note that when
> QPACK encounters HOL blocking, it buffers decompressed data, potentially
> occupying a lot of memory.  Mike Bishop suggested this could be mitigated
> by performing the decode in two steps, and buffering the compressed data
> if decoding would block.
>
>
>
> Implementation Considerations
>
>
>
> A basic QCRAM implementation is simpler, because it can more directly
> leverage an existing HPACK implementation. It was simpler still for
> Facebook because our implementation was already designed to handle
> asynchrony at the header-decoding layer. A full implementation also
> requires some interaction with the transport’s acknowledgement handling.
> Additional complexity comes when trying to use the packet epoch, because
> it requires the packet scheduling layer of the transport to interact with
> the header compression algorithm. While I was able to approximate this
> simply in my simulation, Facebook prefers more separation between the
> transport and application layers of hq, so implementing it in practice
> may be more difficult. In my simulation, though, compression ratio didn’t
> appear to improve drastically with this feature, and it might hurt
> performance in some scenarios if re-indexing the same header/value leads
> to more table evictions.
>
> QPACK’s design is different enough that it can’t use the same underlying
> structure as HPACK. Furthermore, it’s quite different in where asynchrony
> is expected - at the level of table lookups rather than block decoding.
> Explicit indexing is simpler to understand than HPACK’s relative indexing,
> and along with reference counts an implementation can be clever about never
> evicting frequently used headers. QPACK requires more state in the header
> table than QCRAM does.
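A sketch of such a reference-counted table (hypothetical structure, not
QPACK's actual wire format or proxygen's implementation):

```python
class RefCountedTable:
    """Dynamic table where an entry referenced by an in-flight header
    block cannot be evicted until all references are released."""
    def __init__(self):
        self._entries = {}   # index -> [header, refcount]
        self._next = 0

    def insert(self, header):
        idx = self._next
        self._entries[idx] = [header, 0]
        self._next += 1
        return idx

    def acquire(self, idx):   # a header block starts referencing idx
        self._entries[idx][1] += 1

    def release(self, idx):   # that block has been decoded
        self._entries[idx][1] -= 1

    def evictable(self, idx):
        return self._entries[idx][1] == 0
```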
>
>
>
> Caveats
>
>
>
> Different workloads may produce quite different results. In loading the
> Facebook feed, there were no evictions from the header table using QCRAM.
> This will vary depending on implementation decisions about which header
> fields to index. Evictions will hurt QCRAM more in terms of HOL blocking,
> and QPACK more in terms of compression ratio.
>
>
>
> Having the CDN coalesced with the dynamic origin also plays a role in the
> performance of the compression schemes.  Over time I hope to build a
> diverse collection of input HAR files from different sites and applications
> which can be run through the simulator to get more complete data.
>
>
>
> Thanks to Mike Bishop and Buck Krasic who provided feedback.
>
>
>



-- 
Charles 'Buck' Krasic | Software Engineer | ckrasic@google.com | +1 (408)
412-1141