Re: Comparing HTTP over QUIC header compression schemes

"Charles 'Buck' Krasic" <ckrasic@google.com> Tue, 06 June 2017 09:25 UTC

In-Reply-To: <MWHPR03MB294475C20C1E4937386D171E87CB0@MWHPR03MB2944.namprd03.prod.outlook.com>
References: <7241DB81-B9AD-44B6-9B03-902A8890F672@fb.com> <MWHPR03MB294475C20C1E4937386D171E87CB0@MWHPR03MB2944.namprd03.prod.outlook.com>
From: Charles 'Buck' Krasic <ckrasic@google.com>
Date: Tue, 06 Jun 2017 02:25:28 -0700
Message-ID: <CAD-iZUaG0T_T7Kxa+f3v6QqJzTJSJUwh579k2u0V1LNPrCTpFA@mail.gmail.com>
Subject: Re: Comparing HTTP over QUIC header compression schemes
To: Mike Bishop <Michael.Bishop@microsoft.com>
Cc: Alan Frindell <afrind@fb.com>, IETF QUIC WG <quic@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/quic/clTAw2VXzSwF1OfBmEf9PLWLByw>

On Tue, Jun 6, 2017 at 2:12 AM, Mike Bishop <Michael.Bishop@microsoft.com>
wrote:

> Again, thanks for putting this together, Alan.  My comments in more detail
> were:
>
>    - QCRAM forbids referencing a header until you’re sure it has
>    arrived.
>
Minor nit.  QCRAM allows references within the same packet if the "packet
epoch" is used.
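For illustration, that rule can be modeled roughly like this (a Python sketch
with hypothetical names, not the draft's wire format):

```python
def can_reference(entry_acked: bool, entry_packet: int,
                  current_packet: int) -> bool:
    """Model of QCRAM's reference rule: an encoder may reference a
    dynamic-table entry once it knows the entry arrived (acked), or,
    under the "packet epoch", when the reference travels in the same
    packet that inserted the entry."""
    return entry_acked or entry_packet == current_packet
```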


>
>    - That leads one to expect what you found below:  Never HOL-blocking,
>    more bytes.  QPACK leaves that choice to the encoder, which means you can
>    choose to take the risk and get better byte-efficiency or be more
>    conservative.  It’s just not mandated.
>    - An implementation that wanted to do blocking at the header set level
>    could do that – you’d add an initial pass of the encoded headers to see
>    whether all references can be satisfied.  If not, wait to decompress
>    anything until they can be.  That also saves the decoder buffering
>    uncompressed header data while waiting.
>
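That initial pass could look something like the following (a Python sketch
with hypothetical block structures, not proxygen's actual types):

```python
def references_satisfied(block_refs, inserted_count):
    """First pass over an encoded header block: do all of its
    dynamic-table references point at entries the decoder has
    already inserted?"""
    return all(ref < inserted_count for ref in block_refs)

def on_header_block(block_refs, inserted_count, pending):
    """Decode immediately if possible; otherwise buffer the *compressed*
    block, avoiding the cost of holding decompressed header data."""
    if references_satisfied(block_refs, inserted_count):
        return "decode"
    pending.append(block_refs)   # retry once more entries arrive
    return "blocked"
```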
>
>
> -----Original Message-----
> From: QUIC [mailto:quic-bounces@ietf.org] On Behalf Of Alan Frindell
> Sent: Tuesday, June 6, 2017 8:48 AM
> To: IETF QUIC WG <quic@ietf.org>
> Subject: Comparing HTTP over QUIC header compression schemes
>
>
>
> I built a QCRAM and QPACK implementation and attempted to simulate
> performance under varying network conditions.  Here’s a summary of the work
> so far.
>
>
>
> -Alan
>
>
>
> ===
>
>
>
> Problem
>
>
>
> HTTP/2 uses HPACK for header compression, which relies on in-order
> processing of header frames. The current QUIC draft retains HPACK and
> introduces a sequence number for header frames so the HTTP implementation
> can ensure in-order processing.
>
> There are two proposed compression schemes for HTTP over QUIC that
> facilitate out-of-order processing, QCRAM and QPACK.
>
> My goal was to understand the magnitude of the problem caused by HOL
> blocking on header frames under varying network conditions, and to quantify
> the degree to which QCRAM and QPACK alleviate it. I was also seeking to
> understand the implementation complexity of each scheme and to provide
> feedback to the authors on the specification drafts.
>
>
>
> Methodology
>
>
>
> I modified proxygen’s HPACK library (https://github.com/facebook/proxygen)
> to support both QCRAM and QPACK. I then wrote a simulator to test various
> network conditions. The simulator has two primary knobs:
>
> 1. Simulated RTT - round-trip time between the endpoints. There is a
>    minimum processing delay of rtt / 2 between encoding and decoding a
>    header block.
>
> 2. Simulated loss probability - probability that a given packet is
>    dropped. Drops are simulated by adding 1.1 - 2 RTTs between encoding
>    and decoding a header block.
>
> After decoding a header block, a simulated ack is generated and delivered
> to the encoder with the same RTT and loss model (e.g. acks can also be
> lost and retransmitted).
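A rough Python sketch of that delay model (the rtt/2 floor and the 1.1-2 RTT
drop penalty come from the description above; subjecting each retransmit to
the same loss probability is my assumption):

```python
import random

def one_way_delay(rtt_ms: float, loss_prob: float,
                  rng: random.Random) -> float:
    """Delay between encoding and decoding a header block: a floor of
    rtt/2, plus 1.1-2.0 RTTs for each simulated drop (each retransmit
    is assumed to be subject to the same loss probability)."""
    delay = rtt_ms / 2
    while rng.random() < loss_prob:   # this copy was dropped; retransmit
        delay += rng.uniform(1.1, 2.0) * rtt_ms
    return delay
```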
>
>
>
> I know this is a simplistic loss model.  If someone has a different model
> they prefer and can provide me with C or C++ code, I can re-run the
> simulations with a different loss engine.  I hope to open source the
> QPACK/QCRAM implementations and the simulator eventually, but they are not
> ready at present.
>
>
>
> Experiment Setup
>
>
>
> I loaded my Facebook feed in Chrome and captured request headers and
> timing in a HAR file. The simulator reads the HAR file and schedules the
> encodings based on the request start times. It contained 227 requests made
> to 5 origins.  I simulated as if all facebook.com and fbcdn.net origins
> were coalesced.  For the original test, I throttled the connection to 100ms
> RTT.
>
> I ran the simulation 5 times for each algorithm with RTT ranging from 10ms
> - 500ms and loss percentage from 0 to 10%.
>
>
>
> Metrics
>
>
>
> Cumulative HOL Delay - This is the sum of the time header blocks spent in
> a queue waiting for an earlier block to be decoded.  For example, if I send
> one request that adds indexed entries to the header table followed by two
> requests that reference it, but the first request is delayed 100ms, the
> cumulative delay would be 200ms (2 requests x 100ms).
>
>
>
> Compression Ratio - Ratio of bytes sent on the wire to the total
> uncompressed header size
>
>
>
> Maximum Buffer Size - If HOL blocking occurs, the decoder must buffer some
> header data.  This is the maximum size of the buffer in bytes over the
> lifetime of the session.
>
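The three metrics could be computed per session along these lines (a sketch;
the function and argument names are hypothetical, not the simulator's API):

```python
def session_metrics(hol_wait_ms, wire_bytes, uncompressed_bytes,
                    buffer_sizes):
    """The three metrics above, for one simulated session."""
    return {
        # time header blocks spent queued behind earlier blocks, summed
        "cumulative_hol_delay_ms": sum(hol_wait_ms),
        # bytes sent on the wire / total uncompressed header size
        "compression_ratio": wire_bytes / uncompressed_bytes,
        # peak decoder buffer occupancy over the session, in bytes
        "max_buffer_bytes": max(buffer_sizes, default=0),
    }

# The two-request example above: each blocked 100ms -> 200ms cumulative.
m = session_metrics([100, 100], 840, 1000, [0, 512, 256])
```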
> Results
>
>
>
> Data:
>
>
>
> Delay: https://www.dropbox.com/s/qdh310epry8sha1/HOL%20Delay.png?dl=0
>
> Compression Ratio:
> https://www.dropbox.com/s/qqwztqt8uwbqwjr/Compression%20Ratio.png?dl=0
>
> Buffer Size:
> https://www.dropbox.com/s/3b0c34upt11wmt7/HOL%20Buffering.png?dl=0
>
>
>
>
>
> High level observations
>
>
>
> 1. Head-of-line blocking is a very real problem for serialized HPACK that
> gets predictably worse as RTT and loss increase. At a fairly modest 100ms
> RTT + 2% loss, HOL blocking on headers added an average of 40ms / request.
> At 200ms/5% loss, it’s almost 100ms/request, and the p100 (max) was
> 497ms/request.
>
> 2. QPACK and QCRAM both drastically reduce HOL blocking. QCRAM’s design
> favors sacrificing compression ratio to reduce HOL blocking, while QPACK
> maintains HPACK’s compression ratio at the expense of additional latency.
> As such, QCRAM incurred zero head-of-line blocking delay in 100% of my
> simulations, and QPACK had zero in 80% of test runs. For QPACK, the
> average delay was 8ms/request, p95 was 63ms/request, and p100 was
> 242ms/request.  Mike Bishop suggested some QCRAM techniques could be
> applied to QPACK to modify the blocking/compression ratio tradeoff.
>
> 3. QCRAM sends up to 15% more header bytes on the wire even in moderate
> networks. Because of the nature of my simulation, the compression ratio
> data for QCRAM may be pessimistic for RTTs > 200ms, but it’s probably in
> the 84% ballpark.
>
> 4. Because QCRAM only infrequently induces HOL blocking, none of the
> simulations resulted in the decoder having to buffer frames. QPACK’s
> buffers were usually manageable (smaller than 1 KB), but note that when
> QPACK encounters HOL blocking, it buffers decompressed data, potentially
> occupying a lot of memory.  Mike Bishop suggested this could be mitigated
> by performing the decode in two steps, and buffering the compressed data
> if decoding would block.
>
>
>
> Implementation Considerations
>
>
>
> A basic QCRAM implementation is simpler, because it can more directly
> leverage an existing HPACK implementation. It was simpler still for
> Facebook because our implementation was already designed to handle
> asynchrony at the header-decoding layer. A full implementation also
> requires some interaction with the transport’s acknowledgement handling.
> Additional complexity comes when trying to use the packet epoch, because
> it requires the packet scheduling layer of the transport to interact with
> the header compression algorithm. While I was able to approximate this
> simply in my simulation, Facebook prefers more separation between the
> transport and application layers of hq, so implementing it in practice
> may be more difficult. In my simulation, though, compression ratio didn’t
> appear to improve drastically with this feature, and it might hurt
> performance in some scenarios if re-indexing the same header/value leads
> to more table evictions.
>
> QPACK’s design is different enough that it can’t use the same underlying
> structure as HPACK. Furthermore, it’s quite different in where asynchrony
> is expected - at the level of table lookups rather than block decoding.
> Explicit indexing is simpler to understand than HPACK’s relative indexing,
> and along with reference counts an implementation can be clever about never
> evicting frequently used headers. QPACK requires more state in the header
> table than QCRAM does.
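A sketch of such a reference-counted table (hypothetical structure, not
QPACK's actual wire format or proxygen's implementation):

```python
class RefCountedTable:
    """Dynamic table where an entry referenced by an in-flight header
    block cannot be evicted until all references are released."""
    def __init__(self):
        self._entries = {}   # index -> [header, refcount]
        self._next = 0

    def insert(self, header):
        idx = self._next
        self._entries[idx] = [header, 0]
        self._next += 1
        return idx

    def acquire(self, idx):   # a header block starts referencing idx
        self._entries[idx][1] += 1

    def release(self, idx):   # that block has been decoded
        self._entries[idx][1] -= 1

    def evictable(self, idx):
        return self._entries[idx][1] == 0
```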
>
>
>
> Caveats
>
>
>
> Different workloads may produce quite different results. In loading the
> Facebook feed, there were no evictions from the header table using QCRAM.
> This will vary depending on implementation decisions about which header
> fields to index. Evictions will hurt QCRAM more in terms of HOL blocking,
> and QPACK more in terms of compression ratio.
>
>
>
> Having the CDN coalesced with the dynamic origin also plays a role in the
> performance of the compression schemes.  Over time I hope to build a
> diverse collection of input HAR files from different sites and applications
> which can be run through the simulator to get more complete data.
>
>
>
> Thanks to Mike Bishop and Buck Krasic who provided feedback.
>
>
>



-- 
Charles 'Buck' Krasic | Software Engineer | ckrasic@google.com | +1 (408)
412-1141