Comparing HTTP over QUIC header compression schemes

Alan Frindell <afrind@fb.com> Tue, 06 June 2017 06:48 UTC

From: Alan Frindell <afrind@fb.com>
To: IETF QUIC WG <quic@ietf.org>
Subject: Comparing HTTP over QUIC header compression schemes
Date: Tue, 06 Jun 2017 06:48:24 +0000
Message-ID: <7241DB81-B9AD-44B6-9B03-902A8890F672@fb.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/quic/S5p63SfjiKSaqObwUVO_fRGBOYA>

I built a QCRAM and QPACK implementation and attempted to simulate performance under varying network conditions.  Here’s a summary of the work so far.

-Alan

===

Problem

HTTP/2 uses HPACK for header compression, which relies on in-order processing of header frames. The current QUIC draft retains HPACK and introduces a sequence number for header frames so the HTTP implementation can ensure in-order processing.  
There are two proposed compression schemes for HTTP over QUIC that facilitate out-of-order processing: QCRAM and QPACK.
My goal was to understand the magnitude of the problem caused by HOL blocking on header frames under varying network conditions, and to quantify the degree to which QCRAM and QPACK alleviate it. I was also seeking to understand the implementation complexity of each scheme and to provide feedback to the authors on the specification drafts.

Methodology

I modified proxygen’s HPACK library (https://github.com/facebook/proxygen) to support both QCRAM and QPACK. I then wrote a simulator to test various network conditions. The simulator has two primary knobs:
1. Simulated RTT - the round-trip time between the endpoints. There is a minimum processing delay of RTT / 2 between encoding and decoding a header block.
2. Simulated loss probability - the probability that a given packet is dropped. Drops are simulated by adding 1.1 - 2 RTTs of delay between encoding and decoding a header block.
After decoding a header block, a simulated ack is generated and delivered to the encoder with the same RTT and loss model (i.e., acks can also be lost and retransmitted).
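
For concreteness, here is a minimal sketch of the kind of delay/loss model described above (C++; it assumes a uniform random drop decision and a uniformly distributed retransmission penalty, and the names are illustrative rather than the simulator’s actual code):

    #include <random>

    // Sketch only: simulated one-way delay (in ms) for delivering a header
    // block or an ack, given the two knobs described above.
    double simulatedDelayMs(double rttMs, double lossProbability, std::mt19937& rng) {
      double delayMs = rttMs / 2;  // minimum processing delay of RTT / 2
      std::uniform_real_distribution<double> coin(0.0, 1.0);
      if (coin(rng) < lossProbability) {
        // A drop adds 1.1 - 2 RTTs before the block (or ack) is delivered.
        std::uniform_real_distribution<double> penalty(1.1, 2.0);
        delayMs += penalty(rng) * rttMs;
      }
      return delayMs;
    }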

I know this is a simplistic loss model.  If someone has a different model they prefer and can provide me with C or C++ code, I can re-run the simulations with a different loss engine.  I hope to open source the QPACK/QCRAM implementations and the simulator eventually, but they are not ready at present.

Experiment Setup

I loaded my Facebook feed in Chrome and captured request headers and timing in a HAR file. The simulator reads the HAR file and schedules the encodings based on the request start times. The HAR file contained 227 requests made to 5 origins.  I simulated as if all facebook.com and fbcdn.net origins were coalesced.  For the original test, I throttled the connection to 100ms RTT.
I ran the simulation 5 times for each algorithm, with RTT ranging from 10ms to 500ms and loss percentage from 0 to 10%.
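
For illustration, a minimal sketch of the scheduling step, assuming the HAR entries have already been parsed into simple structs (the HarEntry type and the encodeAt callback are hypothetical, not the simulator’s):

    #include <algorithm>
    #include <string>
    #include <utility>
    #include <vector>

    // Hypothetical parsed HAR entry: request start time (ms, relative to the
    // first request) plus the request's header name/value pairs.
    struct HarEntry {
      double startMs;
      std::vector<std::pair<std::string, std::string>> headers;
    };

    // Hand header blocks to the encoder in request-start order, at the
    // recorded offsets, mirroring the scheduling described above.
    template <class EncodeFn>
    void scheduleEncodings(std::vector<HarEntry> entries, EncodeFn encodeAt) {
      std::sort(entries.begin(), entries.end(),
                [](const HarEntry& a, const HarEntry& b) { return a.startMs < b.startMs; });
      for (const auto& e : entries) {
        encodeAt(e.startMs, e.headers);  // encode this block at its recorded virtual time
      }
    }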

Metrics

Cumulative HOL Delay - This is the sum of the time header blocks spent in a queue waiting for an earlier block to be decoded.  For example, if I send one request that adds indexed entries to the header table followed by two requests that reference those entries, but the first request is delayed 100ms, the cumulative delay would be 200ms (2 requests x 100ms).
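
A hedged sketch of how this metric might be accumulated (the timestamps and names are illustrative, not the simulator’s): for each decoded block, add the time it spent waiting for its dependencies.

    // Sketch: accumulate the time each header block spends queued behind an
    // earlier, not-yet-decoded block.  arrivalMs is when the block's bytes were
    // available to the decoder; decodableMs is when its dependencies had been
    // decoded.
    struct HolDelayTracker {
      double cumulativeMs{0};
      void onBlockDecoded(double arrivalMs, double decodableMs) {
        if (decodableMs > arrivalMs) {
          cumulativeMs += decodableMs - arrivalMs;
        }
      }
    };

In the example above, each of the two blocked requests contributes 100ms, so the tracker ends at 200ms.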

Compression Ratio - Ratio of bytes sent on the wire to the total uncompressed header size

Maximum Buffer Size - If HOL blocking occurs, the decoder must buffer some header data.  This is the maximum size of the buffer in bytes over the lifetime of the session.

Results

Data:

Delay: https://www.dropbox.com/s/qdh310epry8sha1/HOL%20Delay.png?dl=0
Compression Ratio: https://www.dropbox.com/s/qqwztqt8uwbqwjr/Compression%20Ratio.png?dl=0
Buffer Size: https://www.dropbox.com/s/3b0c34upt11wmt7/HOL%20Buffering.png?dl=0


High-Level Observations

1. Head-of-line blocking is a very real problem for serialized HPACK that gets predictably worse as RTT and loss increase. At a fairly modest 100ms RTT + 2% loss, HOL blocking on headers added an average of 40ms/request. At 200ms RTT / 5% loss, it’s almost 100ms/request, and the p100 (max) was 497ms/request.
2. QPACK and QCRAM both drastically reduce HOL blocking. QCRAM’s design favors sacrificing compression ratio to reduce HOL blocking, while QPACK maintains HPACK’s compression ratio at the expense of additional latency. As a result, QCRAM incurred zero head-of-line blocking delay in 100% of my simulations, and QPACK had zero in 80% of test runs. For QPACK, the average delay was 8ms/request, the p95 delay was 63ms/request, and the p100 was 242ms/request.  Mike Bishop suggested some QCRAM techniques could be applied to QPACK to modify the blocking/compression ratio tradeoff.
3. QCRAM sends up to 15% more header bytes on the wire even in moderate networks. Because of the nature of my simulation, the compression ratio data for QCRAM may be pessimistic for RTTs > 200ms, but it’s probably in the 84% ballpark.
4. Because QCRAM only infrequently induces HOL blocking, none of the simulations resulted in the decoder having to buffer frames. QPACK’s buffers were usually manageable (smaller than 1KB), but note that when QPACK encounters HOL blocking, it buffers decompressed data, potentially occupying a lot of memory.  Mike Bishop suggested this could be mitigated by performing the decode in two steps, and buffering the compressed data if decoding would block.

Implementation Considerations

A basic QCRAM implementation is simpler, because it can more directly leverage an existing HPACK implementation. It was also simpler for Facebook because our implementation was already designed to handle asynchrony at the header-decoding layer. A full implementation also requires some interaction with the transport’s acknowledgement handling. Additional complexity comes when trying to use the packet epoch, because it requires the packet scheduling layer of the transport to interact with the header compression algorithm. While I was able to approximate this simply in my simulation, Facebook prefers more separation between the transport and application layers of hq, so implementing this in practice may be more difficult. In my simulation, though, the compression ratio didn’t appear to improve drastically with this feature, and it might hurt performance in some scenarios if re-indexing the same header/value leads to more table evictions.
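
As one plausible shape for that encoder/transport interaction (not the draft’s actual mechanism; the class and names below are hypothetical), the transport could report which dynamic-table insertions have been acknowledged, and the encoder could restrict itself to references the decoder is known to have, falling back to literals otherwise - which is how compression ratio gets traded for reduced blocking:

    #include <cstdint>

    // Hypothetical sketch of ack-aware reference decisions.
    class BlockingAwareEncoder {
     public:
      // Called from the transport's ack-processing path.
      void onInsertionsAcked(uint64_t highestAckedIndex) {
        if (highestAckedIndex > highestAcked_) {
          highestAcked_ = highestAckedIndex;
        }
      }

      // Safe to reference only entries the decoder is known to have received;
      // otherwise the encoder emits a literal instead.
      bool canReference(uint64_t absoluteIndex) const {
        return absoluteIndex <= highestAcked_;
      }

     private:
      uint64_t highestAcked_{0};
    };
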
QPACK’s design is different enough that it can’t use the same underlying structure as HPACK. Furthermore, it’s quite different in where asynchrony is expected: at the level of table lookups rather than block decoding.  Explicit indexing is simpler to understand than HPACK’s relative indexing, and along with reference counts an implementation can be clever about never evicting frequently used headers. QPACK requires more state in the header table than QCRAM does.
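
A hedged sketch of the kind of per-entry state this implies (the names are illustrative, not from the draft): an explicit index assigned at insertion time plus a reference count, so an entry that is still referenced is never evicted.

    #include <cstdint>
    #include <string>

    // Hypothetical dynamic-table entry for an explicitly indexed,
    // reference-counted table.
    struct TableEntry {
      uint64_t absoluteIndex;  // explicit index assigned when the entry is added
      std::string name;
      std::string value;
      uint32_t refCount{0};    // outstanding references from header blocks
    };

    // Never evict an entry while something still references it.
    bool canEvict(const TableEntry& entry) {
      return entry.refCount == 0;
    }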

Caveats

Different workloads may produce quite different results. In loading the Facebook feed, there were no evictions from the header table using QCRAM. This will vary depending on implementation decisions about which header fields to index. Evictions would hurt QCRAM mainly through additional HOL blocking, and QPACK mainly through a worse compression ratio.

Having the CDN coalesced with the dynamic origin also plays a role in the performance of the compression schemes.  Over time I hope to build a diverse collection of input HAR files from different sites and applications which can be run through the simulator to get more complete data.

Thanks to Mike Bishop and Buck Krasic, who provided feedback.