Is a faithful HTTP/2 response scheduler necessarily O(n) in the worst case?

Tom Bergan <tombergan@chromium.org> Tue, 24 January 2017 00:43 UTC

From: Tom Bergan <tombergan@chromium.org>
Date: Mon, 23 Jan 2017 16:39:42 -0800
Message-ID: <CA+3+x5Ft8MfRduWp1RzJ9_qAgQiCujac8f5FLz81Xpyu3cTDxw@mail.gmail.com>
To: HTTP Working Group <ietf-http-wg@w3.org>
Subject: Is a faithful HTTP/2 response scheduler necessarily O(n) in the worst case?
Archived-At: <http://www.w3.org/mid/CA+3+x5Ft8MfRduWp1RzJ9_qAgQiCujac8f5FLz81Xpyu3cTDxw@mail.gmail.com>

I implemented the HTTP/2 response scheduler in Go's HTTP/2 server library.
I'm trying to understand the worst-case behavior of that scheduler. I
believe the worst-case behavior is necessarily O(n) operations per frame
sent on the wire, where n is the number of streams in the dependency tree.
Here is my rationale.

The key operation is finding the highest-priority stream that is ready to
send data.

If we don't care about weights, and we don't care about balancing bandwidth
usage across sibling nodes in a tree, then we can label each node with two
fields: "ready" (true if the stream is ready to send data) and "depth" (the
node's distance from the root of the tree). The scheduler must find a node
with the smallest depth over all nodes with ready = true. It is fairly
trivial to implement this in O(log n).
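
For concreteness, here is a minimal sketch of that weight-free case in Go
(the readyStream/readyHeap names are only for illustration): keep every
ready stream in a min-heap ordered by depth, and the shallowest ready
stream can be popped in O(log n).

  import "container/heap"

  type readyStream struct {
    id    uint32
    depth int // distance from the root of the dependency tree
  }

  // readyHeap is a min-heap of ready streams, ordered by depth.
  type readyHeap []readyStream

  func (h readyHeap) Len() int            { return len(h) }
  func (h readyHeap) Less(i, j int) bool  { return h[i].depth < h[j].depth }
  func (h readyHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
  func (h *readyHeap) Push(x interface{}) { *h = append(*h, x.(readyStream)) }
  func (h *readyHeap) Pop() interface{} {
    old := *h
    x := old[len(old)-1]
    *h = old[:len(old)-1]
    return x
  }

  // nextStream pops the shallowest ready stream in O(log n).
  func nextStream(h *readyHeap) readyStream {
    return heap.Pop(h).(readyStream)
  }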

Now, let's introduce weights. The scheduler must allocate bandwidth to all
ready nodes, which happens recursively as follows:

  // Assuming a minimal node type (fields taken from the pseudocode):
  //   type stream struct {
  //     ready          bool
  //     weight         int
  //     children       []*stream
  //     bandwidthShare float64
  //   }
  func allocateBandwidth(node *stream, bw float64) {
    if node.ready {
      // A ready node consumes the entire share allocated to its subtree;
      // its descendants get nothing while it stays ready.
      node.bandwidthShare = bw
      return
    }
    // Otherwise, split the share among children that are ready or have a
    // ready descendant, proportionally to their weights.
    totalWeight := 0
    for _, n := range node.children {
      if n.ready || descendantIsReady(n) {
        totalWeight += n.weight
      }
    }
    for _, n := range node.children {
      if n.ready || descendantIsReady(n) {
        allocateBandwidth(n, bw*float64(n.weight)/float64(totalWeight))
      }
    }
  }
  // Called as allocateBandwidth(root, 1.0).

I believe the above definition is a direct translation of RFC 7540 Section
5.3.2 (also see this thread
<https://lists.w3.org/Archives/Public/ietf-http-wg/2016OctDec/0601.html>,
which discussed bandwidth allocation). Given a complete bandwidth
allocation, the server can translate each node's bandwidthShare to a number
of tokens that are updated using a token bucket algorithm (or similar). The
scheduler would then pick the node with the most available tokens. The
scheduler looks something like:

  func scheduleNextFrame() *stream {
    if treeChangedSinceLastFrameWritten() {
      allocateBandwidth(root, 1.0)
      // Assign tokens and rebuild a priority queue containing all nodes
      // with allocated bandwidth.
      rebuildPriorityQueue()
    }
    node := priorityQueue.head()
    node.consumeTokens() // consume up to frame size or flow-control limit
    priorityQueue.update(node)
    return node
  }
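
To make the token step concrete, here is one possible refill policy (a
sketch only; the quantum and the tokens field are assumptions of mine, not
something the RFC or any particular implementation prescribes):

  // Hypothetical refill step, assuming the stream type above also carries
  // a "tokens int" field. quantum might be the max frame size, or however
  // many bytes the connection can currently send; node.consumeTokens() in
  // the scheduler above then subtracts the bytes actually written.
  func refillTokens(allocated []*stream, quantum int) {
    for _, n := range allocated {
      n.tokens += int(n.bandwidthShare * float64(quantum))
    }
  }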

There are two steps in scheduleNextFrame. The first step updates the
bandwidth allocation if the tree changed since the last frame was written.
This is the most expensive step. I don't believe it's possible to implement
allocateBandwidth using fewer than O(n) worst-case steps. For example, if
the tree is mostly flat, meaning that most nodes are children of the same
node, then the loop to compute totalWeight is O(n). I believe Firefox
<http://bitsup.blogspot.com/2015/01/http2-dependency-priorities-in-firefox.html>
creates mostly-flat trees. Further, allocateBandwidth makes O(depth)
recursive steps, where "depth" is the maximum depth of the dependency tree.
If the tree is mostly-linear, then O(depth) becomes O(n). Chrome
<https://cs.chromium.org/chromium/src/net/spdy/http2_priority_dependencies.cc>
creates mostly-linear trees.
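
To picture those two shapes, here is a small sketch (newStream is a
hypothetical constructor for the stream type above) of a Firefox-style
flat tree and a Chrome-style linear chain:

  // Flat tree: n streams that are all children of one node, so the
  // totalWeight loop in allocateBandwidth touches all n children.
  func buildFlatTree(n int) *stream {
    root := newStream()
    for i := 0; i < n; i++ {
      root.children = append(root.children, newStream())
    }
    return root
  }

  // Linear chain: each stream depends on the previous one, so the
  // recursion in allocateBandwidth is O(depth) = O(n).
  func buildLinearTree(n int) *stream {
    root := newStream()
    cur := root
    for i := 0; i < n; i++ {
      child := newStream()
      cur.children = []*stream{child}
      cur = child
    }
    return root
  }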

The overall runtime depends on how often the tree changes. If the tree
changes rarely, the scheduler is cheap. If the tree changes frequently, the
scheduler is worst-case O(n). A tree has "changed" if it changed shape
(nodes are added/moved/removed) or if any node's ready state has changed.
Both kinds of changes can happen often in practice, suggesting that the
overall scheduler is worst-case O(n). For example, consider a server that
sends small responses (one or two DATA frames per response) -- each stream
will be closed after one or two DATA frames, so on average, the tree will
change shape every few frames.

Further, it's possible for ready states to
change more frequently than you might think. In Go's HTTP/2 implementation,
a stream does not become ready until the application handler calls Write to
buffer response data. After that buffered data is serialized on the wire,
the stream transitions to "not ready" because the scheduler cannot know
when the next Write call will happen. The stream will transition back to
"ready" during the next Write call. Each Write call typically buffers about
32KB of data. This is a "push" model, where the application handler
"pushes" data into the scheduler. I'm aware of one other HTTP/2 server that
works similarly. I suspect that frequent ready/not-ready transitions are
common to most HTTP/2 servers that use a "push" model. These servers will
be more susceptible to the O(n) worst case.
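
As an illustration of that push model (a sketch only -- these are not the
actual net/http2 code paths or field names):

  // Each handler Write buffers data and marks the stream ready; once the
  // buffered bytes have been written to the wire, the stream drops out of
  // the ready set until the next Write. (Assumes a "buf []byte" field on
  // the stream type above.)
  func (st *stream) handlerWrite(p []byte) {
    st.buf = append(st.buf, p...)
    st.ready = true // the scheduler may now pick this stream
  }

  func (st *stream) frameWritten(n int) {
    st.buf = st.buf[n:]
    if len(st.buf) == 0 {
      st.ready = false // another tree "change" from the scheduler's view
    }
  }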

Questions:

1. Am I missing a clever implementation, or is it true that a faithful
HTTP/2 scheduler necessarily requires O(n) operations per frame sent on the
wire, in the worst case? I could not find much discussion of this question
after a quick search. H2O claims to implement an O(1) scheduler
<https://h2o.examp1e.net/configure/http2_directives.html>, however, the
code seems to be worst-case O(depth) or O(n) -- see here
<https://github.com/h2o/h2o/blob/9b70b61b0705ccce22a188e65363cdb506285ff1/lib/http2/scheduler.c#L326>,
here
<https://github.com/h2o/h2o/blob/9b70b61b0705ccce22a188e65363cdb506285ff1/lib/http2/scheduler.c#L342>,
and here
<https://github.com/h2o/h2o/blob/9b70b61b0705ccce22a188e65363cdb506285ff1/lib/http2/scheduler.c#L214>.

2. If the above is correct, should I be concerned about the O(n) worst
case? I doubt that a typical web browsing session will trigger O(n)
behavior frequently, so I'm less concerned about the average case; I'm more
concerned about pathological cases or possible DoS vectors. Also, think
about cases where the "client" is actually a proxy server, meaning the
HTTP/2 connection may have many more concurrent streams than a typical
browsing session. For comparison, recall HTTP/2's predecessor, SPDY: a SPDY
scheduler could be implemented trivially in O(1), since SPDY used just eight
priority buckets.
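
To make that contrast concrete, a SPDY-style scheduler is roughly this (a
sketch; queue bookkeeping omitted):

  // Eight FIFO buckets, one per SPDY priority level. Picking the next
  // stream scans a constant number of buckets, independent of the number
  // of streams, hence O(1).
  var buckets [8][]*stream

  func nextSPDYStream() *stream {
    for p := 0; p < len(buckets); p++ {
      if len(buckets[p]) > 0 {
        return buckets[p][0]
      }
    }
    return nil // nothing is ready to send
  }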

3. If I should be concerned about an O(n) worst case, are there any
suggested mitigations beyond setting SETTINGS_MAX_CONCURRENT_STREAMS to a
smallish constant?