Re: http/2 prioritization/fairness bug with proxies

Nico Williams <> Wed, 13 February 2013 21:50 UTC

Date: Wed, 13 Feb 2013 15:48:28 -0600
From: Nico Williams <>
To: Roberto Peon <>
Cc: Yoav Nir <>, HTTP Working Group <>

On Wed, Feb 13, 2013 at 3:23 PM, Roberto Peon <> wrote:
> On Wed, Feb 13, 2013 at 12:57 PM, Nico Williams <>
> wrote:
> Even if the network does the right thing and the bytes have arrived, TCP's
> API still only lets you access the packets in-order.

Are we brave enough to try SCTP instead of TCP for HTTP/2.0?

I didn't think so.

:) (that should be a sad smiley, actually)

>> There are two ways to address these issues: either don't do it (it ==
>> multiplex diff QoS traffic over the same TCP conn.) or try hard never
>> to write more than one BDP's worth of bulk without considering higher
>> priority traffic.
> QoS for packets on multiple connections also doesn't work- each entity
> owning a connection sends at what it believes is its max rate, induces
> packet loss, gets throttled appropriately, and then takes too many RTTs to
> recover. You end up not fully utilizing the channel(s).

No, no, all bulk traffic should be sent over one connection at max
rate.  Multiple bulk flows can be multiplexed safely over one TCP
connection, therefore they should be.

High priority traffic _generally_ means "non-bulk", so "max rate" for
non-bulk is generally much, much less than for bulk; therefore non-bulk
traffic can also be multiplexed safely over a single TCP connection,
taking care to move a flow to the bulk connection when its nature
changes.

The sender will know whether a message is a bulk message or not.
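
To make that concrete, here's a toy sketch (all names made up, not from
any draft) of a sender classifying each message and picking the matching
connection, with a hook for moving a flow that changes nature:

```python
# Toy sketch of the two-connection scheme: one TCP connection per
# traffic class, with the sender doing the classification.  The queues
# stand in for the actual sockets; purely illustrative.

BULK, INTERACTIVE = "bulk", "interactive"

class ConnectionPair:
    """One connection per traffic class, as argued above."""
    def __init__(self):
        self.queues = {BULK: [], INTERACTIVE: []}

    def send_message(self, payload, klass):
        # The sender knows whether a message is bulk or not, so
        # classification is a simple declaration, not a heuristic.
        self.queues[klass].append(payload)

    def reclassify(self, payload):
        # A flow that "changes nature" (e.g. an interactive channel
        # that starts pushing a huge file) moves to the bulk connection.
        if payload in self.queues[INTERACTIVE]:
            self.queues[INTERACTIVE].remove(payload)
            self.queues[BULK].append(payload)

pair = ConnectionPair()
pair.send_message(b"GET /index.html", INTERACTIVE)
pair.send_message(b"PUT /backup.tar", BULK)
```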

One complication here is that many requests will be non-bulk but their
responses will be.  I.e., you might want to write the responses to
requests on a different connection from the request!  And now you need
an XID or some such, but you probably want one anyways so that
responses can be interleaved.
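
The XID bookkeeping is simple enough to sketch (hypothetical names, not
from any spec): tag each request with an ID, and a response arriving on
a *different* connection, in any order, can still be matched up:

```python
# Illustrative transaction-ID (XID) table: requests get a fresh ID,
# responses carry it back, so responses can be interleaved and can even
# arrive on a different connection than the request went out on.
import itertools

class XidTable:
    def __init__(self):
        self._next = itertools.count(1)
        self._pending = {}              # xid -> request metadata

    def register(self, request):
        xid = next(self._next)
        self._pending[xid] = request
        return xid                      # sent on the wire with the request

    def match(self, xid):
        # Responses may arrive out of order and on any connection;
        # the XID is what ties a response back to its request.
        return self._pending.pop(xid)

tbl = XidTable()
a = tbl.register("GET /small")          # request on non-bulk connection
b = tbl.register("GET /huge.iso")       # response comes back on bulk conn
```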

(For example, by analogy: if we were talking about doing this as an
SSHv2 extension, we might migrate a pty stdout channel to a bulk
connection when the user does a cat(1) of a huge file.  This is much
harder in SSHv2 because there we have logical octet streams for
interactive, high-priority data, but we don't have such a thing in
HTTP, so this is not a concern at all.  This is just an analogy to
illustrate the idea.)

> The hard part is "considering higher priority traffic" when that traffic is
> being sent from a different machine, as would occur in the multiple
> connection case.

Are you talking about proxies aggregating traffic from multiple
clients into one [set of] TCP connection[s] to a given server?  Sure,
but all the proxy needs is to know whether a given request (or
response) is bulk or not.

> With a single connection, this is easy to coordinate. Agreed that estimating
> BDP isn't trivial (however it is something that TCP effectively has to do).

A single connection is a bad idea.  We already use multiple
connections today in _browsers_.  Of course, for non-browser apps
multiple connections may be quite a change, but that should be a)
optional, b) acceptable anyways.

>> > which we'd otherwise be able to do without. Unfortunately, per priority
>> > TCP
>> > connections don't work well for large loadbalancers where each of these
>> > connections will likely be terminating at a different place. This would
>> > create a difficult synchronization problem server side, full of races
>> > and
>> > complexity, and likely quite a bit worse in complexity than getting flow
>> > control working well.
>> I think you're saying that because of proxies it's difficult to ensure
>> per-priority TCP connections, but this is HTTP/2.0 we're talking
>> about.  We have the power to dictate that HTTP/2.0 proxies replicate
>> the client's per-priority TCP connection scheme.
> No, I'm saying that it is somewhere between difficult and impossible to
> ensure that separate connections from a client end up on one machine in the
> modern loadbalancer world.

I don't think it should be difficult, much less impossible, for
HTTP/_2.0_.  What you need for this is to identify flows so their
requests/responses can be grouped.  The main thing that comes to mind
is that the load balancer needs to understand Chunked PUTs/POSTs and
get them to go to the same end server -- surely this is handled
already in HTTP/1.1 load balancers.
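
One way a load balancer could keep a client's several per-priority
connections on one backend is to hash a stable flow identifier rather
than the TCP 5-tuple.  A rough sketch (the flow-ID names and backend
list are made up for illustration):

```python
# Affinity by flow ID: all connections carrying the same flow_id map
# to the same backend, regardless of source port or priority class.
# Purely illustrative; real balancers also handle backend churn.
import hashlib

BACKENDS = ["srv-a", "srv-b", "srv-c"]

def pick_backend(flow_id: str) -> str:
    # Stable hash of the client-supplied flow identifier; unlike
    # hashing the 5-tuple, this groups a client's separate
    # per-priority connections onto one machine.
    digest = hashlib.sha256(flow_id.encode()).digest()
    return BACKENDS[int.from_bytes(digest[:8], "big") % len(BACKENDS)]
```

Two connections from the same client, different ports and priorities,
both call pick_backend("client-42") and land on the same server.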

> From a latency perspective, opening up the multiple connections can be a
> loss as well-- it increases server load for both CPU and memory and vastly
> increases the chance that you'll get a lost-packet on the SYN which takes
> far longer to recover from as it requires an RTO before RTT has likely been
> computed.

Well, sure, but the sender could share one connection for multiple QoS
traffic types while the additional connections come up, and hope for
the best -- mostly it should work out.

>> > Note that the recommendation will be that flow control be effectively
>> > disabled unless you know what you're doing, and have a good reason
>> > (memory
>> > pressure) to use it.
>> Huh?  Are you saying "we need and will specify flow control.  It won't
>> work.  Therefore we'll have it off by default."  How can that help?!
>> I don't see how it can.
> Everyone will be required to implement the flow control mechanism as a
> sender.
> Only those people who have effective memory limitations will require its use
> when receiving (since the receiver dictates policy for flow control).

So this is a source-quench type of flow control?  (As opposed to
window-size type, as in SSHv2.)  But note that the issue isn't the
need to quench fast sources from slow sinks.  The issue is that by the
time you notice a source/sink bandwidth mismatch, it's too late and
TCP flow control has already kicked in.  Of course, the receiver can
recover by quenching the sender and then reading and buffering
whatever's left on the wire, thus freeing up bandwidth on the wire for
other sources, but the cost is lots of buffer space on the receiver --
unless you can tell the sender to resend later, in which case you can
throw the data away instead of buffering it.
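
For contrast, the SSHv2-style alternative is a credit window: the
receiver grants bytes up front, so a fast source can never overrun a
slow sink in the first place.  A minimal sketch (names invented here,
not from SSHv2 or any HTTP/2.0 draft):

```python
# Credit-window flow control, SSHv2 style: the sender may never have
# more unacknowledged bytes outstanding than the receiver has granted,
# so there is nothing to quench after the fact.  Illustrative only.
class CreditWindow:
    """Receiver-advertised window; sender must not exceed its credit."""
    def __init__(self, initial):
        self.credit = initial

    def can_send(self, n):
        return n <= self.credit

    def consume(self, n):
        # Sender-side accounting: spend credit as bytes go out.
        assert self.can_send(n)
        self.credit -= n

    def grant(self, n):
        # Receiver returns credit as it drains its buffers,
        # like SSHv2's CHANNEL_WINDOW_ADJUST.
        self.credit += n

w = CreditWindow(initial=4096)
w.consume(4096)
# Sender must now stall -- no late quench needed -- until the
# receiver grants more credit.
w.grant(1024)
```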