Re: Deadlocking in the transport

"Brian Trammell (IETF)" <> Wed, 10 January 2018 07:05 UTC

From: "Brian Trammell (IETF)" <>
Subject: Re: Deadlocking in the transport
Date: Wed, 10 Jan 2018 08:04:52 +0100
Cc: QUIC WG <>
To: Martin Thomson <>
List-Id: Main mailing list of the IETF QUIC working group <>

hi Martin, all,

> On 10 Jan 2018, at 07:17, Martin Thomson <> wrote:
> Building a complex application protocol on top of QUIC continues to
> produce surprises.
> Today in the header compression design team meeting we discussed a
> deadlocking issue that I think warrants sharing with the larger group.
> This has implications for how people build a QUIC transport layer.  It
> might need changes to the API that is exposed by that layer.
> This isn't really that new, but I don't think we've properly addressed
> the problem.

Indeed, we ran into a closely related problem with IPFIX. IPFIX itself had no flow control, so reordering and unreliable transport of dependent information (in this case, templates and the data they describe) was a matter of ambiguous interpretation, not deadlock. But most of the substantive changes between the Proposed Standard (RFC 5101) and Internet Standard (RFC 7011) versions of the protocol had to do with fixing how collectors handled potentially unordered or missing template information (see section 8 of 7011 if you're interested in details).

The IPFIX case is further complicated because it runs over TCP (in order, reliable, single stream, so template ambiguity is not a problem), SCTP (where there can be stream dependencies; 5101 allowed anything on any stream at any time, 6526 suggested a refinement where every template ID got its own stream, thereby ordering template change messages, and 7011 specified generalized ordering), and UDP (where template ID reuse is only permitted after a long delay).

> ## The Basic Problem
> If a protocol creates a dependency between streams, there is a
> potential for flow control to deadlock.
> Say that I send X on stream 3 and Y on stream 7.  Processing Y
> requires that X is processed first.
> X cannot be sent due to flow control but Y is sent.  This is always
> possible even if X is appropriately prioritized.  The receiver then
> leaves Y in its receive buffer until X is received.
> The receiver cannot give flow control credit for consuming Y because
> it can't consume Y until X is sent.  But the sender needs flow control
> credit to send X.  We are deadlocked.
> It doesn't matter whether the stream or connection flow control is
> causing the problem; either produces the same result.
> (To give some background on this, we were considering a preface to
> header blocks that identified the header table state that was
> necessary to process the header block.  This would allow for
> concurrent population of the header table and sending message that
> depended on the header table state that is under construction.  A
> receiver would read the identifier and then leave the remainder of the
> header block in the receive buffer until the header table was ready.)
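To make the failure mode concrete, here's a minimal simulation of the X/Y scenario above (illustrative Python, not any real QUIC API; the class and field names are made up):

```python
# Toy model of the deadlock: the sender has 10 bytes of connection-level
# credit, Y fits but X does not, and the receiver refuses to release
# credit for Y until X has been consumed.

class Sender:
    def __init__(self, credit):
        self.credit = credit
        self.blocked = []

    def send(self, name, size):
        if size > self.credit:
            self.blocked.append(name)   # X waits for credit that never comes
            return False
        self.credit -= size             # Y consumes the remaining credit
        return True

class Receiver:
    def __init__(self):
        self.buffered = {}
        self.consumed = set()

    def receive(self, name, size, depends_on=None):
        if depends_on and depends_on not in self.consumed:
            self.buffered[name] = size  # hold Y; no credit released
            return 0                    # credit returned to the sender
        self.consumed.add(name)
        return size                     # consumed; credit can be released

sender = Sender(credit=10)
receiver = Receiver()

sender.send("X", size=20)                   # blocked: exceeds credit
sent_y = sender.send("Y", size=10)          # fits, consumes all credit
credit_back = receiver.receive("Y", 10, depends_on="X") if sent_y else 0
sender.credit += credit_back

# Deadlock: X is still blocked, the credit is exhausted, and the receiver
# cannot consume Y (and release credit) until X arrives.
assert sender.blocked == ["X"] and sender.credit == 0
```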
> ## Options
> It seems like there are a few decent options for managing this.  These
> are what occurred to me (there are almost certainly more options):
> 1. Don't do that.  We might concede in this case that seeking the
> incremental improvement to compression efficiency isn't worth the
> risk.  That is, we might make a general statement that this sort of
> inter-stream blocking is a bad idea.

As I noted in the chat during the Vancouver interim, both for the specific case of header compression, as well as in general, I am an emphatic and enthusiastic supporter of "don't do that". Introducing stream dependencies takes a trivially implementable dictionary-transfer protocol and turns it into something that is very, very difficult to get right, and "implementability by people outside the QUIC Interop Slack channel" is a primary technical requirement of the protocol spec.

> 2. Force receivers to consume data or reset streams in the case of
> unfulfilled dependencies.  The former seems like it might be too much
> like magical thinking, in the sense that it requires that receivers
> conjure more memory up, but if the receiver were required to read Y
> and release the flow control credit, then all would be fine.  For
> instance, we could require that the receiver reset a stream if it
> couldn't read and handle data.  It seems like a bad arrangement
> though: you either have to allocate more memory than you would like or
> suffer the time and opportunity cost of having to do Y over.

This is the approach we used in IPFIX, with varying success. We have the advantage that every IPFIX message can be ordered by its export timestamp and sequence number, so we basically tell collectors to use a buffer of unspecified size to reorder messages, to consider them in the order they should have been sent in, and to freak out if that still leads to detectable ambiguity.

AFAIK, nobody successfully uses template deletion and ID reuse in practice unless there are relatively long delays between them.
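For what it's worth, option 2's receiver-side "consume or reset" choice might look something like this (all names illustrative, not a real QUIC API):

```python
# Option-2 sketch: when data arrives whose dependency is unmet, the
# receiver must either buffer it (allocating more memory than it would
# like) or reset the stream (paying the cost of doing Y over).

class Stream:
    def __init__(self):
        self.state = "open"
        self.delivered = []
        self.held = []

    def deliver(self, data):
        self.delivered.append(data)   # consumed: credit can be released

    def hold(self, data):
        self.held.append(data)        # buffered: credit stays tied up

    def reset(self, error_code):
        self.state = f"reset({error_code:#x})"

def on_stream_data(stream, data, dependency_met, can_buffer):
    if dependency_met:
        stream.deliver(data)
        return "consumed"
    if can_buffer:
        stream.hold(data)
        return "buffered"
    stream.reset(error_code=0x1)      # give up; sender must redo the stream
    return "reset"

s = Stream()
on_stream_data(s, b"Y", dependency_met=False, can_buffer=False)
```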

> 3. Create an exception for flow control.  This is what Google QUIC
> does for its headers stream.  Roberto observed that we could
> alternatively create a frame type that was excluded from flow control.
> If this were used for data that had dependencies, then it would be
> impossible to deadlock.  It would be similarly difficult to account
> for memory allocation, though if it were possible to process on
> receipt, then this *might* work.  We'd have to do something to address
> out-of-order delivery though.  It's possible that the stream
> abstraction is not appropriate in this case.

This seems to me to be inexcusably ugly.

In any case, this would need to be generalized to an "application-layer control information frame", and it would need big red stickers all over it, because it would be so tempting to abuse, and such abuse would almost always work in test.

> 4. Block the problem at the source.  It was suggested that in cases
> where there is a potential dependency, then it can't be a problem if
> the transport refused to accept data that it didn't have flow control
> credit for.  Writes to the transport would consume flow control credit
> immediately.  That way applications would only be able to write X if
> there was a chance that it would be delivered.  Applications that have
> ordering requirements can ensure that Y is written after X is accepted
> by the transport and thereby avoid the deadlock.  Writes might block
> rather than fail, if the API wasn't into the whole non-blocking I/O
> thing.  The transport might still have to buffer X for other reasons,
> like congestion control, but it can guarantee that flow control isn't
> going to block delivery.

I'm not sure I see how this is fundamentally different from "don't do that" from the transport protocol's point of view.
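That said, to make sure I understand the proposal: under option 4 a write either consumes flow-control credit immediately or is refused, so the application finds out up front that X can't be delivered and can hold Y back. Roughly (hypothetical API, made-up names):

```python
# Option-4 sketch: writes reserve flow-control credit at write time.
# A refused write buffers nothing, so the application can enforce
# "Y only after X was accepted" and the deadlock cannot arise.

class Transport:
    def __init__(self, credit):
        self.credit = credit

    def try_write(self, data):
        if len(data) > self.credit:
            return False          # refused: no credit, nothing buffered
        self.credit -= len(data)  # credit consumed immediately
        return True

t = Transport(credit=8)
x, y = b"X" * 16, b"Y" * 4

if t.try_write(x):    # fails here: X exceeds the available credit
    t.try_write(y)    # so Y is only ever written after X is accepted
else:
    pass              # wait for more credit, then write X, then Y

assert t.credit == 8  # neither X nor Y consumed credit
```

A blocking variant would just park the writer instead of returning False, which is the "writes might block rather than fail" case in the quoted text.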

> ## My Preference
> Right now, I'm inclined toward option 4. Option 1 seems a little too
> much of a constraint.  Protocols create this sort of inter-dependency
> naturally.

Do we have any actual numbers on how much efficiency can be squeezed out of allowing compression dictionaries to create unbounded dependencies between streams on common, non-pathological workloads?



> There's a certain purity in having the flow control exert back
> pressure all the way to the next layer up.  Not being able to build a
> transport with unconstrained writes is potentially creating
> undesirable externalities on transport users.  Now they have to worry
> about flow control as well.  Personally, I'm inclined to say that this
> is something that application protocols and their users should be
> exposed to.  We've seen with the JS streams API that it's valuable to
> have back pressure available at the application layer and also how it
> is possible to do that relatively elegantly.
> I'm almost certain that I haven't thought about all the potential
> alternatives.  I wonder if there isn't some experience with this
> problem in SCTP that might lend some insights.