Re: Deadlocking in the transport

Subodh Iyengar <subodh@fb.com> Wed, 10 January 2018 06:55 UTC

From: Subodh Iyengar <subodh@fb.com>
To: Jana Iyengar <jri@google.com>, Martin Thomson <martin.thomson@gmail.com>
CC: QUIC WG <quic@ietf.org>
Subject: Re: Deadlocking in the transport
Date: Wed, 10 Jan 2018 06:55:01 +0000
Message-ID: <MWHPR15MB14551E79F98123F466AB2235B6110@MWHPR15MB1455.namprd15.prod.outlook.com>
References: <CABkgnnUSMYRvYNUwzuJk4TQ28qb-sEHmgXhxpjKOBON43_rWCg@mail.gmail.com>, <CAGD1bZYV7iHg_YarUMqUSnpbAB2q8dwEWO=dHE2wbw8Oea_zfA@mail.gmail.com>
In-Reply-To: <CAGD1bZYV7iHg_YarUMqUSnpbAB2q8dwEWO=dHE2wbw8Oea_zfA@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/quic/aRIvLbeGicy47hqG8XwvOqS6gbI>

Nice catch.


Option 4 is possible without protocol support, I believe, and it coincidentally is also the API that mvfst implements: our API counts buffered data towards flow control for API writes as well.
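
To make that concrete, here is a rough sketch of the write-side accounting (illustrative only; the names and structure are invented, not the actual mvfst API):

```cpp
// Illustrative sketch only -- names and structure are invented, not the
// actual mvfst API. The point: bytes the transport has accepted but not
// yet sent still count against the peer's flow control limits, so the
// application sees back pressure instead of unbounded transport buffering.
#include <algorithm>
#include <cstddef>
#include <cstdint>

struct StreamSendState {
  uint64_t acceptedOffset = 0;     // bytes accepted from the app (sent or buffered)
  uint64_t peerMaxStreamData = 0;  // peer's per-stream limit (MAX_STREAM_DATA)
};

struct ConnSendState {
  uint64_t acceptedTotal = 0;      // bytes accepted across all streams
  uint64_t peerMaxData = 0;        // peer's connection-level limit (MAX_DATA)
};

// Returns how many of `len` bytes the transport will take right now; the
// remainder stays with the application until more credit arrives.
size_t acceptForWrite(ConnSendState& conn, StreamSendState& stream, size_t len) {
  uint64_t streamCredit = stream.peerMaxStreamData - stream.acceptedOffset;
  uint64_t connCredit = conn.peerMaxData - conn.acceptedTotal;
  uint64_t credit = std::min(streamCredit, connCredit);
  size_t accepted = static_cast<size_t>(std::min<uint64_t>(len, credit));
  stream.acceptedOffset += accepted;
  conn.acceptedTotal += accepted;
  return accepted;
}
```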


Even with this, though, it seems like gotcha behavior that isn't super obvious and that people will definitely get burned by. I'm in favor of something like option 3, which could potentially allow the receiver to signal the sender to create an exception to connection flow control for some streams when it gets into a deadlocked state. There are some races with this, but it could be designed to work.


Subodh


________________________________
From: QUIC <quic-bounces@ietf.org> on behalf of Jana Iyengar <jri@google.com>
Sent: Tuesday, January 9, 2018 10:49:28 PM
To: Martin Thomson
Cc: QUIC WG
Subject: Re: Deadlocking in the transport

Martin,

You are right that this isn't a new concern, and that this is worth noting somewhere, perhaps in the applicability/API doc.

The crux of this issue is that there's structure in application data that the transport is unaware of. Specifically, there are dependencies among application data units that are opaque to the transport. Using the transport buffers as part of handling these dependencies seems like a bad idea, especially since the transport is likely to make decisions about flow window updates based on the rate at which data is consumed out of the receive buffer(s). GQUIC does this, and so does every respectable TCP receiver implementation.
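
For illustration, a consumption-driven window update might look like this (arbitrary constants and names; not GQUIC's actual algorithm). Note that data parked unread in the buffer never generates new credit:

```cpp
// Minimal sketch of consumption-driven flow control updates (arbitrary
// constants; not GQUIC's actual algorithm). The receiver advertises new
// credit only as the application reads data out of the receive buffer,
// so unread data never produces a window update.
#include <cstdint>
#include <optional>

struct ReceiveWindow {
  uint64_t consumed = 0;              // bytes the application has read so far
  uint64_t advertisedMaxData = 65536; // limit currently advertised to the peer
  uint64_t windowSize = 65536;        // how far ahead of consumption we allow
};

// Call after the application reads `bytesRead` bytes from the buffer.
// Returns the new MAX_DATA value to advertise, if an update is due.
std::optional<uint64_t> onBytesConsumed(ReceiveWindow& w, uint64_t bytesRead) {
  w.consumed += bytesRead;
  uint64_t target = w.consumed + w.windowSize;
  // Batch updates: only re-advertise once half a window has been freed.
  if (target >= w.advertisedMaxData + w.windowSize / 2) {
    w.advertisedMaxData = target;
    return target;
  }
  return std::nullopt;
}
```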

The SCTP API avoids this problem by not allowing the application to read specific stream data out of the socket buffers. The receiving app receives data that could belong to any stream and has to demux after reading out of the socket. (Note that SCTP does not have per-stream flow control, so the receive side here is more like SPDY/TCP, modulo HoL blocking at the transport.)
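
Schematically, that receive pattern is something like this (not the real SCTP sockets API):

```cpp
// Schematic of the SCTP-style receive pattern (not the real sockets API).
// The application reads whatever the transport delivers next, learns which
// stream it belongs to afterwards, and demultiplexes itself; nothing sits
// unread in the transport's buffer waiting on another stream.
#include <cstdint>
#include <deque>
#include <map>
#include <string>
#include <utility>

struct Delivery {
  uint64_t streamId;
  std::string data;
};

struct Socket {
  std::deque<Delivery> pending;  // data in arrival order, all streams mixed
  bool read(Delivery& out) {
    if (pending.empty()) return false;
    out = std::move(pending.front());
    pending.pop_front();
    return true;
  }
};

// Drain the socket and demux per stream at the application layer.
void drain(Socket& sock, std::map<uint64_t, std::string>& perStream) {
  Delivery d;
  while (sock.read(d)) {
    perStream[d.streamId] += d.data;
  }
}
```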

Protocols that create inter-stream dependencies should be able to express them as priorities down to the transport, which I believe is expected to be part of the API. I believe that handles this issue, doesn't it?

- jana

On Tue, Jan 9, 2018 at 10:17 PM, Martin Thomson <martin.thomson@gmail.com> wrote:
Building a complex application protocol on top of QUIC continues to
produce surprises.

Today in the header compression design team meeting we discussed a
deadlocking issue that I think warrants sharing with the larger group.
This has implications for how people build a QUIC transport layer.  It
might need changes to the API that is exposed by that layer.

This isn't really that new, but I don't think we've properly addressed
the problem.


## The Basic Problem

If a protocol creates a dependency between streams, there is a
potential for flow control to deadlock.

Say that I send X on stream 3 and Y on stream 7.  Processing Y
requires that X is processed first.

X cannot be sent due to flow control but Y is sent.  This is always
possible even if X is appropriately prioritized.  The receiver then
leaves Y in its receive buffer until X is received.

The receiver cannot give flow control credit for consuming Y because
it can't consume Y until X is sent.  But the sender needs flow control
credit to send X.  We are deadlocked.

It doesn't matter whether the stream or connection flow control is
causing the problem, either produces the same result.
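
To make it concrete with arbitrary numbers: give the connection 10
bytes of credit, make Y 10 bytes and X 5 bytes, and trace what
happens.

```cpp
// Toy trace of the deadlock (arbitrary numbers, not a real transport).
#include <cassert>
#include <cstdint>

int main() {
  uint64_t senderCredit = 10;          // connection flow control credit left
  const uint64_t sizeX = 5, sizeY = 10;

  // Y is written first (or X simply loses the race) and exhausts the window.
  assert(sizeY <= senderCredit);
  senderCredit -= sizeY;

  // The receiver buffers Y but cannot process it before X, so it never
  // consumes Y and never sends a window update for those bytes.
  const bool receiverConsumedY = false;
  senderCredit += receiverConsumedY ? sizeY : 0;

  // X needs credit the sender will never get: credit comes from consuming Y,
  // and Y can only be consumed after X arrives.  Deadlock.
  assert(sizeX > senderCredit);
  return 0;
}
```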

(To give some background on this, we were considering a preface to
header blocks that identified the header table state that was
necessary to process the header block.  This would allow for
concurrent population of the header table and sending messages that
depended on the header table state that is under construction.  A
receiver would read the identifier and then leave the remainder of the
header block in the receive buffer until the header table was ready.)
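
Roughly, the idea is something like this (field names invented; there
is no concrete design yet):

```cpp
// Rough illustration of the header-block preface idea (invented names; no
// concrete design exists yet). The block declares the table state it needs;
// a receiver whose table has not reached that state leaves the rest of the
// block in its receive buffer -- exactly the buffering that can deadlock.
#include <cstdint>
#include <string>

struct HeaderBlock {
  uint64_t requiredTableInserts;  // table state needed to decode this block
  std::string encodedFields;      // remainder of the encoded header block
};

// Returns true if the block was decodable now; false means "leave it in the
// receive buffer until the header table catches up".
bool tryDecode(const HeaderBlock& block, uint64_t currentTableInserts) {
  if (currentTableInserts < block.requiredTableInserts) {
    return false;
  }
  // ... decode block.encodedFields against the header table ...
  return true;
}
```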


## Options

It seems like there are a few decent options for managing this.  These
are what occurred to me (there are almost certainly more options):

1. Don't do that.  We might concede in this case that seeking the
incremental improvement to compression efficiency isn't worth the
risk.  That is, we might make a general statement that this sort of
inter-stream blocking is a bad idea.

2. Force receivers to consume data or reset streams in the case of
unfulfilled dependencies.  The former seems like it might be too much
like magical thinking, in the sense that it requires that receivers
conjure more memory up, but if the receiver were required to read Y
and release the flow control credit, then all would be fine.  For
instance, we could require that the receiver reset a stream if it
couldn't read and handle data.  It seems like a bad arrangement
though: you either have to allocate more memory than you would like or
suffer the time and opportunity cost of having to do Y over.

3. Create an exception for flow control.  This is what Google QUIC
does for its headers stream.  Roberto observed that we could
alternatively create a frame type that was excluded from flow control.
If this were used for data that had dependencies, then it would be
impossible to deadlock.  It would be similarly difficult to account
for memory allocation, though if it were possible to process on
receipt, then this *might* work.  We'd have to do something to address
out-of-order delivery though.  It's possible that the stream
abstraction is not appropriate in this case.

4. Block the problem at the source.  It was suggested that, where
there is a potential dependency, this can't become a problem if the
transport refuses to accept data that it doesn't have flow control
credit for.  Writes to the transport would consume flow control credit
immediately.  That way applications would only be able to write X if
there was a chance that it would be delivered.  Applications that have
ordering requirements can ensure that Y is written after X is accepted
by the transport and thereby avoid the deadlock.  Writes might block
rather than fail, if the API wasn't into the whole non-blocking I/O
thing.  The transport might still have to buffer X for other reasons,
like congestion control, but it can guarantee that flow control isn't
going to block delivery.
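
For illustration, the application side of that contract might look
like this (hypothetical API; a write returns how many bytes were
accepted, and every accepted byte is already covered by flow control
credit):

```cpp
// Hypothetical application-side use of option 4 (invented API). Because
// write() only accepts bytes it already holds flow control credit for, the
// application can refuse to hand Y to the transport until X has been fully
// accepted, so the deadlock described above cannot arise.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string_view>

struct Transport {
  uint64_t credit = 0;  // flow control credit currently available
  // Accepts at most `credit` bytes and returns how many were taken.
  size_t write(uint64_t /*streamId*/, std::string_view data) {
    size_t n = static_cast<size_t>(std::min<uint64_t>(data.size(), credit));
    credit -= n;
    return n;
  }
};

// Returns true once all of `data` has been accepted.  Returns false when
// credit runs out; the caller retries after the peer extends the window
// (a writable/credit callback in a real API, rather than polling).
bool writeAll(Transport& t, uint64_t streamId, std::string_view& data) {
  while (!data.empty()) {
    size_t n = t.write(streamId, data);
    if (n == 0) return false;
    data.remove_prefix(n);
  }
  return true;
}

void sendXThenY(Transport& t, std::string_view x, std::string_view y) {
  if (!writeAll(t, 3, x)) return;  // X not fully accepted yet; Y must wait
  writeAll(t, 7, y);               // safe: all of X is covered by credit
}
```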


## My Preference

Right now, I'm inclined toward option 4. Option 1 seems a little too
much of a constraint.  Protocols create this sort of inter-dependency
naturally.

There's a certain purity in having the flow control exert back
pressure all the way to the next layer up.  Not being able to build a
transport with unconstrained writes is potentially creating
undesirable externalities on transport users.  Now they have to worry
about flow control as well.  Personally, I'm inclined to say that this
is something that application protocols and their users should be
exposed to.  We've seen with the JS streams API that it's valuable to
have back pressure available at the application layer and also how it
is possible to do that relatively elegantly.

I'm almost certain that I haven't thought about all the potential
alternatives.  I wonder if there isn't some experience with this
problem in SCTP that might lend some insights.