Re: [tcpm] Possible deadlock scenario with retransmission on both sides at the same time

Kobby Carmona <kobby.Carmona@qlogic.com> Thu, 11 August 2016 20:46 UTC

From: Kobby Carmona <kobby.Carmona@qlogic.com>
To: David Borman <dab@weston.borman.com>
Date: Thu, 11 Aug 2016 20:46:20 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/BRtrhC1fAqVFF-ORzcZYYFXfTuk>
Cc: "tcpm@ietf.org" <tcpm@ietf.org>
Subject: Re: [tcpm] Possible deadlock scenario with retransmission on both sides at the same time

Hi David,
This makes a lot of sense. We will fix our code.
Thanks for your help on this,

BTW,
Is this issue mentioned in any RFC? If not, do you see a point in adding an explicit note on the SEQ of a pure ACK in case of retransmission?

	Kobby

-----Original Message-----
From: David Borman [mailto:dab@weston.borman.com] 
Sent: Monday, August 08, 2016 12:35 AM
To: Kobby Carmona <kobby.Carmona@qlogic.com>
Cc: tcpm@ietf.org
Subject: Re: [tcpm] Possible deadlock scenario with retransmission on both sides at the same time

> On Aug 7, 2016, at 3:37 AM, Kobby Carmona <kobby.Carmona@qlogic.com> wrote:
> 
> Hi David,
> The code below applies to fast retransmit.
> But in the case of retransmit timer expiration (TCPT_REXMT), snd_nxt is set to snd_una. In this case CWND is also set to 1 MSS, so in the example below each transmitter can send only a single segment, from sequence 2000/12000.

Your underlying problem is that the ACK-only packets are being sent with the wrong sequence number, and that is what is causing them to be dropped.  One way to fix that is to put SND.NXT back to its previous value after doing the retransmit.  But you are correct, the BSD code only does that in the fast retransmit path; when the timer-based retransmit fires, it just pulls SND.NXT back to SND.UNA.  However, the BSD code that I’m looking at keeps track of the largest sequence number sent in SND.MAX, and in the tcp_output() path there is this bit of code: 

       /*
        * If we are doing retransmissions, then snd_nxt will
        * not reflect the first unsent octet.  For ACK only
        * packets, we do not want the sequence number of the 
        * retransmitted packet, we want the sequence number
        * of the next unsent octet.  So, if there is no data
        * (and no SYN or FIN), use snd_max instead of snd_nxt
        * when filling in th_seq.  But if we are in persist
        * state, snd_max might reflect one byte beyond the
        * right edge of the window, so use snd_nxt in that
        * case, since we know we aren't doing a retransmission.
        * (retransmit and persist are mutually exclusive...)
        */
       if (len || (flags & (TH_SYN|TH_FIN)) || tp->t_timer[TCPT_PERSIST])
               th->th_seq = htonl(tp->snd_nxt);
       else
               th->th_seq = htonl(tp->snd_max);

That is what your implementation appears to be missing, and what is causing your ACK storm.  So yes, some variant of your problem has been seen before, and this is how the BSD code fixed it.
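
As a standalone illustration of that rule, here is a minimal sketch of the same decision outside the BSD sources. The `struct send_state` type, the `persist_active` flag (standing in for `tp->t_timer[TCPT_PERSIST]`), and the function name `seq_for_segment` are hypothetical, chosen only for this example; sequence numbers are kept in host byte order, so the `htonl()` step is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TH_FIN 0x01   /* TCP header flag bits (standard values) */
#define TH_SYN 0x02

/* Hypothetical, simplified send-side state (not the BSD tcpcb). */
struct send_state {
    uint32_t snd_nxt;        /* next sequence number to (re)send */
    uint32_t snd_max;        /* highest sequence number sent so far */
    bool     persist_active; /* assumed flag: persist timer is running */
};

/*
 * Choose the sequence number for an outgoing segment.
 * Segments carrying data, SYN or FIN use snd_nxt, and so does anything
 * sent in persist state.  A pure ACK outside persist state uses snd_max,
 * so it never carries the (older) sequence number of a retransmission.
 */
uint32_t seq_for_segment(const struct send_state *s, uint32_t len, uint8_t flags)
{
    assert(s != NULL);
    if (len != 0 || (flags & (TH_SYN | TH_FIN)) != 0 || s->persist_active)
        return s->snd_nxt;
    return s->snd_max;
}
```

With Side A's state from the original report after the timeout (SND.NXT=2000, SND.MAX=3000), a 1000-byte data segment would go out with seq 2000, while a pure ACK would carry seq 3000.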

			-David Borman

> 
> 	Kobby
> 
> -----Original Message-----
> From: David Borman [mailto:dab@weston.borman.com]
> Sent: Thursday, August 04, 2016 10:37 PM
> To: Kobby Carmona <kobby.Carmona@qlogic.com>
> Cc: tcpm@ietf.org
> Subject: Re: [tcpm] Possible deadlock scenario with retransmission on both sides at the same time
> 
> After you pull back SND.NXT and do the retransmission, you should then restore SND.NXT to where it was rather than leave it at the backed-off value; then the ACKs wouldn’t be dropped, since they wouldn’t carry old seq values.  For example, the 4.4BSD fast retransmit code had:
> 
> 			tcp_seq onxt = tp->snd_nxt;
> 			...
> 			tp->snd_nxt = th->th_ack;
> 			...
> 			(void) tcp_output(tp);
> 			...
> 			if (SEQ_GT(onxt, tp->snd_nxt))  
> 				tp->snd_nxt = onxt;
> 
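For readers who want to compile the idea, here is a rough self-contained sketch of the save/retransmit/restore pattern. It is not the 4.4BSD code: `struct rexmit_state`, `fake_output()` (a stand-in for `tcp_output()` that just advances `snd_nxt` by one MSS), and `retransmit_and_restore()` are hypothetical names, and `SEQ_GT` is redefined locally as a wraparound-safe comparison in the same spirit as the BSD macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Wraparound-safe "a is after b" for 32-bit sequence numbers. */
#define SEQ_GT(a, b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) > 0)

/* Hypothetical, simplified sender state. */
struct rexmit_state {
    uint32_t snd_nxt;  /* next sequence number to send */
    uint32_t mss;      /* segment size used by the stand-in output routine */
};

/* Stand-in for tcp_output(): "sends" one segment starting at snd_nxt. */
static uint32_t fake_output(struct rexmit_state *s)
{
    uint32_t seq = s->snd_nxt;
    s->snd_nxt += s->mss;
    return seq;
}

/*
 * Retransmit one segment starting at the acknowledged sequence number,
 * then put snd_nxt back where it was if the retransmission did not
 * already advance past it.  Returns the sequence number that was sent.
 */
uint32_t retransmit_and_restore(struct rexmit_state *s, uint32_t ack)
{
    assert(s != NULL);
    uint32_t onxt = s->snd_nxt;   /* remember the pre-retransmit value */
    s->snd_nxt = ack;             /* pull back to the missing segment */
    uint32_t sent = fake_output(s);
    if (SEQ_GT(onxt, s->snd_nxt)) /* restore, never move backwards */
        s->snd_nxt = onxt;
    return sent;
}
```

Starting from SND.NXT=3000 with a 1000-byte MSS, retransmitting from ack 1000 sends seq 1000 and leaves SND.NXT at 3000, so a later pure ACK would not carry a stale sequence number.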
> 
> 			-David Borman
> 
>> On Aug 4, 2016, at 4:08 AM, Kobby Carmona <kobby.Carmona@qlogic.com> wrote:
>> 
>> Hi all,
>> While running a bidirectional scenario with random drops in a network simulator of our (QLogic's NIC) TCP stack, we found a case where there seems to be a deadlock in the TCP protocol (the connection keeps sending pure ACKs from both sides until the RTO expires multiple times and a RST is sent to close the connection).
>> The scenario is as follows (each stage has an example with numbers, assuming the MSS and every packet are 1000 B):
>> 1. Both sides are transmitting data; a single packet is dropped on each side and the next two packets are received properly
>> 	Side A - SND.MAX=3000, SND.NXT=3000, SND.UNA=1000, RCV.NXT=11000, out-of-order block 12000-13000
>> 	Side B - SND.MAX=13000, SND.NXT=13000, SND.UNA=11000, RCV.NXT=1000, out-of-order block 2000-3000
>> 2. RTO timer expires on both sides
>> 	Side A - SND.MAX=3000, SND.NXT=1000, SND.UNA=1000, RCV.NXT=11000, out-of-order block 12000-13000
>> 	Side B - SND.MAX=13000, SND.NXT=11000, SND.UNA=11000, RCV.NXT=1000, out-of-order block 2000-3000
>> 3. Both sides transmit a single packet to the peer:
>> 	A->B - pkt.seq=1000, pkt.ack=11000, len=1000
>> 	B->A - pkt.seq=11000, pkt.ack=1000, len=1000
>> 4. Both sides receive the packets and update the receive context:
>> 	Side A - SND.MAX=3000, SND.NXT=2000, SND.UNA=1000, RCV.NXT=13000
>> 	Side B - SND.MAX=13000, SND.NXT=12000, SND.UNA=11000, RCV.NXT=3000
>> 5. Both sides send another segment:
>> 	A->B - pkt.seq=2000, pkt.ack=13000, len=1000
>> 	B->A - pkt.seq=12000, pkt.ack=3000, len=1000
>> 6. Neither side accepts the packet (or updates SND.UNA), since the sequence number on the packet is less than RCV.NXT (sequence number check on page 69 of RFC793), and each sends a pure ACK instead
>> 	A->B - pkt.seq=2000, pkt.ack=13000, len=0 (pure ACK)
>> 	B->A - pkt.seq=12000, pkt.ack=3000, len=0 (pure ACK)
>> 7. This continues forever (until the connection is terminated by a RST), since every packet that ends before RCV.NXT (even a retransmit from SND.UNA) is dropped.
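
The check referred to in the steps above is the segment acceptability test of RFC 793 (page 69): a segment that fails it is answered with an ACK and dropped, so its ACK field is never processed. A minimal sketch of that test follows; the function names `seq_in_window` and `segment_acceptable` are hypothetical, and the receive window is a parameter because the thread does not state one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if rcv_nxt <= seq < rcv_nxt + rcv_wnd, using wraparound-safe math. */
static bool seq_in_window(uint32_t seq, uint32_t rcv_nxt, uint32_t rcv_wnd)
{
    assert(rcv_wnd > 0);  /* callers handle the zero-window cases */
    return (uint32_t)(seq - rcv_nxt) < rcv_wnd;
}

/*
 * RFC 793 segment acceptability test (four cases by segment length
 * and receive window):
 *   len == 0, wnd == 0 : SEG.SEQ == RCV.NXT
 *   len == 0, wnd  > 0 : SEG.SEQ falls inside the window
 *   len  > 0, wnd == 0 : never acceptable
 *   len  > 0, wnd  > 0 : first or last octet falls inside the window
 */
bool segment_acceptable(uint32_t seg_seq, uint32_t seg_len,
                        uint32_t rcv_nxt, uint32_t rcv_wnd)
{
    if (seg_len == 0) {
        if (rcv_wnd == 0)
            return seg_seq == rcv_nxt;
        return seq_in_window(seg_seq, rcv_nxt, rcv_wnd);
    }
    if (rcv_wnd == 0)
        return false;
    return seq_in_window(seg_seq, rcv_nxt, rcv_wnd) ||
           seq_in_window(seg_seq + seg_len - 1, rcv_nxt, rcv_wnd);
}
```

Assuming a 65535-byte window, Side B with RCV.NXT=3000 would reject both the 1000-byte segment at seq 2000 and the pure ACK at seq 2000, but would accept a pure ACK at seq 3000.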
>> 
>> Has anyone encountered this issue before? Is there anything we missed in this sequence?
>> If this is indeed a real deadlock, there might be several solutions, each of which would require modifying the receive processing of RFC793. But I would like to know whether you think this is a real issue before dealing with solutions.
>> Thanks,
>> 
>> 	Kobby
>> 
>> 
>> _______________________________________________
>> tcpm mailing list
>> tcpm@ietf.org
>> https://www.ietf.org/mailman/listinfo/tcpm
> 