Re: [tcpm] Possible deadlock scenario with retransmission on both sides at the same time
Yoshifumi Nishida <nishida@sfc.wide.ad.jp> Mon, 15 August 2016 22:39 UTC
In-Reply-To: <CY4PR11MB1878E7E911194A1032E32428841E0@CY4PR11MB1878.namprd11.prod.outlook.com>
References: <MWHPR11MB1374A50BC599B093EA09668984070@MWHPR11MB1374.namprd11.prod.outlook.com> <7070553C-65D1-46EE-95F4-DAE82E1F5A5E@weston.borman.com> <CY4PR11MB187848FCCEF4DB140F85913E841A0@CY4PR11MB1878.namprd11.prod.outlook.com> <2D524A8D-A5CA-45A6-B94D-FA1DA0CEE609@weston.borman.com> <CY4PR11MB1878E7E911194A1032E32428841E0@CY4PR11MB1878.namprd11.prod.outlook.com>
From: Yoshifumi Nishida <nishida@sfc.wide.ad.jp>
Date: Mon, 15 Aug 2016 15:39:15 -0700
Message-ID: <CAO249yeTNRFuFcWn5ga854g24GM_7DAjeKOSw3d7h=HQQYKX1Q@mail.gmail.com>
To: Kobby Carmona <kobby.Carmona@qlogic.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/pq_3io49ahipw9iRZXoRWgqyVIU>
Cc: David Borman <dab@weston.borman.com>, "tcpm@ietf.org" <tcpm@ietf.org>
Subject: Re: [tcpm] Possible deadlock scenario with retransmission on both sides at the same time
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>
Hello,

I personally think this is an interesting corner case for discussion. It looks like a minor one, but I'm not very sure we can leave it to each implementation. I also guess a question would be whether BSD's fix is the best way to address the issue.

Thanks,
--
Yoshi

On Thu, Aug 11, 2016 at 1:46 PM, Kobby Carmona <kobby.Carmona@qlogic.com> wrote:

> Hi David,
> This makes a lot of sense. We will fix our code.
> Thanks for your help on this.
>
> BTW,
> Is this issue mentioned in any RFC? If not, do you see a point in adding an
> explicit note on the SEQ of a pure ACK in case of retransmission?
>
> Kobby
>
> -----Original Message-----
> From: David Borman [mailto:dab@weston.borman.com]
> Sent: Monday, August 08, 2016 12:35 AM
> To: Kobby Carmona <kobby.Carmona@qlogic.com>
> Cc: tcpm@ietf.org
> Subject: Re: [tcpm] Possible deadlock scenario with retransmission on both
> sides at the same time
>
> > On Aug 7, 2016, at 3:37 AM, Kobby Carmona <kobby.Carmona@qlogic.com> wrote:
> >
> > Hi David,
> > The code below is true for fast retransmit.
> > But in case of retransmit timer expiration (TCPT_REXMT), snd_nxt is set
> > to snd_una. In this case CWND is set to 1 MSS, so in the example below
> > the transmitters can send only a single segment from sequence
> > 2000/12000.
>
> Your underlying problem is that the ACK-only packets are being sent with
> the wrong sequence number, and that is what is causing them to be dropped.
> One way to fix that is to put SND.NXT back to its previous value after doing
> the retransmit. But you are correct: the BSD code only does that in the
> fast retransmit code; when the timer-based retransmit code fires, it just
> pulls SND.NXT back to SND.UNA. However, the BSD code that I'm looking at
> keeps track of the largest sequence number sent in SND.MAX, and in the
> tcp_output() path there is this bit of code:
>
>     /*
>      * If we are doing retransmissions, then snd_nxt will
>      * not reflect the first unsent octet.  For ACK only
>      * packets, we do not want the sequence number of the
>      * retransmitted packet, we want the sequence number
>      * of the next unsent octet.  So, if there is no data
>      * (and no SYN or FIN), use snd_max instead of snd_nxt
>      * when filling in th_seq.  But if we are in persist
>      * state, snd_max might reflect one byte beyond the
>      * right edge of the window, so use snd_nxt in that
>      * case, since we know we aren't doing a retransmission.
>      * (retransmit and persist are mutually exclusive...)
>      */
>     if (len || (flags & (TH_SYN|TH_FIN)) || tp->t_timer[TCPT_PERSIST])
>         th->th_seq = htonl(tp->snd_nxt);
>     else
>         th->th_seq = htonl(tp->snd_max);
>
> That is what your implementation appears to be missing, and what is
> causing your ACK storm. So yes, some variant of your problem has been seen
> before, and this is how the BSD code fixed it.
>
> -David Borman
>
> >
> > Kobby
> >
> > -----Original Message-----
> > From: David Borman [mailto:dab@weston.borman.com]
> > Sent: Thursday, August 04, 2016 10:37 PM
> > To: Kobby Carmona <kobby.Carmona@qlogic.com>
> > Cc: tcpm@ietf.org
> > Subject: Re: [tcpm] Possible deadlock scenario with retransmission on
> > both sides at the same time
> >
> > After you pull back SND.NXT and do the retransmission, you should then
> > restore SND.NXT to where it was, not leave it at the backed-off value;
> > then the ACKs wouldn't be dropped, since they wouldn't carry old seq
> > values. For example, the 4.4BSD fast retransmit code had:
> >
> >     tcp_seq onxt = tp->snd_nxt;
> >     ...
> >     tp->snd_nxt = th->th_ack;
> >     ...
> >     (void) tcp_output(tp);
> >     ...
> >     if (SEQ_GT(onxt, tp->snd_nxt))
> >         tp->snd_nxt = onxt;
> >
> > -David Borman
> >
> >> On Aug 4, 2016, at 4:08 AM, Kobby Carmona <kobby.Carmona@qlogic.com> wrote:
> >>
> >> Hi all,
> >> While running a bidirectional scenario with random drops in a network
> >> simulator of our (QLogic's NIC) TCP stack, we found a case where there
> >> seems to be a deadlock in the TCP protocol (the connection keeps sending
> >> pure ACKs from both sides until the RTO expires multiple times and an RST
> >> is sent to close the connection).
> >> The scenario is as follows (each stage has a numeric example, assuming
> >> the MSS and each packet are 1000 B):
> >> 1. Both sides are transmitting data; a single packet is dropped on each
> >>    side and the next two packets are received properly.
> >>    Side A - SND.MAX=3000, SND.NXT=3000, SND.UNA=1000, RCV.NXT=11000,
> >>             out-of-order block 12000-13000
> >>    Side B - SND.MAX=13000, SND.NXT=13000, SND.UNA=11000, RCV.NXT=1000,
> >>             out-of-order block 2000-3000
> >> 2. The RTO timer expires on both sides.
> >>    Side A - SND.MAX=3000, SND.NXT=1000, SND.UNA=1000, RCV.NXT=11000,
> >>             out-of-order block 12000-13000
> >>    Side B - SND.MAX=13000, SND.NXT=11000, SND.UNA=11000, RCV.NXT=1000,
> >>             out-of-order block 2000-3000
> >> 3. Both sides transmit a single packet to the peer:
> >>    A->B - pkt.seq=1000, pkt.ack=11000, len=1000
> >>    B->A - pkt.seq=11000, pkt.ack=1000, len=1000
> >> 4. Both sides receive the packets and update the receive context:
> >>    Side A - SND.MAX=3000, SND.NXT=2000, SND.UNA=1000, RCV.NXT=13000
> >>    Side B - SND.MAX=13000, SND.NXT=12000, SND.UNA=11000, RCV.NXT=3000
> >> 5. Both sides send another segment:
> >>    A->B - pkt.seq=2000, pkt.ack=13000, len=1000
> >>    B->A - pkt.seq=12000, pkt.ack=3000, len=1000
> >> 6. Neither side accepts the packet (nor updates SND.UNA), since the
> >>    sequence on the packet is less than RCV.NXT (sequence number check on
> >>    page 69 of RFC 793), and each sends a pure ACK instead:
> >>    A->B - pkt.seq=2000, pkt.ack=13000, len=0 (pure ACK)
> >>    B->A - pkt.seq=12000, pkt.ack=3000, len=0 (pure ACK)
> >> 7. This continues forever (until the connection is terminated by RST),
> >>    since every packet that ends before RCV.NXT (even a retransmission
> >>    from SND.UNA) will be dropped.
> >>
> >> Has anyone encountered this issue before? Is there anything we missed in
> >> this sequence?
> >> If this is indeed a real deadlock, there might be several solutions,
> >> which would require a modification to the receive processing of RFC 793.
> >> But I would like to know whether you think this is a real issue before
> >> dealing with solutions.
> >> Thanks,
> >>
> >> Kobby
> >>
> >>
> >> _______________________________________________
> >> tcpm mailing list
> >> tcpm@ietf.org
> >> https://www.ietf.org/mailman/listinfo/tcpm
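To make the failure concrete, here is a minimal C sketch of the RFC 793 acceptability test together with the BSD rule David quotes for stamping ACK-only segments. It is an illustration only, not code from BSD or from the QLogic stack; all names (seq_lt, seq_leq, segment_acceptable, struct snd_state, outgoing_seq, restore_snd_nxt) are hypothetical, and RST, SYN and urgent-data handling are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrap-around sequence comparisons, in the spirit of BSD's SEQ_LT/SEQ_LEQ. */
static bool seq_lt(uint32_t a, uint32_t b)  { return (int32_t)(a - b) < 0; }
static bool seq_leq(uint32_t a, uint32_t b) { return (int32_t)(a - b) <= 0; }

/* RFC 793 (p. 69) segment acceptability test. An unacceptable segment is
 * dropped (its ACK field is never processed) and answered with an ACK. */
static bool segment_acceptable(uint32_t seg_seq, uint32_t seg_len,
                               uint32_t rcv_nxt, uint32_t rcv_wnd)
{
    uint32_t rcv_end = rcv_nxt + rcv_wnd;  /* one past the right window edge */
    bool first_in = seq_leq(rcv_nxt, seg_seq) && seq_lt(seg_seq, rcv_end);

    if (seg_len == 0)                      /* e.g. a pure ACK */
        return rcv_wnd == 0 ? seg_seq == rcv_nxt : first_in;
    if (rcv_wnd == 0)
        return false;

    uint32_t last = seg_seq + seg_len - 1;
    bool last_in = seq_leq(rcv_nxt, last) && seq_lt(last, rcv_end);
    return first_in || last_in;
}

/* Hypothetical sender state; field names mirror the BSD variables. */
struct snd_state {
    uint32_t snd_una;  /* oldest unacknowledged sequence number */
    uint32_t snd_nxt;  /* next sequence number to send (pulled back on RTO) */
    uint32_t snd_max;  /* highest sequence number sent so far */
    bool persist;      /* persist timer running (TCPT_PERSIST in BSD) */
};

/* Sequence number for an outgoing segment, mirroring the quoted BSD rule:
 * data, SYN/FIN, or persist probes use snd_nxt; ACK-only segments use
 * snd_max so they stay acceptable at a peer that already has the data. */
static uint32_t outgoing_seq(const struct snd_state *s, uint32_t len,
                             bool syn_or_fin)
{
    if (len > 0 || syn_or_fin || s->persist)
        return s->snd_nxt;
    /* Expected invariant outside persist state: snd_nxt <= snd_max. */
    assert(seq_leq(s->snd_nxt, s->snd_max));
    return s->snd_max;
}

/* The 4.4BSD fast-retransmit alternative: after retransmitting with a
 * pulled-back snd_nxt, restore the saved value if it was further ahead. */
static void restore_snd_nxt(struct snd_state *s, uint32_t onxt)
{
    if (seq_lt(s->snd_nxt, onxt))  /* same as SEQ_GT(onxt, snd_nxt) */
        s->snd_nxt = onxt;
}
```

Using the state after both sides have retransmitted one segment: Side A holds SND.UNA=1000, SND.NXT=2000, SND.MAX=3000, and Side B's RCV.NXT is 3000. segment_acceptable(2000, 0, 3000, 65535) is false, so a pure ACK stamped with SND.NXT is discarded; outgoing_seq() instead picks 3000 for an ACK-only segment, and segment_acceptable(3000, 0, 3000, 65535) is true. The 65535-byte receive window is an arbitrary assumption.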