Re: [tcpm] Possible deadlock scenario with retransmission on both sides at the same time

Yoshifumi Nishida <nishida@sfc.wide.ad.jp> Mon, 15 August 2016 22:39 UTC

In-Reply-To: <CY4PR11MB1878E7E911194A1032E32428841E0@CY4PR11MB1878.namprd11.prod.outlook.com>
References: <MWHPR11MB1374A50BC599B093EA09668984070@MWHPR11MB1374.namprd11.prod.outlook.com> <7070553C-65D1-46EE-95F4-DAE82E1F5A5E@weston.borman.com> <CY4PR11MB187848FCCEF4DB140F85913E841A0@CY4PR11MB1878.namprd11.prod.outlook.com> <2D524A8D-A5CA-45A6-B94D-FA1DA0CEE609@weston.borman.com> <CY4PR11MB1878E7E911194A1032E32428841E0@CY4PR11MB1878.namprd11.prod.outlook.com>
From: Yoshifumi Nishida <nishida@sfc.wide.ad.jp>
Date: Mon, 15 Aug 2016 15:39:15 -0700
X-Gmail-Original-Message-ID: <CAO249yeTNRFuFcWn5ga854g24GM_7DAjeKOSw3d7h=HQQYKX1Q@mail.gmail.com>
Message-ID: <CAO249yeTNRFuFcWn5ga854g24GM_7DAjeKOSw3d7h=HQQYKX1Q@mail.gmail.com>
To: Kobby Carmona <kobby.Carmona@qlogic.com>
Content-Type: multipart/alternative; boundary="94eb2c04c796ebf35b053a23e6fc"
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/pq_3io49ahipw9iRZXoRWgqyVIU>
Cc: David Borman <dab@weston.borman.com>, "tcpm@ietf.org" <tcpm@ietf.org>
Subject: Re: [tcpm] Possible deadlock scenario with retransmission on both sides at the same time

Hello,
I personally think this is an interesting corner case for discussion.
It looks like a minor one, but I'm not very sure we can leave it to each
implementation.
I also guess a question would be whether BSD's fix is the best way to
address the issue.

Thanks,
--
Yoshi

On Thu, Aug 11, 2016 at 1:46 PM, Kobby Carmona <kobby.Carmona@qlogic.com>
wrote:

> Hi David,
> This makes a lot of sense. We will fix our code.
> Thanks for your help on this,
>
> BTW,
> Is this issue mentioned in any RFC? If not, do you see a point in adding
> an explicit note on the SEQ of a pure ACK in case of retransmission?
>
>         Kobby
>
> -----Original Message-----
> From: David Borman [mailto:dab@weston.borman.com]
> Sent: Monday, August 08, 2016 12:35 AM
> To: Kobby Carmona <kobby.Carmona@qlogic.com>
> Cc: tcpm@ietf.org
> Subject: Re: [tcpm] Possible deadlock scenario with retransmission on both
> sides at the same time
>
> > On Aug 7, 2016, at 3:37 AM, Kobby Carmona <kobby.Carmona@qlogic.com> wrote:
> >
> > Hi David,
> > The code below is true for fast-retransmit.
> > But in case of retransmit timer expiration (TCPT_REXMT), snd_nxt is set
> > to snd_una. In this case CWND will be set to 1 MSS, so in the example
> > below the transmitters can send only a single segment from sequence
> > 2000/12000.
>
> Your underlying problem is that the ACK-only packets are being sent with
> the wrong sequence number, and that is what is causing them to be dropped.
> One way to fix that is to put SND.NXT back to its previous value after
> doing the retransmit.  But you are correct, in the BSD code it only does that
> in the fast retransmit code; when the timer based retransmit code fires, it
> just pulls back SND.NXT to SND.UNA.  However, in the BSD code that I’m
> looking at, it keeps track of the largest sequence number sent in SND.MAX,
> and in the tcp_output() path there is this bit of code:
>
>        /*
>         * If we are doing retransmissions, then snd_nxt will
>         * not reflect the first unsent octet.  For ACK only
>         * packets, we do not want the sequence number of the
>         * retransmitted packet, we want the sequence number
>         * of the next unsent octet.  So, if there is no data
>         * (and no SYN or FIN), use snd_max instead of snd_nxt
>         * when filling in th_seq.  But if we are in persist
>         * state, snd_max might reflect one byte beyond the
>         * right edge of the window, so use snd_nxt in that
>         * case, since we know we aren't doing a retransmission.
>         * (retransmit and persist are mutually exclusive...)
>         */
>        if (len || (flags & (TH_SYN|TH_FIN)) || tp->t_timer[TCPT_PERSIST])
>                th->th_seq = htonl(tp->snd_nxt);
>        else
>                th->th_seq = htonl(tp->snd_max);
>
> That is what your implementation appears to be missing, and what is
> causing your ACK storm.  So yes, some variant of your problem has been seen
> before, and this is how the BSD code fixed it.
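
As a rough, self-contained illustration of the rule described in the comment
above, the sketch below picks the sequence number for an outgoing segment.
The struct conn_state and the pick_seq() helper are hypothetical names made up
for this example, not the BSD data structures; only the condition mirrors the
quoted tcp_output() code.

```c
#include <stdint.h>

/* Standard TCP header flag bits. */
#define TH_FIN 0x01
#define TH_SYN 0x02
#define TH_ACK 0x10

/* Hypothetical, simplified connection state (not the BSD struct tcpcb). */
struct conn_state {
    uint32_t snd_una;    /* oldest unacknowledged sequence number */
    uint32_t snd_nxt;    /* next sequence number to send or resend */
    uint32_t snd_max;    /* highest sequence number sent so far */
    int persist_active;  /* nonzero while the persist timer is running */
};

/*
 * Choose th_seq for an outgoing segment.
 * Segments carrying data, SYN or FIN (or sent in persist state) use
 * snd_nxt; ACK-only segments use snd_max so that they are not stamped
 * with the pulled-back retransmission sequence number.
 */
uint32_t pick_seq(const struct conn_state *c, uint32_t len, int flags)
{
    if (len || (flags & (TH_SYN | TH_FIN)) || c->persist_active)
        return c->snd_nxt;
    return c->snd_max;
}
```

With SND.NXT=2000 and SND.MAX=3000 (side A after its timeout retransmission
in the example), a data segment would carry seq 2000 while an ACK-only
segment would carry seq 3000.
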
>
>                         -David Borman
>
> >
> >       Kobby
> >
> > -----Original Message-----
> > From: David Borman [mailto:dab@weston.borman.com]
> > Sent: Thursday, August 04, 2016 10:37 PM
> > To: Kobby Carmona <kobby.Carmona@qlogic.com>
> > Cc: tcpm@ietf.org
> > Subject: Re: [tcpm] Possible deadlock scenario with retransmission on
> > both sides at the same time
> >
> > After you pull back SND.NXT and do the retransmission, you should then
> > restore SND.NXT back to where it was, not leave it at the backed off value;
> > then the ACKs wouldn’t be dropped, since they wouldn’t have old seq
> > values.  For example, in the 4.4BSD fast retransmit code it had:
> >
> >                       tcp_seq onxt = tp->snd_nxt;
> >                       ...
> >                       tp->snd_nxt = th->th_ack;
> >                       ...
> >                       (void) tcp_output(tp);
> >                       ...
> >                       if (SEQ_GT(onxt, tp->snd_nxt))
> >                               tp->snd_nxt = onxt;
> >
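
The save-and-restore pattern in the fragment above can be exercised in
isolation with a minimal sketch like the following. The struct rexmt_state,
send_one_segment() and retransmit_and_restore() names are assumptions for
illustration only; send_one_segment() is a stand-in for tcp_output() that
just advances the counters.

```c
#include <stdint.h>

/* Wraparound-safe "a is after b" comparison in sequence space. */
#define SEQ_GT(a, b) ((int32_t)((a) - (b)) > 0)

/* Hypothetical, simplified send-side state. */
struct rexmt_state {
    uint32_t snd_una;  /* oldest unacknowledged sequence number */
    uint32_t snd_nxt;  /* next sequence number to send or resend */
    uint32_t snd_max;  /* highest sequence number sent so far */
};

/* Stand-in for tcp_output(): pretend to send len bytes from snd_nxt. */
void send_one_segment(struct rexmt_state *c, uint32_t len)
{
    c->snd_nxt += len;
    if (SEQ_GT(c->snd_nxt, c->snd_max))
        c->snd_max = c->snd_nxt;
}

/* Resend one segment starting at ack, then put snd_nxt back. */
void retransmit_and_restore(struct rexmt_state *c, uint32_t ack, uint32_t mss)
{
    uint32_t onxt = c->snd_nxt;   /* remember where we were */

    c->snd_nxt = ack;             /* pull back to the missing data */
    send_one_segment(c, mss);     /* retransmit a single segment */
    if (SEQ_GT(onxt, c->snd_nxt)) /* restore if we were further ahead */
        c->snd_nxt = onxt;
}
```

Starting from SND.UNA=1000 and SND.NXT=SND.MAX=3000, calling
retransmit_and_restore(&c, 1000, 1000) resends 1000-2000 and leaves SND.NXT
at 3000, so a later ACK-only segment would not carry an old sequence number.
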
> >
> >                       -David Borman
> >
> >> On Aug 4, 2016, at 4:08 AM, Kobby Carmona <kobby.Carmona@qlogic.com> wrote:
> >>
> >> Hi all,
> >> While running a bidirectional scenario with random drops in a network
> >> simulator of our (QLogic's NIC) TCP stack, we found a case where there
> >> seems to be a deadlock in the TCP protocol (the connection keeps sending
> >> pure ACKs from both sides until the RTO expires multiple times and a RST
> >> is sent to close the connection).
> >> The scenario is as follows (each stage has an example with numbers,
> >> assuming the MSS and each packet are 1000B):
> >> 1. Both sides are transmitting data, a single packet is dropped on
> >>    either side, and the next two packets are received properly:
> >>      Side A - SND.MAX=3000, SND.NXT=3000, SND.UNA=1000, RCV.NXT=11000,
> >>               out-of-order block 12000-13000
> >>      Side B - SND.MAX=13000, SND.NXT=13000, SND.UNA=11000, RCV.NXT=1000,
> >>               out-of-order block 2000-3000
> >> 2. The RTO timer expires on both sides:
> >>      Side A - SND.MAX=3000, SND.NXT=1000, SND.UNA=1000, RCV.NXT=11000,
> >>               out-of-order block 12000-13000
> >>      Side B - SND.MAX=13000, SND.NXT=11000, SND.UNA=11000, RCV.NXT=1000,
> >>               out-of-order block 2000-3000
> >> 3. Both sides transmit a single packet to the peer:
> >>      A->B - pkt.seq=1000, pkt.ack=11000, len=1000
> >>      B->A - pkt.seq=11000, pkt.ack=1000, len=1000
> >> 4. Both sides receive the packets and update the receive context:
> >>      Side A - SND.MAX=3000, SND.NXT=2000, SND.UNA=1000, RCV.NXT=13000
> >>      Side B - SND.MAX=13000, SND.NXT=12000, SND.UNA=11000, RCV.NXT=3000
> >> 5. Both sides send another segment:
> >>      A->B - pkt.seq=2000, pkt.ack=13000, len=1000
> >>      B->A - pkt.seq=12000, pkt.ack=3000, len=1000
> >> 6. Both sides don't accept the packet (and don't update SND.UNA), since
> >>    the sequence on the packet is less than RCV.NXT (sequence number
> >>    check on page 69 of RFC793), and send a pure ACK instead:
> >>      A->B - pkt.seq=2000, pkt.ack=13000, len=0 (pure ACK)
> >>      B->A - pkt.seq=12000, pkt.ack=3000, len=0 (pure ACK)
> >> 7. This will continue forever (until the connection is terminated by a
> >>    RST), since every packet that ends before RCV.NXT (even a retransmit
> >>    from SND.UNA) will be dropped.
> >>
> >> Has anyone encountered this issue before? Is there anything we missed in
> >> this sequence?
> >> If this is indeed a real deadlock, there might be several solutions,
> >> which would require a modification to the receive processing of RFC793.
> >> But I would like to know if you think this is a real issue before dealing
> >> with solutions.
> >> Thanks,
> >>
> >>      Kobby
> >>
> >>
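
To make the rejection of the retransmitted segments and of the stale pure
ACKs concrete, here is a small sketch of the RFC 793 segment acceptability
test. The function names are hypothetical, and the receive window value used
in the note below is an assumption, since the scenario does not state one.

```c
#include <stdint.h>

/* Wraparound-safe comparisons in 32-bit sequence space. */
static int seq_lt(uint32_t a, uint32_t b)  { return (int32_t)(a - b) < 0; }
static int seq_leq(uint32_t a, uint32_t b) { return (int32_t)(a - b) <= 0; }

/* Is seq inside the receive window [rcv_nxt, rcv_nxt + rcv_wnd)? */
static int in_window(uint32_t seq, uint32_t rcv_nxt, uint32_t rcv_wnd)
{
    return seq_leq(rcv_nxt, seq) && seq_lt(seq, rcv_nxt + rcv_wnd);
}

/*
 * RFC 793 acceptability test, covering its four cases:
 *   len == 0, wnd == 0 : SEG.SEQ must equal RCV.NXT
 *   len == 0, wnd  > 0 : SEG.SEQ must be inside the window
 *   len  > 0, wnd == 0 : never acceptable
 *   len  > 0, wnd  > 0 : first or last octet must be inside the window
 * Returns nonzero if the segment is acceptable.
 */
int segment_acceptable(uint32_t seg_seq, uint32_t seg_len,
                       uint32_t rcv_nxt, uint32_t rcv_wnd)
{
    if (seg_len == 0)
        return rcv_wnd == 0 ? seg_seq == rcv_nxt
                            : in_window(seg_seq, rcv_nxt, rcv_wnd);
    if (rcv_wnd == 0)
        return 0;
    return in_window(seg_seq, rcv_nxt, rcv_wnd) ||
           in_window(seg_seq + seg_len - 1, rcv_nxt, rcv_wnd);
}
```

Assuming a 65535-byte receive window, a receiver with RCV.NXT=3000 rejects
both the 1000-byte segment at seq 2000 and a zero-length ACK at seq 2000,
but would accept a zero-length ACK at seq 3000.
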
> >> _______________________________________________
> >> tcpm mailing list
> >> tcpm@ietf.org
> >> https://www.ietf.org/mailman/listinfo/tcpm