Re: [tcpm] Possible deadlock scenario with retransmission on both sides at the same time
Yoshifumi Nishida <nishida@sfc.wide.ad.jp> Fri, 30 September 2016 08:06 UTC
In-Reply-To: <CAO249ydtAPCa2U4A19r6bRUDXsEuGJ-bcN_yQHLQ9q6MDW8URQ@mail.gmail.com>
References: <MWHPR11MB1374A50BC599B093EA09668984070@MWHPR11MB1374.namprd11.prod.outlook.com> <7070553C-65D1-46EE-95F4-DAE82E1F5A5E@weston.borman.com> <CY4PR11MB187848FCCEF4DB140F85913E841A0@CY4PR11MB1878.namprd11.prod.outlook.com> <2D524A8D-A5CA-45A6-B94D-FA1DA0CEE609@weston.borman.com> <CY4PR11MB1878E7E911194A1032E32428841E0@CY4PR11MB1878.namprd11.prod.outlook.com> <CAO249yeTNRFuFcWn5ga854g24GM_7DAjeKOSw3d7h=HQQYKX1Q@mail.gmail.com> <CADVnQymeyst9De4F8Zqc6wGfLEdzmGsypSC-ZKZ7bXT6PO=J4g@mail.gmail.com> <CAO249yeeEo3B9H3SyGqA8aiWOWjoHYji=JXEh1GHOHSsf+RszA@mail.gmail.com> <CADVnQym-wP=7pgSQ3ziWS-WmU9T-q2NVr1XSpB5ZkYOD8NYbAA@mail.gmail.com> <CAO249ydtAPCa2U4A19r6bRUDXsEuGJ-bcN_yQHLQ9q6MDW8URQ@mail.gmail.com>
From: Yoshifumi Nishida <nishida@sfc.wide.ad.jp>
Date: Fri, 30 Sep 2016 01:06:06 -0700
Message-ID: <CAO249yd3cSoCYsNx8h8FgzSzn+R5U-ybh-z2=oXJ46ZHmZCtcQ@mail.gmail.com>
To: Yoshifumi Nishida <nishida@sfc.wide.ad.jp>
Content-Type: multipart/alternative; boundary="94eb2c041f22ff9945053db5104d"
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/6p9q2Z9Sd8XFrVHpFs3jnHaIvVo>
Cc: Kobby Carmona <kobby.Carmona@qlogic.com>, David Borman <dab@weston.borman.com>, "tcpm@ietf.org" <tcpm@ietf.org>
Subject: Re: [tcpm] Possible deadlock scenario with retransmission on both sides at the same time

Hello,

I'm not sure how many folks are interested in this, but anyway...

I have tested this with packetdrill on Linux, NetBSD, FreeBSD and OpenBSD.
As far as I tested, Linux works just as Neal described.

On the other hand, I found a minor issue in the behavior of the others.
In their TCP input functions, they do not check the sequence number at the point where they update the ack value; the sequence number is only checked later in the code.
Because of this, they behave as follows.

00:42:58.395100 192.0.2.1.54613 > 192.168.0.1.8080: Flags [S], seq 0, win 20000, options [mss 1000], length 0
00:42:58.395120 192.168.0.1.8080 > 192.0.2.1.54613: Flags [S.], seq 4183682776, ack 1, win 65535, length 0
00:43:01.422728 192.168.0.1.8080 > 192.0.2.1.54613: Flags [S.], seq 4183682776, ack 1, win 65535, length 0
00:43:01.422801 192.0.2.1.54613 > 192.168.0.1.8080: Flags [.], seq 1, ack 1, win 20000, length 0
00:43:02.443166 192.0.2.1.54613 > 192.168.0.1.8080: Flags [.], seq 2001:3001, ack 1, win 20000, length 1000
00:43:02.468148 192.168.0.1.8080 > 192.0.2.1.54613: Flags [.], seq 1, ack 1, win 65535, length 0
00:43:04.471409 192.168.0.1.8080 > 192.0.2.1.54613: Flags [P.], seq 1:1001, ack 1, win 65535, length 1000
00:43:06.531462 192.0.2.1.54613 > 192.168.0.1.8080: Flags [.], seq 1:1001, ack 1001, win 20000, length 1000
00:43:06.555492 192.168.0.1.8080 > 192.0.2.1.54613: Flags [.], seq 1001:2001, ack 1001, win 65000, length 1000
00:43:06.555973 192.168.0.1.8080 > 192.0.2.1.54613: Flags [P.], seq 2001:3001, ack 1001, win 65000, length 1000
00:43:07.056276 192.0.2.1.54613 > 192.168.0.1.8080: Flags [.], seq 1001:2001, ack 1001, win 20000, length 1000
00:43:07.084910 192.168.0.1.8080 > 192.0.2.1.54613: Flags [.], seq 3001, ack 3001, win 63000, length 0
00:43:07.585918 192.0.2.1.54613 > 192.168.0.1.8080: Flags [.], seq 1001:2001, ack 2001, win 20000, length 1000
00:43:07.607117 192.168.0.1.8080 > 192.0.2.1.54613: Flags [.], seq 3001, ack 3001, win 63000, length 0
00:43:08.109788 192.0.2.1.54613 > 192.168.0.1.8080: Flags [.], seq 2001, ack 2001, win 20000, length 0
00:43:13.908734 192.168.0.1.8080 > 192.0.2.1.54613: Flags [P.], seq 2001:3001, ack 3001, win 63000, length 1000

In this dump, packets with source address 192.0.2.1 were generated by packetdrill, and packets with source address 192.168.0.1 came from the OS under test.

The ack values in the packets at 00:43:07.585918 and 00:43:13.908734 should be ignored, but they are actually used to update snd_nxt.
As a result, we saw seq 2001:3001 retransmitted after the timeout instead of 1001:2001.
(But it also means this logic won't create the deadlock situation Kobby mentioned.)

When I added a sequence number check to the ack value update in the FreeBSD code, it worked as expected.
--
Yoshi

On Tue, Aug 23, 2016 at 12:06 AM, Yoshifumi Nishida <nishida@sfc.wide.ad.jp> wrote:

> Hi Neal,
>
> Oh. I see. Thanks for the explanation.
> I'd like to think about it a bit more.
> Thanks,
> --
> Yoshi
>
>
> On Sun, Aug 21, 2016 at 12:44 PM, Neal Cardwell <ncardwell@google.com> wrote:
>
>> I am pretty sure Linux does not have the issue Kobby pointed out in this
>> thread.
>>
>> At a high level Linux should be OK because it follows the principle
>> David Borman laid out in his August 16 email: "ACK-only packets should
>> be sent with the largest in-window sequence number that has ever been
>> sent."
>>
>> Linux obeys that principle by using tp->snd_nxt to store the largest
>> sequence number that has ever been sent, and having
>> tcp_acceptable_seq() use tp->snd_nxt but clamp the outgoing sequence
>> number to make sure it is in-window. To be able to do this, in Linux,
>> the sender does not rewind tp->snd_nxt on retransmissions.
>>
>> neal
>>
>> On Sun, Aug 21, 2016 at 1:25 PM, Yoshifumi Nishida
>> <nishida@sfc.wide.ad.jp> wrote:
>> > Hi Neal,
>> >
>> > Thanks for the info.
>> > So, it seems to me that the linux code has the issue Kobby pointed out.
>> > Or, am I missing something?
>> > --
>> > Yoshi
>> >
>> >
>> > On Mon, Aug 15, 2016 at 6:18 PM, Neal Cardwell <ncardwell@google.com> wrote:
>> >>
>> >> On Mon, Aug 15, 2016 at 6:39 PM, Yoshifumi Nishida
>> >> <nishida@sfc.wide.ad.jp> wrote:
>> >> > Hello,
>> >> > I personally think this is an interesting corner case for discussion.
>> >> > It looks like a minor one, but I'm not very sure if we can leave it
>> >> > to each implementation.
>> >> > I also guess a question would be whether the BSDs' fix is the best
>> >> > way to address the issue.
>> >>
>> >> Yes, I agree this is an interesting case for discussion.
>> >>
>> >> FWIW, as a point of comparison for discussion, Linux's approach is a
>> >> little different: in Linux, the sender does not rewind SND.NXT on
>> >> retransmissions (RTO or Fast Recovery). Then the sender usually uses
>> >> SND.NXT for the seq field of outgoing pure ACKs. I say "usually"
>> >> because the Linux code has some code to deal with the case where the
>> >> receiver has withdrawn the receive window, so that SND.NXT is now
>> >> beyond the receive window. The code tcp_send_ack() uses to pick a seq
>> >> for outgoing pure ACKs looks like:
>> >>
>> >> /* SND.NXT, if window was not shrunk.
>> >>  * If window has been shrunk, what should we make? It is not clear at all.
>> >>  * Using SND.UNA we will fail to open window, SND.NXT is out of window. :-(
>> >>  * Anything in between SND.UNA...SND.UNA+SND.WND also can be already
>> >>  * invalid. OK, let's make this for now:
>> >>  */
>> >> static inline __u32 tcp_acceptable_seq(const struct sock *sk)
>> >> {
>> >>         const struct tcp_sock *tp = tcp_sk(sk);
>> >>
>> >>         if (!before(tcp_wnd_end(tp), tp->snd_nxt))
>> >>                 return tp->snd_nxt;
>> >>         else
>> >>                 return tcp_wnd_end(tp);
>> >> }
>> >>
>> >> neal
>> >
>> >
>> >
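
For readers following the thread, the idea of checking the segment's sequence number before consuming its ack value can be sketched roughly as below. This is only an illustration under assumptions, not the FreeBSD change mentioned at the top of this message: the names `struct tcb`, `segment_acceptable()` and `process_ack()` are hypothetical, the acceptability test follows the four cases in RFC 793, and the window value in the usage note is simply taken from the "win 63000" field in the dump.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe comparisons on 32-bit sequence numbers. */
static bool seq_lt(uint32_t a, uint32_t b)  { return (int32_t)(a - b) < 0; }
static bool seq_leq(uint32_t a, uint32_t b) { return (int32_t)(a - b) <= 0; }

/* Hypothetical, heavily reduced control block (not any real stack's layout). */
struct tcb {
    uint32_t snd_una;  /* oldest unacknowledged sequence number */
    uint32_t snd_max;  /* highest sequence number sent so far, plus one */
    uint32_t rcv_nxt;  /* next sequence number expected from the peer */
    uint32_t rcv_wnd;  /* current receive window */
};

/* True if seq lies in [rcv_nxt, rcv_nxt + rcv_wnd). */
static bool seq_in_window(const struct tcb *t, uint32_t seq)
{
    return seq_leq(t->rcv_nxt, seq) && seq_lt(seq, t->rcv_nxt + t->rcv_wnd);
}

/* RFC 793 segment acceptability test (four cases on length and window). */
static bool segment_acceptable(const struct tcb *t, uint32_t seg_seq,
                               uint32_t seg_len)
{
    if (seg_len == 0) {
        if (t->rcv_wnd == 0)
            return seg_seq == t->rcv_nxt;
        return seq_in_window(t, seg_seq);
    }
    if (t->rcv_wnd == 0)
        return false;
    /* Acceptable if the first or the last byte falls inside the window. */
    return seq_in_window(t, seg_seq) ||
           seq_in_window(t, seg_seq + seg_len - 1);
}

/*
 * Consume the ack field only when the segment itself is acceptable.
 * Returns true if snd_una advanced.
 */
static bool process_ack(struct tcb *t, uint32_t seg_seq, uint32_t seg_len,
                        uint32_t seg_ack)
{
    if (!segment_acceptable(t, seg_seq, seg_len))
        return false;  /* out-of-window segment: its ack value is ignored */

    if (seq_lt(t->snd_una, seg_ack) && seq_leq(seg_ack, t->snd_max)) {
        t->snd_una = seg_ack;
        return true;
    }
    return false;
}
```

With snd_una = 1001, snd_max = 3001, rcv_nxt = 3001 and rcv_wnd = 63000 (the state after the packet at 00:43:07.084910, under the assumptions above), the segment at 00:43:07.585918 (seq 1001:2001, ack 2001) lies wholly before the window, so `process_ack()` leaves snd_una at 1001 and a later timeout would resend 1001:2001.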