Re: [tcpm] Fwd: TCP Loopback Connections with the Same Src/Dest Port

David Borman <dab@weston.borman.com> Mon, 22 July 2013 14:17 UTC

From: David Borman <dab@weston.borman.com>
In-Reply-To: <51ECBCE7.8080805@isi.edu>
Date: Mon, 22 Jul 2013 09:17:36 -0500
Message-Id: <E7C6F731-C737-47BE-AE15-7C573115BA2E@weston.borman.com>
References: <CAFc6gu_q1X10EzsHrvnmYuQ0ZnKz9uNXbfJJe-guva6J-QKAow@mail.gmail.com> <51EC10A6.7040300@gont.com.ar> <51ECBCE7.8080805@isi.edu>
To: Joe Touch <touch@isi.edu>
Cc: "tcpm@ietf.org" <tcpm@ietf.org>, Fernando Gont <fernando@gont.com.ar>
Subject: Re: [tcpm] Fwd: TCP Loopback Connections with the Same Src/Dest Port

It's a socket connected to itself, so it only makes sense for a local IP address, whether that is the loopback address or any other address on the machine.  You can write to the socket and then read the data back out.

If your TCP properly handles this case of a self-connected socket, then it handles a bunch of corner cases.  If you have to disallow them because they send your system into an infinite loop of packets, then you have a latent bug just waiting to be tickled between a pair of sockets.  It's better to fix the problem than to disallow the symptom.

In all the cases I've seen, the problem is that there is valid ACK information in the packet, and the incoming TCP processing drops the packet without processing that ACK information.  Crossing SYNs, crossing FINs and crossing probes can all cause this problem if you don't process the ACK information.  The self-connected socket just guarantees that you'll hit both the crossing-SYN and crossing-FIN cases.
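
To illustrate the crossing-SYN and crossing-FIN cases, here is a minimal toy model in C of a single TCB whose segments are fed straight back to itself. It follows the RFC 793 transitions for simultaneous open and simultaneous close, and, similar in spirit to BSD-derived stacks, it trims an already-seen SYN instead of dropping the whole segment, so the ACK field still gets processed. Everything here (the struct layouts and the helpers tcb_connect, tcb_close, tcb_input and loop_back) is hypothetical, and the model deliberately ignores data, windows, retransmission and RST handling; it is a sketch, not any real stack's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model, not any real stack's code: only the RFC 793 states used below. */
enum tcp_state { SYN_SENT, SYN_RECEIVED, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2,
                 CLOSE_WAIT, CLOSING, TIME_WAIT };

struct seg { bool syn, ack, fin; uint32_t seq, ackno; };
struct tcb { enum tcp_state state; uint32_t iss, snd_una, snd_nxt, rcv_nxt; };

/* Modular sequence-number comparisons in the style of the BSD SEQ_* macros. */
#define SEQ_LT(a, b)  ((int32_t)((a) - (b)) < 0)
#define SEQ_LEQ(a, b) ((int32_t)((a) - (b)) <= 0)

/* Active open: emit a SYN (the "peer" of a self-connected socket is this TCB). */
static struct seg tcb_connect(struct tcb *tp, uint32_t iss)
{
    *tp = (struct tcb){ .state = SYN_SENT, .iss = iss, .snd_una = iss, .snd_nxt = iss + 1 };
    return (struct seg){ .syn = true, .seq = iss };
}

/* Application close from ESTABLISHED: emit a FIN, which uses one sequence number. */
static struct seg tcb_close(struct tcb *tp)
{
    assert(tp->state == ESTABLISHED);
    struct seg fin = { .fin = true, .ack = true, .seq = tp->snd_nxt, .ackno = tp->rcv_nxt };
    tp->snd_nxt++;
    tp->state = FIN_WAIT_1;
    return fin;
}

/* Process one incoming segment; returns true and fills *out if a reply is sent. */
static bool tcb_input(struct tcb *tp, struct seg in, struct seg *out)
{
    if (tp->state == SYN_SENT) {
        if (!in.syn || in.ack)
            return false;                     /* ordinary (non-crossing) open not modeled */
        tp->rcv_nxt = in.seq + 1;             /* crossing SYN: simultaneous open */
        tp->state = SYN_RECEIVED;
        *out = (struct seg){ .syn = true, .ack = true, .seq = tp->iss, .ackno = tp->rcv_nxt };
        return true;
    }
    /* Trim a SYN we have already seen, but keep the segment: its ACK may be new. */
    if (in.syn && SEQ_LT(in.seq, tp->rcv_nxt)) {
        in.syn = false;
        in.seq++;
    }
    if (in.seq != tp->rcv_nxt)
        return false;                         /* out-of-order segments not modeled */

    if (in.ack && SEQ_LT(tp->snd_una, in.ackno) && SEQ_LEQ(in.ackno, tp->snd_nxt)) {
        tp->snd_una = in.ackno;
        bool all_acked = (tp->snd_una == tp->snd_nxt);
        if (tp->state == SYN_RECEIVED && all_acked)
            tp->state = ESTABLISHED;
        else if (tp->state == FIN_WAIT_1 && all_acked)
            tp->state = FIN_WAIT_2;
        else if (tp->state == CLOSING && all_acked)
            tp->state = TIME_WAIT;
    }
    if (!in.fin)
        return false;
    tp->rcv_nxt = in.seq + 1;                 /* the FIN occupies one sequence number */
    if (tp->state == ESTABLISHED)
        tp->state = CLOSE_WAIT;
    else if (tp->state == FIN_WAIT_1)
        tp->state = CLOSING;                  /* crossing FINs: our FIN not yet ACKed */
    else if (tp->state == FIN_WAIT_2)
        tp->state = TIME_WAIT;
    *out = (struct seg){ .ack = true, .seq = tp->snd_nxt, .ackno = tp->rcv_nxt };
    return true;
}

/* Self-connected socket: every segment the TCB emits comes straight back to it. */
static int loop_back(struct tcb *tp, struct seg s, int max_steps)
{
    int steps = 0;
    struct seg reply;
    while (steps < max_steps) {
        steps++;
        if (!tcb_input(tp, s, &reply))
            break;
        s = reply;
    }
    return steps;
}
```

With an initial sequence number of 1000, loop_back() settles in two steps for the open (SYN, then SYN/ACK) and two for the close (FIN, then ACK), ending in ESTABLISHED and TIME_WAIT respectively. In this model, dropping the duplicate SYN/ACK outright would leave the TCB stuck in SYN_RECEIVED, because the only segment acknowledging its SYN would be discarded.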

Testing that a self-connected socket works should be part of every TCP regression test.
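
A minimal regression test along these lines might look like the following POSIX C sketch. It binds a TCP socket to 127.0.0.1 with a kernel-chosen port, connects the socket to its own address (so its SYN crosses itself), writes a short message, reads it back, and then closes the socket, which exercises the crossing-FIN path. The helper name self_connect_echo is hypothetical, and the sketch assumes the stack permits this simultaneous open, as Linux generally does; on a stack with the kind of bug described in this thread, the close may instead leave the kernel looping.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 0 if a socket connected to itself echoes msg back, -1 on any failure. */
static int self_connect_echo(const char *msg)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    char buf[256];
    size_t n = strlen(msg), got = 0;
    int rc = -1;

    assert(n <= sizeof(buf));
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);                       /* let the kernel pick a port */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* 127.0.0.1 */

    /* Bind, learn the chosen port, then connect to that very address. */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        getsockname(fd, (struct sockaddr *)&addr, &len) == 0 &&
        connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        write(fd, msg, n) == (ssize_t)n) {
        /* Whatever we wrote should come back on the same descriptor. */
        while (got < n) {
            ssize_t r = read(fd, buf + got, n - got);
            if (r <= 0)
                break;
            got += (size_t)r;
        }
        rc = (got == n && memcmp(buf, msg, n) == 0) ? 0 : -1;
    }
    close(fd);  /* both FINs cross: the socket receives its own FIN */
    return rc;
}
```

Called as self_connect_echo("ping"), it should return 0 on a stack that handles the self-connected case.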

			-David Borman

On Jul 22, 2013, at 12:02 AM, Joe Touch <touch@isi.edu> wrote:

> Same src/dst for the same IP makes no sense; there need to be two ends to a connection, and each end is supposed to be uniquely determined by the socket (as defined in RFC 793, not in the Un*x sense).
> 
> IMO, it ought to be rejected by the API, just as would be one that was otherwise incompletely or incorrectly specified (picking a source address not on a local interface, picking a port range you don't have privilege to access, etc.).
> 
> However, loopback is a subnet (127.0.0.0/8), not just a single address. It ought to be feasible and correct to open a connection to yourself on the same port on different loopback addresses.
> 
> Joe
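
Joe's point about the loopback subnet can be sketched in POSIX C: a listener on 127.0.0.1 with some port P, and a client bound to 127.0.0.2 with the same port P connecting to it, so the two ends share a port but remain distinct endpoints. This assumes the whole 127.0.0.0/8 is usable on the loopback interface, which is the default on Linux; on FreeBSD and macOS, 127.0.0.2 typically has to be added as an alias first. The helper same_port_two_loopbacks is hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connect 127.0.0.2:P -> 127.0.0.1:P and pass one byte across. Returns 0 on success. */
static int same_port_two_loopbacks(void)
{
    struct sockaddr_in srv = { .sin_family = AF_INET }, cli = { .sin_family = AF_INET };
    socklen_t len = sizeof(srv);
    int rc = -1, conn = -1;
    char c = 0;

    if (inet_pton(AF_INET, "127.0.0.1", &srv.sin_addr) != 1 ||
        inet_pton(AF_INET, "127.0.0.2", &cli.sin_addr) != 1)
        return -1;

    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    int cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0 || cfd < 0)
        goto out;

    /* Listener on 127.0.0.1 with a kernel-chosen port P. */
    if (bind(lfd, (struct sockaddr *)&srv, sizeof(srv)) != 0 ||
        getsockname(lfd, (struct sockaddr *)&srv, &len) != 0 ||
        listen(lfd, 1) != 0)
        goto out;
    assert(srv.sin_port != 0);   /* the kernel has assigned P */

    /* Client on 127.0.0.2 with the same port P: same port, distinct endpoints. */
    cli.sin_port = srv.sin_port;
    if (bind(cfd, (struct sockaddr *)&cli, sizeof(cli)) != 0 ||
        connect(cfd, (struct sockaddr *)&srv, sizeof(srv)) != 0)
        goto out;
    if ((conn = accept(lfd, NULL, NULL)) < 0)
        goto out;
    if (write(cfd, "x", 1) == 1 && read(conn, &c, 1) == 1 && c == 'x')
        rc = 0;
out:
    if (conn >= 0)
        close(conn);
    if (cfd >= 0)
        close(cfd);
    if (lfd >= 0)
        close(lfd);
    return rc;
}
```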
> 
> On 7/21/2013 9:47 AM, Fernando Gont wrote:
>> Folks,
>> 
>> Found this by chance -- probably a datapoint that advice is needed in
>> this area (that is, draft-gont-tcpm-tcp-seq-validation).
>> 
>> P.S.: Will present results from real-world testing at the next tcpm meeting.
>> 
>> Cheers,
>> Fernando
>> 
>> 
>> 
>> 
>> -------- Original Message --------
>> From: Matt Miller <matt@matthewjmiller.net>
>> Date: Wed, 17 Jul 2013 07:08:26 -0400
>> Message-ID: <CAFc6gu_q1X10EzsHrvnmYuQ0ZnKz9uNXbfJJe-guva6J-QKAow@mail.gmail.com>
>> Subject: TCP Loopback Connections with the Same Src/Dest Port
>> To: FreeBSD Net <freebsd-net@freebsd.org>
>> 
>> Our system is based on FreeBSD 8.1.  In some tests, we were having
>> issues caused by connections of this form (more details below):
>> 
>> TCP4      0      0      0/   0/   0    127.0.0.1.665   127.0.0.1.665   FIN_WAIT_1
>> TCP4      0      0      0/   0/   0    127.0.0.1.637   127.0.0.1.637   FIN_WAIT_1
>> TCP4      0      0      0/   0/   0    127.0.0.1.648   127.0.0.1.648   FIN_WAIT_1
>> 
>> Some questions we had:
>> 
>> - Has anyone else ever seen these same src/dest address/port TCP
>> connections created?  Does anyone know of a legitimate reason why they
>> should be allowed?
>> 
>> - If there are no known use cases for this type of connection, does
>> anyone have more context/insight on the design here: should this type
>> of inpcb creation be prevented in the kernel or is it the
>> application's responsibility to ensure it never creates this type of
>> socket?
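
If the application is expected to guard against this, one possible check after connect() is to compare the local and peer endpoints and treat a match as a self-connection (for example, by closing the socket and retrying). The following POSIX C sketch does that for IPv4 only; same_endpoint and is_self_connected are hypothetical helpers, and such a check only avoids the symptom rather than fixing any underlying TCP bug.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/* True when two IPv4 endpoints have the same family, address and port. */
static bool same_endpoint(const struct sockaddr_in *a, const struct sockaddr_in *b)
{
    assert(a != NULL && b != NULL);
    return a->sin_family == b->sin_family &&
           a->sin_port == b->sin_port &&
           a->sin_addr.s_addr == b->sin_addr.s_addr;
}

/* True if a connected IPv4 TCP socket ended up connected to itself. */
static bool is_self_connected(int fd)
{
    struct sockaddr_in local, peer;
    socklen_t llen = sizeof(local), plen = sizeof(peer);

    if (getsockname(fd, (struct sockaddr *)&local, &llen) != 0 ||
        getpeername(fd, (struct sockaddr *)&peer, &plen) != 0)
        return false;   /* not a connected socket, or some other error */
    return local.sin_family == AF_INET && same_endpoint(&local, &peer);
}
```

For the sockets in the netstat output above, the local and foreign endpoints (for example 127.0.0.1.665 on both sides) would compare equal.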
>> 
>> For those interested, more details of the issue seen follow.  The
>> connection seems to get stuck in swi_net sending and receiving pure
>> FIN/ACKs to itself:
>> 
>> #12 0xffffffff804372ce in ip_output (m=0xffffff0003ccf300, opt=<optimized out>, ro=0xffffff8020c2b6a0, flags=0, imo=0x0, inp=0xffffff0019933968) at ../../../../sys/netinet/ip_output.c
>> #13 0xffffffff804423dc in tcp_output (tp=0xffffff0019de2370) at ../../../../sys/netinet/tcp_output.c
>> #14 0xffffffff8043ef5d in tcp_do_segment (m=0xffffff0019af1200, th=0x100200, so=0xffffff011ac59570, tp=0xffffff0019de2370, drop_hdrlen=52, tlen=0, iptos=0 '\000', ti_locked=3) at ../../../../sys/netinet/tcp_input.c
>> #15 0xffffffff80440311 in tcp_input (m=0xffffff0019af1200, off0=<optimized out>) at ../../../../sys/netinet/tcp_input.c
>> #16 0xffffffff8043530b in ip_input (m=0xffffff0019af1200) at ../../../../sys/netinet/ip_input.c
>> #17 0xffffffff8040889f in netisr_process_workstream_proto (proto=<optimized out>, nwsp=<optimized out>) at ../../../../sys/net/netisr.c
>> #18 swi_net (arg=0xffffffff80f59800) at ../../../../sys/net/netisr.c
>> 
>> swi_net() just continues in this loop, ad nauseam:
>> 
>> while ((bits = nwsp->nws_pendingbits) != 0) {
>>         while ((prot = ffs(bits)) != 0) {
>>                 prot--;
>>                 bits &= ~(1 << prot);
>>                 (void)netisr_process_workstream_proto(nwsp, prot);
>>         }
>> }
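
Why swi_net() never leaves this loop can be shown with a small user-space model of the same ffs()-driven dispatch: as long as processing one protocol's queue puts new work back on it (here, a segment the self-connected socket sends to itself), the pending bits never reach zero. The names drain and loopback_handler are hypothetical, clearing the pending bit just before calling the handler is an assumption of the model rather than a description of netisr internals, and max_passes exists only so the example terminates.

```c
#include <assert.h>
#include <strings.h>   /* ffs() */

/* A protocol handler may set bits in *pending again, like a looped-back packet. */
typedef void (*proto_handler)(int prot, unsigned *pending, void *ctx);

static int drain(unsigned *pending, proto_handler handler, void *ctx, int max_passes)
{
    int passes = 0;
    unsigned bits;

    assert(max_passes > 0);
    while ((bits = *pending) != 0 && passes < max_passes) {
        int prot;
        passes++;
        while ((prot = ffs((int)bits)) != 0) {
            prot--;                          /* ffs() numbers bits from 1 */
            bits &= ~(1u << prot);
            *pending &= ~(1u << prot);       /* model assumption: queue drained... */
            handler(prot, pending, ctx);     /* ...unless processing queues new work */
        }
    }
    return passes;
}

/* Each processed packet produces another one for the same protocol. */
static void loopback_handler(int prot, unsigned *pending, void *ctx)
{
    int *remaining = ctx;   /* replies still to generate; negative means forever */
    if (*remaining == 0)
        return;
    if (*remaining > 0)
        (*remaining)--;
    *pending |= 1u << prot;
}
```

With a handler that re-queues three more times, drain() returns after 4 passes; with one that always re-queues, it stops only at max_passes, which is the situation in the trace above.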
>> 
>> The tcp_output() being triggered in tcp_do_segment() in this case is
>> the one shown on line 2303 below:
>> 
>> 2212         /*
>> 2213          * In ESTABLISHED state: drop duplicate ACKs; ACK out of range
>> 2214          * ACKs.  If the ack is in the range
>> 2215          *      tp->snd_una < th->th_ack <= tp->snd_max
>> 2216          * then advance tp->snd_una to th->th_ack and drop
>> 2217          * data from the retransmission queue.  If this ACK reflects
>> 2218          * more up to date window information we update our window information.
>> 2219          */
>> 2220         case TCPS_ESTABLISHED:
>> 2221         case TCPS_FIN_WAIT_1:
>> 2222         case TCPS_FIN_WAIT_2:
>> 2223         case TCPS_CLOSE_WAIT:
>> 2224         case TCPS_CLOSING:
>> 2225         case TCPS_LAST_ACK:
>> 2226                 if (SEQ_GT(th->th_ack, tp->snd_max)) {
>> 2227                         TCPSTAT_INC(tcps_rcvacktoomuch);
>> 2228                         goto dropafterack;
>> 2229                 }
>> ...
>> 2234                 if (SEQ_LEQ(th->th_ack, tp->snd_una)) {
>> ...
>> 2248                         if (tlen == 0 && tiwin == tp->snd_wnd) {
>> 2249                                 TCPSTAT_INC(tcps_rcvdupack);
>> ...
>> 2277                                 if (!tcp_timer_active(tp, TT_REXMT) ||
>> 2278                                     th->th_ack != tp->snd_una)
>> 2279                                         tp->t_dupacks = 0;
>> 2280                                 else if (++tp->t_dupacks > tcprexmtthresh ||
>> 2281                                     ((V_tcp_do_newreno ||
>> 2282                                       (tp->t_flags & TF_SACK_PERMIT)) &&
>> 2283                                      IN_FASTRECOVERY(tp))) {
>> 2284                                         if ((tp->t_flags & TF_SACK_PERMIT) &&
>> 2285                                             IN_FASTRECOVERY(tp)) {
>> 2286                                                 int awnd;
>> 2287
>> 2288                                                 /*
>> 2289                                                  * Compute the amount of data in flight first.
>> 2290                                                  * We can inject new data into the pipe iff
>> 2291                                                  * we have less than 1/2 the original window's
>> 2292                                                  * worth of data in flight.
>> 2293                                                  */
>> 2294                                                 awnd = (tp->snd_nxt - tp->snd_fack) +
>> 2295                                                         tp->sackhint.sack_bytes_rexmit;
>> 2296                                                 if (awnd < tp->snd_ssthresh) {
>> 2297                                                         tp->snd_cwnd += tp->t_maxseg;
>> 2298                                                         if (tp->snd_cwnd > tp->snd_ssthresh)
>> 2299                                                                 tp->snd_cwnd = tp->snd_ssthresh;
>> 2300                                                 }
>> 2301                                         } else
>> 2302                                                 tp->snd_cwnd += tp->t_maxseg;
>> 2303                                         (void) tcp_output(tp);
>> 2304                                         goto drop;
>> 
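The duplicate-ACK decision in the excerpt can be restated as a small standalone function, which makes the suspected feedback loop easier to see. One plausible reading: a pure FIN/ACK that a self-connected socket in FIN_WAIT_1 sends to itself arrives with tlen == 0, an unchanged window and th_ack == snd_una, so it is counted as a duplicate ACK; once t_dupacks exceeds the threshold, every arrival takes the branch that calls tcp_output() and then goto drop, so the FIN in that segment is never processed and the new output comes back as yet another duplicate ACK. The struct tcb_lite and the function classify_ack are hypothetical simplifications, and the paths hidden behind "..." in the excerpt are not modeled (they are reported as ACK_NOT_MODELED).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SEQ_LEQ(a, b) ((int32_t)((a) - (b)) <= 0)
#define SEQ_GT(a, b)  ((int32_t)((a) - (b)) > 0)

enum ack_verdict {
    ACK_TOO_MUCH,        /* th_ack beyond snd_max: goto dropafterack */
    ACK_NEW,             /* th_ack > snd_una: normal ACK processing (not in the excerpt) */
    DUPACK_RESET,        /* t_dupacks = 0 */
    DUPACK_OUTPUT_DROP,  /* snd_cwnd += t_maxseg; tcp_output(); goto drop */
    ACK_NOT_MODELED      /* paths hidden behind "..." in the excerpt */
};

/* Hypothetical stand-in for the few tcpcb fields the excerpt reads. */
struct tcb_lite {
    uint32_t snd_una, snd_max, snd_wnd;
    int t_dupacks;
    bool rexmt_timer_active;   /* tcp_timer_active(tp, TT_REXMT) */
    bool fast_recovery;        /* (newreno || SACK) && IN_FASTRECOVERY(tp) */
};

static enum ack_verdict classify_ack(struct tcb_lite *tp, uint32_t th_ack, int tlen,
                                     uint32_t tiwin, int rexmtthresh)
{
    assert(tlen >= 0);
    if (SEQ_GT(th_ack, tp->snd_max))
        return ACK_TOO_MUCH;
    if (!SEQ_LEQ(th_ack, tp->snd_una))
        return ACK_NEW;
    if (tlen != 0 || tiwin != tp->snd_wnd)
        return ACK_NOT_MODELED;
    if (!tp->rexmt_timer_active || th_ack != tp->snd_una) {
        tp->t_dupacks = 0;
        return DUPACK_RESET;
    }
    /* Mirrors line 2280: the counter is bumped before the comparison. */
    if (++tp->t_dupacks > rexmtthresh || tp->fast_recovery)
        return DUPACK_OUTPUT_DROP;
    return ACK_NOT_MODELED;
}
```

Starting from t_dupacks = 3 with a threshold of 3 (values chosen only for the example), every further pure ACK with th_ack == snd_una returns DUPACK_OUTPUT_DROP, and nothing in this path ever stops the next one from arriving.
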
>> I've noticed that we don't yet have this patch in our code:
>> 
>> http://svnweb.freebsd.org/base?view=revision&revision=239672
>> 
>> It seems relevant to the general case of both ends of a connection
>> entering FIN_WAIT_1 at the same time and sending FIN/ACKs repeatedly
>> (though our connections are a bizarre instance of this, where both
>> ends of the connection are actually the same connection).
>> 
>> Thanks,
>> 
>> Matt