Re: [tcpm] TCP Loopback Connections with the Same Src/Dest Port

David Borman <dab@weston.borman.com> Mon, 22 July 2013 17:49 UTC

From: David Borman <dab@weston.borman.com>
In-Reply-To: <51ED5156.9030808@isi.edu>
Date: Mon, 22 Jul 2013 12:49:25 -0500
Message-Id: <2F414DA0-BCD4-47BE-8AC5-F3F598719934@weston.borman.com>
References: <CAFc6gu_q1X10EzsHrvnmYuQ0ZnKz9uNXbfJJe-guva6J-QKAow@mail.gmail.com> <51EC10A6.7040300@gont.com.ar> <51ECBCE7.8080805@isi.edu> <E7C6F731-C737-47BE-AE15-7C573115BA2E@weston.borman.com> <51ED5156.9030808@isi.edu>
To: Joe Touch <touch@ISI.EDU>
Cc: "tcpm@ietf.org" <tcpm@ietf.org>, Fernando Gont <fernando@gont.com.ar>
Subject: Re: [tcpm] TCP Loopback Connections with the Same Src/Dest Port

On Jul 22, 2013, at 10:35 AM, Joe Touch <touch@ISI.EDU> wrote:

> 
> 
> On 7/22/2013 7:17 AM, David Borman wrote:
>> Testing that a self-connected socket works should be part of every TCP regression test.
> 
> From RFC793:
> 
>    To allow for many processes within a single Host to use TCP
>    communication facilities simultaneously, the TCP provides a set of
>    addresses or ports within each host.  Concatenated with the network
>    and host addresses from the internet communication layer, this forms
>    a socket.  A pair of sockets uniquely identifies each connection.
> 
> This text says "a pair". I interpreted that as prohibiting use of a single socket for both ends, but I suppose you could allow it to happen.

Yeah..., I don't make that interpretation.  Logically it's still a pair; it just happens that the same socket is being used for both sides of the pair.  Self-connected TCP sockets have been around and working for a long time (they worked in CSRG BSD, I know because I fixed 'em), but people make changes to code and don't test for self-connected sockets, so they can get broken.  And when things break for self-connected sockets, the same situation can happen with a pair of sockets.  You can test all the same cases with a pair of sockets; it just takes more work to get the timing right.

> 
> The benefit of that situation is it would test simultaneous open and close, but note that some docs have lately written those cases off where we think they "can't" happen.

And that's when the bugs creep in. :-)

> 
> You'd have to prove that every protocol could handle simultaneous cases and state.

For every protocol used by sockets?  No, this is only about TCP sockets, not sockets in general.  Different protocols have different semantics, it doesn't matter what other protocols will or will not allow.

> 
> The key question is "why would this ever happen, or should it ever reasonably happen"? It's just as easily handled by having the Un*x socket "lie" about having TCP and just copy the buffer from send to receive - and what, really, is the point of that?

I disagree, that question is a red herring.  It doesn't matter whether or not you have an application use for it, it ought to work, and as a test tool it's an easy way to verify that crossing SYNs and crossing FINs are being handled properly.
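A minimal sketch of the kind of regression check described here, assuming a POSIX sockets API and a stack that, like the BSD code discussed in this thread, permits a self-connect (Linux does as well). The socket binds to 127.0.0.1 on a kernel-chosen port, connects to that same address and port so the SYNs cross, echoes a few bytes to itself, and then half-closes so the FINs cross. The function name self_connect_roundtrip() and the 2-second receive timeout are illustrative choices, not part of any existing test suite.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Illustrative self-connect test (hypothetical name).
 * connect() to our own address:port is a simultaneous open: our SYN
 * arrives at our own SYN_SENT socket.  shutdown(SHUT_WR) sends a FIN
 * that we receive ourselves, i.e. a simultaneous close.
 * Returns 0 on success, -1 on any failure.
 */
int self_connect_roundtrip(void)
{
    static const char msg[] = "ping";
    char buf[sizeof msg];
    size_t got = 0;
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    /* Bound each read so a stack that never delivers our data or FIN
     * makes the test fail instead of hang. */
    struct timeval tv = { .tv_sec = 2, .tv_usec = 0 };
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                /* let the kernel pick a free port */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        goto fail;
    assert(addr.sin_port != 0);       /* bind() assigned an ephemeral port */

    /* Same address and port on both ends: the SYNs cross. */
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0)
        goto fail;

    /* Whatever we send arrives back on the same socket. */
    if (write(fd, msg, sizeof msg) != (ssize_t)sizeof msg)
        goto fail;
    while (got < sizeof msg) {
        ssize_t n = read(fd, buf + got, sizeof msg - got);
        if (n <= 0)
            goto fail;
        got += (size_t)n;
    }
    if (memcmp(buf, msg, sizeof msg) != 0)
        goto fail;

    /* Send a FIN to ourselves; once it is received, read() returns 0 (EOF). */
    if (shutdown(fd, SHUT_WR) < 0 || read(fd, buf, sizeof buf) != 0)
        goto fail;

    close(fd);
    return 0;
fail:
    close(fd);
    return -1;
}
```

A regression harness would simply check that self_connect_roundtrip() returns 0; the receive timeout turns a connection that never sees its own data or FIN into a failure rather than a hung test.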

			-David Borman

P.S. A use case?  Besides testing, the only semi-useful case that pops into my mind would be to pass data across an exec() call in lieu of using a tmp file, but you can also do that with a pair of sockets and a little more work.

> 
> Joe
> 
>> 
>> 			-David Borman
>> 
>> On Jul 22, 2013, at 12:02 AM, Joe Touch <touch@isi.edu> wrote:
>> 
>>> Same src/dst for the same IP makes no sense; there need to be two ends to a connection, and each end is supposed to be uniquely determined by the socket (as defined in 793, not Un*x).
>>> 
>>> IMO, it ought to be rejected by the API, just as would be one that was otherwise incompletely or incorrectly specified (picking a source address not on a local interface, picking a port range you don't have privilege to access, etc.).
>>> 
>>> However, loopback is a subnet (127.0.0.0/8), not just a single address. It ought to be feasible and correct to open a connection to yourself on the same port on different loopback addresses.
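A sketch of the 127.0.0.0/8 case, assuming Linux, where the whole loopback prefix is treated as local by default (the BSDs and macOS typically configure only 127.0.0.1 on lo0, so binding 127.0.0.2 may fail there without an alias). A listener on 127.0.0.1 gets a kernel-chosen port P, and a client bound to 127.0.0.2 on that same P connects to it, so both ends use port P yet remain two distinct sockets. The helper names make_addr() and same_port_two_loopbacks() are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Build an IPv4 socket address; port_net is already in network byte order. */
static struct sockaddr_in make_addr(const char *ip, in_port_t port_net)
{
    struct sockaddr_in a;
    memset(&a, 0, sizeof a);
    a.sin_family = AF_INET;
    a.sin_port = port_net;
    inet_pton(AF_INET, ip, &a.sin_addr);
    return a;
}

/*
 * Connect 127.0.0.2:P to 127.0.0.1:P (assumes Linux loopback semantics).
 * Returns P in host byte order, or -1 on failure.
 */
int same_port_two_loopbacks(void)
{
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    int cfd = socket(AF_INET, SOCK_STREAM, 0);
    int afd = -1, ret = -1;
    struct sockaddr_in srv = make_addr("127.0.0.1", 0);
    struct sockaddr_in cli, peer;
    socklen_t len = sizeof srv;

    if (lfd < 0 || cfd < 0)
        goto out;

    /* Listener on 127.0.0.1 with a kernel-chosen port P. */
    if (bind(lfd, (struct sockaddr *)&srv, sizeof srv) < 0 ||
        getsockname(lfd, (struct sockaddr *)&srv, &len) < 0 ||
        listen(lfd, 1) < 0)
        goto out;

    /* Client on 127.0.0.2, bound to the same port P. */
    cli = make_addr("127.0.0.2", srv.sin_port);
    if (bind(cfd, (struct sockaddr *)&cli, sizeof cli) < 0 ||
        connect(cfd, (struct sockaddr *)&srv, sizeof srv) < 0)
        goto out;

    afd = accept(lfd, NULL, NULL);
    if (afd < 0)
        goto out;

    /* Same port number, different addresses: two genuine ends. */
    len = sizeof peer;
    if (getpeername(afd, (struct sockaddr *)&peer, &len) < 0)
        goto out;
    assert(peer.sin_port == srv.sin_port);
    assert(peer.sin_addr.s_addr != srv.sin_addr.s_addr);
    ret = ntohs(srv.sin_port);
out:
    if (afd >= 0)
        close(afd);
    if (cfd >= 0)
        close(cfd);
    if (lfd >= 0)
        close(lfd);
    return ret;
}
```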
>>> 
>>> Joe
>>> 
>>> On 7/21/2013 9:47 AM, Fernando Gont wrote:
>>>> Folks,
>>>> 
>>>> Found this by chance -- probably a datapoint that advice is needed in
>>>> this area (that is, draft-gont-tcpm-tcp-seq-validation).
>>>> 
>>>> P.S.: Will present results from real-world testing at the next tcpm meeting.
>>>> 
>>>> Cheers,
>>>> Fernando
>>>> 
>>>> 
>>>> 
>>>> 
>>>> -------- Original Message --------
>>>> From: Matt Miller <matt@matthewjmiller.net>
>>>> Date: Wed, 17 Jul 2013 07:08:26 -0400
>>>> Message-ID: <CAFc6gu_q1X10EzsHrvnmYuQ0ZnKz9uNXbfJJe-guva6J-QKAow@mail.gmail.com>
>>>> Subject: TCP Loopback Connections with the Same Src/Dest Port
>>>> To: FreeBSD Net <freebsd-net@freebsd.org>
>>>> 
>>>> Our system is based on FreeBSD 8.1.  In some tests, we were having
>>>> issues caused by connections of this form (more details below):
>>>> 
>>>> TCP4      0      0      0/   0/   0    127.0.0.1.665   127.0.0.1.665   FIN_WAIT_1
>>>> TCP4      0      0      0/   0/   0    127.0.0.1.637   127.0.0.1.637   FIN_WAIT_1
>>>> TCP4      0      0      0/   0/   0    127.0.0.1.648   127.0.0.1.648   FIN_WAIT_1
>>>> 
>>>> Some questions we had:
>>>> 
>>>> - Has anyone else ever seen these same src/dest address/port TCP
>>>> connections created?  Does anyone know of a legitimate reason why they
>>>> should be allowed?
>>>> 
>>>> - If there are no known use cases for this type of connection, does
>>>> anyone have more context/insight on the design here: should this type
>>>> of inpcb creation be prevented in the kernel or is it the
>>>> application's responsibility to ensure it never creates this type of
>>>> socket?
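If the application takes on that responsibility, one option is to check right after connect() whether the local and peer endpoints are identical and to close the socket if they are. This sketch assumes only POSIX getsockname() and getpeername(), handles AF_INET and AF_INET6, and uses a hypothetical helper name, is_self_connected(); it detects a self-connect after the fact rather than preventing it the way a kernel-side check at inpcb creation would.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical application-side guard (not a kernel or POSIX API).
 * Returns 1 if the connected socket's local and peer endpoints are the
 * same address and port, 0 if they differ or the family is not
 * AF_INET/AF_INET6, and -1 if either lookup fails.
 */
int is_self_connected(int fd)
{
    struct sockaddr_storage local, peer;
    socklen_t llen = sizeof local, plen = sizeof peer;

    if (getsockname(fd, (struct sockaddr *)&local, &llen) < 0 ||
        getpeername(fd, (struct sockaddr *)&peer, &plen) < 0)
        return -1;
    /* sockaddr_storage is large enough for any address family. */
    assert(llen <= sizeof local && plen <= sizeof peer);
    if (local.ss_family != peer.ss_family)
        return 0;

    if (local.ss_family == AF_INET) {
        const struct sockaddr_in *l = (const struct sockaddr_in *)&local;
        const struct sockaddr_in *p = (const struct sockaddr_in *)&peer;
        return l->sin_port == p->sin_port &&
               l->sin_addr.s_addr == p->sin_addr.s_addr;
    }
    if (local.ss_family == AF_INET6) {
        const struct sockaddr_in6 *l = (const struct sockaddr_in6 *)&local;
        const struct sockaddr_in6 *p = (const struct sockaddr_in6 *)&peer;
        return l->sin6_port == p->sin6_port &&
               l->sin6_scope_id == p->sin6_scope_id &&
               memcmp(&l->sin6_addr, &p->sin6_addr, sizeof l->sin6_addr) == 0;
    }
    return 0;
}
```

A caller might treat a return value of 1 like a failed connect(): close the descriptor and retry or report an error.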
>>>> 
>>>> For those interested, more details of the issue seen follow.  The
>>>> connection seems to get stuck in swi_net sending and receiving pure
>>>> FIN/ACKs to itself:
>>>> 
>>>> #12 0xffffffff804372ce in ip_output (m=0xffffff0003ccf300, opt=<optimized out>, ro=0xffffff8020c2b6a0, flags=0, imo=0x0, inp=0xffffff0019933968) at ../../../../sys/netinet/ip_output.c
>>>> #13 0xffffffff804423dc in tcp_output (tp=0xffffff0019de2370) at ../../../../sys/netinet/tcp_output.c
>>>> #14 0xffffffff8043ef5d in tcp_do_segment (m=0xffffff0019af1200, th=0x100200, so=0xffffff011ac59570, tp=0xffffff0019de2370, drop_hdrlen=52, tlen=0, iptos=0 '\000', ti_locked=3) at ../../../../sys/netinet/tcp_input.c
>>>> #15 0xffffffff80440311 in tcp_input (m=0xffffff0019af1200, off0=<optimized out>) at ../../../../sys/netinet/tcp_input.c
>>>> #16 0xffffffff8043530b in ip_input (m=0xffffff0019af1200) at ../../../../sys/netinet/ip_input.c
>>>> #17 0xffffffff8040889f in netisr_process_workstream_proto (proto=<optimized out>, nwsp=<optimized out>) at ../../../../sys/net/netisr.c
>>>> #18 swi_net (arg=0xffffffff80f59800) at ../../../../sys/net/netisr.c
>>>> 
>>>> swi_net() just continues in this loop, ad nauseam:
>>>> 
>>>> 759         while ((bits = nwsp->nws_pendingbits) != 0) {
>>>> 760                 while ((prot = ffs(bits)) != 0) {
>>>> 761                         prot--;
>>>> 762                         bits &= ~(1 << prot);
>>>> 763                         (void)netisr_process_workstream_proto(nwsp, prot);
>>>> 764                 }
>>>> 765         }
>>>> 
>>>> The tcp_output() being triggered in tcp_do_segment() in this case is
>>>> the one shown on line 2303 below:
>>>> 
>>>> 2212         /*
>>>> 2213          * In ESTABLISHED state: drop duplicate ACKs; ACK out of range
>>>> 2214          * ACKs.  If the ack is in the range
>>>> 2215          *      tp->snd_una < th->th_ack <= tp->snd_max
>>>> 2216          * then advance tp->snd_una to th->th_ack and drop
>>>> 2217          * data from the retransmission queue.  If this ACK reflects
>>>> 2218          * more up to date window information we update our window information.
>>>> 2219          */
>>>> 2220         case TCPS_ESTABLISHED:
>>>> 2221         case TCPS_FIN_WAIT_1:
>>>> 2222         case TCPS_FIN_WAIT_2:
>>>> 2223         case TCPS_CLOSE_WAIT:
>>>> 2224         case TCPS_CLOSING:
>>>> 2225         case TCPS_LAST_ACK:
>>>> 2226                 if (SEQ_GT(th->th_ack, tp->snd_max)) {
>>>> 2227                         TCPSTAT_INC(tcps_rcvacktoomuch);
>>>> 2228                         goto dropafterack;
>>>> 2229                 }
>>>> ...
>>>> 2234                 if (SEQ_LEQ(th->th_ack, tp->snd_una)) {
>>>> ...
>>>> 2248                         if (tlen == 0 && tiwin == tp->snd_wnd) {
>>>> 2249                                 TCPSTAT_INC(tcps_rcvdupack);
>>>> ...
>>>> 2277                                 if (!tcp_timer_active(tp, TT_REXMT) ||
>>>> 2278                                     th->th_ack != tp->snd_una)
>>>> 2279                                         tp->t_dupacks = 0;
>>>> 2280                                 else if (++tp->t_dupacks > tcprexmtthresh ||
>>>> 2281                                     ((V_tcp_do_newreno ||
>>>> 2282                                       (tp->t_flags & TF_SACK_PERMIT)) &&
>>>> 2283                                      IN_FASTRECOVERY(tp))) {
>>>> 2284                                         if ((tp->t_flags & TF_SACK_PERMIT) &&
>>>> 2285                                             IN_FASTRECOVERY(tp)) {
>>>> 2286                                                 int awnd;
>>>> 2287
>>>> 2288                                                 /*
>>>> 2289                                                  * Compute the amount of data in flight first.
>>>> 2290                                                  * We can inject new data into the pipe iff
>>>> 2291                                                  * we have less than 1/2 the original window's
>>>> 2292                                                  * worth of data in flight.
>>>> 2293                                                  */
>>>> 2294                                                 awnd = (tp->snd_nxt - tp->snd_fack) +
>>>> 2295                                                         tp->sackhint.sack_bytes_rexmit;
>>>> 2296                                                 if (awnd < tp->snd_ssthresh) {
>>>> 2297                                                         tp->snd_cwnd += tp->t_maxseg;
>>>> 2298                                                         if (tp->snd_cwnd > tp->snd_ssthresh)
>>>> 2299                                                                 tp->snd_cwnd = tp->snd_ssthresh;
>>>> 2300                                                 }
>>>> 2301                                         } else
>>>> 2302                                                 tp->snd_cwnd += tp->t_maxseg;
>>>> 2303                                         (void) tcp_output(tp);
>>>> 2304                                         goto drop;
>>>> 
>>>> I've noticed that we don't yet have this patch in our code:
>>>> 
>>>> http://svnweb.freebsd.org/base?view=revision&revision=239672
>>>> 
>>>> Which seems like it could be relevant here to the general case of both
>>>> ends of the connection entering FIN_WAIT_1 at the same time and
>>>> sending FIN/ACKs repeatedly (though our connections are a bizarre case
>>>> of this where both ends of the connection are actually the same
>>>> connection).
>>>> 
>>>> Thanks,
>>>> 
>>>> Matt
>>>> _______________________________________________
>>>> freebsd-net@freebsd.org mailing list
>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>>>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>>>> 
>>>> 
>>> _______________________________________________
>>> tcpm mailing list
>>> tcpm@ietf.org
>>> https://www.ietf.org/mailman/listinfo/tcpm
>> 