Re: [tcpm] Fwd: TCP Loopback Connections with the Same Src/Dest Port

Hi, all,

OK, I've been convinced that supporting such loopback isn't prohibited 
by 793, but I remain convinced that supporting it within the protocol 
isn't strictly required.

There are other cases where TCP implementations take shortcuts when they
know better - e.g., ignoring congestion control when both ends are on
the same subnet. TCP could just as easily ignore the entire state 
machine when both sockets are identical and just reflect everything back 
in one step, or the Un*x socket layer could do so directly.
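
The "both sockets are identical" condition can be stated concretely. As a
rough sketch (POSIX C, IPv4 only; same_endpoint() and
socket_is_self_connected() are hypothetical helpers, not an existing API),
a socket is talking to itself when its local and peer addresses match in
both address and port. A kernel shortcut or an API-level rejection would
presumably key off the same comparison.

```c
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: true when two IPv4 endpoints name the same
 * address and port, i.e. a connection between them is reflexive. */
static bool same_endpoint(const struct sockaddr_in *a,
                          const struct sockaddr_in *b)
{
    return a->sin_family == AF_INET && b->sin_family == AF_INET &&
           a->sin_addr.s_addr == b->sin_addr.s_addr &&
           a->sin_port == b->sin_port;
}

/* Hypothetical helper: returns 1 if fd is a self-connected IPv4 socket,
 * 0 if it is not, and -1 if either address lookup fails. */
static int socket_is_self_connected(int fd)
{
    struct sockaddr_in local, peer;
    socklen_t len = sizeof(local);

    memset(&local, 0, sizeof(local));
    memset(&peer, 0, sizeof(peer));
    if (getsockname(fd, (struct sockaddr *)&local, &len) != 0)
        return -1;
    len = sizeof(peer);
    if (getpeername(fd, (struct sockaddr *)&peer, &len) != 0)
        return -1;
    return same_endpoint(&local, &peer) ? 1 : 0;
}
```

A caller could, for example, close the socket and report an error when
socket_is_self_connected(fd) returns 1 right after connect().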

IMO, an app that wants to talk to itself ought to either use different
ports or different loopback addresses (or both) - though, given my
experience with Linux of late (65K total connection limit), I'd be
surprised if even that works there.
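
For the different-loopback-address approach, here is a minimal POSIX C
sketch, assuming a host on which 127.0.0.2 is usable as a local address
(the default on Linux, which routes all of 127.0.0.0/8 to the loopback
interface; FreeBSD and macOS generally need an lo0 alias first). It
connects 127.0.0.2:P to a listener on 127.0.0.1:P, so both ends share port
P but the four-tuple still names two distinct endpoints. The function
names are illustrative only.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Build an IPv4 socket address from dotted-quad text and a host-order port. */
static struct sockaddr_in make_addr(const char *ip, unsigned short port)
{
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    inet_pton(AF_INET, ip, &sa.sin_addr);
    return sa;
}

/* Connect 127.0.0.2:P to 127.0.0.1:P (same port number on both ends).
 * Returns 0 on success, -1 on any failure. */
static int connect_same_port_two_loopbacks(void)
{
    int lfd = -1, cfd = -1, afd = -1, rc = -1;
    struct sockaddr_in srv = make_addr("127.0.0.1", 0), cli;
    socklen_t len = sizeof(srv);

    /* Listener on 127.0.0.1 with a kernel-chosen port P. */
    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0 || bind(lfd, (struct sockaddr *)&srv, sizeof(srv)) != 0 ||
        listen(lfd, 1) != 0 ||
        getsockname(lfd, (struct sockaddr *)&srv, &len) != 0)
        goto out;

    /* Client bound to the same port P, but on a second loopback address. */
    cli = make_addr("127.0.0.2", ntohs(srv.sin_port));
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0 || bind(cfd, (struct sockaddr *)&cli, sizeof(cli)) != 0 ||
        connect(cfd, (struct sockaddr *)&srv, sizeof(srv)) != 0)
        goto out;

    afd = accept(lfd, NULL, NULL);
    if (afd < 0)
        goto out;
    printf("connected 127.0.0.2:%u -> 127.0.0.1:%u\n",
           ntohs(cli.sin_port), ntohs(srv.sin_port));
    rc = 0;
out:
    if (afd >= 0) close(afd);
    if (cfd >= 0) close(cfd);
    if (lfd >= 0) close(lfd);
    return rc;
}
```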

However, the case below also hints at a bug where both ends enter
FIN-WAIT-1 at the same time, which can happen whether or not the
connection is reflexive and should be fixed anyway.
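
Simultaneous close is an ordinary RFC 793 case: each end receives the
peer's FIN while still in FIN-WAIT-1, and that FIN does not yet
acknowledge the receiver's own FIN, so each moves to CLOSING; the ACKs
that follow move both to TIME-WAIT. A toy model of just these transitions
(C; the types and functions are hypothetical, not FreeBSD's code) shows
that the exchange should end after one FIN and one ACK in each direction
rather than loop.

```c
#include <stdbool.h>

/* Subset of the RFC 793 connection states involved in closing. */
enum tcp_state { ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, CLOSING, TIME_WAIT };

struct endpoint {
    enum tcp_state state;
};

/* Application calls close(): send our FIN and wait for it to be ACKed. */
static void app_close(struct endpoint *e)
{
    if (e->state == ESTABLISHED)
        e->state = FIN_WAIT_1;
}

/* Process an arriving segment. fin: the segment carries a FIN.
 * acks_our_fin: its ACK field covers the FIN we sent.
 * Returns true if we must send an ACK in response. */
static bool on_segment(struct endpoint *e, bool fin, bool acks_our_fin)
{
    switch (e->state) {
    case FIN_WAIT_1:
        if (fin && acks_our_fin) { e->state = TIME_WAIT; return true; }
        if (fin)                 { e->state = CLOSING;   return true; }
        if (acks_our_fin)          e->state = FIN_WAIT_2;
        return false;
    case FIN_WAIT_2:
        if (fin) { e->state = TIME_WAIT; return true; }
        return false;
    case CLOSING:
        if (acks_our_fin) e->state = TIME_WAIT;
        return false;
    default:
        return false;
    }
}

/* Both ends close at once, so their FINs cross in flight. */
static void simultaneous_close(struct endpoint *a, struct endpoint *b)
{
    app_close(a);
    app_close(b);
    /* Each FIN was sent before its sender saw the peer's FIN. */
    bool a_acks = on_segment(a, true, false);  /* A: FIN_WAIT_1 -> CLOSING */
    bool b_acks = on_segment(b, true, false);  /* B: FIN_WAIT_1 -> CLOSING */
    /* The ACK each side sends covers the other side's FIN. */
    if (b_acks) on_segment(a, false, true);    /* A: CLOSING -> TIME_WAIT */
    if (a_acks) on_segment(b, false, true);    /* B: CLOSING -> TIME_WAIT */
}
```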

Joe

On 7/21/2013 10:02 PM, Joe Touch wrote:
> Same src/dst for the same IP makes no sense; there need to be two ends
> to a connection, and each end is supposed to be uniquely determined by
> the socket (as defined in 793, not Un*x).
>
> IMO, it ought to be rejected by the API, just as would be one that was
> otherwise incompletely or incorrectly specified (picking a source
> address not on a local interface, picking a port range you don't have
> privilege to access, etc.).
>
> However, loopback is a subnet (127.0.0.0/8), not just a single address.
> It ought to be feasible and correct to open a connection to yourself on
> the same port on different loopback addresses.
>
> Joe
>
> On 7/21/2013 9:47 AM, Fernando Gont wrote:
>> Folks,
>>
>> Found this by chance -- probably a datapoint that advice is needed in
>> this area (that is, draft-gont-tcpm-tcp-seq-validation).
>>
>> P.S.: Will present results from real-world testing at the next tcpm
>> meeting.
>>
>> Cheers,
>> Fernando
>>
>> -------- Original Message --------
>> From: Matt Miller <matt@matthewjmiller.net>
>> Date: Wed, 17 Jul 2013 07:08:26 -0400
>> Message-ID: <CAFc6gu_q1X10EzsHrvnmYuQ0ZnKz9uNXbfJJe-guva6J-QKAow@mail.gmail.com>
>> Subject: TCP Loopback Connections with the Same Src/Dest Port
>> To: FreeBSD Net <freebsd-net@freebsd.org>
>>
>> Our system is based on FreeBSD 8.1.  In some tests, we were having
>> issues caused by connections of this form (more details below):
>>
>>   TCP4      0      0      0/   0/   0    127.0.0.1.665   127.0.0.1.665   FIN_WAIT_1
>>   TCP4      0      0      0/   0/   0    127.0.0.1.637   127.0.0.1.637   FIN_WAIT_1
>>   TCP4      0      0      0/   0/   0    127.0.0.1.648   127.0.0.1.648   FIN_WAIT_1
>>
>> Some questions we had:
>>
>> - Has anyone else ever seen these same src/dest address/port TCP
>> connections created?  Does anyone know of a legitimate reason why they
>> should be allowed?
>>
>> - If there are no known use cases for this type of connection, does
>> anyone have more context/insight on the design here: should this type
>> of inpcb creation be prevented in the kernel or is it the
>> application's responsibility to ensure it never creates this type of
>> socket?
>>
>> For those interested, more details of the issue seen follow.  The
>> connection seems to get stuck in swi_net sending and receiving pure
>> FIN/ACKs to itself:
>>
>> #12 0xffffffff804372ce in ip_output (m=0xffffff0003ccf300,
>>     opt=<optimized out>, ro=0xffffff8020c2b6a0, flags=0, imo=0x0,
>>     inp=0xffffff0019933968) at ../../../../sys/netinet/ip_output.c
>> #13 0xffffffff804423dc in tcp_output (tp=0xffffff0019de2370)
>>     at ../../../../sys/netinet/tcp_output.c
>> #14 0xffffffff8043ef5d in tcp_do_segment (m=0xffffff0019af1200,
>>     th=0x100200, so=0xffffff011ac59570, tp=0xffffff0019de2370,
>>     drop_hdrlen=52, tlen=0, iptos=0 '\000', ti_locked=3)
>>     at ../../../../sys/netinet/tcp_input.c
>> #15 0xffffffff80440311 in tcp_input (m=0xffffff0019af1200,
>>     off0=<optimized out>) at ../../../../sys/netinet/tcp_input.c
>> #16 0xffffffff8043530b in ip_input (m=0xffffff0019af1200)
>>     at ../../../../sys/netinet/ip_input.c
>> #17 0xffffffff8040889f in netisr_process_workstream_proto
>>     (proto=<optimized out>, nwsp=<optimized out>)
>>     at ../../../../sys/net/netisr.c
>> #18 swi_net (arg=0xffffffff80f59800) at ../../../../sys/net/netisr.c
>>
>> swi_net() just continues in this loop, ad nauseam:
>>
>>     while ((bits = nwsp->nws_pendingbits) != 0) {
>>         while ((prot = ffs(bits)) != 0) {
>>             prot--;
>>             bits &= ~(1 << prot);
>>             (void)netisr_process_workstream_proto(nwsp, prot);
>>         }
>>     }
>>
>> The tcp_output() being triggered in tcp_do_segment() in this case is
>> the one shown on line 2303 below:
>>
>> 2212     /*
>> 2213      * In ESTABLISHED state: drop duplicate ACKs; ACK out of range
>> 2214      * ACKs.  If the ack is in the range
>> 2215      *      tp->snd_una < th->th_ack <= tp->snd_max
>> 2216      * then advance tp->snd_una to th->th_ack and drop
>> 2217      * data from the retransmission queue.  If this ACK reflects
>> 2218      * more up to date window information we update our window information.
>> 2219      */
>> 2220     case TCPS_ESTABLISHED:
>> 2221     case TCPS_FIN_WAIT_1:
>> 2222     case TCPS_FIN_WAIT_2:
>> 2223     case TCPS_CLOSE_WAIT:
>> 2224     case TCPS_CLOSING:
>> 2225     case TCPS_LAST_ACK:
>> 2226         if (SEQ_GT(th->th_ack, tp->snd_max)) {
>> 2227             TCPSTAT_INC(tcps_rcvacktoomuch);
>> 2228             goto dropafterack;
>> 2229         }
>> ...
>> 2234         if (SEQ_LEQ(th->th_ack, tp->snd_una)) {
>> ...
>> 2248             if (tlen == 0 && tiwin == tp->snd_wnd) {
>> 2249                 TCPSTAT_INC(tcps_rcvdupack);
>> ...
>> 2277                 if (!tcp_timer_active(tp, TT_REXMT) ||
>> 2278                     th->th_ack != tp->snd_una)
>> 2279                     tp->t_dupacks = 0;
>> 2280                 else if (++tp->t_dupacks > tcprexmtthresh ||
>> 2281                     ((V_tcp_do_newreno ||
>> 2282                       (tp->t_flags & TF_SACK_PERMIT)) &&
>> 2283                      IN_FASTRECOVERY(tp))) {
>> 2284                     if ((tp->t_flags & TF_SACK_PERMIT) &&
>> 2285                         IN_FASTRECOVERY(tp)) {
>> 2286                         int awnd;
>> 2287
>> 2288                         /*
>> 2289                          * Compute the amount of data in flight first.
>> 2290                          * We can inject new data into the pipe iff
>> 2291                          * we have less than 1/2 the original window's
>> 2292                          * worth of data in flight.
>> 2293                          */
>> 2294                         awnd = (tp->snd_nxt - tp->snd_fack) +
>> 2295                             tp->sackhint.sack_bytes_rexmit;
>> 2296                         if (awnd < tp->snd_ssthresh) {
>> 2297                             tp->snd_cwnd += tp->t_maxseg;
>> 2298                             if (tp->snd_cwnd > tp->snd_ssthresh)
>> 2299                                 tp->snd_cwnd = tp->snd_ssthresh;
>> 2300                         }
>> 2301                     } else
>> 2302                         tp->snd_cwnd += tp->t_maxseg;
>> 2303                     (void) tcp_output(tp);
>> 2304                     goto drop;
>>
>> I've noticed that we don't yet have this patch in our code:
>>
>> http://svnweb.freebsd.org/base?view=revision&revision=239672
>>
>> Which seems like it could be relevant here to the general case of both
>> ends of the connection entering FIN_WAIT_1 at the same time and
>> sending FIN/ACKs repeatedly (though our connections are a bizarre case
>> of this where both ends of the connection are actually the same
>> connection).
>>
>> Thanks,
>>
>> Matt
>>
>>
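
The duplicate-ACK branch in the tcp_input.c excerpt quoted above (lines
2226-2304) suggests one way a self-connected socket could keep itself
busy: if every segment it receives from itself looks like a duplicate
ACK, then once t_dupacks exceeds the threshold each one calls
tcp_output(), and the resulting segment is delivered straight back to the
same connection. The C sketch below is a toy model of that feedback loop
under those assumptions, not a reproduction of FreeBSD 8.1: the struct,
the threshold of 3, and the function names are hypothetical, and it
ignores the retransmit-timer, window, and fast-recovery conditions in the
real code. The SEQ_GT/SEQ_LEQ macros follow the style of FreeBSD's
tcp_seq.h, which compares sequence numbers by casting their unsigned
difference to a signed int.

```c
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Modular sequence-number comparisons, tcp_seq.h style. */
#define SEQ_LEQ(a, b) ((int32_t)((a) - (b)) <= 0)
#define SEQ_GT(a, b)  ((int32_t)((a) - (b)) > 0)

#define REXMT_THRESH 3  /* assumed duplicate-ACK threshold */

/* Hypothetical, heavily simplified control block. */
struct toy_tcb {
    tcp_seq snd_una;   /* oldest unacknowledged sequence number */
    tcp_seq snd_max;   /* highest sequence number sent */
    int t_dupacks;     /* consecutive duplicate ACKs seen */
    int pending;       /* segments queued for delivery back to this TCB */
};

/* Handle one pure ACK (no data, unchanged window) with ack number th_ack.
 * Past the threshold, a duplicate ACK triggers "tcp_output", modeled as
 * queueing one segment back to this same TCB, since its peer is itself. */
static void toy_ack_input(struct toy_tcb *tp, tcp_seq th_ack)
{
    if (SEQ_GT(th_ack, tp->snd_max))
        return;                         /* ACKs unsent data: ignored here */
    if (SEQ_LEQ(th_ack, tp->snd_una)) {
        if (++tp->t_dupacks > REXMT_THRESH)
            tp->pending++;              /* reflected output */
        return;
    }
    tp->snd_una = th_ack;               /* new data acknowledged */
    tp->t_dupacks = 0;
}

/* Drain queued segments, stopping after max_steps.
 * Returns the number of segments processed. */
static int toy_run(struct toy_tcb *tp, tcp_seq ack, int max_steps)
{
    int steps = 0;
    while (tp->pending > 0 && steps < max_steps) {
        tp->pending--;
        toy_ack_input(tp, ack);
        steps++;
    }
    return steps;
}
```

With three reflected duplicate ACKs queued the loop drains; with four, the
fourth exceeds the threshold and every later one re-queues a segment, so
toy_run() only stops at its step cap, much like the swi_net() loop
described in the forwarded message.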