Re: Summary of responses so far and proposal moving forward[WasRe: [tcpm] Is this a problem?]

Mahesh Jethanandani <mahesh@cisco.com> Tue, 27 November 2007 23:21 UTC

Message-ID: <474CA683.2030600@cisco.com>
Date: Tue, 27 Nov 2007 15:21:39 -0800
From: Mahesh Jethanandani <mahesh@cisco.com>
Organization: Cisco Systems Inc.
User-Agent: Thunderbird 2.0.0.9 (Windows/20071031)
MIME-Version: 1.0
To: David Borman <david.borman@windriver.com>
Subject: Re: Summary of responses so far and proposal moving forward[WasRe: [tcpm] Is this a problem?]
References: <20071126193803.585E12FC5BE@lawyers.icir.org> <61806008-0CBC-417D-B5EB-46A7EE18446F@windriver.com>
In-Reply-To: <61806008-0CBC-417D-B5EB-46A7EE18446F@windriver.com>
Cc: tcpm@ietf.org, Mark Allman <mallman@icir.org>


David Borman wrote:
> Ok, I haven't chimed in yet on this conversation.
I am glad you finally did :-)
>
> While I agree with the document on the identification of the problem, 
> I disagree with the proposed solution (changing TCP to time out 
> connections in persist state).  Having a connection stay in persist 
> state for long periods of time (i.e., zero window probes continue to 
> be ACKed) by itself is not a bad thing.  That is how TCP was designed 
> to work.  Connections can survive through lots of adversity.  If a 
> connection is stuck because it is waiting for user action and the user 
> walked away and went home for the day, he should be able to come back 
> the next morning and do what needs to be done, and then the connection 
> will continue.
My understanding comes from RFC 1122, and I quote from the RFC itself:

> A TCP MAY keep its offered receive window closed
>             indefinitely.  As long as the receiving TCP continues to
>             send acknowledgments in response to the probe segments, the
>             sending TCP MUST allow the connection to stay open.
>
>             DISCUSSION:
>                  It is extremely important to remember that ACK
>                  (acknowledgment) segments that contain no data are not
>                  reliably transmitted by TCP.  If zero window probing is
>                  not supported, a connection may hang forever when an
>                  ACK segment that re-opens the window is lost.
>   
This tells me that the concern was with ACKs getting lost in the 
network, and that is why the connection needs to stay open. The point 
we make in the draft is that when the ACKs are being received 
reliably, the need to keep the connection open just to make sure the 
ACK has made it to the other end goes away. That is why we request a 
change in the language to say that *in case of reliable ACKs*, TCP MAY 
tear the connection down if it is not able to service existing or new 
connections.

We seem to agree on the user scenario you describe above. That is why 
we make it clear in the draft that we should not tear down a connection 
just because it has been open for a long time. Where only one or a few 
connections are being held open this way, the solution will not tear 
them down. The problem arises when lots of users (or attackers) do the 
same.
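
To make the distinction concrete, here is a minimal sketch of the kind 
of policy described above. It is not taken from the draft: the Conn 
record, the pick_victims function, and the "longest in persist state 
first" ordering are all hypothetical choices made for illustration. The 
only points it is meant to show are that nothing is torn down while 
resources are available, and that only connections stuck in persist 
state with reliably ACKed probes are ever candidates.

```python
from dataclasses import dataclass


@dataclass
class Conn:
    # Hypothetical per-connection record; field names are assumptions.
    conn_id: int
    in_persist: bool        # peer is advertising a zero window
    persist_seconds: float  # time spent in persist state so far
    probes_acked: bool      # zero-window probes are still being ACKed


def pick_victims(conns, under_pressure, needed):
    """Return up to `needed` connections that MAY be torn down.

    Without resource pressure the result is always empty, so a
    connection is never closed merely for being idle a long time.
    """
    if not under_pressure:
        return []
    # Only persist-state connections whose probes are reliably ACKed
    # qualify; everything else is left alone.
    candidates = [c for c in conns if c.in_persist and c.probes_acked]
    # Assumed ordering: reclaim the longest-stalled connections first.
    candidates.sort(key=lambda c: c.persist_seconds, reverse=True)
    return candidates[:needed]
```

With a single user who went home for the day and no resource shortage, 
pick_victims returns nothing and the connection survives until morning. 
Only when the system is out of resources does the stalled connection 
become eligible.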
>
> As has already been stated, the issue is what should the OS do when it 
> runs out of resources.  TCP implementations typically oversubscribe 
> their resources, and run into problems when all the open connections 
> try to use up all the resources that they've been told they can use.  
> In this situation, the OS has to figure out some way to free up 
> resources.  There may be some things it can do without killing 
> connections (e.g., flush TCP resequencing queues), but usually that 
> won't be sufficient if you have a runaway or malicious source that is 
> causing the resource problem in the first place.  In this situation, 
> anything the OS decides to do, including killing TCP connections, is 
> at the discretion of the OS, and I don't view that as violating any 
> RFC.  You're out of resources, you have to do something.  This is not 
> a TCP protocol issue, it is an OS implementation issue.
True. But it is TCP's insistence on keeping the connection open that 
causes the OS to run out of resources in the first place, even when the 
reason for keeping the connection open (unreliable ACKs) does not apply.

I know there is very little support for this argument, but for someone 
reading the RFC there is enough confusion about whether the connection 
can be cleared or not. Why not change the MUST to a MAY for reliable ACKs?

People on this mailing list have been arguing about whether connections 
can even be cleared, and this is the tcpm mailing list!! Everybody here 
is a TCP expert. Does that not point to a problem in the language of 
the RFC?

/mahesh