Re: Summary of responses so far and proposal moving forward[WasRe: [tcpm] Is this a problem?]

MURALI BASHYAM <murali_bashyam@yahoo.com> Tue, 27 November 2007 23:51 UTC

Return-path: <tcpm-bounces@ietf.org>
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com) by megatron.ietf.org with esmtp (Exim 4.43) id 1IxADH-00030n-9R; Tue, 27 Nov 2007 18:51:35 -0500
Received: from tcpm by megatron.ietf.org with local (Exim 4.43) id 1IxADG-00030f-PI for tcpm-confirm+ok@megatron.ietf.org; Tue, 27 Nov 2007 18:51:34 -0500
Received: from [10.90.34.44] (helo=chiedprmail1.ietf.org) by megatron.ietf.org with esmtp (Exim 4.43) id 1IxADG-00030N-D2 for tcpm@ietf.org; Tue, 27 Nov 2007 18:51:34 -0500
Received: from web31708.mail.mud.yahoo.com ([68.142.201.188]) by chiedprmail1.ietf.org with smtp (Exim 4.43) id 1IxADF-0007lf-DO for tcpm@ietf.org; Tue, 27 Nov 2007 18:51:34 -0500
Received: (qmail 53304 invoked by uid 60001); 27 Nov 2007 23:51:32 -0000
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=X-YMail-OSG:Received:X-Mailer:Date:From:Subject:To:Cc:MIME-Version:Content-Type:Message-ID; b=A49GALXot5fD+npU5nuLLob3ga0H59INSyTI5ui0M4mabPkJJal1jrp8r59ipFhzhuLd94sxzzVvTOPyE/5iNtYtVVy1prGVDUCpm0nwAdPbq5+P0vjX6dFsh35MGZ195JJEC8IKTc1ksf7W/Nde4IVKggcKcwPk9/Sc5ST2ZAo=;
X-YMail-OSG: qkGiRQcVM1k35G1PtgLlaqub6fYJ3P33UlzQJ4snGv.XCxFVzgNVR1x.EMSuauSO9DAkWg5zmbfBfeJ8yrmiMDIelrYzyf9oNd4ltR5__y0g4Ry4eRU-
Received: from [69.3.29.18] by web31708.mail.mud.yahoo.com via HTTP; Tue, 27 Nov 2007 15:51:32 PST
X-Mailer: YahooMailRC/818.27 YahooMailWebService/0.7.157
Date: Tue, 27 Nov 2007 15:51:32 -0800
From: MURALI BASHYAM <murali_bashyam@yahoo.com>
Subject: Re: Summary of responses so far and proposal moving forward[WasRe: [tcpm] Is this a problem?]
To: David Borman <david.borman@windriver.com>, Mark Allman <mallman@icir.org>, Joe Touch <touch@isi.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Message-ID: <362265.35986.qm@web31708.mail.mud.yahoo.com>
X-Spam-Score: 0.0 (/)
X-Scan-Signature: 21bf7a2f1643ae0bf20c1e010766eb78
Cc: TCP Maintenance and Minor Extensions WG <tcpm@ietf.org>
X-BeenThere: tcpm@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/tcpm>, <mailto:tcpm-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:tcpm@ietf.org>
List-Help: <mailto:tcpm-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/tcpm>, <mailto:tcpm-request@ietf.org?subject=subscribe>
Errors-To: tcpm-bounces@ietf.org


----- Original Message ----
> From: David Borman <david.borman@windriver.com>
> To: Mark Allman <mallman@icir.org>; Joe Touch <touch@isi.edu>
> Cc: TCP Maintenance and Minor Extensions WG <tcpm@ietf.org>
> Sent: Tuesday, November 27, 2007 6:58:40 AM
> Subject: Re: Summary of responses so far and proposal moving forward[WasRe: [tcpm] Is this a problem?] 
> 
> Ok, I haven't chimed in yet on this conversation.
> 
> While I agree with the document on the identification of the problem,  
> I disagree with the proposed solution (changing TCP to time out  
> connections in persist state).  Having a connection stay in persist  
> state for long periods of time (i.e., zero window probes continue to  
> be ACKed) by itself is not a bad thing.  That is how TCP was designed  
> to work.  Connections can survive through lots of adversity.  If a  
> connection is stuck because it is waiting for user action and the user  
> walked away and went home for the day, he should be able to come back  
> the next morning and do what needs to be done, and then the connection  
> will continue.

This is fine for a telnet/rlogin type of connection, but it is never the normal
behaviour for web or HTTP/XML types of connections (you don't click on a link
today and come back the next day to see the page). And this is precisely the
environment and circumstance in which the RFC mandates that the TCP sender
persist indefinitely. Things have changed a lot since those days. Today, TCP
sender behaviour has to accommodate a wide range of application usage. For the
web application, I can tell you with confidence that a sender persisting
indefinitely for several hours, and for a large number of connections at that,
definitely indicates anomalous receiver behaviour.

On a per-application basis, we are suggesting that the application have the
flexibility to enable this different TCP behaviour of not persisting
indefinitely, or that the administrator enable it globally for the system.
The default behaviour remains the same as today.
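To make the proposal concrete, here is a minimal sketch of the opt-in
behaviour in Python. The class and method names are illustrative only, not
from any real TCP stack; the point is that the limit is off by default and
only takes effect when the application (or administrator) sets it:

```python
class PersistPolicy:
    """Sketch of the proposed per-connection, opt-in limit on how long a
    TCP sender will persist against a zero window. Illustrative names."""

    def __init__(self, limit_seconds=None):
        # None = today's default: persist indefinitely, per RFC 1122.
        self.limit = limit_seconds
        self.persist_since = None

    def enter_persist(self, now):
        # Called when the peer first advertises a zero window.
        if self.persist_since is None:
            self.persist_since = now

    def leave_persist(self):
        # Called when the peer's window opens again.
        self.persist_since = None

    def should_abort(self, now):
        # True only once an opted-in limit is exceeded; never by default.
        if self.limit is None or self.persist_since is None:
            return False
        return (now - self.persist_since) >= self.limit

# Default behaviour: never aborts, no matter how long persist lasts.
default = PersistPolicy()
default.enter_persist(0)
print(default.should_abort(10**6))   # False

# Application opted in to a 60-second limit.
limited = PersistPolicy(limit_seconds=60)
limited.enter_persist(0)
print(limited.should_abort(30))      # False
print(limited.should_abort(61))      # True
```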

> 
> As has already been stated, the issue is what should the OS do when it
> runs out of resources.  TCP implementations typically oversubscribe  
> their resources, and run into problems when all the open connections  
> try to use up all the resources that they've been told they can use.   
> In this situation, the OS has to figure out some way to free up  
> resources.  There may be some things it can do without killing  
> connections (e.g., flush TCP resequencing queues), but usually that  
> won't be sufficient if you have a runaway or malicious source that is  
> causing the resource problem in the first place.  In this situation,  
> anything the OS decides to do, including killing TCP connections, is  
> at the discretion of the OS, and I don't view that as violating any  
> RFC.  You're out of resources, you have to do something.  This is not  
> a TCP protocol issue, it is an OS implementation issue.

There are two finite resources at stake here: TCP sender (not receiver)
buffer resources and TCP connections.

The OS manages resources for the entire system, including other protocols
(UDP, for example, could be running on the system too). TCP can frequently
reach the limit on the resources it is allowed to use without the OS being
able to detect it, since from a system point of view resources are still
available. The OS certainly cannot detect that the TCP connection pool is
exhausted. It seems that TCP should clean up its own resources, taking total
connection availability and the total buffer pool into account. A clean
solution shouldn't lump the OS and TCP together.

> 
> Now, it might be that connections that have been in persist state for  
> a long period of time are good candidates for the OS to abort to free  
> up resources.  But doing that has to be a decision of the OS or the  
> application, not of TCP.  TCP can keep track of how long connections  
> are in persist state, so that if the OS or application asks, it can  
> judiciously choose which ones are the best to abort.

Since you are going down this path, what's wrong with TCP doing it by itself,
having received the go-ahead from the application and/or the administrator of
the system? It seems as if we are going to extreme lengths, playing with
system-boundary definitions, to avoid making a change where it hurts most
(TCP).
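Either way, the selection step you describe is simple to sketch. Assuming TCP
records when each connection entered persist state, choosing the best abort
candidates could look like this (illustrative Python, not from any
implementation; a real stack would also weigh buffer occupancy, priority,
and so on):

```python
def pick_abort_candidates(connections, now, count=1):
    """Given (conn_id, persist_since_or_None) pairs, return the IDs of the
    connections that have been in persist state the longest."""
    durations = [(now - since, cid)
                 for cid, since in connections
                 if since is not None]          # skip non-persisting conns
    durations.sort(reverse=True)                # longest in persist first
    return [cid for _, cid in durations[:count]]

# 'a' is not in persist state; 'c' has persisted longest, then 'd'.
conns = [("a", None), ("b", 100.0), ("c", 40.0), ("d", 70.0)]
print(pick_abort_candidates(conns, now=200.0, count=2))  # ['c', 'd']
```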

> 
> Let's look at keep-alives.  They are not part of the TCP  
> specification.  They aren't perfect.  In RFC 1122, we acknowledged  
> their existence, and placed restrictions on them.  They must default to
> off.  The default interval for sending keep-alives must be at least 2  
> hours.  You don't drop a connection due to just one missed keep- 
> alive.  "A TCP keep-alive mechanism should only be invoked in server  
> applications that might otherwise hang indefinitely and consume  
> resources unnecessarily if a client crashes or aborts a connection  
> during a network failure."  But they do serve a useful purpose.
> 
> In the end, it is the responsibility of the application to place  
> limits on its TCP connections.  If the OS provides a simple way for  
> the application to say "ABORT this TCP connection if it remains in  
> persist state (or idle state, or...) for more than X period of time",  
> I don't have any objection to that.  That's an agreement between the  
> application and the OS.  It has the nice advantage that the OS knows  
> what to do with the connection after the application has written all  
> its data and closed its side of the connection, and hence is no longer  
> able to ABORT the connection.  When the system runs out of resources,  
> it is the responsibility of the OS to decide how to deal with that  
> situation.  If TCP is consuming large amounts of resources, then the  
> OS will have to have some way to tell TCP how to free up resources,  
> including ABORTing connections.

Why should it be the OS and not TCP doing the abort? It is, after all,
TCP's responsibility to abort a connection. Or by "OS" do you mean the
TCP implementation here?
> 
> There have always been ways that a TCP implementation can tie up  
> resources, and we've been working to mitigate those things all along.  
> 
> (The first one I remember dealing with was the "send each octet in a  
> separate packet, but don't send the first octet".  That tied up  
> resources on BSD on the TCP resequencing queue.) One difference  
> between now and 15-20 years ago, is that back then many of the  
> resource issues were not intentional, but due to poorly written  
> applications or just new scenarios that hadn't been exercised before. 
> 
> But what hasn't changed is that the problems are usually due to  
> implementation issues, not problems with the TCP protocol.  And that  
> holds true in this case.
> 

In this instance, the implementation issue stems from the protocol definition
itself: the TCP implementation has faithfully followed the RFC here, i.e.,
persist indefinitely and do not abort. For the OS to abort connections
without considering connection state and context would definitely lead to
non-compliant TCP behaviour. I don't agree that aborting an idle connection
falls into the same category as this issue.
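For reference, the RFC 1122 keep-alive rules quoted above map directly onto
the standard sockets API. A minimal sketch: SO_KEEPALIVE is portable and off
by default, while the probe-interval knob (shown here via the Linux-specific
TCP_KEEPIDLE name, used only as an illustration) is platform-dependent:

```python
import socket

# Keep-alives "must default to off" (RFC 1122): a fresh socket has them disabled.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
before = s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
print(before)        # 0 -- disabled by default

# The application explicitly opts in.
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
after = s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
print(after != 0)    # True

# RFC 1122 also says the default probe interval must be at least two hours.
# On Linux the idle time before the first probe is tunable via TCP_KEEPIDLE:
if hasattr(socket, "TCP_KEEPIDLE"):
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 7200)  # seconds
s.close()
```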


 






_______________________________________________
tcpm mailing list
tcpm@ietf.org
https://www1.ietf.org/mailman/listinfo/tcpm