Re: [tcpm] draft-ananth-tcpm-persist-00.txt as a WG document

(WG co-chair hat off)

First, I agree with everything that Ted said.

It might be helpful in this discussion to know why the wording on  
connections in persist state was put into RFC 1122 in the first  
place.  Everything that is in 1122 and 1123 is in there for a reason,  
whether or not that reason is included in the document.

In the case of connections in persist state, my fuzzy recollection was  
that there were TCP implementations that were timing out and  
terminating TCP connections stuck in persist state, the same way that  
they timed out and terminated non-responsive connections.  An example  
scenario where this is bad is if you have a TCP connection open to a
printer, and the printer has stopped accepting data because it is  
waiting for someone to load more paper; you don't want TCP arbitrarily  
timing out and closing that connection, just because the printer can't  
accept any more data.  The printer might sit without paper for a  
couple of hours over a long lunch break, or overnight, and once  
someone comes back and reloads paper, the printer should be able to  
continue receiving data on the TCP connection, and continue printing where
it left off.

That's an example of why RFC 1122 explicitly calls out keeping  
connections in persist state open, and is also an example of why *TCP*  
does not have enough information by itself to decide to shut down a  
connection in persist state.
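
To make the printer scenario concrete, here is a minimal sketch using
Python and loopback sockets.  The "printer" is just a socket that stops
reading; the function names and buffer sizes are my own choices for
illustration, and the script cannot observe the kernel's persist timer
directly.  It only shows that the sender stalls while the receiver is
not reading, and that the same connection carries data again once the
receiver resumes.

```python
import select
import socket


def fill_until_blocked(sender, chunk=b"x" * 65536, quiet=0.5):
    """Send without blocking until the socket stays unwritable for `quiet` seconds."""
    sender.setblocking(False)
    total = 0
    while True:
        try:
            total += sender.send(chunk)
        except BlockingIOError:
            # Not writable right now; give the kernel a moment before concluding
            # that the receiver has really stopped accepting data.
            _, writable, _ = select.select([], [sender], [], quiet)
            if not writable:
                return total


def stalled_printer_demo():
    """Return (bytes queued while the receiver was stalled, bytes accepted afterwards)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    sender = socket.create_connection(listener.getsockname())
    printer, _ = listener.accept()
    try:
        # "Out of paper": the printer reads nothing, so its receive buffer
        # fills and the sender can no longer queue more data.
        queued = fill_until_blocked(sender)

        # "Paper reloaded": the printer drains everything queued so far.
        printer.settimeout(10)
        remaining = queued
        while remaining > 0:
            data = printer.recv(min(remaining, 65536))
            if not data:
                break
            remaining -= len(data)

        # The connection was never closed, so the sender simply continues.
        select.select([], [sender], [], 10)
        resumed = sender.send(b"next page")
        return queued, resumed
    finally:
        for s in (sender, printer, listener):
            s.close()
```

Nothing in the sender's TCP can tell whether the stall is a printer out
of paper or a dead application; that knowledge lives above TCP.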

I do understand that in many TCP stacks, you can create a TCP socket,  
write a bunch of data and close the socket, and TCP will keep the  
state and data, and continue to send the queued up data.  If the  
connection goes into persist state, there is no longer an application  
associated with the socket to say this is taking too long and abort the
connection, hence as long as the other side responds to the probes,  
the state & data stay around.

But terminating that connection is an OS implementation issue of how  
to reclaim data when the system is out of resources.  If it decides to  
abort existing TCP connections, it could choose to abort first those  
connections in persist state where the application has closed the  
socket.  Or it might have some other criteria.  And in terms of the  
protocol definition, as Ted points out, the way you do that is with an  
ABORT.  Perhaps the OS should provide an interface to abort arbitrary  
TCP connections, so that the sys admin can kill off any TCP  
connection.  But those are all implementation details; they have nothing to
do with how the TCP protocol operates.
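
As an illustration of what an abort looks like from the sockets API,
here is a minimal sketch in Python.  It assumes a BSD-style stack where
SO_LINGER takes a struct of two ints (as on Linux); with linger enabled
and a zero timeout, close() discards queued data and resets the
connection instead of doing an orderly close.  The helper names are
hypothetical, and this is only one way an application can reach the
abort that RFC 793 defines, not a statement about how any particular OS
does resource recovery.

```python
import socket
import struct


def abort(sock):
    """Abortive close: linger on with a zero timeout, then close()."""
    # struct linger { int l_onoff; int l_linger; } is assumed here (Linux/BSD layout).
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    sock.close()


def abort_demo():
    """Abort one end of a loopback connection and report what the peer observes."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    try:
        abort(server_side)          # works regardless of the connection's state
        client.settimeout(5)
        try:
            data = client.recv(1)
            return "eof" if data == b"" else "data"
        except ConnectionResetError:
            return "reset"
    finally:
        client.close()
        listener.close()
```

The call does not care what state the connection is in, which is the
point: the decision about when to use it belongs to whoever holds the
socket, not to TCP.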

If the other side is being malicious, it can just as easily  
acknowledge one byte of data every so often to keep the connection  
from staying perpetually in persist state, effectively defeating any  
remediation based on looking at how long a connection has been in  
persist state.  So, the connection is making progress now, but  
slowly.  So do we now need to call out how to deal with connections  
that are stuck with very slow progress, in addition to those that are  
making no progress?  It's a slippery slope.
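
A small simulation shows why a timer on persist state alone is easy to
defeat.  Everything here is hypothetical: PersistWatch is an imagined
watchdog that flags a connection after `limit` seconds of continuously
zero window, and the peer's behaviour is modelled as a function from
time to advertised window.  It is a sketch of the argument, not of any
real stack.

```python
from typing import Callable, Optional


class PersistWatch:
    """Hypothetical watchdog: flag after `limit` seconds of continuous zero window."""

    def __init__(self, limit: float) -> None:
        self.limit = limit
        self.since: Optional[float] = None   # when the current zero-window spell began

    def observe(self, now: float, window: int) -> bool:
        """Record the advertised window at time `now`; return True if the limit is hit."""
        if window > 0:
            self.since = None                # any progress resets the clock
            return False
        if self.since is None:
            self.since = now
        return now - self.since >= self.limit


def flagged(limit: float, window_at: Callable[[float], int],
            duration: float, step: float = 1.0) -> bool:
    """Sample the peer once per `step` seconds; report whether it is ever flagged."""
    watch = PersistWatch(limit)
    t = 0.0
    while t <= duration:
        if watch.observe(t, window_at(t)):
            return True
        t += step
    return False


def stuck(t: float) -> int:
    """Peer that never opens its window."""
    return 0


def trickle(t: float) -> int:
    """Peer that accepts a single byte every 30 seconds and nothing otherwise."""
    return 1 if int(t) % 30 == 0 else 0
```

With a 60 second limit over an hour, the stuck peer is flagged and the
trickling peer never is, even though it ties up the same resources.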

It might be useful in the draft to note that this is just one specific  
example of how a malicious client can tie up resources on the server,  
and that the server application or OS is perfectly free to terminate  
any TCP connection in any state for any reason, and that a connection  
stuck in persist state is just one example.

My personal feeling is that there isn't a problem here that is crying  
out to be documented; it is an implementation issue of how to deal
with a lack of resources.  On the other hand, if this item in 1122 has  
caused implementors of resource recovery to treat connections in  
persist state differently from other TCP connections, then I have no
objection to an informational document that can give some  
clarification.  And it should be vetted through the TCPM WG.

			-David Borman

On Oct 3, 2008, at 1:37 PM, Ted Faber wrote:

> I don't mean to be answering for John, but your comments interested me.
>
> On Thu, Oct 02, 2008 at 05:13:50PM -0700, Murali Bashyam wrote:
>>
>>> From: John Heffner [mailto:johnwheffner@gmail.com]
>>> I have reservations about moving forward with this draft as a wg
>>> document.  While the information is technically correct, it is overly
>>> specific to the point of possibly being misleading rather than
>>> clarifying.  An operating system may terminate a tcp connection at any
>>> time, not just when the connection is in the persist state.  Further,
>>> I would actually argue that terminating a connection *because* it's in
>>> the persist state is a bad idea and should be discouraged.  (The
>>> reason for termination should be for its use of resources, not TCP's
>>> state.)
>>
>> No implementation today (the well-known ones BSD, Windows and Linux)
>> terminates the TCP connection in that persist state as long as ACKs
>> are being reliably received from the peer. Those three implementations
>> are not doing what you are saying they should be doing, and if it's
>> crystal clear from the standard that they should be doing so, why
>> aren't they?
>
> Your comment is a little ambiguous.  John says that he believes that
> persist state is not necessarily a good marker for aborting connections
> when resources are scarce.  To take him to task because implementations
> are not using it as a marker - because the implementers agreed with
> his position - is a little unfair, IMHO.
>
> On the WG item:
>
> As a bigger picture issue, I find it helpful to remember that
> protocol standards, like 1122, are primarily interoperation documents.
> As such, they tend to focus on what protocol implementers must do to
> talk with one another successfully, not to nail down all possible
> choices.
>
> Once we start offering advice (or stronger) to developers, we're leaving
> the interoperability domain.  We, as a standards body, have to assume
> that implementers know their application and their environment better
> than we do, and that they can and will make appropriate decisions if we
> give them the room to do so.
>
> Where a standard has unnecessarily tied the hands of implementers, that
> standard should be changed, IMHO.  I am also sympathetic to clarifying
> the standards body's intent where poor language has obscured it.  I
> don't see that kind of confusion in the text the authors are addressing
> (that is RFC 1122 Section 4.2.2.17), but reasonable people can disagree
> on such things.
>
> As such, taking on this work item - clarifying 1122 4.2.2.17 - doesn't
> excite me, but I don't oppose it.
>
> The draft should, IMHO, point out that 1122 has nothing to say about
> resource management.  Designers and implementers are free to be as
> clever or foolish as they'd like.
>
> End of thoughts on the WG item.
>
> As for the draft itself, I'm concerned that it seems to be advocating a
> particular resource management position more strongly than simply
> pointing out that RFC1122 should not affect resource allocation
> decisions.  I also see some misplaced standards language that is
> somewhat confusing.  I'm looking at this paragraph in Section 2:
>
>
> 	An extensive discussion took place recently about this issue on
> 	the TCPM WG mailing list [TCPM].  The general opinion seemed to
> 	be that terminating a TCP connection in persist condition does
> 	not violate RFC 1122.  In particular the operating system, a
> 	resource manager, or an application can instruct TCP to abort a
> 	connection in the persist condition.  TCP itself SHOULD not take
> 	any action and continue to keep the connection open as mandated
> 	by RFC 1122 unless otherwise instructed to do so.  The exact
> 	mechanism by which the instruction to abort the connection is
> 	conveyed to TCP is an implementation decision and falls beyond
> 	the scope of the current memo.
>
> There's no instruction going on, and one doesn't request an abort.
> Abort is a command in the TCP interface (defined in RFC 793, p. 50) that
> destroys a TCP connection.  Abort works in any state.  Anything with
> access to the TCP interface, including the OS, a resource manager, or an
> application, can call it.  A TCP abort is "the exact mechanism by which
> the instruction to abort the connection is conveyed to TCP", and the
> only reason that it's outside the scope of this document is that it's
> defined in 793.
>
> In light of that I think that the sentence about "TCP itself SHOULD
> not[sic] take any action..." is more confusing than the 1122 section.
> It seems to imply that TCP might react differently to an abort than
> the single paragraph in 793, and I don't think that's the case.
>
> Now, a strictly defined abort command seems to somewhat contradict my
> earlier comments about allowing implementers some freedom.  This is a
> spot where the standards body has required a specific tool to be
> available to applications (and OSes and resource managers): there must
> be a way to annihilate a TCP connection from outside without regard to
> its state or anything else.  The standard is silent about how the
> command is used; that's the flexibility here.
>
>
> -- 
> Ted Faber
> http://www.isi.edu/~faber           PGP: http://www.isi.edu/~faber/pubkeys.asc
> Unexpected attachment on this mail? See http://www.isi.edu/~faber/FAQ.html#SIG
