Re: [tcpm] On the implementation of TCP urgent data (IETF Internet Draft)

Jerry Leichter <leichter@lrw.com> Sat, 28 February 2009 21:51 UTC


I haven't seen a copy of the paper, so I have to respond indirectly.

Microsoft confirmed the problem.  It was present in all their stacks  
prior to Vista.  Based on our repeated complaints, they eventually  
produced a patch for XP.  For some time, it was one of their "if you
know to ask we'll give it to you" patches.  For all I know, they
eventually decided to roll it into some regular patch release.

The "OOB data leaking in line" has many causes.  I know of at least  
two completely different mechanisms:

	- On the sender, the problem can occur because you can only send
		one OOB byte per segment, but there is no correlation
		between segments and OOB write requests.  Consider the
		situation where the segmentation code wants to send a
		segment, and within the range of bytes it wants to send,
		there are two that were marked OOB.  The only way to handle
		this correctly is to send two segments, even if one of them
		is artificially short - perhaps even a single byte.  I
		haven't checked implementations in detail, but I've
		never seen this mentioned and I would be surprised if
		implementations get this right.

	- On the receiver, there is a messy case, the details of which
		I forget, in which the implementation doesn't keep enough
		state to properly handle two "close" OOB bytes.  If you
		look in Stevens, he actually reproduces a comment from
		the original BSD code which describes the problem, but
		basically says "you lose".  It's not clear anyone ever
		bothered to fix this.
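
To make the sender-side case concrete, here is a rough sketch of the
splitting rule described above: never let a segment cover more than one
OOB-marked byte, even if that means cutting it short.  The function name,
the (start, end, urgent) tuple layout and the idea that the segmentation
code can see a list of OOB offsets are all my assumptions for
illustration; as far as I know, no real stack exposes this.

```python
def split_segments(length, oob_offsets, mss):
    """Split `length` queued bytes into (start, end, urgent) segments.

    Hypothetical model: `oob_offsets` are the stream offsets (each in
    range(length)) of bytes written with the OOB flag.  `urgent` is the
    offset of the single OOB byte the segment covers, or None.
    """
    marks = sorted(set(oob_offsets))
    segments = []
    pos = 0
    next_mark = 0  # index of the first OOB mark not yet assigned
    while pos < length:
        end = min(pos + mss, length)
        urgent = None
        if next_mark < len(marks) and marks[next_mark] < end:
            urgent = marks[next_mark]
            next_mark += 1
            # A second OOB byte in range forces an artificially short
            # segment that stops just before it.
            if next_mark < len(marks) and marks[next_mark] < end:
                end = marks[next_mark]
        segments.append((pos, end, urgent))
        pos = end
    return segments


print(split_segments(5, [0, 1], mss=8))
```

With OOB bytes at offsets 0 and 1, the first segment shrinks to a single
byte, which is exactly the "artificially short" case.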

The fundamental issue is this:  OOB bytes exist at the socket API  
level, but there is no such concept at the TCP level.  The urgent  
pointer exists at the TCP level, but it's basically invisible at the  
socket API level.  The theory is that you can use the urgent pointer  
to implement OOB bytes - and, in fact, that's true.  I've convinced  
myself of this by writing out a semi-formal definition of the  
theoretical semantics at the two levels and sketching an  
implementation that actually presents the right semantics at the  
socket API level.  I'm pretty sure that no one actually implements  
this right other than Microsoft (ironically) because it requires the  
ability to allocate space for OOB bytes at the receiver dynamically,  
and I know of no implementations that do so other than Microsoft's.   
By counting the allocated bytes against the window size you are
willing to offer, you can avoid the problem that MS ran into, where
there's no way to bound the amount of space allocated.  The
implementation is potentially expensive, but it's only expensive for  
applications that actually send many OOB bytes - which is a reasonable  
tradeoff.
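
A minimal sketch of the bounding idea, not of any real stack: OOB bytes
are allocated on demand at the receiver, but every one of them is charged
against the same budget that backs the advertised window.  The class
name, its methods and the `urgent_index` parameter are hypothetical, and
the model assumes at most one OOB byte per delivered segment.

```python
from collections import deque


class ReceiveBuffer:
    """Hypothetical receive buffer: OOB bytes count against the window."""

    def __init__(self, capacity):
        self.capacity = capacity   # total bytes we are willing to hold
        self.inline = bytearray()  # ordinary in-band data
        self.oob = deque()         # OOB bytes, allocated on demand

    def advertised_window(self):
        # In-band and OOB bytes consume the same budget, so the space
        # used for OOB bytes can never grow without bound.
        return self.capacity - len(self.inline) - len(self.oob)

    def deliver(self, payload, urgent_index=None):
        """Accept one segment; urgent_index marks its OOB byte, if any."""
        if len(payload) > self.advertised_window():
            raise ValueError("segment exceeds advertised window")
        for i, byte in enumerate(payload):
            if i == urgent_index:
                self.oob.append(byte)
            else:
                self.inline.append(byte)

    def read_oob(self):
        # Oldest unread OOB byte, or None if there is none.
        return self.oob.popleft() if self.oob else None


buf = ReceiveBuffer(capacity=8)
buf.deliver(b"ab!cd", urgent_index=2)
print(bytes(buf.inline), buf.advertised_window())
```

A sender that floods OOB bytes simply sees the window close, the same as
for ordinary data.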

What actually happens is that the TCP stack implementers seem never to
have believed in OOB on general principles, and in any case the TCP
stack implementers (the network guys) and the socket API implementers
(the OS guys) don't seem to talk to each other much.  So
this delicate implementation issue, which lives exactly at the  
interface between the two, gets lost in the shuffle.  (Note that the  
multiple-OOB-bytes-in-a-segment problem probably cannot be solved  
without changing the TCP stack/socket API interface, since typically  
there is no way for the segmentation layer even to *see* that there  
are multiple outstanding OOB bytes:  The only available interface is a  
single urgent pointer value that gets passed across.)

There's an even more fundamental problem worth pointing out:  The
actual urgent pointer protocol has a flaw.  The UP is a 16-bit offset
into the stream, relative to the sequence number of the segment that
carries it.  As long as no segment can be more than 2^16 bytes - as
was true when TCP was designed, because the window size is
represented in 16 bits - this works.  But if you use scaled
windows, it's possible to be in a situation where you need to send a  
UP value that cannot be represented, because it's too far into a large  
segment.  There's an obvious work-around - send a short segment when  
this happens - but I doubt anyone does this.  In fact, I would guess  
that the offset gets calculated ignoring the 16-bit limitation and the  
actual UP sent is the lower 16 bits of the true UP.  Obviously, this  
can cause severe problems to any use of TCP that actually relies on  
the UP being correct!
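
The arithmetic is easy to check.  The sketch below shows what the guessed
truncation would produce and how long the short segment in the
work-around needs to be.  Both function names are mine, and the
truncating behaviour is the guess from the paragraph above, not something
I have confirmed in any stack.

```python
UP_MAX = 0xFFFF  # largest value the 16-bit urgent pointer field can hold


def naive_urgent_pointer(true_offset):
    """What a stack would send if it just kept the low 16 bits."""
    return true_offset & UP_MAX


def bytes_to_send_first(true_offset):
    """Length of a short segment to send first so that the offset,
    measured from the following segment, fits in 16 bits."""
    return max(0, true_offset - UP_MAX)


offset = 70000
print(naive_urgent_pointer(offset))          # wraps modulo 2**16
print(offset - bytes_to_send_first(offset))  # representable offset
```

For a true offset of 70000 the truncated field silently points 65536
bytes too early, while sending 4465 bytes first leaves an offset of
exactly 65535.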

Finally, there's yet another level of problem:  Some router-like  
devices (Cisco PIX firewalls - in their default configuration! - are a  
known example) simply turn off the Urgent bit!  This is to block a  
very old (10+ years?) Windows bug.  However, this is a disaster for  
any program that actually relies on OOB data/the UP.
                                                         -- Jerry


On Feb 28, 2009, at 3:36 PM, Fernando Gont wrote:

> Bill Squier wrote:
>
> (Added Andrew Yourtchenko (draft co-author) to the recipient's list)
>
> Comments inline...
>
>
>> I haven't had time to read the article completely, but I did skim the
>> Windows section, and Christos is correct.
>
> IIRC, we did the Windows tests with cygwin. Maybe this is what led to
> different results?
>
> That said, the next version of our Internet Draft will include text
> describing the buggy implementations you are referring to.
>
>
>> Further, I noticed that some of your analysis discusses OOB data
>> bleeding in-line.  This is almost certainly caused by an interaction
>> (on the _sender_) of Nagle and the fact that TCP defines only a
>> single OOB pointer.  The receiver is not returning bytes which it
>> knows to be OOB bytes inline, the _sender_ is accidentally placing
>> more than a single byte of OOB in each packet that it sends.
>
> There is no problem with that. TCP just provides a mechanism for
> marking the end of urgent data. Just a mark.
>
>
>
>> The receiver has no way to know that, as the only means of
>> communication about OOB data between sender and receiver is a single
>> pointer.
>
> Exactly. Any data that's before the pointer should be considered
> "urgent".
>
> Thanks!
>
> Kind regards,
> -- 
> Fernando Gont
> e-mail: fernando@gont.com.ar || fgont@acm.org
> PGP Fingerprint: 7809 84F5 322E 45C7 F1C9 3945 96EE A9EF D076 FFF1