Re: [tcpm] Congestion control in face of ICMP unreachable messages

Daniel Schaffrath <daniel.schaffrath@mac.com> Thu, 20 September 2007 19:22 UTC

In-Reply-To: <20070914005315.GL13168@hut.isi.edu>
References: <8B61F72F-2F75-4388-976F-9748F8784AB3@mac.com> <20070817162542.GA2511@hut.isi.edu> <4AD719E5-AF65-4D4C-8BE8-B070793F69C3@mac.com> <20070907004335.GB48227@hut.isi.edu> <F0EB06F1-58B9-49B1-8CC0-AFEF49ABC276@mac.com> <20070914005315.GL13168@hut.isi.edu>
Mime-Version: 1.0 (Apple Message framework v752.3)
Content-Type: text/plain; charset="US-ASCII"; delsp="yes"; format="flowed"
Message-Id: <12EA28CC-C380-46EB-8886-C7BA9EC86148@mac.com>
Content-Transfer-Encoding: 7bit
From: Daniel Schaffrath <daniel.schaffrath@mac.com>
Subject: Re: [tcpm] Congestion control in face of ICMP unreachable messages
Date: Thu, 20 Sep 2007 21:21:14 +0200
To: Ted Faber <faber@ISI.EDU>
Cc: tcpm@ietf.org

On 2007/09/14, at 02:53, Ted Faber wrote:
> On Wed, Sep 12, 2007 at 08:23:32PM +0200, Daniel Schaffrath wrote:
>> On 2007/09/07, at 02:43, Ted Faber wrote:
[...]
>>> If TCP has gotten to the point where a retransmission timer has gone
>>> off not only has the transmission of the packet to be retransmitted
>>> failed, but enough other packets have been lost that the fast
>>> retransmit algorithm (3 dupacks) has also not happened (yes, the
>>> window needs to have grown to 4...).  In short, there's something
>>> very wrong with the communication between the endpoints, and drastic
>>> action is called for.  Specifically, the sender acts as though the
>>> connection is almost new - window is as small as possible and the
>>> ssthresh is halved.
>>>
>>> Your text above sounds like you're saying that if a sender has heard
>>> the network say that there's a connectivity problem between the
>>> sender and receiver (ICMP destination unreachable) the sender is
>>> entitled to react less conservatively, even though all the evidence
>>> is that the link is congested.
>> From my understanding the evidence is just that some link is (was)
>> not working, and that is not the same as a link being congested.
>> There is even some slight indication that there is no congestion in
>> the network (or at least in that very part of it), as the ICMP
>> message (as well as the segment eliciting the ICMP message) was not
>> dropped and got to the TCP source.
>
> So, the TCP stack really just knows that packets are not being
> acknowledged, and the current design interprets that as congestion.
> That's a conservative approach, and one people can and do argue about.
> If you'd like to argue about that interpretation of loss, the
> end-to-end mailing list might be a better place to do so.  I mean
> we'll talk about it, too, but I think of it as a bigger picture issue.
I am subscribed to it. My impression is that the (active) audience is
quite the same as on this list... and they all are "holistics" ;)

> Still and all, one could say that the ICMPs count as extra evidence
> that a connection isn't congested but confused and should be treated
> differently.  I'd recommend against treating them this way because:
>
> 	* ICMP operates on a different timescale than TCP.  It may take
> 	  a router longer to decide that there's a host unreachable
> 	  situation than the TCP stack would.  It may also take a router
> 	  longer to detect the opposite.
In my (naive?) understanding they do not necessarily operate on a
different timescale. For instance, it may take a router some time to
execute an ARP request, which may fail, and then it would send a late
ICMP (for the initial early segment; the later ones got queued, then
dropped). But if for whatever reason a port is disconnected (and with
it a whole (sub-)net vanishes from the routing table), the router
might be able to decide immediately (within an RTT) to send an ICMP.
In either case the TCP source can determine the timescale an arriving
ICMP operates on by looking at the sequence numbers therein, provided,
of course, that all nodes involved are non-malicious.
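
To make the sequence-number argument concrete, here is a minimal
sketch of the check I have in mind. It is not taken from any stack;
the function names are made up, and snd_una / snd_nxt stand for the
usual "oldest unacknowledged" and "next to send" sequence numbers.
The idea is only that an ICMP whose quoted TCP header refers to data
that is no longer outstanding is stale (or forged) and can be ignored.

```python
MOD = 1 << 32  # TCP sequence numbers are 32-bit and wrap around


def seq_lt(a, b):
    """True if a strictly precedes b in modulo-2**32 sequence space."""
    return a != b and ((b - a) % MOD) < (1 << 31)


def seq_leq(a, b):
    """True if a equals b or precedes it in sequence space."""
    return a == b or seq_lt(a, b)


def icmp_matches_outstanding(icmp_seq, snd_una, snd_nxt):
    """Check that the sequence number quoted in the ICMP payload lies
    within the unacknowledged data: snd_una <= icmp_seq < snd_nxt."""
    return seq_leq(snd_una, icmp_seq) and seq_lt(icmp_seq, snd_nxt)
```

Note that this only tells whether the ICMP refers to data still in
flight. Sequence numbers alone do not distinguish the original
transmission of a segment from its retransmission.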

> 	* They're really easy to spoof.
I got that in the meantime from Jon's explanation. But if they are
really that easy to spoof, how come RFC 1122 asks to drop the
connection on ICMP port/proto/etc. unreachable messages? Isn't that a
very easy attack then? BTW: I just noticed while browsing the Linux
source that Linux treats port/proto/net/host unreachable messages all
the same, i.e., it flags an error on the socket and that's it.

Anyway: if ICMP were used to skip doubling the RTO, an attacker could
not make a TCP source flood the network this way, since the source
would still retransmit at most once per (undoubled) RTO.
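
As a rough illustration of that bound, the following sketch compares
the retransmission schedule with standard doubling against the
variant I am proposing. Everything here is an assumption for the sake
of the example: the function names, the 60 s cap on the RTO, and the
simplification that either every retransmission elicits an ICMP
host/net unreachable or none does.

```python
def next_rto(rto, icmp_unreachable_seen, max_rto=60.0):
    """Timeout to arm after a retransmission.

    Standard behavior: double the RTO (capped at max_rto).
    Hypothetical variant: leave it unchanged if the retransmission
    was answered by an ICMP host/net unreachable message.
    """
    if icmp_unreachable_seen:
        return rto
    return min(2.0 * rto, max_rto)


def retransmit_times(initial_rto, count, icmp_unreachable_seen):
    """Times (seconds after the first timeout) of `count` successive
    retransmissions of the same segment."""
    times, now, rto = [], 0.0, initial_rto
    for _ in range(count):
        times.append(now)                      # retransmit now
        rto = next_rto(rto, icmp_unreachable_seen)
        now += rto                             # timer fires again later
    return times


print(retransmit_times(1.0, 5, False))  # [0.0, 2.0, 6.0, 14.0, 30.0]
print(retransmit_times(1.0, 5, True))   # [0.0, 1.0, 2.0, 3.0, 4.0]
```

Even if every ICMP were forged, the gap between retransmissions never
drops below the initial RTO in this model, so the source sends one
segment per RTO at worst.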

> I don't think you gain very much, either.  If your application
> believes that the connection has stuttered - a route flap or
> something - the easiest way to fix it may be to reconnect.  I'll bet
> you do this a couple times a day with your browser.  Hitting reload
> on a slow loading page does exactly this.  Other applications may
> value continuity of connection more.
I exhibit exactly this behavior. And I don't like it. I'd like the  
network to solve these issues for me... :)

[...]

>>> Explain to me why ICMP messages indicating that your packets are not
>>> being delivered indicate you should not slow down.
>>
>> I am not saying you shouldn't slow down. After RTO of course you
>> should reset cwnd and halve ssthresh. I was just thinking of skipping
>> (or delaying) doubling RTO if the retransmission after RTO does not
>> simply vanish in the network but is  replied to with an ICMP (host/
>> net) unreachable message. This seems reasonable to me if my above
>> finding is true.
>
> I think it's extra complexity for minimal gain.
> But, I haven't done anything to validate that opinion.  Do you have
> some evidence either way?

Complexity seems almost a dysphemism to me here. The gain is
obviously a connection that restarts much faster. And the gain is
extraordinary... given an adequate scenario, which is of course not
the usual dial-up leaf node attached to the wired Internet but, for
instance, a node that is part of a subnet managed by olsrd (causing
route interruptions and the like).

The question is whether there is some rationale for this behavior.
Actually, I was thinking there was (an ICMP reply to a single
slow-start recovery retransmit indicates no congestion for the first
few hops at least). But in the meantime I found that for zero window
probing (which is comparable to "roadblock" probing) RFC 1122 in
4.2.2.17 recommends exponential back-off as well. Whereas I understand
that exponential back-off is necessary for the _network_ to recover
from severe congestion, I don't understand why exponential back-off is
recommended for recovering from zero windows (which is more or less
_application_ congestion). Of course, for zero window probes which are
not ACKed back-off is essential. There is a DISCUSSION paragraph in
4.2.2.17 which is unfortunately not clear to me. It might be
understood as saying that exponential back-off allows fast recovery
from a zero window condition, although (from my understanding) it does
exactly the opposite. Of course, it may all boil down to defining
words, i.e., exponential back-off being fast by definition, as
complexity theory knows of much worse growth. But "traditionally"
exponential is not fast.
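
To show what I mean by "not fast", here is a small sketch comparing
exponentially backed-off probes with fixed-interval probes. The names
and parameters (first interval of 1 s, factor 2, cap of 60 s) are my
own assumptions, not values from RFC 1122. The model also assumes the
receiver's window-opening ACK got lost, so that only a probe can
reveal the open window.

```python
def probe_times(first_interval, horizon, factor=2.0, max_interval=60.0):
    """Times at which zero-window probes are sent, with the probe
    timer started at t = 0. The interval is multiplied by `factor`
    after every probe and capped at `max_interval`."""
    times, t, interval = [], 0.0, first_interval
    while t + interval <= horizon:
        t += interval
        times.append(t)
        interval = min(interval * factor, max_interval)
    return times


def detection_delay(window_open_time, probes):
    """Delay between the window opening and the first probe sent at
    or after that moment; None if no such probe is in the list."""
    for t in probes:
        if t >= window_open_time:
            return t - window_open_time
    return None


backoff = probe_times(1.0, 100.0)             # [1, 3, 7, 15, 31, 63]
fixed = probe_times(1.0, 100.0, factor=1.0)   # every second
print(detection_delay(31.5, backoff))         # 31.5
print(detection_delay(31.5, fixed))           # 0.5
```

In this model the delay under back-off grows with the time the window
has already been closed, whereas with a fixed interval it stays below
one interval.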

Maybe you have further pointers?

Thank you,
Daniel Schaffrath




