Re: Last Call: TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms to Proposed Standard
Thomas Narten <narten@vnet.ibm.com> Wed, 20 March 1996 20:54 UTC
Received: from ietf.cnri.reston.va.us by IETF.CNRI.Reston.VA.US id aa25573; 20 Mar 96 15:54 EST
Received: from CNRI.Reston.VA.US by IETF.CNRI.Reston.VA.US id aa25569; 20 Mar 96 15:54 EST
Received: from ietf.cnri.reston.va.us by CNRI.Reston.VA.US id aa12968; 20 Mar 96 15:54 EST
Received: from ietf.cnri.reston.va.us by IETF.CNRI.Reston.VA.US id aa25560; 20 Mar 96 15:54 EST
Received: from vnet.ibm.com by IETF.CNRI.Reston.VA.US id aa25555; 20 Mar 96 15:54 EST
Received: from RTP by VNET.IBM.COM (IBM VM SMTP V2R3) with BSMTP id 6916; Wed, 20 Mar 96 15:54:09 EST
Received: by RTP (XAGENTA 4.0) id 4843; Wed, 20 Mar 1996 15:54:22 -0500
Received: from localhost.raleigh.ibm.com by cichlid.raleigh.ibm.com (AIX 3.2/UCB 5.64/4.03) id AA18885; Wed, 20 Mar 1996 15:54:19 -0500
Message-Id: <9603202054.AA18885@cichlid.raleigh.ibm.com>
To: Richard Stevens <rstevens@noao.edu>
Cc: The IESG <iesg@IETF.CNRI.Reston.VA.US>
Subject: Re: Last Call: TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms to Proposed Standard
In-Reply-To: Your message of "Tue, 19 Mar 1996 13:19:06 EST." <9603191319.aa18538@IETF.CNRI.Reston.VA.US>
Date: Wed, 20 Mar 1996 15:54:18 -0400
X-Orig-Sender: iesg-request@IETF.CNRI.Reston.VA.US
Sender: ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Thomas Narten <narten@vnet.ibm.com>
Overall, it is great to see this finally made an RFC. I have some wording suggestions that I think might improve the document. General comment. As a Proposed Standard, this should be telling implementors what to do. There are places where the text comes off more as a suggestion/explanation, or "here is what some implementations do". This document should make it very clear what implementations should do. That is, when is fast retransmit invoked? MUST slow-start be done on LANs? Etc. I'd suggest reorganizing a bit and moving all the general descriptions to the front (including Fast Retransmit and Fast Recovery, plus the history of what was implemented where), and then having a final section that describes the combined algorithms in perhaps even more detail than is already done. The combined algorithms should be very clear about what MUST be done, and what parameters should used. Right now, the current description doesn't seem to fully describe Fast Retransmit/Recovery, for example. The remaining comments are more specific. > The algorithm to avoid this is called slow start. It operates by > observing that the rate at which new packets should be injected into > the network is the rate at which the acknowledgments are returned by > the other end. To be more precise, new packets should be injected at the same rate as packets are leaving the network at the other end of the connection. Received ACKs provide an indication of packets exiting. > Slow start adds another window to the sender's TCP: the congestion > window, called "cwnd". When a new connection is established with a > host on another network, the congestion window is initialized to one > segment (i.e., the segment size announced by the other end, or > the Question: aren't the 'cwnd' calculations typically implemented in bytes rather than segments? I understand that describing everything in segments makes it easier to explain (at a high level), but I'm not sure we want to leave out those details. It would help the document to be more consistent in the units it uses. > default, typically 536 or 512). Each time an ACK is received, the > congestion window is increased by one segment. The sender can > transmit up to the minimum of the congestion window and the > advertised window. The congestion window is flow control imposed by > the sender, while the advertised window is flow control imposed by > the receiver. The former is based on the sender's assessment of > perceived network congestion; the latter is related to the amount of > available buffer space at the receiver for this connection. > > The sender starts by transmitting one segment and waiting for its > ACK. When that ACK is received, the congestion window is > incremented from one to two, and two segments can be sent. When > each of those two segments is acknowledged, the congestion window is > increased to four. This maybe getting too picky, but the above sentence could be interpreted as don't expand the window until both ACKs for two segments have been received. This provides an exponential growth, although it > is not exactly exponential because the receiver may delay its ACKs, > typically sending one ACK for every two segments that it receives. > > At some point the capacity of the internet can be reached, and an > intermediate router will start discarding packets. This tells the > sender that its congestion window has gotten too large. > > Early implementations performed slow start only if the other end was > on a different network. Current implementations always perform slow > start. Be stronger? Say MUST always perform SS? > > 2. Congestion Avoidance > > Congestion can occur when data arrives on a big pipe (a fast LAN) > and gets sent out a smaller pipe (a slower WAN). Congestion can > also occur when multiple input streams arrive at a router whose > output capacity is less than the sum of the inputs. Congestion > avoidance is a way to deal with lost packets. It is described in > [2]. "congestion avoidance" seems to have a number of definitions. I'm not sure of which one VJ uses, but it has always seemed more correct to me use "avoidance" in the context of preventing congestion that leads to packet loss. Once loss has occurred, congestion is present, and "congestion control" takes over. Indeed, doesn't VJ use congestion avoidance to refer specfically to the phase where the window increases linearly? I.e., the window is being expanded slowly under the assumption that is already operating at close to the optimal window size? > > The assumption of the algorithm is that packet loss caused by damage > is very small (much less than 1%), therefore the loss of a packet > signals congestion somewhere in the network between the source and > destination. There are two indications of packet loss: a timeout > occurring and the receipt of duplicate ACKs. > > Congestion avoidance and slow start are independent algorithms with > different objectives. But when congestion occurs TCP must slow down > its transmission rate of packets into the network, and then invoke > slow start to get things going again. In practice they are > implemented together. Better define the two modes/objectives. SS is designed to increase the window (relatively) quickly, back to a "known" "optimal" window size (without sending a window-sized burst of back-to-back packets). CA increases the window much more slowly, with the goal of using more bandwidth that may have recently become available. > > Congestion avoidance and slow start require that two variables be > maintained for each connection: a congestion window, cwnd, and a > slow start threshold size, ssthresh. The combined algorithm > operates as follows: > > 1. Initialization for a given connection sets cwnd to one segment > and ssthresh to 65535 bytes. > > 2. The TCP output routine never sends more than the minimum of cwnd > and the receiver's advertised window. Comparing bytes and segments. Confusing? Would it be possible to use consistent units everywhere? > > 3. When congestion occurs (indicated by a timeout or the reception > of duplicate ACKs), How many duplicate ACKs? Isn't this fast retransmit? Shouldn't it be described independent of this description? It would be nice, for instance, to describe a bit why fast retransmit skips SS. [Oh. I see that comes later in the text.] > one-half of the current window size (the > minimum of cwnd and the receiver's advertised window, but at > least two segments) is saved in ssthresh. Additionally, if the > congestion is indicated by a timeout, cwnd is set to one segment > (i.e., slow start). > > 4. When new data is acknowledged by the other end, increase cwnd, > but the way it increases depends on whether TCP is performing > slow start or congestion avoidance. > > > > Stevens [Page 3] > > INTERNET-DRAFT March 1996 > > > If cwnd is less than or equal to ssthresh, TCP is in slow start; > otherwise TCP is performing congestion avoidance. Slow start > continues until TCP is halfway to where it was when congestion > occurred (since it recorded half of the window size that caused > the problem in step 2), and then congestion avoidance takes ^^^ add: (i.e., cwnd reaches ssthress) > over. > Slow start has cwnd begin at one segment, and be incremented by > one segment every time an ACK is received. As mentioned > earlier, this opens the window exponentially: send one segment, > then two, then four, and so on. Congestion avoidance dictates > that cwnd be incremented by segsize*segsize/cwnd each time an > ACK is received, where segsize is the segment size and cwnd is > maintained in bytes. I must be missing something really basic here wrt the segsize*segsize/cwnd formula. Congestion avoidance increases cwnd by one packet each RTT. If segsize and cwnd are in bytes, shouldn't the increase be "cwnd += segsize/cwnd" ? Also, is the above strictly true, even if the ACK covers multiple segments? And when the ACK is a duplicate? >This is a linear growth of cwnd, compared > to slow start's exponential growth. The increase in cwnd should > be at most one segment each round-trip time (regardless how many > ACKs are received in that RTT), whereas slow start increments > cwnd by the number of ACKs received in a round-trip time. > > Many implementations incorrectly add a small fraction of the segment > size (typically the segment size divided by 8) during congestion > avoidance. This is wrong and should not be emulated in future > releases. > > 3. Fast Retransmit Move this above the previous section, since the previous section references Fast Retransmit. > > Modifications to the congestion avoidance algorithm were proposed in > 1990 [3]. Before describing the change, realize that TCP may > generate an immediate acknowledgment (a duplicate ACK) when an out- > of-order segment is received (Section 4.2.2.21 of [1], with a note > that one reason for doing so was for the experimental fast- > retransmit algorithm). This duplicate ACK should not be delayed. > The purpose of this duplicate ACK is to let the other end know that > a segment was received out of order, and to tell it what sequence > number is expected. > > Since TCP does not know whether a duplicate ACK is caused by a lost > segment or just a reordering of segments, it waits for a small > number of duplicate ACKs to be received. It is assumed that if > there is just a reordering of the segments, there will be only one > or two duplicate ACKs before the reordered segment is processed, > which will then generate a new ACK. If three or more duplicate ACKs > are received in a row, it is a strong indication that a segment has > been lost. TCP then performs a retransmission of what appears to be > the missing segment, without waiting for a retransmission timer to > expire. Need to be more firm. At what point are implementations supposed to retransmit? If it is 3, say so. > > 4. Fast Recovery > > > > Stevens [Page 4] > > INTERNET-DRAFT March 1996 > > > After fast retransmit sends what appears to be the missing segment, > congestion avoidance, but not slow start is performed. This is the > fast recovery algorithm. It is an improvement that allows high > throughput under moderate congestion, especially for large windows. > > The reason for not performing slow start in this case is that the > receipt of the duplicate ACKs tells TCP more than just a packet has > been lost. Since the receiver can only generate the duplicate ACK > when another segment is received, that segment has left the network > and is in the receiver's buffer. That is, there is still data > flowing between the two ends, and TCP does not want to reduce the > flow abruptly by going into slow start. > > The fast retransmit and fast recovery algorithms are usually > implemented together as follows. > > 1. When the third duplicate ACK in a row is received, set ssthresh > to one-half the current congestion window, cwnd, but no less > than two segments. Retransmit the missing segment. Set cwnd to > ssthresh plus 3 times the segment size. This inflates the > congestion window by the number of segments that have left the > network and which the other end has cached (3). > > 2. Each time another duplicate ACK arrives, increment cwnd by the > segment size. This inflates the congestion window for the > additional segment that has left the network. Transmit a > packet, if allowed by the new value of cwnd. > > 3. When the next ACK arrives that acknowledges new data, set cwnd > to ssthresh (the value set in step 1). This ACK should be the > acknowledgment of the retransmission from step 1, one round-trip > time after the retransmission. Additionally, this ACK should > acknowledge all the intermediate segments sent between the lost > packet and the receipt of the first duplicate ACK. This step is > congestion avoidance, since TCP is down to one-half the rate it > was at when the packet was lost. > > The fast retransmit algorithm first appeared in the 4.3BSD Tahoe > release, and it was followed by slow start. The fast recovery > algorithm appeared in the 4.3BSD Reno release. > > 5. Security Considerations > > Security considerations are not discussed in this memo. > > > > > > > > Stevens [Page 5] > > INTERNET-DRAFT March 1996 > > > 6. References > > [1] B. Braden, ed., "Requirements for Internet Hosts -- > Communication Layers," RFC 1122, Oct. 1989. > > [2] V. Jacobson, "Congestion Avoidance and Control," Computer > Communication Review, vol. 18, no. 4, pp. 314-329, Aug. 1988. > ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z. > > [3] V. Jacobson, "Modified TCP Congestion Avoidance Algorithm," > end2end-interest mailing list, April 30, 1990. > ftp://ftp.isi.edu/end2end/end2end-interest-1990.mail. > > [4] W. R. Stevens, "TCP/IP Illustrated, Volume 1: The Protocols", > Addison-Wesley, 1994. > > [5] G. R. Wright, W. R. Stevens, "TCP/IP Illustrated, Volume 2: > The Implementation", Addison-Wesley, 1995. > > Author's Address: > > W. Richard Stevens > 1202 E. Paseo del Zorro > Tucson, AZ 85718 > > Phone: 520-297-9416 > > EMail: rstevens@noao.edu > Home Page: http://www.noao.edu/~rstevens > > > Expires: September 14, 1996 > > > > > > > > > Stevens [Page 6]
- Re: Last Call: TCP Slow Start, Congestion Avoidan… Geert Jan de Groot
- Re: Last Call: TCP Slow Start, Congestion Avoidan… Thomas Narten