Re: [tcpm] WGLC: 2581bis

Markku Kojo <kojo@cs.helsinki.fi> Sat, 22 December 2007 03:04 UTC

Date: Sat, 22 Dec 2007 05:04:19 +0200
From: Markku Kojo <kojo@cs.helsinki.fi>
To: tcpm@ietf.org
Subject: Re: [tcpm] WGLC: 2581bis
In-Reply-To: <20071127004720.GD3385@hut.isi.edu>
Message-ID: <Pine.LNX.4.64.0712220127420.7480@x40-26.cs.helsinki.fi>
References: <20071127004720.GD3385@hut.isi.edu>
Cc: blanton@cs.purdue.edu, Ted Faber <faber@ISI.EDU>, vern@icir.org, mallman@icir.org

Ted, Mark, all,

I'd like to see this document advance to DS. However,
after reading the draft and looking at a few traces
from the Linux TCP implementation, it seems that there is one issue
that may need some attention first.

Looking into the Linux TCP behavior (which differs from what the
draft specifies) seems to reveal an issue with the draft: FlightSize
is a bad estimate of the amount of outstanding data in the *network*,
and using it leads to inappropriate adjustment of ssthresh (and cwnd).
That is, replacing cwnd with FlightSize in equation 4 seems to have
resulted in problems similar to those that existed earlier when cwnd
was used in the equation:

  When the Limited Transmit alg (step 1 of the fast rexmit & fast
  recovery alg in Section 3.2) is used with the current definition of
  FlightSize and equation 4, ssthresh and cwnd will be assigned
  a larger value than would be appropriate. The reason is
  that FlightSize is increased in step 1 and is then used in
  steps 2 and 3 to determine the new value of ssthresh and cwnd.
  However, allowing Limited Transmit to send a new data segment
  on the arrival of the 1st and 2nd dupack rests on the assumption
  that a dupack indicates that a segment has left the network
  and thereby the number of outstanding segments in the network
  remains unchanged (just as cwnd remains unchanged).

  This means that using Limited Transmit results in less reduction
  in ssthresh and cwnd compared to the case where Limited Transmit
  is not in use. As the difference in the new ssthresh (and cwnd) 
  value is only (at most) one MSS, this does not make TCP 
  significantly more aggressive with large windows, but with a small
  window size the difference is significant. For example, with cwnd 
  of 4 segments and a single segment loss, a TCP sender applying 
  Limited Transmit per current spec continues with cwnd of 3 
  segments while a TCP sender not applying Limited Transmit halves
  its cwnd and continues with cwnd of 2 segments. This may have
  a significant effect on a bottleneck link that is shared by a
  number of connections proceeding with a small window.

  One simple way to fix this is to redefine equation 4 as

     ssthresh = max ( min(FlightSize,cwnd) / 2, 2*SMSS) 
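
  To make the arithmetic above concrete, here is a rough sketch in
  Python (not taken from the draft or from any implementation). It
  assumes an SMSS of 1460 bytes, a sender with a full cwnd outstanding
  when the loss occurs, and one new segment sent on each of the first
  two dupacks. The function names are hypothetical.

```python
SMSS = 1460  # bytes; assumed value, for illustration only


def flight_size_at_third_dupack(cwnd, limited_transmit, smss=SMSS):
    """FlightSize when the 3rd dupack arrives, assuming a full window
    was outstanding when the loss occurred."""
    flight_size = cwnd
    if limited_transmit:
        # Step 1: one new segment is sent on each of the 1st and 2nd dupacks.
        flight_size += 2 * smss
    return flight_size


def ssthresh_eq4(flight_size, smss=SMSS):
    """Equation 4 as currently written in the draft."""
    return max(flight_size // 2, 2 * smss)


def ssthresh_proposed(flight_size, cwnd, smss=SMSS):
    """Equation 4 with the min(FlightSize, cwnd) change suggested above."""
    return max(min(flight_size, cwnd) // 2, 2 * smss)


cwnd = 4 * SMSS
with_lt = flight_size_at_third_dupack(cwnd, limited_transmit=True)
without_lt = flight_size_at_third_dupack(cwnd, limited_transmit=False)

print(ssthresh_eq4(with_lt) // SMSS)             # segments, current eq 4 with LT
print(ssthresh_eq4(without_lt) // SMSS)          # segments, current eq 4 without LT
print(ssthresh_proposed(with_lt, cwnd) // SMSS)  # segments, proposed eq 4 with LT
```

  Under these assumptions the current equation gives 3 segments with
  Limited Transmit and 2 without, while the proposed form gives 2 in
  both cases.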


  Similar problems in correctly determining a new value of ssthresh
  may also occur in other cases where the actual amount of outstanding
  data differs (significantly) from FlightSize, i.e., when the TCP
  sender is already in loss recovery.

  Linux does not experience this problem with FlightSize as it
  maintains a more accurate estimate (akin to the pipe variable in RFC
  3517) of the amount of outstanding data, and uses it to determine
  the new value of ssthresh (and cwnd) when entering loss recovery.


Other comments/suggestions:


1. Section 3.1, 3rd para:

  It might be useful to also note that the purpose of the slow start
  algorithm is to (re)start the ack clock (in addition to determining
  the available capacity).


2. Section 3.1: 

  "On the other hand, when a TCP sender detects segment loss using 
   the retransmission timer and the given segment has already been
   retransmitted at least once, the value of ssthresh is held
   constant." 

   It should be clarified that this applies only when the
   retransmission timer expires again for the same segment, not when
   the retransmission timer expires for a fast retransmitted segment.
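
   One possible reading of this clarification, sketched in Python.
   The function and parameter names are hypothetical, SMSS is assumed
   to be 1460 bytes, and cwnd is simply set to one segment on timeout
   (the upper bound of the loss window).

```python
SMSS = 1460  # bytes; assumed value, for illustration only


def on_rto(ssthresh, flight_size, prev_rexmit_by_rto, smss=SMSS):
    """Return (ssthresh, cwnd) after a retransmission timeout.

    prev_rexmit_by_rto is True only if the timed-out segment was
    already retransmitted because of an earlier timeout. A segment
    that was only fast retransmitted does not count.
    """
    if not prev_rexmit_by_rto:
        # First timeout for this segment: lower ssthresh per equation 4.
        ssthresh = max(flight_size // 2, 2 * smss)
    # Otherwise ssthresh is held constant.
    cwnd = smss
    return ssthresh, cwnd


# First RTO for a segment: ssthresh is lowered.
first = on_rto(ssthresh=10 * SMSS, flight_size=8 * SMSS, prev_rexmit_by_rto=False)
# Timer expires again for the same segment: ssthresh is unchanged.
second = on_rto(ssthresh=first[0], flight_size=SMSS, prev_rexmit_by_rto=True)
# RTO for a segment that was only fast retransmitted: ssthresh is lowered.
after_fast_rexmit = on_rto(ssthresh=4 * SMSS, flight_size=6 * SMSS,
                           prev_rexmit_by_rto=False)
```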


3. Section 3.2, 1st step of the fast retransmit and fast recovery alg:

 It would be useful to note that allowing a TCP sender to send a
 new data segment on the 1st and 2nd dupack is in violation of
 the definition of cwnd in Section 2:

   "At any given time, a TCP MUST NOT send data with a sequence 
    number higher than the sum of the highest acknowledged sequence
    number and the minimum of cwnd and rwnd."
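
 The quoted rule can be sketched as a simple check in Python. This is
 only an illustration: the function name is hypothetical, sequence
 number wraparound is ignored, and SMSS and rwnd are assumed values.

```python
SMSS = 1460  # bytes; assumed value, for illustration only


def may_send(last_seq, highest_acked, cwnd, rwnd):
    """True if data up to sequence number last_seq may be sent under
    the Section 2 rule (sequence wraparound ignored for simplicity)."""
    return last_seq <= highest_acked + min(cwnd, rwnd)


highest_acked = 0
cwnd = 4 * SMSS
rwnd = 65535

# A full window of 4 segments fits within the limit.
full_window_ok = may_send(4 * SMSS, highest_acked, cwnd, rwnd)

# A dupack does not advance highest_acked, so the extra segment that
# Limited Transmit sends on the 1st dupack falls outside the limit.
limited_transmit_ok = may_send(5 * SMSS, highest_acked, cwnd, rwnd)
```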


4. Section 4.3:

  "Loss in two successive windows of data, or the loss of a 
   retransmission,  should be taken as two indications of congestion
   and, therefore, cwnd (and ssthresh) MUST be lowered twice in this 
   case."

  Lowering ssthresh twice on the loss of a retransmission triggered
  by an RTO would contradict what is said in Section 3.1
  (see item 2 above). It should be clarified that this is valid only
  for the loss of a fast retransmit (or the loss of a retransmission
  in fast recovery with an advanced loss recovery alg such as NewReno
  or SACK-based fast recovery).


Thanks,

/Markku

