[tcpm] Increasing TCP's Initial Congestion

"Scheffenegger, Richard" <rs@netapp.com> Wed, 10 November 2010 19:29 UTC

Date: Wed, 10 Nov 2010 19:27:29 -0000
Message-ID: <5FDC413D5FA246468C200652D63E627A0B65C814@LDCMVEXC1-PRD.hq.netapp.com>
From: "Scheffenegger, Richard" <rs@netapp.com>
To: Jerry Chu <hkchu@google.com>, Yaogong Wang <ywang15@ncsu.edu>
Cc: Arvind Jain <arvind@google.com>, tcpm <tcpm@ietf.org>, Matt Mathis <mattmathis@google.com>
Subject: [tcpm] Increasing TCP's Initial Congestion
Precedence: list

Hi Jerry, Yaogong,

Reading your slides (I won't be online when you give your talk):


http://www.ietf.org/proceedings/79/slides/tcpm-0.pdf


| Is SACK required for IW10 to Perform?
| 
| * From the testbed at Google
|  - SACK does help reducing UCT but only by a small
|    percentage for both IW10 and IW3
| * 24 hours, 3-way parallel experiment at Google's
|   frontend servers
|  - IW10+NewReno still beats IW3+SACK
|
| Photos download   Avg response time   Retransmission rate
| IW10+SACK         2.6 secs            4.1%
| IW10+NewReno      2.8 secs            4.1%
| IW3+SACK          3.0 secs            3.3%
| November 11, 2010     79th IETF, Beijing China      19
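
For reference, the relative differences in the slide's table can be
worked out with a few lines of arithmetic. This is only a sketch: the
figures are copied from the slide, IW3+SACK is taken as the baseline,
and the names `results`, `relative_change` and `comparison` are
illustrative. On these numbers IW10+NewReno is roughly 7% faster than
IW3+SACK, at 0.8 percentage points more retransmissions.

```python
# Figures copied from the quoted slide:
# (average response time in seconds, retransmission rate in percent)
results = {
    "IW10+SACK": (2.6, 4.1),
    "IW10+NewReno": (2.8, 4.1),
    "IW3+SACK": (3.0, 3.3),
}


def relative_change(value, baseline):
    """Return the change of value relative to baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# Assumption: IW3+SACK is used as the baseline for the comparison.
base_time, base_retx = results["IW3+SACK"]

# For each configuration: (relative response-time change in percent,
# absolute retransmission-rate difference in percentage points).
comparison = {
    name: (relative_change(t, base_time), r - base_retx)
    for name, (t, r) in results.items()
}

for name, (dt, dr) in comparison.items():
    print(f"{name:13s} response time {dt:+.1f}%  retransmissions {dr:+.1f} points")
```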


First, let me thank you for conducting these experiments with SACK and
NewReno. 


My comment at IETF78 was that Linux SACK (since 2.4.xx) can detect lost
retransmissions and recover from them.

On the 2nd slide you paraphrase this:

| How does IW10 perform
|  when SACK is either not available, or not adequately
|  implemented?

Did you disable this aspect of Linux SACK during your tests (to have an
RFC 3517-compliant SACK sender)? Also, IIRC, Linux 2.6.2x already
implements draft-jarvinen-tcpm-sack-recovery-entry-02.txt (RFC3517bis).

I assume the fraction of lost retransmissions and early recovery entries
was small (AFAIK, Linux counts these events individually). If these
counters were zero during your testing, I'm fully satisfied.
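
A minimal sketch of how such per-event counters could be read on a Linux
host, assuming they are exposed in the `TcpExt` section of
`/proc/net/netstat`. The counter names in `COUNTERS` are an assumption
about which statistics are relevant here; their availability and exact
meaning vary by kernel version, so a missing counter is reported as
`None`. The function names are illustrative.

```python
def parse_netstat(text):
    """Parse /proc/net/netstat-style text into {section: {name: value}}.

    The file consists of line pairs: a header line listing counter names,
    followed by a line with the matching values, both prefixed with the
    section name (for example "TcpExt:").
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    sections = {}
    for header, values in zip(lines[0::2], lines[1::2]):
        section, names = header.split(":", 1)
        value_section, numbers = values.split(":", 1)
        if section != value_section:
            raise ValueError(f"mismatched sections: {section!r} / {value_section!r}")
        sections[section] = dict(
            zip(names.split(), (int(n) for n in numbers.split()))
        )
    return sections


# Assumption: these TcpExt counters are the ones of interest; they may be
# absent on older or newer kernels.
COUNTERS = ("TCPLostRetransmit", "TCPSackRecovery", "TCPRenoRecovery", "TCPTimeouts")


def recovery_counters(text, names=COUNTERS):
    """Return the selected TcpExt counters, or None where one is missing."""
    tcpext = parse_netstat(text).get("TcpExt", {})
    return {name: tcpext.get(name) for name in names}


# Usage on a Linux host (the values are cumulative since boot, so take a
# snapshot before and after the experiment and compare):
#   with open("/proc/net/netstat") as f:
#       print(recovery_counters(f.read()))
```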

I'm just worried that a good fraction of loss recovery was due to these
recent features (not widely available on other stacks), which would
cause grief for anyone who simply deploys IW10 without these other
modifications.

Also, this comment on slide 13 indicates that some flows had to do
RTO-based loss recovery, correct?

| * Under heavy load IW10 lost UCT advantage due to high PLR
| * IW10 UCT exhibits long tails 


Did you get traces off your testbed to investigate these long-tail flows
further? One reason might be the recent observation that, with short
flows, NewReno can sometimes recover more quickly than SACK
(http://www.ietf.org/mail-archive/web/tcpm/current/msg05875.html).
(RecoveryPoint is close to the end-of-stream, and at least the last
segment prior to RP was lost.)

The other reason could be that SACK retransmissions are lost but Linux
can no longer recover them (RecoveryPoint is at the end-of-stream).

There might be additional cases causing RTO, though...

I would be very interested in looking at these long-tail (high-latency)
flow traces myself.



Richard Scheffenegger
