[tcpm] 2581 implementation report, take 2

Mark Allman <mallman@icir.org> Tue, 30 October 2007 19:58 UTC

To: tcpm@ietf.org
From: Mark Allman <mallman@icir.org>
Organization: ICSI Center for Internet Research (ICIR)
Song-of-the-Day: 30 Days in the Hole
MIME-Version: 1.0
Date: Tue, 30 Oct 2007 15:56:09 -0400
Message-Id: <20071030195609.9CA6C2D9591@lawyers.icir.org>
Subject: [tcpm] 2581 implementation report, take 2

Attached is a slightly tweaked version of the 2581 implementation
report.  The report includes input from the Linux community noting that
RFC 2581 is implemented in their stack and that they have not seen any
significant problems because of it.



  + RFC 2581 is a re-write of RFC 2001.  RFC 2001 was a description
    of TCP's congestion control algorithms that was published long
    after these algorithms were in nearly ubiquitous deployment
    throughout the Internet (largely triggered by the congestion
    collapses of the mid-1980s).

  + While RFC 2001 was a description of the algorithms, RFC 2581 is
    a more traditional specification.  We stress that the RFC was
    written based on running code and experience.

  + The mechanisms in RFC 3042 (Limited Transmit) and RFC 3390
    (Larger Initial Congestion Window) are also rolled into the
    current document.  Both of these enhancements are Proposed
    Standards that have gathered wide consensus within the community
    based on deployment experience.

  + The traditional test of two interoperable implementations to
    move a Proposed Standard to Draft Standard is less obvious in
    the case of congestion control mechanisms.  Congestion control
    is about *when* to send a segment and not *what* that segment
    looks like, how to process it, how big fields are, etc.
    Therefore, it is difficult to assess "interoperability" in the
    traditional sense.  Below we cite several sources that show or
    suggest that multiple implementations of the mechanisms exist
    and seem to work as intended.

  + The new version of the document clarifies a number of small
    issues that implementers have asked about over the years, but
    does not make any large changes to the algorithms.
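
As an illustration of the Larger Initial Congestion Window item above,
the upper bound that RFC 3390 places on the initial window can be
written in a few lines.  This is a minimal sketch, not code from any
stack; the function name initial_window and the sample MSS values are
chosen here purely for illustration.

```python
def initial_window(mss: int) -> int:
    """Upper bound on the initial congestion window, in bytes.

    Follows the RFC 3390 formula min(4*MSS, max(2*MSS, 4380 bytes)).
    The function name is hypothetical, used only for this sketch.
    """
    return min(4 * mss, max(2 * mss, 4380))


if __name__ == "__main__":
    # Sample MSS values are assumptions picked to hit each case
    # of the formula (small, typical Ethernet, and large MSS).
    for mss in (536, 1460, 4000):
        iw = initial_window(mss)
        print(f"MSS={mss:5d}  IW={iw:5d} bytes  ({iw // mss} full segments)")
```

With a typical Ethernet MSS of 1460 bytes the bound works out to 4380
bytes, i.e., three full-sized segments.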

Known Implementations:

  + [WS95] discusses the BSD implementation of the core algorithms
    in RFC 2581 (slow start, congestion avoidance, fast retransmit
    and fast recovery).  This implementation has formed the basis of
    the TCP stack in numerous operating systems (NetBSD, FreeBSD,
    OpenBSD, SunOS 4.x, BSDI, etc.).  While various operating
    systems may have diverged in small details (some of which are
    documented in RFC 2581) the basic algorithms do not seem to have

  + Linux also supports RFC 2581 and does not report any adverse
    impacts.  See Attachment 1 below.

    (The complaint in that email is not about the document itself or
    even the algorithm within RFC 2581, but rather goes to our
    congestion control principles.  Further, as sketched, the
    behavior given in RFC 2581 is more conservative than desired and
    therefore, if this RFC is in error, it is erring in the right
    direction for stable operation.)

  + [Pax97] analyzes a number of implementations, finding both
    correct and incorrect behavior relative to RFC 2581 across a
    variety of implementations.  The incorrect behavior fed into

  + [MAF05] tests for conformance along a number of angles by
    probing the TCPs of over 70K web servers with specialized packet
    streams that induce the stack to show how it handles various
    situations. The results include:

      + The vast majority of servers reduce their congestion window
        by half in response to congestion (per RFC 2581's congestion

      + The majority of the web servers used an initial congestion
        window of 1--2 packets.

      + Limited Transmit was used in over 20% of the servers.

      + While some servers do not use fast retransmit, the
        overwhelming majority implement it.

      + Many web servers use the fast recovery algorithm (with a
        number using more advanced recovery such as NewReno
        [RFC3782] or SACK-based loss recovery techniques

    (Note that [MAF05] updates some of the results of [PF01].  The
    newer results confirm the older results.)
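
For readers less familiar with the behaviors these studies probe for,
the core RFC 2581 algorithms (slow start, congestion avoidance, fast
retransmit and fast recovery) can be sketched as a small state machine.
This is a simplified illustration under stated assumptions, not the
code of any stack mentioned above: the class and method names are
hypothetical, the caller is assumed to supply the amount of
outstanding data (FlightSize), and actual retransmission, Limited
Transmit and receiver-window limits are left out.

```python
from dataclasses import dataclass


@dataclass
class Sender:
    """Toy model of RFC 2581 window management (all names hypothetical)."""
    smss: int = 1460          # sender maximum segment size, bytes
    cwnd: int = 2 * 1460      # congestion window; initial value assumed
    ssthresh: int = 65535     # slow start threshold; initial value assumed
    dupacks: int = 0          # consecutive duplicate ACKs seen

    def on_new_ack(self) -> None:
        """An ACK arrived that acknowledges new data."""
        if self.dupacks >= 3:
            # Leaving fast recovery: deflate the window.
            self.cwnd = self.ssthresh
        elif self.cwnd < self.ssthresh:
            # Slow start: grow by at most one SMSS per ACK.
            self.cwnd += self.smss
        else:
            # Congestion avoidance: roughly one SMSS per round trip.
            self.cwnd += max(1, self.smss * self.smss // self.cwnd)
        self.dupacks = 0

    def on_dup_ack(self, flight_size: int) -> None:
        """A duplicate ACK arrived; flight_size is outstanding bytes."""
        self.dupacks += 1
        if self.dupacks == 3:
            # Fast retransmit: halve, then inflate by the three
            # segments known to have left the network.
            self.ssthresh = max(flight_size // 2, 2 * self.smss)
            self.cwnd = self.ssthresh + 3 * self.smss
        elif self.dupacks > 3:
            # Fast recovery: inflate by one SMSS per extra duplicate ACK.
            self.cwnd += self.smss

    def on_timeout(self, flight_size: int) -> None:
        """The retransmission timer expired."""
        self.ssthresh = max(flight_size // 2, 2 * self.smss)
        self.cwnd = self.smss   # restart from a loss window of one segment
        self.dupacks = 0


if __name__ == "__main__":
    s = Sender(smss=1000, cwnd=2000, ssthresh=4000)
    s.on_new_ack()                 # slow start
    s.on_new_ack()                 # slow start, cwnd reaches ssthresh
    s.on_new_ack()                 # congestion avoidance
    for _ in range(3):
        s.on_dup_ack(flight_size=4000)
    print("after third duplicate ACK:", s.cwnd, s.ssthresh)
    s.on_new_ack()
    print("after recovery:", s.cwnd)
```

The "reduce their congestion window by half" result corresponds to the
ssthresh computation in on_dup_ack and on_timeout above.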


  [MAF05] Alberto Medina, Mark Allman, Sally Floyd.  Measuring the
    Evolution of Transport Protocols in the Internet.  ACM Computer
    Communication Review, 35(2), April 2005.

  [Pax97] Vern Paxson.  Automated Packet Trace Analysis of TCP
    Implementations.  ACM SIGCOMM, September 1997.

  [PF01] Jitu Padhye, Sally Floyd.  Identifying the TCP Behavior of
    Web Servers.  ACM SIGCOMM, August 2001.

  [WS95] Wright, G. and W. Stevens, "TCP/IP Illustrated, Volume 2: The
    Implementation", Addison-Wesley, 1995.

Attachment 1:

  Date:    Mon, 24 Sep 2007 19:55:20 PDT
  To:      mallman@icir.org
  From:    Stephen Hemminger <shemminger@linux-foundation.org>
  Subject: Re: rfc2581


  Yes Linux implements RFC2581 and has not had any unstable or
  congestion problems caused by that. In recent years, there has
  been lots of refinements and alternatives added, but all the other
  algorithms are more complex attempts to ensure proper and stable
  response in "corner case" domains of large delay bandwidth
  products and/or small router queues.

  Linux also implements RFC2861 (congestion window validation) by
  default which makes it less aggressive than many other
  implementations. Because this caused some bursty applications to
  have poor performance it was made optional.

  The only real complaint against the principles of congestion
  control has come from the financial community. Slow start can
  cause connections to have latency, and when latency equates to
  real $$ during transactions, customers get very sensitive to the
  added delay.  For a discussion of this see the presentation from
  Credit Suisse at this 2007 Kernel
  Summit. http://lwn.net/Articles/248878/ For that reason, they are
  looking to alternatives to TCP/IP such as Infiniband.

  Stephen Hemminger <shemminger@linux-foundation.org>