[tcpm] Window update algorithm differences

Andre Oppermann <andre@freebsd.org> Wed, 11 June 2008 12:50 UTC

Message-ID: <484FCA2C.2020600@freebsd.org>
Date: Wed, 11 Jun 2008 14:50:52 +0200
From: Andre Oppermann <andre@freebsd.org>
User-Agent: Thunderbird 1.5.0.14 (Windows/20071210)
MIME-Version: 1.0
To: tcpm@ietf.org
Subject: [tcpm] Window update algorithm differences

There is considerable disagreement among operating systems about the
correctness of the original window update test.  Here is an overview of the
approaches currently used by popular open source TCP implementations:

RFC793: section 3.9, page 72
  SND.UNA < SEG.ACK =< SND.NXT, update window but not SND.WU.[SEQ|ACK]
  (SND.WU.SEQ < SEG.SEQ or (SND.WU.SEQ = SEG.SEQ and SND.WU.ACK =< SEG.ACK))
  update everything.

Stevens Vol.2: section 29.7, pages 981-983
FreeBSD: src/sys/netinet/tcp_input.c, rev. 1.376
OpenBSD: src/sys/netinet/tcp_input.c, rev. 1.215
NetBSD: src/sys/netinet/tcp_input.c, rev. 1.287
  SEG.SEQ > SND.WU.SEQ or (SEG.SEQ = SND.WU.SEQ and (SEG.ACK > SND.WU.ACK or
  (SEG.ACK = SND.WU.ACK and SEG.WND > SND.WND))) update everything.

OpenSolaris: src/uts/common/inet/tcp/tcp.c, @swnd_update, rev. 6707
  SEG.ACK > SND.WU.ACK or SEG.SEQ > SND.WU.SEQ or
  (SEG.SEQ = SND.WU.SEQ and SEG.WND > SND.WND) update everything.

Linux: net/ipv4/tcp_input.c, @tcp_ack_update_window(), rel. 2.6.25
  SEG.ACK > SND.UNA or SEG.SEQ > SND.WU.SEQ or
  (SEG.SEQ = SND.WU.SEQ and SEG.WND > SND.WND) update everything.

The OpenSolaris code contains some comments about being better in case
of bi-directional traffic and alleged problems with the RFC793 method.
The Linux code contains some general comments about the incorrectness
of the BSD method without further elaboration.
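
As a rough side-by-side, the four tests above can be transcribed into C
predicates.  This is only a sketch: the struct and function names are made
up for illustration and are not taken from any of the cited sources, the
comparisons assume the usual modulo-2^32 sequence arithmetic, and for RFC793
only the second ("update everything") condition is transcribed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe sequence comparisons (modulo-2^32 arithmetic). */
#define SEQ_LT(a, b)  ((int32_t)((uint32_t)(a) - (uint32_t)(b)) < 0)
#define SEQ_LEQ(a, b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) <= 0)
#define SEQ_GT(a, b)  ((int32_t)((uint32_t)(a) - (uint32_t)(b)) > 0)

/* Hypothetical containers; field names mirror the notation in this mail. */
struct snd_state {
    uint32_t una;     /* SND.UNA */
    uint32_t nxt;     /* SND.NXT */
    uint32_t wnd;     /* SND.WND */
    uint32_t wu_seq;  /* SND.WU.SEQ (SND.WL1 in RFC793) */
    uint32_t wu_ack;  /* SND.WU.ACK (SND.WL2 in RFC793) */
};

struct seg {
    uint32_t seq;     /* SEG.SEQ */
    uint32_t ack;     /* SEG.ACK */
    uint32_t wnd;     /* SEG.WND */
};

/* RFC793, second condition only. */
static bool wu_rfc793(const struct snd_state *s, const struct seg *g)
{
    return SEQ_LT(s->wu_seq, g->seq) ||
           (s->wu_seq == g->seq && SEQ_LEQ(s->wu_ack, g->ack));
}

/* Stevens Vol.2 / FreeBSD / OpenBSD / NetBSD. */
static bool wu_bsd(const struct snd_state *s, const struct seg *g)
{
    return SEQ_GT(g->seq, s->wu_seq) ||
           (g->seq == s->wu_seq &&
            (SEQ_GT(g->ack, s->wu_ack) ||
             (g->ack == s->wu_ack && g->wnd > s->wnd)));
}

/* OpenSolaris. */
static bool wu_opensolaris(const struct snd_state *s, const struct seg *g)
{
    return SEQ_GT(g->ack, s->wu_ack) || SEQ_GT(g->seq, s->wu_seq) ||
           (g->seq == s->wu_seq && g->wnd > s->wnd);
}

/* Linux: same shape as OpenSolaris, but compares the ACK against SND.UNA. */
static bool wu_linux(const struct snd_state *s, const struct seg *g)
{
    return SEQ_GT(g->ack, s->una) || SEQ_GT(g->seq, s->wu_seq) ||
           (g->seq == s->wu_seq && g->wnd > s->wnd);
}
```

With SND.WU.SEQ = 500, SND.WU.ACK = SND.UNA = 1000 and SND.WND = 5000, a
segment with SEQ 500, ACK 1000 and a window of 4000 passes only the RFC793
predicate, while a segment with SEQ 400 and ACK 1100 passes only the
OpenSolaris and Linux predicates.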

The obvious question is: which one is correct, or better than the others?

Let's have a look at the basic requirement of the send window update.

  o Only segments newer than those already seen should update the send
    window, to prevent old and outdated information from being used.

  o It all revolves around how to reliably detect newer updates.

Let's have a look at what makes a segment new:

  o When using timestamps, either the reflected TS is higher than the
    last one we got (we're sending data), or the TS from the other end
    is newer than what we currently reflect (we're receiving data or
    a window update).
    Problem: what to do when the round trip time is shorter than the
    timestamp resolution?  Fall back to the SEQ and ACK checks.

    SEG.TSECR > TS_RECENT_AGE or SEG.TSVAL > TS_RECENT

  o Data we sent has been ack'ed.
    Problem: None really.  Doesn't trigger on old retransmits or
    out-of-order segments.

    SEG.ACK > SND.UNA  (and implicitly SEG.ACK <= SND.NXT)

  o We receive new data.
    Problem: out-of-order segments going into the reassembly queue,
    retransmits of missing segments, reordering of segments.  A retransmit
    contains a newer value.

    SEG.SEQ > RCV.NXT

  o No data sent or received but window increases.
    Problem: old delayed segment.  Only allow if window increases.

    SEG.WND > SND.WND
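
One detail the notation above hides: sequence and ACK numbers are 32 bit
and wrap around, so an implementation cannot use a plain integer ">" for
them.  A minimal sketch of the usual modulo-2^32 comparison follows; the
function names are hypothetical, and the window test stays a plain unsigned
comparison because a window is a size, not a position in sequence space.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if a is "after" b in sequence space, i.e. a is ahead of b by less
 * than 2^31.  Relies on the two's complement conversion to int32_t that
 * all common compilers provide.
 */
static bool seq_gt(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

/* "Data we sent has been ack'ed": SND.UNA < SEG.ACK <= SND.NXT. */
static bool acks_new_data(uint32_t seg_ack, uint32_t snd_una,
                          uint32_t snd_nxt)
{
    return seq_gt(seg_ack, snd_una) && !seq_gt(seg_ack, snd_nxt);
}

/* "Window increases": SEG.WND > SND.WND, plain unsigned comparison. */
static bool window_grew(uint32_t seg_wnd, uint32_t snd_wnd)
{
    return seg_wnd > snd_wnd;
}
```

For example, seq_gt(5, 0xFFFFFFF0) is true even though 5 is numerically
smaller, because the sequence space has wrapped in between.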


Hence I propose the following updated acceptable window update check:

  [1] (TS and (SEG.TSECR > TS_RECENT_AGE or SEG.TSVAL > TS_RECENT)) or
  [2] SEG.ACK > SND.UNA or
  [3] (SEG.SEQ > SND.WU.SEQ and SEG.ACK >= SND.UNA) or
  [4] (SEG.SEQ = SND.WU.SEQ and SEG.ACK = SND.UNA and SEG.WND > SND.WND)

	SND.WND <- SEG.WND
  [5]	if SEG.SEQ > SND.WU.SEQ then
		SND.WU.SEQ <- SEG.SEQ
  [6]	(SND.WU.ACK <- SEG.ACK)

[1] If either timestamp is newer than what we've already seen this
     is a new segment and the window it contains is certain to be valid
     without any further checks.
[2] This is a reliable indicator of a genuine window update.  When data
     we sent is newly ack'ed, the window has also been updated.
[3] A higher sequence number tells us new data was received, but if
     the ACK is lower than what we've already seen it must be a retransmit.
[4] A pure window update if the sequence number is the same, the ACK is
     not lower than what we've already seen and the advertised window is
     larger than the one we had.
[5] Only advance the sequence number of the last window update
     (SND.WU.SEQ) if the segment's sequence number is higher than what we
     have.  This prevents retransmitted or reordered segments without a
     new ACK from updating our window.  With timestamps we can reliably
     differentiate retransmits from out-of-order segments.
[6] Tracking the last ACK that updated the window has become unnecessary.
     SND.WU.ACK is also known as SND.WL2.
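
To make the proposal concrete, here is how checks [1] to [5] might look as
a single C function.  Treat it as a sketch under assumptions, not as code
from any existing stack: the struct layout and all names are hypothetical,
TS_RECENT_AGE is compared exactly as written above, the function is assumed
to run before SND.UNA and the timestamp state are advanced for this segment,
and step [6] is left out since SND.WU.ACK is no longer needed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe comparisons (modulo-2^32 arithmetic). */
#define SEQ_GT(a, b)  ((int32_t)((uint32_t)(a) - (uint32_t)(b)) > 0)
#define SEQ_GEQ(a, b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) >= 0)

/* Hypothetical per-connection state. */
struct tcb {
    bool     ts_enabled;     /* "TS": timestamps are in use */
    uint32_t ts_recent;      /* TS_RECENT */
    uint32_t ts_recent_age;  /* TS_RECENT_AGE */
    uint32_t snd_una;        /* SND.UNA */
    uint32_t snd_wnd;        /* SND.WND */
    uint32_t snd_wu_seq;     /* SND.WU.SEQ */
};

/* Hypothetical view of the fields of an incoming segment. */
struct seg {
    uint32_t seq, ack, wnd;  /* SEG.SEQ, SEG.ACK, SEG.WND */
    uint32_t tsval, tsecr;   /* SEG.TSVAL, SEG.TSECR */
};

/* Returns true and updates the send window if the segment qualifies. */
static bool window_update(struct tcb *tp, const struct seg *g)
{
    bool ok =
        /* [1] either timestamp is newer than what we have seen */
        (tp->ts_enabled &&
         (SEQ_GT(g->tsecr, tp->ts_recent_age) ||
          SEQ_GT(g->tsval, tp->ts_recent))) ||
        /* [2] data we sent has been ack'ed */
        SEQ_GT(g->ack, tp->snd_una) ||
        /* [3] new data with an ACK that is not older than SND.UNA */
        (SEQ_GT(g->seq, tp->snd_wu_seq) &&
         SEQ_GEQ(g->ack, tp->snd_una)) ||
        /* [4] pure window update that enlarges the window */
        (g->seq == tp->snd_wu_seq && g->ack == tp->snd_una &&
         g->wnd > tp->snd_wnd);

    if (!ok)
        return false;

    tp->snd_wnd = g->wnd;
    /* [5] only move SND.WU.SEQ forward, never backward */
    if (SEQ_GT(g->seq, tp->snd_wu_seq))
        tp->snd_wu_seq = g->seq;
    return true;
}
```

Without timestamps, and with SND.UNA = 1000, SND.WND = 5000 and
SND.WU.SEQ = 500, a segment with SEQ 400, ACK 1000 is rejected whatever
window it carries, while SEQ 500, ACK 1000 with a window of 6000 is accepted
through [4].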

Cases:

  o In unidirectional send we always trigger on [2] when our data is ack'ed.
  o In unidirectional receive we trigger on [3] for in-order segments.
    Out-of-order segments do not update the window unless they advance
    SND.WU.SEQ.  Retransmits are not detected unless timestamps are enabled.
    In that case [1] triggers if the RTT is larger than the resolution of
    the timestamp clock.  Otherwise window updates will resume when all
    missing segments are retransmitted and new segments beyond SND.WU.SEQ
    arrive.
  o In bidirectional traffic we trigger on [3] and [2].  If the transfer has
    loss or is reordered in either or both directions we still trigger in
    all important cases: on [2] when new data was ack'ed, and on [3] when new
    data with an up-to-date ACK is received.  Above all, the timestamp check
    accepts all new segments no matter what order they arrive in.


Feedback and corrections are welcome.

BTW: TAC's are gone for good!

-- 
Andre

andre@FreeBSD.org