Re: [tcpm] Ordering of SACK blocks, flushing of reassembly queue after inactivity

Matt Mathis <> Tue, 22 January 2008 22:40 UTC

Date: Tue, 22 Jan 2008 17:40:19 -0500
From: Matt Mathis <>
To: Andre Oppermann <>
Subject: Re: [tcpm] Ordering of SACK blocks, flushing of reassembly queue after inactivity
List-Id: TCP Maintenance and Minor Extensions Working Group <>

> The first is the ordering of SACK blocks.  The block based reassembly
> queue lends itself to generate the SACK option from.  The blocks are
> ordered ascending from the first segment closest to rcv_nxt.  Per RFC
> 2018 Section 4 the first SACK block MUST be the one with the most
> recent addition to it.  This is easily done by adding a pointer to
> the block last modified.  However a bit later it says the following
> SACK blocks SHOULD be also in order of their 'recentness' while also
> specifying they may be listed in arbitrary order.  Now tracking this
> information in the block based and sequence number ordered reassembly
> queue is a bit more involved and I wonder if it is really necessary and
> useful.  In general the SACK blocks tend to be ordered sequentially
> after the first SACK block in single random loss cases.  Only in
> packet reordering cases, possibly mixed with loss, this may not be
> the case.  On the other hand the most valuable information for the
> sender are the blocks closest to rcv_nxt so it can fill the hole and
> complete the sequence space.  While the RFC leaves me free to do as
> I please I wonder if there are observations or rationales that clearly
> show the ordering by recentness to be better than sequential
> ordering after the first SACK block?

RFC 2018 is written the way it is written to maximize the robustness in the 
presence of lost ACKs.  If the network never loses ACKs, then just reporting 
the single SACK block with the newest data is guaranteed to be sufficient for 
the sender to have perfect knowledge about the state of the reassembly queue 
at the receiver.
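
To make the no-loss case concrete, here is a minimal sketch in Python.  It is 
not taken from any real stack: add_segment and simulate are hypothetical 
names, blocks are half-open (left edge, right edge) pairs as in RFC 2018, and 
I assume the hole at rcv_nxt is never filled during the run, so the 
cumulative ACK never moves.  The receiver reports only the block that 
contains the newest segment, every ACK is delivered, and the sender's 
scoreboard ends up identical to the reassembly queue.

```python
def add_segment(blocks, start, end):
    """Merge the half-open range [start, end) into a list of disjoint,
    non-adjacent blocks. Returns (new_sorted_blocks, merged_block)."""
    kept = []
    for b_start, b_end in blocks:
        if b_end < start or b_start > end:
            kept.append((b_start, b_end))          # does not touch the new data
        else:
            # Overlapping or adjacent: absorb it into the growing block.
            start, end = min(start, b_start), max(end, b_end)
    merged = (start, end)
    return sorted(kept + [merged]), merged


def simulate(arrivals):
    """Receiver sends one SACK block per ACK; no ACK is ever lost."""
    receiver, sender = [], []
    for start, end in arrivals:
        receiver, newest = add_segment(receiver, start, end)
        # The single reported block is the one holding the newest data.
        sender, _ = add_segment(sender, *newest)
    return receiver, sender


# Out-of-order arrivals above a hole at rcv_nxt (assumed, illustrative values).
ARRIVALS = [(2000, 3000), (5000, 6000), (3000, 4000), (8000, 9000), (6000, 7000)]
receiver_queue, sender_scoreboard = simulate(ARRIVALS)
print(receiver_queue == sender_scoreboard)
```

The same loop with any ACK dropped can leave the sender with a stale view, 
which is the case the extra blocks exist for.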

If occasional ACKs and one of several non-contiguous retransmissions are lost, 
the sender might not learn that some of the retransmissions were successful if 
the 2nd and 3rd SACK block do not reflect the history of the first block.

Although in most cases you want to know about blocks closest to rcv_nxt, that 
is not always true.  In particular if the two oldest retransmissions are lost 
and then some ACKs are also lost, the sender will not know that some of the 
later holes have been successfully filled.  If you are going to take a timeout 
anyhow, this isn't so important, but we tried to avoid timeouts at all costs.

If the 2nd and 3rd block reflect most recent history of the 1st block, the 
sender will have perfect knowledge of the reassembly queue as long as the 
network never loses 3 consecutive ACKs (and usually tolerates many more).
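
A small sketch of that property, under stated assumptions: the Receiver 
class, merge_into and sender_view are hypothetical names, the option carries 
at most three blocks, the receiver never reneges, and the hole at rcv_nxt 
stays open so only the SACK information matters.  The receiver keeps its 
blocks in order of most recent change and reports the first three.

```python
class Receiver:
    """Reassembly queue kept in recency order (most recently changed first)."""

    def __init__(self):
        self.blocks = []  # disjoint, non-adjacent half-open (start, end) pairs

    def receive(self, start, end):
        untouched = []
        for b_start, b_end in self.blocks:
            if b_end < start or b_start > end:
                untouched.append((b_start, b_end))
            else:
                start, end = min(start, b_start), max(end, b_end)
        # The block that just changed moves to the front; the rest keep order.
        self.blocks = [(start, end)] + untouched

    def sack_option(self, max_blocks=3):
        return self.blocks[:max_blocks]


def merge_into(scoreboard, start, end):
    """Sender side: fold one reported block into a sorted scoreboard."""
    kept = []
    for b_start, b_end in scoreboard:
        if b_end < start or b_start > end:
            kept.append((b_start, b_end))
        else:
            start, end = min(start, b_start), max(end, b_end)
    return sorted(kept + [(start, end)])


def sender_view(arrivals, lost_acks, max_blocks=3):
    """Return (sender scoreboard, receiver queue) after all arrivals.
    lost_acks holds the indices of arrivals whose ACK is dropped."""
    receiver, scoreboard = Receiver(), []
    for i, (start, end) in enumerate(arrivals):
        receiver.receive(start, end)
        if i in lost_acks:
            continue
        for block in receiver.sack_option(max_blocks):
            scoreboard = merge_into(scoreboard, *block)
    return scoreboard, sorted(receiver.blocks)


ARRIVALS = [(2000, 3000), (5000, 6000), (8000, 9000),
            (11000, 12000), (14000, 15000)]

two_lost = sender_view(ARRIVALS, lost_acks={1, 2})
three_lost = sender_view(ARRIVALS, lost_acks={1, 2, 3})
print(two_lost[0] == two_lost[1])      # sender still matches the receiver
print(three_lost[0] == three_lost[1])  # one block was never reported
```

With two consecutive ACKs dropped, the next ACK that gets through still 
carries the blocks from the lost ones.  With three dropped, the block at 
5000 has fallen out of the option by the time an ACK arrives.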

The "may use other algorithms" is there because we suspect that you can do 
even better, for example by making the 2nd and 3rd blocks periodically or 
randomly re-report all old SACK blocks.  As far as I know this has never been 
pursued, and we certainly did not want to specify or even suggest it in RFC 2018.
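
Purely as an illustration of what "periodically re-report" could mean, and 
not something specified in RFC 2018 or, as far as I know, implemented 
anywhere: the first slot still carries the most recently changed block, and 
the remaining slots walk round-robin through all older blocks.  The 
RotatingReporter class and its parameters are invented for this sketch.

```python
class RotatingReporter:
    """Hypothetical policy: slot 1 is the newest block, the other slots
    cycle through every older block so each is re-reported periodically."""

    def __init__(self):
        self.cursor = 0  # position in the round-robin over older blocks

    def sack_option(self, blocks_by_recency, max_blocks=3):
        if not blocks_by_recency:
            return []
        first, older = blocks_by_recency[0], blocks_by_recency[1:]
        extra = []
        # Never emit more extra blocks than exist, so no duplicates per option.
        for _ in range(min(max_blocks - 1, len(older))):
            extra.append(older[self.cursor % len(older)])
            self.cursor += 1
        return [first] + extra


# Five blocks, most recently changed first (illustrative values).
blocks = [(9000, 10000), (1000, 2000), (3000, 4000), (5000, 6000), (7000, 8000)]
reporter = RotatingReporter()
first_ack = reporter.sack_option(blocks)
second_ack = reporter.sack_option(blocks)
print(first_ack)
print(second_ack)
```

Two successive ACKs between them mention every block in the queue, at the 
price of giving up the strict recency order in slots 2 and 3.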

Everyone else implements it as written....

Oh, except nix "timeouts MUST clear the scoreboard"... If you preserve the 
scoreboard on a timeout, you might be able to do goodput=throughput (zero 
duplicate data at the receiver) under very high loss rates, which is a good 

Matt Mathis
Work:412.268.3319    Home/Cell:412.654.7529
Evil is defined by mortals who think they know
"The Truth" and use force to apply it to others.
