Re: [tcpm] Ordering of SACK blocks, flushing of reassembly queue after inactivity

Andre Oppermann <andre@freebsd.org> Thu, 24 January 2008 00:01 UTC

Return-path: <tcpm-bounces@ietf.org>
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com) by megatron.ietf.org with esmtp (Exim 4.43) id 1JHpWf-0003Al-1p; Wed, 23 Jan 2008 19:01:01 -0500
Received: from tcpm by megatron.ietf.org with local (Exim 4.43) id 1JHpWd-0003Ae-Qi for tcpm-confirm+ok@megatron.ietf.org; Wed, 23 Jan 2008 19:00:59 -0500
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org) by megatron.ietf.org with esmtp (Exim 4.43) id 1JHpWd-0003AV-Gu for tcpm@ietf.org; Wed, 23 Jan 2008 19:00:59 -0500
Received: from c00l3r.networx.ch ([62.48.2.2]) by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1JHpWb-0003sl-Oc for tcpm@ietf.org; Wed, 23 Jan 2008 19:00:59 -0500
Received: (qmail 86584 invoked from network); 23 Jan 2008 23:22:18 -0000
Received: from localhost (HELO [127.0.0.1]) ([127.0.0.1]) (envelope-sender <andre@freebsd.org>) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for <mathis@psc.edu>; 23 Jan 2008 23:22:18 -0000
Message-ID: <4797D53C.10908@freebsd.org>
Date: Thu, 24 Jan 2008 01:01:00 +0100
From: Andre Oppermann <andre@freebsd.org>
User-Agent: Thunderbird 1.5.0.14 (Windows/20071210)
MIME-Version: 1.0
To: Matt Mathis <mathis@psc.edu>
Subject: Re: [tcpm] Ordering of SACK blocks, flushing of reassembly queue after inactivity
References: <47952032.3030809@freebsd.org> <Pine.LNX.4.64.0801221619190.27196@tesla.psc.edu>
In-Reply-To: <Pine.LNX.4.64.0801221619190.27196@tesla.psc.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Score: 0.0 (/)
X-Scan-Signature: b7b9551d71acde901886cc48bfc088a6
Cc: tcpm@ietf.org
X-BeenThere: tcpm@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/tcpm>, <mailto:tcpm-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:tcpm@ietf.org>
List-Help: <mailto:tcpm-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/tcpm>, <mailto:tcpm-request@ietf.org?subject=subscribe>
Errors-To: tcpm-bounces@ietf.org

Matt Mathis wrote:
>> The first is the ordering of SACK blocks.  The block based reassembly
>> queue lends itself to generate the SACK option from.  The blocks are
>> ordered ascending from the first segment closest to rcv_nxt.  Per RFC
>> 2018 Section 4 the first SACK block MUST be the one with the most
>> recent addition to it.  This is easily done by adding a pointer to
>> the block last modified.  However a bit later it says the following
>> SACK blocks SHOULD be also in order of their 'recentness' while also
>> specifying they may be listed in arbitrary order.  Now tracking this
>> information in the block based and sequence number ordered reassembly
>> queue it a bit more involved and I wonder if it really necessary and
>> useful.  In general the SACK blocks tend to be ordered sequentially
>> after the first SACK block in single random loss cases.  Only in
>> packet reordering cases, possibly mixed with loss, this may not be
>> the case.  On the other hand the most valuable information for the
>> sender are the blocks closest to rcv_nxt so it can fill the hole and
>> complete the sequence space.  While the RFC leaves me free to do as
>> I please I wonder if there are observations or rationales that clearly
>> show the the ordering by recentness to be better than sequential
>> ordering after the first SACK block?
> 
> RFC 2018 is written the way it is written to maximize the robustness in 
> the presence of lost ACKs.  If the network never looses ACKs, then just 
> reporting the single SACK block with the newest data is guaranteed to be 
> sufficient for the sender to have perfect knowledge about the state of 
> the reassembly queue at the receiver.
> 
> If occasional ACKs and one of several non-contiguous retransmissions are 
> lost, the sender might not learn that some of the retransmissions were 
> successful if the 2nd and 3rd SACK block do not reflect the history of 
> the first block.
> 
> Although in most cases you want to know about blocks closest to rcv_nxt, 
> that is not always true.  In particular if the two oldest 
> retransmissions are lost and then some ACKs are also lost, the sender 
> will not know that some of the later holes have been successfully 
> filled.  If you are going to take a timeout anyhow, this isn't so 
> important, but we tried to avoid timeouts at all costs.

This pretty much where my question originates from.  The approach of
reporting the most recent block first and then closest and second
closest to rcv_nxt works just fine as long as the number of holes is
not large or excessive.  However even then I may argue that filling
the hole after rcv_nxt is the most important goal, above reporting
second and third recent blocks (if different).  Once the first hole
is filled the closest and second closest blocks to rcv_nxt automatically
shift to the right by one and get reported immediately.  When a large
number of holes appear the sender is very likely to face a collapsed
congestion window and will have to throttle down its rate of sending.
This makes it probable that guaranteed knowledge further out is not
relevant at this point in time.  By the time this becomes relevant it
get reported because it gets closer to rcv_nxt.

I'm not forcefully arguing this view but want to bring it up for further
elaboration and comparison with the other recommended approaches.

> If the 2nd and 3rd block reflect most recent history of the 1st block, 
> the sender will have perfect knowledge of the reassembly queue as long 
> as the network never looses 3 consecutive ACKs (and usually tolerates 
> many more losses).
> 
> The "may use other algorithms" is there because we suspect that you can 
> do even better, for example by making the 2nd and 3rd blocks 
> periodically or randomly re-report all old SACK blocks.  As far a I know 
> this has never been pursued, and we certainly did not want to specify or 
> even suggest it in 2018.
> 
> Everyone else implements it as written....

FreeBSD currently does so as well.  It's in my development branch I'm
working and experimenting with alternative approaches.  Using the block
building logic of the reassembly queue itself for SACK reporting is
quite tempting.  ;-)

> Oh, except nix "timeouts MUST clear the scoreboard"... If you preserve 
> the scoreboard on a timeout, you might be able to do goodput=throughput 
> (zero duplicate data at the receiver) under very high loss rates, which 
> is a good thing.

Thanks for your extensive explanations.

-- 
Andre



_______________________________________________
tcpm mailing list
tcpm@ietf.org
https://www1.ietf.org/mailman/listinfo/tcpm