Re: [tcpm] Ordering of SACK blocks, flushing of reassembly queue after inactivity

Andre Oppermann <andre@freebsd.org> Thu, 21 February 2008 22:52 UTC

Return-Path: <tcpm-bounces@ietf.org>
X-Original-To: ietfarch-tcpm-archive@core3.amsl.com
Delivered-To: ietfarch-tcpm-archive@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id E88AA3A6CF5; Thu, 21 Feb 2008 14:52:19 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -0.617
X-Spam-Level:
X-Spam-Status: No, score=-0.617 tagged_above=-999 required=5 tests=[AWL=-0.180, BAYES_00=-2.599, FH_RELAY_NODNS=1.451, HELO_MISMATCH_ORG=0.611, RDNS_NONE=0.1]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id NWS+sELUbWnF; Thu, 21 Feb 2008 14:52:18 -0800 (PST)
Received: from core3.amsl.com (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id 6DFA53A6B51; Thu, 21 Feb 2008 14:52:18 -0800 (PST)
X-Original-To: tcpm@core3.amsl.com
Delivered-To: tcpm@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id 5BC823A6882 for <tcpm@core3.amsl.com>; Thu, 21 Feb 2008 14:52:17 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Ojqge3aTckPL for <tcpm@core3.amsl.com>; Thu, 21 Feb 2008 14:52:15 -0800 (PST)
Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) by core3.amsl.com (Postfix) with ESMTP id 56AFC3A683C for <tcpm@ietf.org>; Thu, 21 Feb 2008 14:51:49 -0800 (PST)
Received: (qmail 35990 invoked from network); 21 Feb 2008 22:07:41 -0000
Received: from localhost (HELO [127.0.0.1]) ([127.0.0.1]) (envelope-sender <andre@freebsd.org>) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for <mathis@psc.edu>; 21 Feb 2008 22:07:41 -0000
Message-ID: <47BE0080.1090009@freebsd.org>
Date: Thu, 21 Feb 2008 23:51:44 +0100
From: Andre Oppermann <andre@freebsd.org>
User-Agent: Thunderbird 1.5.0.14 (Windows/20071210)
MIME-Version: 1.0
To: Matt Mathis <mathis@psc.edu>
References: <47952032.3030809@freebsd.org> <Pine.LNX.4.64.0801221619190.27196@tesla.psc.edu> <4797D53C.10908@freebsd.org> <Pine.LNX.4.64.0801240911250.27196@tesla.psc.edu>
In-Reply-To: <Pine.LNX.4.64.0801240911250.27196@tesla.psc.edu>
Cc: TCP Maintenance and Minor Extensions WG <tcpm@ietf.org>
Subject: Re: [tcpm] Ordering of SACK blocks, flushing of reassembly queue after inactivity
X-BeenThere: tcpm@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>
List-Unsubscribe: <http://www.ietf.org/mailman/listinfo/tcpm>, <mailto:tcpm-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:tcpm@ietf.org>
List-Help: <mailto:tcpm-request@ietf.org?subject=help>
List-Subscribe: <http://www.ietf.org/mailman/listinfo/tcpm>, <mailto:tcpm-request@ietf.org?subject=subscribe>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: tcpm-bounces@ietf.org
Errors-To: tcpm-bounces@ietf.org

Matt Mathis wrote:
> If you are thinking at the scale of a dozen packet windows with just a 
> couple of losses, I don't disagree with you.  However, with 1000 packet 
> windows, you almost never see "just a couple of losses", and your 
> proposed simplification will be very exposed to lost ACKs.  Given that 
> the ACK rate doubles, your chances are not good.

Based on the excellent feedback and rationale I've implemented the
RFC2018 suggested ordering of the SACK blocks.  [The whole thing is
still in a separate development tree]

-- 
Andre

> BTW 1000 packet windows are not all that big: 1500 Byte MTU, 70 ms RTT 
> and only 90 Mb/s - well within reach for any university user in the US 
> or Europe
> and FIOS users in Korea or Japan.   The real challenge is getting the 
> scoreboard to work well with 10k or 100k packet windows....
> 
> Thanks,
> --MM--
> -------------------------------------------
> Matt Mathis      http://www.psc.edu/~mathis
> Work:412.268.3319    Home/Cell:412.654.7529
> -------------------------------------------
> Evil is defined by mortals who think they know
> "The Truth" and use force to apply it to others.
> 
> On Thu, 24 Jan 2008, Andre Oppermann wrote:
> 
>> Matt Mathis wrote:
>>>> The first is the ordering of SACK blocks.  The block based reassembly
>>>> queue lends itself to generate the SACK option from.  The blocks are
>>>> ordered ascending from the first segment closest to rcv_nxt.  Per RFC
>>>> 2018 Section 4 the first SACK block MUST be the one with the most
>>>> recent addition to it.  This is easily done by adding a pointer to
>>>> the block last modified.  However a bit later it says the following
>>>> SACK blocks SHOULD be also in order of their 'recentness' while also
>>>> specifying they may be listed in arbitrary order.  Now tracking this
>>>> information in the block based and sequence number ordered reassembly
>>>> queue it a bit more involved and I wonder if it really necessary and
>>>> useful.  In general the SACK blocks tend to be ordered sequentially
>>>> after the first SACK block in single random loss cases.  Only in
>>>> packet reordering cases, possibly mixed with loss, this may not be
>>>> the case.  On the other hand the most valuable information for the
>>>> sender are the blocks closest to rcv_nxt so it can fill the hole and
>>>> complete the sequence space.  While the RFC leaves me free to do as
>>>> I please I wonder if there are observations or rationales that clearly
>>>> show the the ordering by recentness to be better than sequential
>>>> ordering after the first SACK block?
>>>
>>> RFC 2018 is written the way it is written to maximize the robustness 
>>> in the presence of lost ACKs.  If the network never looses ACKs, then 
>>> just reporting the single SACK block with the newest data is 
>>> guaranteed to be sufficient for the sender to have perfect knowledge 
>>> about the state of the reassembly queue at the receiver.
>>>
>>> If occasional ACKs and one of several non-contiguous retransmissions 
>>> are lost, the sender might not learn that some of the retransmissions 
>>> were successful if the 2nd and 3rd SACK block do not reflect the 
>>> history of the first block.
>>>
>>> Although in most cases you want to know about blocks closest to 
>>> rcv_nxt, that is not always true.  In particular if the two oldest 
>>> retransmissions are lost and then some ACKs are also lost, the sender 
>>> will not know that some of the later holes have been successfully 
>>> filled.  If you are going to take a timeout anyhow, this isn't so 
>>> important, but we tried to avoid timeouts at all costs.
>>
>> This pretty much where my question originates from.  The approach of
>> reporting the most recent block first and then closest and second
>> closest to rcv_nxt works just fine as long as the number of holes is
>> not large or excessive.  However even then I may argue that filling
>> the hole after rcv_nxt is the most important goal, above reporting
>> second and third recent blocks (if different).  Once the first hole
>> is filled the closest and second closest blocks to rcv_nxt automatically
>> shift to the right by one and get reported immediately.  When a large
>> number of holes appear the sender is very likely to face a collapsed
>> congestion window and will have to throttle down its rate of sending.
>> This makes it probable that guaranteed knowledge further out is not
>> relevant at this point in time.  By the time this becomes relevant it
>> get reported because it gets closer to rcv_nxt.
>>
>> I'm not forcefully arguing this view but want to bring it up for further
>> elaboration and comparison with the other recommended approaches.
>>
>>> If the 2nd and 3rd block reflect most recent history of the 1st 
>>> block, the sender will have perfect knowledge of the reassembly queue 
>>> as long as the network never looses 3 consecutive ACKs (and usually 
>>> tolerates many more losses).
>>>
>>> The "may use other algorithms" is there because we suspect that you 
>>> can do even better, for example by making the 2nd and 3rd blocks 
>>> periodically or randomly re-report all old SACK blocks.  As far a I 
>>> know this has never been pursued, and we certainly did not want to 
>>> specify or even suggest it in 2018.
>>>
>>> Everyone else implements it as written....
>>
>> FreeBSD currently does so as well.  It's in my development branch I'm
>> working and experimenting with alternative approaches.  Using the block
>> building logic of the reassembly queue itself for SACK reporting is
>> quite tempting.  ;-)
>>
>>> Oh, except nix "timeouts MUST clear the scoreboard"... If you 
>>> preserve the scoreboard on a timeout, you might be able to do 
>>> goodput=throughput (zero duplicate data at the receiver) under very 
>>> high loss rates, which is a good thing.
>>
>> Thanks for your extensive explanations.
>>
>> -- 
>> Andre
>>
> 
> 

_______________________________________________
tcpm mailing list
tcpm@ietf.org
http://www.ietf.org/mailman/listinfo/tcpm