Re: [tcpm] Ordering of SACK blocks, flushing of reassembly queue after inactivity
Andre Oppermann <andre@freebsd.org> Thu, 21 February 2008 22:52 UTC
Return-Path: <tcpm-bounces@ietf.org>
X-Original-To: ietfarch-tcpm-archive@core3.amsl.com
Delivered-To: ietfarch-tcpm-archive@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id E88AA3A6CF5; Thu, 21 Feb 2008 14:52:19 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -0.617
X-Spam-Level:
X-Spam-Status: No, score=-0.617 tagged_above=-999 required=5 tests=[AWL=-0.180, BAYES_00=-2.599, FH_RELAY_NODNS=1.451, HELO_MISMATCH_ORG=0.611, RDNS_NONE=0.1]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id NWS+sELUbWnF; Thu, 21 Feb 2008 14:52:18 -0800 (PST)
Received: from core3.amsl.com (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id 6DFA53A6B51; Thu, 21 Feb 2008 14:52:18 -0800 (PST)
X-Original-To: tcpm@core3.amsl.com
Delivered-To: tcpm@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id 5BC823A6882 for <tcpm@core3.amsl.com>; Thu, 21 Feb 2008 14:52:17 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Ojqge3aTckPL for <tcpm@core3.amsl.com>; Thu, 21 Feb 2008 14:52:15 -0800 (PST)
Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) by core3.amsl.com (Postfix) with ESMTP id 56AFC3A683C for <tcpm@ietf.org>; Thu, 21 Feb 2008 14:51:49 -0800 (PST)
Received: (qmail 35990 invoked from network); 21 Feb 2008 22:07:41 -0000
Received: from localhost (HELO [127.0.0.1]) ([127.0.0.1]) (envelope-sender <andre@freebsd.org>) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for <mathis@psc.edu>; 21 Feb 2008 22:07:41 -0000
Message-ID: <47BE0080.1090009@freebsd.org>
Date: Thu, 21 Feb 2008 23:51:44 +0100
From: Andre Oppermann <andre@freebsd.org>
User-Agent: Thunderbird 1.5.0.14 (Windows/20071210)
MIME-Version: 1.0
To: Matt Mathis <mathis@psc.edu>
References: <47952032.3030809@freebsd.org> <Pine.LNX.4.64.0801221619190.27196@tesla.psc.edu> <4797D53C.10908@freebsd.org> <Pine.LNX.4.64.0801240911250.27196@tesla.psc.edu>
In-Reply-To: <Pine.LNX.4.64.0801240911250.27196@tesla.psc.edu>
Cc: TCP Maintenance and Minor Extensions WG <tcpm@ietf.org>
Subject: Re: [tcpm] Ordering of SACK blocks, flushing of reassembly queue after inactivity
X-BeenThere: tcpm@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>
List-Unsubscribe: <http://www.ietf.org/mailman/listinfo/tcpm>, <mailto:tcpm-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:tcpm@ietf.org>
List-Help: <mailto:tcpm-request@ietf.org?subject=help>
List-Subscribe: <http://www.ietf.org/mailman/listinfo/tcpm>, <mailto:tcpm-request@ietf.org?subject=subscribe>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: tcpm-bounces@ietf.org
Errors-To: tcpm-bounces@ietf.org
Matt Mathis wrote: > If you are thinking at the scale of a dozen packet windows with just a > couple of losses, I don't disagree with you. However, with 1000 packet > windows, you almost never see "just a couple of losses", and your > proposed simplification will be very exposed to lost ACKs. Given that > the ACK rate doubles, your chances are not good. Based on the excellent feedback and rationale I've implemented the RFC2018 suggested ordering of the SACK blocks. [The whole thing is still in a separate development tree] -- Andre > BTW 1000 packet windows are not all that big: 1500 Byte MTU, 70 ms RTT > and only 90 Mb/s - well within reach for any university user in the US > or Europe > and FIOS users in Korea or Japan. The real challenge is getting the > scoreboard to work well with 10k or 100k packet windows.... > > Thanks, > --MM-- > ------------------------------------------- > Matt Mathis http://www.psc.edu/~mathis > Work:412.268.3319 Home/Cell:412.654.7529 > ------------------------------------------- > Evil is defined by mortals who think they know > "The Truth" and use force to apply it to others. > > On Thu, 24 Jan 2008, Andre Oppermann wrote: > >> Matt Mathis wrote: >>>> The first is the ordering of SACK blocks. The block based reassembly >>>> queue lends itself to generate the SACK option from. The blocks are >>>> ordered ascending from the first segment closest to rcv_nxt. Per RFC >>>> 2018 Section 4 the first SACK block MUST be the one with the most >>>> recent addition to it. This is easily done by adding a pointer to >>>> the block last modified. However a bit later it says the following >>>> SACK blocks SHOULD be also in order of their 'recentness' while also >>>> specifying they may be listed in arbitrary order. Now tracking this >>>> information in the block based and sequence number ordered reassembly >>>> queue it a bit more involved and I wonder if it really necessary and >>>> useful. In general the SACK blocks tend to be ordered sequentially >>>> after the first SACK block in single random loss cases. Only in >>>> packet reordering cases, possibly mixed with loss, this may not be >>>> the case. On the other hand the most valuable information for the >>>> sender are the blocks closest to rcv_nxt so it can fill the hole and >>>> complete the sequence space. While the RFC leaves me free to do as >>>> I please I wonder if there are observations or rationales that clearly >>>> show the the ordering by recentness to be better than sequential >>>> ordering after the first SACK block? >>> >>> RFC 2018 is written the way it is written to maximize the robustness >>> in the presence of lost ACKs. If the network never looses ACKs, then >>> just reporting the single SACK block with the newest data is >>> guaranteed to be sufficient for the sender to have perfect knowledge >>> about the state of the reassembly queue at the receiver. >>> >>> If occasional ACKs and one of several non-contiguous retransmissions >>> are lost, the sender might not learn that some of the retransmissions >>> were successful if the 2nd and 3rd SACK block do not reflect the >>> history of the first block. >>> >>> Although in most cases you want to know about blocks closest to >>> rcv_nxt, that is not always true. In particular if the two oldest >>> retransmissions are lost and then some ACKs are also lost, the sender >>> will not know that some of the later holes have been successfully >>> filled. If you are going to take a timeout anyhow, this isn't so >>> important, but we tried to avoid timeouts at all costs. >> >> This pretty much where my question originates from. The approach of >> reporting the most recent block first and then closest and second >> closest to rcv_nxt works just fine as long as the number of holes is >> not large or excessive. However even then I may argue that filling >> the hole after rcv_nxt is the most important goal, above reporting >> second and third recent blocks (if different). Once the first hole >> is filled the closest and second closest blocks to rcv_nxt automatically >> shift to the right by one and get reported immediately. When a large >> number of holes appear the sender is very likely to face a collapsed >> congestion window and will have to throttle down its rate of sending. >> This makes it probable that guaranteed knowledge further out is not >> relevant at this point in time. By the time this becomes relevant it >> get reported because it gets closer to rcv_nxt. >> >> I'm not forcefully arguing this view but want to bring it up for further >> elaboration and comparison with the other recommended approaches. >> >>> If the 2nd and 3rd block reflect most recent history of the 1st >>> block, the sender will have perfect knowledge of the reassembly queue >>> as long as the network never looses 3 consecutive ACKs (and usually >>> tolerates many more losses). >>> >>> The "may use other algorithms" is there because we suspect that you >>> can do even better, for example by making the 2nd and 3rd blocks >>> periodically or randomly re-report all old SACK blocks. As far a I >>> know this has never been pursued, and we certainly did not want to >>> specify or even suggest it in 2018. >>> >>> Everyone else implements it as written.... >> >> FreeBSD currently does so as well. It's in my development branch I'm >> working and experimenting with alternative approaches. Using the block >> building logic of the reassembly queue itself for SACK reporting is >> quite tempting. ;-) >> >>> Oh, except nix "timeouts MUST clear the scoreboard"... If you >>> preserve the scoreboard on a timeout, you might be able to do >>> goodput=throughput (zero duplicate data at the receiver) under very >>> high loss rates, which is a good thing. >> >> Thanks for your extensive explanations. >> >> -- >> Andre >> > > _______________________________________________ tcpm mailing list tcpm@ietf.org http://www.ietf.org/mailman/listinfo/tcpm
- Re: [tcpm] Ordering of SACK blocks, flushing of r… Joshua Blanton
- [tcpm] Ordering of SACK blocks, flushing of reass… Andre Oppermann
- Re: [tcpm] Ordering of SACK blocks, flushing of r… Matt Mathis
- RE: [tcpm] Ordering of SACK blocks, flushing of r… Mahdavi, Jamshid
- Re: [tcpm] Ordering of SACK blocks, flushing of r… Andre Oppermann
- Re: [tcpm] Ordering of SACK blocks, flushing of r… Andre Oppermann
- Re: [tcpm] Ordering of SACK blocks, flushing of r… Matt Mathis
- Re: [tcpm] Ordering of SACK blocks, flushing of r… Lars Eggert
- Re: [tcpm] Ordering of SACK blocks, flushing of r… David Malone
- Re: [tcpm] Ordering of SACK blocks, flushing of r… Matt Mathis
- Re: [tcpm] Ordering of SACK blocks, flushing of r… Andre Oppermann