RE: [tcpm] Ordering of SACK blocks, flushing of reassembly queue after inactivity
"Mahdavi, Jamshid" <jamshid.mahdavi@bluecoat.com> Wed, 23 January 2008 09:02 UTC
Matt described the thinking behind the text in RFC 2018 perfectly. In my mind, that SHOULD is really almost a MUST, because the consequence of getting it wrong is that the sender can retransmit data which has already arrived correctly at the receiver (or the sender would have to be so conservative that it probably ends up doing something closer to NewReno in cases where all 3 SACK blocks are filled in).

I can't tell for sure from your question, but it sounds like you might not be trying to coalesce adjacent SACKs into a single SACK block when the data being SACKed is contiguous. In particular, in the case of a single dropped packet, no ACK ever needs to carry more than one SACK block; that SACK block just keeps getting bigger and bigger.

In simulations, it is really easy to get "every other packet dropped" as a result of slow-start with no delayed ACK. This is one of the pathological cases in terms of needing the robustness, because every ACK introduces a new and distinct SACK block. I believe that if you tried some simple tests with FreeBSD and dummynet, you could get something pretty close to this behavior with a regular network and stack.

A long time ago (>10 years?), one stack author described a clever solution to the problem you are asking about. I don't want to repeat it here, because I can't remember whether they were an open source developer or someone working on a proprietary stack. Hopefully they are reading this and wouldn't mind repeating it. (I think it was probably a NetBSD or FreeBSD developer, but I just can't remember...)
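To make the two loss patterns above concrete, here is a minimal Python sketch of a receiver that coalesces contiguous out-of-order data and reports how many SACK blocks it holds after each arrival. It is only an illustration under stated assumptions: the names `merge_segment` and `sack_block_counts`, the fixed 1000-byte segment size, and the absence of retransmissions and sequence-number wraparound are all choices made for this example, not taken from any real stack.

```python
def merge_segment(blocks, start, end):
    """Insert the half-open range [start, end) into a list of
    non-overlapping blocks, merging every block it overlaps or touches."""
    merged = []
    for b_start, b_end in blocks:
        if b_end < start or b_start > end:
            merged.append((b_start, b_end))  # disjoint and not adjacent
        else:
            # Overlapping or adjacent: absorb the block into the new range.
            start, end = min(start, b_start), max(end, b_end)
    merged.append((start, end))
    merged.sort()
    return merged


def sack_block_counts(num_packets, dropped, mss=1000):
    """Return the number of SACK blocks held after each out-of-order
    arrival. Packet i covers [i*mss, (i+1)*mss). Assumption: dropped
    packets are never retransmitted, so holes are never filled."""
    rcv_nxt = 0
    blocks = []
    counts = []
    for i in range(num_packets):
        if i in dropped:
            continue
        start, end = i * mss, (i + 1) * mss
        if start == rcv_nxt:
            rcv_nxt = end  # in-order data, nothing to SACK
            continue
        blocks = merge_segment(blocks, start, end)
        counts.append(len(blocks))
    return counts


single_drop = sack_block_counts(8, dropped={2})
every_other = sack_block_counts(9, dropped={1, 3, 5, 7})
```

With one dropped packet the count stays at a single, growing block; with every other packet dropped, each arrival adds a new block, so the number of blocks quickly exceeds what one SACK option can carry.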
--J

-----Original Message-----
Date: Mon, 21 Jan 2008 23:44:02 +0100
From: Andre Oppermann <andre@freebsd.org>
Subject: [tcpm] Ordering of SACK blocks, flushing of reassembly queue after inactivity
To: tcpm@ietf.org
Message-ID: <47952032.3030809@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

I'm currently rewriting the TCP reassembly queue of FreeBSD (which is still almost 4.4BSD verbatim) to deal with large socket buffers and to be more economical with kernel memory in the form of mbufs. [1]

The new code is based on blocks of contiguous segment data instead of a list of received out-of-order segments that has to be traversed to find the right place for the new segment. It also does all the obvious optimizations based on the likelihood of things happening.

Now there are two issues on which I'd like to solicit input from this WG.

The first is the ordering of SACK blocks. The block-based reassembly queue lends itself to generating the SACK option. The blocks are ordered ascending, starting with the block closest to rcv_nxt. Per RFC 2018 Section 4 the first SACK block MUST be the one with the most recent addition to it. This is easily done by adding a pointer to the block last modified. However, a bit later it says the following SACK blocks SHOULD also be in order of their 'recentness', while also specifying that they may be listed in arbitrary order. Tracking this information in the block-based, sequence-number-ordered reassembly queue is a bit more involved, and I wonder if it is really necessary and useful.

In general the SACK blocks tend to be ordered sequentially after the first SACK block in single random loss cases. Only in packet reordering cases, possibly mixed with loss, may this not be the case. On the other hand, the most valuable information for the sender is the blocks closest to rcv_nxt, so that it can fill the hole and complete the sequence space.
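The two ordering choices under discussion can be compared with a small Python sketch of a block-based queue. This is an illustration only, not the FreeBSD code: the `Block` and `ReassemblyQueue` names, the per-block `touched` counter standing in for 'recentness', and the limit of 3 blocks per option are assumptions made for the example, and sequence-number wraparound is ignored.

```python
from dataclasses import dataclass


@dataclass
class Block:
    start: int        # first sequence number in the block
    end: int          # one past the last sequence number
    touched: int = 0  # value of the insert counter when the block last grew


class ReassemblyQueue:
    """Out-of-order data kept as contiguous blocks sorted by sequence number."""

    def __init__(self):
        self.blocks = []
        self.clock = 0  # incremented on every insert (hypothetical recency stamp)

    def insert(self, start, end):
        """Add [start, end), merging with overlapping or adjacent blocks."""
        self.clock += 1
        kept = []
        for b in self.blocks:
            if b.end < start or b.start > end:
                kept.append(b)
            else:
                start, end = min(start, b.start), max(end, b.end)
        kept.append(Block(start, end, self.clock))
        kept.sort(key=lambda b: b.start)
        self.blocks = kept

    def sack_option(self, by_recency, max_blocks=3):
        """Most recently modified block first; the rest either by
        recency or in ascending sequence order."""
        if not self.blocks:
            return []
        first = max(self.blocks, key=lambda b: b.touched)
        rest = [b for b in self.blocks if b is not first]
        if by_recency:
            rest.sort(key=lambda b: b.touched, reverse=True)
        ordered = [first] + rest
        return [(b.start, b.end) for b in ordered[:max_blocks]]


q = ReassemblyQueue()
for seg in [(3000, 4000), (7000, 8000), (5000, 6000), (9000, 10000)]:
    q.insert(*seg)
recent_order = q.sack_option(by_recency=True)
seq_order = q.sack_option(by_recency=False)
```

With four blocks and room for only three, the two policies report different sets: recency order omits the oldest block, while sequence order omits the highest-numbered of the remaining blocks. Which omission hurts the sender less is exactly the question being asked.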
While the RFC leaves me free to do as I please, I wonder if there are observations or rationales that clearly show ordering by recentness to be better than sequential ordering after the first SACK block?

The second is how long to hold onto data in the reassembly queue. The general theme here is resource exhaustion, be it through malicious activity or just end points that drop off the net. I think we can all agree that holding onto reassembly queue data until the session times out (if ever) is not really useful considering the overall resource constraints. The question now is after what time to flush the reassembly queue (and to send an appropriate ACK)?

A range of options is available. On the wide side we have a flush timeout of something like 2 times MSL. On the small side we can go down to the current calculated retransmit timeout value as seen from our side. Also of importance is from where the timeout is calculated: from the time the first segment arrived in the reassembly queue (resetting when rcv_nxt is advanced), or from the arrival time of the most recent segment.

For the moment, and for testing, I've chosen the former at four times the retransmit timeout, as something that probably marks the boundary between spurious network losses or partitioning and longer-term disconnect or malicious activity.

Is any empirical data available on abandoned sessions with data in the reassembly queue? What is your opinion and rationale on this?

[1] http://perforce.freebsd.org/fileLogView.cgi?FSPC=//depot/projects/tcp_reass/netinet/tcp_reass.c

--
Andre
andre@freebsd.org
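The flush policy described in the quoted message (timer started by the first queued segment, reset when rcv_nxt advances, expiring at four times the retransmit timeout) can be sketched in Python as follows. The class and method names are hypothetical, times are plain seconds supplied by the caller, and the `from_latest` flag models the alternative of timing from the most recent segment; none of this is taken from the actual FreeBSD change.

```python
class ReassemblyFlushTimer:
    """Decides when queued out-of-order data should be discarded."""

    def __init__(self, rto, multiplier=4, from_latest=False):
        self.timeout = multiplier * rto
        self.from_latest = from_latest  # True: time from the newest segment
        self.started_at = None          # None means nothing is queued

    def on_out_of_order_segment(self, now):
        # By default only the first queued segment starts the timer.
        if self.started_at is None or self.from_latest:
            self.started_at = now

    def on_rcv_nxt_advanced(self, now, queue_empty):
        # Progress at the left edge restarts (or cancels) the timer.
        self.started_at = None if queue_empty else now

    def should_flush(self, now):
        return (self.started_at is not None
                and now - self.started_at >= self.timeout)


timer = ReassemblyFlushTimer(rto=1.0)
timer.on_out_of_order_segment(now=10.0)
timer.on_out_of_order_segment(now=12.0)  # does not restart the timer
```

Timing from the first segment bounds how long any data can sit in the queue, whereas timing from the latest segment lets a peer that trickles in out-of-order data keep the queue alive indefinitely, which matters for the resource-exhaustion case.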