[tcpm] Ordering of SACK blocks, flushing of reassembly queue after inactivity

Andre Oppermann <andre@freebsd.org> Mon, 21 January 2008 22:44 UTC


I'm currently rewriting the TCP reassembly queue of FreeBSD (which
is still almost verbatim 4.4BSD) to deal with large socket buffers
and to be more economical with kernel memory in the form of mbufs. [1]
The new code is based on blocks of contiguous segment data instead
of a list of received out-of-order segments that has to be traversed
to find the right place for each new segment.  It also optimizes for
the cases that are most likely to occur in practice.
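
To make the block idea concrete, here is a minimal sketch in Python
rather than kernel C.  It is not the FreeBSD code: the name
insert_segment and the list-of-tuples representation are assumptions
made for illustration, blocks are half-open [start, end) ranges, and
sequence number wraparound and mbuf handling are ignored.

```python
def insert_segment(blocks, start, end):
    """Merge the segment [start, end) into the sorted block list.

    blocks is a list of (start, end) pairs, ascending, non-overlapping
    and non-adjacent.  It is updated in place.  Returns the block that
    now contains the segment (the "last modified" block).
    """
    merged_start, merged_end = start, end
    kept = []
    for b_start, b_end in blocks:
        if b_end < merged_start or b_start > merged_end:
            # Disjoint and not adjacent: keep the block unchanged.
            kept.append((b_start, b_end))
        else:
            # Overlaps or touches the new data: absorb it.
            merged_start = min(merged_start, b_start)
            merged_end = max(merged_end, b_end)
    kept.append((merged_start, merged_end))
    kept.sort()
    blocks[:] = kept
    return (merged_start, merged_end)
```

A segment that fills the gap between two blocks collapses them into
one, so the queue length follows the number of holes rather than the
number of segments received.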

There are two issues on which I'd like to solicit input from this WG.

The first is the ordering of SACK blocks.  The block-based reassembly
queue lends itself to generating the SACK option directly.  The blocks
are ordered ascending, starting with the one closest to rcv_nxt.  Per
RFC 2018 Section 4 the first SACK block MUST be the one with the most
recent addition to it.  This is easily done by keeping a pointer to
the block last modified.  However, a bit later it says the following
SACK blocks SHOULD also be in order of their 'recentness', while also
specifying that they may be listed in arbitrary order.  Tracking this
information in the block-based, sequence-number-ordered reassembly
queue is a bit more involved, and I wonder if it is really necessary
and useful.  In single random loss cases the SACK blocks after the
first one tend to be ordered sequentially anyway.  Only with packet
reordering, possibly mixed with loss, may this not be the case.  On
the other hand, the most valuable information for the sender is the
blocks closest to rcv_nxt, so that it can fill the hole and complete
the sequence space.  While the RFC leaves me free to do as I please,
I wonder if there are observations or rationales that clearly show
ordering by recentness to be better than sequential ordering after
the first SACK block?
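
The sequential alternative can be sketched as follows, again in
Python for illustration only.  The function name build_sack_blocks
and its arguments are hypothetical.  The limit of four blocks assumes
no other TCP options are present; with timestamps only three fit.

```python
MAX_SACK_BLOCKS = 4  # assumption: no other TCP options in the header

def build_sack_blocks(blocks, last_modified, max_blocks=MAX_SACK_BLOCKS):
    """Return the SACK blocks to report.

    blocks is the reassembly queue as (start, end) pairs in ascending
    sequence order; last_modified is the block touched by the segment
    that triggered this ACK.  The first reported block is last_modified,
    the rest follow in ascending order starting nearest to rcv_nxt.
    """
    if not blocks:
        return []
    sack = []
    if last_modified in blocks:
        sack.append(last_modified)
    for block in blocks:
        if len(sack) >= max_blocks:
            break
        if block != last_modified:
            sack.append(block)
    return sack[:max_blocks]
```

Only a single pointer to the last modified block is needed; no
per-block recentness has to be tracked.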

The second is how long to hold onto data in the reassembly queue.
The general theme here is resource exhaustion, be it through malicious
activity or just endpoints that drop off the net.  I think we can
all agree that holding onto reassembly queue data until the session
times out (if ever) is not really useful considering the overall
resource constraints.  The question now is after what time to flush
the reassembly queue (and to send an appropriate ACK).  A range of
options is available.  At the long end we have a flush timeout of
something like 2 times MSL.  At the short end we can go down to the
currently calculated retransmit timeout value as seen from our side.
Also of importance is the point from which the timeout is measured:
from the time the first segment arrived in the reassembly queue
(resetting when rcv_nxt is advanced), or from the arrival time of the
most recent segment.  For the moment, and for testing, I've chosen the
former at four times the retransmit timeout, as something that probably
marks the boundary between spurious network losses or partitioning and
longer-term disconnects or malicious activity.  Is any empirical data
available on abandoned sessions with data in the reassembly queue?
What is your opinion and rationale on this?
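
The timer variant chosen for testing can be sketched like this.  It
is an illustration in Python, not the kernel code: the class and
method names are hypothetical, times are plain seconds, and the
caller is assumed to supply the current retransmit timeout (rto).

```python
class ReassemblyTimer:
    # From the text: flush after four times the retransmit timeout.
    FLUSH_RTO_MULTIPLE = 4

    def __init__(self):
        # None while the reassembly queue is empty.
        self.started_at = None

    def on_out_of_order_segment(self, now):
        # Start the clock on the first queued segment only; later
        # segments do not extend the deadline.
        if self.started_at is None:
            self.started_at = now

    def on_rcv_nxt_advanced(self, now, queue_empty):
        # Progress at rcv_nxt restarts the clock, or stops it if the
        # queue has drained completely.
        self.started_at = None if queue_empty else now

    def should_flush(self, now, rto):
        if self.started_at is None:
            return False
        return now - self.started_at >= self.FLUSH_RTO_MULTIPLE * rto
```

Measuring from the most recent segment instead would only require
on_out_of_order_segment to set started_at unconditionally, at the cost
of letting a peer keep the queue alive by trickling segments.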



