Re: [tcpm] Ordering of SACK blocks, flushing of reassembly queue after inactivity

Matt Mathis wrote:
> If you are thinking at the scale of a dozen packet windows with just a 
> couple of losses, I don't disagree with you.  However, with 1000 packet 
> windows, you almost never see "just a couple of losses", and your 
> proposed simplification will be very exposed to lost ACKs.  Given that 
> the ACK rate doubles, your chances are not good.

Based on the excellent feedback and rationale I've implemented the
RFC2018 suggested ordering of the SACK blocks.  [The whole thing is
still in a separate development tree]

-- 
Andre

> BTW 1000 packet windows are not all that big: 1500 Byte MTU, 70 ms RTT 
> and only 90 Mb/s - well within reach for any university user in the US 
> or Europe
> and FIOS users in Korea or Japan.   The real challenge is getting the 
> scoreboard to work well with 10k or 100k packet windows....
> 
> Thanks,
> --MM--
> -------------------------------------------
> Matt Mathis      http://www.psc.edu/~mathis
> Work:412.268.3319    Home/Cell:412.654.7529
> -------------------------------------------
> Evil is defined by mortals who think they know
> "The Truth" and use force to apply it to others.
> 
> On Thu, 24 Jan 2008, Andre Oppermann wrote:
> 
>> Matt Mathis wrote:
>>>> The first is the ordering of SACK blocks.  The block based reassembly
>>>> queue lends itself to generate the SACK option from.  The blocks are
>>>> ordered ascending from the first segment closest to rcv_nxt.  Per RFC
>>>> 2018 Section 4 the first SACK block MUST be the one with the most
>>>> recent addition to it.  This is easily done by adding a pointer to
>>>> the block last modified.  However a bit later it says the following
>>>> SACK blocks SHOULD be also in order of their 'recentness' while also
>>>> specifying they may be listed in arbitrary order.  Now tracking this
>>>> information in the block based and sequence number ordered reassembly
>>>> queue it a bit more involved and I wonder if it really necessary and
>>>> useful.  In general the SACK blocks tend to be ordered sequentially
>>>> after the first SACK block in single random loss cases.  Only in
>>>> packet reordering cases, possibly mixed with loss, this may not be
>>>> the case.  On the other hand the most valuable information for the
>>>> sender are the blocks closest to rcv_nxt so it can fill the hole and
>>>> complete the sequence space.  While the RFC leaves me free to do as
>>>> I please I wonder if there are observations or rationales that clearly
>>>> show the the ordering by recentness to be better than sequential
>>>> ordering after the first SACK block?
>>>
>>> RFC 2018 is written the way it is written to maximize the robustness 
>>> in the presence of lost ACKs.  If the network never looses ACKs, then 
>>> just reporting the single SACK block with the newest data is 
>>> guaranteed to be sufficient for the sender to have perfect knowledge 
>>> about the state of the reassembly queue at the receiver.
>>>
>>> If occasional ACKs and one of several non-contiguous retransmissions 
>>> are lost, the sender might not learn that some of the retransmissions 
>>> were successful if the 2nd and 3rd SACK block do not reflect the 
>>> history of the first block.
>>>
>>> Although in most cases you want to know about blocks closest to 
>>> rcv_nxt, that is not always true.  In particular if the two oldest 
>>> retransmissions are lost and then some ACKs are also lost, the sender 
>>> will not know that some of the later holes have been successfully 
>>> filled.  If you are going to take a timeout anyhow, this isn't so 
>>> important, but we tried to avoid timeouts at all costs.
>>
>> This pretty much where my question originates from.  The approach of
>> reporting the most recent block first and then closest and second
>> closest to rcv_nxt works just fine as long as the number of holes is
>> not large or excessive.  However even then I may argue that filling
>> the hole after rcv_nxt is the most important goal, above reporting
>> second and third recent blocks (if different).  Once the first hole
>> is filled the closest and second closest blocks to rcv_nxt automatically
>> shift to the right by one and get reported immediately.  When a large
>> number of holes appear the sender is very likely to face a collapsed
>> congestion window and will have to throttle down its rate of sending.
>> This makes it probable that guaranteed knowledge further out is not
>> relevant at this point in time.  By the time this becomes relevant it
>> get reported because it gets closer to rcv_nxt.
>>
>> I'm not forcefully arguing this view but want to bring it up for further
>> elaboration and comparison with the other recommended approaches.
>>
>>> If the 2nd and 3rd block reflect most recent history of the 1st 
>>> block, the sender will have perfect knowledge of the reassembly queue 
>>> as long as the network never looses 3 consecutive ACKs (and usually 
>>> tolerates many more losses).
>>>
>>> The "may use other algorithms" is there because we suspect that you 
>>> can do even better, for example by making the 2nd and 3rd blocks 
>>> periodically or randomly re-report all old SACK blocks.  As far a I 
>>> know this has never been pursued, and we certainly did not want to 
>>> specify or even suggest it in 2018.
>>>
>>> Everyone else implements it as written....
>>
>> FreeBSD currently does so as well.  It's in my development branch I'm
>> working and experimenting with alternative approaches.  Using the block
>> building logic of the reassembly queue itself for SACK reporting is
>> quite tempting.  ;-)
>>
>>> Oh, except nix "timeouts MUST clear the scoreboard"... If you 
>>> preserve the scoreboard on a timeout, you might be able to do 
>>> goodput=throughput (zero duplicate data at the receiver) under very 
>>> high loss rates, which is a good thing.
>>
>> Thanks for your extensive explanations.
>>
>> -- 
>> Andre
>>
> 
> 

_______________________________________________
tcpm mailing list
tcpm@ietf.org
http://www.ietf.org/mailman/listinfo/tcpm