[Tmrg] tmix-linux: burst of FIN packets causes packet loss at end of experiment

lachlan.andrew at gmail.com (Lachlan Andrew) Wed, 15 October 2008 01:54 UTC

From: "lachlan.andrew at gmail.com"
Date: Wed, 15 Oct 2008 12:54:09 +1100
Subject: [Tmrg] tmix-linux: burst of FIN packets causes packet loss at end of experiment
In-Reply-To: <48F5291C.8020507@swin.edu.au>
References: <48F5291C.8020507@swin.edu.au>
Message-ID: <aa7d2c6d0810141854v50c9d379wcbb54435e71e890f@mail.gmail.com>

2008/10/15 Ritesh Kumar <ritesh at cs.unc.edu>:
>
> On Tue, Oct 14, 2008 at 3:40 PM, Tom Quetchenbach
> <quetchen at caltech.edu> wrote:
>>
>> When a tmix experiment ends, the stop_generator() function closes
>> all of the currently-active connections. This produces a large
>> burst of FIN packets, which can result in a burst of packet losses.
>> (This is especially true if using routers that specify their buffer
>> sizes in packets.) In addition to being a somewhat unrealistic
>> situation, this results in inaccurate statistics of total packet
>> loss.
>
> Linux timers are not microsecond accurate. You might want to check
> the actual time a usleep(200) sleeps. It might not be a bad idea to
> sleep for larger periods of time after a given small number of
> connections terminate...

Good suggestion.  A simpler alternative may be just to usleep(1000).
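
As a rough sketch of both ideas (this is not the tmix-linux code;
measure_sleep_us() and close_paced() are hypothetical helpers, and the
gap between closes is an assumption to be tuned), one could first time
what a requested usleep() really costs and then pace the close() calls:

```c
#define _DEFAULT_SOURCE   /* exposes usleep() and clock_gettime() on glibc */
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: sleep for `usec` microseconds (must be < 1000000)
 * and return the time that actually elapsed, in microseconds. */
int64_t measure_sleep_us(unsigned int usec)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    usleep(usec);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    int64_t ns = (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL
               + (int64_t)(t1.tv_nsec - t0.tv_nsec);
    return ns / 1000;
}

/* Hypothetical helper: close `n` descriptors one at a time, sleeping
 * `gap_usec` microseconds between consecutive closes so the FINs are
 * spread out instead of sent as one burst.
 * Returns the number of descriptors closed successfully. */
int close_paced(const int *fds, int n, unsigned int gap_usec)
{
    int closed = 0;

    for (int i = 0; i < n; i++) {
        if (close(fds[i]) == 0)
            closed++;
        if (i + 1 < n)          /* no need to wait after the last close */
            usleep(gap_usec);
    }
    return closed;
}
```

Comparing measure_sleep_us(200) with measure_sleep_us(1000) on the test
machine would show whether the shorter sleep is honoured closely
enough to be worth using.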

> I would actually recommend to skip a portion of all your experimental
> data in the beginning and the end of the experiment. Most of our
> scripts have the following assumption: The experiment lasts 4800
> seconds. We skip the beginning and ending 1200 seconds of
> experimental data. Hence it would be worthwhile to sample the
> interface for packet losses at 1200 seconds and 3600 seconds to get
> a reliable set of results. This also eliminates the possibility of
> the sleep between close() being not nearly enough to save packet
> losses in some oddly configured scenarios.

Unfortunately, some of our statistics come from SNMP counters on routers,
which are only updated every 5 seconds or so, and so we have to wait
about 10 seconds after all traffic has finished before we can get
reliable values.
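
To make the measurement window concrete, here is a minimal sketch of
differencing two counter samples (say, one at 1200 s and one at 3600 s).
It assumes 32-bit counters that wrap at most once within the window;
counter_delta() and window_loss_rate() are names made up for
illustration, and how the samples are fetched from the router is left
out entirely:

```c
#include <stdint.h>

/* Difference between two samples of a 32-bit counter.
 * Unsigned arithmetic is modulo 2^32, so a single wrap between the
 * two samples is handled automatically. */
uint32_t counter_delta(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Fraction of packets dropped inside the window, from drop and packet
 * counters sampled at the start and end of the window.
 * Returns 0.0 if no packets were counted in the window. */
double window_loss_rate(uint32_t drops_start, uint32_t drops_end,
                        uint32_t pkts_start, uint32_t pkts_end)
{
    uint32_t drops = counter_delta(drops_start, drops_end);
    uint32_t pkts  = counter_delta(pkts_start, pkts_end);

    if (pkts == 0)
        return 0.0;
    return (double)drops / (double)pkts;
}
```

With counters that are only refreshed every 5 seconds or so, each
sample can be stale by up to one refresh interval, which is why the
final sample needs the extra wait.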

There is a fundamental statistical need to ignore the first little part,
so that the number of connections can reach "steady state", but I don't
know any fundamental reason to ignore the last 1/3 of the
experiment, provided that the traffic generator ends flows cleanly.  Since
our suite is very time-constrained, we want to cut out any unnecessary
waiting.

I'm Cc'ing this to TMRG in case someone on the list knows of a strong
reason to ignore the end of an experiment (or knows a way around the
SNMP problem).

Cheers,
Lachlan

-- 
Lachlan Andrew  Centre for Advanced Internet Architectures (CAIA)
Swinburne University of Technology, Melbourne, Australia
Ph +613 9214 4837   http://netlab.caltech.edu/lachlan