Re: [iccrg] Review of draft-ietf-iccrg-tcpval-01/ Issues on traffic traces

Greetings Mirja,

Thanks for the feedback.

First of all, sorry that I dropped the ball 5 years ago, and a big
"thank you" to the Davids for trying to clean this up.  Sally and I
should both probably be dropped from the author list.

As you note several times, the goal of this work is to push out a
"best practice" RFC.  You can think of it as a stop-gap until the next
"best practice in the same area" is put out.  As such, I think that
most of the issues with "it would be nicer if" can be put off until
the next document, once we get over the issues of "it can't be done
outside ns".

As you say, the goal was definitely *not* to produce tests tied to
ns2.  The reason that tmix was chosen was precisely that it had
Linux and FreeBSD implementations as well.  I'm sorry to hear that
there are problems getting tmix working well.  I'm Cc'ing Michele
Weigle and Jay Aikat, who used to maintain tmix, to see if they can
provide an update on any of this.

I'd be happy to revisit the choice of traffic generator, and/or the
choice of traces, but I think it is important that
(a) the traffic be standardized across tests
(b) the traffic contain a "realistic" mix of flow sizes.
In the initial discussions, I was arguing mainly for the importance of
being repeatable / standardized / informative, and Sally Floyd was
arguing for the importance of being realistic.  I would personally
have no objection to having fewer flows that don't leave slow start,
but Sally has a lot of experience, so we shouldn't dismiss her
advice lightly.  I'd also be very happy to have traffic flows that are
"simpler" (especially more uniform across long time scales).  I was
originally arguing for random traces, with a specified random number
generator and seed, so that everyone uses the same traffic.  (With
heavy-tailed traffic, random fluctuations can make a big difference.
That makes the use of identical traffic important for repeatability.)
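
To make the "specified generator and seed" idea concrete, here is a
minimal sketch.  It assumes Poisson flow arrivals and Pareto flow sizes;
the function name, the parameter values and the choice of distributions
are mine, purely for illustration, and are not taken from the draft.

```python
import random


def make_trace(seed, n_flows, arrival_rate=10.0, pareto_shape=1.2, min_size=1000):
    """Return a list of (start_time_s, size_bytes) pairs.

    All parameter values are illustrative assumptions, not values from the draft.
    """
    rng = random.Random(seed)  # private generator, so global random state is untouched
    t = 0.0
    trace = []
    for _ in range(n_flows):
        # Exponential gaps between flow starts give Poisson flow arrivals.
        t += rng.expovariate(arrival_rate)
        # Pareto variates are >= 1, so sizes are heavy-tailed and >= min_size.
        size = int(min_size * rng.paretovariate(pareto_shape))
        trace.append((t, size))
    return trace
```

Two calls with the same seed return identical traces, which is the
property I care about.  A real specification would also have to pin down
the generator algorithm itself, since different tools ship different
generators.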

You correctly say that the traces are outdated.  However, they have
some characteristics that I expect are still present in "real" traffic
that would be missing from most synthetic traffic:
- A wide distribution of flow sizes, including many small flows
- Non-stationarity (this isn't just due to how the trace was collected)
- Changes in the ratio of forward to backward traffic over time.
As such, the issue is not whether we should use these vs more "up to
date" traces, but whether we should use traffic that reflects the
internet or synthetic/sanitized traffic that gives us a better
understanding of how TCP behaves in unrealistic conditions.  As a
theorist, I am entirely happy with the latter, provided the IETF
community is.

Having said that, if it means repeating all the work that the Davids
did, then it isn't clear that it is worthwhile.  Personally, I'd be
happy with either Poisson flow arrivals or exponential inter-flow
times (see my response to your comment 2.1).

You are right that the IETF's concern at the time these tests were
written was "is it safe to deploy", but that wasn't the motivation for
this test suite.  My motivation was to avoid the arguments of: "your
algorithm works badly on my testbed";  "Oh yeah, well yours works
badly on mine".  Sally's was to be able to tell people "don't waste our
time if you're just doing tests on a single unidirectional flow on a
single bottleneck", before going on to check corner cases, which will
be different for different proposals.

In response to your specific comments:
2.1:  the difference between specifying interarrival times and
inter-flow times is big, regardless of whether the timescale is such
that the flow re-enters slow start.  As you point out, in the first
case, it is possible for the link to be "overloaded" in the sense that
the number of flows builds up indefinitely, whereas in the second, the
link is always "stable" in a queueing-theoretic sense.  Specifying
inter-flow times is not much harder to implement, and captures an
important qualitative feature of user behaviour: less traffic arrives
if the network is congested.  Since there are arguments both ways, I'd
suggest that we leave the current version.
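
As a rough illustration of that difference, here is a sketch under
strong simplifying assumptions: the link is modelled as a single FIFO
server rather than as TCP sharing, and flow durations and gaps are
exponential.  The function names and parameters are hypothetical.  The
first function fixes flow start times in advance (interarrival times);
the second starts each source's next flow only after a gap following the
end of its previous one (inter-flow times).

```python
import heapq
import random


def open_loop_waits(rng, n_flows, mean_gap, mean_service):
    """Waiting times when flow start times are fixed in advance."""
    waits, w = [], 0.0
    for _ in range(n_flows):
        waits.append(w)
        service = rng.expovariate(1.0 / mean_service)
        gap = rng.expovariate(1.0 / mean_gap)
        # Lindley recursion for the waiting time in a FIFO queue.
        w = max(0.0, w + service - gap)
    return waits


def closed_loop_waits(rng, n_flows, n_sources, mean_gap, mean_service):
    """Waiting times when each source idles for a gap after its last flow ends."""
    pending = [(rng.expovariate(1.0 / mean_gap), s) for s in range(n_sources)]
    heapq.heapify(pending)
    link_free, waits = 0.0, []
    for _ in range(n_flows):
        request, src = heapq.heappop(pending)  # earliest outstanding request
        start = max(request, link_free)
        link_free = start + rng.expovariate(1.0 / mean_service)
        waits.append(start - request)
        # The source's next flow is requested only after this one has finished.
        next_request = link_free + rng.expovariate(1.0 / mean_gap)
        heapq.heappush(pending, (next_request, src))
    return waits
```

When mean_service exceeds mean_gap (the "A>C" case), the open-loop waits
grow without bound as more flows are generated.  In the closed-loop
model a flow can wait for at most the other n_sources - 1 flows, so the
backlog stays bounded however the parameters are chosen.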

3.1: I entirely agree with the sentiment that we shouldn't consider
overload in the sense of "A>C".  Again, it was Sally pushing for that,
and the biggest delay in getting this document out was my attempt to
prove that we can get high loss rates (as sometimes observed in the
internet) without resorting to A>C.  In trying that, I was defeated by
the non-stationarity of the traces, which is why the Davids eventually
chose to go ahead with Sally's method.

Overall, what I think would be an ideal outcome is:
- to ensure tmix works reliably on platforms other than ns
- to have guidelines on what tweaks are needed to allow Linux to have
sufficiently many simultaneous flows
- to maintain a completely repeatable traffic source
- to get this out as "version 1" of a test suite, and move on to a better one.

Cheers,
Lachlan

On 13 December 2014 at 03:45, Mirja Kühlewind
<mirja.kuehlewind@tik.ee.ethz.ch> wrote:
> Hi,
>
> I reviewed draft-ietf-iccrg-tcpval-01. I understand that the goal of this
> document is to finish work that was started more than 5 years ago (and
> actually should have been finished at that time as well). In general I think
> it would be very useful to have a document that describes initial test
> cases, however I'm not sure how useful this document is to provide "quick
> and easy" initial results (for comparison).
>
> Some background information: I recently performed a larger evaluation study
> to evaluate my own congestion control proposal for my thesis. I know the
> scenarios described are only meant to provide some initial evaluation and
> are surely not sufficient to provide an exhaustive evaluation as needed for
> a thesis, but I actually failed (and gave up trying) to use any of the
> described scenarios at all.
>
> I have to say that I'm not using ns-2 for simulation but my own simulation
> library that is able to fully include VMs into the simulation without any
> influence from the host system (and therefore provides reproducible results).
> However, no matter why I'm using a different tool, I believe the idea of the
> draft is to describe scenarios for exactly this use case. If the draft only ends
> up documenting what's implemented in ns-2, this should not be an RFC but
> should be part of the ns-2 documentation instead. Therefore my first review
> comment is that I would rather move all ns-2 (and Tmix) specific parts to
> the annex (or even try to remove them altogether).
>
> However, the reason why I ended up not using these scenarios are:
> 1) It was extremely hard (not quick and easy) to get any simulation running
> with the given traffic traces (and more than a little load) for more than a
> few seconds mainly due to implementation limitations such as the max. number
> of active flows. For some reason the real kernel in the VM did not release
> the socket right after the end of the transmission, so the number of active
> flows the kernel was counting was higher than the number of active flows in
> the simulation. There are several solutions to this problem, e.g. digging
> into the kernel code or just using multiple kernels instead. However, I gave
> up as it was simply not quick and easy and because of reason 2).
> 2) The proposed scenario based on the given traffic traces did not allow me
> any useful comparison (at least for the limited cases that I tried).
> Basically what I'm saying is that there is no significant difference for any
> TCP extension under test because I was not changing Slow Start while most of
> the short flows in the traffic traces would never leave Slow Start. Further,
> with these rather complicated traffic patterns, it is basically impossible to
> say if the different results you see actually depend on the used TCP
> extension or if these are just e.g. minor timing differences that might have
> a bigger effect.
>
> Again I know the goal is to finish this old work, but my main concern
> regarding the usefulness of this document is due to the traffic generation.
> And I believe this is also the point where they got stuck back then, because
> that's not easy.
>
> Unfortunately I have to say that everything you added (with respect to
> draft-irtf-tmrg-tests-02) does not make the situation better and mostly reads
> like 'trial-and-error' while being very specific to this one (out-dated) data
> set. I'll give more detailed comments further below. But I think all this
> text should at least be moved to the appendix.
>
> Then, as just indicated, the data set is from 2006. If the main reason to
> use these traces is to achieve more realistic traffic patterns, the goal is
> not met, as traffic patterns have changed strongly since then (e.g. a huge
> proportion of the traffic is now video).
>
> Further, I don't even agree to the point that "traffic must be reasonably
> realistic". If the goal is "to compare and contrast proposal against
> standard TCP", it is more useful to use very simple traffic models such that
> effects that occurred can actually be related to certain behaviors of an
> algorithm. If the goal is to check if the extension is safe for
> experimentation in the wider Internet (which seems to be a not spelled-out
> goal), then it would be most important to investigate corner/extreme cases
> which do not occur very often in the Internet but are so different that
> things could break.
>
> Of course, when you design a new extension, you also have to show that it
> works well in a "reasonably realistic" environment but that can not be the
> goal for this document. Further the best way to do that is to run it on the
> Internet. In this case you can't analyze any specific effects as you don't
> know the network condition, but you can basically say: it works or it
> doesn't work...
>
> Here are some more detailed comments on the traffic generation part; I will
> provide further comments on the rest of the document as soon as we have
> resolved the issues raised above...
>
> - 1. paragraph of section 2.1:
> This section argues that start time and flow size (distributions) are not
> enough, instead traffic should be modelled by start time, request and response
> size, and think time (distributions). First of all the paragraph is slightly
> hard to read and thus it would be great if it could be spelled out more
> clearly what the model used is (at the beginning, and then give the
> reasoning).
> However, I disagree with this model. I don't think that it is needed to
> model a dependency between a request and a response for TCP evaluations. Of
> course this might be more realistic but as long as we don't model
> application behavior or even user interactions that doesn't give us much and
> makes things more complicated. Instead we should investigate scenarios where
> the load level on the forward and backward channel can be set independent of
> each other.
> Further modeling think times in contrast to inter-arrival times (IATs) only
> makes a difference for TCP evaluations where the 'waiting' is very small and
> therefore TCP will not fall back to Slow Start in the first case. What I do
> to cover both cases is that I always just model flow sizes and IATs but
> have an option to say if the data generated by one generator should be sent
> on the same TCP connection or if a new connection should be opened for each
> data flow. This can lead to the case, when the IAT is small and the available
> capacity is small as well, that all flows will be sent back to back
> appearing as one long flow. However, I think it is the responsibility of the
> researcher doing the experiment to verify if this case has happened.
>
> - section 2.2.
> As I said I don't think there should be Tmix specific text in here.
>
> - section 2.3.2
> Non-stationarity is a big problem which makes all experimentation with this
> data complicated (and not quick and easy). It is not clear at all how much
> your applied hacks actually change the traffic characteristics of the
> trace. Therefore it is also not clear how realistic the traffic still remains
> at the end.
> (Therefore I would rather just like to see the traffic characteristics, such
> as flow size and IAT distribution, of this trace written down as an input
> for an artificial traffic generator; I don't think this is less realistic.)
>
> But I also believe some of the problems with non-stationarity are specific
> to this trace. The trace seems to take only newly started flows into account
> for the measurement which does not reflect the actual traffic load of the
> measured system at the beginning of the trace. This might be different with
> a different trace that has been measured differently. And I only can say
> this again, because I think that is really important, the trace is out-dated
> and we should not completely rely on this one trace in this document.
>
> - section 2.2.3.1
> The number of 500e6 just seems random and is probably very specific to this
> one trace.
>
> The forward reference to section 4.4.2. is not understandable.
>
> - section 2.4.
> This again sounds like guesswork; I would rather like to define values and
> actually apply them to an artificial traffic model.
>
> - section 3
> should really go in the appendix
>
> - section 3.1.
> says "it is important to test congestion control in overload". That is not
> wrong but, in fact, if there is sufficient data to send TCP will always try
> to fill the link; I would not call this overload. If there is not enough
> data to send congestion control does not even become active (except Slow
> Start). Therefore if you re-design the congestion control behavior and would
> like to evaluate this against 'standard TCP', the only cases that are
> interesting are those where TCP is able to fill the link. However, even if the total
> load is e.g. only 85%, there will be phases during the simulation run where
> the link is full. Therefore there should never be a case where A>C (or
> offered load > 100%). This only leads to non-steady behavior, as you say
> correctly. In a real system this will lead to congestion collapse where even
> TCP cannot help anymore. But in a real system there usually is a user behind
> a computer who will just give up.
>
> - section 3.3
> Without having tried to apply this, this part doesn't seem to be super
> useful. Again the values seem to be arbitrary and in the end, if I get you
> right, you save less than one RTT of simulation time. That wouldn't solve any
> of my problems. And again, as I said above, this is a problem of this trace
> and could have been avoided had the trace been collected differently.
>
> - I'll comment on the rest of the document later.
>
> To conclude, I would like to propose to remove the traffic traces from the
> document (or move to the appendix but I believe this text should go into
> some ns-2 documentation instead). I know the intention is to finish the
> original work, but especially as the traces are out-dated for me the
> document was not useful as it is.  Maybe this should be discussed with the
> original authors.
>
> Instead of the traffic traces I would like to use simple scenarios with a
> certain (small) number of greedy flows and/or short flow cross traffic (with
> a certain IAT and flow size distribution). I can provide numbers of what
> I've used or there are also scenarios described in
> draft-sarker-rmcat-eval-test-00 (which often in addition uses video traffic
> which makes things even more complicated than we would need it here for this
> initial evaluation). Maybe if possible it would anyway be useful to try and
> align the structure and/or terminology of these two drafts.
>
> Sorry for the long email. I hope that is still helpful.
>
> Mirja

-- 
What some people mistake for the high cost of living is really the
cost of high living.