Re: [iccrg] Review of draft-ietf-iccrg-tcpval-01/ Issues on traffic traces

Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch> Mon, 05 January 2015 14:05 UTC

From: Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch>
Date: Mon, 05 Jan 2015 15:04:58 +0100
To: Lachlan Andrew <lachlan.andrew@gmail.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/iccrg/sSRBA4cPFLRI7GNI0WbtaOdyKjg
Cc: David Ros <dros@simula.no>, Jay Aikat <aikat@cs.unc.edu>, iccrg@irtf.org, mweigle@cs.odu.edu, David hayes <davihay@ifi.uio.no>
Subject: Re: [iccrg] Review of draft-ietf-iccrg-tcpval-01/ Issues on traffic traces

Hi Lachlan,

first of all, happy new year! Further comments below...


> On 16.12.2014 at 23:22, Lachlan Andrew <lachlan.andrew@gmail.com> wrote:
> 
> Greetings Mirja,
> 
> Thanks for the feedback.
> 
> First of all, sorry that I dropped the ball 5 years ago, and a big
> "thank you" to the Davids for trying to clean this up.  Sally and I
> should both probably be dropped from the author list.
> 
> As you note several times, the goal of this work is to push out a
> "best practice" RFC.  You can think of it as a stop-gap until the next
> "best practice in the same area" is put out.  As such, I think that
> most of the issues with "it would be nicer if" can be put off until
> the next document, once we get over the issues of "it can't be done
> outside ns".

I understood this. My question is how much best practice is actually being documented here, if people are not using the described setup.

> 
> As you say, the goal was definitely *not* to produce tests tied to
> ns2.  The reason that  tmix  was chosen was precisely that it had
> Linux and FreeBSD implementations as well.  I'm sorry to hear that
> there are problems getting tmix working well.  I'm Cc'ing Michele
> Weigle and Jay Aikat, who used to maintain tmix, to see if they can
> provide an update on any of this.

I'm not using ns-2/3 or tmix. We have our own simulation tool that integrates VMs into the simulation network; a small user-space helper program runs in each VM, opens a connection and receives the user data. In the simulation we read the vector traffic trace and push the (dummy) data to the user-space helper program at the right point in time (similar to what tmix would do). Also, the problem is not tmix, I guess, but rather the characteristics of this one (outdated) traffic trace.
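
Just to illustrate the kind of push loop I mean (a minimal sketch only; the trace format and helper address here are made up and not our actual tool):

    # Minimal sketch only: trace format and helper address are illustrative.
    import socket
    import time

    def replay_trace(trace, helper_addr=("192.0.2.1", 9000)):
        """Push dummy payloads to the in-VM helper at the times given in the trace.

        `trace` is a list of (start_time_s, num_bytes) tuples, sorted by time.
        """
        start = time.monotonic()
        with socket.create_connection(helper_addr) as sock:
            for t, nbytes in trace:
                # wait until the scheduled start time of this data chunk
                delay = t - (time.monotonic() - start)
                if delay > 0:
                    time.sleep(delay)
                sock.sendall(b"\x00" * nbytes)  # dummy payload of the traced size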

> 
> I'd be happy to revisit the choice of traffic generator, and/or the
> choice of traces, but I think it is important that
> (a) the traffic be standardized across tests

Yes!

> (b) the traffic contain a "realistic" mix of flow sizes.

Yes, but the question is how realistic? I'm arguing that an outdated traffic trace is not more realistic than artificial traffic generated from distributions taken from traces.
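
For example, something along these lines; the lognormal/exponential parameters below are just placeholders that would have to be fitted to a trace:

    import random

    def generate_flows(n_flows, seed=42, mean_iat_s=0.05,
                       size_mu=8.5, size_sigma=2.0):
        """Return a list of (start_time_s, flow_size_bytes) tuples.

        A fixed seed keeps the generated traffic identical across runs,
        which is what matters for repeatable comparisons.
        """
        rng = random.Random(seed)
        t = 0.0
        flows = []
        for _ in range(n_flows):
            t += rng.expovariate(1.0 / mean_iat_s)  # exponential IATs
            # heavy-tailed flow sizes; parameters are placeholders, not fitted
            size = int(rng.lognormvariate(size_mu, size_sigma))
            flows.append((t, size))
        return flows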

> In the initial discussions, I was arguing mainly for the importance of
> being repeatable / standardized / informative, and Sally Floyd was
> arguing for the importance of being realistic.  I would personally
> have no objection to having fewer flows that don't leave slow start,
> but Sally has a lot of experience and so we shouldn't ignore her
> advice lightly.  I'd also be very happy to have traffic flows that are
> "simpler" (especially more uniform across long time scales).  I was
> originally arguing for random traces, with a specified random number
> generator and seed, so that everyone uses the same traffic.  (With
> heavy-tailed traffic, random fluctuations can make a big difference.
> That makes the use of identical traffic important for repeatability.)

I know where this is coming from and I definitely support Sally's input based on her experience.
To produce meaningful evaluation results, it is necessary to evaluate not only basic scenarios but also realistic traffic scenarios with a wide range of traffic characteristics. I'm not sure, though, whether this should be based on one traffic trace only or whether it might e.g. be more useful to isolate certain characteristics/effects.

In fact, I'm just arguing that, for the current goal of producing/finalizing a useful document that allows people to easily generate some initial comparable results, it might be more useful to define some simple scenarios that are easy to implement and analyze.

> 
> You correctly say that the traces are outdated.  However, they have
> some characteristics that I expect are still present in "real" traffic
> that would be missing from most synthetic traffic:
> - A wide distribution of flow sizes, including many small flows
> - Non-stationarity (this isn't just due to how the trace was collected)

Both of these effects are problematic for simulation and I'm not sure they are really needed (at least for an initial evaluation). Regarding flow sizes, I would propose to explicitly include a scenario with a large number of small flows running against one (or more) long-running flows.
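
Purely as an illustration of what such a scenario could look like (all numbers are placeholders, not a recommendation):

    # one (or more) long-running greedy flows plus small cross-traffic flows;
    # all values below are illustrative placeholders only
    scenario = {
        "bottleneck_capacity_mbps": 10,
        "base_rtt_ms": 100,
        "long_running_flows": 1,         # greedy flow(s) under test
        "short_flow_mean_iat_s": 0.2,    # exponential inter-arrival times
        "short_flow_size_bytes": 15000,  # short cross-traffic flows (placeholder size)
        "duration_s": 300,
        "seed": 1,                       # fixed seed for repeatability
    }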

> - Changes in the ratio of forward to backward traffic over time.

If you just use certain traffic distributions (instead of the trace itself), you will also get (short-term) phases with more or less traffic.

In addition, the vector format used by tmix introduces a relation between a request and a response. However, I'm not sure this is really needed for TCP evaluation, and moreover it is still not fully realistic because no user interactions are considered. The important thing is to model the effect of cross traffic on the ACKs (of the flow(s) under test). This could also be introduced artificially by ACK loss and/or additional delays.
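
A very rough sketch of what I mean by introducing this artificially; the hook and its signature are made up and would of course depend on the simulator:

    import random

    class ReversePathImpairment:
        """Drop or delay ACKs of the flow(s) under test (illustrative only)."""

        def __init__(self, loss_prob=0.01, extra_delay_ms=(0.0, 20.0), seed=7):
            self.rng = random.Random(seed)
            self.loss_prob = loss_prob
            self.extra_delay_ms = extra_delay_ms

        def on_ack(self, ack_packet):
            """Hypothetical per-ACK hook: return the extra delay in ms before
            forwarding the ACK, or None to drop it."""
            if self.rng.random() < self.loss_prob:
                return None                      # drop this ACK
            lo, hi = self.extra_delay_ms
            return self.rng.uniform(lo, hi)      # add some extra delay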

> As such, the issue is not whether we should use these vs more "up to
> date" traces, but whether we should use traffic that reflects the
> internet or synthetic/sanitized traffic that gives us a better
> understanding of how TCP behaves in unrealistic conditions.  As a
> theorist, I am entirely happy with the latter, provided the IETF
> community is.

I guess I have stated my opinion already. However, for the record, I would prefer the latter because it might be more suitable for an initial evaluation, which is the goal of this doc (the doc also states that these evaluations are not sufficient for an exhaustive evaluation of a new proposal and are only used for the purpose of comparison).

> 
> Having said that, if it means repeating all the work that the Davids
> did, then it isn't clear that it is worthwhile. Personally, I'd be
> happy with either Poisson flow arrivals (or exponential inter-flow
> times -- see my response to your comment 2.1)
> 
> You are right that the IETF's concern at the time these tests were
> written was "is it safe to deploy", but that wasn't the motivation for
> this test suite.  My motivation was to avoid the arguments of: "your
> algorithm works badly on my testbed";  "Oh yeah, well yours works
> badly on mine".  Sally's was to be able to tell people "don't waste our
> time if you're just doing tests on a single unidirectional flow on a single
> bottleneck", before going on to check corner cases, which will be
> different for different proposals.
> 
> In response to your specific comments:
> 2.1:  the difference between specifying interarrival times and
> inter-flow times is big, regardless of whether the timescale is such
> that the flow re-enters slow start.  As you point out, in the first
> case, it is possible for the link to be "overloaded" in the sense that
> the number of flows builds up indefinitely, whereas in the second, the
> link is always "stable" in a queueing-theoretic sense.  Specifying
> inter-flow times is not much harder to implement, and captures an
> important qualitative feature of user behaviour: less traffic arrives
> if the network is congested.  Since there are arguments both ways, I'd
> suggest that we leave the current version.

Okay. In our simulation we have a traffic generator that generates data chunks (with a certain size distribution and IAT distribution), and we have two options for how to send these data over the simulated network:
1) have one TCP connection for each generator, or
2) start a new TCP connection for each chunk

Case 2 can still lead to this overloaded case. Also, I'm not sure how useful it is to investigate a case with a congestion collapse that could not even be resolved by congestion control. Okay, there is still the TCP backoff mechanism… maybe someone wants to evaluate something related to that mechanism. Otherwise the only useful action in such a situation (in the real world) would be to provide more capacity or (as a short-term solution) terminate connections manually. I'd say in a simulation this translates into a wrong choice of parameters.
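
To be concrete about what I mean by a wrong choice of parameters: a quick sanity check on the generator settings would be to keep the offered load below the bottleneck capacity (numbers purely illustrative):

    def offered_load(chunks, capacity_bps, duration_s):
        """chunks: list of (start_time_s, num_bytes) from all generators."""
        total_bits = 8 * sum(nbytes for _, nbytes in chunks)
        return total_bits / (duration_s * capacity_bps)

    # e.g. aim for something like offered_load(chunks, 10e6, 300) <= 0.85;
    # a value > 1 means case 2 can only end in an ever-growing backlog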

> 
> 3.1: I entirely agree with the sentiment that we shouldn't consider
> overload in the sense of "A>C".  Again, it was Sally pushing for that,
> and the biggest delay in getting this document out was my attempt to
> prove that we can get high loss rates (as sometimes observed in the
> internet) without resorting to A>C.  In trying that, I was defeated by
> the non-stationarity of the traces, which is why the Davids eventually
> chose to go ahead with Sally's method.

High loss rates can (easily) be reached when using artificial traffic with e.g. one long-lived flow and a large number of short flows… of course depending on what you call 'high loss rates'.

> 
> 
> Overall, what I think would be an ideal outcome is:
> - to ensure  tmix  works reliably on platforms other than ns
> - to have guidelines on what tweaks are needed to allow Linux to have
> sufficiently many simultaneous flows
> - to maintain a completely repeatable traffic source
> - to get this out as "version 1" of a test suite, and move on to a better one.

Still sounds like a lot of work to me…

Mirja

> 
> Cheers,
> Lachlan
> 
> 
> On 13 December 2014 at 03:45, Mirja Kühlewind
> <mirja.kuehlewind@tik.ee.ethz.ch> wrote:
>> Hi,
>> 
>> I reviewed draft-ietf-iccrg-tcpval-01. I understand that the goal of this
>> document is to finish work that was started more than 5 years ago (and
>> actually should have been finished at that time as well). In general I think
>> it would be very useful to have a document that describes initial test
>> cases; however, I'm not sure how useful this document is to provide "quick
>> and easy" initial results (for comparison).
>> 
>> Some background information: I recently performed a larger evaluation study
>> to evaluate my own congestion control proposal for my thesis. I know the
>> scenarios described are only meant to provide some initial evaluation and
>> are surely not sufficient to provide an exhaustive evaluation as needed for
>> a thesis, but I actually failed/gave up on using any of the described
>> scenarios at all.
>> 
>> I have to say that I'm not using ns-2 for simulation but our own simulation
>> library that is able to fully include VMs in the simulation without any
>> influence of the host system (and therefore provides reproducible results).
>> However, no matter why I'm using a different tool, I believe the idea of the
>> draft is to describe scenarios for exactly this use case. If the draft only
>> ends up documenting what's implemented in ns-2, this should not be an RFC
>> but should be part of the ns-2 documentation instead. Therefore my first
>> review comment is that I would rather move all ns-2- (and tmix-) specific
>> parts to the annex (or even try to remove them entirely).
>> 
>> However, the reasons why I ended up not using these scenarios are:
>> 1) It was extremely hard (not quick and easy) to get any simulation running
>> with the given traffic traces (and more than a little load) for more than a
>> few seconds, mainly due to implementation limitations such as the max.
>> number of active flows. For some reason the real kernel in the VM did not
>> release the socket right after the end of the transmission, so the number of
>> active flows the kernel was counting was higher than the number of active
>> flows in the simulation. There are several solutions to this problem, e.g.
>> digging into the kernel code or just using multiple kernels instead.
>> However, I gave up as it was simply not quick and easy, and because of
>> reason 2).
>> 2) The proposed scenario based on the given traffic traces did not allow me
>> any useful comparison (at least for the limited cases that I tried).
>> Basically what I'm saying is that there was no significant difference for
>> any TCP extension under test, because I was not changing Slow Start while
>> most of the short flows in the traffic traces would never leave Slow Start.
>> Further, with these rather complicated traffic patterns, it is basically
>> impossible to say whether the different results you see actually depend on
>> the TCP extension used or whether they are just e.g. minor timing
>> differences that might have a bigger effect.
>> 
>> Again, I know the goal is to finish this old work, but my main concern
>> regarding the usefulness of this document is due to the traffic generation.
>> And I believe this is also the point where they got stuck back then, because
>> that's not easy.
>> 
>> Unfortunately I have to say that everything you added (with respect to
>> draft-irtf-tmrg-tests-02) does not make the situation better and mostly
>> reads like 'trial-and-error' while being very specific to this one
>> (outdated) data set. I'll give more detailed comments further below. But I
>> think all this text should at least be moved to the appendix.
>> 
>> Then, as just indicated, the data set is from 2006. If the main reason to
>> use these traces is to achieve more realistic traffic patterns, that goal
>> fails, as traffic patterns have changed strongly since then (e.g. a huge
>> proportion of the traffic is now video).
>> 
>> Further, I don't even agree with the point that "traffic must be reasonably
>> realistic". If the goal is "to compare and contrast proposals against
>> standard TCP", it is more useful to use very simple traffic models such that
>> effects that occur can actually be related to certain behaviors of an
>> algorithm. If the goal is to check whether the extension is safe for
>> experimentation in the wider Internet (which seems to be a goal that is not
>> spelled out), then it would be most important to investigate corner/extreme
>> cases which do not occur very often in the Internet but are so different
>> that things could break.
>> 
>> Of course, when you design a new extension, you also have to show that it
>> works well in a "reasonably realistic" environment, but that cannot be the
>> goal of this document. Further, the best way to do that is to run it on the
>> Internet. In that case you can't analyze any specific effects, as you don't
>> know the network conditions, but you can basically say: it works or it
>> doesn't work...
>> 
>> Here are some more detailed comments on the traffic generation part; I will
>> provide further comments on the rest of the document as soon as we have
>> resolved the issues raised above...
>> 
>> - 1st paragraph of section 2.1:
>> This section argues that start time and flow size (distributions) are not
>> enough; instead, traffic should be modeled by start time, request and
>> response size, and think time (distributions). First of all, the paragraph
>> is slightly hard to read, and thus it would be great if the model used could
>> be spelled out more clearly (at the beginning, and then give the reasoning).
>> However, I disagree with this model. I don't think it is necessary to model
>> a dependency between a request and a response for TCP evaluations. Of course
>> this might be more realistic, but as long as we don't model application
>> behavior or even user interactions, it doesn't give us much and makes things
>> more complicated. Instead we should investigate scenarios where the load
>> level on the forward and backward channel can be set independently of each
>> other.
>> Further, modeling think times in contrast to inter-arrival times (IATs) only
>> makes a difference for TCP evaluations where the 'waiting' is very small, so
>> that TCP will not fall back to Slow Start in the former case. What I do to
>> cover both cases is that I always just model flow sizes and IATs but have an
>> option that says whether the data generated by one generator should be sent
>> on the same TCP connection or whether a new connection should be opened for
>> each data flow. When the IAT is small and the available capacity is small as
>> well, this can lead to the case that all flows are sent back to back,
>> appearing as one long flow. However, I think it is the responsibility of the
>> researcher doing the experiment to verify whether this case has happened.
>> 
>> - section 2.2.
>> As I said, I don't think there should be tmix-specific text in here.
>> 
>> - section 2.3.2
>> Non-stationarity is a big problem, which makes all experimentation with this
>> data complicated (and not quick and easy). It is not clear at all how much
>> your applied hacks actually change the traffic characteristics of the trace.
>> Therefore it is also not clear how realistic the traffic remains in the end.
>> (Therefore I would rather just like to see the traffic characteristics, such
>> as the flow size and IAT distributions, of this trace written down as input
>> for an artificial traffic generator; I don't think this is less realistic.)
>> 
>> But I also believe some of the problems with non-stationarity are specific
>> to this trace. The trace seems to take only newly started flows into account
>> for the measurement, which does not reflect the actual traffic load of the
>> measured system at the beginning of the trace. This might be different with
>> a different trace that has been measured differently. And I can only say
>> this again, because I think it is really important: the trace is outdated
>> and we should not completely rely on this one trace in this document.
>> 
>> - section 2.2.3.1
>> The number of 500e6 just seems random and is probably very specific to this
>> one trace.
>> 
>> The forward reference to section 4.4.2. is not understandable.
>> 
>> - section 2.4.
>> This again sounds like guesswork; I would rather like to define values and
>> actually apply them to an artificial traffic model.
>> 
>> - section 3
>> should really go in the appendix
>> 
>> - section 3.1.
>> says "it is important to test congestion control in overload". That is not
>> wrong but, in fact, if there is sufficient data to send, TCP will always try
>> to fill the link; I would not call this overload. If there is not enough
>> data to send, congestion control does not even become active (except Slow
>> Start). Therefore, if you re-design the congestion control behavior and
>> would like to evaluate this against 'standard TCP', the only interesting
>> cases are those where TCP is able to fill the link. However, even if the
>> total load is e.g. only 85%, there will be phases during the simulation run
>> where the link is full. Therefore there should never be a case where A>C (or
>> offered load > 100%). This only leads to non-steady behavior, as you
>> correctly say. In a real system this will lead to congestion collapse where
>> even TCP cannot help anymore. But in a real system there usually is a user
>> behind a computer who will just give up.
>> 
>> - section 3.3
>> Without having tried to apply this, this part doesn't seem to be super
>> useful. Again the values seem to be arbitrary and, in the end, if I
>> understand you correctly, you save less than one RTT of simulation time.
>> That wouldn't solve any of my problems. And again, as I said above, this is
>> a problem of this trace and could have been avoided if the trace had been
>> collected differently.
>> 
>> - I'll comment on the rest of the document later.
>> 
>> To conclude, I would like to propose removing the traffic traces from the
>> document (or moving them to the appendix, though I believe this text should
>> go into some ns-2 documentation instead). I know the intention is to finish
>> the original work, but especially as the traces are outdated, for me the
>> document was not useful as it is. Maybe this should be discussed with the
>> original authors.
>> 
>> Instead of the traffic traces I would like to use simple scenarios with a
>> certain (small) number of greedy flows and/or short-flow cross traffic (with
>> a certain IAT and flow size distribution). I can provide the numbers I've
>> used, or there are also scenarios described in
>> draft-sarker-rmcat-eval-test-00 (which in addition often uses video traffic,
>> which makes things even more complicated than we would need here for this
>> initial evaluation). Maybe, if possible, it would anyway be useful to try
>> and align the structure and/or terminology of these two drafts.
>> 
>> Sorry for the long email. I hope that is still helpful.
>> 
>> Mirja
>> 
>> 
>> _______________________________________________
>> iccrg mailing list
>> iccrg@irtf.org
>> https://www.irtf.org/mailman/listinfo/iccrg
> 
> 
> 
> -- 
> What some people mistake for the high cost of living is really the
> cost of high living.