[iccrg] Review of draft-ietf-iccrg-tcpval-01/ Issues on traffic traces

Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch> Fri, 12 December 2014 16:45 UTC

Message-ID: <548B1B90.5050004@tik.ee.ethz.ch>
Date: Fri, 12 Dec 2014 17:45:04 +0100
From: Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch>
To: iccrg@irtf.org, David hayes <davihay@ifi.uio.no>, David Ros <dros@simula.no>
Archived-At: http://mailarchive.ietf.org/arch/msg/iccrg/DdjUeOOqN83RDz5S1Qp8WdKTM3Y

Hi,

I reviewed draft-ietf-iccrg-tcpval-01. I understand that the goal of this
document is to finish work that was started more than 5 years ago (and actually
should have been finished at that time as well). In general I think it would be
very useful to have a document that describes initial test cases; however, I'm
not sure how useful this document is for providing "quick and easy" initial
results (for comparison).

Some background information: I recently performed a larger evaluation study to
evaluate my own congestion control proposal for my thesis. I know the scenarios
described are only meant to provide some initial evaluation and are surely not
sufficient to provide an exhaustive evaluation as needed for a thesis, but I
actually failed to use (or gave up on using) any of the described scenarios at
all.

I have to say that I'm not using ns-2 for simulation but my own simulation
library, which is able to fully include VMs in the simulation without any
influence from the host system (and therefore provides reproducible results).
However, no matter why I'm using a different tool, I believe the idea of the
draft is to describe scenarios for exactly this use case. If the draft only ends
up documenting what's implemented in ns-2, this should not be an RFC but should
be part of the ns-2 documentation instead. Therefore my first review comment is
that I would rather move all ns-2 (and Tmix) specific parts into the annex (or
even try to remove them altogether).

However, the reasons why I ended up not using these scenarios are:
1) It was extremely hard (not quick and easy) to get any simulation running with
the given traffic traces (and more than a little load) for more than a few
seconds, mainly due to implementation limitations such as the max. number of
active flows. For some reason the real kernel in the VM did not release the
socket right after the end of the transmission, so the number of active flows
the kernel was counting was higher than the number of active flows in the
simulation. There are several solutions to this problem, e.g. digging into the
kernel code or just using multiple kernels instead. However, I gave up as it was
simply not quick and easy and because of reason 2).
2) The proposed scenario based on the given traffic traces did not allow me any
useful comparison (at least for the limited cases that I tried). Basically what
I'm saying is that there is no significant difference for any TCP extension
under test, because I was not changing Slow Start while most of the short flows
in the traffic traces would never leave Slow Start. Further, with these rather
complicated traffic patterns, it is basically impossible to say if the different
results you see actually depend on the TCP extension used or if these are just
e.g. minor timing differences that might have a bigger effect.
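
The point about short flows never leaving Slow Start can be checked with a
back-of-the-envelope calculation. The following is a minimal sketch under an
idealized, loss-free model in which the congestion window doubles every RTT.
The MSS of 1460 bytes, the initial window of 10 segments, and the ssthresh
values are illustrative assumptions, not values taken from the draft or from
the traces.

```python
import math


def slow_start_rounds(flow_bytes, mss=1460, initial_window=10):
    """RTT rounds needed to send flow_bytes if cwnd doubles every round.

    Idealized: no loss, no delayed ACKs, no receiver window limit.
    """
    segments = math.ceil(flow_bytes / mss)
    cwnd = initial_window
    sent = 0
    rounds = 0
    while sent < segments:
        sent += cwnd      # one full window is sent per round
        cwnd *= 2         # Slow Start: window doubles each RTT
        rounds += 1
    return rounds


def leaves_slow_start(flow_bytes, ssthresh_segments, mss=1460, initial_window=10):
    """True if cwnd reaches ssthresh before the flow has sent all its data."""
    segments = math.ceil(flow_bytes / mss)
    cwnd = initial_window
    sent = 0
    while sent < segments:
        if cwnd >= ssthresh_segments:
            return True   # congestion avoidance would take over here
        sent += cwnd
        cwnd *= 2
    return False
```

Under these assumptions a 100 kB flow finishes in three rounds and never
reaches an ssthresh of 64 segments, so a change to congestion avoidance alone
would not be visible for such a flow. In practice Slow Start can also end
earlier because of loss, which this sketch ignores.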

Again I know the goal is to finish this old work, but my main concern regarding
the usefulness of this document is due to the traffic generation. And I believe
this is also the point where they got stuck back then, because that's not easy.

Unfortunately I have to say that everything you added (with respect to
draft-irtf-tmrg-tests-02) does not make the situation better and mostly reads
like 'trial-and-error' while being very specific to this one (out-dated) data
set. I'll give more detailed comments further below. But I think all this text
should at least be moved to the appendix.

Then, as just indicated, the data set is from 2006. If the main reason to use
these traces is to achieve more realistic traffic patterns, that goal is missed,
as traffic patterns have strongly changed since then (e.g. a huge proportion of
the traffic is now video).

Further, I don't even agree with the point that "traffic must be reasonably
realistic". If the goal is "to compare and contrast proposal against standard
TCP", it is more useful to use very simple traffic models such that effects that
occur can actually be related to certain behaviors of an algorithm. If the
goal is to check if the extension is safe for experimentation in the wider
Internet (which seems to be a goal that is not spelled out), then it would be
most important to investigate corner/extreme cases which do not occur very often
in the Internet but are so different that things could break.

Of course, when you design a new extension, you also have to show that it works
well in a "reasonably realistic" environment, but that cannot be the goal for
this document. Further, the best way to do that is to run it on the Internet. In
this case you can't analyze any specific effects as you don't know the network
conditions, but you can basically say: it works or it doesn't work...

Here are some more detailed comments on the traffic generation part; I will
provide further comments on the rest of the document as soon as we have resolved
the issues raised above...

- 1st paragraph of section 2.1:
This section argues that start time and flow size (distributions) are not
enough; instead traffic should be modeled by start time, request and response
size, and think time (distributions). First of all, the paragraph is slightly
hard to read and thus it would be great if it could spell out more clearly
what the model used is (at the beginning, and then give the reasoning).
However, I disagree with this model. I don't think that it is necessary to model
a dependency between a request and a response for TCP evaluations. Of course
this might be more realistic, but as long as we don't model application behavior
or even user interactions, that doesn't give us much and makes things more
complicated. Instead we should investigate scenarios where the load level on the
forward and backward channel can be set independently of each other.
Further, modeling think times in contrast to inter-arrival times (IATs) only
makes a difference for TCP evaluations where the 'waiting' is very small and
therefore TCP will not fall back to Slow Start in the first case. What I do to
cover both cases is that I always just model flow sizes and IATs but have an
option to say whether the data generated by one generator should be sent on the
same TCP connection or whether a new connection should be opened for each data
flow. When the IAT is small and the available capacity is small as well, this
can lead to the case that all flows are sent back to back, appearing as one long
flow. However, I think it is the responsibility of the researcher doing the
experiment to verify whether this case has happened.
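
A minimal sketch of a generator of the kind described above, which models only
flow sizes and IATs and has a switch for reusing one TCP connection or opening
a new one per flow. The exponential distributions, the names Flow,
generate_flows and count_back_to_back, and all parameter values are
assumptions made for illustration; they do not come from the draft or from any
particular simulator.

```python
import random
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Flow:
    start: float    # scheduled start time in seconds
    size: int       # bytes to send
    conn_id: int    # identifier of the TCP connection carrying the flow


def generate_flows(mean_iat: float, mean_size: float, duration: float,
                   reuse_connection: bool, seed: int = 0) -> Iterator[Flow]:
    """Yield flows with exponential IATs and sizes (both are assumptions)."""
    rng = random.Random(seed)   # fixed seed keeps runs reproducible
    t = 0.0
    conn_id = 0
    while True:
        t += rng.expovariate(1.0 / mean_iat)
        if t >= duration:
            return
        size = max(1, int(rng.expovariate(1.0 / mean_size)))
        if not reuse_connection:
            conn_id += 1        # a fresh connection for every flow
        yield Flow(t, size, conn_id)


def count_back_to_back(flows: List[Flow], capacity_bps: float) -> int:
    """Count flows that start before earlier data could drain at line rate.

    This is an optimistic bound: it ignores TCP dynamics and assumes the
    link is used at full capacity whenever data is pending.
    """
    busy_until = 0.0
    merged = 0
    for f in flows:
        if f.start < busy_until:
            merged += 1
        busy_until = max(busy_until, f.start) + f.size * 8 / capacity_bps
    return merged
```

With the same seed, both modes produce identical start times and sizes, so the
only difference between two runs is the connection handling. A high count from
count_back_to_back relative to the number of flows is a hint that the flows on
a reused connection may effectively appear as one long flow.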

- section 2.2.
As I said I don't think there should be Tmix specific text in here.

- section 2.3.2
Non-stationarity is a big problem which makes all experimentation with this data
complicated (and not quick and easy). It is not clear at all how much the hacks
you applied actually change the traffic characteristics of the trace.
Therefore it is also not clear how realistic the traffic still is at the end.
(Therefore I would rather just like to see the traffic characteristics of this
trace, such as flow size and IAT distribution, written down as an input for an
artificial traffic generator; I don't think this is less realistic.)
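
Writing down the characteristics of a trace could look roughly like the
following sketch. It assumes a hypothetical, simplified trace format of one
(start_time_s, size_bytes) record per flow, which is not the Tmix connection
vector format, and it summarizes the distributions by their means and deciles;
the function name and the choice of deciles are illustrative only.

```python
from statistics import mean, quantiles


def trace_characteristics(records):
    """Summarize a trace given as (start_time_s, size_bytes) tuples.

    Needs at least three flows so that there are two or more IATs.
    """
    records = sorted(records)                 # order flows by start time
    starts = [t for t, _ in records]
    sizes = [s for _, s in records]
    iats = [b - a for a, b in zip(starts, starts[1:])]
    return {
        "mean_iat": mean(iats),
        "mean_size": mean(sizes),
        "iat_deciles": quantiles(iats, n=10),    # 9 cut points
        "size_deciles": quantiles(sizes, n=10),  # 9 cut points
    }
```

The resulting numbers (or a fitted distribution derived from them) could then
be used as parameters of an artificial generator instead of replaying the
trace itself.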

But I also believe some of the problems with non-stationarity are specific to
this trace. The trace seems to take only newly started flows into account for
the measurement, which does not reflect the actual traffic load of the measured
system at the beginning of the trace. This might be different with a different
trace that has been measured differently. And I can only say this again, because
I think it is really important: the trace is out-dated and we should not
completely rely on this one trace in this document.

- section 2.2.3.1
The number of 500e6 just seems random and is probably very specific to this one 
trace.

The forward reference to section 4.4.2. is not understandable.

- section 2.4.
This again sounds like guesswork; I would rather like to define values and
actually apply them to an artificial traffic model.

- section 3
should really go in the appendix

- section 3.1.
says "it is important to test congestion control in overload". That is not wrong 
but, in fact, if there is sufficient data to send TCP will always try to fill 
the link; I would not call this overload. If there is not enough data to send 
congestion control does not even become active (except Slow Start). Therefore, if
you re-design the congestion control behavior and would like to evaluate this
against 'standard TCP', the only cases that are interesting are those where TCP
is able to fill the link. However, even if the total load is e.g. only 85%, there will
be phases during the simulation run where the link is full. Therefore there 
should never be a case where A>C (or offered load > 100%). This only leads to 
non-steady behavior, as you say correctly. In a real system this will lead to 
congestion collapse where even TCP cannot help anymore. But in a real system 
there usually is a user behind a computer who will just give up.
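
The relation between offered load A and capacity C can be made explicit for a
simple flow-level model. The sketch below assumes that the offered load is the
mean flow size divided by the mean IAT, normalized by the link capacity; the
function names and the example numbers are illustrative and not taken from the
draft.

```python
def offered_load(mean_flow_bytes, mean_iat_s, capacity_bps):
    """Average offered load as a fraction of link capacity (A / C)."""
    return mean_flow_bytes * 8 / (mean_iat_s * capacity_bps)


def iat_for_load(target_load, mean_flow_bytes, capacity_bps):
    """Mean IAT that yields target_load; loads of 100% or more are rejected."""
    if not 0 < target_load < 1:
        raise ValueError("target_load must be between 0 and 1 (exclusive)")
    return mean_flow_bytes * 8 / (target_load * capacity_bps)
```

For example, with a mean flow size of 125 kB on a 10 Mbit/s link, a mean IAT of
0.1 s corresponds to 100% offered load, while an 85% load needs a mean IAT of
about 0.118 s. Because arrivals are random, the link will still be full during
some periods at 85% average load.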

- section 3.3
Without having tried to apply this, this part doesn't seem to be super useful.
Again the values seem to be arbitrary and in the end, if I get you right, you
save less than one RTT of simulation time. That wouldn't solve any of my
problems. And again, as I said above, this is a problem of this trace and could
have been avoided if the trace had been collected differently.

- I'll comment on the rest of the document later.

To conclude, I would like to propose removing the traffic traces from the
document (or moving them to the appendix, but I believe this text should go into
some ns-2 documentation instead). I know the intention is to finish the original
work, but, especially as the traces are out-dated, the document as it is was not
useful for me. Maybe this should be discussed with the original authors.

Instead of the traffic traces I would like to use simple scenarios with a 
certain (small) number of greedy flows and/or short flow cross traffic (with a 
certain IAT and flow size distribution). I can provide numbers of what I've used 
or there are also scenarios described in draft-sarker-rmcat-eval-test-00 (which
often uses video traffic in addition, which makes things even more complicated
than we need here for this initial evaluation). If possible, it would in any
case be useful to try to align the structure and/or terminology of these two
drafts.

Sorry for the long email. I hope that is still helpful.

Mirja
