Re: [aqm] [iccrg] Comments on draft-irtf-iccrg-tcpeval

Michael Welzl <michawe@ifi.uio.no> Wed, 10 December 2014 01:53 UTC

From: Michael Welzl <michawe@ifi.uio.no>
In-Reply-To: <CAA93jw7YpMFcE1+dBpHr=9oJ_8wNUPbuY4N6hbJAxJeWxtjEcQ@mail.gmail.com>
Date: Wed, 10 Dec 2014 12:52:52 +1100
Message-Id: <A9F21E81-BF40-4421-B83C-0C3EB7FCD7C6@ifi.uio.no>
References: <87k320u5yz.fsf@toke.dk> <CAA93jw7YpMFcE1+dBpHr=9oJ_8wNUPbuY4N6hbJAxJeWxtjEcQ@mail.gmail.com>
To: Dave Taht <dave.taht@gmail.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/aqm/s_6_2eGIEL-xeT4lZPqEjP5Wx7Y
Cc: "aqm@ietf.org" <aqm@ietf.org>, "iccrg@irtf.org" <iccrg@irtf.org>
Subject: Re: [aqm] [iccrg] Comments on draft-irtf-iccrg-tcpeval

Hi,

I’ll start answering this with my ICCRG chair hat on, and take it off later. I’ll leave it to the authors to address Toke’s email.

First, I would like to thank both of you for your comments, and Toke in particular for having carried out what looks to be a thorough review. Second, as chair, I have to say that I object to most of Dave’s comments below, because they call for adding more parameters and test conditions to the document, and that puts the test suite in danger of missing its point.

Let me give some context: this document was written to provide a common basis for evaluating TCP proposals against each other. It comes from a time when we saw presentations saying “we evaluated mechanism X under conditions A, B, C, and it consistently works better than Y”, followed by a presentation saying “but we evaluated Y under conditions B', C' and D, where B’ and C’ are a little different from but generally similar to B and C, and we find that Y always works better than X”. In such a situation, you’d like to have a common ground. Even though we don’t see many such presentations these days, this IS the context of the document and what it was written for: if someone brings a new TCP congestion control mechanism to the floor, that person should be able to use the test suite as a good starting point for evaluating it against whatever the current state of the art is.

The absolute last thing that I, as chair, would like to see is a move from this situation to the following:
1) “we evaluated mechanism X, using the test suite, and FQ_CoDel, and it always works better than Y”, followed by:
2) “we evaluated mechanism Y, using the test suite, and with *the new version of FQ_CoDel*, and Y always works better than X, so the authors of X have obviously used the wrong version of FQ_CoDel, thereby invalidating all their results!”

If this is what we end up with, the TCP test suite has achieved nothing. So, for this reason, I object quite strongly to adding more parameters and environment conditions. The test suite should provide the boundary conditions that congestion control mechanisms need to be tested against in order to evaluate them against each other. It does NOT intend to document the conditions that are fashionable right now. Note that, even back when this document was written, MANY AQM mechanisms and - behold! - even several variants of FQ already existed (I don’t get tired of pointing out that this AQM survey: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6329367 has 194 references). There is a reason why the test suite doesn’t recommend testing with RED, BLUE, REM, PIM, CHOKe, Gentle RED, ARED, GREEN… as well as various queue scheduling algorithms.

Now I take my chair hat off, and answer comments below, in line:


> On 10. des. 2014, at 04.26, Dave Taht <dave.taht@gmail.com> wrote:
> 
> adding aqm

and I suggest removing them from this thread after my answer.


> On Tue, Dec 9, 2014 at 8:45 AM, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>> At the Hawaii meeting I promised to review this document. Here are my
>> comments:
>> 
>> - Overall I think it is quite good. The document provides a nice
>>  baseline for evaluating TCP behaviour, and having an implementation
>>  available is definitely a plus. I'm not sufficiently familiar with NS2
>>  and Tmix to comment on the implementation details (and in particular
>>  on whether or not they achieve the stated goals of mixing and quickly
>>  attaining steady state), so I'm just going to assume that that is the
>>  case. :) An implementation that runs on real machines would be nice.
> 
> +10. Jeeze, guys, the internet stopped looking like sims a couple
> decades back….

Quoting a part of the abstract:

"This goal of the test suite is to allow researchers to quickly and easily evaluate their proposed TCP extensions in simulators and testbeds using a common set of well-defined, standard test cases,”

=> the document doesn’t talk about simulations only. I agree with Toke when he says that “an implementation that runs on real machines would be nice”, but this is a call for voluntary work and not related to the document as such. Great if someone does it!



>> - I note there is some overlap between this and other efforts to produce
>>  test suites. I'm aware of at least the AQM and the BMWG groups'
>>  efforts. Pursuing deduplication of effort might be worthwhile.
> 
> Well, additive would be a nicer outcome.

Absolutely not, as I explained at the beginning with my chair hat on.


>>  Cross-referencing relevant documents might also be. Not sure I have
>>  enough of an overview of this to recommend particular things to refer
>>  to / incorporate / etc, so I won't.
>> 
>> - I'm not sure it would be possible to perform the tests specified in
>>  the document without using the trace files (but see above regarding my
>>  lack of experience with the tools). I'm not sure if this is a problem
>>  or not (but could not find any information about the licensing of the
>>  trace files either); if it is a problem, perhaps providing, say, a
>>  table of TCP flow distributions to use as a fall-back, or something
>>  like that, would be worthwhile?
> 
> I would argue for a large set of trace files generated yearly from
> multiple vantage points.

Just to give context, this clearly isn’t a recommendation for the document; it’s a call for voluntary work.


>> - Several of the tests specify that they should be run "with and
>>  without" the TCP feature under test. Section 6.1 specifies SACK but no
>>  ECN as a baseline for that particular test; why is it only for that
>>  test? Explicitly specifying a baseline for the whole thing might be a
>>  good idea. I'd argue that such a baseline should at least encourage to
>>  also test against CUBIC (but will freely admit to being biased towards
>>  Linux environments).
> 
> I think timestamps are well over 70% of all tcp traffic now. Sack is
> way up, also. ECN can be negotiated with 60+% of the alexa top 1m…

Good input, I think; up to the authors to answer.


> I am not sure cubic is well defined anymore. Notably the addition of
> sch_fq + pacing + changes to the hystart algorithm in linux have made
> it a whole new ballgame, and there are too many other changes to linux
> tcp in the last 4 years in particular to list here.

- which I interpret as a good reason to *not* include CUBIC in the baseline (otherwise the baseline would keep changing; FWIW, the baseline doesn’t need to be “the best, most recent mechanism”).


>> - I applaud the inclusion of an asymmetrical link in one of the
>>  scenarios (the satellite link in section 4.4.4). However, I'm a little
>>  concerned that effects in that test are going to be dominated by the
>>  long RTT, and so would like to see an asymmetrical link scenario with
>>  lower RTT (in the "earth surface internet" range, so ~100 ms or less).
>>  I've definitely seen weird things happen on asymmetrical links.
> 
> Well, as I am now commonly seeing asymmetric links deployed in the
> field with a 10x1 down/up ratio (examples include many DSL setups, and
> my new 120/12mbit cable connection), I would like to see this emulated
> and explored a lot further.

10x1 is exactly what the test case in the document seems to cover.


> Also wifi is commonly asymmetric (antennas on the station are much
> worse than those on the AP; stations are in a "taxi stand" topology).
> 
>> - As far as metrics are concerned, I'd argue for emphasising
>>  distributions more. Meaning that for the steady-state tests (section
>>  4.5), the variance measures should not be in "other metrics of general
>>  interest" but be part of the core metric.
> 
> +1
> 
>>  For the transient tests, and for latency measures in particular, this
>>  becomes even more important. In my view the phrase "ideally it would
>>  be better to have more complete statistics" (section 5.1.3) is quite
>>  the understatement: It is quite essential to include the distribution
>>  characteristics, and in particular the outliers and the duration of
>>  spikes in latency.
> 
> +10. Also the Tracy-Widom distribution has become rather interesting
> of late as a means to measure transitional behavior.
> 
> ...
> 
> One of the things that bothers me about most published simulations is
> that they tend to run at low rates, where today we have networks going
> at 10GigE+. A lot of papers focus overmuch on packet loss at low rates
> below 5mbit (where indeed it tends to be high with modern TCPs), whereas
> at higher rates minimal packet loss is generally observed, and
> desirable.

This is an aside to the discussion of the test suite, I think, but: you keep saying that, and I fail to see the point (I find it almost ironic that you usually couple this with criticism of the choice of long RTTs). The behavior of a congestion control mechanism depends on the BDP. This means that, using a BDP’s worth of DropTail queuing for example, I get the exact same result whether I test with 10Gig / 10ms or with 1Gig / 100ms (see the back-of-the-envelope sketch after the list below).

This changes:
- when the TCP congestion control uses fixed timer values (which most CCs I know about don’t)
- when the goal is to investigate e.g. limits of timer granularity (worth doing, but not the point of most evaluations, at least not the ones I’ve seen you address with this comment)
- when there is an AQM mechanism in place that uses fixed timer values. So yes, this does matter when we have e.g. FQ_CoDel in place, but that is not the point of what we’re talking about here (I argued against including particular AQM mechanisms above).
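
To make the scaling argument concrete, here is a minimal back-of-the-envelope sketch in Python (the helper function and the numbers are purely illustrative, mine rather than anything taken from the draft or the test suite):

    # BDP = bandwidth * RTT, expressed here in full-sized 1500-byte packets.
    def bdp_packets(bandwidth_bps, rtt_s, pkt_bytes=1500):
        return bandwidth_bps * rtt_s / (8 * pkt_bytes)

    print(bdp_packets(10e9, 0.010))  # 10 Gbit/s at  10 ms RTT -> ~8333 packets
    print(bdp_packets(1e9, 0.100))   #  1 Gbit/s at 100 ms RTT -> ~8333 packets

With a DropTail queue of one BDP at the bottleneck, both configurations present the same queue (measured in the flow’s own RTTs) to the congestion controller, so the steady-state dynamics scale identically; the equivalence only breaks in the cases listed above, i.e. when something in the path uses fixed wall-clock timers.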


> Furthermore packet loss of certain kinds of packets (acks) tends to be
> inconsequential and should be noted somehow. We can drop a lot more
> acks than we do....
> 
> In looking over my current netperf-wrapper data set (120mbit/12mbit)
> 
> http://snapon.lab.bufferbloat.net/~d/comprehensive_with_ecn.tgz
> 
> I was struck by the low packet loss rates at these speeds, and by the
> differences in average packet size. (cake2 is an all-singing,
> all-dancing fq_codel-based shaper.)
> 
> What other statistics could I be gathering from the real world that
> would be useful here?
> 
> Inbound packet loss percentage: .02%
> Average packet size: 2823 (offloads are in use) (so this percentage is
> relative to the actual packet size to some extent)
> Details:
> 
> qdisc cake2 801e: root refcnt 2 bandwidth 115Mbit besteffort flows
> Sent 861811656 bytes 612322 pkt (dropped 89, overlimits 556131 requeues 0)
> backlog 0b 0p requeues 0
> 
>          Class 0
>  rate     115Mbit
>  target     6.2ms
>  delay        0us
>  maxpkt         0
>  pkts      305338
>  bytes  862069036
>  drops         89
>  marks          0
> 
> ...
> 
> Outbound packet loss percentage: .3%
> Average packet size: 403 (offloads are in use)
> 
> Details:
> 
> qdisc cake2 801d: root refcnt 9 bandwidth 12Mbit besteffort flows
> Sent 90295734 bytes 252230 pkt (dropped 922, overlimits 434160 requeues 1)
> backlog 0b 0p requeues 1
>          Class 0
>  rate      12Mbit
>  target     6.2ms
>  delay        0us
>  maxpkt         0
>  pkts      230721
>  bytes   93019420
>  drops        922
>  marks          0
> 
> 
>> 
>> - Section 5.1 mentions AQM algorithms in passing. Since the work of the
> 
> Same test, with ecn. zero packet loss, near perfect utilization and
> low latency, but honestly we could shoot a lot more acks.
> 
> qdisc cake2 8012: root refcnt 2 bandwidth 115Mbit besteffort flows
> Sent 861639003 bytes 610822 pkt (dropped 0, overlimits 534608 requeues 0)
> backlog 0b 0p requeues 0
> 
>  Class 0
>  rate     115Mbit
>  target     6.2ms
>  delay        0us
>  maxpkt         0
>  pkts      292907
>  bytes  861639003
>  drops          0
>  marks         87
> 
> qdisc cake2 8011: root refcnt 9 bandwidth 12Mbit besteffort flows
> Sent 90205372 bytes 246249 pkt (dropped 0, overlimits 419887 requeues 0)
> backlog 0b 0p requeues 0
>          Class 0
>  rate      12Mbit
>  target     6.2ms
>  delay        0us
>  maxpkt         0
>  pkts      222494
>  bytes   90205372
>  drops          0
>  marks        923
> 
> 
>>  AQM working group has progressed somewhat now, I believe it would be a
>>  good idea to upgrade this to at the very least *recommend* testing the
>>  behaviour when AQM and fairness queueing algorithms are introduced on
>>  the (bottleneck) links.
> 
> :)

For AQM, see above. For FQ, I don’t see the point. (whatever-)FQ isolates flows, and much of the test suite is about evaluating how flows interact with other traffic. So testing over FQ turns the test into a “minimal or no load” test.

Cheers,
Michael