Re: [rmcat] Bursty sending ---- was RE: Review of Congestion Control Requirements For RMCAT - draft-ietf-rmcat-cc-requirements-00

Dave Taht <dave.taht@gmail.com> Sat, 16 November 2013 17:35 UTC

MIME-Version: 1.0
In-Reply-To: <AE7F97DB5FEE054088D82E836BD15BE920148194@xmb-aln-x05.cisco.com>
References: <AE7F97DB5FEE054088D82E836BD15BE920148194@xmb-aln-x05.cisco.com>
Date: Sat, 16 Nov 2013 09:35:21 -0800
Message-ID: <CAA93jw77r6W-kbCoJtxJjU09jeAdR0CgfOMcy4xyz6JPn74UZg@mail.gmail.com>
From: Dave Taht <dave.taht@gmail.com>
To: "Bill Ver Steeg (versteb)" <versteb@cisco.com>
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: quoted-printable
Cc: Randell Jesup <randell-ietf@jesup.org>, "rmcat@ietf.org" <rmcat@ietf.org>
Subject: Re: [rmcat] Bursty sending ---- was RE: Review of Congestion Control Requirements For RMCAT - draft-ietf-rmcat-cc-requirements-00
Precedence: list

On Fri, Nov 15, 2013 at 8:30 PM, Bill Ver Steeg (versteb)
<versteb@cisco.com> wrote:
> Randell-
>
> A side note on bursty ABR video flows follows. Not 100% germane to the discussion in RMCAT, but my pet peeve nonetheless.
>
> In the enclosed note, I was using "MPEG-DASH-like" as a placeholder for the broad class of HTTP-based rate-adaptive video flows. All of the legacy ABR video flows send in a very nasty square wave pattern. The square wave corresponds to the length of the fetched chunk. The 10 second chunks are pretty bad, but the 2 second chunk stuff can really cause huge swings in bloat. This is particularly true when the video rate is just below the line rate, so the "on" portion of the square wave is much longer than the "off" portion. The worst case is when the chunk size is large relative to the TCP window size (with large tail-drop buffers). Cross traffic can get really hammered.
>
> The notion that "bursty is bad" is not new. We found it several times over the years, in old video servers over-running QAM buffers, VSAT networks, terminal servers, etc. We seem to forget and re-discover this every few years. The PIE/CoDel buffer management algorithms clearly help, and so does FQ. Bursty senders can make buffer management quite difficult, though.

+1

>
> There have been several recent papers on ways to reduce the burstiness of ABR flows. Some are client based application algorithms that try to move away from CWND based flow control and towards RWND based flow control ( SABRE- http://www.cc.gatech.edu/~amansy/GT-CS-12-07.pdf - presentation from last Spring's ACM ) and some are server based algorithms that use either TCP or application pacing (http://www.ietf.org/proceedings/88/slides/slides-88-iccrg-6.pdf - presented in ICCRG in Vancouver ).

I am tickled pink with the paced fq scheduler (discussed in the paper
above). It is in linux 3.12, which was released last week. There have
been a few teething pains with the TSO fixes, but the results have
generally been spectacular there too.

I think we are not far from the day when the default Linux packet
scheduler can be set to paced fq for servers and desktops and to
fq_codel for routers. The machinery to do so is now a single line in
sysctl:

http://comments.gmane.org/gmane.linux.network/281378
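
For anyone who wants to try it, here is a minimal sketch of that
one-line setting. It assumes the sysctl key is net.core.default_qdisc
(my understanding of what landed in 3.12; check your kernel). The role
names and the helper functions are hypothetical, made up for this
example.

```python
from pathlib import Path

# Assumption: this is the sysctl key that selects the default qdisc.
DEFAULT_QDISC_KEY = "net.core.default_qdisc"
PROC_PATH = Path("/proc/sys/net/core/default_qdisc")

# Hypothetical role labels, following the split suggested above:
# paced fq on end hosts, fq_codel on routers.
QDISC_FOR_ROLE = {"server": "fq", "desktop": "fq", "router": "fq_codel"}


def sysctl_line(role):
    """Return the line one would put in sysctl.conf for this role."""
    return f"{DEFAULT_QDISC_KEY} = {QDISC_FOR_ROLE[role]}"


def current_default_qdisc():
    """Read the running default qdisc, or None if it is not exposed."""
    try:
        return PROC_PATH.read_text().strip()
    except OSError:
        return None
```

sysctl_line("router") gives the line for a gateway box, and
current_default_qdisc() reports what the running kernel uses, if the
file exists at all.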

I do hope more people experiment with this stuff now that it is so easy!

In an rmcat or DASH context there is now a new setsockopt to control
the pacing at a finer level. It was originally TCP-specific; I am
pretty sure it was extended to UDP as well, but I am not sure what name
it retained:

http://lwn.net/Articles/560082/
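
A rough sketch of what using it from an application might look like.
This assumes the option is the one called SO_MAX_PACING_RATE, set at
SOL_SOCKET level with a rate in bytes per second, and that its numeric
value is 47 as in the generic Linux headers (some architectures
differ). It also assumes the fq qdisc is on the interface to do the
actual pacing. The helper names are hypothetical.

```python
import socket
import sys

# Assumption: option name and value. Python's socket module may not
# export the constant, so fall back to the generic Linux value.
SO_MAX_PACING_RATE = getattr(socket, "SO_MAX_PACING_RATE", 47)


def kbit_to_bytes_per_sec(kbit_per_sec):
    """Convert a media rate in kbit/s to bytes per second."""
    return kbit_per_sec * 1000 // 8


def set_max_pacing_rate(sock, bytes_per_sec):
    """Ask the kernel to cap the pacing rate; True if it accepted."""
    if not sys.platform.startswith("linux"):
        return False  # the option is Linux-specific
    try:
        sock.setsockopt(socket.SOL_SOCKET, SO_MAX_PACING_RATE,
                        bytes_per_sec)
        return True
    except OSError:
        return False  # older kernel, or option not supported


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    ok = set_max_pacing_rate(s, kbit_to_bytes_per_sec(2500))
    print("pacing cap accepted:", ok)
    s.close()
```

A sender would call this once per socket, with the cap set somewhat
above the current encoder rate, and update it when the rate changes.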

>
> Bvs
>
>
> -----Original Message-----
> From: rmcat-bounces@ietf.org [mailto:rmcat-bounces@ietf.org] On Behalf Of Randell Jesup
> Sent: Friday, November 15, 2013 4:17 PM
> To: rmcat@ietf.org
> Subject: Re: [rmcat] Review of Congestion Control Requirements For RMCAT - draft-ietf-rmcat-cc-requirements-00
>
> On 11/12/2013 11:27 AM, Bill Ver Steeg (versteb) wrote:
>> Randell-
>>
>> As I promised in Vancouver, here are my notes on
>> draft-ietf-rmcat-cc-requirements-00
>>
>> In summary, this is quite a nice start. It is a difficult document to write, as it is intentionally quite broad. On the whole, I like it.
>
> That's a great start for a review!  ;-)   Thanks!  Now on to the meat...
>
>> The main concern is my item #1
>>
>> 1- In the abstract, do we want to mention that the RMCAT CC algorithm
>> needs to be robust in the presence of a wide range of cross traffic? If not, we certainly need to state it strongly in the introduction and the requirements. Perhaps a paragraph similar to "While the requirements for RMCAT differ from the requirements for the other flow types, these other flow types will be present in the network. The RMCAT congestion control algorithm must work properly when these other flow types are present as cross traffic on the network." should be in either the abstract or in the introduction. I would also like to see a statement that the RMCAT CC algorithm should drive scavenger flows (like LEDBAT) to nearly 0, taking that BW for the RMCAT/TCP/"normal" flows. These points are referenced in item #11, but it is a rather thin section late in the document. IMHO, this should be one of the primary points of the requirements document.
>
> Certainly upping the emphasis on robustness in the face of a variety of cross traffic is good - though it's not something that we can guarantee is achievable (just as we can't guarantee success in competing with greedy TCP flows).
>
> As for driving LEDBAT to 0.... I'm afraid that might be desirable, but unachievable.  We may well find that LEDBAT will put a floor on how low we can drive delay, driven by the delay constant hard-coded into LEDBAT implementations (100ms(!) in the spec, 25ms in Apple's kernel implementation used for things like Apple's TimeMachine(?) auto-backup stuff).  I'm not sure what BitTorrent is using, but I suspect it's 100ms.

100ms appears to be about correct for what is in the field for uTP. I
still see a lot of tcp based torrent traffic tho...

I note that there are two sets of dynamics in play here, slow start
and congestion avoidance...

and that any form of AQM or packet scheduling turns delay-based e2e
algorithms into loss/marking-based ones:

http://perso.telecom-paristech.fr/~drossi/paper/rossi13tma-b.pdf

Some LPCC folk think 100ms of induced delay is acceptable, but I
don't. 25ms is ok...
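
To make the 100ms vs 25ms point concrete, here is a small sketch of the
delay-target term as I understand it from the LEDBAT spec (RFC 6817):
the window grows while measured queuing delay is under the target and
shrinks once it is over. The function names, the gain and the MSS value
are assumptions for illustration, not taken from any implementation.

```python
def off_target(queuing_delay_ms, target_ms):
    """Normalized distance from the delay target.

    Positive means below target (grow), negative means above (shrink).
    """
    return (target_ms - queuing_delay_ms) / target_ms


def cwnd_step(cwnd, bytes_acked, queuing_delay_ms, target_ms,
              gain=1.0, mss=1448):
    """One congestion-avoidance update of cwnd (in bytes).

    gain and mss are illustrative values, not tuned ones.
    """
    delta = gain * off_target(queuing_delay_ms, target_ms)
    return cwnd + delta * bytes_acked * mss / cwnd


if __name__ == "__main__":
    # Same 50 ms standing queue, two different targets.
    for target in (100, 25):
        print(target, off_target(50, target))
```

With a 50 ms standing queue, the 100 ms target still sees room to grow,
while the 25 ms target is already backing off. That is the sense in
which the hard-coded target puts a floor under the delay everyone else
on the link experiences.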

>
>> There are some minor issues, which I discuss below.
>>
>> 2- Requirements section, bullet 1 - Do we need to elaborate the other
>> cases in which the algorithm needs to adjust the BW? We state (in 1a)
>> that topology changes need to be handled. Do we need to state <1b -
>> Changes in cross traffic > and <1c - changes in offered load from the
>> application sending data over RMCAT>
>
> I'll check the text; I don't have it in front of me at the moment
>
>> 3- Requirements section, bullet 2a - If we are enumerating types of traffic that we need to be concerned with, I would be more concerned with MPEG DASH style Adaptive BitRate video than generic web browsing. Web browsing is bursty, but at least it is well-bounded in time. ABR flows are bursty, cyclical, and persistent. I would either remove the reference to web browsing or change it to include other bursty flow types (I note that OTT ABR traffic is now more than 50% of the peak load on many networks).
>
> Good point, though MPEG DASH is not the dominant transfer protocol for such data (or not yet).  The major players are implementing DASH in JS (i.e. under provider/application control), so that adds another wrinkle in verifying this - but the idea that we have to coexist with such traffic is important, and HTTP streaming and DASH and proprietary protocols from Apple/MS/etc do pose real problems - HTTP streaming for example may look like a very hungry realtime protocol, or it may look like a periodic maximum-transfer TCP flow (Gettys showed a NetFlix BW vs Time graph that showed a ~10 second square-wave of bandwidth use, for example).

That was me, actually. It is generally my hope that, with the short
RTTs we find with the movie traffic, the pacing stuff is going to
massively help, and that fq + a drop strategy on the gateway makes it
nearly invisible.
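
The shape of that square wave is simple arithmetic. A back-of-envelope
sketch, assuming steady state, one chunk fetched per chunk duration,
and an unpaced sender that fills the link at line rate while fetching
(the function name and the example rates are made up):

```python
def chunk_duty_cycle(chunk_seconds, video_bps, line_bps):
    """Return (on_seconds, off_seconds) for one chunk period.

    The "on" time is how long it takes to pull one chunk of video at
    line rate; the rest of the period the flow is idle.
    """
    on = chunk_seconds * video_bps / line_bps
    on = min(on, chunk_seconds)  # cannot be busy longer than the period
    return on, chunk_seconds - on


if __name__ == "__main__":
    # 10 second chunks, 4 Mbit/s video over a 5 Mbit/s link.
    print(chunk_duty_cycle(10, 4_000_000, 5_000_000))
```

With the video rate just under the line rate, the flow is saturating
the link for most of every period, which is the bad case described
earlier in the thread. Pacing at roughly the video rate flattens the
wave instead.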

>> 4- Requirements section 3 - do we need to mention that there is a temporal component to information sharing across streams? In other words, we may consider a previous 5-tuple's experience as a baseline to seed a CC algorithm, particularly if it is the exact same source addr/dest addr/dest port/protocol. This is hinted in item #12 in the document, but the guidance is quite thin in that section. Temporal hinting is particularly valuable if the old session data was from 1 second ago. It is less true if it was 1 day ago, but may still have some value as an initial seed for the CC algorithm. 3b is also a bit awkward, as multiple "flows" on a given 5-tuple introduces some difficult SSRC concepts that are currently being discussed in the AVT group. We either need to define "flow" or describe the concept in broader terms. I am reluctant to open that can of worms in this document, but if we are not clear it will be the source of endless debate/confusion. I hate glossaries in the front of RFCs, but we may have to do that.

My brain crashed here. I simply haven't collated all the documents in
play in my head enough yet. My own take on things is that distinct
flows should have a distinct 5-tuple, for each of voice, video and
data. I don't care about the impact on "available NAT ports": compared
to the ephemeral usage of a given web transaction on NATed ports
(hundreds opened, closed, and left in a wait state for a while), it's
trivial.

(I hope I'm not alone in this)

>>
>> 5- In section 4, I do not see any details on how ECN, delay and loss
>> indicate congestion. Even though we all understand that this is a
>> complex relationship and will be difficult to characterize in a
>> requirements document, I think that a few paragraphs of detail are in
>> order. This level of detail would be important to a less experienced
>> reader. This is my major concern with the draft, and I could write a
>> paragraph or two to include this discussion (if there is agreement that
>> this should be discussed in this draft)
>
> Let me take a try at it.  Understandability also can help avoid confusion among experts and implementations.

+1

>>
>>
>> 6- Is 5a a requirement, or are we starting to discuss the solution space here? I suspect that we will end up mandating AVPF, but we may be getting ahead of ourselves here. Perhaps we should soften this to a suggestion to examine RFC4585 rather than a MUST use statement.
>
> Hmmm.  Let me think.  I think we'd said that these algorithms *if* they use RTCP would need AVPF in order to respond within orders of magnitude of the RTT.  But there are other ways to provide the feedback needed than RTCP.  But that isn't to say that a new profile (AVPG ;-) ) couldn't also meet the needs here.
>
>> 7- If we are elaborating AQM schemes, we should include PIE (I know, a shameless plug for the one I am working on - but I think it is a valid comment nonetheless). And to be rigorous, RED, CoDel and PIE are buffer management algorithms that operate on a given queue. One can also apply multiple queues (FQ being one variant of multiple queues) to the problem as well.  To be more clear, we should mention queuing algorithms like RED, CoDel and PIE and then mention that each of these algorithms can be optionally mapped to multiple queues.

Nobody from the codel world recommends codel by itself, in its present form.

"If we're sticking code into boxes to deploy codel, don't do that.
Deploy fq_codel. It's just an across the board win." - Van Jacobson

So please feel free to mention fq_codel. :)

> If this is going to cause a debate of some sort, we could also just reference the broad class of AQMs and the broad class of queue allocation schemes without elaborating the specific algorithms.

fq_codel has been *extensively tested* against google hangouts in
particular. I'm looking forward to finding ways to express those tests
and results against multiple algorithms. I'm kind of hoping that the
ipv6 versions of the webrtc code land soon (anybody?) so as to get the
stun and turn things out of the loop for saner testing...

In the interim I have a pretty heavily instrumented setup and will
gladly "chat" with anyone over webrtc who wants some data.

> The specifics were more to indicate which ones we thought were most important to work/test with, so perhaps we should go to a more generic statement and move mentions of specifics to Varun's document.
>
>> 8- I am reluctant to bring this into the conversation........ Once we mention multiple queues, we may want to mention that the multiple queues may represent different QOS buckets, and the activity in one queue may impact the drain rate of lower priority queues, whereas the activity of a lower priority queue will have a (very?) small impact on the drain rate of a higher queue. I am reluctant to start this train of thought in this document, so perhaps we do not mention multiple queues of multiple priorities. If we are to mention different priority flows, perhaps we use DSCP markings or VLAN tagging examples - but once again this quickly goes down a very deep rat hole. We may just want to state that there are often multiple priorities of flows, using the traditional SP voice, SP video, general purpose data, and scavenger class as examples......... I am torn on this one, and could be convinced to not include this in the document.

The dscp discussion is a rathole, indeed.

> Fundamentally, we have 4 classes of packets we want to get across the network in webrtc (which also came up in the W3 TPAC WebRTC discussions this week with regard to DSCP markings/etc): Faster-than-audio (typically low bandwidth and/or intermittent), audio, video, and best-effort (maybe split up into interactive use, and bulk-transfer).

I have long been showing off a shaper that I thought met these
requirements, and it has been deployed for a few years now. The problem
I have is generating traffic that exercises it well, however it ends
up being defined to be used.

See simple.qos in the ceropackages repository. The package works on
current versions of Linux, can use all the new AQMs and packet
schedulers, and has DSCP filters to sort stuff into 3 HTB buckets.

https://github.com/dtaht/ceropackages-3.3/tree/master/net/aqm-scripts/files/usr/lib/aqm
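
As an illustration only (this is not the mapping simple.qos uses; see
the script for that), here is a sketch of sorting packets into three
buckets by DSCP. The bucket names and the choice of which code points
go where are hypothetical. The code point values themselves (EF = 46,
CS1 = 8, and so on) are the standard ones.

```python
# Standard DSCP code points used in this example.
EF = 46    # expedited forwarding
CS6 = 48   # network control
CS1 = 8    # commonly used for background / scavenger traffic

# Hypothetical mapping for illustration, not taken from simple.qos.
PRIORITY_DSCPS = {EF, CS6}
BACKGROUND_DSCPS = {CS1}


def dscp_from_tos(tos_byte):
    """Extract the 6-bit DSCP from the 8-bit TOS / traffic class byte."""
    return (tos_byte >> 2) & 0x3F


def bucket_for_dscp(dscp):
    """Pick one of three buckets for a packet with this DSCP."""
    if dscp in PRIORITY_DSCPS:
        return "priority"
    if dscp in BACKGROUND_DSCPS:
        return "background"
    return "best_effort"


if __name__ == "__main__":
    for tos in (0xB8, 0x20, 0x00):
        print(hex(tos), bucket_for_dscp(dscp_from_tos(tos)))
```

Anything unmarked or unrecognized lands in best effort, which is the
safe default given how inconsistently markings survive in the field.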

> Outside of webrtc we would also have scavenger (below immediate bulk-transfer).  Typical browsing traffic/etc would fall into the same general classes as best effort and bulk-transfer.  Note that "audio" and "video" don't mean they have to be exclusively packets of that type, just that they would have network priorities of that type.
>
> Interestingly (or problematically), splitting the congestion regimes by DSCP markings really needs to be done unless you know (or strongly
> believe) they're being ignored at the bottleneck (note: not necessarily ignored everywhere).  But splitting the congestion controllers has the side effect (if they are in one queue at the bottleneck) of reducing the feedback (per controller) and slowing the ability to notice changes in the bottleneck (and to respond).
>
>> Bill VerStee
>
> Thanks again for a thoughtful review!
>
> --
> Randell Jesup
> randell-ietf@jesup.org
>
>



-- 
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
