Re: [rtcweb] An input for discussing congestion control (Fwd: New Version Notification for draft-alvestrand-rtcweb-congestion-00.txt)

Henrik Lundin <hlundin@google.com> Mon, 19 September 2011 08:10 UTC

To: Stefan Holmer <holmer@google.com>
Cc: Randell Jesup <randell-ietf@jesup.org>, rtcweb@ietf.org

A few comments on the nitty-gritty details inline.

/Henrik



On Mon, Sep 19, 2011 at 9:06 AM, Stefan Holmer <holmer@google.com> wrote:

> On Mon, Sep 19, 2011 at 12:10 AM, Randell Jesup <randell-ietf@jesup.org> wrote:
>
>> On 9/16/2011 9:26 AM, Magnus Westerlund wrote:
>>
>>> As an individual I do have a few comments and questions on this document
>>> and its algorithm.
>>>
>> As said on the call, I'm going to work with Harald (and Justin, and
>> anyone else interested) to provide requirements and likely a suggested
>> algorithm.  At this point, I believe we'll specify the mechanism for
>> telling the other side the apparent bandwidth, but leave the algorithm
>> for finding that out open for innovation.  I plan to propose a baseline
>> algorithm.  On the sender side, I expect to optionally involve the
>> application in deciding how the bits are allocated among the channels,
>> and the option to add or remove channels based on bandwidth.  I expect
>> that we will mandate that if the application doesn't choose, the
>> rtcweb/webrtc code will.  This is all still to be hammered out; anyone
>> else interested in helping write the draft requirements, let me know.
>>
>>
>>> 1. Section 3: As delay-based congestion control has been tried a number
>>> of times and met with less than stellar success, I wonder if you have
>>> any understanding of how this relates to the issues previously
>>> encountered:
>>>
>> Do you have any pointers to these earlier attempts?  My separate
>> experience with this class has been extremely successful, though perhaps
>> with a set of real-world use-cases that don't cover the situations you're
>> referring to.  My understanding is that Radvision's NetSense is very
>> similar, from their description of it.
>>
>>
> I'm interested in any references you may have as well.
>
>
>>
>>> - Stability (especially on short RTTs)
>>> - How it competes with TCP flows, which was a real issue for TCP Vegas
>>> but may be less of an issue here.
>>>
>> Perhaps Google can comment here about what tests they've done.  If
>> anything, this class of algorithm will eventually have problems when
>> faced with aggressive TCP flows, because TCP will tend to fill up the
>> buffers at the bottleneck.  Since this algorithm senses buffers starting
>> to fill up, it will tend to back off before TCP is likely to see a loss
>> event.  Now, it's more complex than that of course, and multiple flows
>> make it more complex still.  It's also sensitive to the adaptation rate
>> of this algorithm and its speed of backoff.  If I remember correctly,
>> TFRC tends to back off more slowly than TCP but also to increase more
>> slowly; I suspect Google's algorithm is similar.
>>
>
> As you're saying, it will always be hard for delay-based congestion
> control algorithms to compete with a packet-loss-based algorithm. The
> delay-based algorithm will detect over-use much earlier than the
> packet-loss-based one, and that's basically what we've seen in the tests
> we've been running as well. I do think we would benefit from more tests
> in this area. For instance, we might want to go with an additive increase
> approach rather than the current multiplicative one; this would likely
> also help improve the algorithm's self-fairness.
>
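To make the two increase policies concrete, here is a minimal sketch in
Python (the constants and function names are illustrative only, not taken
from the draft or from any implementation):

    def multiplicative_increase(rate_bps, factor=1.08):
        # Current draft behavior per Stefan's description: grow the rate
        # by a fixed percentage per update, so the absolute step grows
        # with the rate itself.
        return rate_bps * factor

    def additive_increase(rate_bps, step_bps=10000):
        # Proposed alternative: grow by a fixed number of bits per second
        # per update. Competing flows then converge toward equal shares,
        # which is why this tends to improve self-fairness.
        return rate_bps + step_bps
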
>
>> Web browser traffic can be especially aggressive at filling large buffers
>> at a bottleneck router, and will become more so as the TCP initial window
>> of 10 rolls out.
>>
>>
>>
>>> 2. Section 3: I don't see any discussion of the startup issue. Is my
>>> assumption correct that you will simply start at some rate and then
>>> adapt from that?
>>>
>> Startup is an open question; you have to start somewhere, and there's
>> no way to always get the start "right".  History of previous sessions may
>> be useful here; it may also be possible to use some of the ICE
>> negotiation traffic for a first-level guess, or we could send an initial
>> packet train and measure its dispersion.  But that's just speculation;
>> history can be very wrong if you've changed networks or the other person
>> has a very different network than on the last call (or the last call with
>> that person).  I'm not sure if we should mandate something here, or leave
>> it more open to encourage innovation.
>>
>>
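As a rough illustration of the packet-train idea, a dispersion-based
capacity probe might look like the following sketch (hypothetical Python,
not part of the draft):

    def dispersion_estimate_bps(recv_times, packet_size_bytes):
        # Bottleneck capacity estimate from a back-to-back packet train:
        # when the sender transmits N packets back to back, the spacing
        # of the received packets reflects the bottleneck link rate.
        n = len(recv_times)
        if n < 2:
            return None
        dispersion = recv_times[-1] - recv_times[0]  # seconds
        if dispersion <= 0:
            return None  # packets not spread out; no estimate
        # (n - 1) packets' worth of bits crossed the bottleneck in
        # 'dispersion' seconds.
        return (n - 1) * packet_size_bytes * 8 / dispersion
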
>>> 3. Section 3: I guess this mechanism could run into significant issues
>>> with any sender-side transmission filtering. Burst-transmitting I-frames
>>> over a network, especially for high-resolution video, can easily result
>>> in packet loss in switch fabrics if there is any cross traffic. Thus
>>> having a bit of send filtering (pacing) for frame transmission is a good
>>> idea. Am I understanding correctly that this could result in an over-use
>>> indication if I have such a mechanism, which paces packets to a slower
>>> transmission rate than what a single or small burst can achieve over the
>>> same path?
>>>
>> Well, such send filtering amounts in practice to a bottleneck at the
>> sending node, so that actually might be ok.  I'm assuming you're talking
>> about pacing the actual send()s of large dumps of fragmented
>> IDRs/i-frames to some maximum packet rate or bitrate (almost the same
>> thing).  I think the algorithm looks at all the fragments and the times
>> they came in to calculate the bandwidth.  If so, it would only think it
>> was over-bandwidth if the pacing slowed them down to less than the actual
>> bottleneck rate; I suspect this would rarely be the case outside of
>> high-bandwidth LANs at very high resolutions.
>
Yes, the algorithm looks at the arrival of complete frames. And you are
right, Randell: as long as you do not spread the packets more than the
bottleneck link does, the over-use detector won't trigger.
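
For readers following along in the draft, the quantity being measured here
is the per-frame inter-arrival jitter. A minimal sketch of the measurement
(a paraphrase of the draft's section 3; variable names are mine):

    def frame_delay_delta(arrival, prev_arrival, sent, prev_sent):
        # How much more the frames spread out in transit than at the
        # sender. A positive trend means queues along the path are
        # building (over-use). Sender-side pacing only biases this if it
        # spreads a frame's packets more than the bottleneck link does.
        return (arrival - prev_arrival) - (sent - prev_sent)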


>
>>
>>
>>
>>> 4. Section 4.
>>>
>>> "This algorithm is run every time a receive report arrives at the
>>>    sender, which will happen [[how often do we expect? and why?]].  If
>>>    no receive report is received within [[what timeout?]], the algorithm
>>>    will take action as if all packets in the interval have been lost.
>>>    [[does that make sense?]]"
>>>
>>> The transmission of receiver reports is highly dependent on RTP profile,
>>> the RTCP bandwidth, trr-int parameter and any feedback events.
>>>
>>> I start by assuming AVPF, which seems reasonable in most browser-to-
>>> browser cases with our current proposals for RTP support. Then, if there
>>> are no feedback events, the transmission of receiver reports will occur
>>> as often as the bandwidth allows, but no more often than the value of
>>> the trr-int parameter.
>>>
>>> Here it might be worth mentioning that I expect trr-int to be set to
>>> keep RTCP from reporting too often, given the relatively high RTCP
>>> bandwidth values that will result from multiplexing audio and video in a
>>> single RTP session. This avoids steady-state RTCP rates that are as high
>>> as, or higher than, the audio streams they report on.
>>>
>> Agreed.  This is where my plan to suggest a baseline algorithm that melds
>> the reception data of all the streams in the PeerConnection may be a
>> significant advantage over doing bandwidth estimation on each stream
>> independently.  We'll see if I can make the idea work, but there are some
>> significant advantages if it does.  If not, we can estimate in each
>> channel independently, as per the Google algorithm.
>
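A sketch of what such melding could mean in practice (this is my reading of
Randell's idea, as hypothetical Python, not an existing implementation):
feed the packets of every stream in the PeerConnection into one detector,
since they all share the same path.

    class PooledOveruseDetector:
        def __init__(self, detector):
            # One path-level detector instead of one per stream.
            self.detector = detector

        def on_packet(self, stream_id, sent_ts, arrival_ts, size_bytes):
            # Every packet is evidence about the same bottleneck, so all
            # streams update the same estimator; stream_id could be kept
            # for reporting, but plays no role in the estimate itself.
            self.detector.update(sent_ts, arrival_ts, size_bytes)
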
>
>> You can also use rtcp-fb PLI/etc. events to hang these reports off of,
>> increasing the frequency at which they get through with minimal extra
>> bandwidth use.
>>
>>
>>
>>> When feedback events occur, the stack will have the choice of sending an
>>> RTCP RR; that choice is provided by the inclusion of the reduced-size
>>> RTCP specification. But if the cumulative loss count diff is non-zero,
>>> it might be worth mandating that the RR/SR be included in any such
>>> feedback RTCP packet.
>>>
>> Exactly - or a TMMBR with the results of the receiver-side bandwidth
>> calculations.
>>
>>
>>> For that reason, triggering a feedback event when there are losses, and
>>> scheduling it using the early feedback algorithm, may be a good choice
>>> to ensure timely reporting of any loss.
>>>
>>> If one uses AVP, then the RTCP timing rules determine when you transmit
>>> RTCP feedback, and thus the parameterization becomes central here. A
>>> clear issue is whether people use the minimal interval of 5 seconds or
>>> 360/Session_BW (in kbps). I would note that 5 seconds is very long in
>>> adaptation situations.
>>>
>>
>> Yes.
>
In the proposed algorithm, the RTCP interval adds to the system response
time. The response time governs the bandwidth increase rate, so that the
step into over-use will have a limited delay build-up before it can be
detected and mitigated. Thus, a long RTCP interval results in slow
adaptation, but it should still be stable.
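
To put rough numbers on the intervals mentioned above (my own arithmetic,
based on the AVP reduced minimum of 360/Session_BW that Magnus quotes):

    def reduced_min_rtcp_interval_s(session_bw_kbps):
        # AVP reduced minimum RTCP interval: 360 divided by the session
        # bandwidth in kbps. Compare with the fixed 5 s minimum, which is
        # very long for rate adaptation.
        return 360.0 / session_bw_kbps

    print(reduced_min_rtcp_interval_s(360))   # 1.0 s
    print(reduced_min_rtcp_interval_s(1000))  # 0.36 s

Every one of these intervals is added to the control loop's response time,
which is why the 5-second minimum is so painful for adaptation.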


>
>>
>>
>>> I think the timeout should be based on the RTT and the possible
>>> reporting rate. But there clearly needs to be some upper limit, either
>>> explicit or via mandated RTCP bandwidth rates.
>>>
>>> 5. Section 4:
>>> "We motivate the packet loss thresholds by noting that if we have
>>>    small amount of packet losses due to over-use, that amount will soon
>>>    increase if we don't adjust our bit rate.  Therefore we will soon
>>>    enough reach above the 10 % threshold and adjust As(i).  However if
>>>    the packet loss rate does not increase, the losses are probably not
>>>    related to self-induced channel over-use and therefore we should not
>>>    react on them."
>>>
>>> I am personally worried that a packet loss threshold of 10% is so high
>>> that it might push a lot of other traffic out of the way. Thus being
>>> quite aggressive towards other flows.
>>>
>> Yes; I want to revisit these tuning parameters and how packet loss is
>> reacted to, and at least tweak them.
>>
>>
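For concreteness, the shape of the sender-side rule under discussion, as a
sketch (the 10% threshold is from the quoted text; the lower 2% threshold
and the exact increase/decrease factors are my reading of the draft, so
check its section 4 for the real constants):

    def update_sender_rate(rate_bps, loss_fraction):
        if loss_fraction > 0.10:
            # Heavy loss: assume self-induced over-use and back off in
            # proportion to the loss rate.
            return rate_bps * (1.0 - 0.5 * loss_fraction)
        elif loss_fraction < 0.02:
            # Negligible loss: probe for more bandwidth.
            return rate_bps * 1.05
        else:
            # 2-10% loss: hold steady. If the loss is self-induced, it
            # will grow past 10% and trigger a back-off on a later
            # report; if not, we should not react to it.
            return rate_bps
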
>>> TCP has a relation between the average packet loss rate and the highest
>>> sustainable rate, which can be interpreted as TCP becoming more
>>> sensitive to loss the higher the average data rate becomes. Thus, in
>>> certain situations the sender-side part of the algorithm will be
>>> extremely aggressive in pushing TCP out of the way. The receiver part
>>> may compensate for this pretty well in a number of situations.
>>>
>>> But I do wonder why, for example, A(i) isn't bounded to lie between
>>> TFRC's rate and some multiple of the TFRC rate, rather than being
>>> unbounded by anything other than the receiver-side algorithm.
>>>
>> Good question; we should look into it.
>>
>>
> The idea behind those bounds is that if we believe in the TFRC equation,
> we should be able to transmit at that rate while still being fair to TCP.
>
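For reference, the TFRC throughput equation (RFC 5348) that would supply
such a bound, as a small sketch (parameter choices like t_RTO = 4*RTT are
the usual recommendation, not anything from the draft):

    from math import sqrt

    def tfrc_rate_bps(s_bytes, rtt_s, p, b=1):
        # TCP throughput equation from RFC 5348: s = segment size in
        # bytes, rtt_s = round-trip time in seconds, p = loss event rate,
        # b = packets acknowledged per ACK.
        t_rto = 4 * rtt_s
        denom = (rtt_s * sqrt(2 * b * p / 3) +
                 t_rto * 3 * sqrt(3 * b * p / 8) * p * (1 + 32 * p * p))
        return 8 * s_bytes / denom

    # A cap such as k * tfrc_rate_bps(...) for a small multiple k would
    # bound A(i) relative to a TCP-fair rate, as Magnus suggests.
    print(tfrc_rate_bps(1200, 0.1, 0.01))  # ~1.1 Mbit/s
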
>
>>
>>> I do understand this document is meant to start a discussion. But I
>>> think we need to be sensitive to the fact that it is difficult to get
>>> even something that is self-fair over the wide range of conditions the
>>> Internet provides.
>>>
>>> Thus I would appreciate hearing what issues with, for example, TFRC you
>>> attempt to fix with the various new components you add.
>>>
>> Magnus: Welcome to my little congestion-control bof... :-)
>>
>>
> I'm interested in continued discussions as well.
>
>
>>  --
>> Randell Jesup
>> randell-ietf@jesup.org
>>
>>
>>
>
>
>
>


-- 
Henrik Lundin | WebRTC Software Eng | hlundin@google.com | +46 70 646 13 41