Re: [rtcweb] An input for discussing congestion control (Fwd: New Version Notification for draft-alvestrand-rtcweb-congestion-00.txt)

Randell Jesup <randell-ietf@jesup.org> Sun, 18 September 2011 22:11 UTC

Date: Sun, 18 Sep 2011 18:10:20 -0400
From: Randell Jesup <randell-ietf@jesup.org>
To: rtcweb@ietf.org
In-Reply-To: <4E734E89.5010105@ericsson.com>

On 9/16/2011 9:26 AM, Magnus Westerlund wrote:
> As an individual I do have a few comments and questions on this document
> and its algorithm.
As I said on the call, I'm going to work with Harald (and Justin, and anyone
else interested) to provide requirements and likely a suggested algorithm.
At this point, I believe we'll specify the mechanism for telling the other
side the apparent bandwidth, but leave the algorithm for determining that
bandwidth open for innovation.  I plan to propose a baseline algorithm.  On
the sender side, I expect to optionally involve the application in deciding
how the bits are allocated among the channels, and to offer the option to
add or remove channels based on bandwidth.  I expect that we will mandate
that if the application doesn't choose, the rtcweb/webrtc code will.  This
is all still to be hammered out; if anyone else is interested in helping
write the draft requirements, let me know.
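
To make the sender-side allocation concrete, here's a hypothetical sketch
in Python (the function name, priority scheme, and defaults are all mine,
nothing agreed in the group): split a total bandwidth estimate across a
PeerConnection's channels using application-supplied priorities, falling
back to equal weights when the application doesn't choose.

    def allocate_bits(total_bps, channels, priorities=None):
        """Split total_bps across channel ids by relative weight."""
        if priorities is None:
            priorities = {ch: 1.0 for ch in channels}  # browser default
        weight_sum = sum(priorities.get(ch, 1.0) for ch in channels)
        return {ch: total_bps * priorities.get(ch, 1.0) / weight_sum
                for ch in channels}

    # e.g. allocate_bits(1000000, ["audio", "video"],
    #                    {"video": 4.0, "audio": 1.0})
    # -> {"audio": 200000.0, "video": 800000.0}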

> 1. Section 3: As delay-based congestion control has been tried a number
> of times and met with less than stellar success, I wonder if you have
> any understanding of how this relates to the issues previously encountered:
Do you have any pointers to these earlier attempts?  My own, separate
experience with this class of algorithm has been extremely successful,
though perhaps with a set of real-world use cases that don't cover the
situations you're referring to.  My understanding is that Radvision's
NetSense is very similar, judging from their description of it.

> - Stability (especially on short RTTs)
> - How it competes with TCP flows, which was a real issue for TCP Vegas
> but may be less of an issue here.
Perhaps Google can comment here about what tests they've done.  If anything,
this class of algorithm will eventually have problems when faced with
aggressive TCP flows, because TCP will tend to fill up the buffers at the
bottleneck.  Since this algorithm senses buffers starting to fill up, it
will tend to back off before TCP is likely to see a loss event.  Now, it's
more complex than that of course, and multiple flows make it more complex
still.  It's also sensitive to the adaptation rate of this algorithm and
the speed of backoff.  If I remember correctly, TFRC tends to back off more
slowly than TCP but also increase more slowly; I suspect Google's algorithm
is similar.
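
For concreteness, here's a minimal sketch of the "sense the buffers filling
and back off before TCP sees loss" idea; this is my own simplification (the
draft's receiver-side filter is considerably more sophisticated, as I read
it), and the two tuning constants are assumed values, not the draft's:

    class OveruseDetector:
        OVERUSE_THRESH = 0.005  # accumulated queuing delay (s); assumed
        BACKOFF = 0.85          # multiplicative decrease factor; assumed

        def __init__(self):
            self.queue_delay = 0.0

        def on_packet_pair(self, dt_send, dt_recv, rate):
            # Positive gradient: packets are spaced further apart on
            # arrival than on departure, so the bottleneck queue grows.
            gradient = dt_recv - dt_send
            self.queue_delay = max(0.0, self.queue_delay + gradient)
            if self.queue_delay > self.OVERUSE_THRESH:
                self.queue_delay = 0.0
                return rate * self.BACKOFF  # back off before loss shows
            return rate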

Web browser traffic can be especially aggressive at filling large buffers at
a bottleneck router, and will become more so as the TCP initial window of 10
segments (IW10) rolls out.


> 2. Section 3, I don't see any discussion of the startup issue. Is my
> assumption correct, that you will simply start at some rate and then
> adapt from that?
Startup is an open question; you have to start somewhere, and there's
no way to always get the start "right".  History of previous sessions may
be useful here; it's theoretically possible to use some of the ICE
negotiation for a first-level guess, or we could send an initial packet
train and measure its dispersion.  But that's just speculation; history can
be very wrong if you've changed networks, or if the other person is on a
very different network than in the last call (or the last call with that
person).  I'm not sure whether we should mandate something here, or leave
it more open to encourage innovation.
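
As a sketch of the packet-train idea (purely speculative, as above): send N
back-to-back packets of known size and estimate the bottleneck capacity
from how far they spread out by the time they arrive:

    def dispersion_estimate(arrival_times, packet_size_bytes):
        """Bottleneck estimate from a back-to-back packet train."""
        spread = arrival_times[-1] - arrival_times[0]
        if spread <= 0:
            return None  # train arrived too compressed to measure
        n_gaps = len(arrival_times) - 1  # each gap carries one packet
        return n_gaps * packet_size_bytes * 8 / spread  # bits/second

    # e.g. 10 x 1200-byte packets spread over 10 ms:
    # 9 * 9600 / 0.010 = ~8.6 Mbps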

> 3. Section 3, I guess this mechanism would run into significant issues
> with any sender-side transmission filtering. Burst-transmitting I-frames
> over a network, especially for high-resolution video, can easily result
> in packet loss in switch fabrics if there is any cross traffic in the
> switch fabric. Thus having a bit of send filtering for frame
> transmission is a good idea. Am I understanding correctly that this
> could result in an over-use indication if one has such a mechanism, which
> filters packets to a slower transmission rate than what a single packet
> or small burst can achieve over the same path?
Well, such send filtering amounts in practice to a bottleneck at the sending
node, so that actually might be OK.  I'm assuming you're talking about
pacing the actual send()s of large dumps of fragmented IDRs/I-frames to some
maximum packet rate or bitrate (almost the same thing).  I believe the
algorithm looks at all the fragments and their arrival times to calculate
the bandwidth.  If so, it would only conclude it was over-bandwidth if the
pacing slowed the fragments down to less than the actual bottleneck rate; I
suspect this would rarely be the case outside of high-bandwidth LANs at very
high resolutions.
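
For reference, the kind of pacing I have in mind looks roughly like this (a
sketch only; the token-bucket drain and the periodic tick are my assumptions
about how one would implement it):

    import collections, time

    class Pacer:
        """Queue packets and drain at max_bps instead of bursting."""
        def __init__(self, max_bps, send_fn):
            self.max_bps = max_bps
            self.send_fn = send_fn  # e.g. a wrapper around socket.sendto
            self.queue = collections.deque()
            self.budget_bits = 0.0
            self.last = time.monotonic()

        def enqueue(self, packet):
            self.queue.append(packet)

        def tick(self):  # call periodically, e.g. every few ms
            now = time.monotonic()
            self.budget_bits += (now - self.last) * self.max_bps
            self.last = now
            while self.queue and self.budget_bits >= len(self.queue[0]) * 8:
                pkt = self.queue.popleft()
                self.budget_bits -= len(pkt) * 8
                self.send_fn(pkt)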


>
> 4. Section 4.
>
> "This algorithm is run every time a receive report arrives at the
>     sender, which will happen [[how often do we expect? and why?]].  If
>     no receive report is received within [[what timeout?]], the algorithm
>     will take action as if all packets in the interval have been lost.
>     [[does that make sense?]]"
>
> The transmission of receiver reports is highly dependent on the RTP
> profile, the RTCP bandwidth, the trr-int parameter and any feedback events.
>
> If I start by assuming AVPF, which seems reasonable in most browser-to-
> browser cases with our current proposals for RTP support: if there are
> no feedback events, the transmission of receiver reports will occur
> as often as the bandwidth allows, but no more often than the trr-int
> parameter permits.
>
> Here it might be worth mentioning that I do expect trr-int to be set to
> avoid RTCP reporting too often, given the relatively high RTCP bandwidth
> values that will be set due to the multiplexing of audio and video in a
> single RTP session; this avoids steady-state RTCP rates as high as, or
> higher than, those of the audio streams they report on.
Agreed.  This is where my plan to suggest a baseline algorithm that melds
the reception data of all the streams in the PeerConnection may have a
significant advantage over doing bandwidth estimation on each stream
independently.  We'll see if I can make the idea work, but there are some
significant advantages if it does.  If not, we can estimate in each channel
independently, as per the Google algorithm.
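
As a hypothetical illustration of what I mean by "melding" (the report
fields and the weighting are invented for the sketch): combine each
stream's receive statistics into one PeerConnection-wide rate and a
bit-weighted loss fraction, and feed a single estimator with that:

    def meld_estimate(stream_reports):
        """stream_reports: (received_bytes, interval_s, fraction_lost)."""
        if not stream_reports:
            return 0.0, 0.0
        total_bps = sum(b * 8 / t for b, t, _ in stream_reports)
        total_bits = sum(b * 8 for b, _, _ in stream_reports)
        # Weight each stream's loss fraction by its share of the bits.
        loss = (sum(b * 8 * p for b, _, p in stream_reports) / total_bits
                if total_bits else 0.0)
        return total_bps, loss  # drive one estimator, not one per stream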

You can also hang these reports off rtcp-fb PLI/etc. events, increasing the
frequency at which they get through with minimal extra bandwidth use.


> When feedback events occur, the stack will have the choice of sending an
> RTCP RR; that choice is provided by the reduced-size RTCP
> specification we include. But if the cumulative loss count diff is
> non-zero, it might be worth mandating that the RR/SR be included in any
> such feedback RTCP packet.
Exactly - or a TMMBR with the results of the receiver-side bandwidth
calculations.

> For that reason, causing a feedback event when there are losses and
> scheduling them using the early algorithm may be a good choice to ensure
> timely reporting of any loss.
>
> If one uses AVP, then the RTCP timing rules determine when you transmit
> RTCP feedback, and thus the parameterization becomes central here. A
> clear issue is whether people use the minimum interval of 5 seconds
> or 360/Session_BW (in kbps). I would note that 5 seconds is very long
> in adaptation situations.

Yes.
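
To put numbers on the two choices (the 5-second floor and the
360/Session_BW reduced minimum are from RFC 3550; the example bandwidth is
arbitrary):

    def rtcp_min_interval(session_bw_kbps, reduced=True):
        return 360.0 / session_bw_kbps if reduced else 5.0

    # e.g. a 1200 kbps audio+video session: 360/1200 = 0.3 s between
    # reports with the reduced minimum, versus a full 5 s without it --
    # a big difference for adaptation.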


> I think the timeout should be based on the RTT and the possible reporting
> rate. But there clearly needs to be some upper limit, either explicit
> or via mandated RTCP bandwidth rates.
>
> 5. Section 4:
> "We motivate the packet loss thresholds by noting that if we have
>     small amount of packet losses due to over-use, that amount will soon
>     increase if we don't adjust our bit rate.  Therefore we will soon
>     enough reach above the 10 % threshold and adjust As(i).  However if
>     the packet loss rate does not increase, the losses are probably not
>     related to self-induced channel over-use and therefore we should not
>     react on them."
>
> I am personally worried that a packet loss threshold of 10% is so high
> that it might push a lot of other traffic out of the way, thus being
> quite aggressive towards other flows.
Yes; I want to revisit these tuning parameters and how packet loss is
reacted to, and at least tweak them.
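
For discussion, here's my reading of the draft's sender-side loss handling
(check the draft itself for the authoritative formulas; the constants are
how I understand its thresholds): only losses above 10% cut the rate,
losses below 2% allow growth, and the band in between is treated as noise:

    def update_sender_rate(As_prev, loss_fraction):
        if loss_fraction > 0.10:
            return As_prev * (1.0 - 0.5 * loss_fraction)  # back off
        if loss_fraction < 0.02:
            return As_prev * 1.05                         # probe upward
        return As_prev                                    # hold steady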

> TCP has a relation between the average packet loss rate and the
> highest sustainable rate, which can be interpreted as TCP getting more
> sensitive to loss the higher the average data rate becomes. Thus in
> certain situations the sender-side part of the algorithm will be
> extremely aggressive in pushing TCP out of the way. The receiver part
> may compensate for this pretty well in a number of situations.
>
> But I do wonder why, for example, A(i) isn't bounded to lie between
> TFRC's rate and some multiple of the TFRC rate, rather than being
> unbounded by anything other than the receiver-side algorithm.
Good question; we should look into it.
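
A sketch of what Magnus suggests (the throughput formula is the standard
TCP/TFRC equation from RFC 5348 with b=1 and t_RTO = 4*RTT; the 2x multiple
is an arbitrary placeholder):

    from math import sqrt

    def tfrc_rate(s, rtt, p):
        """s: packet size (bytes), rtt: seconds, p: loss event rate."""
        if p <= 0:
            return float("inf")
        t_rto = 4 * rtt
        denom = (rtt * sqrt(2 * p / 3) +
                 t_rto * 3 * sqrt(3 * p / 8) * p * (1 + 32 * p * p))
        return s / denom  # bytes per second

    def bound_rate(A_i, s, rtt, p, multiple=2.0):
        x = tfrc_rate(s, rtt, p)
        if x == float("inf"):  # no losses observed: leave A(i) alone
            return A_i
        return min(max(A_i, x), multiple * x)  # clamp to [X, multiple*X]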

> I do understand this document is meant to start a discussion. But I think
> we need to be sensitive to the fact that it is difficult to get even
> something that is self-fair over the wide range of conditions the
> Internet presents.
>
> Thus I would appreciate hearing what issues with, for example, TFRC you
> attempt to fix with the various new components you add.
Magnus: Welcome to my little congestion-control bof... :-)

-- 
Randell Jesup
randell-ietf@jesup.org