Re: [rmcat] delay recovery phase

"Van Der Auwera, Geert" <geertv@qti.qualcomm.com> Mon, 03 November 2014 19:22 UTC

From: "Van Der Auwera, Geert" <geertv@qti.qualcomm.com>
To: Stefan Holmer <stefan@webrtc.org>, Randell Jesup <randell-ietf@jesup.org>, "rmcat@ietf.org" <rmcat@ietf.org>
Thread-Topic: [rmcat] delay recovery phase
Date: Mon, 03 Nov 2014 19:22:36 +0000
Message-ID: <f74faf8784f24b97bcb47b39d9b03afc@NASANEXM01E.na.qualcomm.com>
References: <479bbc3c4afe41cc8c8226c71a96de2c@NASANEXM01E.na.qualcomm.com> <54552241.2080400@jesup.org> <CAEdus3+XMFPZH7kyUscg5xQJoWE8ywcDFVaPzyemT474mXGYMQ@mail.gmail.com>
In-Reply-To: <CAEdus3+XMFPZH7kyUscg5xQJoWE8ywcDFVaPzyemT474mXGYMQ@mail.gmail.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/rmcat/4Anw8IghF7ae6JwcaVmG4AU8GQM
Subject: Re: [rmcat] delay recovery phase

Hi Stefan,

Please see my responses inline, marked with [GV]. Thanks.


Sounds interesting.

In https://tools.ietf.org/html/draft-alvestrand-rmcat-congestion-02#page-10 we go into a HOLD state where the sender should keep its bitrate constant while waiting for buffers to drain. We don't try to estimate the time needed to drain the buffers, but instead look at the delay change estimate m(i); if it's less than zero, the buffers are likely draining.
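
For illustration, a minimal sketch of that hold-then-resume check (the state names and the gamma threshold here are my own placeholders, not taken from the draft):

    # Sketch of the HOLD-state idea above: keep the bitrate constant and watch
    # the delay-change estimate m(i); once it goes below zero the queues are
    # probably draining, so the rate controller can leave HOLD.
    # State names and the gamma threshold are illustrative only.
    def next_state(state, m_i, gamma):
        if state == "HOLD":
            if m_i < 0:
                return "INCREASE"   # buffers draining, resume probing
            if m_i > gamma:
                return "DECREASE"   # delay still growing, back off further
            return "HOLD"           # keep the bitrate constant and wait
        return state
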
[GV] I think there are two cases to consider. The first one is probing, where the rate is increased slightly, step by step, until the start of congestion is detected by the receiver. In this case, the delay buildup is likely small and it can easily be recovered. The second case is a significant rate drop and delay buildup. With the same approach as in the first case, the delay buildup may decrease too slowly; instead, a more aggressive delay recovery through a much lower sending rate (or frame skipping) is needed.
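
(Purely to illustrate the two cases, a toy decision rule; the 50 ms threshold is an arbitrary placeholder:)

    # Case 1 (probing): small delay buildup, just hold and let it drain.
    # Case 2 (link down-switch): large buildup, undershoot hard or skip frames.
    def pick_recovery(queuing_delay_ms, small_buildup_ms=50):
        if queuing_delay_ms <= small_buildup_ms:
            return "hold"                     # drains on its own
        return "undershoot_or_frameskip"      # needs active delay recovery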

On Sat Nov 01 2014 at 7:11:53 PM Randell Jesup <randell-ietf@jesup.org> wrote:
On 10/30/2014 8:37 PM, Van Der Auwera, Geert wrote:
Hi all,

I would like to start a discussion about potential sender CC behavior when the link rate suddenly down-switches and, as a result, there is significant delay buildup due to receiver CC detection delay, feedback message delay, etc. Assuming that the sender receives a TMMBR message (or similar) with the new max target bit rate, in this approach the sender initiates a delay recovery phase (rate undershoot) to recover the delay of the buffered data, as described in more detail below. If people think that this topic is of interest, I could provide experimental results during the meeting (5 min. presentation). Thanks.

This sounds interesting, especially as rate-switching is far more common in wireless scenarios, and unlike most (not all!) WiFi use cases, 3G/4G/etc. connections may well be the bottleneck link.  (At low signal strength or under high contention, WiFi may also cause this on a rate switch, I believe.)



Summary:

At time instant t0, the link rate decreases suddenly from R0 to R1. Since there is a response delay ΔT at the sender, the sending rate is decreased at time t1 (t0 + ΔT). The delay increases rapidly from D0 to D1 during ΔT. To reduce the built-up delay quickly, the sending rate is decreased to Ru (< R1) at time t1 for a period of time ΔTu, during which the delay is reduced. After time ΔTu the rate is increased to R1.
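
(A back-of-the-envelope sketch of how these quantities relate, assuming the sender keeps sending at roughly R0 during ΔT and the link stays at R1 afterwards; rates in kbps, times in seconds:)

    # Backlog built during the reaction time, and the undershoot duration
    # needed to drain it at the reduced rate Ru < R1.
    def undershoot_duration(r0, r1, ru, delta_t):
        backlog_kbit = (r0 - r1) * delta_t     # data queued while still sending too fast
        drain_rate_kbps = r1 - ru              # net drain rate while undershooting
        return backlog_kbit / drain_rate_kbps  # = delta_t_u

    # Example: R0=1000, R1=500, Ru=250, delta_t=0.27 -> about 0.54 s of undershoot.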

The problem is to estimate ΔT at the sender side in order to compute ΔTu. The receiver sends a minimal compound RTCP packet including the RR and TMMBR message to the sender with the estimated maximum bit rate for the forward link. Typically, the receiver will send this message immediately after congestion is detected. With this information, the sender can make an estimate of ΔT as the time difference between the sending time of the RTCP SR (referenced in the RR by LSR) and the time that the RR+TMMBR is received, assuming that the congestion was detected at the receiver side after this particular SR was received.
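
(A sketch of that sender-side ΔT estimate; sr_send_times and the lookup by LSR are illustrative bookkeeping, not a real API, and DLSR compensation is ignored, as in the description above:)

    # delta_t ~= (arrival time of the RR+TMMBR) - (send time of the SR that the
    # receiver references via LSR), assuming congestion was detected after that SR.
    def estimate_delta_t(sr_send_times, lsr, rr_tmmbr_arrival_time):
        # sr_send_times maps the LSR value (middle 32 bits of the SR's NTP
        # timestamp) to the local time at which that SR was sent.
        sr_send_time = sr_send_times.get(lsr)
        if sr_send_time is None:
            return None                        # unknown SR; fall back to an RTT-based guess
        return rr_tmmbr_arrival_time - sr_send_time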

Note this is similar to a new fixed-rate flow at the bottleneck link that uses delta-R bandwidth; the difference would be in packet drops (a flow would share in these).  The other part is that the receiver may not know the rate changed.  In that case this may collapse to an existing somewhat common case - not the same as a new TCP flow, but a new non-congestion-controlled flow (such as many existing VoIP flows - but perhaps of higher magnitude).

Receiver unable to know of rate change, or unable to know the magnitude of it:
X1 = time link changed (one direction?  both directions?) by delta-R
X2 = time receiver recognizes that delay is increasing.  If delta-R is large (and that's the interesting case here), this would be relatively short (a few inter-frame-reception times), though the actual duration will depend on the algorithm being used.
(receiver sends RTCP message)
X3 = time sender receives RTCP that indicates rate change (which depends on the current congestion and base one-way delay in the reverse direction - note that if both directions downshift (which may be common), and media is flowing in both directions (common), then delay may be increasing fast in the reverse direction as well!)
X4 = time sender is able to reduce the sending rate (typically at the next video frame to be encoded)
X5 = time receiver sees lower sending rate and delay starts to decrease
X6 = time delay is gone
X7 = time receiver notices delay has stopped decreasing & sends confirming RTCP
X8 = time sender gets RTCP indicating delay has stabilized

One may presume that the receiver will send additional RTCPs with updated info during this, so the sender will have an idea of when the receiver started to see improvement and how fast things are improving.

Another interesting question: even if the receiver knows the raw link rate changed from 1Mbps to 500Kbps, what does that map to in effective maximum rate?  On WiFi, even un-contended, a 1Mbps link rate can't get anywhere near 1Mbps throughput of user data.  So a receiver (depending on what the link is, and what it knows) may need to downrate, and it may need to guess at the down-rating.  In addition, if the existing link had a max of 1Mbps, the current congestion control estimate is 500Kbps, and the link then drops to 500Kbps - what should the sender do?  Nothing?  Drop to 250?  Preemptively drop "some" and then wait for confirmation of the delay-rate change via normal CC to estimate the rate at which delay spiked, and use that to improve the estimate of the delay buildup it'll need to remove?

Fun fun fun...


Ignoring the theoretical, and ignoring the "we know the rate" cases, we're looking at the X1-X8 case above.  In this case, using the magnitude of the delay increase rate (slope) times the estimated time from X1 to X4, we can estimate the amount of delay "overhang" that has built up and that we need to drain.  In the hypothetical above (1Mbps to 500Kbps), and with an RTT of 100ms (longish but not that unusual), an X1-X4 time of 100ms (X1->X2) + 70ms (X2->X3, assuming little congestion on the reverse path) + 50ms (X3->X4, assuming 10fps and a random point in the timeline) = 220ms before the sender can react.  However, the bottleneck isn't at the sender side, it's at the receiver, so you need to add in the 1/2 RTT for the lowered rate to hit the congested access link at the receiver: +50ms = 270ms (realize these are all rough estimates of a specific case).

270ms at 2x the link rate will mean ~270ms of added delay (a lot!), and that means 0.27*500Kbps ≈ 135Kb of buffered data in the link.  To drain that "quickly" requires a major reduction in bitrate.  Options include cutting the send rate to ~1/2 the link rate, i.e. sending at 250Kbps.  Reducing the added delay back to 0 would then take ~135/250, or ~0.54 seconds, to drain.  The sender could estimate this from the above info, and start raising the bitrate (say to 80 or 90% of the expected new bitrate, or just ramping up) after a bit more than the estimated time-to-drain, anticipating the result.  There will be pluses and minuses to this behavior depending on the quality of the estimate (some of which could be factored in) and other traffic on the access link (which might be estimated from the pre-change equilibrium if we know the link rate, which we may not).
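
(The same arithmetic as a small sketch, with the numbers from the example above; the 0.8 factor is just the "80 or 90%" suggestion:)

    # 1 Mbps -> 500 kbps down-switch, ~270 ms before the lower rate reaches the
    # bottleneck, undershoot to 250 kbps to drain the backlog.
    send_before_kbps = 1000
    link_after_kbps = 500
    reaction_time_s = 0.27

    backlog_kbit = (send_before_kbps - link_after_kbps) * reaction_time_s  # ~135 kbit queued
    undershoot_kbps = 250
    drain_time_s = backlog_kbit / (link_after_kbps - undershoot_kbps)      # ~0.54 s to drain

    # After a bit more than drain_time_s, start ramping toward e.g.
    # 0.8 * link_after_kbps = 400 kbps while waiting for confirmation.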

Also, follow-up reports after the initial one can let the sender refine its estimate of the amount of buffering that will occur, and change either the send rate or the time-to-start-increasing, or both.

The "safe" solution is to not increase until you get info from the receiver that the delay has stabilized for "long enough", but that might add considerable time.

One other solution (given the non-continuous nature of video) is for video streams to skip a frame or two.  Assuming 500Kbps, minus overhead and audio (say 50Kbps total), at 10fps that's 45Kb per frame.  So 135Kb ≈ 3 frames; a single frame skip will dramatically cut the recovery time; two, plus a small over-correction, may drain it out very fast.  Note also that the jitter buffer on the receiver side will have adapted, so rate reductions will take time to be seen in an A/V use case (data will start coming in faster than realtime, and the adaptive jitter buffer will need to run audio faster to compensate, and it can only do that so fast - which may depend on who's talking, etc.).  This suggests that, in many cases, one should not try to handle it *all* (or even mostly) with frame skips (though if the sender and receiver are matched, the sender may be able to anticipate the receiver's reaction, how it handles who's talking, and its jitter buffer reduction rate).
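
(And the frame-skip arithmetic as a sketch, same numbers as above:)

    # How many skipped video frames it takes to absorb the queued backlog.
    video_kbps = 500 - 50                  # 500 kbps target minus ~50 kbps audio/overhead
    fps = 10
    kbit_per_frame = video_kbps / fps      # = 45 kbit per frame
    backlog_kbit = 135
    frames_to_skip = backlog_kbit / kbit_per_frame   # = 3 frames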

Knowing the actual link rate (e.g. via TMMBR) may let the sender more accurately model some of the estimates in the X1-X8 case, and reduce X1-X2 to a small number.

(The comments of a frustrated protocol designer who doesn't have anywhere near the time to work on protocols.... :-/ )



--

Randell Jesup -- rjesup a t mozilla d o t com