Re: [rmcat] Brief review of draft-alvestrand-rmcat-congestion-03
Stefan Holmer <holmer@google.com> Mon, 13 July 2015 14:28 UTC
References: <559FEBCC.3060604@tik.ee.ethz.ch>
In-Reply-To: <559FEBCC.3060604@tik.ee.ethz.ch>
From: Stefan Holmer <holmer@google.com>
Date: Mon, 13 Jul 2015 14:27:46 +0000
Message-ID: <CAEdus3+49dP4F_fzDp089w3+tsm0RJ_MM1n_r0L0Oa2ezBTycg@mail.gmail.com>
To: Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch>, gaetano.carlucci@poliba.it, Luca De Cicco <l.decicco@poliba.it>, Saverio Mascolo <mascolo@poliba.it>
Archived-At: <http://mailarchive.ietf.org/arch/msg/rmcat/0J3L4_VIoGUiDwTki_I_7Jjk1LA>
Cc: "rmcat@ietf.org" <rmcat@ietf.org>
Hi Mirja,

Thanks a lot for this detailed review. I have added some comments inline, but we will definitely take this input for the next revision.

/Stefan

On Fri, Jul 10, 2015 at 5:59 PM Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch> wrote:

> Hi Stefan, hi all,
>
> I've had a quick review of draft-alvestrand-rmcat-congestion-03 as an
> individual contributor (not as chair). I have a couple of high-level
> comments and a few more detailed ones.
>
> In general, now after just reading NADA and SCReAM, I found it slightly
> hard to read your draft because you often use a different terminology and
> different variable names. Especially for the arrival model I wonder if
> you make things more complicated than they are. If I understand you
> correctly, you effectively also just want to estimate the queuing delay,
> which in your case is denoted w(i). While the other approaches simply
> compare the sender and receiver timestamps (which requires synchronized
> clocks, or at least additional mechanisms to compensate for
> non-synchronized clocks), you look at the inter-arrival times at the
> sender and the receiver and basically estimate the difference between the
> sending rate and receiving rate, and therefore... the queue... What I
> don't really understand is why you need to look at groups of packets?

Right, we are looking at queuing delay as well. The main difference is that in GCC we are looking at queuing delay variations, while NADA and SCReAM are both looking at estimated queuing delay. Our signal is therefore both positive and negative, and can be expected to be zero mean if the queue is "stable".

To some degree the grouping of packets comes from the fact that the algorithm was initially designed to use RTP timestamps and look at inter-departure and inter-arrival times of video frames. We generalized this by instead looking at packets which were sent at roughly the same time (within a 5 ms window).
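[Editor's note: to make the send-time grouping concrete, here is a minimal Python sketch of collecting packets into groups by send time, assuming the 5 ms window mentioned above. The `Packet` structure and all names are hypothetical illustrations, not taken from the draft.]

```python
from dataclasses import dataclass

BURST_WINDOW_MS = 5.0  # assumed: packets sent within 5 ms form one group


@dataclass
class Packet:
    send_time_ms: float
    arrival_time_ms: float


def group_by_send_time(packets):
    """Group packets whose send times fall within BURST_WINDOW_MS of the
    first packet in the current group (illustrative sketch only)."""
    groups = []
    current = []
    for pkt in sorted(packets, key=lambda p: p.send_time_ms):
        if not current or pkt.send_time_ms - current[0].send_time_ms <= BURST_WINDOW_MS:
            current.append(pkt)
        else:
            groups.append(current)
            current = [pkt]
    if current:
        groups.append(current)
    return groups
```

The same idea could then be applied a second time on the arrival timestamps to merge packets delivered back to back after a spike, as described below.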
In the same way we also group packets by arrival time, under the assumption that packets received close together in time don't add much more information about the delay variation. This is something we decided to do to better handle the burstiness we have been observing on wireless networks, where packets can be held up in the network for hundreds of milliseconds and then be delivered back to back. The filter typically won't handle those big spikes if we process one packet at a time, but if we wait a bit and process a group of packets, we will see that we recover from the spike immediately and that it wasn't caused by us sending too much.

I hope this makes it more clear. Do you think it would be good to discuss this in more detail in the draft?

> Further, you are using a Kalman filter. That seems also complicated.
> Other schemes use much simpler filters (like the min of the last few
> values, or a mean) and achieve acceptable estimates as well. I can only
> say this again: simplicity is also an important goal when designing a
> successful mechanism.

I agree that the model we are currently proposing can be simplified. Back when this algorithm was based on video frames, there was a need to reduce the effect of large frames causing delay spikes simply because they were large. Now that we are typically using pacing to limit this effect, there isn't much to be gained from this complicated filter. However, I do believe it can be useful to filter more aggressively if the delay varies a lot (a lot of jitter), and less if it's a clean signal. For instance by choosing alpha in avg_delay_variation = alpha*packet_delay + (1-alpha)*avg_delay_variation. I haven't tried a min filter, but off the top of my head I can't see why it would have any obvious benefits. Anyway, I agree it makes sense to look at how to simplify this.

> As just mentioned, I'd really like to see more meaningful variable names.
> That really helps the reader.
> Examples are:
>
> - gamma_1(i) -> effectively this is your delay target, often called
> target_delay;

I guess it could be seen as a target, but in that case a delay variation target. However, we compare against both positive gamma_1 and negative gamma_1, which makes it a bit strange to see it as a target, in my opinion. Our goal is really to be inside the range [-gamma_1, gamma_1]. Maybe calling it a threshold instead?

> I don't think you need to denote this as a function by writing (i),
> because it's simply a variable in your implementation, right?

That's true. If that's the way drafts are typically written I wouldn't mind changing style. :)

> - A_hat -> this reflects the target sending rate, for which the other
> rmcat proposals usually maintain a congestion window, called cwnd. Btw.,
> how do you maintain the send-out process?

Up to this point we haven't specified the send-out process, but we also suggest using a pacer, similarly to other drafts. It wasn't entirely clear to me that this was supposed to be part of the proposals, but we can work on getting that specified as well in a future revision. Our algorithm computes a target rate for the encoder, and the pacer also has the target rate as input.

> Is this ack-clocked/report-clocked? In that case it would make sense to
> maintain a window in bytes. Or is it timer-based, where you calculate the
> next send-out time based on the rate? Anyway, I'd rather like to see a
> longer, more meaningful name like target_rate or output_rate (not sure
> you really need the _hat because effectively all your variables are
> _hat).

Sure, target_rate sounds like a good name to me.

> - R_hat(i) -> for window-based schemes this information is kept as
> flight_size (number of bytes in flight)

That is the same thing only if your window-based scheme is ack-clocked, right? If not, you may have increasingly more packets in flight without knowing if packets are coming in at the other end.
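[Editor's note: as an illustration of the [-gamma_1, gamma_1] range discussed above, here is a rough Python sketch of a detector that smooths the per-packet delay variation with the exponential filter mentioned earlier and signals over-use or under-use when the smoothed value leaves the range. All names, the fixed alpha, and the default threshold value are assumptions for illustration, not the draft's specification.]

```python
class DelayVariationDetector:
    """Illustrative sketch: compares a smoothed delay-variation signal
    against a symmetric threshold [-gamma_1, gamma_1]."""

    def __init__(self, gamma_1_ms=12.5, alpha=0.1):
        self.gamma_1_ms = gamma_1_ms  # threshold; value assumed here
        self.alpha = alpha            # smoothing factor; fixed here, could adapt to jitter
        self.avg_delay_variation = 0.0

    def update(self, packet_delay_variation_ms):
        # avg = alpha * new_sample + (1 - alpha) * avg, as in the formula above
        self.avg_delay_variation = (
            self.alpha * packet_delay_variation_ms
            + (1 - self.alpha) * self.avg_delay_variation
        )
        if self.avg_delay_variation > self.gamma_1_ms:
            return "overuse"    # queue appears to be building up
        if self.avg_delay_variation < -self.gamma_1_ms:
            return "underuse"   # queue appears to be draining
        return "normal"
```

Note that the signal is zero mean when the queue is stable, so the detector stays in "normal" unless the variation is persistently positive or negative.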
Anyway, I think we can call this throughput or something?

> There is one sentence that really confuses me (detailed comment). In 4.4
> you say "On every update the delay-based estimate of the available
> bandwidth is increased, either multiplicatively or additively, depending
> on its current state." Here you mean A_hat(i), right? Because I would
> call this the target rate, while R_hat(i) would be a delay-based
> estimate...?

Yes, A_hat(i) is what we refer to. This will be a lot more clear if we do the renaming we have been talking about above.

> A few more detailed questions regarding the calculation of R_hat(i). Is
> that done at the receiver side?

Depends on where the delay-based controller is implemented. If you have per-packet acks it can easily be implemented at the sender, which is what we suggest. If you are using REMB instead, it's better implemented at the receiver. In the final version we will probably only have one recommended way of implementing this.

> Why are you using a window between 0.5 and 1 sec? Shouldn't that depend
> on the RTT and at best be one RTT?

Maybe. That would allow us to quickly decrease to the right bitrate if we detect congestion. However, if there's a lot of jitter unrelated to congestion, we may back down too far. Taking the average over a larger interval may help avoid reducing the target_rate too much, at the expense of sometimes not reducing enough. Do you agree?

> Can you maybe give pseudo code for basically the second paragraph on page
> 11?

Yes, that seems like a good thing to provide.

> In general I think it would be helpful to not only provide mathematical
> formulas but also some pseudo code that is closer to the implementation
> (in a couple of cases).
>
> Next and final point is on numbers: there are a couple of numbers where
> it is not clear where they came from and why they have been chosen. I
> guess most of these numbers should be parameters that could be tested in
> different scenarios.
> These are:
>
> - update interval for R_hat(i): 0.5-1s (see above)
>
> - 8% increase factor -> really no idea why this has been chosen; actually
> seems rather low to me...

It has been fairly arbitrarily chosen, mostly to avoid being too aggressive, especially since we can get into the multiplicative increase mode at any point in time (not only during initial ramp-up). We may want to consider making this larger as we start to converge on the other pieces of the algorithm.

> - 100 ms add-on to RTT for the response time -> I really don't understand
> why you don't simply take the RTT; however, this should at least not be a
> fixed value from my point of view (but a factor...?)

I see it like this: the time it takes for the detector to find out if a rate increase caused congestion is the sum of the time it takes for the sender to send at the new rate, the time it takes those packets to start arriving at the receiver, the time it takes for the receiver to send feedback back to the sender, and the time it takes for the sender to filter the delay signal and detect the congestion. In the current draft we split this into two components: RTT for the time it takes for packets to go to the receiver and back, and 100 ms for the time it takes for the detector to filter/detect a change in delay variations and for the codec to actually provide the requested bitrate. Does that make sense?

> - maximum increase rate of 0.5 packets per response time (TCP Reno should
> actually increase by 1 packet/RTT, which is now fixed in the Linux
> kernel, I think)

Right. We wanted to be a bit more cautious, as we don't want to cause too much jitter. This can also be tuned as we start to converge on other parts, but how aggressive we can be here depends a bit on how the filter and detection are tuned, and so on.

> - upper limit of 1.5*R_hat(i) for A_hat(i)...?

We have to allow our target_bitrate to be larger than what the encoder is producing (or in this case what the current throughput is).
The encoder may produce too little, and this shouldn't have a substantial impact on the performance. That's why we have this relaxation. At the same time, we don't want the target_bitrate to become too much larger than what we are actually sending, as we don't know if it would be possible to send that much safely.

> - alpha in [0.8, 0.95], where 0.86 is recommended...??? Reno uses 0.5,
> Cubic should use 0.7. Where does this recommendation come from?

It has been picked in a similar way as the additive increase and multiplicative increase parameters. Basically manually tuned to get the behavior we wanted.

> Btw., the decrease factor is usually called beta, while the increase
> factor/rate is called alpha in AIMD/MIMD systems.

Good point.

> - And the number that worries me most is the loss level threshold for the
> loss-based controller... 2% and 10%???? That's a lot! TFRC is not
> comparable in this case, because you are only looking at the loss rate
> since the last report, while TFRC looks at an average over a slightly
> longer time scale. With loss-based congestion control you have this
> probing behavior where there are no losses for a couple of RTTs
> (depending on the size of the congestion window/link speed) and then
> there are a couple of losses within one window. This should give you a
> relatively low average loss rate (much lower than 10%, and usually even
> below 1%), and the congestion control should reduce its sending rate
> immediately as soon as one loss is detected! If you now induce a loss
> rate of 2-10%, this should starve all other competing flows! I'd say, for
> stability reasons you need to decrease as soon as loss is detected. As
> you already apply this multiplicative decrease rule (A_hat(i) = alpha *
> R_hat(i)) for delay, I really don't see why you are not using the same
> rule for loss...?

I agree with all of this, and that is why we mention this under Further work.
We want to combine the control of both delay and loss in a better way, and make it a lot less tolerant to packet loss.

> I've quickly looked at your evaluation results and see that GCC is able
> to co-exist with TCP. However, I'd assume that in this case still only
> the delay-based controller controls the rate and the loss-based
> controller doesn't kick in at all. Can you confirm/check this?

I can confirm this, which is the main reason for reworking the loss-based controller. Actually, if we tune the K_u and K_d parameters a bit differently, we can make the algorithm starve TCP flows, which of course shouldn't be possible.

> Now looking at the results, I'd like to request three more things
> (Thanks!):

No problem, we'll get back to you with these plots. At the moment we're working on getting all the evaluation test cases committed to our code base. Once that is done, this should be quickly done.

> - Can you run one more simulation with more than one TCP flow?
>
> - What happens if the TCP flow stops after a while? Does the target delay
> value gamma actually go down again? Can you plot gamma as well in this
> scenario?

It does, we'll provide some plots.

> - And can you show the plots on slide 33 (multiple GCC flows) for more
> than 120s, because the flows seem to de-converge again.

Right, that is not the case, but I can see how you may get that impression. :)

> Thanks for all the work though!
>
> Mirja
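[Editor's note: pulling together the numbers discussed in this thread, here is a hedged Python sketch of how the rate updates could look: a roughly 8% multiplicative increase, an additive increase of about half a packet per response time (RTT + 100 ms), a multiplicative decrease A_hat = beta * R_hat on over-use with beta assumed 0.85 (within the [0.8, 0.95] range mentioned), a cap of 1.5 * R_hat, and the loss-based 2%/10% rule. This is an illustration assembled from the discussion, not the draft's normative pseudocode; all names and the exact ordering of steps are assumptions.]

```python
def update_target_rate(target_bps, throughput_bps, state, rtt_ms,
                       loss_fraction, packet_size_bits=8000):
    """Illustrative rate update assembled from the numbers discussed in
    this thread; not the draft's normative algorithm."""
    response_time_ms = rtt_ms + 100.0  # RTT plus 100 ms for filter/codec delay

    # Delay-based controller.
    if state == "increase_multiplicative":
        target_bps *= 1.08                    # 8% multiplicative increase
    elif state == "increase_additive":
        # At most ~0.5 packets per response time.
        target_bps += 0.5 * packet_size_bits * (1000.0 / response_time_ms)
    elif state == "decrease":
        target_bps = 0.85 * throughput_bps    # beta assumed 0.85

    # Never let the target run too far ahead of what is actually delivered.
    target_bps = min(target_bps, 1.5 * throughput_bps)

    # Loss-based controller: back off above 10% loss, grow below 2%,
    # hold in between (simplified rendering of the 2%/10% thresholds).
    if loss_fraction > 0.10:
        target_bps *= (1 - 0.5 * loss_fraction)
    elif loss_fraction < 0.02:
        target_bps *= 1.05
    return target_bps
```

The 1.5 * R_hat cap implements the relaxation discussed above: the target may exceed the measured throughput, but not by so much that we have no evidence the path could carry it.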