Re: [rmcat] Brief review of draft-alvestrand-rmcat-congestion-03

Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch> Mon, 13 July 2015 20:19 UTC

To: Stefan Holmer <holmer@google.com>
Cc: "rmcat@ietf.org" <rmcat@ietf.org>, Saverio Mascolo <mascolo@poliba.it>, Luca De Cicco <l.decicco@poliba.it>, gaetano.carlucci@poliba.it

Hi Stefan,

See some comments/answers below. (I don't know why my mail program was no longer able to distinguish my original mail from your reply, but I hope it's still readable.)

Mirja

> 
> In general, now after just reading NADA and Scream, I found it slightly hard to
> read your draft because you often use a different terminology and different
> variable names. Especially for the arrival model I wonder if you make things
> more complicated than they are. If I get you right, you effectively also just
> want to estimate the queuing delay, which in your case is denoted w(i).
> While the other approaches simply compare the sender and receiver timestamps
> (which requires synchronized clocks or at least an additional mechanism to
> compensate for unsynchronized clocks), you look at the inter-arrival times at
> the sender and the receiver and basically estimate the difference between the
> sending rate and receiving rate and therefore... the queue... What I don't
> really understand is why you need to look at groups of packets?
> 
> Right, we are looking at queuing delay as well. The main difference is that in GCC we are looking at queuing delay variations, while NADA and Scream are both looking at estimated queuing delay. Our signal is therefore both positive and negative, and can be expected to be zero mean if the queue is "stable".

Just FYI, Scream also looks to some extent at what it there calls the trend of the queuing delay, see Section 4.1.2.1, which can also be positive or negative. It would really be nice to better understand the similarities and differences between the proposals.

> 
> To some degree the grouping of packets comes from the fact that the algorithm was initially designed to use RTP timestamps and look at inter-departure and inter-arrival times of video frames. We generalized this by instead looking at packets which were sent at roughly the same time (within a 5 ms window).
> 
> In the same way we also group packets by arrival time under the assumption that packets received close together in time don't add much more information about the delay variation. This is something we decided to do to better handle burstiness that we have been observing on wireless networks, where packets can be held up on the network for hundreds of milliseconds and then be delivered back to back. The filter typically won't handle those big spikes if we process one packet at a time, but if we wait a bit and process a group of packets, we will see that we recover from the spike immediately and that it wasn't caused by us sending too much.
> 
> I hope this makes it more clear. Do you think it would be good to discuss this in more detail in the draft?

Yes, some more discussion would definitely be useful here. Especially making it clearer that this is an optimization for wireless networks would help. Maybe, at least for presentation purposes, you could also try to describe a basic algorithm which is as simple as possible while still working well, and e.g. describe this part as an (optional) add-on… not sure that makes sense…
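
Just to check that I understand the grouping correctly, this is roughly how I would sketch the 5 ms send-time grouping (Python; the Packet structure and all names are mine, not from the draft):

    from dataclasses import dataclass

    BURST_TIME = 0.005  # the 5 ms send-time window mentioned above

    @dataclass
    class Packet:
        send_time: float     # seconds
        arrival_time: float  # seconds

    def group_by_send_time(packets):
        # Collect packets sent within 5 ms of the first packet of the
        # current group into one group.
        groups, current = [], []
        for pkt in sorted(packets, key=lambda p: p.send_time):
            if current and pkt.send_time - current[0].send_time > BURST_TIME:
                groups.append(current)
                current = []
            current.append(pkt)
        if current:
            groups.append(current)
        return groups

Is that roughly what the implementation does? If the arrival-time grouping for the wireless bursts sits on top of this, describing both steps explicitly in the draft would help.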

>  
> 
> Furthermore you are using a Kalman filter. That also seems complicated. Other
> schemes use much simpler filters (like the min of the last few values or a mean)
> and achieve acceptable estimates as well. I can only say this again: simplicity
> is also an important goal when designing a successful mechanism.
> 
> I agree that the model we are currently proposing can be simplified. Back when this algorithm was based on video frames, there was a need to reduce the effect of large frames causing delay spikes simply because they were large. Now that we are typically using pacing to limit this effect, there isn't much to be gained from this complicated filter.
> 
> However, I do believe it can be useful to filter more aggressively if the delay varies a lot (a lot of jitter), and less if it's a clean signal. For instance by choosing alpha in avg_delay_variation = alpha*packet_delay + (1-alpha)*avg_delay_variation.
> 
> I haven't tried a min filter, but off the top of my head I can't see why it would have any obvious benefits?

The only benefit would be simplicity. Maybe take the same approach here as just proposed above. Just say in the "base algorithm" that there needs to be a filter, and then discuss in a separate section that you propose a Kalman filter. Just a thought...
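
And to make the "base algorithm" idea concrete: the simple filter variant you describe above would just be this (my sketch of your avg_delay_variation formula; alpha could then be adapted to the measured jitter as you suggest):

    def ewma_update(avg_delay_variation, packet_delay, alpha):
        # One step of the exponentially weighted moving average from
        # Stefan's formula above; a smaller alpha smooths more aggressively.
        return alpha * packet_delay + (1 - alpha) * avg_delay_variation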

> 
> Anyway, I agree it makes sense to look at how to simplify this.
>  
> 
> As just mentioned, I'd really like to see more meaningful variable names.
> That really helps the reader. Examples are:
> - gamma_1(i) -> effectively this is your delay target, often called target_delay;
> 
> I guess it could be seen as a target, but in that case a delay variation target. However, we are comparing to both positive gamma_1 and negative gamma_1, which makes it a bit strange to see it as a target, in my opinion. Our goal is really to be inside the range [-gamma_1, gamma_1]. Maybe calling it threshold instead?

Yes, maybe call it delay_var_thres …?
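
With that name, the detector comparison would then read e.g. like this (again just my sketch):

    def detect(filtered_delay_variation, delay_var_thres):
        # Check whether the filtered delay variation estimate lies inside
        # the range [-delay_var_thres, delay_var_thres] discussed above.
        if filtered_delay_variation > delay_var_thres:
            return "overuse"
        if filtered_delay_variation < -delay_var_thres:
            return "underuse"
        return "normal"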

>  
> I don't think you need to denote this as a function by writing (i) because it's
> simply a variable in your implementation, right?
> 
> That's true. If that's the way drafts are typically written I wouldn't mind changing style. :)
>  
> - A_hat -> this reflects the target sending rate, for which the other rmcat proposals
> usually maintain a congestion window, called cwnd. Btw. how do you maintain
> the send-out process?
> 
> Up to this point we haven't specified the send-out process, but we also suggest using a pacer, similarly to other drafts. It wasn't entirely clear to me that this was supposed to be part of the proposals, but we can work on getting that specified as well in a future revision.
> 
> Our algorithm computes a target rate for the encoder, and the pacer also has the target rate as input.

Actually, the pacing is specified e.g. in Scream's appendix. However, the interface to the video codec should probably be part of the draft (for now). The other schemes basically have a different rate as input for the encoder than the actual sending rate. If that's the same in your case, I guess that's okay as well.

>  
> Is this ack-clocked/report-clocked? In that case it would
> make sense to maintain a window in bytes. Or is it timer-based, where you
> calculate the next send-out time based on the rate? Anyway, I'd rather like to
> see a longer, more meaningful name like target_rate or output_rate (not sure you
> really need the _hat because effectively all your variables are _hat).
> 
> Sure, target_rate sounds like a good name to me.
>  
> - R_hat(i) -> for window based schemes this information is kept as flight_size
> (number of bytes in flight)
> 
> That is the same thing only if your window-based scheme is ack-clocked, right? If not, you may have increasingly more packets in flight without knowing if packets are coming in at the other end. Anyway, I think we can call this throughput or something?

No, the only difference is that flight_size is a value in bytes while you are looking at a rate. The difference between packets in flight and packets actually received at the receiver is e.g. called pipe_ack in draft-ietf-tcpm-newcwv.

>  
> 
> There is one sentence that really confuses me (detailed comment). In 4.4 you say
> "On every update the delay-based estimate of the available bandwidth
>     is increased, either multiplicatively or additively, depending on its
>     current state."
> Here you mean A_hat(i), right? Because I would call this the target rate, while
> R_hat(i) would be a delay-based estimate...?
> 
> Yes, A_hat(i) is what we refer to. This will be a lot more clear if we do the renaming we have been talking about above.
>  
> 
> A few more detailed questions regarding the calculation of R_hat(i). Is that
> done on the receiver side?
> 
> Depends on where the delay-based controller is implemented. If you have per packet acks it can easily be implemented at the sender, which is what we suggest. If you are using REMB instead it's better implemented at the receiver. In the final version we will probably only have one recommended way of implementing this.
>  
> Why are you using a window between 0.5 and 1 sec?
> Shouldn't that depend on the RTT and at best be one RTT?
> 
> Maybe. That would allow us to quickly decrease to the right bitrate if we detect congestion. However, if there's a lot of jitter unrelated to congestion, we may back down too far. Taking the average over a larger interval may help avoid reducing the target_rate too much, at the expense of sometimes not reducing enough. Do you agree?

Okay. Not sure I agree, but that's maybe something to test. In any case this should not be a fixed value but something like 3*RTT.
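
To be concrete, what I have in mind is something like this (my sketch, assuming per-packet arrival timestamps and sizes are available; the 3*RTT window is my proposal, not what the draft says):

    def received_rate(arrivals, now, rtt):
        # Estimate R_hat as the bits received over the last 3*RTT seconds;
        # arrivals is a list of (arrival_time_s, size_bytes) tuples.
        window = 3 * rtt  # instead of the fixed 0.5-1 s interval
        recent_bytes = sum(size for t, size in arrivals if now - t <= window)
        return 8 * recent_bytes / window  # bits per second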

>  
> Can you maybe give
> pseudo code for basically the second paragraph on page 11?
> 
> Yes, that seems like a good thing to provide.
>  
> 
> In general I think it would be helpful to not only provide mathematical formulas
> but also some pseudo code that is closer to the implementation (in a couple of
> cases).
> 
> Next and final point is on numbers; there are a couple of numbers where it is not
> clear where they came from and why they have been chosen. I guess most of these
> numbers should be parameters that could be tested in different scenarios. These are:
> - update interval for R_hat(i): 0.5-1s (see above)
> - 8% increase factor -> really no idea why this has been chosen; actually seems
> rather low to me...
> 
> It has been fairly arbitrarily chosen, mostly to avoid being too aggressive especially since we can get into the multiplicative increase mode at any point in time (not only during initial ramp-up). We may want to consider making this larger as we start to converge on the other pieces of the algorithm.

Probably also worth some experimentation.
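
As I read the draft, the multiplicative increase is simply the following (sketch; whether the 8% applies per update or per second is something the draft could state explicitly):

    def multiplicative_increase(A_hat, eta=1.08):
        # Grow the target rate by the 8% factor discussed above;
        # eta is the parameter that could be experimented with.
        return eta * A_hat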

>  
> - 100ms add-on to RTT for the response time -> I really don't understand why you
> don't simply take the RTT; however, this should at least not be a fixed value
> from my point of view (but a factor...?)
> 
> I see it like this: the time it takes for the detector to find out whether a rate increase caused congestion is the sum of the time it takes for the sender to send at the new rate, the time it takes those packets to start arriving at the receiver, the time it takes for the receiver to send feedback back to the sender, and the time it takes for the sender to filter the delay signal and detect the congestion.
> 
> In the current draft we split this into two components: RTT for the time it takes for packets to go to the receiver and back, and 100 ms for the time it takes for the detector to filter/detect a change in delay variations and for the codec to actually provide the requested bitrate.
> 
> Does that make sense?

Ah okay. That makes more sense.

>  
> - maximum increase rate of 0.5 packets per response time (TCP Reno should
> actually increase by 1 packet/RTT, which is now fixed in the Linux kernel, I think)
> 
> Right. We wanted to be a bit more cautious as we don't want to cause too much jitter. This can also be tuned as we start to converge on other parts, but how aggressive we can be here depends a bit on how the filter and detection are tuned, and so on.

Maybe worth trying also a maximum increase rate of 1 packet per RTT.
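
Putting your explanation and the 0.5 packets per response time together, my understanding of the additive increase is roughly this (sketch; names are mine, and the 1 packet/RTT variant would just replace the 0.5 and drop the 100 ms term):

    def additive_increase(A_hat, avg_packet_size_bits, rtt, dt):
        # Add at most half a packet's worth of bits per response time,
        # where response_time = RTT + 100 ms as Stefan explains above;
        # dt is the time since the last rate update (all times in seconds).
        response_time = rtt + 0.1
        return A_hat + 0.5 * avg_packet_size_bits * min(1.0, dt / response_time)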

>  
> - upper limit of 1.5*R_hat(i) for A_hat(i)...?
> 
> We have to allow our target_bitrate to be larger than what the encoder is producing (or in this case what the current throughput is). The encoder may produce too little, and this shouldn't have a substantial impact on the performance. That's why we have this relaxation. At the same time, we don't want the target_bitrate to become too much larger than what we are actually sending, as we don't know if it would be possible to send that much safely.

I understand why you have a limit; the question was only why the limit is 1.5*R. But I guess that is the one value/parameter that makes most sense to me and doesn't need too much further explanation.
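
For reference, the relaxation is then just (sketch):

    def cap_target_rate(A_hat, R_hat):
        # Never let the target rate exceed 1.5 times the measured
        # incoming rate, as discussed above.
        return min(A_hat, 1.5 * R_hat)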

>  
> - alpha [0.8, 0.95] where 0.86 is recommended...??? Reno uses 0.5, Cubic should
> use 0.7. Where does this recommendation come from?
> 
> It has been picked in a similar way as the additive increase and multiplicative increase parameters. Basically manually tuned to get the behavior we wanted. 

Okay. So it probably also depends strongly on the other parameters, e.g. if you use a higher maximum increase rate, you might also use a stronger decrease… maybe put slightly more explanation in the draft.

>  
> Btw. the decrease factor is
> usually called beta while the increase factor/rate is called alpha in AIMD/MIMD
> systems.
> 
> Good point.
>  
> - And the number that worries me most is the loss level threshold for the
> loss-based controller... 2% and 10% ???? That's a lot! TFRC is not comparable in
> this case because you are only looking at the loss rate since the last report
> while TFRC looks at an average over a slightly longer time scale. With
> loss-based congestion control you have this probing behavior where there are no
> losses for a couple of RTTs (depending on the size of the congestion window/link
> speed) and then there are a couple of losses within one window. This should give
> you a relatively low average loss rate (much lower than 10% and usually even
> below 1%), and the congestion control should reduce its sending rate immediately
> as soon as one loss is detected! If you now induce a loss rate of 2-10%, this
> should starve all other competing flows! I'd say, for stability reasons you need
> to decrease as soon as a loss is detected. As you already apply this
> multiplicative decrease rule (A_hat(i) = alpha * R_hat(i)) for delay, I really
> don't see why you are not using the same rule for loss...?
> 
> I agree about all of this, and that is why we mention this under Further work. We want to combine the control of both delay and loss in a better way, and make it a lot less tolerant to packet loss.

Good!
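
For reference, this is my reading of the current loss-based rules, with the thresholds I am worried about (sketch; p is the fraction of packets lost since the last report, and I may be off on small constants):

    def loss_based_control(A_hat, p):
        # Decrease only above 10% loss, increase below 2%, hold in between;
        # this is what makes the controller so tolerant to loss.
        if p > 0.10:
            return A_hat * (1 - 0.5 * p)
        if p < 0.02:
            return 1.05 * A_hat
        return A_hat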

>  
> I've quickly looked
> at your evaluation results and see that GCC is able to co-exist with TCP.
> However, I'd assume that in this case still only the delay-based controller
> controls the rate and the loss-based controller doesn't kick in at all. Can you
> confirm/check this?
> 
> I can confirm this, which is the main reason for reworking the loss-based controller. Actually if we tune the K_u and K_d parameters a bit differently, we can make the algorithm starve TCP flows, which of course shouldn't be possible.
>  
> 
> Now looking at the results, I'd like to request three more things (Thanks!):
> 
> No problem, we'll get back to you with these plots. At the moment we're working on getting all the evaluation test cases committed to our code base. Once that is done, these should be quick to produce.
>  
> - Can you run one more simulation with more than one TCP flow?
> - What happens if the TCP flow stops after a while? Does the target delay value gamma
> actually go down again? Can you plot gamma as well in this scenario?
> 
> It does, we'll provide some plots.
>  
> - And can you show the plots on slide 33 (multiple GCC flows) for more than 120s
> because the flows seem to de-converge again.
> 
> Right, that is not the case, but I can see how you may get that impression. :)

Great! See you at the interim!


>  
> 
> Thanks for all the work though!
> 
> Mirja