Re: [rmcat] Brief review of draft-alvestrand-rmcat-congestion-03

Stefan Holmer <holmer@google.com> Mon, 13 July 2015 14:28 UTC

To: Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch>, gaetano.carlucci@poliba.it, Luca De Cicco <l.decicco@poliba.it>, Saverio Mascolo <mascolo@poliba.it>
Cc: "rmcat@ietf.org" <rmcat@ietf.org>

Hi Mirja,

Thanks a lot for this detailed review. I have added some comments inline,
but we will definitely take this input for the next revision.

/Stefan

On Fri, Jul 10, 2015 at 5:59 PM Mirja Kühlewind
<mirja.kuehlewind@tik.ee.ethz.ch> wrote:

> Hi Stefan, hi all,
>
> I've had a quick review of draft-alvestrand-rmcat-congestion-03 as an
> individual contributor (not as chair). I have a couple of high-level
> comments and a few more detailed ones.
>
> In general, having just read NADA and Scream, I found your draft
> slightly hard to read because you often use different terminology and
> different variable names. Especially for the arrival model, I wonder if
> you make things more complicated than they are. If I understand you
> correctly, you effectively also just want to estimate the queuing delay,
> which in your case is denoted w(i). While the other approaches simply
> compare the sender and receiver timestamps (which requires synchronized
> clocks, or at least an additional mechanism to compensate for
> unsynchronized clocks), you look at the inter-arrival times at the
> sender and the receiver and basically estimate the difference between
> the sending rate and the receiving rate, and therefore... the queue...
> What I don't really understand is why you need to look at groups of
> packets?
>

Right, we are looking at queuing delay as well. The main difference is that
in GCC we are looking at queuing delay variations, while NADA and Scream
are both looking at estimated queuing delay. Our signal is therefore both
positive and negative, and can be expected to be zero mean if the queue is
"stable".

To some degree, the grouping of packets comes from the fact that the
algorithm was initially designed to use RTP timestamps and to look at the
inter-departure and inter-arrival times of video frames. We generalized
this by instead looking at packets which were sent at roughly the same
time (within a 5 ms window).
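
As a rough sketch (the helper name and exact window handling are ours,
purely illustrative, not code from the draft), the departure-time
grouping could look like this:

```python
# Hypothetical sketch of the departure-time grouping described above:
# packets whose send times fall within 5 ms of the first packet of the
# current group are treated as one group.

BURST_WINDOW_MS = 5.0

def group_by_departure(packets):
    """packets: list of (send_time_ms, arrival_time_ms), sorted by send time."""
    groups, current = [], []
    for pkt in packets:
        # Start a new group once we move past the 5 ms window opened by
        # the group's first packet.
        if current and pkt[0] - current[0][0] > BURST_WINDOW_MS:
            groups.append(current)
            current = []
        current.append(pkt)
    if current:
        groups.append(current)
    return groups
```

With packets sent at 0, 2, and 4 ms followed by packets at 20 and 22 ms,
this yields two groups.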

In the same way, we also group packets by arrival time, under the
assumption that packets received close together in time don't add much
more information about the delay variation. This is something we decided
to do to better handle the burstiness we have been observing on wireless
networks, where packets can be held up in the network for hundreds of
milliseconds and then be delivered back to back. The filter typically
won't handle those big spikes if we process one packet at a time, but if
we wait a bit and process a group of packets, we will see that we recover
from the spike immediately and that it wasn't caused by us sending too
much.

I hope this makes it clearer. Do you think it would be good to discuss
this in more detail in the draft?


>
> Further, you are using a Kalman filter. That also seems complicated.
> Other schemes use much simpler filters (like the min of the last few
> values, or a mean) and achieve acceptable estimates as well. I can only
> say this again: simplicity is also an important goal when designing a
> successful mechanism.
>

I agree that the model we are currently proposing can be simplified. Back
when this algorithm was based on video frames, there was a need to reduce
the effect of large frames causing delay spikes simply because they were
large. Now that we typically use pacing to limit this effect, there isn't
much to be gained from this complicated filter.

However, I do believe it can be useful to filter more aggressively if the
delay varies a lot (a lot of jitter), and less if it's a clean signal,
for instance by choosing alpha in avg_delay_variation =
alpha*packet_delay + (1-alpha)*avg_delay_variation accordingly.
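
As a minimal sketch of that idea (the helper names and the 5 ms jitter
cutoff are our own example values, not from the draft):

```python
def choose_alpha(jitter_ms, clean=0.5, noisy=0.05):
    # Illustrative: smooth harder (small alpha) when the delay signal is
    # noisy, and react faster (large alpha) when it is clean. The 5 ms
    # cutoff is an arbitrary example value.
    return clean if jitter_ms < 5.0 else noisy

def update_avg(avg_delay_variation, packet_delay, alpha):
    # The exponential filter from the text:
    # avg = alpha*packet_delay + (1-alpha)*avg
    return alpha * packet_delay + (1 - alpha) * avg_delay_variation
```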

I haven't tried a min filter, but off the top of my head I can't see why
it would have any obvious benefits.

Anyway, I agree it makes sense to look at how to simplify this.


>
> As just mentioned, I'd really like to see more meaningful variable
> names. That really helps the reader. Examples are:
> - gamma_1(i) -> effectively this is your delay target, often called
> target_delay;
> target_delay;
>

I guess it could be seen as a target, but in that case a delay-variation
target. However, we compare against both positive gamma_1 and negative
gamma_1, which makes it a bit strange to see it as a target, in my
opinion. Our goal is really to stay inside the range [-gamma_1, gamma_1].
Maybe we should call it a threshold instead?
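
In other words, something along these lines (a sketch; the function and
state names are ours, not the draft's):

```python
def detect(filtered_delay_variation, gamma_1):
    # The goal stated above: stay inside [-gamma_1, +gamma_1]. Crossing
    # the threshold on either side signals over-use or under-use.
    if filtered_delay_variation > gamma_1:
        return "overuse"
    if filtered_delay_variation < -gamma_1:
        return "underuse"
    return "normal"
```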


> I don't think you need to denote this as a function by writing (i),
> because it's simply a variable in your implementation, right?
>

That's true. If that's the way drafts are typically written I wouldn't mind
changing style. :)


> - A_hat -> this reflects the target sending rate, for which the other
> rmcat proposals usually maintain a congestion window, called cwnd. Btw.,
> how do you maintain the send-out process?


Up to this point we haven't specified the send-out process, but we also
suggest using a pacer, similar to the other drafts. It wasn't entirely
clear to me that this was supposed to be part of the proposals, but we
can work on getting that specified as well in a future revision.

Our algorithm computes a target rate for the encoder, and the pacer also
takes the target rate as input.


> Is this ack-clocked/report-clocked? In that case it would make sense to
> maintain a window in bytes. Or is it timer-based, where you calculate
> the next send-out time based on the rate? Anyway, I'd rather see a
> longer, more meaningful name like target_rate or output_rate (not sure
> you really need the _hat, because effectively all your variables are
> _hat).
>

Sure, target_rate sounds like a good name to me.


> - R_hat(i) -> for window-based schemes this information is kept as
> flight_size (the number of bytes in flight)
>

That is the same thing only if your window-based scheme is ack-clocked,
right? If not, you may have increasingly more packets in flight without
knowing whether packets are arriving at the other end. Anyway, I think we
can call this throughput or something similar?


>
> There is one sentence that really confuses me (detailed comment). In
> 4.4 you say
> "On every update the delay-based estimate of the available bandwidth
>     is increased, either multiplicatively or additively, depending on its
>     current state."
> Here you mean A_hat(i), right? Because I would call this the target
> rate, while R_hat(i) would be a delay-based estimate...?
>

Yes, A_hat(i) is what we are referring to. This will be much clearer once
we do the renaming we have been talking about above.


>
> A few more detailed questions regarding the calculation of R_hat(i). Is
> that done at the receiver side?


Depends on where the delay-based controller is implemented. If you have
per-packet acks, it can easily be implemented at the sender, which is
what we suggest. If you are using REMB instead, it's better implemented
at the receiver. In the final version we will probably only have one
recommended way of implementing this.


> Why are you using a window between 0.5 and 1 sec? Shouldn't that depend
> on the RTT and at best be one RTT?


Maybe. That would allow us to quickly decrease to the right bitrate if we
detect congestion. However, if there's a lot of jitter unrelated to
congestion, we may back off too far. Taking the average over a longer
interval may help avoid reducing the target_rate too much, at the expense
of sometimes not reducing it enough. Do you agree?
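
For illustration, a trailing-window estimate of the incoming rate R_hat
might be computed like this (a sketch under our own naming, not the
draft's definition):

```python
def received_rate_bps(packets, now_ms, window_ms=500.0):
    # packets: list of (arrival_time_ms, size_bytes).
    # Average the bytes received over a trailing window (here 0.5 s) and
    # convert to bits per second.
    recent = [size for t, size in packets if now_ms - t <= window_ms]
    if not recent:
        return 0.0
    return sum(recent) * 8 * 1000.0 / window_ms
```

A longer window smooths out jitter-induced dips in the estimate, at the
cost of reacting more slowly, which is the trade-off described above.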


> Can you maybe give
> pseudo code for basically the second paragraph on page 11?
>

Yes, that seems like a good thing to provide.


>
> In general I think it would be helpful to not only provide mathematical
> formulas but also some pseudo code that is closer to the implementation
> (in a couple of cases).
>
> My next and final point is on numbers: there are a couple of numbers
> where it is not clear where they came from and why they have been
> chosen. I guess most of these numbers should be parameters that could be
> tested in different scenarios. These are:
> - update interval for R_hat(i): 0.5-1s (see above)
> - 8% increase factor -> really no idea why this has been chosen; it
> actually seems rather low to me...
>

It has been chosen fairly arbitrarily, mostly to avoid being too
aggressive, especially since we can enter the multiplicative increase
mode at any point in time (not only during the initial ramp-up). We may
want to consider making it larger as we start to converge on the other
pieces of the algorithm.


> - 100 ms add-on to the RTT for the response time -> I really don't
> understand why you don't simply take the RTT; in any case, this should
> at least not be a fixed value, from my point of view (but a factor...?)
>

I see it like this: the time it takes for the detector to find out
whether a rate increase caused congestion is the sum of the time it takes
for the sender to start sending at the new rate, the time it takes for
those packets to start arriving at the receiver, the time it takes for
the receiver to send feedback back to the sender, and the time it takes
for the sender to filter the delay signal and detect the congestion.

In the current draft we split this into two components: the RTT, for the
time it takes for packets to go to the receiver and back, and 100 ms for
the time it takes for the detector to filter/detect a change in delay
variations and for the codec to actually provide the requested bitrate.

Does that make sense?


> - maximum increase rate of 0.5 packets per response time (TCP Reno
> should actually increase by 1 packet/RTT, which is now fixed in the
> Linux kernel, I think)
>

Right. We wanted to be a bit more cautious, as we don't want to cause too
much jitter. This can also be tuned as we start to converge on other
parts, but how aggressive we can be here depends a bit on how the filter
and the detection are tuned, and so on.
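
Expressed as a rate, half a packet per response time corresponds to the
following (an illustrative helper; the function name is ours):

```python
def additive_increase_bps(packet_size_bits, response_time_ms,
                          packets_per_response=0.5):
    # The cap discussed above: at most half a packet's worth of extra
    # rate per response time, converted to bits per second.
    return packets_per_response * packet_size_bits * 1000.0 / response_time_ms
```

For example, with 1000-byte packets and a 200 ms response time the cap
works out to 20 kbps per response time.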


> - upper limit of 1.5*R_hat(i) for A_hat(i)...?
>

We have to allow our target_bitrate to be larger than what the encoder is
producing (or, in this case, larger than the current throughput). The
encoder may produce too little, and that shouldn't have a substantial
impact on performance; that's why we have this relaxation. At the same
time, we don't want the target_bitrate to become much larger than what we
are actually sending, as we don't know whether it would be possible to
send that much safely.


> - alpha in [0.8, 0.95], where 0.86 is recommended...??? Reno uses 0.5,
> Cubic should use 0.7. Where does this recommendation come from?


It was picked in a similar way to the additive increase and
multiplicative increase parameters: basically manually tuned to get the
behavior we wanted.
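
Putting the pieces from this thread together, the rate update might be
sketched as follows (parameter values follow the discussion above; the
structure and names are ours, not the draft's pseudo code):

```python
def update_target_rate(target_rate, received_rate, state,
                       alpha=0.86, eta=1.08):
    # Multiplicative increase by ~8% per update, multiplicative decrease
    # to alpha * received_rate, and a cap of 1.5 * received_rate so the
    # target never runs far ahead of what is actually getting through.
    if state == "increase":
        target_rate = eta * target_rate
    elif state == "decrease":
        target_rate = alpha * received_rate
    return min(target_rate, 1.5 * received_rate)
```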


> Btw., the decrease factor is usually called beta, while the increase
> factor/rate is called alpha, in AIMD/MIMD systems.
>

Good point.


> - And the number that worries me most is the loss level threshold for
> the loss-based controller... 2% and 10%???? That's a lot! TFRC is not
> comparable in this case, because you are only looking at the loss rate
> since the last report, while TFRC looks at an average over a slightly
> longer time scale. With loss-based congestion control you have this
> probing behavior where there are no losses for a couple of RTTs
> (depending on the size of the congestion window/link speed) and then
> there are a couple of losses within one window. This should give you a
> relatively low average loss rate (much lower than 10%, and usually even
> below 1%), and the congestion control should reduce its sending rate
> immediately as soon as one loss is detected! If you now induce a loss
> rate of 2-10%, this should starve all other competing flows! I'd say,
> for stability reasons, you need to decrease as soon as loss is detected.
> As you already apply this multiplicative decrease rule
> (A_hat(i) = alpha * R_hat(i)) for delay, I really don't see why you are
> not using the same rule for loss...?


I agree with all of this, and that is why we mention it under Further
work. We want to combine the control of both delay and loss in a better
way, and make the algorithm less tolerant to packet loss.


> I've quickly looked at your evaluation results and see that GCC is able
> to co-exist with TCP. However, I'd assume that in this case still only
> the delay-based controller controls the rate and the loss-based
> controller doesn't kick in at all. Can you confirm/check this?
>

I can confirm this, which is the main reason for reworking the loss-based
controller. Actually, if we tune the K_u and K_d parameters a bit
differently, we can make the algorithm starve TCP flows, which of course
shouldn't be possible.


>
> Now looking at the results, I'd like to request three more things
> (Thanks!):
>

No problem, we'll get back to you with these plots. At the moment we're
working on getting all the evaluation test cases committed to our code
base. Once that is done, this should be quick.


> - Can you run one more simulation with more than one TCP flow?
> - What happens if the TCP flow stops after a while? Does the target
> delay value gamma actually go down again? Can you plot gamma as well in
> this scenario?
>

It does, we'll provide some plots.


> - And can you show the plots on slide 33 (multiple GCC flows) for more
> than 120 s, because the flows seem to de-converge again.
>

Right, that is not the case, but I can see how you might get that
impression. :)


>
> Thanks for all the work though!
>
> Mirja