[rmcat] Review of draft-ietf-rmcat-scream-cc-01

Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch> Fri, 10 July 2015 12:06 UTC

Return-Path: <mirja.kuehlewind@tik.ee.ethz.ch>
X-Original-To: rmcat@ietfa.amsl.com
Delivered-To: rmcat@ietfa.amsl.com
Received: from localhost (ietfa.amsl.com [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id CA0681A87BB for <rmcat@ietfa.amsl.com>; Fri, 10 Jul 2015 05:06:18 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -5.91
X-Spam-Level:
X-Spam-Status: No, score=-5.91 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, GB_I_LETTER=-2, MIME_8BIT_HEADER=0.3, RCVD_IN_DNSWL_MED=-2.3, T_RP_MATCHES_RCVD=-0.01] autolearn=ham
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Qru-9Wdoj9LA for <rmcat@ietfa.amsl.com>; Fri, 10 Jul 2015 05:06:15 -0700 (PDT)
Received: from smtp.ee.ethz.ch (smtp.ee.ethz.ch [129.132.2.219]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id EABC41A87EB for <rmcat@ietf.org>; Fri, 10 Jul 2015 05:06:14 -0700 (PDT)
Received: from localhost (localhost [127.0.0.1]) by smtp.ee.ethz.ch (Postfix) with ESMTP id 879D0D9305; Fri, 10 Jul 2015 14:06:13 +0200 (MEST)
X-Virus-Scanned: by amavisd-new on smtp.ee.ethz.ch
Received: from smtp.ee.ethz.ch ([127.0.0.1]) by localhost (.ee.ethz.ch [127.0.0.1]) (amavisd-new, port 10024) with LMTP id eWl97fZxLS2x; Fri, 10 Jul 2015 14:06:13 +0200 (MEST)
Received: from [82.130.103.143] (nb-10510.ethz.ch [82.130.103.143]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) (Authenticated sender: mirjak) by smtp.ee.ethz.ch (Postfix) with ESMTPSA id 42787D9302; Fri, 10 Jul 2015 14:06:13 +0200 (MEST)
Message-ID: <559FB533.5090105@tik.ee.ethz.ch>
Date: Fri, 10 Jul 2015 14:06:11 +0200
From: Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0
MIME-Version: 1.0
To: Zaheduzzaman Sarker <zaheduzzaman.sarker@ericsson.com>, Ingemar Johansson S <ingemar.s.johansson@ericsson.com>
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Transfer-Encoding: 7bit
Archived-At: <http://mailarchive.ietf.org/arch/msg/rmcat/dS5L9rILn18lbTLFypH3f2xmmXQ>
Cc: Karen Elisabeth Egede Nielsen <karen.nielsen@tieto.com>, "rmcat@ietf.org" <rmcat@ietf.org>
Subject: [rmcat] Review of draft-ietf-rmcat-scream-cc-01
X-BeenThere: rmcat@ietf.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: "RTP Media Congestion Avoidance Techniques \(RMCAT\) Working Group discussion list." <rmcat.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/rmcat>, <mailto:rmcat-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/rmcat/>
List-Post: <mailto:rmcat@ietf.org>
List-Help: <mailto:rmcat-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/rmcat>, <mailto:rmcat-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 10 Jul 2015 12:06:19 -0000

Hi Zahed, hi Ingemar,

please find my review as an individual contributor below.

First of all I have a few general question on the algorithm(s):

1) When calculating cwnd you on the one hand calculate scl_i to basically 
quickly speed up to the previous maximum value cwnd_i and on the other hand try 
to maintain a low delay target. Aren't these two conflicting goals? Without 
having looked at your experimentation results now, I'd expect that you even when 
you are alone on the link induce more than owd_target delay (actually depending 
on the maximum buffer size as reflected by cwnd_i), and when competing with 
other traffic, you'd still cannot reach an equal sharing but will always get a 
lower share. Which is probably okay; however the part that maybe worries my a 
little is that the achieved delay and/or achieved sharing probably depends on 
the configured maximum buffer size (which might be large)...? Can you comment on 
this or provide more detail results on this?

2) I would have expected that the video rate controller takes cwnd or the actual 
sending window as an input parameter. However, your approach seems to have two 
more or less independent control loops for the sending rate and the coding rate. 
That seems quite weird to me (however the reasoning behind the design of the 
video coder is also not really clear to me; see below for further comments on 
this). I'd be worried that if those two loops are rather independent that they 
could actually make conflicting decision, e.g. video controller increases the 
rate and congestion controller decreases the rate. Also wouldn't it be useful to 
use the cwnd as an input parameter? Please comment!


And two more high-level/general comments on (the structure of) the document:

1) I think the algorithm is now well understandable with the code and respective 
explanation (expect the end of section 4.1.3 needs more explanation as just 
mentioned above). I would still like to propose some restructuring. I think the 
readability can be much improved by some rather simple restructuring using a 
model similar as in the LEDBAT draft: First give a high level summarize of all 
steps that have to be performed ("For each ACK/report the state on the delay and 
delay trend will be updated. If a loss is detected the congestion window cwnd 
will be decreased strongly, otherwise there are two operation mode: fast 
increase and regular operation. In fast increase the goal is to reach a previous 
target value quickly applying an multiplicative increase scheme, while otherwise 
a delay traget should be maintained..."). Then give all the pseudo code 
structured into functions and finally explain each function block in detail in 
an own paragraph. I already have a good idea how this would look like and if you 
want to we can sit down at the meeting and do this together.

2) Having reviewed NADA and Scream now basically in one go, I think the used 
framework is very similar. If you look at your Figure 1 and the Figure 1 in the 
NADA draft, I think I'm now basically able to do a 1:1 mapping. However, I think 
in your picture it is wrong to have multiple video coder/rate controller/queues 
shown because that indicates that there is only one transmission scheduler and 
congestion controller for all transmissions, which is not the case. If you 
remove that part, there is only one difference left between the two images and 
that is that there is an input from the congestion control to the video rate 
controller in NADA which you don't have (also see comment 2 above). Other than 
that the mapping would be the following way:

NADA
---
Reference Rate Calculator    -> Network congestion control
Video Target Rate Calculator -> Rate Control
Sending Rate Calculator      -> Transmission scheduler
Encoder                      -> Video Encoder
Rate Shaping Buffer          -> Queue RTP packets

I think it would be super helpful to use a common terminology here for everybody 
to better understand similarities and differences. I guess the same is true for 
variable and parameter names if there things that are actually the same. Note 
that you in some case already use the same names (I didn't check however if the 
things that have the same name actually are the same things...)

3) I think it is great to have a section to list all parameters and values. 
Thanks! However I do have to say I find tables here not the right choice because 
it just stretches the information over more pages than needed and therefore 
makes it actually harder to find the right bit. Further there are a few 
variables that are actually parameters and listed wrongly. Two examples I've 
found are: mss and min_cwnd (should be MSS and MIN_CWND which you even write 
this way sometimes). Further for variables I would not indicate the initial 
values here. I'd propose to have a init function instead in the later 
description where one can see all initial value at once (like in the LEDBAT 
draft). Further please review if all variables are actually needed. If there are 
only temporal and not variables that actually hold state in the pseudo code, I 
don't think they need t be listed here.


Further comments following the structure of the document:

Section 1: I've also made this comment regarding the NADA doc: I guess every 
algorithm doc should refer the requirements doc and to some extend discuss if 
and how these requirements have been addressed...

Section 1.1: not sure why this section is there; don't see a need to have this 
information in this document because it doesn't help me understand the algorithm 
any better.

Section 3:
- s/follows packet conservation principles/follows the packet conservation 
principle/
- I think think FACK is not the right reference for the packet conservation 
priniple, should be  "Congestion Avoidance and Control" by Van Jacobsen and 
Michael J. Karels, SIGCOM'88 (see reference Jac88 of FACK paper).
- s/The packet conservation principle is realized by/In Scream, the packet 
conservation principle is realized by/ -> otherwise it sounds like it's always 
realized this way, which is not true.
- Maybe give already a super brief description of what LEDBAT is doing:
   s/in a way similar to LEDBAT [RFC6817]./in a way similar to LEDBAT [RFC6817] 
by maintaining a given target delay./
- This text further say "a few steps to take to make the concept work with 
conversational media". Maybe already name these steps very briefly here: e.g. 
taking the delay trend into account and the previous maximum rate to compete 
with other non-delay based traffic...

Section 3.1
- You say there are similarities with newcwv but then never mention this again. 
Can you explain what the similarities are? I've just reviewed newcwv and this is 
absolutely not clear to me...
- Fast start: May I propose to name this Fast Increase instead (and also always 
write this with upper letters). Actually that's the same name I used for TCP 
SIAD where I also have this mechanism that is kind of similar to Slow Start but 
can be resumed later as well. I propose this change (not just because I want my 
name to be used ;-) but) because having 'start' in the name is confusing to my 
if you not only use this mechanism to actually start or re-start a transmission.
- Also maybe be slightly more concrete here (on a high level) about what fast 
start is doing: e.g "implements and multiplicative increase to quickly reach the 
maximum capacity where the increase factor is scaled based on a previous maximum 
rate (expect at transmission start). Fast start is resumed during a transmission 
if the delay trend goes below a certain threshold." Btw. I would also appreciate 
to introduce a parameter for this threshold and not only say "less than 0.2 for 
1.0 seconds or more" (see later in section 4.1.2.1).
- s/with FTP traffic/with bulk traffic/

Section 3.2.
- This section says packet pacing is used, however I don't think this is further 
mention in the rest of the text... please add something on this... or there 
might be a confusion between me and you what pacing means (see below).

Section 3.3.
- Maybe already include the relevant part of figure 1 here because a figure 
really helps. Or include the whole figure 1 in section 3 and structure the 
overview based on this image.

Section 4.1
- I guess you can remove the reference to I-D.ietf-rmcat-app-interaction because 
we are working on this draft anymore. However, it was anyway not clear what 
exactly you are referencing it for. From my point of view explaining the 
relationship to other docs (of the same wg) is often done best in the intro.
- On Figure 1: As said above I find it confusing that there are multiple video 
streams here because that indicates that there is only one congestion controller 
and one transmission scheduler for all transmissions. For the same reason I 
actually find the name transmission scheduler confusion because a scheduler 
would actually be the one who decides which packet to put on the link at which 
time (which often is done on the nic). However your transmission scheduler is 
'only' calculating the sending window, therefore I call it something like 
sending rate controller...

Section 4.1.2.1
- At the beginning of this section you define some terminology. Maybe put this 
in the terminology section instead?
- I've also asked this question for NADA: Don't you need synchronized clocks at 
the sender and receiver. Please further discuss this!
- Please also give more details how loss detection is done. I guess that's 
simply indicated in an RTCP report..? Please say which information are exactly used.
- Please give also some kind of pseudo code on how bytes_newly_acked is updated. 
That simply helps people to implement things correctly.
- As said above I would structure the pseudo code in functions where one 
function would be an init function here. That also helps to avoid implementation 
errors.
- Please give more details how the function R() is implemented.
- Why is scl_i called this way? What does the i stand for? Maybe take a slightly 
longer more meaningful name like scale_increase. This helps the reader to 
immediately understand what this is later used for!
- Why is scl_i limited to 0.2? Should this also a parameter?
- Please give some reasoning why scl_i is calculated this way! Is this similar 
to Cubic? If so please refer!
- I believe the calculation of off_target is only needed if not in fast start. 
If so please move it!
- cwnd_i is initialized to 1. That seems really weird to me. Does this work well 
for fast start at a beginning of a transmission? I guess the intention is to 
make fast start behave like slow start at the beginning of a connection. In this 
case scl_i should be 1. Why don't you just set scl_i to 1 if cwnd_i has not been 
set yet...?
- gain calculation: owd_trend is always a value between 0 and 1, therefore it 
can not be negative, therefore the max() in the gain calculation can be removed:
gain = GAIN*(1.0 + (1.0 - owd_trend/ 0.2))
- Where is this 0.2 in this gain calculation again coming from. Is there a good 
reasoning for this value? Should this be again a parameter? Is this the same 
parameter than the previous 0.2?
- I'd move the discussion what the difference to LEDBAT is into section 3.1 
because these point are very general and therefore no algorithm details are 
needed to be known and you mention those difference there already without saying 
what they actually are.
- However, you say you don't use ALLOWED_INCREASE but isn't 
MAX_BYTES_IN_FLIGHT_HEAD_ROOM effective the same...?
- The part on flow compensation is rather unclear! Suddenly you have two new 
parameters owd_norm_mean_sh and owd_norm_var: how are they calculated? When are 
they updated? I think there is something missing somewhere. Further where does 
this formula come from? I guess there is some previous working that proposed 
this approach? Can you give a reference and further explain why this approach is 
taken?

Section 4.1.2.2
- I'd move the initial part here with the two bullet points into section 3.2 
because this is rather some high level info.
- I don't think I would call the proposed approach pacing. Pacing is for me that 
if you can send you multiple packet at once (because the send/congestion window 
is large enough) that you spread the sending times of each packet over the time 
interval until the next ACK/update is expected.
- Further the calculation seems rather complicated. Is this really needed? Can 
you further explain in the doc in which case this approach helps which problem 
and why?!

Section 4.1.3
- Why is the estimation interval 200ms (shouldn't this be a parameter as well?) 
but the RATE_ADJUSTMENT_INTERVAL 100ms? If you only estimate every 200ms there 
will be no new information after 100ms to adjust something...?
- s/ if the congestion control is if fast start or not./ if the congestion 
control is in fast start or not./
- In general, as mention above, there is more explanation needed on the single 
steps of the video rate controller, especially on
   - What is RMAP_UP_TIME actually good for? Why is this 10s?
   - Please explain further why increment is calculated the way it is calculated!
   - Shouldn't the target_bitrate be only increased in steps that the encoder 
can handle?
   - Please you further explain the last part on rate_rtp_limit!!!

Section 4.2
- Here the table is good because there are less values and it actually provides 
a nice overview!

Section 5.
- This needs to go in an own draft at some point. Please also compare 
similarities with the information needed by NADA and GCC!

Section 7
- Actually a conclusion is not needed for an IETF Internet draft...

Section 8
- regarding the clock drift compensation, please see the appendix of the LEDBAT 
RFC. If you've implemented basically the same thing as described there, it 
probably enough to refer to that.

Mirja