Re: [rmcat] Review of draft-ietf-rmcat-scream-cc-01

Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch> Tue, 14 July 2015 14:24 UTC

Return-Path: <mirja.kuehlewind@tik.ee.ethz.ch>
X-Original-To: rmcat@ietfa.amsl.com
Delivered-To: rmcat@ietfa.amsl.com
Received: from localhost (ietfa.amsl.com [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id C2F811ACDD2 for <rmcat@ietfa.amsl.com>; Tue, 14 Jul 2015 07:24:19 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -4.11
X-Spam-Level:
X-Spam-Status: No, score=-4.11 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, GB_I_LETTER=-2, J_CHICKENPOX_17=0.6, J_CHICKENPOX_24=0.6, J_CHICKENPOX_47=0.6, MIME_8BIT_HEADER=0.3, RCVD_IN_DNSWL_MED=-2.3, T_RP_MATCHES_RCVD=-0.01] autolearn=ham
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Cd30Y6XXLH6d for <rmcat@ietfa.amsl.com>; Tue, 14 Jul 2015 07:24:15 -0700 (PDT)
Received: from smtp.ee.ethz.ch (smtp.ee.ethz.ch [129.132.2.219]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id A265D1ACCF5 for <rmcat@ietf.org>; Tue, 14 Jul 2015 07:24:14 -0700 (PDT)
Received: from localhost (localhost [127.0.0.1]) by smtp.ee.ethz.ch (Postfix) with ESMTP id 561CFD9307; Tue, 14 Jul 2015 16:24:13 +0200 (MEST)
X-Virus-Scanned: by amavisd-new on smtp.ee.ethz.ch
Received: from smtp.ee.ethz.ch ([127.0.0.1]) by localhost (.ee.ethz.ch [127.0.0.1]) (amavisd-new, port 10024) with LMTP id I1KywPyf8QiV; Tue, 14 Jul 2015 16:24:13 +0200 (MEST)
Received: from [82.130.103.143] (nb-10510.ethz.ch [82.130.103.143]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) (Authenticated sender: mirjak) by smtp.ee.ethz.ch (Postfix) with ESMTPSA id 11788D9305; Tue, 14 Jul 2015 16:24:13 +0200 (MEST)
Message-ID: <55A51B8C.6090303@tik.ee.ethz.ch>
Date: Tue, 14 Jul 2015 16:24:12 +0200
From: Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0
MIME-Version: 1.0
To: Ingemar Johansson <info@ijdata.com>, rmcat@ietf.org, mirja.kuehlewind@tik.ee.ethz.ch, karen.nielsen@tieto.com, zaheduzzaman.sarker@ericsson.com, ingemar.s.johansson@ericsson.com
References: <mailman.0.1436549867.21676.rmcat@ietf.org> <55A0124D.9010907@ijdata.com>
In-Reply-To: <55A0124D.9010907@ijdata.com>
Content-Type: text/plain; charset="windows-1252"; format="flowed"
Content-Transfer-Encoding: 8bit
Archived-At: <http://mailarchive.ietf.org/arch/msg/rmcat/be03n39gheCVfCSdxRkzKx_dwCc>
Subject: Re: [rmcat] Review of draft-ietf-rmcat-scream-cc-01
X-BeenThere: rmcat@ietf.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: "RTP Media Congestion Avoidance Techniques \(RMCAT\) Working Group discussion list." <rmcat.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/rmcat>, <mailto:rmcat-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/rmcat/>
List-Post: <mailto:rmcat@ietf.org>
List-Help: <mailto:rmcat-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/rmcat>, <mailto:rmcat-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 14 Jul 2015 14:24:20 -0000

Hi Ingemar,

see some comments below (I thought I've answered you yesterday already but now 
just realized that I didn't... ).

Mirja

On 10.07.2015 20:43, Ingemar Johansson wrote:
> Hi Mirja and thanks for the comments, answers to at least a few of the comments inline marked [IJ].
> Use my private email temporarly during the summer as I don't expect to pick up my working machine
> on a regular basis and this is anyway a public conversation
>
> /Ingemar
>
>
>     Hi Zahed, hi Ingemar,
>
>     please find my review as an individual contributor below.
>
>     First of all I have a few general question on the algorithm(s):
>
>     1) When calculating cwnd you on the one hand calculate scl_i to
>     basicallyquickly speed up to the previous maximum value cwnd_i and on the
>     other hand tryto maintain a low delay target. Aren't these two conflicting
>     goals? Withouthaving looked at your experimentation results now, I'd expect
>     that you even whenyou are alone on the link induce more than owd_target
>     delay (actually dependingon the maximum buffer size as reflected by cwnd_i),
>     and when competing withother traffic, you'd still cannot reach an equal
>     sharing but will always get alower share. Which is probably okay; however
>     the part that maybe worries my alittle is that the achieved delay and/or
>     achieved sharing probably depends onthe configured maximum buffer size
>     (which might be large)...? Can you comment onthis or provide more detail
>     results on this?2) I would have expected that the video rate controller
>     takes cwnd or the actualsending window as an input parameter. However, your
>     approach seems to have twomore or less independent control loops for the
>     sending rate and the coding rate.That seems quite weird to me (however the
>     reasoning behind the design of thevideo coder is also not really clear to
>     me; see below for further comments onthis). I'd be worried that if those two
>     loops are rather independent that theycould actually make conflicting
>     decision, e.g. video controller increases therate and congestion controller
>     decreases the rate. Also wouldn't it be useful touse the cwnd as an input
>     parameter? Please comment!
>
> [IJ] Actually the purpose of cwnd_i and scl_i is to slow down the increase on
> cwnd and target bitrate when close to the last known max value. This gives a
> chance for the algorithm to avoid a too large delay spike when the path capacity
> of a fixed BW link is reached, at the same time it gives a better opportunity to
> grab free bandwidth reasonably fast.

I understood this but this also means that you are fast(er) when your are far 
away from the target bit rate. So it's both.


> It may be an option to base the video target rate on something like the send
> window, not sure what it would like, you probably need to do something like
> cwnd/rtt. Like it is now there are actually two control loops as you correctly
> observe. Currently I don't see a way around this as the video coder can
> fluctuate quite a lot around the target bitrate. In the long run it is perhaps
> possible (and desirable) to make it one control loop, not sure that it will
> happens. Another thing that speaks in favor of two control loops is that SCReAM
> can congestion control multiple streams, for instance video and audio. This can
> be experimented with in the C++ code.

I didn't say it has to be one control loop but I was wondering if it makes sense 
to two completely independent control loop. I'd have expected that the video 
rate control loop takes the target sending rate as in input parameter.

>
>     And two more high-level/general comments on (the structure of) the document:
>
>     1) I think the algorithm is now well understandable with the code and
>     respectiveexplanation (expect the end of section 4.1.3 needs more
>     explanation as justmentioned above). I would still like to propose some
>     restructuring. I think thereadability can be much improved by some rather
>     simple restructuring using amodel similar as in the LEDBAT draft: First give
>     a high level summarize of allsteps that have to be performed ("For each
>     ACK/report the state on the delay anddelay trend will be updated. If a loss
>     is detected the congestion window cwndwill be decreased strongly, otherwise
>     there are two operation mode: fastincrease and regular operation. In fast
>     increase the goal is to reach a previoustarget value quickly applying an
>     multiplicative increase scheme, while otherwisea delay traget should be
>     maintained..."). Then give all the pseudo codestructured into functions and
>     finally explain each function block in detail inan own paragraph. I already
>     have a good idea how this would look like and if youwant to we can sit down
>     at the meeting and do this together.2) Having reviewed NADA and Scream now
>     basically in one go, I think the usedframework is very similar. If you look
>     at your Figure 1 and the Figure 1 in theNADA draft, I think I'm now
>     basically able to do a 1:1 mapping. However, I thinkin your picture it is
>     wrong to have multiple video coder/rate controller/queuesshown because that
>     indicates that there is only one transmission scheduler andcongestion
>     controller for all transmissions, which is not the case.
>
> [IJ] I'd like to have some kind of sit together, urfortunately I cannot attend
> the IETF this time, perhaps some remote appear.in session or the like can be
> arranged in August.
> We can probably remove the example with multiple videos if it makes things more
> simple, but fact of the matter is that SCReAM can handle multiple sources.
>
>     If youremove that part, there is only one difference left between the two
>     images andthat is that there is an input from the congestion control to the
>     video ratecontroller in NADA which you don't have (also see comment 2
>     above). Other thanthat the mapping would be the following way:
>
>     NADA
>     ---
>     Reference Rate Calculator    -> Network congestion control
>     Video Target Rate Calculator -> Rate Control
>     Sending Rate Calculator      -> Transmission scheduler
>     Encoder                      -> Video Encoder
>     Rate Shaping Buffer          -> Queue RTP packets
>
>     I think it would be super helpful to use a common terminology here for
>     everybodyto better understand similarities and differences.
>
> [IJ] Agree, if we can come up with some common terms, so much the better.
>
>     I guess the same is true forvariable and parameter names if there things
>     that are actually the same. Notethat you in some case already use the same
>     names (I didn't check however if thethings that have the same name actually
>     are the same things...)3) I think it is great to have a section to list all
>     parameters and values.Thanks! However I do have to say I find tables here
>     not the right choice becauseit just stretches the information over more
>     pages than needed and thereforemakes it actually harder to find the right
>     bit. Further there are a fewvariables that are actually parameters and
>     listed wrongly. Two examples I'vefound are: mss and min_cwnd (should be MSS
>     and MIN_CWND which you even writethis way sometimes).
>
> [IJ] I have to blame the use of tables on  my mediocre XML skills, is a list
> better. Will connect the MSS and MIN_CWND error, thansk

Yes, I guess a simple list with the variables names and some explanation would 
do it.
And finding a nice formating in the RFC style is often hard...

>
>     Further for variables I would not indicate the initialvalues here. I'd
>     propose to have a init function instead in the laterdescription where one
>     can see all initial value at once (like in the LEDBATdraft). Further please
>     review if all variables are actually needed. If there areonly temporal and
>     not variables that actually hold state in the pseudo code, Idon't think they
>     need t be listed here.
>
> [IJ] OK, sounds good.
>
>     Further comments following the structure of the document:
>
>     Section 1: I've also made this comment regarding the NADA doc: I guess
>     everyalgorithm doc should refer the requirements doc and to some extend
>     discuss ifand how these requirements have been addressed...Section 1.1: not
>     sure why this section is there; don't see a need to have thisinformation in
>     this document because it doesn't help me understand the algorithmany better.
>
>     Section 3:
>
>     - s/follows packet conservation principles/follows the packet
>     conservationprinciple/- I think think FACK is not the right reference for
>     the packet conservationpriniple, should be "Congestion Avoidance and
>     Control" by Van Jacobsen andMichael J. Karels, SIGCOM'88 (see reference
>     Jac88 of FACK paper).- s/The packet conservation principle is realized by/In
>     Scream, the packetconservation principle is realized by/ -> otherwise it
>     sounds like it's alwaysrealized this way, which is not true.
>
> [IJ] OK
>
>     - Maybe give already a super brief description of what LEDBAT is doing:
>
>     s/in a way similar to LEDBAT [RFC6817]./in a way similar to LEDBAT
>     [RFC6817]by maintaining a given target delay./- This text further say "a few
>     steps to take to make the concept work withconversational media". Maybe
>     already name these steps very briefly here: e.g.taking the delay trend into
>     account and the previous maximum rate to competewith other non-delay based
>     traffic...
>
> [IJ] OK
>
>     Section 3.1
>
>     - You say there are similarities with newcwv but then never mention this
>     again.Can you explain what the similarities are? I've just reviewed newcwv
>     and this isabsolutely not clear to me...- Fast start: May I propose to name
>     this Fast Increase instead (and also alwayswrite this with upper letters).
>     Actually that's the same name I used for TCPSIAD where I also have this
>     mechanism that is kind of similar to Slow Start butcan be resumed later as
>     well. I propose this change (not just because I want myname to be used ;-)
>     but) because having 'start' in the name is confusing to myif you not only
>     use this mechanism to actually start or re-start a transmission.- Also maybe
>     be slightly more concrete here (on a high level) about what faststart is
>     doing: e.g "implements and multiplicative increase to quickly reach
>     themaximum capacity where the increase factor is scaled based on a previous
>     maximumrate (expect at transmission start). Fast start is resumed during a
>     transmissionif the delay trend goes below a certain threshold." Btw. I would
>     also appreciateto introduce a parameter for this threshold and not only say
>     "less than 0.2 for1.0 seconds or more" (see later in section 4.1.2.1).
>
>     - s/with FTP traffic/with bulk traffic/
>
> [IJ] OK :-) , Fast Increase it is !, never felt comfortable with the "fast
> start" term. Do you have a reference to your TCP SIAD work. Would be nice to see
> it as I am looking at ways to recover from failed TCP Cubic HyStarts (related to
> issues we see with HyStart and LTE access)

I gave a presentation on this at the ICCRG meeting in Honolulu:

http://www.ietf.org/proceedings/91/slides/slides-91-iccrg-5.pdf

>
>       Section 3.2.
>
>     - This section says packet pacing is used, however I don't think this is
>     furthermention in the rest of the text... please add something on this... or
>     theremight be a confusion between me and you what pacing means (see below).
>
> [IJ] Packet pacing was broken out to the appendix section A.1, the reason was
> that it did not seem to fit with the earlier concensus that the congetsion
> control should compute a send window.

That actually seems to make sense. I think it's good to have this in the 
appendix for now (sorry I didn't really review the appendix), however, we might 
want to discuss this again at some point. I will try to remember this.

>
>     Section 3.3.
>
>     - Maybe already include the relevant part of figure 1 here because a
>     figurereally helps. Or include the whole figure 1 in section 3 and structure
>     theoverview based on this image.
>
> [IJ] OK, will give it a try
>
>     Section 4.1
>
>     - I guess you can remove the reference to I-D.ietf-rmcat-app-interaction
>     becausewe are working on this draft anymore. However, it was anyway not
>     clear whatexactly you are referencing it for. From my point of view
>     explaining therelationship to other docs (of the same wg) is often done best
>     in the intro.- On Figure 1: As said above I find it confusing that there are
>     multiple videostreams here because that indicates that there is only one
>     congestion controllerand one transmission scheduler for all transmissions.
>     For the same reason Iactually find the name transmission scheduler confusion
>     because a schedulerwould actually be the one who decides which packet to put
>     on the link at whichtime (which often is done on the nic). However your
>     transmission scheduler is'only' calculating the sending window, therefore I
>     call it something likesending rate controller...
>
> [IJ] Perhaps Zahed can explain the ref to the app intercation as I have not been
> involved in that work
>
>     Section 4.1.2.1
>
>     - At the beginning of this section you define some terminology. Maybe put
>     thisin the terminology section instead?- I've also asked this question for
>     NADA: Don't you need synchronized clocks atthe sender and receiver.
>
> [IJ] No. Synchronized clocks are not needed, I am a bit hesitant about the
> impact of clock skew. The "real" code resets the base delay history it the OWD
> takes on inreasonable values, this is not mentioned in the draft however. Will
> fix this

Okay. But then you need to remember the initial value of the first timestamps of 
the sender and the receiver and always compensate the difference. I think this 
should be described somewhere; probably also in the appendix.

>
>     Please further discuss this!- Please also give more details how loss
>     detection is done. I guess that'ssimply indicated in an RTCP report..?
>     Please say which information are exactly used.-
>
> [IJ] The n_loss counter is checked, if it has incremented by one or more steps
> in an RTT it is considered a loss event. Will clarify this
>
>     Please give also some kind of pseudo code on how bytes_newly_acked is
>     updated.That simply helps people to implement things correctly.- As said
>     above I would structure the pseudo code in functions where onefunction would
>     be an init function here. That also helps to avoid implementationerrors.
>
> [IJ] OK
>
>     - Please give more details how the function R() is implemented.
>
> [IJ] It is challenging to write an autocorrelation function in ASCII art but I
> give it a(nother) try :-)
>
>
>
>     - Why is scl_i called this way? What does the i stand for? Maybe take a
>     slightlylonger more meaningful name like scale_increase. This helps the
>     reader toimmediately understand what this is later used for!
>
>     - Why is scl_i limited to 0.2? Should this also a parameter?
>
> [IJ] Good question , would probably be better to just use scl or tmp instead.
> 0.2 is there to avoid that the congestion window growth is is set to zero.

It's clear that the cwnd should not be 0 but why is 0.2 the right value for the 
minimum scaling? Why not 0.1 or 0.3?

>
>     - Please give some reasoning why scl_i is calculated this way! Is this
>     similarto Cubic? If so please refer!-
>
> [IJ] Yes, it is very similar to Cubic, perhaps I have only referred to this in
> emails, will clarify tthis
>
>     I believe the calculation of off_target is only needed if not in fast
>     start.If so please move it!-
>
> [IJ] OK
>
>     cwnd_i is initialized to 1. That seems really weird to me.
>     Does this work wellfor fast start at a beginning of a transmission? I guess
>     the intention is tomake fast start behave like slow start at the beginning
>     of a connection. In thiscase scl_i should be 1. Why don't you just set scl_i
>     to 1 if cwnd_i has not beenset yet...?-
>
> [IJ] Yes, you are right,  could be an alternative,
>
>     gain calculation: owd_trend is always a value between 0 and 1, therefore
>     itcan not be negative, therefore the max() in the gain calculation can be
>     removed:
>
> [IJ] OK
>
>     gain = GAIN*(1.0 + (1.0 - owd_trend/ 0.2))
>
>     - Where is this 0.2 in this gain calculation again coming from. Is there a
>     goodreasoning for this value? Should this be again a parameter? Is this the
>     sameparameter than the previous 0.2?-
>
> [IJ] The 0.2 values (no relation between the to 0.2 values actually) are the
> results of experimentation. They are not that critical however, could have been
> other values like 0.3. Could be good to have them as parameters (actually I got
> the same remark from Daniel Lindström who helped me to get SCReAM into
> OpenWebRTC :-) .

Okay! I think if a parameter was found from experimentation, it should be 
mentioned and it should be explained what the effect will be if this value is 
changed nevertheless.

>
>     I'd move the discussion what the difference to LEDBAT is into section
>     3.1because these point are very general and therefore no algorithm details
>     areneeded to be known and you mention those difference there already without
>     sayingwhat they actually are.- However, you say you don't use
>     ALLOWED_INCREASE but isn'tMAX_BYTES_IN_FLIGHT_HEAD_ROOM effective the same...?-
>
> [IJ] Hmm, will have to look into this..
>
>     The part on flow compensation is rather unclear! Suddenly you have two
>     newparameters owd_norm_mean_sh and owd_norm_var: how are they calculated?
>     When arethey updated? I think there is something missing somewhere. Further
>     where doesthis formula come from? I guess there is some previous working
>     that proposedthis approach? Can you give a reference and further explain why
>     this approach istaken?
>
> [IJ] You beat me to the punch here. This is inspired by the work on shared
> bottleneck detection in RMCAT.  I implemented the variance and skew estimates an
> observed that

...?

>
>     Section 4.1.2.2
>
>     - I'd move the initial part here with the two bullet points into section
>     3.2because this is rather some high level info.- I don't think I would call
>     the proposed approach pacing. Pacing is for me thatif you can send you
>     multiple packet at once (because the send/congestion windowis large enough)
>     that you spread the sending times of each packet over the timeinterval until
>     the next ACK/update is expected.- Further the calculation seems rather
>     complicated. Is this really needed? Canyou further explain in the doc in
>     which case this approach helps which problemand why?!
>
>     Section 4.1.3
>
>     - Why is the estimation interval 200ms (shouldn't this be a parameter as
>     well?)but the RATE_ADJUSTMENT_INTERVAL 100ms? If you only estimate every
>     200ms therewill be no new information after 100ms to adjust something...?-
>     s/ if the congestion control is if fast start or not./ if the
>     congestioncontrol is in fast start or not./- In general, as mention above,
>     there is more explanation needed on the singlesteps of the video rate
>     controller, especially on
>
> [IJ] Probably a mistake, should be 200ms, according to the implementation
>
>        - What is RMAP_UP_TIME actually good for? Why is this 10s?
>
> [IJ] This is actually a mistake, it is replaced by a RAMP_UP_SPEED parameter that indicates
> the max increase of the target rate.
>
>       - Please explain further why increment is calculated the way it is calculated!
>
> [IJ] Will add an explanation
>
>     - Shouldn't the target_bitrate be only increased in steps that the
>     encodercan handle?
>
> [IJ] There are no actual steps that the encoder can handle. One typically just
> set a target rate and the rate control loop in the video encoder tries to adapt
> to it.
>
>        - Please you further explain the last part on rate_rtp_limit!!!
>
> [IJ] Yes, this part probably needs a better explanation, it was added quite
> recently as a remedy to some real-life issues with video coding.
>
>       Section 4.2
>
>     - Here the table is good because there are less values and it actually
>     providesa nice overview!
>
> [IJ] OK
>
>     Section 5.
>
>     - This needs to go in an own draft at some point. Please also
>     comparesimilarities with the information needed by NADA and GCC!
>
> [IJ] Yes,
>
>     Section 7
>     - Actually a conclusion is not needed for an IETF Internet draft...
>
> [IJ] OK, great
>
>       Section 8
>
>     - regarding the clock drift compensation, please see the appendix of the
>     LEDBATRFC. If you've implemented basically the same thing as described
>     there, itprobably enough to refer to that.
>
>     Mirja
>
> [IJ] No clock drift compensation implemented yet. Will have a look into the
> LEDBAT RFC.
>
>
>
>