Re: [rmcat] Review on draft-ietf-rmcat-eval-test-02

Zaheduzzaman Sarker <zaheduzzaman.sarker@ericsson.com> Tue, 16 February 2016 12:05 UTC

From: Zaheduzzaman Sarker <zaheduzzaman.sarker@ericsson.com>
To: Stefan Holmer <holmer@google.com>, rmcat WG <rmcat@ietf.org>, "Michael Ramalho (mramalho)" <mramalho@cisco.com>, "Xiaoqing Zhu (xiaoqzhu)" <xiaoqzhu@cisco.com>, Varun Singh <varun@comnet.tkk.fi>
References: <CAEdus3Jdn6dvf8tXDWEGF1wiKpAsDK6fuxQ1_gMky7Cf04N5GQ@mail.gmail.com>
Organization: Ericsson AB
Message-ID: <56C31073.8040508@ericsson.com>
Date: Tue, 16 Feb 2016 13:05:07 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1
MIME-Version: 1.0
In-Reply-To: <CAEdus3Jdn6dvf8tXDWEGF1wiKpAsDK6fuxQ1_gMky7Cf04N5GQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Transfer-Encoding: 7bit
Archived-At: <http://mailarchive.ietf.org/arch/msg/rmcat/bV2Tc1zOJeRBpAB5q_IIWCRU1b0>
Subject: Re: [rmcat] Review on draft-ietf-rmcat-eval-test-02

Hi Stefan,

Thanks for your detailed review. We really appreciate this. You have 
brought up a couple of points that we should discuss a bit more.

Please see inline below.

BR

Zahed (on behalf of all the authors)

On 2016-02-07 13:05, Stefan Holmer wrote:
> Hi,
>
> Sorry for being late with this review, here are my comments:
>
> *General comments:*
>
>   * We consider audio and video content, but should we also consider
>     slide shows and/or screen content? In my experience those
>     sources are often widely different in their rate characteristics, and may
>     be very bursty. I think it could be interesting to see how different
>     candidates handle the case where they are source limited most of the
>     time (static image, encoder produces < 50 kbps), and every now and
>     then there's a slide change.


The test cases here are supposed to cover interactive real-time media, 
hence we can discuss whether screen sharing and slide shows are covered 
by the RMCAT requirements. The eval criteria draft is not so clear 
about this; it needs to be clarified there first.

We will have to look at the traffic characteristics of such a use 
case. It would be nice if you could provide traces of such a scenario. 
Will it be a very different case from the media pause and resume test 
case? What makes it different?

If we think of it as competition among multiple flows, then we have 
test cases 5.2 and 5.4 to cover that.

However, if we think of it as perceived video quality, then it fits 
better when the performance of the whole RTC system is under 
investigation. That is not in the scope of these test cases.


>   * Each video stream is often accompanied by a constant bitrate audio
>     stream in these test cases, but it's not clear to me why?
Due to the difference in required bandwidth for audio and video, video 
streams were considered more challenging than audio streams. It was 
also assumed that in case of congestion one would like to adapt the 
video rate before adapting the audio rate. Hence the focus was on 
video streams, and it was decided that we can have test cases where 
audio has a constant bitrate.
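To make the constant-bitrate audio assumption concrete, here is a toy calculation of what such a source looks like on the wire. The 64 kbps / 20 ms packetization parameters are my own example values, not numbers taken from the draft:

```python
# Sketch: packet stream of a constant-bitrate audio source.
# Assumed parameters (illustrative only): 64 kbps codec, 20 ms ptime.
BITRATE_BPS = 64_000
PTIME_S = 0.020

payload_bytes = int(BITRATE_BPS * PTIME_S / 8)   # payload per packet
packets_per_s = int(1 / PTIME_S)                 # packet rate

# A CBR source emits identical packets at a fixed interval, so it
# stresses the congestion controller's delay path, not its rate path.
print(payload_bytes, packets_per_s)
```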
>     I think we
>     need to clarify how the audio stream is going to be evaluated. Are
>     we going to look at end-to-end latency and jitter?
Yes, that is the idea. I think both of the metrics you mentioned are 
already captured in the test cases.
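For reference, jitter for such an evaluation could be computed with the standard interarrival jitter estimator from RFC 3550 (Section 6.4.1); the transit-time samples below are made up for illustration:

```python
def update_jitter(jitter, transit_prev, transit_now):
    """One step of the RFC 3550 interarrival jitter estimator.
    transit = arrival time - RTP timestamp, in consistent time units."""
    d = abs(transit_now - transit_prev)
    return jitter + (d - jitter) / 16.0

# Made-up transit times (ms) for a flow with slight delay variation.
transits = [100.0, 102.0, 101.0, 105.0, 100.0]
jitter = 0.0
for prev, now in zip(transits, transits[1:]):
    jitter = update_jitter(jitter, prev, now)
print(round(jitter, 3))
```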
>   * It's suggested that video resolution should be specified in the test
>     cases, but we don't seem to suggest that video quality should be
>     evaluated (PSNR/SSIM). How are we then expecting that video
>     resolution will matter in these tests?
One can say the whole media source behavior description requires video 
quality evaluation. We discussed this at length in the design team and 
agreed that we need some sort of guidance on the behavior to produce 
comparable results. This is also part of why the WG has been asking to 
evaluate at least two candidates using the same system, so that the 
comparison is fairer.

But the question is, how does a change of resolution really change the 
packet patterns on the wire? This was also discussed when we proposed 
different ways of modeling synthetic video sources. At IETF 88 I 
showed that the video frame sizes of a 720p 25fps 800kbps encoded 
stream almost match those of a 480p 25fps 800kbps stream. Hence, for 
CC evaluation purposes, a change of resolution should not really 
matter much.
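The point can be sketched numerically: in a rate-driven source model, the mean frame size on the wire is fixed by target bitrate and frame rate alone, so resolution drops out. This is a toy model of my own, not the synthetic source defined by the WG:

```python
def mean_frame_size_bytes(bitrate_bps, fps):
    """Mean encoded frame size implied by a target bitrate and frame
    rate; note that resolution does not appear as an input at all."""
    return bitrate_bps / (8 * fps)

# 720p and 480p streams, both 25 fps at 800 kbps, carry the same
# average bytes per frame -- only the per-pixel quality differs.
size_720p = mean_frame_size_bytes(800_000, 25)
size_480p = mean_frame_size_bytes(800_000, 25)
print(size_720p, size_480p)
```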


>     Do we allow the encoder to
>     internally scale down the resolution depending on the available
>     bandwidth?
Yes, isn't it natural to do that?
>     In that case, will we really be evaluating CC candidates
>     or are we starting to lean towards comparing full RTC systems?
>
No, comparing full RTC systems was never on the menu at all. In fact, 
the test cases are designed so that we can test candidate algorithms 
to see whether they are safe enough to deploy on the Internet or not. 
This also serves the purpose of comparing the test results.

If there is any confusion about this CC candidate versus full RTC 
system comparison, please post some suggestions to clarify it.

To be honest, when you asked about the screen sharing case it felt 
like comparing full RTC systems :-).



>
> *Raw comments:*
>
>       4.1.  Evaluation metircs  . . . . . . . . . . . . . . . . . . .   7
> Comments:
>
>   * metrics
>
>
> *Section 3*
> It should be noted that
>        depending on the test cases it is possible to have different path
>        characteristics in of the either directions.
>
> Comments:
>
>   * "in either of the directions"
>   * There is a bullet which looks misplaced after this.
>
>
> In a  testbed environment laboratory there may exist a significant
>        amount of traffic on portions of the network path between the
>        endpoints that is not desired for the purposes of these RMCAT
>        tests.
>
> Comments:
>
>   * I think we should define what a testbed environment laboratory is
>     somewhere. Is it a LAN with some network emulator, or is it a fully
>     simulated network?
We can elaborate on that. The laboratory is a LAN with real machines 
and routers.
>
>
>    +  Bottleneck queue size: defines size of queue in terms of
>              queuing time when the queue is full (in milliseconds).
>
> Comments:
>
>   * "the size of the queue"
>
>
>        *  Application-related: defines the traffic source behaviour for
>           implementing the test case
>
> Comments:
>
>   * In this section we mention Video/Voice as media types. Should we
>     also consider screen content/slide shows? See general comments.
>   * Under adaptability we mention resolution as being an option of
>     adaptation. It's not clear to me how we should evaluate something
>     like that without comparing the visual quality in some way. See
>     general comments.
>
>
> *Section 4.3*
> Also it is possible
>                 that the media traffic generator used in a particular
>                 simulator or testbed if not capable of generating higher
>                 bitrate.  Hence we have selected a suitable bitrate range
>
> Comments:
>
>   * Spelling errors:
>       o "testbed is not"
>       o "bit rate"
>
> *Section 5.1*
> maximum Media Bit Rate is Greater than Link Capacity.  In this
>        case, the application will attempt to ramp up to its maximum bit
>        rate, since the link capacity is limited to a value lower, the
>        congestion control scheme is expected to stabilize the sending bit
>        rate close to the available bottleneck capacity.  This situation
>        can occur when the endpoints are connected via thin long networks
>        even though the advertised capacity of the access network may be
>        higher.
>
> Comments:
>
>   * I don't understand the last sentence here. Isn't it simpler to refer
>     to e.g. cable/adsl links where the uplink may be in the order of 256
>     - 1024 kbps.
You can think of a client connected to a WiFi network that has a wired 
connection (as you mentioned) to the Internet. We can clarify this.
>
>
>     It should be noted that the exact variation in available capacity due
>     to any of the above depends on the under-lying technologies.  Hence,
>     we describe a set of known factors, which may be extended to devise a
>     more specific test case targeting certain behaviour in a certain
>     network environment.
>
> Expected behavior: the candidate algorithm is expected to detect the
>     path capacity constraint, converges to bottleneck link's capacity and
>
> Comments:
>
>   * "underlying"
>   * "targeting*a* certain"
>   * "converges to *the* bottleneck link's capacity"
>
>
>     o  Path characteristics: as described in Section 4.2
>
> Comments:
>
>   * I don't think there is a point in mentioning that we are using the
>     defaults.
>
>        *  This test uses the following one way propagation delays of 50
>           ms and 100 ms.
>
> Comments:
>
>   * "This test uses one way propagation delays of 50 ms and 100 ms"
>   * Is this referring to two instances of the test, or to forward and
>     backward delays?
Two different runs of the test. We need to clarify that. Good catch.
>
>
> *Section 5.2*
>     Expected behavior: the candidate algorithms is expected to detect the
>     variation in available capacity and adapt the media stream(s)
>     accordingly.  The flows stabilize around their maximum bitrate as the
>     as the maximum link capacity is large enough to accommodate the
>     flows.  When the available capacity drops, the flow(s) adapts by
>     decreasing its sending bit rate, and when congestion disappears, the
>     flow(s) are again expected to ramp up.
>
> Comments:
>
>   * spelling error "maximum bitrate as the maximum link capacity"
>   * Rewrite last sentence as plural.
>
>
> *Section 5.3*
>     It is expected that the candidate algorithms is able to cope with the
>     lack of feedback information and adapt to minimize the performance
>     degradation of media flows in the forward channel.
>
>     It should be noted that for this test case: logs are compared with
>     the reference case, i.e, when the backward channel has no impairments
>
> Comments:
>
>   * "algorithms *are* able"
>   * End last sentence with ".
>   * I think the test duration should be longer since this is a fairly
>     complicated scenario. What about 300 s?
If running for 300 s shows different behavior than running for 100 s, 
then we should change it. Do you have any data to share?
>
>
> *Section 5.4*
>
>     In this test case, more than one RMCAT media flow shares the
>     bottleneck link and each of them uses the same congestion control
>     algorithm.  This is a typical scenario where a real-time interactive
>     application sends more than one media flows to the same destination
>     and these flows are multiplexed over the same port.
>
> Comments:
>
>   * I don't think multiplexing is a good example here. There is no
>     reason why they would actually fight for bandwidth between each
>     other if they are multiplexed over the same port, as they may as
>     well run a single CC and allocate the available bandwidth between
>     them. Better to take an example of two or more different endpoints
>     in the same office calling two or more endpoints in another office,
>     or similar.
Fine, but I believe we are still talking about RTP as the protocol, 
and there almost everything is done on a per-SSRC basis, hence the 
congestion control should be able to operate on a per-SSRC basis. Of 
course, one can aggregate the congestion control, but then that is a 
special case, like using coupled CC.
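To illustrate the "aggregate then allocate" special case mentioned above, here is a minimal sketch of splitting one aggregate rate across SSRCs by priority weight. The flow names and weights are hypothetical, and real coupled congestion control as discussed in the WG is considerably more involved:

```python
def allocate(aggregate_bps, priorities):
    """Split an aggregate sending rate across flows proportionally
    to per-flow priority weights (a toy coupled-CC allocator)."""
    total = sum(priorities.values())
    return {ssrc: aggregate_bps * p / total for ssrc, p in priorities.items()}

# Hypothetical flows multiplexed on one port: video favored over audio.
shares = allocate(1_000_000, {"video_ssrc": 3, "audio_ssrc": 1})
print(shares)
```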
>   * misspelled: "one media flows" should be "one media flow"
>
>
>     Testbed topology: Three media sources S1, S2, S3 are connected to
>     respective R1, R2, R3.
>
> Comments:
>
>   * Should probably be "are connected to R1, R2 and R3 respectively".
>
>
>
>               +---------+------------+------------+----------+
>               | Flow IF | Media type | Start time | End time |
>               +---------+------------+------------+----------+
>
> Comments:
>
>   * Flow ID
>
>
> *Section 5.1*
>
>        *  Path capacity ratio: 1.0
> Comments:
>
>   * Not necessary to mention as I think it is the default? Also
>     mentioned in other places.
>
>           +  Congestion control: Default TCP congestion control.
> Comments:
>
>   * Should we suggest what the default TCP CC could be? New reno, Cubic?
Yes, we can. What do you suggest?
>
> *Section 5.8*
>     In this test case, more than one real-time interactive media flows
>     share the link bandwidth and all flows reach to a steady state by
>     utilizing the link capacity in an optimum way.  At these stage one of
>     the media flow is paused for a moment.  This event will result in
>     more available bandwidth for the rest of the flows and as they are on
>     a shared link.  When the paused media flow will resume it would no
>     longer have the same bandwidth share on the link.  It has to make
>     it's way through the other existing flows in the link to achieve a
>     fair share of the link capacity.  This test case is important
>     specially for real-time interactive media which consists of more than
>     one media flows and can pause/resume media flow at any point of time
>     during the session.  This test case directly addresses the
>     requirement number 5 in [I-D.ietf-rmcat-cc-requirements].  One can
>     think it as a variation of test case defined in Section 5.4.
>     However, it is different as the candidate algorithms can use
>     different strategies to increase its efficiency, for example the
>     fairness, convergence time, reduce oscillation etc, by capitalizing
>     the fact that they have previous information of the link.
>
> Comments:
>
>   * Several spelling errors:
>       o "At *this* stage one of  the media *flows* is paused for a moment."
>       o "This event will result in more available bandwidth for the rest
>         of the flows *(and)* as they are on a shared link.  When the
>         paused media flow *resumes* it *will* no longer have the same
>         bandwidth share on the link."
>       o " ... one media flows and can pause/resume media *flows* at any
>         point of time during the session."
>       o "However, it is different as the candidate algorithms can use
>         different strategies to increase *their* efficiency, for example
>         *(the)* fairness"
>
>
> *Section 6*
>   It has been noticed that there are other interesting test cases
>     besides the basis test cases listed above.
>
> Comments:
>
>   * "base"
>
> *Section 6.2*
>     Testbed attributes:
>
>     o  Test duration: 120s
>
>     o  Path characteristics:
>
>        *  Reference bottleneck capacity between A and B = 2Mbps.
> ...
>
>        *  One-Way propagation delay:
>
>           1.  Between S1 and R1: 100ms
>
>           2.  Between S2 and R2: 40ms
>
>           3.  Between S3 and R3: 40ms
>
> Comments:
>
>   * Duration should probably be at least 300s given the complexity of
>     this setup.
>   * Does the reference bottleneck capacity apply to all links or only A
>     and B? If only A and B, what's the reference for the other links?
To all links.
>   * In my opinion this test case is complex enough without different
>     one-way propagation delays on the links.

Again, thanks for the typo fixes and nits.