Re: [aqm] comments on draft-ietf-aqm-eval-guidelines-00

Nicolas Kuhn <nicolas.kuhn@telecom-bretagne.eu> Wed, 04 March 2015 10:29 UTC

From: Nicolas Kuhn <nicolas.kuhn@telecom-bretagne.eu>
In-Reply-To: <4AF73AA205019A4C8A1DDD32C034631D876E5AE6@NJFPSRVEXG0.research.att.com>
Date: Wed, 04 Mar 2015 11:29:17 +0100
Message-Id: <93321E6F-FAA8-42DD-91F6-608697E8E600@telecom-bretagne.eu>
References: <4AF73AA205019A4C8A1DDD32C034631D876E5AE6@NJFPSRVEXG0.research.att.com>
To: "MORTON, ALFRED C (AL)" <acmorton@att.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/aqm/cFLzBtX2kO1LHjjxKhrAFcQics0>
Cc: "Wesley Eddy (wes@mti-systems.com)" <wes@mti-systems.com>, "rs@netapp.com" <rs@netapp.com>, "aqm@ietf.org" <aqm@ietf.org>
Subject: Re: [aqm] comments on draft-ietf-aqm-eval-guidelines-00

Hi Alfred, 

Thank you for your comments on the document. 
We are currently working on it, and an updated version 
should be uploaded in the next couple of days. 

Also, we have already integrated some modifications following
your comments; you can find more details on them below. 


> On 30 Jan 2015, at 19:57, MORTON, ALFRED C (AL) <acmorton@att.com> wrote:
> 
> Hi Nicolas, Preethi, David, and Naeem,
> 
> After discussing several performance-related topics 
> in AQM, BMWG, and IPPM with Richard Scheffenegger,
> I agreed to review your draft from the point of view of existing
> Benchmarking activities in IETF.
> 
> I'll just make a few suggestions in this first pass and see where 
> that takes us, didn't go the github route. 
> My comments are prefaced by ACM:
> 
> regards,
> Al
> 
> ACM:
> I read in Section 12:
> 
> 12.1.  Methodology
> 
>   A sufficiently detailed description of the test setup SHOULD be
>   provided.  Indeed, that would allow other to replicate the tests if
>   needed.  This test setup MAY include software and hardware versions.
> ACM:
> The goal, IMHO, is to produce repeatable tests. 
> Thinking ahead to the results, at least one of the parties tested 
> will want to repeat the tests for various reasons, such as wanting 
> understand the details behind performance of their system. Accepting
> this view would mean that requirement levels above are MUST, not 
> SHOULD or MAY. 
> 


We agree - the above paragraph will be updated accordingly in the next version:
"
  A sufficiently detailed description of the test setup MUST be
  provided.  Indeed, that would allow others to replicate the tests if
  needed.  This test setup MAY include software and hardware versions.
"

>   The tester MAY make its data available.
> 
>   The proposals SHOULD be experimented on real systems, or they MAY be
>   evaluated with event-driven simulations (such as NS-2, NS-3, OMNET,
>   etc.).  The proposed scenarios are not bound to a particular
>   evaluation toolset.
> 
> 12.2.  Comments on metrics measurement
> 
>   In this document, we present the end-to-end metrics that SHOULD be
>   evaluated to evaluate the trade-off between latency and goodput. 
> ACM:
> You are better-off to agree on the mandatory set of metrics for 
> evaluation before hand, or you expose the analysis to variation
> with the introduction of new metrics when the tests are repeated.
> 

We agree that this would have been a better approach. We started with
that in the early versions of the document but, since a generic set of metrics
can hardly be of interest when the traffic and network characteristics change,
we decided to "just" present them all. Then, throughout the document, we suggest
the metrics that could be collected for each given scenario.


>   The
>   queue-related metrics enable a better understanding of the AQM
>   behavior under tests and the impact of its internal parameters.
>   Whenever it is possible, these guidelines advice to consider queue-
>   related metrics, such as link utilization, queuing delay, queue size
>   or packet loss.
> ACM:
> It's fine to collect internal metrics for information, but the 
> evaluation should be based on externally observed metrics (what
> we call blackbox metrics in BMWG).
> 

We agree on that point. To reflect this, we have updated the text as follows:

"
Whenever possible (e.g., depending
   on the features provided by the hardware/software), these guidelines
   RECOMMEND collecting queue-level metrics, such as link utilization,
   queuing delay, queue size or packet drop/mark statistics in addition
   to the AQM-specific parameters.  However, the evaluation MUST be
   primarily based on externally observed end-to-end metrics.
"


>   These guidelines could hardly detail the way the metrics can be
>   measured depends highly on the evaluation toolset.
> ACM:
> It's still important to say exactly how a given toolset 
> performs a measurement to collect each specific metric.
> This way, at least what was measured can be understood,
> and other tools might be modified to measure the same way.

We agree, but the problem is that the guidelines are not supposed to
depend on the platform used and should suit evaluations based on simulations,
emulations or real platforms. We do not clearly see how we could provide a
consistent and fully detailed list.

> ACM:
> Alternatively, you could point to existing standards with
> metric definitions and find a tool set that will measure them
> (which is likely, because they *are* standardized).
> More about this below.
> 

ACK

> ---end of Section 12 comments ----
> 
> ACM: 
> Since we ended up on Metrics, let's go back to section 
> 
> 2.  End-to-end metrics
> 
> ...This section presents the metrics that COULD
>   be used to better quantify (1) the reduction of latency, (2)
>   maximization of goodput and (3) the trade-off between the two.  These
>   metrics SHOULD be considered to better assess the performance of an
>   AQM scheme.
> 
>   The metrics listed in this section are not necessarily suited to
>   every type of traffic detailed in the rest of this document.  It is
>   therefore NOT REQUIRED to measure all of following metrics.
> ACM:
> It seems more clear to say what metrics will be certainly be measured,
> and the applicability of those metrics. I see a key distinction under
> UDP and TCP testing, where the "Goodput" metric only has to account for
> retransmissions with TCP, for example. But this is a case where you normally won't have retransmissions with UDP (unless there is some higher-
> layer integrity on stream or messages imposed).
> You can also have a set of OPTIONAL metrics, and with their
> respective applicability. 
> 

At the beginning, we had two lists of metrics: some were queue-related (and optional)
and the others were end-to-end (and mandatory). Because the viability of a measurement 
is not only a function of the transport-layer characteristics, but also of the scenario or the 
requirements of the applications, we thought it was more reasonable to list all the metrics 
and state that not all of them are required. We acknowledge that this might not be the clearest solution.
We have, however, been more precise about this in the next version of the draft:  

"
   The metrics listed in this section are not necessarily suited to
   every type of traffic detailed in the rest of this document.  It is
   therefore NOT REQUIRED to measure all of the following metrics in
   every scenario discussed in this document, if the chosen
   metric is not relevant to the context of the evaluation scenario
   (e.g. the latency vs. goodput trade-off in application-limited traffic
   scenarios).  The tester SHOULD however measure and report on all the
   metrics relevant to the context of the evaluation scenario.
"

> 2.1.  Flow Completion time
> 
>   The flow completion time is an important performance metric for the
>   end user.  Considering the fact that an AQM scheme may drop packets,
>   the flow completion time is directly linked to the dropping policy of
>   the AQM scheme.  This metric helps to better assess the performance
>   of an AQM depending on the flow size.
> ACM:
> It would be good to recognize a relationship for this metric:
> 
>  Flow Completion Time, s  = (flow size, bits) / (Goodput for the flow, bits/s)
> 
> because you later specify Goodput.
> 

All right! We have added the following lines in the updated version:

"
The Flow Completion Time (FCT) is related to the flow size (Fs) and the Goodput for the flow (G) as follows:
FCT [s] = Fs [Byte] * 8 [bit/Byte] / G [bit/s]
"

> 2.2.  Packet loss
> 
>   Packet losses, that may occur in a queue, impact on the end-to-end
>   performance at the receiver's side.
> 
>   The tester MUST evaluate, at the receiver:
> 
>   o  the packet loss probability: this metric should be frequently
>      measured during the experiment as the long term loss probability
>      is of interests for steady state scenarios only;
> 
>   o  the interval between consecutive losses: the time between two
>      losses should be measured.  From the set of interval times, the
>      tester should present the median value, the minimum and maximum
>      values and the 10th and 90th percentiles.
> 
> ACM:
> In lab testing, there are two practical ways to assess loss:
> 
> - for all packets sent, check that a corresponding packet was received
>  within a reasonable time for transmission, Tmax (RFC2679)
> - keep a count of all packets sent, and count the non-duplicate packets
>  received when done sending allowing time for queues to empty (RFC2544)
> 
> either can produce the Loss Ratio = Lost Packets / Total Sent
> 
> The interval between consecutive losses is called a Gap in RFC3611,
> where the density of bursts can be variable but bursts can be forced
> to be consecutive losses by setting parameter Gmin = 0. 
> 
> 

Thanks a lot for the pointers.
The text will be updated as follows:
"

   The packet loss probability can be assessed by simply evaluating the
   loss ratio as a function of the number of lost packets and the total
   number of packets sent.  This might not be easily done in laboratory
   testing, for which these guidelines advise the tester:

   o  to check that for every packet, a corresponding packet was
      received within a reasonable time, as explained in [RFC2679].

   o  to keep a count of all packets sent, and a count of the non-
      duplicate packets received, as explained in Section 10 of
      [RFC2544].

   The interval between consecutive losses, which is also called a gap,
   is a metric of interest for VoIP traffic and, as a result, has been
   further specified in [RFC3611].

"

> 2.3.  Packet loss synchronization
> ACM:
> This is sync between flows, it's not been covered in detail at IETF, AFAIK
> but I simply add that many network events, such as failed link 
> restoration, cause correlated or synchronized loss across active flows.
> 
> 


In order to reflect that in the document, we have added:
"
   If an AQM scheme is evaluated using real-life network environments,
   it is worth pointing out that some network events, such as failed
   link restoration, may cause synchronized losses between active flows
   and thus confuse the meaning of this metric.
"

> 2.4.  Goodput
> ACM:
> BMWG has a definition of Goodput in RFC 2647:
> http://tools.ietf.org/html/rfc2647#section-3.17
>> 
>>   Definition:
>>     The number of bits per unit of time forwarded to the correct
>>     destination interface of the DUT/SUT, minus any bits lost or
>>     retransmitted.
> Here, DUT is Device Under Test, SUT is System Under Test, and
> we recently clarified on the list that all bits/packets lost
> were intended to be attributable to the DUT/SUT.
> This means that the test setup needs to be qualified to assure
> that it is not generating loss on its own.
> 

Thanks a lot for the pointer. We have added the following definition in the next version of the document:

"
   The goodput has been defined in Section 3.17 of [RFC2647] as the
   number of bits per unit of time forwarded to the correct destination
   interface of the Device Under Test (DUT) or the System Under Test
   (SUT), minus any bits lost or retransmitted.  This definition implies
   that the test setup needs to be qualified to assure that it is not
   generating losses on its own.
"

> 2.5.  Latency and jitter
> 
>   The end-to-end latency differs from the queuing delay: it is linked
>   to the network topology and the path characteristics.  Moreover, the
>   jitter strongly depends on the traffic and the topology as well.  The
>   introduction of an AQM scheme would impact on these metrics and the
>   end-to-end evaluation of performance SHOULD consider them to better
>   assess the AQM schemes.
> 
>   The guidelines advice that the tester SHOULD determine the minimum,
>   average and maximum measurements for these metrics and the
>   coefficient of variation for their average values as well.
> ACM:
> Latency is fairly easy to define, I suggest RFC2679.
> "Jitter" has several definitions around, so we studied the trade-offs
> years ago. The short answer is to measure delay variation according to
> the difference between the packet in the stream/flow with minimum delay,
> subtracted from all other packet delays, and then describe that shifted
> distribution with a high percentile and other summary statistics.
> This is Packet Delay Variation (PDV), in RFC 5481.  
> The comparison between two key delay variation metrics is tabulated here:
> http://tools.ietf.org/html/rfc5481#section-7.3.
> When there are few restrictions on measurement equipment, as we typically find in
> lab work, PDV serves all use cases quite nicely.
> 
> 


Thanks. We have updated the text as follows:

"
   The latency, or the one-way delay metric, is discussed in [RFC2679].
   There is a consensus on an adequate metric for the jitter, which
   represents the one-way delay variations for packets from the same
   flow: the Packet Delay Variation (PDV), detailed in [RFC5481], serves
   all use cases well.
"


> 
> 4.  Various TCP variants
> ACM:
> I'll simply say that BMWG is concerned with the repeatability properties
> of TCP testing, but we are investing some effort to understand the
> issues and are working on a draft which enters into testing with stateful
> flows: http://tools.ietf.org/html/draft-ietf-bmwg-traffic-management
> Some folks from AQM have already commented on this, but it may be 
> worth a look.
> 
> 
> 


Thanks, we will have a look ASAP.

Thanks a lot,

Kind regards, 

Nicolas
