Re: [ippm] Adoption call for draft-cpaasch-ippm-responsiveness

Hello Greg,

> On Jan 6, 2022, at 3:00 PM, Greg Mirsky <gregimirsky@gmail.com> wrote:
> 
> Hi Christoph,
> a happy and healthy New Year to you and All!

Happy New Year to you as well!

> Thank you for your kind consideration of my notes and detailed responses. Please find my follow-up notes in-line below under the GIM>> tag.

Thanks for your replies. Please see inline:

> On Wed, Jan 5, 2022 at 10:52 AM Christoph Paasch <cpaasch@apple.com> wrote:
> Hello Greg,
> 
> thanks for your comments. Please see inline:
> 
>> On Dec 22, 2021, at 11:43 PM, Greg Mirsky <gregimirsky@gmail.com> wrote:
>> 
>> Dear Marcus, Authors, et al,
>> apologies for the belated response.
>> I've read the draft and have some comments to share with you:
>> as I understand it, the proposed new responsiveness metric is viewed as the single indicator of a bufferbloat condition in a network. As I recall, the discussion at the Measuring Network Quality for End-Users workshop and on the mailing list <https://mailarchive.ietf.org/arch/browse/network-quality-workshop/?gbt=1&index=cuW_1lh4DD22V28EvlPFB_NjjZY> indicated that there's no consensus on which behaviors or symptoms can reliably signal bufferbloat.
> We are not intending this responsiveness metric to be "the single indicator of bufferbloat". Bufferbloat can be measured in many different ways, and each of these will produce a correct but different result. Thus, "bufferbloat" is whatever the methodology tries to detect.
> 
> Let me give an example of two methodologies that are both correct but produce entirely different numbers:
> 
> Suppose we generate the load by flooding the network with UDP traffic from a single 4-tuple and measure latency with parallel ICMP pings. Then, on an over-buffered FIFO queue we would measure huge latencies (thus correctly exposing bufferbloat), while on an FQ-codel queue we would not measure any bufferbloat.
> 
> If, on the other hand, the load-generating traffic changes the source port for every single UDP packet, then in both the FIFO queue and the FQ-codel queue we will measure huge amounts of bufferbloat.
> 
> Thus, these two methods both produced correct results but with hugely different numbers in the FQ-codel case. [1]
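
A toy model can make the FIFO vs. FQ-codel contrast above concrete. This is a sketch under strong simplifying assumptions, not a model of real fq_codel: the FQ queue is idealized per-flow round robin (no new-flow priority, no hash collisions, no CoDel drops), each packet takes an assumed 1 ms to serialize, 500 packets are assumed to be queued ahead of the probe, and the probe belongs to its own flow. The function names are hypothetical.

```python
# Toy comparison of probe queuing delay behind a standing queue.
# Assumptions (hypothetical, for illustration only): 1 ms per packet,
# idealized per-flow round robin for FQ, probe is in its own flow.
SERVICE_MS = 1.0
QUEUE_LEN = 500


def fifo_probe_delay(queued_flow_ids):
    # FIFO: the probe waits behind every packet already queued.
    return len(queued_flow_ids) * SERVICE_MS


def fq_probe_delay(queued_flow_ids):
    # Idealized round robin: the probe waits for at most one packet
    # from each other backlogged flow before its own turn.
    return len(set(queued_flow_ids)) * SERVICE_MS


# Load from a single 4-tuple: every queued packet has the same flow ID.
single_tuple = [0] * QUEUE_LEN
# Load that changes the source port on every packet: each packet is its own flow.
port_per_packet = list(range(QUEUE_LEN))

for name, load in [("single 4-tuple", single_tuple),
                   ("new source port per packet", port_per_packet)]:
    print(f"{name}: FIFO {fifo_probe_delay(load):.0f} ms, "
          f"FQ {fq_probe_delay(load):.0f} ms")
```

In this model the single 4-tuple case gives 500 ms for FIFO but only 1 ms for FQ, while the per-packet source port case gives 500 ms for both, mirroring the two scenarios described above.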
> 
> Now, while both methods measure some variant of bufferbloat, neither of them measures a realistic usage of the network.
> GIM>> Thank you for the insights. It seems to me that what the method can demonstrate is rather the level of efficiency of the AQM in the network for a particular class of applications.

Yes, that is a good description. It is for a "particular class of applications", and we are trying to make this class of applications representative of a "typical user scenario". (Admittedly, we could debate forever about which kinds of applications are representative, and I would love to have that debate :-))

On the point of "efficiency of the AQM", I would go even further: it's not only the AQM but also the client- and server-side implementations of these applications (as noted further below).

> 
> 
> That is why "Responsiveness under working conditions" tries to clearly specify how the load is generated and how the latency is measured. It does not measure "bufferbloat"; it measures "responsiveness under working conditions" based on the methodology being used (HTTP/2 or HTTP/3, multiple flows, ...). It does expose bufferbloat that happens in the network. It also exposes certain server-side behaviors that can cause (huge amounts of) additional latency; those behaviors are typically not called "bufferbloat".
> GIM>> Thank you for pointing out that the result of the RTT measurement has two contributing factors - network and server.

Yes, servers contribute, as do the client-side implementations. All three (client, network, server) need to work "correctly" to achieve good responsiveness. Btw., as we gather more experience with our methodology in different environments, we find that the largest portion of the latency actually comes from the server side. We see several seconds of latency introduced by the HTTP/2 and TCP implementations.

> It seems worth enhancing the method to localize each contribution and measure them separately.

Because latency probes are sent both on load-bearing connections and on separate connections, and the separate connections measure DNS/TCP/... individually, the different data points already allow some degree of localization.
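
As a rough illustration of how those data points could be combined, the sketch below takes per-stage latency samples from the separate connections and request latencies from the load-bearing connections, then reports the median of each stage plus the extra delay seen only on load-bearing connections. This is a hypothetical heuristic, not something the draft specifies. The stage names and the attribution of the difference to endpoint-side queuing (e.g. socket buffers or HTTP/2 stream scheduling) are assumptions, and with FQ in the network part of that difference may also come from the network.

```python
from statistics import median


def localize(stage_samples_ms, load_bearing_ms):
    """Hypothetical heuristic: compare fresh-connection stages with
    probes sent on load-bearing connections."""
    # Median latency of each stage measured on separate connections.
    stage_medians = {stage: median(samples)
                     for stage, samples in stage_samples_ms.items()}
    # Extra delay an HTTP request sees on a load-bearing connection
    # compared with a fresh connection (assumes an "http" stage exists).
    load_bearing_extra = median(load_bearing_ms) - stage_medians["http"]
    # Stage with the highest median on the separate connections.
    worst_stage = max(stage_medians, key=stage_medians.get)
    return stage_medians, load_bearing_extra, worst_stage


# Made-up sample values, for illustration only.
stages = {"dns": [10, 12, 14], "tcp": [20, 22, 24],
          "tls": [30, 31, 32], "http": [40, 45, 50]}
print(localize(stages, load_bearing_ms=[400, 500, 600]))
```

A large value for the load-bearing extra relative to the per-stage medians would point away from the network path towards the endpoints, which matches the server-side observations above, though this sketch cannot tell client from server.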

However, I would be reluctant to dive too deep into localization/troubleshooting/debugging of networks as part of this I-D, as this opens a whole new can of worms. We could then start thinking about sending latency probes while varying the IP TTL to find which router is introducing the latency, ... It's an entirely different research topic IMO :-) Dave Taht was thinking of starting something along these lines (https://github.com/dtaht/wtbb).

>> It seems that it would be reasonable to first define what is being measured, characterized by the responsiveness metric. Having a document that discusses and defines the bufferbloat would be great.
> I agree that there is a lack of definition for what "bufferbloat" really is.
> 
> The way we look at "responsiveness under working conditions" is that it measures the latency in conditions that may realistically happen in worst-case scenarios with end-users/implementations that are non-malicious (non-malicious to exclude the UDP-flooding scenario).
> 
> Thus, I assume we should do a better job of explaining this. The lack of a formal definition of "bufferbloat" doesn't help, and we are indeed using this term a bit freely in the current draft. We will improve the Introduction to better set the stage (https://github.com/network-quality/draft-cpaasch-ippm-responsiveness/issues/31).
> 
>> It seems like in the foundation of the methodology described in the draft lies the assumption that without adding new flows the available bandwidth is constant, does not change. While that is mostly the case, there are technologies that behave differently and may change bandwidth because of the outside conditions. Some of these behaviors of links with variable discrete bandwidth are discussed in, for example, RFC 8330 <https://datatracker.ietf.org/doc/rfc8330/> and RFC 8625 <https://datatracker.ietf.org/doc/rfc8625/>.
> I'm not sure I entirely understand your comment. But let me explain why we are gradually adding new flows:
> 
> 1. TCP implementations usually have a fixed upper bound for the receive window. In some networks that upper bound is lower than the BDP of the network, so the only way to reach full capacity is to use multiple flows.
> 2. Having multiple connections allows full capacity to be reached more quickly in high-RTT networks and thus shortens the test duration.
> 3. In some networks with "random" packet loss, congestion control may get in the way of achieving full capacity. Again, multiple flows work around that.
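
The first point is simple arithmetic: a single TCP flow can carry at most one receive window per round trip, so its throughput is bounded by rwnd / RTT, and roughly ceil(BDP / rwnd) flows are needed to fill the path. The numbers below (1 Gbit/s, 100 ms RTT, a 4 MB receive-window cap) are assumptions chosen for illustration, not values from the draft, and the helper names are hypothetical.

```python
def bdp_bytes(capacity_bps: int, rtt_ms: int) -> int:
    # Bandwidth-delay product in bytes: bits/s * seconds / 8.
    return capacity_bps * rtt_ms // (8 * 1000)


def flows_needed(capacity_bps: int, rtt_ms: int, max_rwnd_bytes: int) -> int:
    # Each flow carries at most one receive window per RTT, so we need
    # enough flows that their combined windows cover the BDP (ceiling division).
    return -(-bdp_bytes(capacity_bps, rtt_ms) // max_rwnd_bytes)


# Assumed example path: 1 Gbit/s, 100 ms RTT, receive window capped at 4 MB.
print(bdp_bytes(1_000_000_000, 100))                # 12500000
print(flows_needed(1_000_000_000, 100, 4_000_000))  # 4
```

With these assumptions a single flow tops out at 4 MB per 100 ms, i.e. 320 Mbit/s, so about four flows are needed to saturate the 1 Gbit/s path.
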
> GIM>> I might have asked several questions at once. Let me clarify what I am looking for:
> As I understand it, the method of creating the "working conditions in a network" is based on certain assumptions. The first seems to be that the bandwidth is symmetrical between the measurement points. The second is that the bandwidth doesn't change for the duration of the measurement session. AFAIK, in access networks neither of these necessarily holds.

We don't assume that bandwidth is symmetrical (assuming you mean uplink/downlink symmetry; please clarify otherwise).

The load-generating algorithm runs independently for uplink and downlink traffic, so it is perfectly fine if the two directions are highly asymmetric.

Regarding the stability of the bandwidth:
You are indeed making a good point: we assume that the bandwidth is, to some extent, stable while ramping up the flows to "working conditions". Admittedly, that assumption does not always hold, and that is one of the reasons why we try hard to keep the test short.
I'm not sure how we could adjust the algorithm for varying bandwidth without introducing too much complexity. I'm open to suggestions :-)
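
To make that stability assumption explicit, here is a simplified ramp-up loop, run separately for each direction: it adds flows in steps and stops once aggregate goodput grows by less than a threshold. The step size, threshold, per-flow cap, flow limit and the simulated goodput function are all assumptions for illustration; this is not the draft's exact algorithm. Because the stop condition compares successive goodput readings, a capacity that changes during the ramp-up (as on the variable-bandwidth links mentioned above) can end the ramp-up too early or too late.

```python
# Hypothetical parameters, for illustration only.
PER_FLOW_LIMIT_MBPS = 40.0  # assumed cap per flow (e.g. receive-window bound)
FLOWS_PER_STEP = 4          # assumed number of flows added per interval
STABLE_THRESHOLD = 0.05     # assumed: <5% goodput growth counts as saturated
MAX_FLOWS = 64              # assumed safety limit


def simulated_goodput(capacity_mbps, n_flows):
    # Stand-in for a real measurement: flows add up until the link is full.
    return min(n_flows * PER_FLOW_LIMIT_MBPS, capacity_mbps)


def ramp_up(capacity_mbps):
    """Add flows until goodput stops growing; assumes constant capacity."""
    n_flows = FLOWS_PER_STEP
    prev = simulated_goodput(capacity_mbps, n_flows)
    while n_flows < MAX_FLOWS:
        n_flows += FLOWS_PER_STEP
        cur = simulated_goodput(capacity_mbps, n_flows)
        if cur < prev * (1 + STABLE_THRESHOLD):
            return n_flows, cur  # goodput plateaued: "working conditions"
        prev = cur
    return n_flows, prev


# Each direction ramps up on its own, so asymmetric capacities are fine.
print("downlink", ramp_up(500.0))  # downlink (16, 500.0)
print("uplink", ramp_up(20.0))     # uplink (8, 20.0)
```
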
> On the other hand, I might have missed how the method of creating the "working conditions" guarantees a symmetrical load between the measurement points.
As mentioned above, we don't assume a symmetrical load. Can you show us where in the draft we give that impression, so we can fix that?

>> 
>> Then, I do not find the motivation for not using time units to express the responsiveness metric convincing:
>>    "Latency" is a poor measure of responsiveness, since it can be hard
>>    for the general public to understand.  The units are unfamiliar
>>    ("what is a millisecond?") and counterintuitive ("100 msec - that
>>    sounds good - it's only a tenth of a second!").
> 
> Can you expand on what exactly is not convincing to you? Do you think that people will misunderstand the metric, or that milliseconds are the right way to communicate responsiveness to the general public?
> GIM>> Let me try. We know the packet delay requirements for AR and VR applications. I believe that gamers are familiar with these numbers too. The same is likely the case for the industrial automation use cases served, for example, by Deterministic Networking.

I can understand that for a technical audience, milliseconds are easy and familiar. A non-technical audience might be more open to accepting a new "higher-is-better" metric. Responsiveness is something new and abstract, so it's natural that it comes with a new unit.

But I fully recognize that that's a controversial topic and can be discussed at length :)
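
For readers who think in milliseconds, converting between the two units is straightforward if one assumes the simple relation RPM (round trips per minute) = 60 s / round-trip time. The draft aggregates several probe measurements before converting, which this sketch does not reproduce, and the function names are hypothetical. For example, the "100 msec" from the quoted draft text corresponds to 600 RPM.

```python
def rpm_from_rtt_ms(rtt_ms: float) -> float:
    # Round trips that fit in one minute (60,000 ms) at the given latency.
    return 60_000.0 / rtt_ms


def rtt_ms_from_rpm(rpm: float) -> float:
    # Inverse conversion: milliseconds per round trip.
    return 60_000.0 / rpm


for rtt in (20, 100, 1000):
    print(f"{rtt} ms -> {rpm_from_rtt_ms(rtt):.0f} RPM")
# 20 ms -> 3000 RPM
# 100 ms -> 600 RPM
# 1000 ms -> 60 RPM
```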

Cheers,
Christoph

> 
> 
> Thanks a lot,
> Christoph
> 
> [1] Many networks also prioritize ICMP pings, so we could observe even more divergent results depending on which protocol is used to measure the latency.
> 
>> 
>> On Mon, Dec 6, 2021 at 7:53 AM Marcus Ihlar <marcus.ihlar=40ericsson.com@dmarc.ietf.org> wrote:
>> Hi IPPM,
>> 
>> This email starts an adoption call for draft-cpaasch-ippm-responsiveness, "Responsiveness under Working Conditions". This document specifies the "RPM Test" for measuring user experience when the network is fully loaded. The intended status of the document is Experimental.
>> 
>> https://datatracker.ietf.org/doc/draft-cpaasch-ippm-responsiveness/
>> https://datatracker.ietf.org/doc/html/draft-cpaasch-ippm-responsiveness-01
>> 
>> This adoption call will last until Monday, December 20. Please review the document, and reply to this email thread to indicate if you think IPPM should adopt this document.
>> 
>> BR,
>> 
>> Marcus
>> 
>> _______________________________________________
>> ippm mailing list
>> ippm@ietf.org
>> https://www.ietf.org/mailman/listinfo/ippm
>