Re: [ippm] Adoption call for draft-cpaasch-ippm-responsiveness

Greg Mirsky <gregimirsky@gmail.com> Wed, 16 February 2022 00:34 UTC

References: <AM0PR07MB4131542BCD0A6DE3F82F1E19E26D9@AM0PR07MB4131.eurprd07.prod.outlook.com> <CA+RyBmU_j9-vR+BnjvhKCDuaWYPZ_Ym96yUJPX0LhGihfsp1ng@mail.gmail.com> <3DC3F6B6-229E-46C6-BD84-2A6A7FE6DD48@apple.com> <CA+RyBmV_+yysquiZ=2PwB=oaqeJmfKV39c3=GE9sxWkb4qTM=Q@mail.gmail.com> <9340CFDA-079C-4490-A01C-EB863D365F8F@apple.com> <CA+RyBmW=xMmj70GymYwbsG0XcDNS64UNSWxGdwy10+KMjuVWww@mail.gmail.com> <A39D7366-201F-4B96-9667-C53582A79E17@apple.com>
In-Reply-To: <A39D7366-201F-4B96-9667-C53582A79E17@apple.com>
From: Greg Mirsky <gregimirsky@gmail.com>
Date: Tue, 15 Feb 2022 16:33:58 -0800
Message-ID: <CA+RyBmXoobNZqgjw5PDW1Ywp6W_x-rv6Ab8tpgHcykKoWjtYrg@mail.gmail.com>
To: Christoph Paasch <cpaasch@apple.com>
Cc: Marcus Ihlar <marcus.ihlar=40ericsson.com@dmarc.ietf.org>, IETF IPPM WG <ippm@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ippm/cgMWPNoOhXaS2HV2vLAQP8Gep2Y>

Hi Christoph,
thank you for giving my comments so much thought. Your understanding is
absolutely correct, and the new section, as you've outlined it, would make
the document even more useful to readers. With that plan in place, I
support the adoption of the draft and will help with review and comments.

Regards,
Greg

On Tue, Feb 15, 2022 at 4:11 PM Christoph Paasch <cpaasch@apple.com> wrote:

> Hello Greg,
>
> On Feb 8, 2022, at 1:10 PM, Greg Mirsky <gregimirsky@gmail.com> wrote:
>
> Hi Christoph,
> apologies for the belated response, and thank you for sharing interesting
> details of your use of the measurement method. I think that if the
> measurement method can not only provide the Round-trips Per Minute (RPM)
> metric but also expose the network-propagation and residential components
> of the round-trip delay, then the scope of the draft would be aligned
> with the charter of the IPPM WG, and I would be in favor of WG adoption
> of the work.
> What do you think? What is the opinion of the authors and the WG?
>
>
> I am assuming that with "residential components" you mean the
> server/client-side contribution to the measured latency, right?
>
> In that case, yes, the method does allow separating these, as latency
> probes are sent both on the load-generating connections and on separate
> connections. The difference between the two represents the "server-side
> contribution" to the latency.
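A rough sketch of that separation (an editorial illustration, not part of the draft's methodology; the function name and the RTT values below are hypothetical):

```python
# Illustrative sketch: estimating the server/client-side share of the
# measured latency from the two probe types the methodology defines.
# All names and numbers here are hypothetical.

def server_side_contribution(rtt_on_load_conns_ms, rtt_on_separate_conns_ms):
    """Difference between probes on the load-generating connections and
    probes on fresh, idle connections, which mostly see network
    queueing delay.  RTTs are in milliseconds."""
    loaded = sum(rtt_on_load_conns_ms) / len(rtt_on_load_conns_ms)
    idle = sum(rtt_on_separate_conns_ms) / len(rtt_on_separate_conns_ms)
    # What remains approximates latency added by the endpoints' own
    # stacks (HTTP/2 framing queues, TCP send buffers, ...).
    return max(0.0, loaded - idle)

print(server_side_contribution([280, 310, 295], [60, 70, 65]))  # → 230.0
```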
>
> I think a section in the draft that explains the different sources of
> latency (network, server, client), how they affect the final RPM number,
> and how one can separate out these components would be helpful. It is
> also important to understand that the results are highly
> implementation-dependent, and explaining that in this section should
> help, I believe.
>
> Would that be in line with what you are looking for?
>
>
> Thanks,
> Christoph
>
>
>
> Regards,
> Greg
>
> On Thu, Jan 6, 2022 at 4:42 PM Christoph Paasch <cpaasch@apple.com> wrote:
>
>> Hello Greg,
>>
>> On Jan 6, 2022, at 3:00 PM, Greg Mirsky <gregimirsky@gmail.com> wrote:
>>
>> Hi Christoph,
>> a happy and healthy New Year to you and All!
>>
>>
>> Happy New Year to you as well!
>>
>> Thank you for your kind consideration of my notes and detailed responses.
>> Please find my follow-up notes in-line below under the GIM>> tag.
>>
>>
>> Thanks for your replies. Please see inline:
>>
>> On Wed, Jan 5, 2022 at 10:52 AM Christoph Paasch <cpaasch@apple.com>
>> wrote:
>>
>>> Hello Greg,
>>>
>>> thanks for your comments. Please see inline:
>>>
>>> On Dec 22, 2021, at 11:43 PM, Greg Mirsky <gregimirsky@gmail.com> wrote:
>>>
>>> Dear Marcus, Authors, et al,
>>> apologies for the belated response.
>>> I've read the draft and have some comments to share with you:
>>>
>>>    - As I understand it, the proposed new responsiveness metric is
>>>    viewed as the single indicator of a bufferbloat condition in a
>>>    network. As I recall, the discussion at the Measuring Network
>>>    Quality for End-Users workshop and on the mailing list
>>>    <https://mailarchive.ietf.org/arch/browse/network-quality-workshop/?gbt=1&index=cuW_1lh4DD22V28EvlPFB_NjjZY>
>>>    indicated that there is no consensus on what behaviors or symptoms
>>>    can reliably signal bufferbloat.
>>>
>>> We are not trying for this responsiveness metric to be "the single
>>> indicator of bufferbloat". Bufferbloat can be measured in many
>>> different ways, and each of these will produce a correct, but
>>> different, result. Thus, "bufferbloat" is whatever the methodology
>>> tries to detect.
>>>
>>> Let me give an example of two methodologies that are both correct but
>>> produce entirely different numbers:
>>>
>>> Suppose we generated the load by flooding the network with UDP traffic
>>> from a fixed 4-tuple and measured latency with parallel ICMP pings.
>>> Then, on an over-buffered FIFO queue we would measure huge latencies
>>> (thus correctly exposing bufferbloat), while on an FQ-codel queue we
>>> would not measure any bufferbloat.
>>>
>>> If, on the other hand, the load-generating traffic changes the source
>>> port for every single UDP packet, then on both the FIFO queue and the
>>> FQ-codel queue we will measure huge amounts of bufferbloat.
>>>
>>> Thus, these two methods both produce correct results, but with hugely
>>> different numbers in the FQ-codel case. [1]
>>>
>>> Now, while both methods measure some variant of bufferbloat, neither
>>> measures a realistic usage of the network.
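The per-flow queueing effect behind this example can be sketched in a few lines (a deliberately simplified stand-in for fq_codel's flow hashing, not the real qdisc implementation):

```python
# Simplified model of a flow-queueing scheduler: the flow tuple is
# hashed to pick one of N queues, so each flow gets its own
# (CoDel-managed) queue. fq_codel defaults to 1024 queues; the hash
# here is only a stand-in.

NUM_QUEUES = 1024

def flow_queue(src_ip, src_port, dst_ip, dst_port, proto="udp"):
    return hash((src_ip, src_port, dst_ip, dst_port, proto)) % NUM_QUEUES

# Fixed 4-tuple flood: every packet lands in the SAME queue, so
# parallel ICMP probes in other queues see no bufferbloat.
fixed = {flow_queue("10.0.0.1", 5000, "192.0.2.1", 443) for _ in range(10000)}

# Rotating the source port per packet spreads the flood over (nearly)
# all queues, so the probes now share queues with it and see the bloat.
rotating = {flow_queue("10.0.0.1", p, "192.0.2.1", 443) for p in range(10000)}

print(len(fixed), len(rotating))  # 1 queue vs. close to all 1024
```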
>>>
>> GIM>> Thank you for the insights. It seems to me that what the method
>> demonstrates is rather the efficiency of the AQM in the network for a
>> particular class of applications.
>>
>>
>> Yes, that is a good description. It is for a "particular class of
>> applications", and we are trying to make this class of applications
>> representative of a "typical user scenario". (Admittedly, we could
>> debate forever about what kinds of applications are representative, and
>> I would love to have that debate :-).)
>>
>> On the point of "efficiency of the AQM": I would go even further and
>> say it's not only the AQM but also the client- and server-side
>> implementations of these applications (as noted further below).
>>
>>
>>
>>>
>>> That is why "Responsiveness under working conditions" tries to clearly
>>> specify how the load is generated and how the latency is measured. It
>>> does not measure "bufferbloat"; it measures "responsiveness under
>>> working conditions" based on the methodology being used (HTTP/2 or
>>> HTTP/3, multiple flows, ...). It does expose bufferbloat that can
>>> happen in the network. It also exposes certain server-side behaviors
>>> that can cause (huge amounts of) additional latency; those behaviors
>>> are typically not called "bufferbloat".
>>>
>> GIM>> Thank you for pointing out that the result of the RTT measurement
>> has two contributing factors - network and server.
>>
>>
>> Yes, servers contribute, as do the client-side implementations. All
>> three (client, network, server) need to work "correctly" to achieve
>> good responsiveness. By the way, as we gather more experience with our
>> methodology in different environments, we find that the biggest portion
>> of latency actually comes from the server side. We see several seconds
>> of latency introduced by the HTTP/2 and TCP implementations.
>>
>> It seems worth enhancing the method to localize each contribution and
>> measure them separately.
>>
>>
>> With latency-measuring probes being sent on both the load-bearing
>> connections and separate connections, and with the separate connections
>> serving to measure DNS/TCP/... individually, the different data points
>> actually allow localizing the latency to some extent.
>>
>> However, I would be reluctant to dive too deep into
>> localization/troubleshooting/debugging of networks as part of this I-D,
>> as this opens a whole new can of worms. We could then start thinking
>> about sending latency probes while playing with the IP TTL to find
>> which router is introducing the latency... It's an entirely different
>> research topic IMO :-) Dave Taht was thinking of starting something
>> along these lines (https://github.com/dtaht/wtbb).
>>
>>
>>>
>>>
>>>    - It seems that it would be reasonable to first define what the
>>>    responsiveness metric actually measures and characterizes. A
>>>    document that discusses and defines bufferbloat would be great.
>>>
>>> I agree that there is a lack of definition for what "bufferbloat" really
>>> is.
>>>
>>> The way we look at "responsiveness under working conditions" is that
>>> it measures the latency under conditions that may realistically occur
>>> in worst-case scenarios with non-malicious end-users/implementations
>>> ("non-malicious" to exclude the UDP-flooding scenario).
>>>
>>> Thus, I assume we should do a better job of explaining this. The lack
>>> of a formal definition of "bufferbloat" doesn't help, and thus we are
>>> indeed using this term a bit freely in the current draft. We will
>>> improve the Introduction to better set the stage (
>>> https://github.com/network-quality/draft-cpaasch-ippm-responsiveness/issues/31
>>> ).
>>>
>>>
>>>    - It seems that the methodology described in the draft rests on the
>>>    assumption that, without adding new flows, the available bandwidth
>>>    is constant. While that is mostly the case, there are technologies
>>>    that behave differently and may change bandwidth because of outside
>>>    conditions. Some of these behaviors of links with variable discrete
>>>    bandwidth are discussed in, for example, RFC 8330
>>>    <https://datatracker.ietf.org/doc/rfc8330/> and RFC 8625
>>>    <https://datatracker.ietf.org/doc/rfc8625/>.
>>>
>>> I'm not sure I entirely understand your comment. But let me explain why
>>> we are gradually adding new flows:
>>>
>>> 1. TCP implementations usually have a fixed upper bound on the receive
>>> window. In some networks that upper bound is lower than the
>>> bandwidth-delay product (BDP) of the path. Thus, the only way to reach
>>> full capacity is to use multiple flows.
>>> 2. Multiple connections reach full capacity more quickly in high-RTT
>>> networks and thus shorten the test duration.
>>> 3. In some networks with "random" packet loss, congestion control may
>>> get in the way of achieving full capacity. Again, multiple flows work
>>> around that.
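Point 1 can be made concrete with back-of-the-envelope arithmetic (the numbers below are illustrative, not taken from the draft):

```python
# Why one TCP connection may not fill the pipe: bytes in flight are
# capped by the receive window, while filling the path requires one
# bandwidth-delay product (BDP) in flight. Illustrative numbers only.
import math

def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product: bytes in flight needed to fill the path."""
    return bandwidth_bps / 8 * rtt_s

def flows_needed(bandwidth_bps, rtt_s, max_rwnd_bytes):
    # Each connection can keep at most one receive window in flight.
    return max(1, math.ceil(bdp_bytes(bandwidth_bps, rtt_s) / max_rwnd_bytes))

# A 1 Gbit/s path with 50 ms RTT needs ~6.25 MB in flight; with a
# 4 MiB receive-window cap a single flow cannot supply that:
print(bdp_bytes(1e9, 0.050))                      # ≈ 6.25e6 bytes
print(flows_needed(1e9, 0.050, 4 * 1024 * 1024))  # → 2
```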
>>>
>> GIM>> I might have asked several questions at once. Let me clarify what
>> I am looking for:
>>
>>    - As I understand it, the method of creating the "working conditions
>>    in a network" is based on certain assumptions. The first seems to be
>>    that the bandwidth is symmetrical between the measurement points;
>>    the second, that the bandwidth doesn't change for the duration of
>>    the measurement session. AFAIK, in access networks, neither is
>>    necessarily always the case.
>>
>> We don't assume that the bandwidth is symmetrical (assuming you mean
>> uplink/downlink symmetry - please clarify otherwise).
>>
>> The load-generating algorithm runs independently for uplink and
>> downlink traffic, and it is perfectly fine when the two are highly
>> asymmetric.
>>
>>
>> Regarding the stability of the bandwidth:
>> You are indeed making a good point that we assume the bandwidth is to
>> some extent stable while ramping up the flows to "working conditions".
>> Admittedly, that assumption does not always hold, and that is one of
>> the reasons we try hard to keep the test from taking too long.
>> I'm not sure how we could adjust the algorithm for varying bandwidth
>> without introducing too much complexity. I'm open to suggestions :-)
>>
>>
>>    - On the other hand, I might have missed how the method of creating
>>    the "working conditions" guarantees a symmetrical load between the
>>    measurement points.
>>
>> As mentioned above, we don't assume a symmetrical load. Can you show us
>> where in the draft we give that impression, so we can fix that?
>>
>>
>>>
>>>    - Then, I find the motivation for not using time units to express
>>>    the responsiveness metric unconvincing:
>>>
>>>    "Latency" is a poor measure of responsiveness, since it can be hard
>>>    for the general public to understand.  The units are unfamiliar
>>>    ("what is a millisecond?") and counterintuitive ("100 msec - that
>>>    sounds good - it's only a tenth of a second!").
>>>
>>>
>>> Can you expand on what exactly is not convincing to you? Do you think
>>> that people will misunderstand the metric, or that milliseconds are
>>> the right way to communicate responsiveness to the general public?
>>>
>> GIM>> Let me try. We know the packet-delay requirements of AR and VR
>> applications, and I believe gamers are familiar with these numbers too.
>> The same is likely true for industrial-automation use cases served, for
>> example, by Deterministic Networking.
>>
>>
>> I can understand that for a technical audience, milliseconds are easy
>> and familiar. A non-technical audience might be more open to accepting
>> a new "higher is better" metric. Responsiveness is something new and
>> abstract, so it's kind of natural that it comes with a new unit.
>>
>> But I fully recognize that this is a controversial topic that can be
>> discussed at length :)
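For reference, the two units carry the same information and convert mechanically; the sketch below is an editorial illustration (60,000 is simply the number of milliseconds per minute), not text from the draft:

```python
# Round-trips Per Minute (RPM) is a "higher is better" reciprocal of
# round-trip latency: 60,000 ms per minute divided by the latency.

def rpm_from_latency_ms(latency_ms):
    return 60_000 / latency_ms

def latency_ms_from_rpm(rpm):
    return 60_000 / rpm

# "100 ms - that sounds good" corresponds to only 600 RPM:
print(rpm_from_latency_ms(100))  # → 600.0
print(rpm_from_latency_ms(10))   # → 6000.0
```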
>>
>>
>> Cheers,
>> Christoph
>>
>>
>>>
>>> Thanks a lot,
>>> Christoph
>>>
>>> [1] And there are many networks that prioritize ICMP pings, so we
>>> could observe even more divergent results depending on which protocol
>>> is used to measure the latency.
>>>
>>>
>>> On Mon, Dec 6, 2021 at 7:53 AM Marcus Ihlar <marcus.ihlar=
>>> 40ericsson.com@dmarc.ietf.org> wrote:
>>>
>>>> Hi IPPM,
>>>>
>>>>
>>>>
>>>> This email starts an adoption call for
>>>> draft-cpaasch-ippm-responsiveness, "Responsiveness under Working
>>>> Conditions". This document specifies the "RPM Test" for measuring user
>>>> experience when the network is fully loaded. The intended status of
>>>> the document is Experimental.
>>>>
>>>>
>>>>
>>>> https://datatracker.ietf.org/doc/draft-cpaasch-ippm-responsiveness/
>>>>
>>>>
>>>> https://datatracker.ietf.org/doc/html/draft-cpaasch-ippm-responsiveness-01
>>>>
>>>>
>>>>
>>>> This adoption call will last until *Monday, December 20*. Please
>>>> review the document, and reply to this email thread to indicate if you
>>>> think IPPM should adopt this document.
>>>>
>>>>
>>>>
>>>> BR,
>>>>
>>>> Marcus
>>>>
>>>>
>>>> _______________________________________________
>>>> ippm mailing list
>>>> ippm@ietf.org
>>>> https://www.ietf.org/mailman/listinfo/ippm
>>>>