Re: [ippm] Adoption call for draft-cpaasch-ippm-responsiveness

Greg Mirsky <gregimirsky@gmail.com> Tue, 08 February 2022 21:10 UTC

To: Christoph Paasch <cpaasch@apple.com>
Cc: Marcus Ihlar <marcus.ihlar=40ericsson.com@dmarc.ietf.org>, IETF IPPM WG <ippm@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ippm/-AHiCxwAss1CUEDyVs9fL04MaHw>

Hi Christoph,
apologies for the belated response, and thank you for sharing the interesting
details of your use of the measurement method. I think that if the
measurement method can not only provide the Round-trips Per Minute (RPM)
metric but also expose the network propagation and residence-time components
of the round-trip delay, then the scope of the draft seems to me to be aligned
with the charter of the IPPM WG, and I'll be in favor of WG adoption of
the work.
What do you think? What is the opinion of the authors and the WG?
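
To make the request concrete, here is a minimal sketch of the kind of breakdown I have in mind. It assumes the round-trip delay under load can be split into a propagation part and a residence part (time spent in queues and in the end hosts). The function and parameter names are hypothetical and not taken from the draft; the only relation assumed from the draft is that RPM counts round trips per minute, i.e. 60 seconds divided by the round-trip time.

```python
def rpm_from_rtt(rtt_seconds: float) -> float:
    """Round trips per minute for a given round-trip time in seconds."""
    if rtt_seconds <= 0:
        raise ValueError("round-trip time must be positive")
    return 60.0 / rtt_seconds


def rpm_breakdown(propagation_s: float, residence_s: float) -> dict:
    """Report RPM under load next to the share of delay due to residence time.

    propagation_s and residence_s are hypothetical inputs: how a real
    measurement would separate them is exactly the open question.
    """
    total = propagation_s + residence_s
    return {
        "rpm_loaded": rpm_from_rtt(total),
        "rpm_propagation_only": rpm_from_rtt(propagation_s),
        "residence_share": residence_s / total,
    }
```

With such a breakdown, a low RPM could be attributed either to a long path or to queuing and processing along it, rather than being reported as a single number.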

Regards,
Greg

On Thu, Jan 6, 2022 at 4:42 PM Christoph Paasch <cpaasch@apple.com> wrote:

> Hello Greg,
>
> On Jan 6, 2022, at 3:00 PM, Greg Mirsky <gregimirsky@gmail.com> wrote:
>
> Hi Christoph,
> a happy and healthy New Year to you and All!
>
>
> Happy New Year to you as well!
>
> Thank you for your kind consideration of my notes and detailed responses.
> Please find my follow-up notes in-line below under the GIM>> tag.
>
>
> Thanks for your replies. Please see inline:
>
> On Wed, Jan 5, 2022 at 10:52 AM Christoph Paasch <cpaasch@apple.com>
> wrote:
>
>> Hello Greg,
>>
>> thanks for your comments. Please see inline:
>>
>> On Dec 22, 2021, at 11:43 PM, Greg Mirsky <gregimirsky@gmail.com> wrote:
>>
>> Dear Marcus, Authors, et al,
>> apologies for the belated response.
>> I've read the draft and have some comments to share with you:
>>
>>    - as I understand it, the proposed new responsiveness metric is
>>    viewed as the single indicator of a bufferbloat condition in a network. As
>>    I recall, the discussion at Measuring Network Quality for End-Users
>>    workshop and on the mailing list
>>    <https://mailarchive.ietf.org/arch/browse/network-quality-workshop/?gbt=1&index=cuW_1lh4DD22V28EvlPFB_NjjZY>
>>    indicated that there's no consensus on which behaviors or symptoms can
>>    reliably signal bufferbloat.
>>
>> We are not trying to make this responsiveness metric "the single
>> indicator of bufferbloat". Bufferbloat can be measured in many different
>> ways, and each of these will produce a correct but different
>> result. Thus, "bufferbloat" is whatever the methodology tries to detect.
>>
>> Let me give an example of two methodologies that are both correct but
>> will produce entirely different numbers:
>>
>> Suppose we decide to generate the load by flooding the network with UDP
>> traffic from a specific 4-tuple and measure latency with parallel ICMP
>> pings. Then, on an over-buffered FIFO queue we would measure huge latencies
>> (thus correctly exposing bufferbloat), while on an FQ-codel queue we would
>> not measure any bufferbloat.
>>
>> If on the other hand, the load-generating traffic is changing the
>> source-port for every single UDP-packet, then in both the FIFO-queue and
>> the FQ-codel queue we will measure huge amounts of bufferbloat.
>>
>> Thus, these two methods both produced correct results but with hugely
>> different numbers in the FQ-codel case. [1]
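
The contrast between the two load patterns above can be sketched with a toy queue model. This is a minimal illustration under loud assumptions, not an FQ-CoDel implementation: the bottleneck drains one packet per tick, the flow-queuing scheduler is plain round-robin over per-flow queues (no CoDel, no hashing collisions, no new-flow priority), and all names are hypothetical.

```python
from collections import OrderedDict, deque


def fifo_wait(backlog_flows):
    """Ticks until a probe leaves a FIFO that already holds the backlog."""
    # The probe sits behind every queued packet, regardless of flow.
    return len(backlog_flows) + 1


def fq_wait(backlog_flows, probe_flow="probe"):
    """Ticks until a probe leaves a round-robin per-flow scheduler.

    backlog_flows lists the flow identifier of each queued load packet.
    """
    queues = OrderedDict()
    for flow in backlog_flows:
        queues.setdefault(flow, deque()).append("data")
    queues.setdefault(probe_flow, deque()).append("probe")

    ticks = 0
    while True:
        # One round: every non-empty flow queue sends one packet.
        for flow in list(queues):
            queue = queues[flow]
            if not queue:
                continue
            ticks += 1
            if queue.popleft() == "probe":
                return ticks


# Load from one 4-tuple versus a new source port for every packet.
single_flow = ["udp-flow"] * 1000
flow_per_packet = [f"udp-port-{i}" for i in range(1000)]
```

In this model the FIFO makes the probe wait behind the whole backlog in both cases, while the flow-queuing scheduler serves the probe almost immediately for `single_flow` and only after the whole backlog for `flow_per_packet`.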
>>
>> Now, while both methods measure some variant of bufferbloat, neither
>> measures a realistic usage of the network.
>>
> GIM>> Thank you for the insights. It seems to me that what the method can
> demonstrate is rather the level of efficiency of the AQM in the network for
> a particular class of applications.
>
>
> Yes, that is a good description. It is for a "particular class of
> applications" and we are trying to make this class of applications
> representative of a "typical user-scenario". (admittedly, we can debate
> forever on what kind of applications are representative and I would love to
> have that debate :-)).
>
> On the point of "efficiency of the AQM": I would go even further and say
> that it's not only the AQM but also the client- and server-side
> implementations of these applications (as noted further below).
>
>
>
>>
>> That is why the "Responsiveness under working conditions" tries to
>> clearly specify how the load is generated and how the latency is being
>> measured. And it does not measure "bufferbloat" but it measures
>> "responsiveness under working conditions" based on the methodology that is
>> being used (using HTTP/2 or HTTP/3, multiple flows, ...). It does expose
>> bufferbloat which can happen in the network. It also exposes certain
>> server-side behaviors that can cause (huge amounts of) additional latency -
>> those behaviors are typically not called "bufferbloat".
>>
> GIM>> Thank you for pointing out that the result of the RTT measurement
> has two contributing factors - network and server.
>
>
> Yes, servers contribute as do the client-side implementations. It's all
> three (client, network, server) that need to work "correctly" to achieve
> good responsiveness. Btw., as we are now gathering more experience with our
> methodology in different environments we find that the biggest portions of
> latency actually come from the server-side. We see several seconds of
> latency introduced by the HTTP/2 and TCP implementations.
>
> It seems worth enhancing the method to localize each contribution and
> measure them separately.
>
>
> With the latency-measuring probes being sent both on load-bearing
> connections and on separate connections, and with the separate connections
> serving to measure DNS/TCP/... individually, the different data points
> actually allow us to localize to some extent.
>
> However, I would be reluctant to dive too deep into
> localization/trouble-shooting/debugging of networks as part of this I-D, as
> this opens a whole new can of worms. We could then start thinking about
> sending latency-probes while playing with the IP TTL to find which router
> is introducing the latency,... It's an entirely different research-topic
> IMO :-) Dave Taht was thinking of starting something along these lines (
> https://github.com/dtaht/wtbb).
>
>
>>
>>
>>    - It seems that it would be reasonable to first define what is being
>>    measured and characterized by the responsiveness metric. Having a
>>    document that discusses and defines bufferbloat would be great.
>>
>> I agree that there is a lack of definition for what "bufferbloat" really
>> is.
>>
>> The way we look at "responsiveness under working conditions" is that it
>> measures the latency in conditions that may realistically happen in
>> worst-case scenarios with end-users/implementations that are non-malicious
>> (non-malicious to exclude the UDP-flooding scenario).
>>
>> Thus, I assume we should do a better job of explaining this. The lack
>> of a formal definition of "bufferbloat" doesn't help and thus we are indeed
>> using this term a bit freely in the current draft. We will improve the
>> Introduction to better set the stage (
>> https://github.com/network-quality/draft-cpaasch-ippm-responsiveness/issues/31
>> ).
>>
>>
>>    - It seems that the methodology described in the draft rests on the
>>    assumption that, without adding new flows, the available bandwidth is
>>    constant. While that is mostly the
>>    case, there are technologies that behave differently and may change
>>    bandwidth because of outside conditions. Some of these behaviors of
>>    links with variable discrete bandwidth are discussed in, for example, RFC
>>    8330 <https://datatracker.ietf.org/doc/rfc8330/> and RFC 8625
>>    <https://datatracker.ietf.org/doc/rfc8625/>.
>>
>> I'm not sure I entirely understand your comment. But let me explain why
>> we are gradually adding new flows:
>>
>> 1. TCP implementations usually have a fixed limit for the upper bound of
>> the receive window. In some networks that upper bound is lower than the BDP
>> of the network. Thus, the only way to reach full capacity is by having
>> multiple flows.
>> 2. Having multiple connections allows reaching full capacity more quickly
>> in high-RTT networks and thus shortens the test duration.
>> 3. In some networks with "random" packet loss, congestion control may
>> get in the way of achieving full capacity. Again, multiple flows will work
>> around that.
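
The first reason in the list above comes down to simple arithmetic: a single flow cannot carry more than one receive window per round trip. A minimal sketch, with hypothetical function names and illustrative numbers that are not taken from the draft:

```python
import math


def bdp_bytes(capacity_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product of the path, in bytes."""
    return capacity_bps * rtt_s / 8


def per_flow_ceiling_bps(rwnd_bytes: int, rtt_s: float) -> float:
    """Upper bound on one flow's throughput: one receive window per RTT."""
    return rwnd_bytes * 8 / rtt_s


def flows_needed(capacity_bps: float, rtt_s: float, rwnd_bytes: int) -> int:
    """Smallest number of window-limited flows whose windows cover the BDP."""
    return max(1, math.ceil(bdp_bytes(capacity_bps, rtt_s) / rwnd_bytes))


# Illustrative case: 1 Gbit/s path, 100 ms RTT, 4 MiB receive-window cap.
example = flows_needed(1e9, 0.1, 4 * 1024 * 1024)
```

In the illustrative case the BDP is 12.5 MB, so a single flow capped at 4 MiB cannot fill the path and `example` evaluates to 3. This ignores congestion control and loss, which are the subject of the second and third reasons.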
>>
> GIM>> I might have asked several questions at once. Let me clarify what I
> am looking for:
>
>    - As I understand it, the method of creating the "working conditions in
>    a network" is based on certain assumptions. The first seems to be that the
>    bandwidth is symmetrical between the measurement points. The second is
>    that the bandwidth doesn't change for the duration of the measurement
>    session. AFAIK, in access networks, neither is necessarily the case.
>
> We don't have the assumption that bandwidth is symmetrical (assuming, you
> mean uplink/downlink symmetry - please clarify otherwise).
>
> The load-generating algorithm runs independently for uplink and downlink
> traffic. And it is perfectly fine when both have huge asymmetry.
>
>
> Regarding the stability of the bandwidth:
> You are making a good point indeed that we assume that the bandwidth is to
> some extent stable while ramping up the flows to "working conditions".
> Admittedly that assumption does not always hold, and that is one of the
> reasons why we try hard for the test to not take too long.
> I'm not sure how we could adjust the algorithm for varying bandwidth
> without introducing too much complexity. I'm open to suggestions :-)
>
>
>    - On the other hand, I might have missed how the method of creating
>    the "working conditions" guarantees a symmetrical load between the
>    measurement points.
>
> As mentioned above, we don't assume a symmetrical load. Can you show us
> where in the draft we give that impression, so we can fix that?
>
>
>>
>>    - Then, I find the motivation for not using time units to express the
>>    responsiveness metric unconvincing:
>>
>>    "Latency" is a poor measure of responsiveness, since it can be hard
>>    for the general public to understand.  The units are unfamiliar
>>    ("what is a millisecond?") and counterintuitive ("100 msec - that
>>    sounds good - it's only a tenth of a second!").
>>
>>
>> Can you expand on what exactly is not convincing to you? Do you think
>> that people will misunderstand the metric or that milliseconds are the
>> right way to communicate responsiveness to the general public?
>>
> GIM>> Let me try. We know packet delay requirements for AR, VR
> applications. I believe that gamers are familiar with these numbers too.
> The same is likely the case for the industrial automation use cases served,
> for example, by Deterministic Networking.
>
>
> I can understand that for a technical audience, milliseconds are easy and
> familiar. A non-technical audience might be more open to accepting a new
> "higher-is-better" metric. Responsiveness is something new and abstract, so
> it's kind of natural that it comes with a new unit.
>
> But I fully recognize that that's a controversial topic and can be
> discussed at length :)
>
>
> Cheers,
> Christoph
>
>
>>
>> Thanks a lot,
>> Christoph
>>
>> [1] And there are many networks that prioritize ICMP pings, thus we could
>> observe even more different results based on what protocol is used to
>> measure the latency.
>>
>>
>> On Mon, Dec 6, 2021 at 7:53 AM Marcus Ihlar <marcus.ihlar=
>> 40ericsson.com@dmarc.ietf.org> wrote:
>>
>>> Hi IPPM,
>>>
>>>
>>>
>>> This email starts an adoption call for
>>> draft-cpaasch-ippm-responsiveness, "Responsiveness under Working
>>> Conditions". This document specifies the "RPM Test" for measuring user
>>> experience when the network is fully loaded. The intended status of the
>>> document is Experimental.
>>>
>>>
>>>
>>> https://datatracker.ietf.org/doc/draft-cpaasch-ippm-responsiveness/
>>>
>>>
>>> https://datatracker.ietf.org/doc/html/draft-cpaasch-ippm-responsiveness-01
>>>
>>>
>>>
>>> This adoption call will last until *Monday, December 20*. Please review
>>> the document, and reply to this email thread to indicate if you think IPPM
>>> should adopt this document.
>>>
>>>
>>>
>>> BR,
>>>
>>> Marcus
>>>
>>>
>>> _______________________________________________
>>> ippm mailing list
>>> ippm@ietf.org
>>> https://www.ietf.org/mailman/listinfo/ippm
>>>
>