Re: [ippm] Adoption call for draft-cpaasch-ippm-responsiveness

Christoph Paasch <> Wed, 05 January 2022 00:23 UTC

From: Christoph Paasch <>
Date: Tue, 04 Jan 2022 16:23:30 -0800
Cc: Marcus Ihlar <>, "" <>, Tommy Pauly <>
To: "MORTON JR., AL" <>
Subject: Re: [ippm] Adoption call for draft-cpaasch-ippm-responsiveness

Hello Al,

thanks a lot for your feedback! Please see inline:

> On Dec 17, 2021, at 3:50 PM, MORTON JR., AL <> wrote:
> Hi authors and ippm-chairs,
> Thanks for writing this-up!
> I took one pass through, and have the following comments during Adoption call for draft-cpaasch-ippm-responsiveness:
> TL;DR:
> Many previously undefined terms were used here, and a more direct description using the term “saturation” seems possible, IMO.

I fully agree with you. We have not done a good job of describing the "working conditions"/"saturation" we are aiming for, why we chose them, and what the right approach is.

The type of "working conditions" is crucial to what the measurement result will be. For example, flooding the network with UDP traffic will saturate the network quite well, but it is far from a realistic working condition.

What we are aiming for is a near worst-case scenario that is still realistic. At least, that is the intention, and it may be good to have an open discussion about this.

> IPPM has used a template for metric drafts, and use of the hierarchy of singleton, sample, and statistic metrics from RFC 2330 will help with clarity/answer many of my questions.
> regards (I’m off-line for a while now, so enjoy the holidays),
> Al
> From the Abstract:
>    This document specifies the "RPM Test" for measuring responsiveness.
>    It uses common protocols and mechanisms to measure user experience
>    especially when the network is fully loaded ("responsiveness under
>    working conditions".)  The measurement is expressed as "Round-trips
>    Per Minute" (RPM) and should be included with throughput (up and
>    down) and idle latency as critical indicators of network quality.
> “fully loaded” and “working conditions” aren’t necessarily the same, to me.  I’ll be looking for better definitions.


> 3 <>.  Goals
>    The algorithm described here defines an RPM Test that serves as a
>    good proxy for user experience.  This means:
>    1.  Today's Internet traffic primarily uses HTTP/2 over TLS.  Thus,
>        the algorithm should use that protocol.
>        As a side note: other types of traffic are gaining in popularity
>        (HTTP/3) and/or <???? UDP ???> are already being used widely (RTP).
> There are many measurement stability challenges when TCP is involved, see section 4 of RFC8337: <>
> RFC8337 intentionally broke the TCP control loop to make measurements in the face of these challenges.

Yes, we are aware of these kinds of stability challenges, and we do sometimes observe them: results can vary to some degree across different runs.

The goal is to get as close as possible to a stable measurement result while still using the protocols that end-users use on a day-to-day basis.

> 4.1 <>.  Working Conditions
>    For the purpose of this methodology, typical "working conditions"
>    represent a state of the network in which the bottleneck node is
>    experiencing ingress and egress flows similar to those created by
>    humans in the typical day-to-day pattern.
>    While a single HTTP transaction might briefly put a network into
>    working conditions, making reliable measurements requires maintaining
>    the state over sufficient time.
>    The algorithm must also detect when the network is in a persistent
>    working condition, also called "saturation".
>    Desired properties of "working condition":
>    o  Should not waste traffic, since the person may be paying for it
>    o  Should finish within a short time to avoid impacting other people
>       on the same network, to avoid varying network conditions, and not
>       try the person's patience.
> These seem like reasonable goals for the traffic that loads the network.
> New terms needing definition were introduced:
> “persistent working condition = saturation”,
> which is different from
> “ingress and egress flows similar to those created by humans in the typical day-to-day pattern”
> Later in 4.1.1, terms like “saturate a path”  and “fill the pipe” appear, and
>    The goal of the RPM Test is to keep the network as busy as possible
>    in a sustained and persistent way.  It uses multiple TCP connections
>    and gradually adds more TCP flows until saturation is reached.
> The terms “busy as possible”, and “typical day-to-day pattern”, or
> “saturation” and “working conditions” indicate different load levels to me.
> @@@@ Suggestion: I think it would help to simplify the terminology in this draft. You intend to measure a saturated path, so just say that. No “typical”, no “working conditions”, etc., in these early sections. 
> The sentence beginning “The goal...” should really appear in Section 3. Goals
> Also, you have defined a measurement method in the sentence, “It uses...” above. This method of adding connections has been observed in other measurement systems, but it isn’t typical of user traffic, especially when each connection has an ~infinite amount of data to send during the test.

From your comments I see that we definitely need a longer explanation of the tradeoffs involved in measuring this near worst-case, yet realistic, scenario. As you correctly point out, we are mixing confusing and sometimes contradictory terms. This needs to be cleaned up.

For all of the above points, I filed: <>

Please feel free to add to the github issue.

> 4.1.2 <>.  Parallel vs Sequential Uplink and Downlink
> ...
>    To measure responsiveness under working conditions, the algorithm
>    must saturate both directions.
> Bi-directional saturation is really atypical of usage. I don’t think the benefit of “more data” pays off.
> ...
>    However, a number of caveats come with measuring in parallel:
>    o  Half-duplex links may not permit simultaneous uplink and downlink
>       traffic.  This means the test might not saturate both directions
>       at once.
>    o  Debuggability of the results becomes harder: During parallel
>       measurement it is impossible to differentiate whether the observed
>       latency happens in the uplink or the downlink direction.
>    o  Consequently, the test should have an option for sequential
>       testing.
> @@@@ Suggestion: IMO, tests/results with Downlink saturation OR Uplink saturation would be more straightforward, and can be understood by users (especially those who have tested in the past). Avoid the pitfalls and make Sequential testing the preferred option.

I tend to agree with you. We could keep "parallel" as an optional extension to the test, one that can expose other characteristics of the network.

> 4.1.3 <>.  Reaching saturation
>    The RPM Test gradually increases the number of TCP connections and
>    measures "goodput" - the sum of actual data transferred across all
>    connections in a unit of time.  When the goodput stops increasing, it
>    means that saturation has been reached.
> ...
>    Filling buffers at the bottleneck depends on the congestion control
>    deployed on the sender side.  Congestion control algorithms like BBR
>    may reach high throughput without causing queueing because the
>    bandwidth detection portion of BBR effectively seeks the bottleneck
>    capacity.
>    RPM Test clients and servers should use loss-based congestion
>    controls like Cubic to fill queues reliably.
> With the evolution of Congestion control algorithms seeking to avoid filling buffers, does it make sense to require a full buffer at the bottleneck to achieve saturation?
> In fact, the definition above, “When the goodput stops increasing,...” does not require full buffers; it requires maximizing a delivery rate measurement instead.

The above paragraph on BBR vs. Cubic should probably be changed. Since our goal is to measure "realistic" usage patterns, the recommendation should be to use whichever congestion control is currently most widely deployed. If the majority of the Internet switches to BBR, then that is what should be measured.

> In 4.1.4, the final steps of the algorithm were not clear to me:
>       *  Else, network reached saturation for the current flow count.
> @@@@ This wording implies it to be the final step, but there are further conditions to test.
>      Maybe this step is “Else, Candidate for stable saturation”?

Sounds good!

>          +  If new flows added and for 4 seconds the moving average
>             throughput did not change: network reached stable saturation
> @@@@ Maybe: 
>          +  If the 4 second moving average of "instantaneous aggregate goodput" with no new 
>             flows added did not change 
>             (defined as: moving average = "previous" moving average +/- 5%),
>             then the network reached stable saturation

That's better!

> ----------------------------------------------------------------------------------------------------
>          +  Else, add four more flows
> @@@ ??? and return to start?

Yes, the entire check is re-evaluated at every 1-second interval. I will make that explicit.
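To make that concrete, here is a rough sketch of the per-second evaluation. This is illustrative only: the helper names (`measure_goodput`, `add_flows`) and the +/-5% tolerance are assumptions pieced together from this thread, not the draft's normative algorithm.

```python
def reach_saturation(measure_goodput, add_flows,
                     window=4, tolerance=0.05, max_seconds=40):
    """Ramp up TCP flows until the `window`-second moving average of
    aggregate goodput stops changing (within +/- `tolerance`)."""
    samples = []                 # one aggregate-goodput sample per second
    prev_avg = None
    stable_seconds = 0
    for _ in range(max_seconds):                 # re-run every 1-second interval
        samples.append(measure_goodput())        # instantaneous aggregate goodput
        recent = samples[-window:]
        cur_avg = sum(recent) / len(recent)      # 4-second moving average
        if prev_avg is not None and abs(cur_avg - prev_avg) <= tolerance * prev_avg:
            stable_seconds += 1                  # candidate for stable saturation
            if stable_seconds >= window:         # unchanged for 4 seconds
                return cur_avg                   # stable saturation reached
        else:
            stable_seconds = 0
            add_flows(4)                         # else: add four more flows
        prev_avg = cur_avg
    return cur_avg                               # give up after the time limit
```

With a goodput that ramps up and then plateaus, the sketch keeps adding flows until the moving average flattens out, then declares stable saturation.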

> Finally, in 4.1.4, the Note explains:
>    Note: It is tempting to envision an initial base RTT measurement and
>    adjust the intervals as a function of that RTT.  However, experiments
>    have shown that this makes the saturation detection extremely
>    unstable in low RTT environments.  In the situation where the
>    "unloaded" RTT is in the single-digit millisecond range, yet the
>    network's RTT increases under load to more than a hundred
>    milliseconds, the intervals become much too low to accurately drive
>    the algorithm.
> Well, TCP senders/control-loops are involved here, and likely play a
> role in behavior categorized as “difficult to measure”.
> By the time we get to
> 4.2 <>.  Measuring Responsiveness
>    Once the network is in a consistent working conditions, the RPM Test
>    must "probe" the network multiple times to measure its
>    responsiveness.
>    Each RPM Test probe measures:
> You previously started at least four TCP connections with infinitely large files.
> The “create connection” RPM probes establish additional connections, DNS, TCP, etc.
> Is each new connection an RPM probe? or is the set of connection tests a single probe?
> (later we learn it is the set of connection tests)
> What if one of the set of connections fails/times-out?
> I take it that the “load-bearing” connections are driving the path to saturation.
> Maybe “load-generating connections” is more clear?

"load-generating" is indeed a better term!

For all of the above "minor" edits, I filed <> 

> 4.2.1 <>.  Aggregating the Measurements
>    The algorithm produces sets of 5 times for each probe, namely: DNS
>    handshake, TCP handshake, TLS handshake, HTTP/2 request/response on
>    separate (idle) connections, HTTP/2 request/response on load bearing
>    connections.  This fine-grained data is useful, but not necessary for
>    creating a useful metric.
> @@@@ So, only ONE of the load-generating connections runs the 1-byte GET ?  (it says connections, and there are at least 4) The various handshakes result in 4 RT measurements.
> But, do you mean *5 repeated measurement sets*? Each set with:
> DNS HS, TCP HS, TLS HS, HTTP/2 idle GET, and the potentially much longer GET on the 
> load generating connections.
>    To create a single "Responsiveness" (e.g., RPM) number, this first
>    iteration of the algorithm gives an equal weight to each of these
>    values.  That is, it sums the five time values for each probe, and
>    divides by the total number of probes to compute an average probe
>    duration.  The reciprocal of this, normalized to 60 seconds, gives
>    the Round-trips Per Minute (RPM).
> @@@@ I’m missing a step, I think:
> Are the “time values for each probe” the sum of handshake or response times for
> DNS HS, TCP HS, TLS HS, HTTP/2 idle GET and load generating connections?
> The processing doesn’t seem to include this preliminary calculation to produce 
> “five time values”.  
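One reading of the aggregation quoted above, as a quick sketch: give equal weight to all of the individual time values, average them, and normalize the reciprocal to a minute. The timing values and key names below are made up for illustration.

```python
# Sketch of the 4.2.1 aggregation: equal weight to each of the five
# per-probe time values, average over all values, then convert the
# reciprocal to round-trips per minute. All numbers are hypothetical.

def rpm(probes):
    """probes: list of dicts with the five time values, in seconds."""
    keys = ("dns", "tcp", "tls", "http_idle", "http_loaded")
    durations = [p[k] for p in probes for k in keys]   # flatten all values
    avg_seconds = sum(durations) / len(durations)      # average probe duration
    return 60.0 / avg_seconds                          # normalize to one minute

# Hypothetical probe measurements (seconds):
probes = [
    {"dns": 0.020, "tcp": 0.030, "tls": 0.050, "http_idle": 0.040, "http_loaded": 0.160},
    {"dns": 0.025, "tcp": 0.035, "tls": 0.055, "http_idle": 0.045, "http_loaded": 0.140},
]
print(round(rpm(probes)))   # -> 1000
```

Whether "divides by the total number of probes" means averaging all individual values (as above) or averaging per-probe sums is exactly the ambiguity being pointed out; the two readings give different RPM numbers.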
> In Section 5, “no new protocol is defined”, but
>    The client begins the responsiveness measurement by querying for the
>    JSON configuration.  This supplies the URLs for creating the load
>    bearing connections in the upstream and downstream direction as well
>    as the small object for the latency measurements.
> The client needs to know how the server response will be organized, down to key: value, right?
> Some client and server agreements needed...

Yes! By "protocol" we mean nothing new at the HTTP/TCP/... layers, just standard JSON. I don't know whether that qualifies as a new "protocol" ;-)
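For illustration, a JSON configuration along the lines described in Section 5 might look like the following. The key names here are invented for the example, not taken from the draft; the point is simply that, as noted above, client and server must agree on them key-by-key.

```python
# Hypothetical shape of the server's JSON configuration from Section 5:
# URLs for the load-generating connections plus the small object used
# for latency probes. Key names are invented for illustration.
import json

config_text = """
{
  "version": 1,
  "urls": {
    "large_download_url": "https://example.com/responsiveness/large",
    "upload_url": "https://example.com/responsiveness/upload",
    "small_download_url": "https://example.com/responsiveness/small"
  }
}
"""

config = json.loads(config_text)
urls = config["urls"]
print(sorted(urls))   # the client still has to know these key names
```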

Thanks a lot,

> From: ippm <> On Behalf Of Marcus Ihlar
> Sent: Monday, December 6, 2021 10:53 AM
> To:
> Subject: [ippm] Adoption call for draft-cpaasch-ippm-responsiveness
> Hi IPPM,
> This email starts an adoption call for draft-cpaasch-ippm-responsiveness, "Responsiveness under Working Conditions”. This document specifies the “RPM Test” for measuring user experience when the network is fully loaded. The intended status of the document is Experimental.   
> This adoption call will last until Monday, December 20. Please review the document, and reply to this email thread to indicate if you think IPPM should adopt this document.
> BR,
> Marcus 
> _______________________________________________
> ippm mailing list