Re: [ippm] Adoption call for draft-cpaasch-ippm-responsiveness

Matt Mathis <mattmathis@google.com> Mon, 20 December 2021 03:39 UTC

From: Matt Mathis <mattmathis@google.com>
Date: Sun, 19 Dec 2021 19:38:45 -0800
Message-ID: <CAH56bmDJfN02sey=gHy+H0YJab3oyOFrYmFs2er=1NjEpE7m3g@mail.gmail.com>
To: "MORTON JR., AL" <acmorton@att.com>
Cc: "ippm@ietf.org" <ippm@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ippm/iwRVAIFl-mNm_hEPCmeDw83y7zQ>
Subject: Re: [ippm] Adoption call for draft-cpaasch-ippm-responsiveness

I support this draft, and I have even offered to be a co-author.

Thank you, Al, for the extensive comments. I only have a moment for some
high-level responses:

In some sense this document is the opposite of RFC8337. Responsiveness is
about measuring how well (and how early) the bottleneck signals congestion to
the transport without incurring excessive queueing. This depends not only on
the loop behavior of the bottleneck and the measurement stream, but also on
the loop behavior, population, and RTTs of the cross traffic. I do not expect
any of the theory to be mature enough to do any of this as abstractly as IPPM
does with most metrics. At this time I expect a responsiveness metric to
require fingerprinting the hardware, OS, and application software, and to
include a warning that results may be sensitive to the details of the
implementation.

WiFi is a poorly modeled half-duplex shared broadcast channel that is a
critical (dominant?) part of responsiveness for typical Internet users.
Bidirectional traffic matters a lot, but I fear the scope might be too broad.

Thanks,
--MM--
The best way to predict the future is to create it.  - Alan Kay

We must not tolerate intolerance;
       however our response must be carefully measured:
            too strong would be hypocritical and risks spiraling out of control;
            too weak risks being mistaken for tacit approval.


On Fri, Dec 17, 2021 at 3:51 PM MORTON JR., AL <acmorton@att.com> wrote:

> Hi authors and ippm-chairs,
>
>
>
> Thanks for writing this-up!
>
>
>
> I took one pass through, and have the following comments during Adoption
> call for draft-cpaasch-ippm-responsiveness:
>
>
>
> TL;DR:
>
> Many previously undefined terms were used here, and a more direct
> description using the term “saturation” seems possible, IMO. IPPM has used
> a template for metric drafts, and use of the hierarchy of singleton,
> sample, and statistic metrics from RFC 2330 will help with clarity/answer
> many of my questions.
>
>
>
> regards (I’m off-line for a while now, so enjoy the holidays),
>
> Al
>
>
>
> From the Abstract:
>
>
>
>    This document specifies the "RPM Test" for measuring responsiveness.
>
>    It uses common protocols and mechanisms to measure user experience
>
>    *especially when the network is fully loaded ("responsiveness under*
>
> *   working conditions"*.)  The measurement is expressed as "Round-trips
>
>    Per Minute" (RPM) and should be included with throughput (up and
>
>    down) and idle latency as critical indicators of network quality.
>
>
>
> “fully loaded” and “working conditions” aren’t necessarily the same, to me.
> I’ll be looking for better definitions.
>
>
>
> 3 <https://datatracker.ietf.org/doc/html/draft-cpaasch-ippm-responsiveness-01#section-3>.  Goals
>
>
>
>    The algorithm described here defines an RPM Test that serves as a
>
>    good proxy for user experience.  This means:
>
>
>
>    1.  Today's Internet traffic primarily uses HTTP/2 over TLS.  Thus,
>
>        the algorithm should use that protocol.
>
>
>
>        As a side note: other types of traffic are gaining in popularity
>
>        (HTTP/3) and/or *<???? UDP ???>* are already being used widely (RTP).
>
>
>
> There are many measurement stability challenges when TCP is involved, see
> section 4 of RFC8337:
> https://datatracker.ietf.org/doc/html/rfc8337#section-4
>
> RFC8337 intentionally broke the TCP control loop to make measurements in
> the face of these challenges.
>
>
>
> 4.1 <https://datatracker.ietf.org/doc/html/draft-cpaasch-ippm-responsiveness-01#section-4.1>.  Working Conditions
>
>
>
>    For the purpose of this methodology, typical "working conditions"
>
>    represent a state of the network in which the bottleneck node is
>
>    experiencing ingress and egress flows similar to those created by
>
>    humans in the typical day-to-day pattern.
>
>
>
>    While a single HTTP transaction might briefly put a network into
>
>    working conditions, making reliable measurements requires maintaining
>
>    the state over sufficient time.
>
>
>
>    The algorithm must also detect when the network is in a persistent
>
>    working condition, also called "saturation".
>
>
>
>    Desired properties of "working condition":
>
>
>
>    o  Should not waste traffic, since the person may be paying for it
>
>
>
>    o  Should finish within a short time to avoid impacting other people
>
>       on the same network, to avoid varying network conditions, and not
>
>       try the person's patience.
>
>
>
> These seem like reasonable goals for the traffic that loads the network.
>
> New terms needing definition were introduced:
>
> “persistent working condition = saturation”,
>
> which is different from
>
> “ingress and egress flows similar to those created by humans in the
> *typical* day-to-day pattern”
>
>
>
> Later in 4.1.1, terms like “saturate a path”  and “fill the pipe” appear,
> and
>
>
>
>    *The goal of the RPM Test is to keep the network as busy as possible*
>
>    in a sustained and persistent way.  It uses multiple TCP connections
>
>    and gradually adds more TCP flows until saturation is reached.
>
>
>
> The terms “busy as possible”, and “typical day-to-day pattern”, or
>
> “saturation” and “working conditions” indicate different load levels to me.
>
>
>
> @@@@ Suggestion: I think it would help to simplify the terminology in this
> draft. You intend to measure a saturated path, so just say that. No
> “typical”, no “working conditions”, etc., in these early sections.
>
>
>
> The sentence beginning “The goal...” should really appear in Section 3.
> Goals
>
>
>
> Also, you have defined a measurement method in the sentence, “It uses...”
> above. This method of adding connections has been observed in other
> measurement systems, but it isn’t typical of user traffic, especially when
> each connection has an ~infinite amount of data to send during the test.
>
>
>
> 4.1.2 <https://datatracker.ietf.org/doc/html/draft-cpaasch-ippm-responsiveness-01#section-4.1.2>.  Parallel vs Sequential Uplink and Downlink
>
>
>
> ...
>
>    To measure responsiveness under working conditions, the algorithm
>
>    must saturate both directions.
>
>
>
> Bi-directional saturation is really atypical of usage. I don’t think the benefit of “more data” pays off.
>
>
>
> ...
>
>
>
>    However, a number of caveats come with measuring in parallel:
>
>
>
>    o  Half-duplex links may not permit simultaneous uplink and downlink
>
>       traffic.  This means the test might not saturate both directions
>
>       at once.
>
>
>
>    o  Debuggability of the results becomes harder: During parallel
>
>       measurement it is impossible to differentiate whether the observed
>
>       latency happens in the uplink or the downlink direction.
>
>
>
>    o  Consequently, the test should have an option for sequential
>
>       testing.
>
>
>
> @@@@ Suggestion: IMO, tests/results with Downlink saturation OR Uplink
> saturation would be more straightforward, and can be understood by users
> (especially those who have tested in the past). Avoid the pitfalls and make
> Sequential testing the preferred option.
>
>
>
>
>
> 4.1.3 <https://datatracker.ietf.org/doc/html/draft-cpaasch-ippm-responsiveness-01#section-4.1.3>.  Reaching saturation
>
>
>
>    The RPM Test gradually increases the number of TCP connections and
>
>    measures "goodput" - the sum of actual data transferred across all
>
>    connections in a unit of time.  When the goodput stops increasing, it
>
>    means that saturation has been reached.
>
> ...
>
>
>
>    Filling buffers at the bottleneck depends on the congestion control
>
>    deployed on the sender side.  Congestion control algorithms like BBR
>
>    may reach high throughput without causing queueing because the
>
>    bandwidth detection portion of BBR effectively seeks the bottleneck
>
>    capacity.
>
>
>
>    RPM Test clients and servers should use loss-based congestion
>
>    controls like Cubic to fill queues reliably.
>
>
>
> With the evolution of Congestion control algorithms seeking to avoid filling buffers, does it make sense to require a full buffer at the bottleneck to achieve saturation?
>
> In fact, the definition above, “When the goodput stops increasing,...” does not require full buffers; it requires maximizing a delivery rate measurement instead.
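
To make the "goodput" definition quoted above concrete, here is a minimal sketch. It assumes the test can read a per-connection counter of application bytes delivered at the start and end of an interval; the function name and arguments are hypothetical and do not come from the draft.

```python
def aggregate_goodput(bytes_before, bytes_after, interval_s):
    """Aggregate goodput in bits per second over one measurement interval.

    bytes_before / bytes_after: per-connection application byte counters
    sampled at the start and end of the interval (assumed to be available).
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Sum the bytes delivered on every connection during the interval.
    delivered = sum(after - before
                    for before, after in zip(bytes_before, bytes_after))
    return delivered * 8 / interval_s
```

Note that nothing in this calculation looks at queue occupancy: it only tracks a delivery rate, which is the point made in the comment above.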
>
>
>
>
>
> In 4.1.4, the final steps of the algorithm were not clear to me:
>
>
>
>       *  Else, network reached saturation for the current flow count.
>
> @@@@ This wording implies it to be the final step, but there are further conditions to test.
>
>      Maybe this step is “Else, Candidate for stable saturation”?
>
>
>
>          +  If new flows added and for 4 seconds the moving average
>
>             throughput did not change: network reached stable saturation
>
> @@@@ Maybe:
>
>          +  If the 4 second moving average of "instantaneous aggregate goodput" with no new
>
>             flows added did not change
>
>             (defined as: moving average = "previous" moving average +/- 5%),
>
>             then the network reached stable saturation
>
> ----------------------------------------------------------------------------------------------------
>
>
>
>          +  Else, add four more flows
>
> @@@ ??? and return to start?
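
The stability check suggested above could be sketched roughly as follows. This is one reading of the proposed wording, not the draft's algorithm: it assumes one aggregate goodput sample per second, a 4-sample moving average, "did not change" meaning within +/- 5% of the previous moving average, and four flows added otherwise. All names are hypothetical.

```python
TOLERANCE = 0.05     # "did not change" taken to mean within +/- 5%
FLOWS_PER_STEP = 4   # flows added while goodput is still changing


def moving_average(samples):
    """Plain arithmetic mean of the most recent goodput samples."""
    return sum(samples) / len(samples)


def is_stable(previous_avg, current_avg, tolerance=TOLERANCE):
    """True when current_avg lies within previous_avg +/- tolerance."""
    return abs(current_avg - previous_avg) <= tolerance * previous_avg


def next_step(previous_avg, recent_samples, flow_count):
    """One pass of the check: return (stable_saturation, new_flow_count).

    previous_avg is None on the first pass, when there is nothing to
    compare against yet.
    """
    current_avg = moving_average(recent_samples)
    if previous_avg is not None and is_stable(previous_avg, current_avg):
        return True, flow_count
    # Still changing: add four more flows and run the check again.
    return False, flow_count + FLOWS_PER_STEP
```

In this sketch, "return to start" simply means calling `next_step` again with the new flow count once another window of samples has been collected.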
>
>
>
>
>
> Finally, in 4.1.4, the Note explains:
>
>
>
>    Note: It is tempting to envision an initial base RTT measurement and
>
>    adjust the intervals as a function of that RTT.  However, experiments
>
>    have shown that this makes the saturation detection extremely
>
>    unstable in low RTT environments.  In the situation where the
>
>    "unloaded" RTT is in the single-digit millisecond range, yet the
>
>    network's RTT increases under load to more than a hundred
>
>    milliseconds, the intervals become much too low to accurately drive
>
>    the algorithm.
>
>
>
> Well, TCP senders/control-loops are involved here, and likely play a
>
> role in behavior categorized as “difficult to measure”.
>
>
>
> By the time we get to
>
>
>
> 4.2 <https://datatracker.ietf.org/doc/html/draft-cpaasch-ippm-responsiveness-01#section-4.2>.  Measuring Responsiveness
>
>
>
>    Once the network is in a consistent working conditions, the RPM Test
>
>    must "probe" the network multiple times to measure its
>
>    responsiveness.
>
>
>
>    Each RPM Test probe measures:
>
>
>
> You previously started at least four TCP connections with infinitely large
> files.
>
> The “create connection” RPM probes establish additional connections, DNS,
> TCP, etc.
>
> Is each new connection an RPM probe? or is the set of connection tests a
> single probe?
>
> (later we learn it is the set of
>
> What if one of the set of connections fails/times-out?
>
>
>
> I take it that the “load-bearing” connections are driving the path to
> saturation.
>
> Maybe “load-generating connections” is more clear?
>
>
>
> 4.2.1 <https://datatracker.ietf.org/doc/html/draft-cpaasch-ippm-responsiveness-01#section-4.2.1>.  Aggregating the Measurements
>
>
>
>    The algorithm produces sets of 5 times for each probe, namely: DNS
>
>    handshake, TCP handshake, TLS handshake, HTTP/2 request/response on
>
>    separate (idle) connections, HTTP/2 request/response on load bearing
>
>    connections.  This fine-grained data is useful, but not necessary for
>
>    creating a useful metric.
>
>
>
> @@@@ So, only ONE of the load-generating connections runs the 1-byte GET ?  (it says connections, and there are at least 4) The various handshakes result in 4 RT measurements.
>
>
>
> But, do you mean **5 repeated measurement sets**? Each set with:
>
> DNS HS, TCP HS, TLS HS, HTTP/2 idle GET, and the potentially much longer GET on the
>
> load generating connections.
>
>
>
>    To create a single "Responsiveness" (e.g., RPM) number, this first
>
>    iteration of the algorithm gives an equal weight to each of these
>
>    values.  That is, it sums the five time values for each probe, and
>
>    divides by the total number of probes to compute an average probe
>
>    duration.  The reciprocal of this, normalized to 60 seconds, gives
>
>    the Round-trips Per Minute (RPM).
>
>
>
> @@@@ I’m missing a step, I think:
>
> Are the “time values for each probe” the sum of handshake or response times for
>
> DNS HS, TCP HS, TLS HS, HTTP/2 idle GET and load generating connections?
>
> The processing doesn’t seem to include this preliminary calculation to produce
>
> “five time values”.
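
For reference, a literal reading of the quoted aggregation text can be sketched as below. It assumes each probe yields exactly five durations in seconds (DNS, TCP, TLS, HTTP/2 on an idle connection, HTTP/2 on a load-bearing connection) and that the "average probe duration" is the mean of the per-probe sums. Other readings, such as also averaging over the five values, would give a different number; the function name is hypothetical.

```python
def rpm(probes):
    """Round-trips Per Minute from a list of probes.

    probes: list of 5-tuples of durations in seconds, one tuple per probe.
    """
    if not probes:
        raise ValueError("at least one probe is required")
    # Sum the five time values of each probe, then average over probes.
    total = sum(sum(times) for times in probes)
    average_probe_duration = total / len(probes)
    # Reciprocal, normalized to 60 seconds.
    return 60.0 / average_probe_duration
```

For example, two probes whose five values each sum to 0.2 seconds give an average probe duration of 0.2 seconds and an RPM of about 300 under this reading.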
>
>
>
>
>
> In Section 5, “no new protocol is defined”, but
>
>
>
>    The client begins the responsiveness measurement by querying for the
>
>    JSON configuration.  This supplies the URLs for creating the load
>
>    bearing connections in the upstream and downstream direction as well
>
>    as the small object for the latency measurements.
>
>
>
> The client needs to know how the server response will be organized, down
> to key: value, right?
>
> Some client and server agreements needed...
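
The kind of client/server agreement in question can be illustrated with a small sketch of a client validating the configuration. The key names below ("urls", "large_download_url", "upload_url", "small_download_url") are placeholders invented for this example; the draft would have to specify the real ones.

```python
import json

# Hypothetical key names: the client and server must agree on these.
REQUIRED_KEYS = ("large_download_url", "upload_url", "small_download_url")


def parse_config(text):
    """Parse the JSON configuration and return the three URLs as a dict.

    Raises ValueError when the document does not have the agreed shape.
    """
    config = json.loads(text)
    if not isinstance(config, dict) or not isinstance(config.get("urls"), dict):
        raise ValueError("configuration must be an object with a 'urls' object")
    urls = config["urls"]
    missing = [key for key in REQUIRED_KEYS if key not in urls]
    if missing:
        raise ValueError("configuration is missing: " + ", ".join(missing))
    return {key: urls[key] for key in REQUIRED_KEYS}
```

A client written this way fails as soon as the server's response deviates from the agreed keys, which is effectively a (small) protocol definition.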
>
>
>
>
>
> *From:* ippm <ippm-bounces@ietf.org> *On Behalf Of *Marcus Ihlar
> *Sent:* Monday, December 6, 2021 10:53 AM
> *To:* ippm@ietf.org
> *Subject:* [ippm] Adoption call for draft-cpaasch-ippm-responsiveness
>
>
>
> Hi IPPM,
>
>
>
> This email starts an adoption call for draft-cpaasch-ippm-responsiveness,
> “Responsiveness under Working Conditions”. This document specifies the “RPM
> Test” for measuring user experience when the network is fully loaded. The
> intended status of the document is Experimental.
>
>
>
> https://datatracker.ietf.org/doc/draft-cpaasch-ippm-responsiveness/
>
> https://datatracker.ietf.org/doc/html/draft-cpaasch-ippm-responsiveness-01
>
>
>
> This adoption call will last until *Monday, December 20*. Please review
> the document, and reply to this email thread to indicate if you think IPPM
> should adopt this document.
>
>
>
> BR,
>
> Marcus
>
>