Re: [ippm] draft-ietf-ippm-responsiveness

Sebastian Moeller <moeller0@gmx.de> Fri, 19 January 2024 13:14 UTC

From: Sebastian Moeller <moeller0@gmx.de>
In-Reply-To: <7494CC8D-7BAB-41DB-9FF7-7306747F2DC9@apple.com>
Date: Fri, 19 Jan 2024 14:14:09 +0100
Cc: IETF IPPM WG <ippm@ietf.org>, Rpm <rpm@lists.bufferbloat.net>
Content-Transfer-Encoding: quoted-printable
Message-Id: <14EC339A-9A84-40C5-AFCC-474DF03C16B6@gmx.de>
References: <D7323D41-BA9B-46E5-AA7D-6514636AA44D@gmx.de> <7494CC8D-7BAB-41DB-9FF7-7306747F2DC9@apple.com>
To: Christoph Paasch <cpaasch@apple.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ippm/9XPdon89Ak83oJHK0u1NfopzbgY>
Subject: Re: [ippm] draft-ietf-ippm-responsiveness

Hi Christoph

> On 16. Jan 2024, at 20:01, Christoph Paasch <cpaasch@apple.com> wrote:
> 
> Hello Sebastian,
> 
> 
> thanks for the feedback, please see inline!
> 
>> On Dec 3, 2023, at 10:13 AM, Sebastian Moeller <moeller0@gmx.de> wrote:
>> 
>> Dear IPPM members,
>> 
>> On re-reading the current responsiveness draft I stumbled over the following section:
>> 
>> 
>> Parallel vs Sequential Uplink and Downlink
>> 
>> Poor responsiveness can be caused by queues in either (or both) the upstream and the downstream direction. Furthermore, both paths may differ significantly due to access link conditions (e.g., 5G downstream and LTE upstream) or routing changes within the ISPs. To measure responsiveness under working conditions, the algorithm must explore both directions.
>> 
>> One approach could be to measure responsiveness in the uplink and downlink in parallel. It would allow for a shorter test run-time.
>> 
>> However, a number of caveats come with measuring in parallel:
>> 
>> • Half-duplex links may not permit simultaneous uplink and downlink traffic. This restriction means the test might not reach the path's capacity in both directions at once and thus not expose all the potential sources of low responsiveness.
>> • Debuggability of the results becomes harder: During parallel measurement it is impossible to differentiate whether the observed latency happens in the uplink or the downlink direction.
>> Thus, we recommend testing uplink and downlink sequentially. Parallel testing is considered a future extension.
>> 
>> 
>> I argue that this is not the correct diagnosis and hence not the correct decision.
>> For half-duplex links the given argument is not incorrect, but incomplete. When such a link is forced to multiplex more bi-directional traffic, it is quite likely that we will see different "potential sources of low responsiveness", so ignoring either of the two modes seems ill-advised. (All TCP testing is bi-directional, so we only argue about the amount of reverse traffic, not whether it exists, and even if we switched to QUIC/UDP we would still need a feedback channel.)
> 
> You are saying that parallel bi-directional traffic exposes different sources of responsiveness issues than uni-directional traffic (up and down)? What kind of different sources would that expose? Can you give some examples and maybe a suggestion on how to word things?

[SM] If the bottleneck is a WiFi link, we occasionally see that some OSes are more aggressive than others in acquiring airtime, which easily results in different throughput for the two directions and often higher queueing delay for the direction that is 'slowed' down. In theory that should not really happen, but in practice it does, e.g. when the ISP unhelpfully passes undesired DSCP marks into a home network, where WiFi WMM then acts upon them. To elaborate, Comcast for a long time had an issue where large fractions (IIRC up to 25%) of packets were inadvertently marked as CS1, which in default WMM translates to AC_BK. If the client sends its upload traffic via the default AC_BE, this difference in AC usage can result in different queueing delay under bi-directional load compared to looking at upload and download individually. (If all traffic on a channel uses AC_BK instead of AC_BE, this should not affect latency much.)
Side-note: Comcast, after being alerted, took notice of the issue and fixed it, but I think this kind of issue can happen to other ISPs as well.
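
To make the CS1 to AC_BK step concrete, here is a minimal sketch. It assumes the commonly used default in which the 802.1D user priority is taken from the top three bits of the DSCP and then mapped to a WMM access category; actual devices may be configured differently, and the names UP_TO_AC and dscp_to_ac are only illustrative.

```python
# Assumed default mapping: user priority (UP) = top three bits of the DSCP,
# then UP -> WMM access category. Real APs and drivers may override this.
UP_TO_AC = {
    0: "AC_BE", 1: "AC_BK", 2: "AC_BK", 3: "AC_BE",
    4: "AC_VI", 5: "AC_VI", 6: "AC_VO", 7: "AC_VO",
}

def dscp_to_ac(dscp: int) -> str:
    """Return the WMM access category for a 6-bit DSCP value."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return UP_TO_AC[dscp >> 3]

CS0 = 0  # default / best effort
CS1 = 8  # the mark that showed up inadvertently on download traffic

download_ac = dscp_to_ac(CS1)  # "AC_BK"
upload_ac = dscp_to_ac(CS0)    # "AC_BE"
```

With download traffic in AC_BK and upload traffic in AC_BE, the two directions contend for airtime with different priority, which is the asymmetry that only becomes visible when both directions are loaded at the same time.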


> 
>> Debuggability is not "rocket science" either: all one needs is a three-value timestamp format (similar to what NTP uses), and one can, even without synchronized clocks, establish baseline OWDs and then see under bi-directional load which of these unloaded OWDs actually increases. So I argue that "it is impossible to differentiate whether the observed latency happens in the uplink or the downlink direction" is simply an incorrect assertion... (and we are actually doing this successfully in the existing internet as part of the cake-autorate project [https://github.com/lynxthecat/cake-autorate/tree/master] already, based on ICMP timestamps). The relevant observation here is that we are not necessarily interested in veridical OWDs under idle conditions, but we want to see which OWD(s) increase during working conditions, and that works with desynchronized clocks and is also robust against slow clock drift.
> 
> Unfortunately, this would require the server to add timestamps to the HTTP response, right?

[SM] Yes, in a sense... but that could be a small process that simply updates the content of that file every couple of milliseconds, so it would not strictly need to be the server process...
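
The baseline-and-delta idea can be sketched roughly as follows. This is only an illustration under stated assumptions: each probe carries three timestamps (originate on the client clock, receive and transmit on the server clock), the client adds its own arrival time, and the offset between the two clocks stays roughly constant over the test. The names Probe, baseline and owd_increase are hypothetical, and this is not the cake-autorate code.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

@dataclass(frozen=True)
class Probe:
    originate: int  # client clock, ms: request sent
    receive: int    # server clock, ms: request seen by server
    transmit: int   # server clock, ms: response sent
    arrival: int    # client clock, ms: response received

    @property
    def raw_up(self) -> int:
        # True uplink OWD plus the (unknown) clock offset.
        return self.receive - self.originate

    @property
    def raw_down(self) -> int:
        # True downlink OWD minus the same clock offset.
        return self.arrival - self.transmit

def baseline(idle: Iterable[Probe]) -> Tuple[int, int]:
    """Smallest raw up/down values seen while the link is idle."""
    probes = list(idle)
    return (min(p.raw_up for p in probes), min(p.raw_down for p in probes))

def owd_increase(probe: Probe, base: Tuple[int, int]) -> Tuple[int, int]:
    """Per-direction delay increase; the constant clock offset cancels."""
    return (probe.raw_up - base[0], probe.raw_down - base[1])

# Made-up numbers: the server clock runs 5000 ms ahead of the client clock.
idle = [Probe(0, 5010, 5011, 23), Probe(100, 5111, 5112, 124)]
loaded = Probe(1000, 6060, 6061, 1074)
print(owd_increase(loaded, baseline(idle)))  # (50, 1): the uplink is queueing
```

The raw values are meaningless as absolute delays (the downlink one is even negative here), but their change relative to the idle baseline shows which direction builds a queue.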


> We opted against this because the “power” of the responsiveness methodology is that it is extremely lightweight on the server-side. And with lightweight I mean not only from an implementation/CPU perspective but also from a deployment perspective. All one needs to do on the server in order to provide a responsiveness-measurement-endpoint is to host 2 files (one very large one and a very small one) and provide an endpoint to “POST” data to. All of these are standard capabilities in every webserver that can easily be configured. And we have seen a rise of endpoints showing up thanks to the simplicity to deploy it.
> 
> So, it is IMO a balance between “deployability” and “debuggability”. The responsiveness test is clearly aiming towards being deployable and accessible. Thus I think we would prefer keeping things on the server-side simple.
> 
> 
> Thoughts ?

[SM] I really, really would like some way to get OWDs, if only optionally, but even more than that I think RPM should get as wide a deployment as possible. Ubiquity has its own inherent value for measurement platforms, so if this makes deployment harder it would be a no-go.

Now, I get that this is a long shot, but I fear that if the draft does not mention this at all, the chance will be gone forever...
Could we maybe add a description of an optional 'time' payload, so that clients can expect a single standardised format for it if a server optionally supports it?



> That being said, I’m not entirely opposed to recommending the parallel mode as well. The interesting bit about the parallel mode is not so much the responsiveness measurement but rather the capacity measurement. Because, surprisingly many modems/… that are supposedly (according to their spec-sheet) able to handle 1 Gbps full-duplex suddenly show their weakness and are no longer able to handle line-rate. So, it is more about capacity than responsiveness IMO.

[SM] True, yet such overload also occasionally affects queueing delay and jitter (sure, RPM does not report jitter, but jitter likely affects the ability of a test to reach the required stability criteria).

> However, as a frequent user of the networkQuality-tool I realize myself that whenever I want to test my network I end up using a sequential test in favor of the parallel test.

[SM] I agree that a full complement of upload, then download, then combined upload & download is a great tool for understanding network behaviour. I also want to applaud Apple's networkQuality for being an excellent implementation of the ideas behind this draft, offering a great and well-selected set of options:

USAGE: networkQuality [-C <configuration_url>] [-c] [-d] [-f <comma-separated list>] [-h] [-I <network interface name>] [-k] [-p] [-r host] [-S <port>] [-s] [-u] [-v]
    -C: Override Configuration URL or path (with scheme file://)
    -c: Produce computer-readable output
    -d: Do not run a download test (implies -s)
    -f: <comma-separated list>: Enforce Protocol selections. Available options:
        h1: Force-enable HTTP/1.1
        h2: Force-enable HTTP/2
        h3: Force-enable HTTP/3 (QUIC)
        L4S: Force-enable L4S
        noL4S: Force-disable L4S
    -h: Show help (this message)
    -I: Bind test to interface (e.g., en0, pdp_ip0,...)
    -k: Disable certificate validation
    -p: Use iCloud Private Relay
    -r: Connect to host or IP, overriding DNS for initial config request
    -S: Start and run server on specified port. Other specified options ignored
    -s: Run tests sequentially instead of parallel upload/download
    -u: Do not run an upload test (implies -s)
    -v: Verbose output

that cover a lot of cases with a relatively small set of control parameters.

> 
> 
> 
> Christoph
> 
> 
>> 
>> Given these observations, I ask that we change this design parameter to require both measurement modes and to default to parallel testing (or to randomly select between both modes, but report which one was chosen).
>> 
>> Best Regards
>> Sebastian