Re: [ippm] [Rpm] lightweight active sensing of bandwidth and buffering

Sebastian Moeller <moeller0@gmx.de> Wed, 02 November 2022 21:41 UTC

From: Sebastian Moeller <moeller0@gmx.de>
In-Reply-To: <FR2P281MB15274FF81D44E875CC4940259C399@FR2P281MB1527.DEUP281.PROD.OUTLOOK.COM>
Date: Wed, 02 Nov 2022 22:41:09 +0100
Cc: rjmcmahon <rjmcmahon@rjmcmahon.com>, Rpm <rpm@lists.bufferbloat.net>, ippm@ietf.org
Content-Transfer-Encoding: quoted-printable
Message-Id: <C3839FE9-4FC5-42B2-8AEC-4530C2B956A9@gmx.de>
References: <CH0PR02MB79808E2508E6AED66DC7657AD32E9@CH0PR02MB7980.namprd02.prod.outlook.com> <CH0PR02MB7980DFB52D45F2458782430FD3379@CH0PR02MB7980.namprd02.prod.outlook.com> <CH0PR02MB7980D3036BF700A074D902A1D3379@CH0PR02MB7980.namprd02.prod.outlook.com> <CAA93jw7Jb_77dZzr-AFjXPtwf_hBxhODyF5UzTX5a-A6+xMkWw@mail.gmail.com> <0a8cc31c7077918bf84fddf9db50db02@rjmcmahon.com> <CH0PR02MB798043B62D22E8C82F61138DD3379@CH0PR02MB7980.namprd02.prod.outlook.com> <CAA93jw6kuHJp_PnUBb6J4HiFmy=xTG9uiu7bML7fuHFzNhMr2w@mail.gmail.com> <344f2a33b6bcae4ad4390dcb96f92589@rjmcmahon.com> <261B90F5-FD4E-46D5-BEFE-6BF12D249A28@gmx.de> <FR2P281MB15274FF81D44E875CC4940259C399@FR2P281MB1527.DEUP281.PROD.OUTLOOK.COM>
To: Ruediger.Geib@telekom.de
Archived-At: <https://mailarchive.ietf.org/arch/msg/ippm/uJXEL1JcQ9Pdc5H5eiMUOKrHBWQ>

Dear Ruediger,

thank you very much for your helpful information. I will chew over this and see how/if I can exploit these "development of congestion observations" somehow. 
The goal of these plots is not primarily to detect congestion* (that would be the core of autorate's functionality: detect increases in delay and respond by reducing the shaper rate to counteract them), but rather to show how well this works (the current rationale is that, compared to a situation without traffic shaping, the difference between the high-load and low-load CDFs should be noticeably** smaller).

*) autorate will be in control of an artificial bottleneck and we do measure the achieved throughput per direction, so we can reason about "congestion" based on throughput and delay; the loading is organic in that we simply measure the traffic volume per time of what travels over the relevant interfaces; the delay measurements, however, are active, which has its pros and cons...
**) Maybe even run a few statistical tests, like the Mann-Whitney U/Wilcoxon rank-sum test, and then claim "significantly smaller". I feel a parametric t-test might not be in order here, with delay PDFs decidedly non-normal in shape (then again, they are likely unimodal, so a t-test would still work okayish in spite of its core assumption being violated).
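To make the rank-sum idea in the second footnote concrete, here is a minimal sketch in plain Python. It is only an illustration, not anything taken from autorate: the function names and the sample values are made up, the p-value uses the normal approximation (which is poor for very small samples), and the variance is not corrected for ties.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def rank_sum_test(low_load, high_load):
    """One-sided test of 'high_load delays tend to be larger than low_load'.

    Returns (U, z, p). U counts the (low, high) pairs in which the
    high-load sample is larger, with ties counting one half.
    """
    n1, n2 = len(low_load), len(high_load)
    ranks = average_ranks(list(low_load) + list(high_load))
    rank_sum_high = sum(ranks[n1:])
    u_high = rank_sum_high - n2 * (n2 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    # Assumption: no tie correction of the variance, for brevity.
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u_high - mean_u) / sd_u
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))
    return u_high, z, p_one_sided

# Hypothetical delay samples in milliseconds, for illustration only.
low = [10.0, 11.0, 12.0, 13.0]
high = [20.0, 21.0, 22.0, 23.0]
u, z, p = rank_sum_test(low, high)
```

With real logs one would probably reach for an existing statistics package instead; the point here is only that the test works on ranks and so does not care about the shape of the delay distribution.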


> On Nov 2, 2022, at 10:41, <Ruediger.Geib@telekom.de> <Ruediger.Geib@telekom.de> wrote:
> 
> Bob, Sebastian,
> 
> not being active on your topic, just to add what I observed on congestion:

	[SM] I will try to explain how/if we could exploit your observations for our controller

> - starts with an increase of jitter, but measured minimum delays still remain constant. Technically, a queue builds up some of the time, but it isn't present permanently.

	[SM] So in that phase we would expect the CDFs to have different slopes; higher variance should result in a shallower slope? As for using this insight in the actual controller, I am not sure how that would work; maybe maintain a "jitter" baseline per reflector and test whether each new sample deviates significantly from that baseline? That is similar to the approach we currently take with delay/RTT.
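One way the per-reflector "jitter" baseline idea could look, as a rough sketch only. Everything here is an assumption made for illustration: the class and function names, the definition of jitter as the absolute difference between consecutive delay samples, the EWMA smoothing, and all parameter values. It is not how the controller currently works.

```python
class JitterBaseline:
    """Slow-moving jitter baseline for a single reflector (hypothetical)."""

    def __init__(self, alpha=0.05, factor=3.0, floor_ms=0.5):
        self.alpha = alpha          # EWMA weight for new quiet samples
        self.factor = factor        # how many baselines count as "deviating"
        self.floor_ms = floor_ms    # keeps the threshold from collapsing to 0
        self.prev_delay_ms = None
        self.baseline_ms = None

    def update(self, delay_ms):
        """Feed one delay sample; return True if its jitter deviates."""
        if self.prev_delay_ms is None:
            self.prev_delay_ms = delay_ms
            return False
        jitter_ms = abs(delay_ms - self.prev_delay_ms)
        self.prev_delay_ms = delay_ms
        if self.baseline_ms is None:
            self.baseline_ms = jitter_ms
            return False
        threshold_ms = self.factor * max(self.baseline_ms, self.floor_ms)
        deviates = jitter_ms > threshold_ms
        if not deviates:
            # Only learn from quiet samples so bursts do not inflate the baseline.
            self.baseline_ms += self.alpha * (jitter_ms - self.baseline_ms)
        return deviates

baselines = {}

def observe(reflector, delay_ms):
    """Route a sample to the tracker of its reflector, creating it on demand."""
    tracker = baselines.setdefault(reflector, JitterBaseline())
    return tracker.update(delay_ms)
```

Feeding a steady series such as 20.0, 20.4, 20.1 ms for one reflector and then a 28.0 ms sample would flag only the last one. Note that the sample following a spike is flagged too, because the drop back to normal is also a large consecutive difference.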

> - buffer fill reaches a "steady state", called bufferbloat on access I think

	[SM] I would call it bufferbloat if that steady state results in too high a delay increase (which to a degree is a subjective judgement). Although, in accordance with the Nichols/Jacobson analogy of buffers/queues as shock absorbers, a queue with an acceptable steady-state induced delay might not work too well to even out occasional bursts?

> ; technically, OWD increases also for the minimum delays, jitter now decreases (what you've described as "the delay magnitude" decreases or "minimum CDF shift" respectively, if I'm correct).

	[SM] That is somewhat unfortunate as it is harder to detect quickly than something that simply increases and stays high (like RTT).

> I'd expect packet loss to occur, once the buffer fill is on steady state, but loss might be randomly distributed and could be of a low percentage.

	[SM] Loss is mostly invisible to our controller (it would need to affect our relatively thin active delay measurement traffic; we have no insight into the rest of the traffic), but more than that, the controller's goal is to avoid this situation, so hopefully it will be rare and transient.

> - a sudden, rather long load burst may cause a jump-start to "steady-state" buffer fill.

	[SM] As would a rather steep drop in available capacity while the traffic in flight is still sized to the previous, larger capacity. This is e.g. what can be observed over shared media like DOCSIS/cable and GSM successors.


> The above holds for a slow but steady load increase (where the measurement frequency determines the timescale qualifying "slow").
> - in the end, max-min delay or delay distribution/jitter likely isn't an easy to handle single metric to identify congestion.

	[SM] Pragmatically we work with delay increase over baseline, which seems to work well enough to be useful, while it is unlikely to be perfect. The CDFs I plotted are really just for making sense post hoc out of the logged data... (cake-autorate is currently designed to maintain a "flight-recorder" log buffer that can be extracted after noticeable events, and I am trying to come up with how to slice and dice the data to help explain "noticeable events" from the limited log data we have).
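A toy sketch of the two ideas in the paragraph above, delay increase over baseline and a bounded "flight-recorder" log. This is not cake-autorate's actual code: the names, the record layout and the 15 ms threshold are assumptions picked purely for illustration.

```python
from collections import deque

class FlightRecorder:
    """Fixed-size log that silently drops the oldest records (hypothetical)."""

    def __init__(self, capacity=1000):
        # A deque with maxlen discards from the left once it is full.
        self.records = deque(maxlen=capacity)

    def log(self, timestamp_s, reflector, rtt_ms, baseline_ms):
        self.records.append((timestamp_s, reflector, rtt_ms, baseline_ms))

    def dump(self):
        """Return a snapshot of the retained records, oldest first."""
        return list(self.records)

def exceeds_baseline(rtt_ms, baseline_ms, threshold_ms=15.0):
    """True if the delay increase over baseline is above the threshold.

    threshold_ms is an assumed value, not a recommendation.
    """
    return (rtt_ms - baseline_ms) > threshold_ms

# Illustration with made-up numbers: keep only the 3 most recent samples.
recorder = FlightRecorder(capacity=3)
for t, rtt in enumerate([21.0, 22.0, 40.0, 45.0, 23.0]):
    recorder.log(float(t), "reflector-a", rtt, 20.0)

# After a noticeable event, extract what is still in the buffer.
events = [r for r in recorder.dump() if exceeds_baseline(r[2], r[3])]
```

The bounded buffer is what limits the post hoc analysis: only the samples still retained when the log is extracted can help explain the event.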

Many Thanks & Kind Regards
	Sebastian


> 
> Regards,
> 
> Ruediger
> 
> 
>> On Nov 2, 2022, at 00:39, rjmcmahon via Rpm <rpm@lists.bufferbloat.net> wrote:
>> 
>> Bufferbloat shifts the minimum of the latency or OWD CDF.
> 
> 	[SM] Thank you for spelling this out explicitly, I only worked on a vague implicit assumption along those lines. However, what I want to avoid is using the delay magnitude itself as the classifier between the high-load and low-load conditions, as it seems statistically uncouth to then show that the delay differs between the two classes ;).
> 	Yet, your comment convinced me that my current load threshold (at least for the high load condition) probably is too small, exactly because the "base" of the high-load CDFs coincides with the base of the low-load CDFs implying that the high-load class contains too many samples with decent delay (which after all is one of the goals of the whole autorate endeavor).
> 
> 
>> A suggestion is to disable x-axis auto-scaling and start from zero.
> 
> 	[SM] Will reconsider. I started with the x-axis starting at zero, and then switched to an x-range that starts at the delay corresponding to 0.01% for the reflector/condition with the lowest such value and stops at 97.5% for the reflector/condition with the highest delay value. My rationale is that the base delay/path delay of each reflector is not all that informative* (and it can still be learned from reading the x-axis); the long tail > 50%, however, is where I expect most differences, so I want to emphasize this; and finally I wanted to avoid the actual "curvy" part getting compressed so much that all lines more or less coincide. As I said, I will reconsider this.
> 
> 
> *) We also maintain individual baselines per reflector, so I could just plot the differences from baseline, but that would essentially equalize all reflectors, and I think having a plot that easily shows reflectors with outlying base delay can be informative when selecting reflector candidates. However, once we actually switch to OWDs, baseline correction might be required anyway, as due to clock differences ICMP type 13/14 data can have massive offsets that are mostly indicative of unsynced clocks**.
> 
> **) This is why I would prefer to use NTP servers as reflectors with NTP requests; my expectation is that all of these should be reasonably synced by default, so that offsets should be in the sane range....
> 
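The x-range selection described above (lowest 0.01% quantile across all reflector/condition series up to the highest 97.5% quantile) could be computed along these lines. This is a sketch under assumptions: the function names, the nearest-rank percentile definition and the series labels are all invented for illustration.

```python
import math

def percentile(sorted_samples, fraction):
    """Nearest-rank percentile of an already sorted list; fraction in (0, 1]."""
    rank = max(1, math.ceil(fraction * len(sorted_samples)))
    return sorted_samples[rank - 1]

def cdf_x_range(delays_by_series, lower=0.0001, upper=0.975):
    """Return (x_min, x_max) covering all series.

    x_min is the smallest `lower` quantile over all series and
    x_max is the largest `upper` quantile over all series.
    """
    lows, highs = [], []
    for samples in delays_by_series.values():
        ordered = sorted(samples)
        lows.append(percentile(ordered, lower))
        highs.append(percentile(ordered, upper))
    return min(lows), max(highs)

# Made-up delay series in milliseconds, keyed by reflector/condition.
series = {
    "reflector-a/low-load": list(range(1, 101)),
    "reflector-b/high-load": list(range(51, 151)),
}
x_min, x_max = cdf_x_range(series)
```

Starting the axis at zero instead, as suggested, would simply mean replacing x_min with 0 while keeping the upper bound.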
> 
>> 
>> Bob
>>> For about 2 years now the cake w-adaptive bandwidth project has been 
>>> exploring techniques to lightweightly sense bandwidth and 
>>> buffering problems. One of my favorites was their discovery that ICMP 
>>> type 13 got them working OWD from millions of ipv4 devices!
>>> They've also explored leveraging ntp and multiple other methods, and 
>>> have scripts available that do a good job of compensating for 5g and 
>>> starlink's misbehaviors.
>>> They've also pioneered a whole bunch of new graphing techniques, 
>>> which I do wish were used more than single number summaries 
>>> especially in analyzing the behaviors of new metrics like rpm, 
>>> samknows, ookla, and
>>> RFC9097 - to see what is being missed.
>>> There are thousands of posts about this research topic, a new post on 
>>> OWD just went by here.
>>> https://forum.openwrt.org/t/cake-w-adaptive-bandwidth/135379/793
>>> and of course, I love flent's enormous graphing toolset for 
>>> simulating and analyzing complex network behaviors.
>> _______________________________________________
>> Rpm mailing list
>> Rpm@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/rpm
> 
> _______________________________________________
> ippm mailing list
> ippm@ietf.org
> https://www.ietf.org/mailman/listinfo/ippm