Re: [L4s-discuss] Thoughts on cost fairness metric
rjmcmahon <rjmcmahon@rjmcmahon.com> Tue, 20 June 2023 21:07 UTC
Return-Path: <rjmcmahon@rjmcmahon.com>
X-Original-To: l4s-discuss@ietfa.amsl.com
Delivered-To: l4s-discuss@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 7B7CEC1526E9 for <l4s-discuss@ietfa.amsl.com>; Tue, 20 Jun 2023 14:07:54 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -6.096
X-Spam-Level:
X-Spam-Status: No, score=-6.096 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, HK_RANDOM_ENVFROM=0.001, HK_RANDOM_FROM=0.999, RCVD_IN_DNSWL_HI=-5, RCVD_IN_ZEN_BLOCKED_OPENDNS=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001, URIBL_DBL_BLOCKED_OPENDNS=0.001, URIBL_ZEN_BLOCKED_OPENDNS=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (1024-bit key) header.d=rjmcmahon.com
Received: from mail.ietf.org ([50.223.129.194]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 0QtXA7-wJAWX for <l4s-discuss@ietfa.amsl.com>; Tue, 20 Jun 2023 14:07:50 -0700 (PDT)
Received: from bobcat.rjmcmahon.com (bobcat.rjmcmahon.com [45.33.58.123]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 7597DC15257D for <l4s-discuss@ietf.org>; Tue, 20 Jun 2023 14:07:50 -0700 (PDT)
Received: from mail.rjmcmahon.com (bobcat.rjmcmahon.com [45.33.58.123]) by bobcat.rjmcmahon.com (Postfix) with ESMTPA id 2BE381B25F; Tue, 20 Jun 2023 14:07:50 -0700 (PDT)
DKIM-Filter: OpenDKIM Filter v2.11.0 bobcat.rjmcmahon.com 2BE381B25F
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rjmcmahon.com; s=bobcat; t=1687295270; bh=FRMQnW8NbPAT7fb6h6//LZ43Pr7y21C+fAqAhL9kCiU=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=YEGGutF4w0CEJTb78EqPLqH4D3YrA6kGvIQyNhgnCUIJRrTzQdK6BG0nY/EtUALLg rSlsQh0UKoPflhuMc1ROzJadw21gfRw9Mdo4kFtX0PmHSziemSkjC/U1Uyy2fh6vI6 5SIAUeO1Zhz20xPdjws/H67R7w24n6VwyUofMWqw=
MIME-Version: 1.0
Date: Tue, 20 Jun 2023 14:07:50 -0700
From: rjmcmahon <rjmcmahon@rjmcmahon.com>
To: Sebastian Moeller <moeller0@gmx.de>
Cc: Bob Briscoe <in=40bobbriscoe.net@dmarc.ietf.org>, l4s-discuss@ietf.org
In-Reply-To: <9065174E-217F-430C-A271-9CF2AFECFACC@gmx.de>
References: <a34b4e4474ea744e01d5ce15131fc465@rjmcmahon.com> <93729ad5-6919-8a86-5994-fdfe6344a596@bobbriscoe.net> <0ED86DEC-CA18-4C6A-AA1B-4D21FA261196@gmx.de> <d0c7f792-eba5-63ab-d825-8b2158bf33ea@bobbriscoe.net> <9065174E-217F-430C-A271-9CF2AFECFACC@gmx.de>
Message-ID: <7726d27e241c4c0def3cbbe8d5ad899f@rjmcmahon.com>
X-Sender: rjmcmahon@rjmcmahon.com
Content-Type: text/plain; charset="US-ASCII"; format="flowed"
Content-Transfer-Encoding: 7bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/l4s-discuss/Z1mb4qzz8W7MxT23qc82mb-GJ74>
Subject: Re: [L4s-discuss] Thoughts on cost fairness metric
X-BeenThere: l4s-discuss@ietf.org
X-Mailman-Version: 2.1.39
Precedence: list
List-Id: "Low Latency, Low Loss, Scalable Throughput \(L4S\) " <l4s-discuss.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/l4s-discuss>, <mailto:l4s-discuss-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/l4s-discuss/>
List-Post: <mailto:l4s-discuss@ietf.org>
List-Help: <mailto:l4s-discuss-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/l4s-discuss>, <mailto:l4s-discuss-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 20 Jun 2023 21:07:54 -0000
Hi Sebastian, list,

I'm not sure what "wasting a mark" means. Does it matter if the short flows stop of their own accord vs. slow down per a mark? Marks that don't affect transport don't seem wasted per se, just not an affecting signal. I'm not sure why that's a problem.

On "average/median time between marks/drops": should this be a histogram too? In general, I try to support histograms as well as central limit theorem (CLT) averaging in stats collections. I'll probably start with marks - not sure how to do drops with eBPF in a simple way for iperf.

Bob

> Hi Bob, list,
>
>> On Jun 19, 2023, at 01:48, Bob Briscoe <in=40bobbriscoe.net@dmarc.ietf.org> wrote:
>>
>> Sebastian,
>>
>> Before responding to each point, yes, I agree that the marking algorithms have to mark 'fairly' and we have to agree how to do that and improve algorithms over time. But lack of perfection in the network doesn't stop the congestion-rate being a good metric for iperf to maintain while network marking algorithms are being improved.
>
> The question is IMHO not so much "perfection" or not, as I subscribe to the "good enough" school of problem solving; my question is which of the simple-to-compute harm measures is the most suitable. I am not yet convinced that marking/dropping rate is "it", but it might well be good enough...
>
>> And if anyone quotes iperf harm metrics without knowing or defining the marking algorithm in use, then others would be right to question the validity of their results.
>
> Honestly, I would prefer a harm metric that is as independent of such details as possible; as it stands, this metric seems not well suited to e.g. compare L4S' two queues with each other (the L-queue generally using a higher marking rate)...
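[To make the histogram-vs-averaging point above concrete, here is a minimal sketch, not iperf 2 code; the sample mark timestamps and the bin width are invented for illustration.]

```python
# Sketch: summarize time-between-CE-marks both as a mean (CLT-style
# averaging) and as a histogram. A real tool would record mark
# timestamps from the stack; these values are made up.
mark_times = [0.00, 0.02, 0.05, 0.06, 0.30, 0.31, 0.33, 0.90]  # seconds

intervals = [b - a for a, b in zip(mark_times, mark_times[1:])]
mean = sum(intervals) / len(intervals)

# Fixed-width histogram; the bin width is an arbitrary choice here.
BIN = 0.05
hist = {}
for iv in intervals:
    b = int(iv // BIN)
    hist[b] = hist.get(b, 0) + 1

print(f"mean inter-mark interval: {mean:.3f}s")
for b in sorted(hist):
    print(f"[{b * BIN:.2f}, {(b + 1) * BIN:.2f}): {hist[b]}")
```

The mean alone hides that the intervals are bursty (several short gaps plus a few long ones); the histogram makes that visible.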
>> Now pls see [BB] inline
>>
>> On 18/06/2023 11:33, Sebastian Moeller wrote:
>>> Hi List,
>>>
>>>> On Jun 16, 2023, at 23:15, Bob Briscoe <in=40bobbriscoe.net@dmarc.ietf.org> wrote:
>>>>
>>>> Bob
>>>>
>>>> That would actually be v useful as a harm metric. For the list, it's essentially the bit rate, but measuring only those packets that are ECN-marked [in units of marked-b/s].
>>>
>>> This primarily seems to show how much harm the flow suffered from the AQM, not how much it actually caused. Granted, there is some correlation between the two, but no hard causality... Now, if the AQM sits behind a min/max flow-queuing scheduler the correlation will be stronger, and if it sits behind a single-queue "scheduler" it will be weaker (especially if the AQM on that single queue is not looking at a packet's actual sojourn time, but at the estimated sojourn time of a virtual packet enqueued at dequeue time, i.e. how much queue a newly added packet would have to wait behind). As far as I understand, that is what L4S recommends for dual-queue AQMs, and in that case the "harm" can be caused by a later burst (e.g. from a badly-paced flow) but assigned to a perfectly well-behaving low-rate flow.
>>> That is IMHO sub-optimal:
>>> a) once congestion hits, the best an AQM can do is mark a packet of the most-contributing flow, not the one that happens to have a packet at the head of the queue
>>> b) this clearly assigns blame to the wrong flow
>>
>> [BB] What you say at '(a)' sounds as if you are considering all the marking being done at just one point in time. An algorithm that statistically decides whether to mark each packet as it becomes the head packet should be able to produce an outcome where packets from the flow contributing most to congestion are marked more. So I don't see any basis for what you've said at '(a)'.
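[A toy sketch of [BB]'s point that a fixed per-packet marking probability at the queue head marks the flow with more packets proportionally more, without targeting any flow. The flow shares and the probability are invented; this is not any real AQM's algorithm.]

```python
import random

# Sketch: per-packet stochastic marking at the queue head. Over an
# interval, a "fat" flow dequeues 9x as many packets as a "thin" one,
# so with a flow-blind fixed probability it collects ~9x the marks -
# but only in expectation ("sooner or later"), not per-packet.
random.seed(1)
P_MARK = 0.1
dequeued = {"fat": 900, "thin": 100}  # packets dequeued per flow

marks = {flow: sum(random.random() < P_MARK for _ in range(n))
         for flow, n in dequeued.items()}
print(marks)  # roughly 90 vs 10 marks
```

This illustrates both sides of the exchange: the outcome is proportional in aggregate, while any individual mark may still land on a packet whose flow contributed little at that instant.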
> [SM] That is part of my point: by stochastically marking packets you will sooner or later end up marking more packets of larger flows than of smaller flows (as in number of packets in the queue over time), but a node that experiences congestion now would ideally not randomly mark/drop packets but selectively drop packets that have the highest likelihood of resulting in a noticeable decrease in the offered load, and do so as soon as possible. It is the "sooner or later" part of the random dropping that I consider open to improvement. Now, one obviously needs a suitable data structure; if restricted to a single queue there is not much one can do... (one could still keep per-flow stats for each queue entry and consult those when dealing with the head-of-queue packet).
> Case in point: if an AQM marks, say, a DNS-over-TCP response, in all likelihood it might as well not have marked that packet at all; the flow is likely too short to ever throttle down enough for the load at the bottleneck to subside... same for a flow that is already at the minimal congestion window. From a responsiveness perspective these marks are wasted... (real drops at least reduce the bottleneck queue, but marking "unresponsive" flows (there are valid reasons for a flow not being able to respond to marking, so that is not necessarily nefarious) is not going to help).
>
>> There are many things we (the community) still don't understand about apportionment of blame for congestion. But one thing we can be certain of is that the best solution will not involve heaping all the blame onto a single flow.
>
> [SM] I respectfully disagree; there are situations when a single flow monopolizes the lion's share of a bottleneck's capacity (e.g. a single-flow TCP download on a home link), so this flow will also cause the lion's share of the queue build-up, and heaping most of the responsibility on that flow seems eminently reasonable.
> We seem to agree that talking in absolutes (all the blame) makes little sense...
>
>> Where a number of flows behave the same or similarly, that would produce a very unstable outcome.
>
> [SM] In my experience, having run FQ schedulers in one form or another (mostly fq_codel and cake) on my internet access link for over a decade now, the link behaves pretty well, with low latency-increase-under-load / high responsiveness (for those flows that behave well; less-well-behaved flows suffer the consequences via higher intra-flow queuing). However, that situation - many flows of very similar "fatness" - is in my experience rather a rare beast; normal traffic tends to have loads of short transient flows that never reach steady state and that are not in equilibrium with each other. In that rare case, randomly selecting a flow will be a tad cheaper than searching for and selecting the flow with the largest immediate queueing contribution, but I am not sure it makes much sense to optimize for the rare case, while the common case is more a bimodal distribution of low- and high-contributing flows.
>
>> Despite just having said 'one thing we can be certain of', I would otherwise suggest that anyone who thinks they are certain about what the solution to the blame-apportionment problem is should be very careful until they have done exhaustive analysis and experimentation. That includes the above opinions about FQ vs single queue.
>
> [SM] I would call this more a hypothesis, and less an opinion, as I explained why the AQM sojourn estimation is designed to increase the "speed" of the marking action, not its targeting of relevant flows...
>
>> For instance, I myself thought I had hit upon the solution that marking ought to be based on the delay that each packet causes other packets to experience, rather than the delay each packet experiences itself (which is what sojourn delay measures).
> [SM] That in essence would mean scaling by packet size, if all one looks at is individual packets... not sure that would help. If the marking entity does not take packet size into account when marking, then whether 64 octets were marked or ~1500 does not carry precise and relevant information, neither about the "magnitude" of the congestion nor about the magnitude of that flow's contribution to the congestion. To be able to extract that information from a mark, the marking entity would need to take packet size into account.
> BTW, in a sense that is what a min/max FQ scheduler does offer: the flows with the most packets (or rather bytes) in the queue will have a larger effect on the total queuing delay than flows with fewer packets, and marking packets from a fat flow will at least answer the "does the marked flow contribute meaningfully to the congestion" question. But the endpoints still have no clue about the bottleneck AQM's marking strategy and hence cannot make robust inferences.
>
>> In my experiments so far I turned out to be correct quite often, but not sufficiently often
>
> [SM] Which is the hallmark of a correlation, not a causation...
>
>> to be able to say I had found /the/ solution.
>
> [SM] As the above indicates, I clearly must be misunderstanding how you quantify the delay an individual packet introduces; could you elaborate, please?
>
>> I've got a much better grasp of the problem now compared to the position that had been reached in the late 1990s (when this problem was last attacked in the research community). However, the latest results of my simulations have surprised me in one region of the parameter space. So my experiments are still telling me that I need to go round the design-evaluate loop at least one more time before I have something worthy of a write-up.
>
> [SM] I would love to read a decent peer-reviewed paper on that.
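[One way a marking entity could take packet size into account, as [SM] asks for above, is byte-mode scaling of the marking probability (in the spirit of the byte-vs-packet discussion in RFC 7141). A sketch only; the constants are invented and this is not any deployed AQM's algorithm.]

```python
import random

# Sketch: scale the per-packet marking probability by packet size
# relative to the MTU, so a full-size packet is ~23x more likely to
# be marked than a 64-octet one at the same base probability.
MTU = 1500

def mark(base_prob: float, pkt_len: int) -> bool:
    # Byte-mode scaling: marking probability proportional to the
    # bytes the packet carries.
    return random.random() < base_prob * (pkt_len / MTU)

random.seed(2)
small = sum(mark(0.05, 64) for _ in range(10_000))
full = sum(mark(0.05, 1500) for _ in range(10_000))
print(small, full)  # full-size packets collect far more marks
```

With this scaling, the count of marked bytes (rather than marked packets) starts to carry the size information Sebastian notes is otherwise lost.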
>>> Now, I wonder how that is going to work out:
>>> rfc3168 flows: the receiver knows the number of CE-marked bytes and could accumulate those, but the sender does not, as ECE will be asserted until CWR is received, independent of whether additional packets were marked.
>>> L4S: these flows are expected to accumulate more CE marks than classic traffic and hence more marked bytes. (Also, even with accurate ECN, does the sender unambiguously know which packets were CE-marked, so it can veridically track their size?)
>>>
>>> So to make sense out of this "measure" the application needs to collect information from both ends and aggregate these before reporting, something that iperf2 already does, and it will need to report ECT(0) and ECT(1) flows/packets separately (assuming that ECT(1) means L4S signaling/AQM).
>>
>> [BB] As I already said, this metric will not mean much if reported in isolation, without defining the test conditions. That's for peer reviewers to highlight when results of tests using iperf are reported. It doesn't mean it would be inappropriate for iperf to be able to report this metric in the first place.
>
> [SM] Indeed, the best way to figure out whether a measure is useful (or how useful it is) seems to be to actually take it and correlate it with other measures of interest. Personally, I would also like to see simpler measures, like the total number of marks/drops of a measurement flow, or even the average/median time between marks/drops - as I said, also, not instead.
>
>>>> Even without ECN marking, the congestion cost of a flow could be reported as the rate of bits lost [in units of lost-b/s] (except of course one cannot distinguish congestion losses from other causes of loss).
>>>
>>> But see above, even with ECN in practice this measure does not seem to be precise, no?
>>
>> [BB] See previous response.
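[The marked-b/s congestion-rate discussed upthread is easy to state as code once a per-packet record exists at the receiver; a sketch with an invented record format, not iperf 2's internals.]

```python
# Sketch: compute a flow's congestion-rate [marked-b/s] from a
# per-packet trace. Records are (timestamp_s, length_bytes, ce_marked)
# and are invented for illustration.
trace = [
    (0.00, 1500, False),
    (0.10, 1500, True),
    (0.50, 1500, False),
    (1.20, 1500, True),
    (2.00,  500, False),
]

duration = trace[-1][0] - trace[0][0]
marked_bits = sum(length * 8 for _, length, ce in trace if ce)
congestion_rate = marked_bits / duration  # marked-b/s
print(f"{congestion_rate:.0f} marked-b/s")
```

The same computation with a `dropped` field instead of `ce_marked` gives the lost-b/s variant Bob mentions for non-ECN traffic.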
>> Nonetheless, it's got more potential as a harm metric for characterizing dynamic scenarios than any other metric (e.g. various fairness indices, which only measure rate harm, not latency harm, and only in steady-state scenarios).
>
> [SM] But in non-steady-state situations, with e.g. the L4S-recommended sojourn-time estimator, we will have considerably more mis-targeted markings, no? I am not saying the problem is easy to solve, but the proposed measure is clearly not ideal either; I am not ruling out that it might be a useful compromise. The advantage of looking at the rate is that it is a measure which is invariant to the CC algorithm and will easily allow comparing e.g. L- and C-queue flows in L4S.
>
>>>> A specific use would be to test how well a flow could keep within a certain level of congestion cost. For instance, that would test how well a flow would do if passed through DOCSIS queue protection (without needing to have a DOCSIS box to pass it through). DOCSIS QProt gives a flow a certain constant allowance for the rate it contributes to congestion, which is precisely the cost metric you highlight.
>>>
>>> [SM] On that note, how is that going to work on variable-rate links?
>>
>> [BB] That's the whole point - the steady-state congestion-rate of each of a set of scalable flows sharing a link should be invariant whatever the link rate (and whatever the number of flows).
>
> [SM] A variable-rate link will "actively conspire" against reaching steady state...
>
>> Of course, the challenge is how rapidly a flow's controller responds when the rate is varying (because nothing ever actually reaches steady state). This metric should give a handle on how well or badly a flow manages to track varying capacity.
>
> [SM] Assuming the marks are actually assigned to the "right" packets...
> (which for flows running long enough should be true, albeit the mis-appropriation will result in a "noise floor" for the measure).
>
>> (Of course there are caveats, e.g. whether the congestion-rate actually is invariant in the steady state, like the theory says it should be.)
>
>>>> Other methods for policing latency might do similarly. A few years ago now I was given permission to reveal that a Procera traffic policer used the same congestion cost metric to more strongly limit traffic for those users contributing a higher congestion cost. Unlike rate policers, these policers can inherently take account of behaviour over time.
>>>
>>> [SM] Curious, does the "over time" part not require keeping per-flow state for longer?
>>
>> [BB] A common design for per-flow congestion policers is to only hold state for badly behaving flows. For example:
>> * A common technique in flow-rate policers is to hold flow-state only on flows selected probabilistically, by randomly picking a small proportion of those packets that are ECN-marked or dropped. Then the more a flow is marked, the more likely flow-state will be held on it.
>
> [SM] Such sub-sampling might work for monitoring purposes, but driving an actual controller on this seems more approximate than desirable, especially since we aim for a low-latency control loop; sub-sampling and averaging suffer from requiring multiple rounds before converging on something actionable, no?
>
>> * In the DOCSIS q-protection algo, the state decays out between the packets of a flow (freeing up memory for other flows, which also usually decays out before the next packet of that flow). See https://datatracker.ietf.org/doc/html/draft-briscoe-docsis-q-protection-06#section-2.1
>
> [SM] It would be great if there were published evaluations of the DOCSIS methods.
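[A minimal sketch of per-flow state that decays out between a flow's packets, loosely inspired by the queuing-score idea in the q-protection draft cited above. The class, constants, and contribution values here are invented; see the draft for the real algorithm.]

```python
# Sketch: a per-flow score that is drained at a constant rate between
# that flow's packets, so a well-paced or idle flow's state decays
# back toward zero while a bursty flow's score accumulates.
DRAIN_RATE = 600.0  # score units drained per second (assumed)

class FlowScore:
    def __init__(self):
        self.score = 0.0
        self.last_t = 0.0

    def on_packet(self, t: float, contribution: float) -> float:
        # Drain for the time elapsed since this flow's last packet.
        self.score = max(0.0, self.score - DRAIN_RATE * (t - self.last_t))
        self.score += contribution
        self.last_t = t
        return self.score

f = FlowScore()
f.on_packet(0.0, 500.0)          # first packet: score 500
f.on_packet(0.5, 500.0)          # close behind: score builds to 700
s = f.on_packet(2.0, 500.0)      # after a 1.5s gap the state has
print(s)                          # fully decayed; back to 500
```

The point Bob makes is that the decay itself frees the memory: no timer or garbage-collection pass over idle flows is needed.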
> I assume data was acquired and analyzed to come to the final design; now it would be nice if that data and analysis could be made available...
>
>> In the Procera case, it held per-user state, not per-flow (a congestion cost metric aggregates well over a set of flows as well as over time).
>
> [SM] Makes sense, assuming there are fewer "users" than flows, and "user" is a relevant grouping. Is it correct then that they kept a per-user drop/mark probability and hence used different probabilities per user? ("User" is a tricky concept, but in cake we use the IP address as a decent proxy and optionally do a first round of arbitration between IP addresses (typically IP addresses "internal" to the home network) and then do per-flow queueing within each IP's traffic. This takes the sting out of using flow-explosion to gain a throughput advantage, in that this behaviour only affects traffic to/from the same IP while other hosts do not even notice; it obviously does not help against DoS attacks, but it mitigates the simplistic 'let's use a shipload of flows to monopolize the link's capacity' strategy.)
>
> Regards
>	Sebastian
>
>>> Also, is there any public data showing how this affected RTT-bias?
>>
>> [BB] In a word, no.
>>
>> Cheers
>>
>> Bob
>>
>>>> Since then Procera merged with Sandvine, so I don't know whether that technology is still available.
>>>>
>>>> Thanks
>>>>
>>>> Another Bob
>>>>
>>>> On 16/06/2023 21:20, rjmcmahon wrote:
>>>>> Hi All,
>>>>>
>>>>> I read the below recently and am wondering if the cost fairness metric is useful? I'm adding ECN/L4S support from a test perspective into iperf 2 and thought this new metric might be generally useful - not sure. Feedback is appreciated.
>>>>> https://www.bobbriscoe.net/projects/refb/draft-briscoe-tsvarea-fair-02.html
>>>>>
>>>>> "The metric required to arbitrate cost fairness is simply volume of congestion, that is congestion times the bit rate of each user causing it, taken over time. In engineering terms, for each user it can be measured very easily as the amount of data the user sent that was dropped. Or with explicit congestion notification (ECN [RFC3168]) the amount of each user's data to have been congestion marked. Importantly, unlike flow rates, this metric integrates easily and correctly across different flows on different paths and across time, so it can be easily incorporated into future service level agreements of ISPs."
>>>>>
>>>>> Thanks,
>>>>> Bob
>>>>
>>>> --
>>>> ________________________________________________________________
>>>> Bob Briscoe http://bobbriscoe.net/
>>>>
>>>> --
>>>> L4s-discuss mailing list
>>>> L4s-discuss@ietf.org
>>>> https://www.ietf.org/mailman/listinfo/l4s-discuss
>>
>> --
>> ________________________________________________________________
>> Bob Briscoe http://bobbriscoe.net/
>>
>> --
>> L4s-discuss mailing list
>> L4s-discuss@ietf.org
>> https://www.ietf.org/mailman/listinfo/l4s-discuss
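[The quoted definition - congestion volume as the amount of each user's data dropped or CE-marked, summed across flows and time - translates directly to code; a sketch with an invented record format.]

```python
# Sketch: congestion volume per user, per the quoted
# draft-briscoe-tsvarea-fair definition: bytes of each user's data
# that were dropped or CE-marked, summed across flows and time.
# Records are invented: (user, bytes, dropped, ce_marked).
records = [
    ("alice", 1500, False, True),
    ("alice", 1500, False, False),
    ("bob",   1500, True,  False),
    ("bob",   1500, False, True),
    ("bob",   1500, False, False),
]

volume = {}
for user, nbytes, dropped, marked in records:
    if dropped or marked:
        volume[user] = volume.get(user, 0) + nbytes

print(volume)  # congestion volume in bytes per user
```

Note that, as the draft says, the sum needs no per-flow or per-path normalization: it aggregates correctly across flows and over time by construction.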