Re: [L4s-discuss] Thoughts on cost fairness metric

rjmcmahon <rjmcmahon@rjmcmahon.com> Tue, 20 June 2023 21:07 UTC

Date: Tue, 20 Jun 2023 14:07:50 -0700
From: rjmcmahon <rjmcmahon@rjmcmahon.com>
To: Sebastian Moeller <moeller0@gmx.de>
Cc: Bob Briscoe <in=40bobbriscoe.net@dmarc.ietf.org>, l4s-discuss@ietf.org
In-Reply-To: <9065174E-217F-430C-A271-9CF2AFECFACC@gmx.de>
References: <a34b4e4474ea744e01d5ce15131fc465@rjmcmahon.com> <93729ad5-6919-8a86-5994-fdfe6344a596@bobbriscoe.net> <0ED86DEC-CA18-4C6A-AA1B-4D21FA261196@gmx.de> <d0c7f792-eba5-63ab-d825-8b2158bf33ea@bobbriscoe.net> <9065174E-217F-430C-A271-9CF2AFECFACC@gmx.de>
Message-ID: <7726d27e241c4c0def3cbbe8d5ad899f@rjmcmahon.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/l4s-discuss/Z1mb4qzz8W7MxT23qc82mb-GJ74>

Hi Sebastian, list,

I'm not sure what "wasting a mark" means. Does it matter whether the short 
flows stop of their own accord vs. slow down in response to a mark? Marks 
that don't affect the transport don't seem wasted per se; they're just 
signals with no effect. I'm not sure why that's a problem.

On "average/median time between marks/drops"

Should this be a histogram too? In general, I try to support histograms 
as well as central limit theorem (CLT) averaging in stats collections.

I'll probably start with marks - I'm not sure how to do drops with eBPF 
in a simple way for iperf.
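As an illustration of the mean-plus-histogram idea, here's a minimal sketch (hypothetical helper, not iperf 2 code) that takes per-flow CE-mark timestamps and reports both the mean inter-mark interval and a fixed-bin histogram of the gaps:

```python
def inter_mark_stats(mark_times, bin_width=0.01):
    """Return (mean_gap, histogram) for a sorted list of CE-mark timestamps
    (seconds). histogram maps bin index -> count, where a gap g falls in
    bin int(g // bin_width)."""
    gaps = [b - a for a, b in zip(mark_times, mark_times[1:])]
    if not gaps:
        return None, {}
    mean_gap = sum(gaps) / len(gaps)
    hist = {}
    for g in gaps:
        idx = int(g // bin_width)
        hist[idx] = hist.get(idx, 0) + 1
    return mean_gap, hist
```

The histogram keeps the distribution's shape (e.g. bimodal inter-mark gaps), which a CLT-style average alone would hide.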

Bob
> Hi Bob, list,
> 
> 
>> On Jun 19, 2023, at 01:48, Bob Briscoe 
>> <in=40bobbriscoe.net@dmarc.ietf.org> wrote:
>> 
>> Sebastian,
>> 
>> Before responding to each point, yes, I agree that the marking 
>> algorithms have to mark 'fairly' and we have to agree how to do that 
>> and improve algorithms over time. But lack of perfection in the 
>> network doesn't stop the congestion-rate being a good metric for iperf 
>> to maintain while network marking algorithms are being improved.
> 
> 	The question is IMHO not so much "perfection" or not, as I subscribe
> to the "good enough" school of problem solving, my question is which
> out of the simple to compute harm measures is the most suitable. I am
> not yet convinced that marking/dropping rate is "it", but it might
> well be good enough...
> 
>> And if anyone quotes iperf harm metrics without knowing or defining 
>> the marking algorithm in use, then others would be right to question 
>> the validity of their results.
> 
> 	Honestly, I would prefer a harm metric that is independent of details
> as much as possible, as is this metric seems not well suited to e.g.
> compare L4S' two queues with each other... (the L-queue generally
> using a higher marking rate)...
> 
> 
>> 
>> Now pls see [BB] inline
>> 
>> On 18/06/2023 11:33, Sebastian Moeller wrote:
>>> Hi List,
>>> 
>>> 
>>>> On Jun 16, 2023, at 23:15, Bob Briscoe 
>>>> <in=40bobbriscoe.net@dmarc.ietf.org> wrote:
>>>> 
>>>> Bob
>>>> 
>>>> That would actually be v useful as a harm metric. For the list, it's 
>>>> essentially the bit rate, but measuring only those packets that are 
>>>> ECN-marked [in units of marked-b/s].
>>> 	This primarily seems to show how much harm the flow suffered from 
>>> the AQM, not how much it actually caused. Granted, there is some 
>>> correlation between the two, but no hard causality... Now, if the AQM 
>>> sits behind a min/max flow-queuing scheduler the correlation will be 
>>> stronger, and if it sits behind a single-queue "scheduler" it will be 
>>> weaker (especially if the AQM on that single queue looks not at a 
>>> packet's actual sojourn time, but at the estimated sojourn time of a 
>>> virtual packet enqueued at dequeue time, i.e. how much queue a newly 
>>> added packet would have to wait behind). As far as I understand, that 
>>> is what L4S recommends for dual-queue AQMs, and in that case the 
>>> "harm" can be caused by a later burst (e.g. from a badly-paced flow) 
>>> but assigned to a perfectly well-behaving low-rate flow.
>>> 	That is IMHO sub-optimal:
>>> a) once congestion hits, the best an AQM can do is mark a packet of 
>>> the most-contributing flow, not of whichever flow happens to have a 
>>> packet at the head of the queue
>>> b) this clearly assigns blame to the wrong flow
>> 
>> [BB] What you say at '(a)' sounds as if you are considering all the 
>> marking being done at just one point in time. An algorithm that 
>> statistically decides whether to mark each packet as it becomes the 
>> head packet should be able to produce an outcome where packets from 
>> the flow contributing most to congestion are marked more. So I don't 
>> see any basis for what you've said at '(a)'.
> 
> 	[SM] That is part of my point, by stochastically marking packets you
> will sooner or later end up marking more packets of larger flows than
> smaller flows (as in number of packets in the queue over time), but a
> node that experiences congestion now would ideally not randomly
> mark/drop packets but selectively drop packets that have the highest
> likelihood of resulting in a noticeable decrease in the offered load
> and do so as soon as possible. It is the "sooner or later" part in the
> randomly dropping that I consider open to improvements. Now, one
> obviously needs a suitable data structure, if restricted to a single
> queue there is not much one can do... (one could still keep per flow
> stats for each queue entry and consult that when dealing with the head
> of queue packet)
> Case in point: if an AQM marks, say, a DNS-over-TCP response, in all
> likelihood it might as well not have marked that packet at all; the
> flow is likely too short to ever throttle down enough for the load at
> the bottleneck to subside. The same goes for a flow that is already at
> the minimal congestion window. From a responsiveness perspective these
> marks are wasted... (Real drops at least reduce the bottleneck queue,
> but marking "unresponsive" flows is not going to help; there are valid
> reasons for a flow not being able to respond to marking, so that is
> not necessarily nefarious.)
> 
> 
> 
>> 
>> There are many things we (the community) still don't understand about 
>> apportionment of blame for congestion. But one thing we can be certain 
>> of is that the best solution will not involve heaping all the blame 
>> onto a single flow.
> 
> 	[SM] I respectfully disagree: there are situations when a single flow
> monopolizes the lion's share of a bottleneck's capacity (e.g. a
> single-flow TCP download on a home link), so this flow will also cause
> the lion's share of the queue build-up, and heaping most of the
> responsibility on that flow seems eminently reasonable. We seem to
> agree that talking in absolutes (all the blame) makes little sense...
> 
> 
>> Where a number of flows behave the same or similarly, that would 
>> produce a very unstable outcome.
> 
> 	[SM] In my experience, having run FQ schedulers in one form or
> another (mostly fq_codel and cake) on my internet access link for over
> a decade now, the link behaves pretty well, with low
> latency increase under load and high responsiveness (for those flows
> that behave well; less well-behaved flows suffer the consequence of
> higher intra-flow queuing). However, that situation, many flows of
> very similar "fatness", is in my experience rather a rare beast; normal
> traffic tends to have loads of short transient flows that never reach
> steady state and are not in equilibrium with each other. In that
> rare case, randomly selecting a flow will be a tad cheaper than
> searching for and selecting the flow with the largest immediate
> queueing contribution, but I am not sure it makes all that
> much sense to optimize for the rare case, while the common case is more
> a bimodal distribution of low- and high-contributing flows.
> 
> 
>> Despite just having said 'one thing we can be certain of', I would 
>> otherwise suggest that anyone who thinks they are certain about what 
>> the solution to the blame apportionment problem is should be very 
>> careful until they have done exhaustive analysis and experimentation. 
>> That includes the above opinions about FQ vs single queue.
> 
> 	[SM] I would call this more a hypothesis than an opinion, as I
> explained why the AQM sojourn estimation is designed to increase the
> "speed" of the marking action, not its targeting of the relevant flows...
> 
> 
>> 
>> For instance, I myself thought I had hit upon the solution that 
>> marking ought to be based on the delay that each packet causes other 
>> packets to experience, rather than the delay each packet experiences 
>> itself (which is what sojourn delay measures).
> 
> 	[SM] That in essence would mean scaling by packet size, if all one
> looks at is individual packets... not sure that would help. If the
> marking entity does not take packet size into account when marking,
> then whether 64 octets were marked or ~1500 does not carry precise
> and relevant information, neither about the "magnitude" of the
> congestion nor about the magnitude of that flow's contribution to
> the congestion. To be able to extract that information from a mark,
> the marking entity would need to take size into account.
> 	BTW, in a sense that is what a min/max FQ scheduler does offer: the
> flows with the most packets (or rather bytes) in the queue will have a
> larger effect on the total queuing delay than flows with fewer, and
> marking packets from that fat flow will at least answer the "does
> the marked flow contribute meaningfully to the congestion" question. But
> the endpoints still have no clue about the bottleneck AQM's marking
> strategy and hence cannot make robust inferences.
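The size-scaling idea above is what RED and CoDel call "byte mode": the base marking probability is scaled by packet size relative to a reference size. A one-line sketch (hypothetical helper; not something the L4S specs mandate):

```python
def mark_probability(base_p, pkt_bytes, ref_bytes=1500):
    """Scale a base marking probability by packet size ('byte mode'),
    so a 1500 B packet is ~23x as likely to be marked as a 64 B packet
    and a mark carries size information. Result is capped at 1.0."""
    return min(1.0, base_p * pkt_bytes / ref_bytes)
```

Note the endpoint can only exploit this if it knows the bottleneck applied such scaling, which is exactly the inference problem raised above.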
> 
>> In my experiments so far I turned out to be correct quite often, but 
>> not sufficiently often
> 
> 	[SM] Which is the hallmark of a correlation, not a causation...
> 
>> to be able to say I had found /the/ solution.
> 
> 	[SM] As the above indicates, I clearly must be misunderstanding how
> you quantify the delay an individual packet introduces; could you
> elaborate, please?
> 
> 
>> I've got a much better grasp of the problem now compared to the 
>> position that had been reached in the late 1990s (when this problem 
>> was last attacked in the research community). However, the latest 
>> results of my simulations have surprised me in one region of the 
>> parameter space. So my experiments are still telling me that I need to 
>> go round the design-evaluate loop at least one more time before I have 
>> something worthy of write-up.
> 
> 	[SM] I would love to read a decent peer-reviewed paper on that.
> 
>> 
>>> 
>>> Now, I wonder how that is going to work out:
>>> RFC 3168 flows:
>>> the receiver knows the number of CE-marked bytes and could accumulate 
>>> those, but the sender does not, as ECE will be asserted until CWR is 
>>> received, independently of whether additional packets were marked.
>>> L4S flows:
>>> these flows are expected to accumulate more CE marks than classic 
>>> traffic and hence more marked bytes. (Also, even with accurate ECN, 
>>> does the sender unambiguously know which packets were CE-marked so 
>>> it can veridically track their size?)
>>> 
>>> So to make sense of this "measure", the application needs to 
>>> collect information from both ends and aggregate it before 
>>> reporting, something that iperf 2 already does, and it will need to 
>>> report ECT(0) and ECT(1) flows/packets separately (assuming that 
>>> ECT(1) means L4S signaling/AQM).
>> 
>> [BB] As I already said, this metric will not mean much if reported in 
>> isolation, without defining the test conditions. That's for peer 
>> reviewers to highlight when results of tests using iperf are reported. 
>> It doesn't mean it would be inappropriate for iperf to be able to 
>> report this metric in the first place.
> 
> 	[SM] Indeed, the best way to figure out whether a measure is useful
> (or how useful it is) seems to be to actually take it and correlate it
> with other measures of interest. Personally, I would also like to see
> simpler measures, like the total number of marks/drops of a measurement
> flow or even the average/median time between marks/drops; as I said,
> also, not instead.
> 
> 
>> 
>>> 
>>> 
>>> 
>>>> Even without ECN marking, the congestion cost of a flow could be 
>>>> reported, as the rate of bits lost [in units of lost-b/s] (except of 
>>>> course one cannot distinguish congestion losses from other causes of 
>>>> loss).
>>> 	But see above, even with ECN in practice this measure does not seem 
>>> to be precise, no?
>> 
>> [BB] See previous response.
>> 
>> Nonetheless, it's got more potential as a harm metric for 
>> characterizing dynamic scenarios than any other metric (e.g. various 
>> fairness indices, which only measure rate harm, not latency harm, and 
>> only in steady state scenarios).
> 
> 	[SM] But in non-steady-state situations, with e.g. the L4S-recommended
> sojourn-time estimator, we will have considerably more mis-targeted
> markings, no? I am not saying the problem is easy to solve, but the
> proposed measure is clearly not ideal either; I am not ruling out that
> it might be a useful compromise. The advantage of looking at the rate
> is that it is a measure which is invariant to the CC algorithm and
> will easily allow comparing e.g. L- and C-queue flows in L4S.
> 
> 
>> 
>>> 
>>> 
>>>> A specific use would be to test how well a flow could keep within a 
>>>> certain level of congestion cost. For instance, that would test how 
>>>> well a flow would do if  passed through DOCSIS queue protection 
>>>> (without needing to have a DOCSIS box to pass it through). DOCSIS 
>>>> QProt gives a flow a certain constant allowance for the rate it 
>>>> contributes to congestion, which is precisely the cost metric you 
>>>> highlight.
>>> 	[SM] On that note, how is that going to work on variable rate links?
>> 
>> [BB] That's the whole point - the steady-state congestion-rate of each 
>> of a set of scalable flows sharing a link should be invariant whatever 
>> the link rate (and whatever the number of flows).
> 
> 	[SM] A variable rate link will "actively conspire" against reaching
> steady-state...
> 
> 
>> Of course, the challenge is how rapidly a flow's controller responds 
>> when the rate is varying (because nothing ever actually reaches 
>> steady-state). This metric should give a handle on how well or badly a 
>> flow manages to track varying capacity.
> 
> 	[SM] Assuming the marks are actually assigned to the "right"
> packets... (which for flows running long enough should be true, albeit
> the mis-attribution will result in a "noise floor" for the measure).
> 
>> (Of course there are caveats, e.g. whether the congestion-rate 
>> actually is invariant in the steady state, like the theory says it 
>> should be.)
>> 
>>> 
>>> 
>>>> Other methods for policing latency might do similarly. A few years 
>>>> ago now I was given permission to reveal that a Procera traffic 
>>>> policer used the same congestion cost metric to more strongly limit 
>>>> traffic for those users contributing a higher congestion cost. 
>>>> Unlike rate policers, these policers can inherently take account of 
>>>> behaviour over time.
>>> 	[SM] Curious, does the "over time" part not require to keep per flow 
>>> state for longer?
>> 
>> [BB] A common design for per-flow congestion policers is to only hold 
>> state for badly behaving flows. For example:
>> * A common technique in flow-rate policers is to hold flow-state only 
>> on flows selected probabilistically by randomly picking a small 
>> proportion of those packets that are ECN-marked or dropped. Then the 
>> more a flow is marked, the more likely flow-state will be held on it.
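The sampling technique described above could be sketched roughly as follows (class and parameter names are hypothetical, not any vendor's implementation): flow-state is created only when a marked or dropped packet happens to be sampled, so heavily-marked flows are the ones most likely to acquire state.

```python
import random

class SampledCongestionPolicer:
    """Hold per-flow state only for flows sampled from their
    congestion-marked/dropped packets; well-behaved flows stay
    stateless."""

    def __init__(self, sample_p=0.05, rng=random):
        self.sample_p = sample_p
        self.rng = rng
        self.state = {}  # flow_id -> count of observed congestion events

    def on_packet(self, flow_id, congestion_marked):
        if flow_id in self.state:
            if congestion_marked:
                self.state[flow_id] += 1  # already tracked: count events
        elif congestion_marked and self.rng.random() < self.sample_p:
            self.state[flow_id] = 1  # sampled: start holding state
```

With sample_p = 0.05, a flow needs on the order of tens of marks before it is likely to be tracked, which is the convergence-delay concern raised in the reply below.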
> 
> 	[SM] Such sub-sampling might work for monitoring purposes, but
> driving an actual controller from it seems more approximate than
> desirable, especially since we aim for a low-latency control loop;
> sub-sampling and averaging suffer from requiring multiple rounds
> before converging on something actionable, no?
> 
> 
>> * In the DOCSIS q-protection algo, the state decays out between the 
>> packets of a flow (freeing up memory for other flows, which also 
>> usually decays out before the next packet of that flow). See 
>> https://datatracker.ietf.org/doc/html/draft-briscoe-docsis-q-protection-06#section-2.1
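The decaying-state idea might look roughly like this (a sketch; parameter names are hypothetical and the draft's actual algorithm differs in detail): each flow carries a congestion score that accumulates per packet and drains linearly between packets, so idle flows' state fades to zero and the memory can be reclaimed.

```python
class DecayingFlowScore:
    """Per-flow congestion score that drains between packets, so state
    for quiet flows decays out and can be evicted (sketch of the DOCSIS
    queue-protection idea, not the draft's exact algorithm)."""

    def __init__(self, drain_rate=100.0):  # score units drained per second
        self.drain_rate = drain_rate
        self.flows = {}  # flow_id -> (score, last_update_time)

    def update(self, flow_id, now, contribution):
        score, last = self.flows.get(flow_id, (0.0, now))
        score = max(0.0, score - self.drain_rate * (now - last))  # drain
        score += contribution  # this packet's congestion contribution
        self.flows[flow_id] = (score, now)
        return score

    def expire(self, now):
        """Drop flows whose score has fully drained (memory reclaim)."""
        for fid in list(self.flows):
            score, last = self.flows[fid]
            if score - self.drain_rate * (now - last) <= 0.0:
                del self.flows[fid]
```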
> 
> 	[SM] It would be great if there were published evaluations of the
> DOCSIS methods. I assume data was acquired and analyzed to arrive at
> the final design; now it would be nice if that data and analysis could
> be made available...
> 
>> In the Procera case, it held per-user state, not per-flow (a 
>> congestion cost metric aggregates well over a set of flows as well as 
>> over time).
> 
> 	[SM] Makes sense, assuming there are fewer "users" than flows and
> "user" is a relevant grouping. Is it correct, then, that they kept a
> per-user drop/mark probability and hence used different probabilities
> per user? ("User" is a tricky concept, but in cake we use the IP
> address as a decent proxy and optionally do a first round of
> arbitration between IP addresses (typically the "internal" IP
> addresses of the home network) and then do per-flow queueing within
> each IP's traffic. This takes the sting out of using flow explosion to
> gain a throughput advantage, in that the behaviour only affects
> traffic to/from the same IP while other hosts do not even notice; it
> obviously does not help against DoS attacks, but it mitigates the
> simplistic "let's use a shipload of flows to monopolize the link's
> capacity" strategy.)
> 
> 
> Regards
> 	Sebastian
> 
> 
>>> Also is there any public data showing how this affected RTT-bias?
>> 
>> [BB] In a word, no.
>> 
>> Cheers
>> 
>> 
>> Bob
>> 
>>> 
>>> 
>>>> Since then Procera merged with Sandvine, so I don't know whether 
>>>> that technology is still available.
>>>> 
>>>> Thanks
>>>> 
>>>> 
>>>> Another Bob
>>>> 
>>>> On 16/06/2023 21:20, rjmcmahon wrote:
>>>>> Hi All,
>>>>> 
>>>>> I read the below recently and am wondering if the cost fairness 
>>>>> metric is useful? I'm adding ECN/L4S support from a test 
>>>>> perspective into iperf 2 and thought this new metric might be 
>>>>> generally useful - not sure. Feedback is appreciated.
>>>>> 
>>>>> https://www.bobbriscoe.net/projects/refb/draft-briscoe-tsvarea-fair-02.html
>>>>> 
>>>>> "The metric required to arbitrate cost fairness is simply volume of 
>>>>> congestion, that is congestion times the bit rate of each user 
>>>>> causing it, taken over time. In engineering terms, for each user it 
>>>>> can be measured very easily as the amount of data the user sent 
>>>>> that was dropped. Or with explicit congestion notification (ECN 
>>>>> [RFC3168]) the amount of each user's data to have been congestion 
>>>>> marked. Importantly, unlike flow rates, this metric integrates 
>>>>> easily and correctly across different flows on different paths and 
>>>>> across time, so it can be easily incorporated into future service 
>>>>> level agreements of ISPs."
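The metric quoted above is deliberately trivial to compute per user; a one-line sketch (hypothetical helper, just to make the units explicit):

```python
def congestion_rate(marked_bytes, dropped_bytes, duration_s):
    """Congestion volume (bytes of a user's traffic that were CE-marked
    or dropped) divided by the measurement interval, in congested
    bytes per second. Unlike flow rate, this sums correctly across a
    user's flows, paths, and time."""
    return (marked_bytes + dropped_bytes) / duration_s
```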
>>>>> 
>>>>> Thanks,
>>>>> Bob
>>>>> 
>>>> --
>>>> ________________________________________________________________
>>>> Bob Briscoe                               http://bobbriscoe.net/
>>>> 
>>>> --
>>>> L4s-discuss mailing list
>>>> L4s-discuss@ietf.org
>>>> https://www.ietf.org/mailman/listinfo/l4s-discuss
>> 
>> --
>> ________________________________________________________________
>> Bob Briscoe                               http://bobbriscoe.net/
>> 