Re: [L4s-discuss] Thoughts on cost fairness metric

Sebastian Moeller <moeller0@gmx.de> Tue, 20 June 2023 07:06 UTC

From: Sebastian Moeller <moeller0@gmx.de>
In-Reply-To: <d0c7f792-eba5-63ab-d825-8b2158bf33ea@bobbriscoe.net>
Date: Tue, 20 Jun 2023 09:06:44 +0200
Cc: rjmcmahon <rjmcmahon@rjmcmahon.com>, l4s-discuss@ietf.org
Message-Id: <9065174E-217F-430C-A271-9CF2AFECFACC@gmx.de>
References: <a34b4e4474ea744e01d5ce15131fc465@rjmcmahon.com> <93729ad5-6919-8a86-5994-fdfe6344a596@bobbriscoe.net> <0ED86DEC-CA18-4C6A-AA1B-4D21FA261196@gmx.de> <d0c7f792-eba5-63ab-d825-8b2158bf33ea@bobbriscoe.net>
To: Bob Briscoe <in=40bobbriscoe.net@dmarc.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/l4s-discuss/r6YWTf6T9jgXEwoCq3zkysIKTgI>

Hi Bob, list,


> On Jun 19, 2023, at 01:48, Bob Briscoe <in=40bobbriscoe.net@dmarc.ietf.org> wrote:
> 
> Sebastian,
> 
> Before responding to each point, yes, I agree that the marking algorithms have to mark 'fairly' and we have to agree how to do that and improve algorithms over time. But lack of perfection in the network doesn't stop the congestion-rate being a good metric for iperf to maintain while network marking algorithms are being improved.

	The question, IMHO, is not so much "perfection" or not; I subscribe to the "good enough" school of problem solving. My question is which of the simple-to-compute harm measures is the most suitable. I am not yet convinced that marking/dropping rate is "it", but it might well be good enough...

> And if anyone quotes iperf harm metrics without knowing or defining the marking algorithm in use, then others would be right to question the validity of their results.

	Honestly, I would prefer a harm metric that is as independent of implementation details as possible; as it stands, this metric seems not well suited to e.g. compare L4S's two queues with each other (the L-queue generally uses a higher marking rate)...


> 
> Now pls see [BB] inline
> 
> On 18/06/2023 11:33, Sebastian Moeller wrote:
>> Hi List,
>> 
>> 
>>> On Jun 16, 2023, at 23:15, Bob Briscoe <in=40bobbriscoe.net@dmarc.ietf.org> wrote:
>>> 
>>> Bob
>>> 
>>> That would actually be v useful as a harm metric. For the list, it's essentially the bit rate, but measuring only those packets that are ECN-marked [in units of marked-b/s].
>> 	This primarily seems to show how much harm the flow suffered from the AQM, not how much it actually caused. Granted, there is some correlation between the two, but no hard causality. If the AQM sits behind a min/max flow-queuing scheduler the correlation will be stronger, and if it sits behind a single-queue "scheduler" it will be weaker (especially if the AQM on that single queue is not looking at a packet's actual sojourn time but at the estimated sojourn time of a virtual packet enqueued at dequeue time, i.e. how much queue a newly added packet would have to wait behind). As far as I understand, that is what L4S recommends for dual-queue AQMs, and in that case the "harm" can be caused by a later burst (e.g. from a badly-paced flow) but assigned to a perfectly well-behaved low-rate flow.
>> 	That is IMHO sub-optimal:
>> a) once congestion hits, the best an AQM can do is mark a packet of the most-contributing flow, not of whichever flow happens to have a packet at the head of the queue
>> b) this clearly assigns blame to the wrong flow
> 
> [BB] What you say at '(a)' sounds as if you are considering all the marking being done at just one point in time. An algorithm that statistically decides whether to mark each packet as it becomes the head packet should be able to produce an outcome where packets from the flow contributing most to congestion are marked more. So I don't see any basis for what you've said at '(a)'.

	[SM] That is part of my point: by stochastically marking packets you will sooner or later end up marking more packets of larger flows than of smaller flows (as in number of packets in the queue over time). But a node that experiences congestion now would ideally not randomly mark/drop packets, but selectively drop the packets that have the highest likelihood of resulting in a noticeable decrease in the offered load, and do so as soon as possible. It is the "sooner or later" part of the random dropping that I consider open to improvement. One obviously needs a suitable data structure; if restricted to a single queue there is not much one can do (one could still keep per-flow stats for each queue entry and consult those when dealing with the head-of-queue packet).
Case in point: if an AQM marks, say, a DNS-over-TCP response, in all likelihood it might as well not have marked that packet at all; the flow is likely too short to ever throttle down enough for the load at the bottleneck to subside. The same goes for a flow that is already at the minimal congestion window. From a responsiveness perspective these marks are wasted (real drops at least reduce the bottleneck queue, but marking "unresponsive" flows is not going to help; there are valid reasons for a flow not being able to respond to marking, so that is not necessarily nefarious).
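	A toy sketch of what I mean by consulting per-flow stats for the head-of-queue decision (Python; flow names and sizes are purely illustrative):

```python
from collections import defaultdict

def fattest_flow(queue):
    """Return the flow contributing the most bytes to the current
    backlog, i.e. the flow whose marking is most likely to reduce
    the offered load. `queue` is a list of (flow_id, size_bytes)."""
    backlog = defaultdict(int)
    for flow_id, size in queue:
        backlog[flow_id] += size
    return max(backlog, key=backlog.get)

# A short DNS-over-TCP response queued amid a bulk flow's packets:
# random head-of-line marking could hit "dns", whose flow cannot
# meaningfully reduce its load; targeting picks "bulk" instead.
queue = [("dns", 120)] + [("bulk", 1500)] * 20
victim = fattest_flow(queue)
```

	Keeping such per-flow backlog counters is, in essence, what an fq scheduler gets for free from its per-flow queues.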



> 
> There are many things we (the community) still don't understand about apportionment of blame for congestion. But one thing we can be certain of is that the best solution will not involve heaping all the blame onto a single flow.

	[SM] I respectfully disagree: there are situations in which a single flow monopolizes the lion's share of a bottleneck's capacity (e.g. a single-flow TCP download on a home link), so that flow will also cause the lion's share of the queue build-up, and heaping most of the responsibility on that flow seems eminently reasonable. We seem to agree that talking in absolutes ("all the blame") makes little sense...


> Where a number of flows behave the same or similarly, that would produce a very unstable outcome.

	[SM] In my experience, having run FQ schedulers in one form or another (mostly fq_codel and cake) on my internet access link for over a decade now, the link behaves pretty well, with low latency increase under load / high responsiveness (for those flows that behave well; less well-behaved flows suffer the consequences as higher intra-flow queueing). However, that situation, many flows of very similar "fatness", is in my experience a rather rare beast; normal traffic tends to have loads of short transient flows that never reach steady state and that are not in equilibrium with each other. In that rare case, randomly selecting a flow will be a tad cheaper than searching for and selecting the flow with the largest immediate queueing contribution, but I am not sure it makes much sense to optimize for the rare case, while the common case is more a bimodal distribution of low- and high-contributing flows.


> Despite just having said 'one thing we can be certain of', I would otherwise suggest that anyone who thinks they are certain about what the solution to the blame apportionment problem is should be very careful until they have done exhaustive analysis and experimentation. That includes the above opinions about FQ vs single queue.

	[SM] I would call this more a hypothesis and less an opinion, as I explained why the AQM sojourn estimation is designed to increase the "speed" of the marking action, not its targeting of the relevant flows...
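	To spell out the distinction as I understand it (toy sketch, my numbers; this is my reading of the recommendation, not anyone's pseudocode):

```python
def actual_sojourn(dequeue_t, enqueue_t):
    """Delay this packet itself experienced in the queue (seconds)."""
    return dequeue_t - enqueue_t

def estimated_sojourn(backlog_bytes, drain_rate_bps):
    """Delay a packet enqueued *now* would see: current backlog
    divided by the drain rate -- reacts immediately to a new burst."""
    return backlog_bytes * 8 / drain_rate_bps

# A well-paced flow's packet reaches the head after only 2 ms, but a
# burst arriving behind it pushes the *estimate* to 10 ms -- so this
# packet may be marked for delay the burst caused.
own_delay = actual_sojourn(10.002, 10.000)     # ~0.002 s
est_delay = estimated_sojourn(125_000, 100e6)  # 0.010 s
```

	The estimator is faster to react (good for the control loop), but the mark lands on whatever packet happens to be at the head.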


> 
> For instance, I myself thought I had hit upon the solution that marking ought to be based on the delay that each packet causes other packets to experience, rather than the delay each packet experiences itself (which is what sojourn delay measures).

	[SM] That in essence would mean scaling by packet size, if all one looks at is individual packets... I am not sure that would help. If the marking entity does not take packet size into account when marking, then whether 64 octets were marked or ~1500 carries no precise and relevant information, neither about the "magnitude" of the congestion nor about the magnitude of that flow's contribution to the congestion. To be able to extract that information from a mark, the marking entity would need to take size into account.
	BTW, in a sense that is what a min/max fq-scheduler does offer: the flows with the most packets (or rather bytes) in the queue will have a larger effect on the total queueing delay than flows with fewer packets, and marking packets from such a fat flow will at least answer the "does the marked flow contribute meaningfully to the congestion" question. But the endpoints still have no clue about the bottleneck AQM's marking strategy and hence cannot make robust inferences.
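	To make the size-scaling concrete, packet-mode vs. byte-mode marking in toy form (in the spirit of the RFC 7141 discussion; the parameters are mine, not from any deployed AQM):

```python
import random

def marked(pkt_size, base_prob, mtu=1500, byte_mode=False,
           rng=random.random):
    """Packet-mode marking ignores size; byte-mode scales the marking
    probability with packet size, so a mark on a small packet is
    correspondingly rarer and carries size information implicitly."""
    p = base_prob * pkt_size / mtu if byte_mode else base_prob
    return rng() < p

# With byte-mode, a 64-octet packet is ~23x less likely to be marked
# than a 1500-octet one at the same congestion level.
```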

> In my experiments so far I turned out to be correct quite often, but not sufficiently often

	[SM] Which is the hallmark of a correlation, not a causation...

> to be able to say I had found /the/ solution.

	[SM] As the above indicates, I clearly must be misunderstanding how you quantify the delay an individual packet introduces; could you elaborate, please?


> I've got a much better grasp of the problem now compared to the position that had been reached in the late 1990s (when this problem was last attacked in the research community). However, the latest results of my simulations have surprised me in one region of the parameter space. So my experiments are still telling me that I need to go round the design-evaluate loop at least one more time before I have something worthy of write-up.

	[SM] I would love to read a decent peer-reviewed paper on that.

> 
>> 
>> Now, I wonder how that is going to work out:
>> rfc3168 flows:
>> the receiver knows the number of CE-marked bytes and could accumulate those, but the sender does not, as ECE will be asserted until CWR is received, independent of whether additional packets were marked.
>> L4S:
>> these flows are expected to accumulate more CE marks than classic traffic and hence more marked bytes. (Also, even with Accurate ECN, does the sender unambiguously know which packets were CE-marked, so that it can veridically track their size?)
>> 
>> So to make sense out of this "measure" the application needs to collect information from both ends and aggregate these before reporting, something that iperf2 already does, and it will need to report ECT(0) and ECT(1) flows/packets separately (assuming that ECT(1) means L4S signaling/AQM).
> 
> [BB] As I already said, this metric will not mean much if reported in isolation, without defining the test conditions. That's for peer reviewers to highlight when results of tests using iperf are reported. It doesn't mean it would be inappropriate for iperf to be able to report this metric in the first place.

	[SM] Indeed, the best way to figure out whether a measure is useful (or how useful) seems to be to actually take it and correlate it with other measures of interest. Personally, I would also like to see simpler measures, like the total number of marks/drops of a measurement flow, or even the average/median time between marks/drops; as I said: also, not instead.
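	For concreteness, the simpler summaries I have in mind (toy sketch, timestamps illustrative):

```python
from statistics import mean, median

def mark_summaries(mark_times):
    """Simpler per-flow measures: total mark/drop count plus the
    mean and median time between consecutive marks (seconds)."""
    gaps = [b - a for a, b in zip(mark_times, mark_times[1:])]
    return {
        "total_marks": len(mark_times),
        "mean_gap": mean(gaps) if gaps else None,
        "median_gap": median(gaps) if gaps else None,
    }

summary = mark_summaries([0.0, 0.1, 0.3, 0.6])
```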


> 
>> 
>> 
>> 
>>> Even without ECN marking, the congestion cost of a flow could be reported, as the rate of bits lost [in units of lost-b/s] (except of course one cannot distinguish congestion losses from other causes of loss).
>> 	But see above, even with ECN in practice this measure does not seem to be precise, no?
> 
> [BB] See previous response.
> 
> Nonetheless, it's got more potential as a harm metric for characterizing dynamic scenarios than any other metric (e.g. various fairness indices, which only measure rate harm, not latency harm, and only in steady state scenarios).

	[SM] But in non-steady-state situations with e.g. the L4S-recommended sojourn-time estimator we will have considerably more mis-targeted markings, no? I am not saying the problem is easy to solve, but the proposed measure is clearly not ideal either; I am not ruling out that it might be a useful compromise. The advantage of looking at the rate is that it is a measure that is invariant to the CC algorithm and easily allows comparing e.g. L- and C-queue flows in L4S.


> 
>> 
>> 
>>> A specific use would be to test how well a flow could keep within a certain level of congestion cost. For instance, that would test how well a flow would do if  passed through DOCSIS queue protection (without needing to have a DOCSIS box to pass it through). DOCSIS QProt gives a flow a certain constant allowance for the rate it contributes to congestion, which is precisely the cost metric you highlight.
>> 	[SM] On that note, how is that going to work on variable rate links?
> 
> [BB] That's the whole point - the steady-state congestion-rate of each of a set of scalable flows sharing a link should be invariant whatever the link rate (and whatever the number of flows).

	[SM] A variable rate link will "actively conspire" against reaching steady-state...
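	For reference, the invariance in question, in toy form (the constant k and the rates are mine; and as noted, a real link never sits at this fixed point):

```python
def scalable_marking_prob(rate_bps, k=2e6):
    """Steady-state marking probability of a scalable CC: p ~ k / x,
    with k illustrative.  The congestion-rate p * x is then constant
    whatever the link rate or number of flows."""
    return min(1.0, k / rate_bps)

# Three very different per-flow rates, one congestion-rate:
congestion_rates = []
for x in (10e6, 100e6, 1e9):
    p = scalable_marking_prob(x)
    congestion_rates.append(p * x)  # marked bits per second
```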


> Of course, the challenge is how rapidly a flow's controller responds when the rate is varying (because nothing ever actually reaches steady-state). This metric should give a handle on how well or badly a flow manages to track varying capacity.

	[SM] Assuming the marks are actually assigned to the "right" packets... (which for flows running long enough should be true, albeit the mis-attribution will result in a "noise floor" for the measure).

> (Of course there are caveats, e.g. whether the congestion-rate actually is invariant in the steady state, like the theory says it should be.)
> 
>> 
>> 
>>> Other methods for policing latency might do similarly. A few years ago now I was given permission to reveal that a Procera traffic policer used the same congestion cost metric to more strongly limit traffic for those users contributing a higher congestion cost. Unlike rate policers, these policers can inherently take account of behaviour over time.
>> 	[SM] Curious, does the "over time" part not require to keep per flow state for longer?
> 
> [BB] A common design for per-flow congestion policers is to only hold state for badly behaving flows. For example:
> * A common technique in flow-rate policers is to hold flow-state only on flows selected probabilistically by randomly picking a small proportion of those packets that are ECN-marked or dropped. Then the more a flow is marked, the more likely flow-state will be held on it.

	[SM] Such sub-sampling might work for monitoring purposes, but driving an actual controller with it seems more approximate than desirable, especially since we aim for a low-latency control loop; sub-sampling and averaging require multiple rounds before converging on something actionable, no?
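	As I understand the technique (toy sketch; the sampling probability is mine):

```python
import random

def maybe_hold_state(flow_id, tracked, pkt_marked, sample_prob=0.01,
                     rng=random.random):
    """Hold flow-state only on flows picked by sampling marked/dropped
    packets: the more often a flow's packets are marked, the sooner it
    lands in the `tracked` set and gets policed."""
    if pkt_marked and flow_id not in tracked and rng() < sample_prob:
        tracked.add(flow_id)
    return tracked
```

	So well-behaved flows cost no state at all, while a heavily marked flow is caught after, in expectation, 1/sample_prob marks; hence my multiple-rounds concern above.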


> * In the DOCSIS q-protection algo, the state decays out between the packets of a flow (freeing up memory for other flows, which also usually decays out before the next packet of that flow). See https://datatracker.ietf.org/doc/html/draft-briscoe-docsis-q-protection-06#section-2.1

	[SM] It would be great if there were published evaluations of the DOCSIS methods. I assume data was acquired and analyzed to arrive at the final design; it would be nice if that data and analysis could be made available...
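	My reading of the decaying-state idea, as a toy sketch (units and constants are mine, not the draft's):

```python
def qprot_score(score, last_t, now, pkt_congestion, aging_rate):
    """Per-flow queuing score in the spirit of the q-protection
    draft's bucket: the score drains at a constant aging rate between
    a flow's packets (usually decaying to zero, so the state slot can
    be recycled for another flow) and is bumped by each arriving
    packet's congestion contribution."""
    score = max(0.0, score - aging_rate * (now - last_t))
    return score + pkt_congestion, now

# 0.5 s of aging drains half the score before the next packet bumps it:
s, t = qprot_score(1000.0, 0.0, 0.5, 200.0, aging_rate=1000.0)
```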

> In the Procera case, it held per-user state, not per-flow (a congestion cost metric aggregates well over a set of flows as well as over time).

	[SM] Makes sense, assuming there are fewer "users" than flows and "user" is a relevant grouping. Is it correct, then, that they kept a per-user drop/mark probability and hence used different probabilities per user? ("User" is a tricky concept, but in cake we use the IP address as a decent proxy: we optionally do a first round of arbitration between IP addresses (typically the "internal" IP addresses of the home network) and then do per-flow queueing within each IP's traffic. This takes the sting out of using flow explosion to gain a throughput advantage, in that such behaviour only affects traffic to/from the same IP while other hosts do not even notice; it obviously does not help against DoS attacks, but it mitigates the simplistic "let's use a shipload of flows to monopolize the link's capacity" strategy.)
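	The arbitration scheme in toy form (much simplified: steady-state shares only, ignoring backlog, DRR quanta and the like):

```python
def split_capacity(flows_per_host, capacity_bps):
    """Two-level fair split (cake-style host isolation, simplified):
    capacity is first divided equally among hosts (IPs), then each
    host's share equally among its own flows -- so a host opening a
    shipload of flows only dilutes its own flows' shares."""
    shares = {}
    per_host = capacity_bps / len(flows_per_host)
    for host, flows in flows_per_host.items():
        for flow in flows:
            shares[(host, flow)] = per_host / len(flows)
    return shares

# Host B's four flows split B's half; host A's single flow is untouched.
shares = split_capacity({"A": ["f1"], "B": ["f1", "f2", "f3", "f4"]},
                        100e6)
```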


Regards
	Sebastian


>> Also is there any public data showing how this affected RTT-bias?
> 
> [BB] In a word, no.
> 
> Cheers
> 
> 
> Bob
> 
>> 
>> 
>>> Since then Procera merged with Sandvine, so I don't know whether that technology is still available.
>>> 
>>> Thanks
>>> 
>>> 
>>> Another Bob
>>> 
>>> On 16/06/2023 21:20, rjmcmahon wrote:
>>>> Hi All,
>>>> 
>>>> I read the below recently and am wondering if the cost fairness metric is useful? I'm adding ECN/L4S support from a test perspective into iperf 2 and thought this new metric might be generally useful - not sure. Feedback is appreciated.
>>>> 
>>>> https://www.bobbriscoe.net/projects/refb/draft-briscoe-tsvarea-fair-02.html
>>>> 
>>>> "The metric required to arbitrate cost fairness is simply volume of congestion, that is congestion times the bit rate of each user causing it, taken over time. In engineering terms, for each user it can be measured very easily as the amount of data the user sent that was dropped. Or with explicit congestion notification (ECN [RFC3168]) the amount of each user's data to have been congestion marked. Importantly, unlike flow rates, this metric integrates easily and correctly across different flows on different paths and across time, so it can be easily incorporated into future service level agreements of ISPs."
>>>> 
>>>> Thanks,
>>>> Bob
>>>> 
>>> -- 
>>> ________________________________________________________________
>>> Bob Briscoe                               http://bobbriscoe.net/
>>> 
>>> -- 
>>> L4s-discuss mailing list
>>> L4s-discuss@ietf.org
>>> https://www.ietf.org/mailman/listinfo/l4s-discuss
> 
> -- 
> ________________________________________________________________
> Bob Briscoe                               http://bobbriscoe.net/
> 
> -- 
> L4s-discuss mailing list
> L4s-discuss@ietf.org
> https://www.ietf.org/mailman/listinfo/l4s-discuss