Re: [L4s-discuss] Thoughts on cost fairness metric

rjmcmahon <rjmcmahon@rjmcmahon.com> Wed, 21 June 2023 15:43 UTC

Date: Wed, 21 Jun 2023 08:43:06 -0700
From: rjmcmahon <rjmcmahon@rjmcmahon.com>
To: Sebastian Moeller <moeller0@gmx.de>
Cc: Bob Briscoe <in@bobbriscoe.net>, l4s-discuss@ietf.org

Thanks Sebastian,

Yes, iperf 2 supports the tcp_info struct, which includes retries (the
Rtry column below). The -e (enhanced reporting) option is required to
get this output.

There are more options that might be of interest, including Little's
law outputs (the inP metric on the server) and write-to-read latencies.
I didn't enable histograms below, but those are supported too.
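
For reference, here is a minimal sketch of where those retry counts come
from on Linux (illustrative only, not iperf 2's actual code; the function
name print_tcp_stats is invented for this example): the same struct
tcp_info that the -e reporting relies on can be read with
getsockopt(TCP_INFO) on a connected TCP socket.

/* Minimal sketch: read retransmit/RTT stats from TCP_INFO on Linux.
 * Illustrative only, not iperf 2's implementation. */
#include <stdio.h>
#include <string.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

static void print_tcp_stats(int sock)
{
    struct tcp_info ti;
    socklen_t len = sizeof(ti);

    memset(&ti, 0, sizeof(ti));
    if (getsockopt(sock, IPPROTO_TCP, TCP_INFO, &ti, &len) < 0) {
        perror("getsockopt(TCP_INFO)");
        return;
    }
    /* tcpi_total_retrans is the cumulative retransmit count,
     * tcpi_snd_cwnd is in MSS units, tcpi_rtt is in microseconds. */
    printf("retrans=%u cwnd=%u mss=%u rtt=%u us\n",
           ti.tcpi_total_retrans, ti.tcpi_snd_cwnd,
           ti.tcpi_snd_mss, ti.tcpi_rtt);
}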

[rjmcmahon@ryzen3950 iperf2-code]$ iperf -c 192.168.1.70 -i 1 -e --trip-times --tcp-write-times --tcp-write-prefetch 16K
------------------------------------------------------------
Client connecting to 192.168.1.70, TCP port 5001 with pid 388510 (1 flows)
Write buffer size: 131072 Byte (writetimer-enabled)
TCP congestion control using cubic
TOS set to 0x0 (Nagle on)
TCP window size: 85.0 KByte (default)
Event based writes (pending queue watermark at 16384 bytes)
------------------------------------------------------------
[  1] local 192.168.1.95%enp4s0 port 34814 connected with 192.168.1.70 port 5001 (prefetch=16384) (trip-times) (sock=3) (icwnd/mss/irtt=14/1448/333) (ct=0.40 ms) on 2023-06-21 08:39:03.062 (PDT)
[ ID] Interval        Transfer    Bandwidth       Write/Err  Rtry     Cwnd/RTT        NetPwr  write-times avg/min/max/stdev (cnt)
[  1] 0.00-1.00 sec   114 MBytes   954 Mbits/sec  910/0         66    1539K/13197 us  9038  1.098/0.020/9.206/0.445 ms (910)
[  1] 1.00-2.00 sec   112 MBytes   943 Mbits/sec  899/0          0    1681K/14382 us  8193  1.111/0.430/1.612/0.303 ms (899)
[  1] 2.00-3.00 sec   112 MBytes   943 Mbits/sec  899/0          0    1794K/15428 us  7638  1.111/0.402/1.660/0.305 ms (899)
[  1] 3.00-4.00 sec   112 MBytes   942 Mbits/sec  898/0          0    1882K/16192 us  7269  1.111/0.398/1.636/0.307 ms (898)
[  1] 4.00-5.00 sec   112 MBytes   937 Mbits/sec  894/0          1    1365K/11584 us  10116  1.117/0.069/8.519/0.400 ms (894)
[  1] 5.00-6.00 sec   112 MBytes   942 Mbits/sec  898/0          0    1459K/12522 us  9400  1.111/0.450/1.583/0.298 ms (898)
[  1] 6.00-7.00 sec   112 MBytes   943 Mbits/sec  899/0          0    1531K/13111 us  8987  1.112/0.440/1.621/0.304 ms (899)
[  1] 7.00-8.00 sec   112 MBytes   942 Mbits/sec  898/0          0    1582K/13682 us  8603  1.112/0.509/1.509/0.295 ms (898)
[  1] 8.00-9.00 sec   112 MBytes   942 Mbits/sec  898/0          0    1617K/13936 us  8446  1.112/0.479/1.542/0.304 ms (898)
[  1] 9.00-10.00 sec   112 MBytes   942 Mbits/sec  898/0          0    1644K/14088 us  8355  1.112/0.406/1.611/0.315 ms (898)
[  1] 0.00-10.03 sec  1.10 GBytes   940 Mbits/sec  8992/0         67    1645K/14129 us  8315  1.111/0.020/9.206/0.331 ms (8992)
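
As a sanity check on the NetPwr column (my reading of the field, hedged,
not an authoritative definition): the reported values are consistent with
network power computed as the delivered byte rate divided by the sampled
RTT, scaled by 1e-6. For the first interval:

\mathrm{NetPwr} \approx \frac{910 \times 131072\ \mathrm{bytes/s}}{13.197\ \mathrm{ms}} \times 10^{-6} \approx 9038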


[root@rjm-nas examples]# iperf -s -i 1 -e
------------------------------------------------------------
Server listening on TCP port 5001 with pid 4533
Read buffer size:  128 KByte (Dist bin width=16.0 KByte)
TCP congestion control default cubic
TCP window size:  128 KByte (default)
------------------------------------------------------------
[  1] local 192.168.1.70%eno1 port 5001 connected with 192.168.1.95 port 34814 (trip-times) (sock=4) (peer 2.1.10-master) (icwnd/mss/irtt=14/1448/293) on 2023-06-21 08:39:03.063 (PDT)
[ ID] Interval        Transfer    Bandwidth    Burst Latency avg/min/max/stdev (cnt/size) inP NetPwr  Reads=Dist
[  1] 0.00-1.00 sec   112 MBytes   941 Mbits/sec  12.577/1.463/33.996/4.237 ms (897/131111) 1.43 MByte 9351  4807=983:3347:454:1:2:1:5:14
[  1] 1.00-2.00 sec   112 MBytes   941 Mbits/sec  14.843/13.833/15.827/0.402 ms (898/131047) 1.67 MByte 7929  4896=988:3740:168:0:0:0:0:0
[  1] 2.00-3.00 sec   112 MBytes   942 Mbits/sec  15.954/15.127/16.863/0.336 ms (898/131074) 1.79 MByte 7378  4896=994:3477:425:0:0:0:0:0
[  1] 3.00-4.00 sec   112 MBytes   941 Mbits/sec  16.817/16.074/17.470/0.274 ms (897/131186) 1.89 MByte 6998  4900=1009:3739:152:0:0:0:0:0
[  1] 4.00-5.00 sec   112 MBytes   941 Mbits/sec  17.326/7.370/33.850/1.775 ms (898/131040) 1.94 MByte 6792  4825=981:3758:70:0:0:0:1:15
[  1] 5.00-6.00 sec   112 MBytes   941 Mbits/sec  13.130/12.374/13.943/0.294 ms (898/131045) 1.47 MByte 8963  4896=1001:3571:324:0:0:0:0:0
[  1] 6.00-7.00 sec   112 MBytes   942 Mbits/sec  13.845/13.114/14.572/0.249 ms (898/131063) 1.55 MByte 8501  4898=997:3578:323:0:0:0:0:0
[  1] 7.00-8.00 sec   112 MBytes   941 Mbits/sec  14.371/13.918/14.723/0.210 ms (898/131029) 1.61 MByte 8188  4898=1004:3893:1:0:0:0:0:0
[  1] 8.00-9.00 sec   112 MBytes   941 Mbits/sec  14.748/14.346/15.244/0.177 ms (897/131185) 1.66 MByte 7979  4894=994:3900:0:0:0:0:0:0
[  1] 9.00-10.00 sec   112 MBytes   941 Mbits/sec  15.001/14.642/15.594/0.184 ms (898/131047) 1.68 MByte 7845  4896=984:3844:68:0:0:0:0:0
[  1] 0.00-10.02 sec  1.10 GBytes   941 Mbits/sec  14.862/1.463/33.996/2.055 ms (8992/131072) 1.58 MByte 7918  48884=9952:36903:1990:1:2:1:6:29
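
The inP column is the Little's law output mentioned above; the reported
values are consistent with L = lambda * W, i.e. the average delivery rate
times the average burst latency (my cross-check, hedged). Taking the
9.00-10.00 sec interval as an example:

L = \lambda W \approx \frac{941 \times 10^{6}\ \mathrm{bit/s}}{8} \times 15.001\ \mathrm{ms} \approx 1.76 \times 10^{6}\ \mathrm{B} \approx 1.68\ \mathrm{MByte}

(iperf's MByte here appears to be 2^20 bytes, hence 1.76e6 B reported as
1.68 MByte.)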


Bob
> Hi Bob,
> 
> 
>> On Jun 20, 2023, at 23:07, rjmcmahon 
>> <rjmcmahon=40rjmcmahon.com@dmarc.ietf.org> wrote:
>> 
>> Hi Sebastian, list,
>> 
>> I'm not sure what "wasting a mark" means. Does it matter if the short
>> flows stop of their own accord vs. slowing down per a mark? Marks that
>> don't affect the transport don't seem wasted per se, just not an
>> affecting signal. Not sure why that's a problem.
> 
> 	[SM] What I tried to express, inartfully, was that for low queueing
> delay it seems obvious that marking a responsive flow is superior to
> marking an under-responsive flow. A mark on a low-rate flow that was
> about to stop anyway will have very little effect on future queue
> growth; a mark on a high-rate flow will have a considerably larger
> effect. Now the bottleneck node really does not know for sure how a
> flow is going to behave in the (immediate) future, but "past is
> prologue" seems a reasonable heuristic here, so picking the flows that
> noticeably contribute to the queue now and telling them to slow down
> seems like a decent approach (even though that high-rate flow might
> already have stopped sending and be closing down). RED and similar
> "stochastic" AQMs counter that problem by increasing the mark/drop
> rate based on queue length, again a decent strategy in the
> intermediate term, but it will take a few "marking" hits before a mark
> ends up on a responsive flow with a high enough congestion window to
> actually make a difference (or before a sufficiently high number of
> low-rate flows are marked). For tight temporal control of the queue
> (which seems to be one reason for L4S' existence) I argue that marking
> under-responsive flows does not help. Again, given enough time a
> stochastic marker will end up marking flows roughly in proportion to
> their number of packets in the queue and so will do the right thing*;
> it is just that one can do better by carefully selecting the packet to
> mark, if one is willing to maintain a suitable data structure...
> 
> 
> *) For L4S' L-queue it also helps that the AQM is supposed to use a
> pretty high marking rate by default so the temporal cost of marking
> sub-optimal flows is likely not all that high. But I am not sure
> whether that logic also applies to the C-queue...
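
To make the "carefully selecting the packet to mark" idea above concrete,
here is a toy sketch (illustrative only; pick_flow_to_mark, flow_stat and
the fixed-size table are invented for this example, and no deployed AQM is
implied): instead of marking whatever packet happens to sit at the head of
the queue, consult per-flow backlog counters and prefer the flow that
currently contributes most to the queue.

/* Toy illustration only: prefer to mark a packet from the flow with
 * the largest current backlog. Assumes per-flow byte counts are
 * updated at enqueue and dequeue time. */
#include <stdint.h>

#define MAX_FLOWS 1024

struct flow_stat {
    uint64_t backlog_bytes;   /* bytes of this flow currently queued */
};

static struct flow_stat flows[MAX_FLOWS];

/* Return the flow id that currently contributes most to the queue,
 * i.e. the preferred target for the next CE mark. */
static int pick_flow_to_mark(void)
{
    int best = -1;
    uint64_t best_backlog = 0;

    for (int i = 0; i < MAX_FLOWS; i++) {
        if (flows[i].backlog_bytes > best_backlog) {
            best_backlog = flows[i].backlog_bytes;
            best = i;
        }
    }
    return best;   /* -1 means nothing is queued */
}

A real implementation would of course need something cheaper than a linear
scan (e.g. a heap keyed on backlog), which is exactly the "suitable data
structure" caveat above.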
> 
>> On "average/median time between marks/drops"
>> 
>> Should this be a histogram too? In general, I try to support 
>> histograms as well as central limit theorem (CLT) averaging in stats 
>> collections.
> 
> 	[SM] I absolutely like histograms as I often find them so much more
> informative than single numbers (e.g. for bi/multi-modal
> distributions), so maybe the inter-mark/drop-interval histogram? Not
> sure about the scaling for the x-axis though...
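
On the x-axis scaling question, one option (purely a sketch; the bin
layout and function names are assumptions, not anything iperf 2 currently
does) is to accumulate inter-mark intervals into log2-spaced bins, so that
both sub-millisecond and multi-second gaps stay visible in one histogram:

#include <math.h>
#include <stdint.h>
#include <stdio.h>

#define NBINS 32   /* bin i covers intervals in [2^i, 2^(i+1)) microseconds */

static uint64_t bins[NBINS];

/* Record one interval (in microseconds) between successive CE marks. */
static void record_intermark_interval(uint64_t usecs)
{
    int i = usecs ? (int)floor(log2((double)usecs)) : 0;
    if (i >= NBINS)
        i = NBINS - 1;
    bins[i]++;
}

static void dump_histogram(void)
{
    for (int i = 0; i < NBINS; i++)
        if (bins[i])
            printf("%12llu-%12llu us: %llu\n",
                   (unsigned long long)1 << i,
                   (((unsigned long long)1 << (i + 1)) - 1),
                   (unsigned long long)bins[i]);
}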
> 
>> I'll probably start with marks - not sure how to do drops with eBPF
>> in a simple way for iperf.
> 
> 	[SM] I think, as Bob Briscoe mentioned, drops are considerably more
> ambiguous anyway, so treating them separately (and not at all for
> starters) seems quite reasonable. I think there are also some numbers
> floating around about how much non-congestive packet loss is typical
> for leaf networks. Especially BBR, IIRC, makes some assumptions in that
> direction and will essentially tolerate a (low) level of
> packet loss/drops without engaging a rate reduction, no?
> 
> That said, each dropped packet in TCP should trigger a retransmit;
> does iperf not already report the number of retransmits? (I might be
> confusing iperf 2 and 3 here, I really need to set up my own iperf 2
> server so I can do some testing).
> 
> Regards
> 	Sebastian
> 
> 
>> 
>> Bob
>>> Hi Bob, list,
>>>> On Jun 19, 2023, at 01:48, Bob Briscoe 
>>>> <in=40bobbriscoe.net@dmarc.ietf.org> wrote:
>>>> Sebastian,
>>>> Before responding to each point, yes, I agree that the marking 
>>>> algorithms have to mark 'fairly' and we have to agree how to do that 
>>>> and improve algorithms over time. But lack of perfection in the 
>>>> network doesn't stop the congestion-rate being a good metric for 
>>>> iperf to maintain while network marking algorithms are being 
>>>> improved.
>>> 	The question is IMHO not so much "perfection" or not, as I subscribe
>>> to the "good enough" school of problem solving; my question is which
>>> of the simple-to-compute harm measures is the most suitable. I am not
>>> yet convinced that marking/dropping rate is "it", but it might
>>> well be good enough...
>>>> And if anyone quotes iperf harm metrics without knowing or defining 
>>>> the marking algorithm in use, then others would be right to question 
>>>> the validity of their results.
>>> 	Honestly, I would prefer a harm metric that is as independent of
>>> details as possible; as is, this metric seems not well suited to e.g.
>>> comparing L4S' two queues with each other... (the L-queue generally
>>> using a higher marking rate)...
>>>> Now pls see [BB] inline
>>>> On 18/06/2023 11:33, Sebastian Moeller wrote:
>>>>> Hi List,
>>>>>> On Jun 16, 2023, at 23:15, Bob Briscoe 
>>>>>> <in=40bobbriscoe.net@dmarc.ietf.org> wrote:
>>>>>> Bob
>>>>>> That would actually be v useful as a harm metric. For the list, 
>>>>>> it's essentially the bit rate, but measuring only those packets 
>>>>>> that are ECN-marked [in units of marked-b/s].
>>>>> 	This primarily seems to show how much harm the flow suffered from
>>>>> the AQM, not how much it actually caused. Granted there is some
>>>>> correlation between the two, but no hard causality... Now, if the
>>>>> AQM sits behind a min/max flow-queuing scheduler the correlation
>>>>> will be stronger, and if it sits behind a single-queue "scheduler"
>>>>> it will be weaker (especially if the AQM on that single queue is
>>>>> not looking at a packet's actual sojourn time, but at the estimated
>>>>> sojourn time of a virtual packet enqueued at dequeue time, i.e. how
>>>>> much queue a newly added packet would have to wait behind). As far
>>>>> as I understand that is what L4S recommends for dual-queue AQMs,
>>>>> and in that case the "harm" can be caused by a later burst (e.g.
>>>>> from a badly-paced flow) but assigned to a perfectly well-behaved
>>>>> low-rate flow.
>>>>> 	That is IMHO sub-optimal:
>>>>> a) once congestion hits, the best an AQM can do is mark a packet of
>>>>> the most-contributing flow, not the one that happens to have a
>>>>> packet at the head of the queue
>>>>> b) this clearly assigns blame to the wrong flow
>>>> [BB] What you say at '(a)' sounds as if you are considering all the 
>>>> marking being done at just one point in time. An algorithm that 
>>>> statistically decides whether to mark each packet as it becomes the 
>>>> head packet should be able to produce an outcome where packets from 
>>>> the flow contributing most to congestion are marked more. So I don't 
>>>> see any basis for what you've said at '(a)'.
>>> 	[SM] That is part of my point: by stochastically marking packets you
>>> will sooner or later end up marking more packets of larger flows than
>>> of smaller flows (as in number of packets in the queue over time),
>>> but a node that experiences congestion now would ideally not randomly
>>> mark/drop packets but selectively mark/drop the packets that have the
>>> highest likelihood of resulting in a noticeable decrease in the
>>> offered load, and do so as soon as possible. It is the "sooner or
>>> later" part of random dropping that I consider open to improvement.
>>> Now, one obviously needs a suitable data structure; if restricted to
>>> a single queue there is not much one can do... (one could still keep
>>> per-flow stats for each queue entry and consult those when dealing
>>> with the head-of-queue packet).
>>> Case in point: if an AQM marks, say, a DNS-over-TCP response, in all
>>> likelihood it might as well not have marked that packet at all; the
>>> flow is likely too short to ever throttle down enough for the load at
>>> the bottleneck to subside... same for a flow that is already at the
>>> minimal congestion window. From a responsiveness perspective these
>>> marks are wasted... (real drops at least reduce the bottleneck queue,
>>> but marking "unresponsive" flows (there are valid reasons for a flow
>>> not being able to respond to marking, so that is not necessarily
>>> nefarious) is not going to help).
>>>> There are many things we (the community) still don't understand 
>>>> about apportionment of blame for congestion. But one thing we can be 
>>>> certain of is that the best solution will not involve heaping all 
>>>> the blame onto a single flow.
>>> 	[SM] I respectfully disagree, there are situations where a single
>>> flow monopolizes the lion's share of a bottleneck's capacity (e.g. a
>>> single-flow TCP download on a home link), so this flow will also
>>> cause the lion's share of the queue build-up, and heaping most of the
>>> responsibility on that flow seems eminently reasonable. We seem to
>>> agree that talking in absolutes (all the blame) makes little sense...
>>>> Where a number of flows behave the same or similarly, that would 
>>>> produce a very unstable outcome.
>>> 	[SM] In my experience, having run FQ schedulers in one form or
>>> another (mostly fq_codel and cake) on my internet access link for
>>> over a decade now, the link behaves pretty well, with low
>>> latency-increase-under-load / high responsiveness (for those flows
>>> that behave well; less well-behaved flows will suffer the
>>> consequences in higher intra-flow queuing). However, that situation,
>>> many flows at very similar "fatness", is in my experience rather a
>>> rare beast; normal traffic tends to have loads of short transient
>>> flows that never reach steady state and that are not in equilibrium
>>> with each other. In the rare case, randomly selecting a flow will be
>>> a tad cheaper than searching for and selecting the flow with the
>>> largest immediate queueing contribution, but I am not sure whether it
>>> makes all that much sense to optimize for the rare case while the
>>> common case is more a bi-modal distribution of low- and
>>> high-contributing flows.
>>>> Despite just having said 'one thing we can be certain of', I would 
>>>> otherwise suggest that anyone who thinks they are certain about what 
>>>> the solution to the blame apportionment problem is should be very 
>>>> careful until they have done exhaustive analysis and 
>>>> experimentation. That includes the above opinions about FQ vs single 
>>>> queue.
>>> 	[SM] I would call this more a hypothesis, and less an opinion, as I
>>> explained why the AQM sojourn estimation is designed to increase the
>>> "speed" of the marking action, not its targeting of relevant flows...
>>>> For instance, I myself thought I had hit upon the solution that 
>>>> marking ought to be based on the delay that each packet causes other 
>>>> packets to experience, rather than the delay each packet experiences 
>>>> itself (which is what sojourn delay measures).
>>> 	[SM] That in essence would mean scaling by packet size, if all one
>>> looks at is individual packets... not sure that would help. If the
>>> marking entity does not take packet size into account when marking,
>>> then whether 64 octets were marked or ~1500 does not carry precise
>>> and relevant information, neither about the "magnitude" of the
>>> congestion nor about the magnitude of the contribution of that flow
>>> to the congestion. To be able to extract that information from a
>>> mark, the marking entity would need to take that into account.
>>> 	BTW, in a sense that is what a min/max fq-scheduler does offer: the
>>> flows with the most packets (or rather bytes) in the queue will have
>>> a larger effect on the total queuing delay than flows with fewer
>>> packets, and marking packets from such a fat flow will at least solve
>>> the "does the marked flow contribute meaningfully to the congestion"
>>> question. But the endpoints still have no clue about the bottleneck
>>> AQM's marking strategy and hence cannot make robust inferences.
>>>> In my experiments so far I turned out to be correct quite often, but 
>>>> not sufficiently often
>>> 	[SM] Which is the hallmark of a correlation, not a causation...
>>>> to be able to say I had found /the/ solution.
>>> 	[SM] As the above indicates, I clearly must be misunderstanding how
>>> you quantify the delay an individual packet introduces; could you
>>> elaborate, please?
>>>> I've got a much better grasp of the problem now compared to the 
>>>> position that had been reached in the late 1990s (when this problem 
>>>> was last attacked in the research community). However, the latest 
>>>> results of my simulations have surprised me in one region of the 
>>>> parameter space. So my experiments are still telling me that I need 
>>>> to go round the design-evaluate loop at least one more time before I 
>>>> have something worthy of write-up.
>>> 	[SM] I would love to read a decent peer-reviewed paper on that.
>>>>> Now, I wonder how that is going to work out:
>>>>> rfc3168 flows:
>>>>> the receiver knows the number of CE marked bytes and could 
>>>>> accumulate those, but the sender does not, as ECE will be asserted 
>>>>> until CWR is received independent of whether additional packets 
>>>>> were marked.
>>>>> L4S:
>>>>> these flows are expected to accumulate more CE marks than classic
>>>>> traffic and hence more marked bytes. (Also, even with Accurate ECN,
>>>>> does the sender unambiguously know which packets were CE-marked so
>>>>> it can veridically track their size?)
>>>>> So to make sense out of this "measure" the application needs to 
>>>>> collect information from both ends and aggregate these before 
>>>>> reporting, something that iperf2 already does, and it will need to 
>>>>> report ECT(0) and ECT(1) flows/packets separately (assuming that 
>>>>> ECT(1) means L4S signaling/AQM).
>>>> [BB] As I already said, this metric will not mean much if reported 
>>>> in isolation, without defining the test conditions. That's for peer 
>>>> reviewers to highlight when results of tests using iperf are 
>>>> reported. It doesn't mean it would be inappropriate for iperf to
>>>> be able to report this metric in the first place.
>>> 	[SM] Indeed, the best way to figure out whether a measure is useful
>>> (or how useful it is) seems to be to actually take it and correlate
>>> it with other measures of interest. Personally, I would also like to
>>> see simpler measures, like the total number of marks/drops of a
>>> measurement flow or even the average/median time between marks/drops;
>>> as I said, also, not instead.
>>>>>> Even without ECN marking, the congestion cost of a flow could be 
>>>>>> reported, as the rate of bits lost [in units of lost-b/s] (except 
>>>>>> of course one cannot distinguish congestion losses from other 
>>>>>> causes of loss).
>>>>> 	But see above, even with ECN in practice this measure does not 
>>>>> seem to be precise, no?
>>>> [BB] See previous response.
>>>> Nonetheless, it's got more potential as a harm metric for 
>>>> characterizing dynamic scenarios than any other metric (e.g. various 
>>>> fairness indices, which only measure rate harm, not latency harm, 
>>>> and only in steady state scenarios).
>>> 	[SM] But in non-steady-state situations with e.g. the
>>> L4S-recommended sojourn time estimator we will have considerably more
>>> mis-targeted markings, no? I am not saying the problem is easy to
>>> solve, but the proposed measure is clearly not ideal either; I am not
>>> ruling out that it might be a useful compromise. The advantage of
>>> looking at the rate is that it is a measure which is invariant to the
>>> CC algorithm and will easily allow comparing e.g. L- and C-queue
>>> flows in L4S.
>>>>>> A specific use would be to test how well a flow could keep within 
>>>>>> a certain level of congestion cost. For instance, that would test 
>>>>>> how well a flow would do if  passed through DOCSIS queue 
>>>>>> protection (without needing to have a DOCSIS box to pass it 
>>>>>> through). DOCSIS QProt gives a flow a certain constant allowance 
>>>>>> for the rate it contributes to congestion, which is precisely the 
>>>>>> cost metric you highlight.
>>>>> 	[SM] On that note, how is that going to work on variable rate 
>>>>> links?
>>>> [BB] That's the whole point - the steady-state congestion-rate of 
>>>> each of a set of scalable flows sharing a link should be invariant 
>>>> whatever the link rate (and whatever the number of flows).
>>> 	[SM] A variable rate link will "actively conspire" against reaching
>>> steady-state...
>>>> Of course, the challenge is how rapidly a flow's controller responds 
>>>> when the rate is varying (because nothing ever actually reaches 
>>>> steady-state). This metric should give a handle on how well or badly 
>>>> a flow manages to track varying capacity.
>>> 	[SM] Assuming the marks are actually assigned to the "right"
>>> packets... (which for flows running long enough should be true,
>>> albeit the mis-attribution will result in a "noise floor" for the
>>> measure).
>>>> (Of course there are caveats, e.g. whether the congestion-rate 
>>>> actually is invariant in the steady state, like the theory says it 
>>>> should be.)
>>>>>> Other methods for policing latency might do similarly. A few years 
>>>>>> ago now I was given permission to reveal that a Procera traffic 
>>>>>> policer used the same congestion cost metric to more strongly 
>>>>>> limit traffic for those users contributing a higher congestion 
>>>>>> cost. Unlike rate policers, these policers can inherently take 
>>>>>> account of behaviour over time.
>>>>> 	[SM] Curious, does the "over time" part not require keeping
>>>>> per-flow state for longer?
>>>> [BB] A common design for per-flow congestion policers is to only 
>>>> hold state for badly behaving flows. For example:
>>>> * A common technique in flow-rate policers is to hold flow-state 
>>>> only on flows selected probabilistically by randomly picking a small 
>>>> proportion of those packets that are ECN-marked or dropped. Then the 
>>>> more a flow is marked, the more likely flow-state will be held on 
>>>> it.
>>> 	[SM] Such sub-sampling might work for monitoring purposes, but
>>> driving an actual controller from this seems more approximate than
>>> desirable, especially since we aim for a low-latency control loop;
>>> sub-sampling and averaging suffer from requiring multiple rounds
>>> before converging on something actionable, no?
>>>> * In the DOCSIS q-protection algo, the state decays out between the 
>>>> packets of a flow (freeing up memory for other flows, which also 
>>>> usually decays out before the next packet of that flow). See 
>>>> https://datatracker.ietf.org/doc/html/draft-briscoe-docsis-q-protection-06#section-2.1
>>> 	[SM] It would be great if there were published evaluations of the
>>> DOCSIS methods. I assume data was acquired and analyzed to arrive at
>>> the final design; now it would be nice if that data and analysis
>>> could be made available...
>>>> In the Procera case, it held per-user state, not per-flow (a 
>>>> congestion cost metric aggregates well over a set of flows as well 
>>>> as over time).
>>> 	[SM] Makes sense, assuming there are fewer "users" than flows, and
>>> "user" is a relevant grouping. Is it correct then that they kept a
>>> per-user drop/mark probability and hence used different probabilities
>>> per user? ("User" is a tricky concept, but in cake we use the IP
>>> address as a decent proxy and optionally do a first round of
>>> arbitration between IP addresses (typically "internal" IP addresses
>>> of the home network) and then do per-flow queueing within each IP's
>>> traffic; this takes the sting out of using flow-explosion to gain a
>>> throughput advantage, in that this behaviour only affects traffic
>>> to/from the same IP while other hosts do not even notice. This does
>>> obviously not help against DoS attacks, but it mitigates the
>>> simplistic 'let's use a shipload of flows to monopolize the link's
>>> capacity' strategy.)
>>> Regards
>>> 	Sebastian
>>>>> Also is there any public data showing how this affected RTT-bias?
>>>> [BB] In a word, no.
>>>> Cheers
>>>> Bob
>>>>>> Since then Procera merged with Sandvine, so I don't know whether 
>>>>>> that technology is still available.
>>>>>> Thanks
>>>>>> Another Bob
>>>>>> On 16/06/2023 21:20, rjmcmahon wrote:
>>>>>>> Hi All,
>>>>>>> I read the below recently and am wondering if the cost fairness 
>>>>>>> metric is useful? I'm adding ECN/L4S support from a test 
>>>>>>> perspective into iperf 2 and thought this new metric might be 
>>>>>>> generally useful - not sure. Feedback is appreciated.
>>>>>>> https://www.bobbriscoe.net/projects/refb/draft-briscoe-tsvarea-fair-02.html
>>>>>>> "The metric required to arbitrate cost fairness is simply volume 
>>>>>>> of congestion, that is congestion times the bit rate of each user 
>>>>>>> causing it, taken over time. In engineering terms, for each user 
>>>>>>> it can be measured very easily as the amount of data the user 
>>>>>>> sent that was dropped. Or with explicit congestion notification 
>>>>>>> (ECN [RFC3168]) the amount of each user's data to have been 
>>>>>>> congestion marked. Importantly, unlike flow rates, this metric 
>>>>>>> integrates easily and correctly across different flows on 
>>>>>>> different paths and across time, so it can be easily incorporated 
>>>>>>> into future service level agreements of ISPs."
>>>>>>> Thanks,
>>>>>>> Bob
>>>>>> --
>>>>>> ________________________________________________________________
>>>>>> Bob Briscoe                               http://bobbriscoe.net/
>>>> --
>>>> ________________________________________________________________
>>>> Bob Briscoe                               http://bobbriscoe.net/
>> 