Re: [L4s-discuss] Thoughts on cost fairness metric
rjmcmahon <rjmcmahon@rjmcmahon.com> Wed, 21 June 2023 15:43 UTC
Date: Wed, 21 Jun 2023 08:43:06 -0700
From: rjmcmahon <rjmcmahon@rjmcmahon.com>
To: Sebastian Moeller <moeller0@gmx.de>
Cc: Bob Briscoe <in@bobbriscoe.net>, l4s-discuss@ietf.org
In-Reply-To: <FDAB9F08-60EB-4BD0-9ACB-D15308D9BCAF@gmx.de>
References: <a34b4e4474ea744e01d5ce15131fc465@rjmcmahon.com> <93729ad5-6919-8a86-5994-fdfe6344a596@bobbriscoe.net> <0ED86DEC-CA18-4C6A-AA1B-4D21FA261196@gmx.de> <d0c7f792-eba5-63ab-d825-8b2158bf33ea@bobbriscoe.net> <9065174E-217F-430C-A271-9CF2AFECFACC@gmx.de> <7726d27e241c4c0def3cbbe8d5ad899f@rjmcmahon.com> <FDAB9F08-60EB-4BD0-9ACB-D15308D9BCAF@gmx.de>
Message-ID: <64c8d14c9754d4c3ba0dffde643f8ed6@rjmcmahon.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/l4s-discuss/f_0nytstwxvK6Pjie_3qhbiNzX4>
Subject: Re: [L4s-discuss] Thoughts on cost fairness metric
Thanks Sebastian,

Yes, iperf 2 supports the tcp_info struct, which includes retries. It requires -e to get this output. There are more options that might be of interest, including Little's law outputs (the inP metric on the server) and write-to-read latencies. I didn't enable histograms below, but those are supported too.

[rjmcmahon@ryzen3950 iperf2-code]$ iperf -c 192.168.1.70 -i 1 -e --trip-times --tcp-write-times --tcp-write-prefetch 16K
------------------------------------------------------------
Client connecting to 192.168.1.70, TCP port 5001 with pid 388510 (1 flows)
Write buffer size: 131072 Byte (writetimer-enabled)
TCP congestion control using cubic
TOS set to 0x0 (Nagle on)
TCP window size: 85.0 KByte (default)
Event based writes (pending queue watermark at 16384 bytes)
------------------------------------------------------------
[ 1] local 192.168.1.95%enp4s0 port 34814 connected with 192.168.1.70 port 5001 (prefetch=16384) (trip-times) (sock=3) (icwnd/mss/irtt=14/1448/333) (ct=0.40 ms) on 2023-06-21 08:39:03.062 (PDT)
[ ID] Interval  Transfer  Bandwidth  Write/Err  Rtry  Cwnd/RTT  NetPwr  write-times avg/min/max/stdev (cnt)
[ 1] 0.00-1.00 sec 114 MBytes 954 Mbits/sec 910/0 66 1539K/13197 us 9038 1.098/0.020/9.206/0.445 ms (910)
[ 1] 1.00-2.00 sec 112 MBytes 943 Mbits/sec 899/0 0 1681K/14382 us 8193 1.111/0.430/1.612/0.303 ms (899)
[ 1] 2.00-3.00 sec 112 MBytes 943 Mbits/sec 899/0 0 1794K/15428 us 7638 1.111/0.402/1.660/0.305 ms (899)
[ 1] 3.00-4.00 sec 112 MBytes 942 Mbits/sec 898/0 0 1882K/16192 us 7269 1.111/0.398/1.636/0.307 ms (898)
[ 1] 4.00-5.00 sec 112 MBytes 937 Mbits/sec 894/0 1 1365K/11584 us 10116 1.117/0.069/8.519/0.400 ms (894)
[ 1] 5.00-6.00 sec 112 MBytes 942 Mbits/sec 898/0 0 1459K/12522 us 9400 1.111/0.450/1.583/0.298 ms (898)
[ 1] 6.00-7.00 sec 112 MBytes 943 Mbits/sec 899/0 0 1531K/13111 us 8987 1.112/0.440/1.621/0.304 ms (899)
[ 1] 7.00-8.00 sec 112 MBytes 942 Mbits/sec 898/0 0 1582K/13682 us 8603 1.112/0.509/1.509/0.295 ms (898)
[ 1] 8.00-9.00 sec 112 MBytes 942 Mbits/sec 898/0 0 1617K/13936 us 8446 1.112/0.479/1.542/0.304 ms (898)
[ 1] 9.00-10.00 sec 112 MBytes 942 Mbits/sec 898/0 0 1644K/14088 us 8355 1.112/0.406/1.611/0.315 ms (898)
[ 1] 0.00-10.03 sec 1.10 GBytes 940 Mbits/sec 8992/0 67 1645K/14129 us 8315 1.111/0.020/9.206/0.331 ms (8992)

[root@rjm-nas examples]# iperf -s -i 1 -e
------------------------------------------------------------
Server listening on TCP port 5001 with pid 4533
Read buffer size: 128 KByte (Dist bin width=16.0 KByte)
TCP congestion control default cubic
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 1] local 192.168.1.70%eno1 port 5001 connected with 192.168.1.95 port 34814 (trip-times) (sock=4) (peer 2.1.10-master) (icwnd/mss/irtt=14/1448/293) on 2023-06-21 08:39:03.063 (PDT)
[ ID] Interval  Transfer  Bandwidth  Burst Latency avg/min/max/stdev (cnt/size)  inP  NetPwr  Reads=Dist
[ 1] 0.00-1.00 sec 112 MBytes 941 Mbits/sec 12.577/1.463/33.996/4.237 ms (897/131111) 1.43 MByte 9351 4807=983:3347:454:1:2:1:5:14
[ 1] 1.00-2.00 sec 112 MBytes 941 Mbits/sec 14.843/13.833/15.827/0.402 ms (898/131047) 1.67 MByte 7929 4896=988:3740:168:0:0:0:0:0
[ 1] 2.00-3.00 sec 112 MBytes 942 Mbits/sec 15.954/15.127/16.863/0.336 ms (898/131074) 1.79 MByte 7378 4896=994:3477:425:0:0:0:0:0
[ 1] 3.00-4.00 sec 112 MBytes 941 Mbits/sec 16.817/16.074/17.470/0.274 ms (897/131186) 1.89 MByte 6998 4900=1009:3739:152:0:0:0:0:0
[ 1] 4.00-5.00 sec 112 MBytes 941 Mbits/sec 17.326/7.370/33.850/1.775 ms (898/131040) 1.94 MByte 6792 4825=981:3758:70:0:0:0:1:15
[ 1] 5.00-6.00 sec 112 MBytes 941 Mbits/sec 13.130/12.374/13.943/0.294 ms (898/131045) 1.47 MByte 8963 4896=1001:3571:324:0:0:0:0:0
[ 1] 6.00-7.00 sec 112 MBytes 942 Mbits/sec 13.845/13.114/14.572/0.249 ms (898/131063) 1.55 MByte 8501 4898=997:3578:323:0:0:0:0:0
[ 1] 7.00-8.00 sec 112 MBytes 941 Mbits/sec 14.371/13.918/14.723/0.210 ms (898/131029) 1.61 MByte 8188 4898=1004:3893:1:0:0:0:0:0
[ 1] 8.00-9.00 sec 112 MBytes 941 Mbits/sec 14.748/14.346/15.244/0.177 ms (897/131185) 1.66 MByte 7979 4894=994:3900:0:0:0:0:0:0
[ 1] 9.00-10.00 sec 112 MBytes 941 Mbits/sec 15.001/14.642/15.594/0.184 ms (898/131047) 1.68 MByte 7845 4896=984:3844:68:0:0:0:0:0
[ 1] 0.00-10.02 sec 1.10 GBytes 941 Mbits/sec 14.862/1.463/33.996/2.055 ms (8992/131072) 1.58 MByte 7918 48884=9952:36903:1990:1:2:1:6:29

Bob

> Hi Bob,
>
>> On Jun 20, 2023, at 23:07, rjmcmahon
>> <rjmcmahon=40rjmcmahon.com@dmarc.ietf.org> wrote:
>>
>> Hi Sebastian, list,
>>
>> I'm not sure what "wasting a mark" means. Does it matter if the short
>> flows stop of their own accord vs. slow down per a mark? Marks that
>> don't affect transport don't seem wasted per se, just not an affecting
>> signal. Not sure why that's a problem.
>
> [SM] What I tried to express, inartfully, was that for low queueing
> delay it seems obvious that marking a responsive flow is superior to
> marking an under-responsive flow. A mark on a low-rate flow that was
> about to stop anyway will have very little effect on future queue
> growth; a mark on a high-rate flow will have a considerably higher
> effect. Now, the bottleneck node really does not know for sure how a
> flow is going to behave in the (immediate) future, but "past is
> prologue" seems a reasonable heuristic here, so picking flows that
> noticeably contribute to the queue now, and telling them to slow down,
> seems like a decent approach (even though that high-rate flow might
> already have stopped sending and be closing down). RED and similar
> "stochastic" AQMs counter that problem by increasing the mark/drop
> rate based on queue length, again a decent strategy in the
> intermediate term, but it will take a few "marking" hits before a mark
> ends up on a responsive flow with a high enough congestion window to
> actually make a difference (or before a sufficiently high number of
> low-rate flows gets marked). For tight temporal control of the queue
> (which seems to be one reason for L4S' existence) I argue that marking
> under-responsive flows does not help. Again, given enough time a
> stochastic marker will end up marking flows roughly in proportion to
> their number of packets in the queue and so will do the right thing*;
> it is just that one can do better by carefully selecting the packet to
> mark, if one is willing to maintain a suitable data structure...
>
> *) For L4S' L-queue it also helps that the AQM is supposed to use a
> pretty high marking rate by default, so the temporal cost of marking
> sub-optimal flows is likely not all that high. But I am not sure
> whether that logic also applies to the C-queue...
>
>> On "average/median time between marks/drops"
>>
>> Should this be a histogram too? In general, I try to support
>> histograms as well as central limit theorem (CLT) averaging in stats
>> collections.
>
> [SM] I absolutely like histograms, as I often find them so much more
> informative than single numbers (e.g. for bi/multi-modal
> distributions), so maybe the inter-mark/drop-interval histogram? Not
> sure about the scaling for the x-axis though...
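On the x-axis scaling question: inter-mark intervals can span microseconds to seconds, so logarithmic binning seems like the natural fit. A minimal sketch of the idea (this is not iperf 2's actual histogram code; the bin count and microsecond granularity are my own assumptions):

  #include <stdint.h>

  #define NBINS 32  /* assumption: 32 log2-spaced bins, 1 us granularity */

  static unsigned bins[NBINS];

  /* Bin an inter-mark interval (in microseconds) into log2-spaced bins,
   * so bin k counts intervals in [2^k, 2^(k+1)) us; bin 0 also catches
   * sub-microsecond gaps, and the last bin absorbs everything longer. */
  static void record_interval(uint64_t us)
  {
      unsigned k = 0;
      while ((us >>= 1) != 0 && k < NBINS - 1)
          k++;
      bins[k]++;
  }

With log2 bins, a steady marker shows up as one tight mode near the mean inter-mark gap, while bursty on/off marking smears across several bins.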
>
>> I'll probably start with marks - not sure how to do drops with ebpfs
>> in a simple way for iperf.
>
> [SM] I think, as Bob Briscoe mentioned, drops are considerably more
> ambiguous anyway, so treating them separately (and not at all for
> starters) seems quite reasonable. I think there are also some numbers
> floating around for how much non-congestive packet loss is typical in
> leaf networks. Especially BBR, IIRC, makes some assumptions in that
> direction and essentially will tolerate a (low) level of packet
> loss/drops without engaging a rate reduction, no?
>
> That said, each dropped packet in TCP should trigger a retransmit;
> does iperf not already report the number of retransmits? (I might
> confuse iperf2 and 3 here, I really need to set up my own iperf2
> server so I can do some testing.)
>
> Regards
> Sebastian
>
>> Bob
>>> Hi Bob, list,
>>>> On Jun 19, 2023, at 01:48, Bob Briscoe
>>>> <in=40bobbriscoe.net@dmarc.ietf.org> wrote:
>>>> Sebastian,
>>>> Before responding to each point, yes, I agree that the marking
>>>> algorithms have to mark 'fairly' and we have to agree how to do that
>>>> and improve algorithms over time. But lack of perfection in the
>>>> network doesn't stop the congestion-rate being a good metric for
>>>> iperf to maintain while network marking algorithms are being
>>>> improved.
>>> The question is IMHO not so much "perfection" or not, as I subscribe
>>> to the "good enough" school of problem solving; my question is which
>>> of the simple-to-compute harm measures is the most suitable. I am
>>> not yet convinced that marking/dropping rate is "it", but it might
>>> well be good enough...
>>>> And if anyone quotes iperf harm metrics without knowing or defining
>>>> the marking algorithm in use, then others would be right to question
>>>> the validity of their results.
>>> Honestly, I would prefer a harm metric that is independent of
>>> details as much as possible; as is, this metric seems not well
>>> suited to e.g. compare L4S' two queues with each other... (the
>>> L-queue generally using a higher marking rate)...
>>>> Now pls see [BB] inline
>>>> On 18/06/2023 11:33, Sebastian Moeller wrote:
>>>>> Hi List,
>>>>>> On Jun 16, 2023, at 23:15, Bob Briscoe
>>>>>> <in=40bobbriscoe.net@dmarc.ietf.org> wrote:
>>>>>> Bob
>>>>>> That would actually be v useful as a harm metric. For the list,
>>>>>> it's essentially the bit rate, but measuring only those packets
>>>>>> that are ECN-marked [in units of marked-b/s].
>>>>> This primarily seems to show how much harm the flow suffered from
>>>>> the AQM, not how much it actually caused. Granted, there is some
>>>>> correlation between the two, but no hard causality... Now, if the
>>>>> AQM sits behind a min/max flow-queuing scheduler the correlation
>>>>> will be harder, and if it sits behind a single-queue "scheduler"
>>>>> it will be weaker (especially if the AQM on that single queue is
>>>>> not looking at a packet's actual sojourn time, but at the
>>>>> estimated sojourn time of a virtual packet enqueued at dequeue
>>>>> time, i.e. how much queue a newly added packet would have to wait
>>>>> behind). As far as I understand, that is what L4S recommends for
>>>>> dual-queue AQMs, and in that case the "harm" can be caused by a
>>>>> later burst (e.g. from a badly-paced flow) but assigned to a
>>>>> perfectly well-behaved low-rate flow.
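A side note on the estimator described above: marking on the expected sojourn of a hypothetical packet arriving now, rather than the measured sojourn of the departing packet, reduces to backlog divided by drain rate. A rough sketch of that reading (the threshold constant is purely illustrative, not taken from any spec):

  #include <stdbool.h>
  #include <stdint.h>

  /* Estimated sojourn of a "virtual" packet enqueued now: the current
   * backlog divided by the link's drain rate. The 1 ms step threshold
   * below is an arbitrary placeholder for illustration. */
  static bool mark_on_virtual_sojourn(uint64_t backlog_bytes,
                                      uint64_t drain_rate_bps)
  {
      const uint64_t threshold_us = 1000;  /* assumed 1 ms threshold */
      uint64_t est_sojourn_us =
          (backlog_bytes * 8ULL * 1000000ULL) / drain_rate_bps;
      return est_sojourn_us >= threshold_us;
  }

This reacts roughly one queue-drain-time earlier than a measured-sojourn AQM, which matches the framing that the estimator buys speed rather than better flow targeting.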
>>>>> That is IMHO sub-optimal:
>>>>> a) once congestion hits, the best an AQM can do is mark a packet
>>>>> of the most-contributing flow, not the one that happens to have a
>>>>> packet at the head of the queue
>>>>> b) this clearly assigns blame to the wrong flow
>>>> [BB] What you say at '(a)' sounds as if you are considering all the
>>>> marking being done at just one point in time. An algorithm that
>>>> statistically decides whether to mark each packet as it becomes the
>>>> head packet should be able to produce an outcome where packets from
>>>> the flow contributing most to congestion are marked more. So I
>>>> don't see any basis for what you've said at '(a)'.
>>> [SM] That is part of my point: by stochastically marking packets
>>> you will sooner or later end up marking more packets of larger
>>> flows than of smaller flows (as in number of packets in the queue
>>> over time), but a node that experiences congestion now would
>>> ideally not randomly mark/drop packets but selectively drop packets
>>> that have the highest likelihood of resulting in a noticeable
>>> decrease in the offered load, and do so as soon as possible. It is
>>> the "sooner or later" part of the random dropping that I consider
>>> open to improvement. Now, one obviously needs a suitable data
>>> structure; if restricted to a single queue there is not much one
>>> can do... (one could still keep per-flow stats for each queue entry
>>> and consult those when dealing with the head-of-queue packet)
>>> Case in point: if an AQM marks, say, a DNS-over-TCP response, in
>>> all likelihood it might as well not have marked that packet at all;
>>> the flow is likely too short to ever throttle down enough for the
>>> load at the bottleneck to subside... same for a flow that is
>>> already at the minimal congestion window. From a responsiveness
>>> perspective these marks are wasted... (real drops at least reduce
>>> the bottleneck queue, but marking "unresponsive" flows (there are
>>> valid reasons for a flow not being able to respond to marking, so
>>> that is not necessarily nefarious) is not going to help).
>>>> There are many things we (the community) still don't understand
>>>> about apportionment of blame for congestion. But one thing we can
>>>> be certain of is that the best solution will not involve heaping
>>>> all the blame onto a single flow.
>>> [SM] I respectfully disagree; there are situations when a single
>>> flow monopolizes the lion's share of a bottleneck's capacity (e.g.
>>> a single-flow TCP download on a home link), so this flow will also
>>> cause the lion's share of the queue build-up, so heaping most of
>>> the responsibility on that flow seems eminently reasonable. We seem
>>> to agree that talking in absolutes (all the blame) makes little
>>> sense...
>>>> Where a number of flows behave the same or similarly, that would
>>>> produce a very unstable outcome.
>>> [SM] In my experience, having run FQ schedulers in one form or
>>> another (mostly fq_codel and cake) on my internet access link for
>>> over a decade now, the link behaves pretty well, with low
>>> latency-increase-under-load / high responsiveness (for those flows
>>> that behave well; less well-behaved flows will suffer the
>>> consequence of higher intra-flow queuing). However, that situation,
>>> many flows of very similar "fatness", is in my experience rather a
>>> rare beast; normal traffic tends to have loads of short transient
>>> flows that never reach steady state and that are not in equilibrium
>>> with each other.
>>> In the rare case, randomly selecting a flow will be a tad cheaper
>>> than searching for and selecting the flow with the largest
>>> immediate queueing contribution, but I am not sure whether it makes
>>> all that much sense optimizing for the rare case, while the common
>>> case is more of a bi-modal distribution of low- and
>>> high-contributing flows.
>>>> Despite just having said 'one thing we can be certain of', I would
>>>> otherwise suggest that anyone who thinks they are certain about
>>>> what the solution to the blame apportionment problem is should be
>>>> very careful until they have done exhaustive analysis and
>>>> experimentation. That includes the above opinions about FQ vs
>>>> single queue.
>>> [SM] I would call this more a hypothesis, and less an opinion, as I
>>> explained why the AQM sojourn estimation is designed to increase
>>> the "speed" of the marking action, not its targeting of relevant
>>> flows...
>>>> For instance, I myself thought I had hit upon the solution that
>>>> marking ought to be based on the delay that each packet causes
>>>> other packets to experience, rather than the delay each packet
>>>> experiences itself (which is what sojourn delay measures).
>>> [SM] That in essence would mean scaling by packet size, if all one
>>> looks at is individual packets... not sure that would help. If the
>>> marking entity does not take packet size into account when marking,
>>> then whether 64 octets were marked or ~1500 does not carry precise
>>> and relevant information, neither about the "magnitude" of the
>>> congestion, nor about the magnitude of the contribution of that
>>> flow to the congestion. To be able to extract that information from
>>> a mark, the marking entity would need to take that into account.
>>> BTW, in a sense that is what a min/max fq-scheduler does offer: the
>>> flows with the most packets in the queue (or rather bytes) will
>>> have a larger effect on the total queuing delay than flows with
>>> fewer packets, and marking packets from that fat flow will at least
>>> solve the "does the marked flow contribute meaningfully to the
>>> congestion" question. But the endpoints still have no clue about
>>> the bottleneck AQM's marking strategy and hence cannot make robust
>>> inferences.
>>>> In my experiments so far I turned out to be correct quite often,
>>>> but not sufficiently often
>>> [SM] Which is the hallmark of a correlation, not a causation...
>>>> to be able to say I had found /the/ solution.
>>> [SM] As the above indicates, I clearly must be misunderstanding how
>>> you quantify the delay an individual packet introduces; could you
>>> elaborate, please?
>>>> I've got a much better grasp of the problem now compared to the
>>>> position that had been reached in the late 1990s (when this
>>>> problem was last attacked in the research community). However, the
>>>> latest results of my simulations have surprised me in one region
>>>> of the parameter space. So my experiments are still telling me
>>>> that I need to go round the design-evaluate loop at least one more
>>>> time before I have something worthy of write-up.
>>> [SM] I would love to read a decent peer-reviewed paper on that.
>>>>> Now, I wonder how that is going to work out:
>>>>> rfc3168 flows:
>>>>> the receiver knows the number of CE-marked bytes and could
>>>>> accumulate those, but the sender does not, as ECE will be
>>>>> asserted until CWR is received, independent of whether additional
>>>>> packets were marked.
>>>>> L4S:
>>>>> these flows are expected to accumulate more CE marks than classic
>>>>> traffic and hence more marked bytes. (Also, even with accurate
>>>>> ECN, does the sender unambiguously know which packets were
>>>>> CE-marked so it can veridically track their size?)
>>>>> So to make sense out of this "measure" the application needs to
>>>>> collect information from both ends and aggregate it before
>>>>> reporting, something that iperf2 already does, and it will need
>>>>> to report ECT(0) and ECT(1) flows/packets separately (assuming
>>>>> that ECT(1) means L4S signaling/AQM).
>>>> [BB] As I already said, this metric will not mean much if reported
>>>> in isolation, without defining the test conditions. That's for
>>>> peer reviewers to highlight when results of tests using iperf are
>>>> reported. It doesn't mean it would be inappropriate for iperf to
>>>> be able to report this metric in the first place.
>>> [SM] Indeed, the best way to figure out whether a measure is useful
>>> (or how useful it is) seems to be to actually take it and correlate
>>> it with other measures of interest. Personally, I would also like
>>> to see simpler measures, like the total number of marks/drops of a
>>> measurement flow, or even the average/median time between
>>> marks/drops; as I said, also, not instead.
>>>>>> Even without ECN marking, the congestion cost of a flow could be
>>>>>> reported, as the rate of bits lost [in units of lost-b/s]
>>>>>> (except of course one cannot distinguish congestion losses from
>>>>>> other causes of loss).
>>>>> But see above, even with ECN in practice this measure does not
>>>>> seem to be precise, no?
>>>> [BB] See previous response.
>>>> Nonetheless, it's got more potential as a harm metric for
>>>> characterizing dynamic scenarios than any other metric (e.g.
>>>> various fairness indices, which only measure rate harm, not
>>>> latency harm, and only in steady-state scenarios).
>>> [SM] But in non-steady-state situations, with e.g. the
>>> L4S-recommended sojourn time estimator, we will have considerably
>>> more mis-targeted markings, no? I am not saying the problem is easy
>>> to solve, but the proposed measure is clearly not ideal either; I
>>> am not ruling out that it might be a useful compromise. The
>>> advantage of looking at the rate is that it is a measure which is
>>> invariant to the CC algorithm and will easily allow one to compare
>>> e.g. L- and C-queue flows in L4S.
>>>>>> A specific use would be to test how well a flow could keep
>>>>>> within a certain level of congestion cost. For instance, that
>>>>>> would test how well a flow would do if passed through DOCSIS
>>>>>> queue protection (without needing to have a DOCSIS box to pass
>>>>>> it through). DOCSIS QProt gives a flow a certain constant
>>>>>> allowance for the rate it contributes to congestion, which is
>>>>>> precisely the cost metric you highlight.
>>>>> [SM] On that note, how is that going to work on variable-rate
>>>>> links?
>>>> [BB] That's the whole point - the steady-state congestion-rate of
>>>> each of a set of scalable flows sharing a link should be invariant
>>>> whatever the link rate (and whatever the number of flows).
>>> [SM] A variable-rate link will "actively conspire" against reaching
>>> steady state...
>>>> Of course, the challenge is how rapidly a flow's controller
>>>> responds when the rate is varying (because nothing ever actually
>>>> reaches steady state). This metric should give a handle on how
>>>> well or badly a flow manages to track varying capacity.
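From the tool side, the accounting for this congestion-rate metric looks straightforward; roughly something like the following per-interval computation (a sketch of the metric's definition, not shipping iperf 2 code, and the counter names are made up):

  #include <stdint.h>

  /* Congestion cost per the draft quoted at the bottom of this thread:
   * the volume of a flow's data that was CE-marked (or, without ECN,
   * dropped), taken over time, i.e. marked-b/s. Names are illustrative. */
  struct cong_counters {
      uint64_t marked_bytes;  /* bytes carried in CE-marked packets */
      uint64_t lost_bytes;    /* bytes in packets presumed lost */
  };

  /* Congestion-rate over one reporting interval, in bits per second;
   * swap in lost_bytes for a loss-based (lost-b/s) variant. */
  static double congestion_rate_bps(const struct cong_counters *start,
                                    const struct cong_counters *end,
                                    double interval_secs)
  {
      uint64_t marked = end->marked_bytes - start->marked_bytes;
      return (double)marked * 8.0 / interval_secs;
  }

Summing the same counters over the whole test gives the congestion volume that the draft proposes to integrate across flows and time.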
>>> [SM] Assuming the marks are actually assigned to the "right"
>>> packets... (which for flows running long enough should be true,
>>> albeit the mis-appropriation will result in a "noise floor" for the
>>> measure).
>>>> (Of course there are caveats, e.g. whether the congestion-rate
>>>> actually is invariant in the steady state, like the theory says it
>>>> should be.)
>>>>>> Other methods for policing latency might do similarly. A few
>>>>>> years ago now I was given permission to reveal that a Procera
>>>>>> traffic policer used the same congestion cost metric to more
>>>>>> strongly limit traffic for those users contributing a higher
>>>>>> congestion cost. Unlike rate policers, these policers can
>>>>>> inherently take account of behaviour over time.
>>>>> [SM] Curious, does the "over time" part not require keeping
>>>>> per-flow state for longer?
>>>> [BB] A common design for per-flow congestion policers is to only
>>>> hold state for badly behaving flows. For example:
>>>> * A common technique in flow-rate policers is to hold flow-state
>>>> only on flows selected probabilistically, by randomly picking a
>>>> small proportion of those packets that are ECN-marked or dropped.
>>>> Then the more a flow is marked, the more likely flow-state will be
>>>> held on it.
>>> [SM] Such sub-sampling might work for monitoring purposes, but
>>> driving an actual controller with it seems more approximate than
>>> desirable, especially since we aim for a low-latency control loop;
>>> sub-sampling and averaging suffer from requiring multiple rounds
>>> before converging on something actionable, no?
>>>> * In the DOCSIS q-protection algo, the state decays out between
>>>> the packets of a flow (freeing up memory for other flows, which
>>>> also usually decays out before the next packet of that flow). See
>>>> https://datatracker.ietf.org/doc/html/draft-briscoe-docsis-q-protection-06#section-2.1
>>> [SM] It would be great if there were published evaluations of the
>>> DOCSIS methods. I assume data was acquired and analyzed to come to
>>> the final design; now it would be nice if that data and analysis
>>> could be made available...
>>>> In the Procera case, it held per-user state, not per-flow (a
>>>> congestion cost metric aggregates well over a set of flows as well
>>>> as over time).
>>> [SM] Makes sense, assuming there are fewer "users" than flows, and
>>> "user" is a relevant grouping. Is it correct then that they kept a
>>> per-user drop/mark probability and hence used different
>>> probabilities per user? ("User" is a tricky concept, but in cake we
>>> use the IP address as a decent proxy and optionally do a first
>>> round of arbitration between IP addresses (typically IP addresses
>>> "internal" to the home network) and then do per-flow queueing
>>> within each IP's traffic; this takes the sting out of using flow
>>> explosion to gain a throughput advantage, in that this behaviour
>>> only affects traffic to/from the same IP while other hosts do not
>>> even notice. This does obviously not help against DOS attacks, but
>>> it mitigates the simplistic 'let's use a shipload of flows to
>>> monopolize the link's capacity' strategy.)
>>> Regards
>>> Sebastian
>>>>> Also, is there any public data showing how this affected
>>>>> RTT-bias?
>>>> [BB] In a word, no.
>>>> Cheers
>>>> Bob
>>>>>> Since then Procera merged with Sandvine, so I don't know whether
>>>>>> that technology is still available.
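The decaying-state design reads to me like a per-flow token bucket over congestion volume. A toy sketch of that shape (my interpretation of the cited section, not the actual DOCSIS q-protection algorithm; all constants and names are placeholders):

  #include <stdbool.h>
  #include <stdint.h>

  /* Toy per-flow congestion accounting in the spirit of q-protection:
   * "credit" fills with each marked/dropped byte and drains (decays)
   * at a constant allowed congestion-rate. Once credit decays to zero,
   * the flow state itself could be reclaimed for other flows. */
  struct flow_state {
      double   credit_bytes;  /* accumulated congestion volume */
      uint64_t last_us;       /* time of last update */
  };

  #define ALLOWED_CONG_BPS 10000.0  /* assumed allowance: 10 kb/s of marks */
  #define CREDIT_LIMIT     65536.0  /* assumed sanction threshold, bytes */

  /* Charge a packet's marked bytes against the flow; returns true if
   * the flow exceeded its allowance and should be sanctioned (e.g.
   * reclassified out of the low-latency queue). */
  static bool qprot_update(struct flow_state *f, uint64_t now_us,
                           uint32_t marked_bytes)
  {
      double elapsed_s = (double)(now_us - f->last_us) / 1e6;
      f->credit_bytes -= elapsed_s * ALLOWED_CONG_BPS / 8.0;  /* decay */
      if (f->credit_bytes < 0.0)
          f->credit_bytes = 0.0;
      f->credit_bytes += marked_bytes;
      f->last_us = now_us;
      return f->credit_bytes > CREDIT_LIMIT;
  }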
>>>>>> Thanks
>>>>>> Another Bob
>>>>>> On 16/06/2023 21:20, rjmcmahon wrote:
>>>>>>> Hi All,
>>>>>>> I read the below recently and am wondering if the cost fairness
>>>>>>> metric is useful? I'm adding ECN/L4S support from a test
>>>>>>> perspective into iperf 2 and thought this new metric might be
>>>>>>> generally useful - not sure. Feedback is appreciated.
>>>>>>> https://www.bobbriscoe.net/projects/refb/draft-briscoe-tsvarea-fair-02.html
>>>>>>> "The metric required to arbitrate cost fairness is simply volume
>>>>>>> of congestion, that is congestion times the bit rate of each
>>>>>>> user causing it, taken over time. In engineering terms, for each
>>>>>>> user it can be measured very easily as the amount of data the
>>>>>>> user sent that was dropped. Or with explicit congestion
>>>>>>> notification (ECN [RFC3168]) the amount of each user's data to
>>>>>>> have been congestion marked. Importantly, unlike flow rates,
>>>>>>> this metric integrates easily and correctly across different
>>>>>>> flows on different paths and across time, so it can be easily
>>>>>>> incorporated into future service level agreements of ISPs."
>>>>>>> Thanks,
>>>>>>> Bob
>>>>>> --
>>>>>> ________________________________________________________________
>>>>>> Bob Briscoe    http://bobbriscoe.net/
>>>>>> --
>>>>>> L4s-discuss mailing list
>>>>>> L4s-discuss@ietf.org
>>>>>> https://www.ietf.org/mailman/listinfo/l4s-discuss
>>>> --
>>>> ________________________________________________________________
>>>> Bob Briscoe    http://bobbriscoe.net/
>>>> --
>>>> L4s-discuss mailing list
>>>> L4s-discuss@ietf.org
>>>> https://www.ietf.org/mailman/listinfo/l4s-discuss
>>
>> --
>> L4s-discuss mailing list
>> L4s-discuss@ietf.org
>> https://www.ietf.org/mailman/listinfo/l4s-discuss
- [L4s-discuss] Thoughts on cost fairness metric rjmcmahon
- Re: [L4s-discuss] Thoughts on cost fairness metric Bob Briscoe
- Re: [L4s-discuss] Thoughts on cost fairness metric Sebastian Moeller
- Re: [L4s-discuss] Thoughts on cost fairness metric Bob Briscoe
- Re: [L4s-discuss] Thoughts on cost fairness metric Sebastian Moeller
- Re: [L4s-discuss] Thoughts on cost fairness metric rjmcmahon
- Re: [L4s-discuss] Thoughts on cost fairness metric Sebastian Moeller
- Re: [L4s-discuss] Thoughts on cost fairness metric rjmcmahon