Re: [L4s-discuss] Configuring a L4S test plant

Sebastian Moeller <moeller0@gmx.de> Fri, 06 October 2023 15:57 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/l4s-discuss/wcLFdoonRd0rcYw-7nJQQ0fNXp0>

Hi Matteo,


> On Oct 6, 2023, at 14:32, Matteo Guarna S303434 <matteo.guarna@studenti.polito.it> wrote:
> 
> Hi Neal. Thank you for providing me with your impressions so quickly,
> 
> On 2023-10-05 20:41 Neal Cardwell wrote:
>> Thanks for the detailed data!
>> You mention the L4S flow having a higher delay... what's the source
>> for that data?
>    [MG] I am using spindump to capture the flows passing through the router. Its code is available here: https://github.com/EricssonResearch/spindump
> I can try to produce a log of the captures, but unfortunately I have to wait until Monday to access the test plant again. Still, I repeated my measurements many times and I get a very consistent per-second RTT for Cubic (33.4 ms), while the per-second RTT for the Prague flow varies mostly between 33.9 ms and 34.7 ms.
> 
>> From a quick glance at the pcaps and ss data, it seems like:
>> - From the ss data, CUBIC sees RTT delays between 35ms and 53ms;
>> Prague sees RTT delays between 31ms and 35ms.
> 
>    [MG] Your observations are much more in line with the expected behaviour than mine. I can see that myself in the ss capture, now that you point it out... Maybe Spindump is having problems with the measurements for some reason? I will have to look into it, I guess. Thank you!

	[SM] Hmm, comparing the RTTs in the two traces, Prague and Cubic look pretty close for most of the time (Cubic shows some "spikes" later in the trace), but that should not really be the case if DualQ does its thing correctly... With DualQ as the egress qdisc, how did you configure the actual interface (how deep were the interface buffers, and was BQL active or not)?
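
A minimal sketch for checking the BQL part of that question on the router's egress interface. It assumes the standard Linux sysfs layout (/sys/class/net/<dev>/queues/tx-*/byte_queue_limits); the function name bql_report, the default interface name eth0 and the overridable sysfs root are illustrative choices, not something taken from this thread.

```shell
#!/bin/sh
# Print the BQL state of every TX queue of one interface.
# $1: interface name (hypothetical default: eth0)
# $2: sysfs root, overridable for dry runs (assumption; normally /sys/class/net)
bql_report() {
    netif=${1:-eth0}
    root=${2:-/sys/class/net}
    for dir in "$root/$netif"/queues/tx-*/byte_queue_limits; do
        # If the glob matched nothing, $dir is the literal pattern and not a directory.
        [ -d "$dir" ] || { echo "$netif: no byte_queue_limits directory found"; return 1; }
        q=${dir%/byte_queue_limits}
        echo "${q##*/} limit=$(cat "$dir/limit") limit_max=$(cat "$dir/limit_max") inflight=$(cat "$dir/inflight")"
    done
}
```

Run it as "bql_report $NETIF" on the router while the test traffic is flowing. As far as I understand, a limit that stays at 0 under load suggests the driver is not using BQL; the depth of the hardware rings behind it can be listed with "ethtool -g $NETIF".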



> 
>> - Prague is getting about a 6% ECN mark rate, and given that it is
>> correctly converging to a rate of roughly 1/.06 - 1 ~= 15 Mbps. That
>> rate is far below its fair share of 50 Mbps. So if there is an issue
>> here, it might be in dualpi2 providing too many ECN marks to the L4S
>> flow and/or too few drops to the CUBIC flow.
>    [MG] It may well be. In fact, I generate traffic with iperf3 and can see how many retransmissions actually happen during 60-second trials in which I run both flows at 100 Mbps through the bottleneck. While I see virtually 0 retransmissions with Prague, I see very few with Cubic: around 20 in the first second and then 1 or 2 every three seconds on average. I think this might be a little too few, don't you?

	[SM] This matches what you can see in the packet captures as well if you do a tcptrace plot: essentially zero duplicate ACKs (a sign of drops) for Prague and some for Cubic, so this is consistent...
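
The same comparison can be cross-checked against the nstat log collected with the loop Neal requested earlier in the thread (nstat -n once, then nstat every second). A minimal sketch, assuming the usual nstat behaviour of printing each counter's increment since the previous invocation; the function name total_retrans is hypothetical, and the counters are host-wide rather than per-flow, so this only works because each sender carries a single test flow.

```shell
#!/bin/sh
# Sum the per-sample TcpRetransSegs increments found in an nstat log.
# $1: path to the log (the thread used /root/nstat.txt)
total_retrans() {
    # Lines look like: "TcpRetransSegs   <increment>   <rate>".
    # "sum + 0" makes awk print 0 when the counter never appears.
    awk '$1 == "TcpRetransSegs" { sum += $2 } END { print sum + 0 }' "$1"
}
```

Running "total_retrans /root/nstat.txt" on the Prague and on the Cubic sender should then give numbers comparable to the retransmission counts reported by iperf3.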


> 
> Thank you once again for your valuable insights
> 
> Matteo
> 
>> neal
>> On Thu, Oct 5, 2023 at 12:23 PM Matteo Guarna S303434
>> <matteo.guarna@studenti.polito.it> wrote:
>>> Hi Neal,
>>> thank you for reaching out to me. I executed the script on both the prague and
>>> the cubic server as you asked.
>>> The prague server has IP address 192.168.202.21, and transmits data towards 192.168.201.17
>>> The cubic server has IP address 192.168.202.22, and transmits data towards 192.168.201.18
>>> All connections lasted for 20 seconds and were established via iperf3 in reverse mode
>>> Please forgive me for having the date on the two machines out of sync
>>> (the flows had in fact started at the same time):
>>> - the transmission timestamp on the prague server begins at Thu Oct 5 2023, 05:22:50 PM CEST
>>> - the transmission timestamp on the cubic server begins at Fri Sep 29 2023, 01:37:53, CEST
>>> I am providing you with the captures as attachments to this mail: I
>>> named them with the "prague" and "cubic" suffixes after the servers
>>> where the capture took place.
>>> If you need more information please don't hesitate to contact me
>>> Best regards and thank you in advance,
>>> Matteo Guarna
>>> On 2023-10-04 17:18 Neal Cardwell wrote:
>>>> Thanks for the report, Matteo.
>>>> To help debug this, could you please gather and share the following
>>>> instrumentation during one of your tests? This would need to be
>>>> collected on both data senders (servers), as root:
>>>> (while true; do date; ss -tenmoi; sleep 1; done) > /root/ss.txt &
>>>> tcpdump -w /root/dump.pcap -n -s 100 -c 1000000 host $REMOTE_HOST -i $INTERFACE &
>>>> nstat -n; (while true; do date; nstat; sleep 1; done) > /root/nstat.txt &
>>>> The data should probably only be needed for the time interval starting
>>>> from before the test and ending when the flows reach steady state,
>>>> which may be 10-20 secs into the test.
>>>> thanks,
>>>> neal
>>>> On Wed, Oct 4, 2023 at 6:03 AM Sebastian Moeller <moeller0@gmx.de> wrote:
>>>>> Hi Matteo,
>>>>>> On Oct 4, 2023, at 11:48, Matteo Guarna S303434 <matteo.guarna@studenti.polito.it> wrote:
>>>>>> Hi Sebastian and thank you for your answer
>>>>>> On 2023-10-03 16:39 Sebastian Moeller wrote:
>>>>>>> Hi Matteo.
>>>>>>>> On Oct 3, 2023, at 15:42, Matteo Guarna S303434 <matteo.guarna@studenti.polito.it> wrote:
>>>>>>>> Greetings everyone,
>>>>>>>> I hope the question isn't too off-topic, please forgive me in advance if it is so.
>>>>>>>> I am still trying to perform some fairness measurements with both L4S and classic flows, although now on a physical test plant instead of a virtualized one. I'm relying on the L4STeam Github project for the deployment of the L4S architecture and I am looking for someone who's familiar with the project and might be willing to help me: in fact I seem not to be able to achieve the correct configuration.
>>>>>>>> My setup is very simple: I have four servers (two senders and two receivers) exchanging two traffic flows through one server acting as a router. One client-server pair uses Prague as CC, while the other uses Cubic. All servers have the patched kernel provided in the https://github.com/L4STeam/linux/ repository branch.
>>>>>>>> If I trigger congestion on the router by generating both the Prague and the Cubic flows (let's say the flows measure 100 Mbit/s each, and they both come through an L2 switch on the same router input interface on a 1Gb Ethernet link; only a 100M link, though, is in place on the output interface towards the receivers) I see the L4S flow having higher delay, higher jitter and a smaller (and more variable) bandwidth share. The Prague share is 1/4 of the Cubic share. I am sending an attachment with a graphical representation of the scenario here described.
>>>>>>>> I configured my L4S endpoints as follows:
>>>>>>>> - I set the CC as tcp Prague (sysctl -w net.ipv4.tcp_congestion_control=prague)
>>>>>>>> - I set the AccEcn, even if it's not necessary apparently (sysctl -w net.ipv4.tcp_ecn=3)
>>>>>>>> - I disabled the required offloading capabilities on the endpoints (sudo ethtool -K $NETIF tso off gso off gro off lro off)
>>>>>>> [SM] I think you need to do the same on the router... or rather, with your topology of running Prague and Cubic over separate end-points, especially on the router itself. Side note: sch_cake grew a split-gso mode to automatically handle this issue, because it can be a bit of a whack-a-mole problem to make these configs stick (and in the case of cake the idea was to make deployment easy even for non-experts).
>>>>>> [MG] I tried as you suggested and unfortunately the situation remains unchanged.
>>>>> [SM2] Hmmm, that would indicate that it might not be "lumpiness" of inputs into the router. I guess I would take packet captures on both interfaces of the router to see whether there is any unexpected distribution of packets between input and output? Also worth looking at is the CPU usage on the router... we occasionally run into issues with aggressive(?) power/voltage/frequency scaling, where a CPU might take much longer to wake up than expected; the L-queue with its rather low (IMHO too low) reference delay of 1ms would be especially sensitive to such issues.
>>>>> Also does your 100Mbps interface support BQL?
>>>>>> Still, I think I missed the point regarding sch_cake, could you explain again what it is and if and how it could be useful?
>>>>> [SM2] I am talking about Linux's cake qdisc, and just as an example: cake does not support special treatment of ECT(1) but implements rfc3168 ECN signaling for both ECT(0) and ECT(1). So for your experiments it might not be that useful (but for the fun of it, maybe try it as an alternative to DualQ). I just mentioned it as an example of a qdisc that opted for not simply disabling all offloads. After all, these offloads are quite useful, as they can considerably reduce the CPU cost of networking. (GSO/GRO work by amortizing the somewhat fixed per-packet cost of the Linux network stack over multiple ethernet frames; as long as the increased delay inherent in such batching approaches is acceptable, this can help a lot.)
>>>>>> Apologies, I guess I perfectly fit the definition of "non-expert". I tried to look it up on the internet but I struggled to find any clarification.
>>>>> [SM2] Sorry, my bad, I should have been clearer that I was talking about a qdisc here; see "man tc-cake" on a sufficiently modern Linux system. The source code file is called sch_cake.c (see e.g. https://elixir.bootlin.com/linux/latest/source/net/sched/sch_cake.c)
>>>>>>>> - I configured the fair queue on the endpoints (sudo tc qdisc replace dev $NETIF root fq)
>>>>>>>> I configured my router as follows:
>>>>>>>> - I enabled forwarding through these interfaces to obtain the routing capabilities (sudo sysctl -w net.ipv4.ip_forward=1)
>>>>>>>> - I set the dualpi2 on both interfaces (sudo tc qdisc replace dev $NETIF root dualpi2)
>>>>>>>> I then applied the fair queue and disabled the offloading capabilities on both my classic endpoints to ensure that the classic and L4S flows act as fairly as possible, but to no avail (even without these precautions the results remain roughly the same).
>>>>>>> [SM] Again, I think with your topology offloads at the endpoints should not have much influence, but at the router they well might. If that turns out to help, this might be explained by Prague's (and/or DualQ's L-queue's) considerably higher sensitivity to bursty traffic compared to classic traffic and queue.
>>>>>>>> I am sure I am missing some important details in the setup, and I would really appreciate some help.
>>>>>>> [SM] To me this looks rather straightforward, and I would probably try something similar, but I have not actually tried it in practice.
>>>>>>> Regards & good luck
>>>>>>> Sebastian
>>>>>> [MG] Thanks in advance for your help, and if you have other tips or if you (or anyone else for that matter) are by any chance aware of a paper or project using the prague branch of the L4STeam repository, that might indeed be really helpful too.
>>>>> [SM] I am not the best/most objective person to quiz here, as I consider L4S in general too little too late and neither TCP Prague nor the DualQ AQM worth deploying in their current state (but that is why I consider your effort researching these admirable; both IMHO direly need more research).
>>>>> I would always try to run the same tests over a bottleneck using an fq-scheduler, be it the all-in-one cake or fq_codel. Fq_codel can actually be configured to treat ECT(1) more in line with what TCP Prague desires, so that might well be a decent starting point for alternative measurements....
>>>>> Regards
>>>>> Sebastian
>>>>>> My best regards to you and the community,
>>>>>> Matteo
>>>>>>>> Regards,
>>>>>>>> Matteo
>>>>>>>> P.S.
>>>>>>>> I just want to point out that by looking at the packet traces everything seems fine: Prague carries the ECN=1, the dualpi2 marks packets with ECN=3, the AccEcn control signals on the ACE fields are coherent, and no losses occur in the Prague flow, while they do happen with the Cubic flow. It looks like Prague is underperforming for whatever reason. Furthermore, if I switch back to two Cubic flows I measure perfect share, equal delay and equal jitter, so it looks to me like there are no physical impairments on the testbed.
>>>>>>>> <testplant_issue.pdf>
>>>>>>>> --
>>>>>>>> L4s-discuss mailing list
>>>>>>>> L4s-discuss@ietf.org
>>>>>>>> https://www.ietf.org/mailman/listinfo/l4s-discuss
> 