Re: [v6ops] Scale-up tests of iptables for the number of CPU cores -- Re: Preliminary scale-up tests and results for draft-ietf-v6ops-transition-comparison -- request for review
Tom Herbert <tom@herbertland.com> Mon, 06 September 2021 13:06 UTC
From: Tom Herbert <tom@herbertland.com>
Date: Mon, 06 Sep 2021 09:05:53 -0400
Message-ID: <CALx6S37jhPeE2Z1tKH-g-=p=rh8CLhiVZApZyn7CMQh173xiOg@mail.gmail.com>
To: Gábor LENCSE <lencse@hit.bme.hu>
Cc: "Joel M. Halpern" <jmh@joelhalpern.com>, Lorenzo Colitti <lorenzo=40google.com@dmarc.ietf.org>, "v6ops@ietf.org WG" <v6ops@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/v6ops/nNc4OcSWORo-djjcRmb0gEFB1wo>
On Mon, Sep 6, 2021, 5:10 AM Gábor LENCSE <lencse@hit.bme.hu> wrote:

> Dear Joel and Lorenzo,
>
> Thank you for your suggestions! I am sure that solutions eliminating the
> TCP/IP socket interface may produce much higher performance.
>
> However, the aim of my current tests is to check the scalability of
> stateful technologies. From our draft
> (https://datatracker.ietf.org/doc/html/draft-ietf-v6ops-transition-comparison),
> we need to eliminate the following section:

Hi Gabor,

You might want to look at Ptables, which was introduced at the last Netdev
conference. This is explicitly designed to address the performance
challenges of iptables.

https://netdevconf.info/0x15/session.html?Introducing-Ptables

Tom

> Stateful technologies, 464XLAT and DS-Lite (and also NAT444), can
> therefore be much more efficient in terms of port allocation and thus
> public IP address saving. The price is the stateful operation in the
> service provider network, which allegedly does not scale up well. It
> should be noted that in many cases, all those factors may depend on how
> it is actually implemented.
>
> XXX MEASUREMENTS ARE PLANNED TO TEST IF THE ABOVE IS TRUE. XXX
>
> As for scalability, the single-core performance is not relevant in
> itself; rather, we focus on two things:
>
> 1. How does the performance scale up with the number of CPU cores?
>
> It can be seen in the table below:
>
> num. CPU cores     1          2          4          8          16
> src ports          4,000      4,000      4,000      4,000      4,000
> dst ports          1,000      1,000      1,000      1,000      1,000
> num. conn.         4,000,000  4,000,000  4,000,000  4,000,000  4,000,000
> conntrack t. s.    2^23       2^23       2^23       2^23       2^23
> hash table size    c.t.s      c.t.s      c.t.s      c.t.s      c.t.s
> c.t.s/num.conn.    2.097      2.097      2.097      2.097      2.097
> num. exp.          10         10         10         10         10
> error              100        100        100        1,000      1,000
> cps median         223.5      371.1      708.7      1,341      2,383
> cps min            221.6      367.7      701.7      1,325      2,304
> cps max            226.7      375.9      723.6      1,376      2,417
> cps rel. scale-up  1          0.830      0.793      0.750      0.666
> throughput median  414.9      742.3      1,379      2,336      4,557
> throughput min     413.9      740.6      1,373      2,311      4,436
> throughput max     416.1      746.9      1,395      2,361      4,627
> tp. rel. scale-up  1          0.895      0.831      0.704      0.686
>
> Of course, the performance of the 16-core system is only 10-fold, not
> 16-fold, but IMHO it is quite good.
>
> For example, please refer to the scale-up results of NSD (a high
> performance authoritative DNS server) in Table 9 of this (open access)
> paper:
>
> G. Lencse, "Benchmarking Authoritative DNS Servers", IEEE Access, vol. 8,
> pp. 130224-130238, July 2020. DOI: 10.1109/ACCESS.2020.3009141
> https://ieeexplore.ieee.org/document/9139929
>
> For example, the scale-up of the medians in Table 9 is
> 1,454,661/177,432 = 8.2-fold performance using 16 cores, that is, the
> relative speed-up is only 0.52. And DNS is not a stateful technology!
>
> 2. How does the performance degrade as the number of sessions stored in
> the connection tracking table of the stateful NATxy device grows?
>
> Regarding this, I plan to perform measurements with the following
> parameters:
>
> src ports        2,500      5,000      10,000      20,000       40,000
> dst ports        625        1,250      2,500       5,000        10,000
> num. conn.       1,562,500  6,250,000  25,000,000  100,000,000  400,000,000
> conntrack t. s.  2^21       2^23       2^25        2^27         2^29
> hash table size  c.t.s      c.t.s      c.t.s       c.t.s        c.t.s
> c.t.s/num.conn.  1.342      1.342      1.342       1.342        1.342
> num. exp.        10         10         10          10           10
> error            1,000      1,000      1,000       1,000        1,000
>
> As carrying out the measurements requires a lot of time (both my time to
> set up and start the measurements, and the execution time itself, which
> is rather high for the last two columns), I would like to check in
> advance whether the members of the WG consider these parameters good.
>
> A question to all WG members:
>
> Will you be convinced by the results gained using the parameters above?
>
> If not, please point out what problem you see that may question the
> validity of my results!
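The "rel. scale-up" rows of the table can be reproduced from the medians. A minimal sketch (the function name is ours; values are taken from the table above):

```shell
# Relative scale-up = (multi-core median) / (single-core median) / (num. cores).
rel_scale_up() {
  # $1 = multi-core median, $2 = single-core median, $3 = number of cores
  awk -v m="$1" -v s="$2" -v n="$3" 'BEGIN { printf "%.3f\n", m / s / n }'
}

rel_scale_up 2383 223.5 16   # cps at 16 cores:        prints 0.666
rel_scale_up 4557 414.9 16   # throughput at 16 cores: prints 0.686
```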
> Thank you very much in advance!
>
> Best regards,
>
> Gábor
>
> On 9/4/2021 5:10 AM, Joel M. Halpern wrote:
>
> Or you could use fd.io, which is optimized for both performance and
> flexible application of packet behaviors (NAT, IPsec, LISP, ...).
>
> Yours,
> Joel
>
> On 9/3/2021 9:02 PM, Lorenzo Colitti wrote:
>
> Note that the Linux forwarding stack is not very optimized. If you want
> high speeds, you probably want to use XDP, which acts on packets as soon
> as the receiving NIC DMAs them into memory.
>
> That means you have to do all the packet modifications yourself, though.
> Modifying IPv6 packets is trivial (just change the TTL), but
> implementing IPv4 NAT is more complicated.
>
> On Sat, 4 Sept 2021, 01:26 Gábor LENCSE <lencse@hit.bme.hu> wrote:
>
> Dear Ole,
>
> I have performed the scale-up tests of iptables using 1, 2, 4, 8, and 16
> CPU cores. I used two "P" series nodes of NICT StarBED, which are DELL
> PowerEdge R430 servers; please see their hardware details here:
> https://starbed.nict.go.jp/en/equipment/
>
> I have done some tuning of the parameters: number of connections:
> 4,000,000; src ports: 4,000; dst ports: 1,000; conntrack table size:
> 2^23; hash size = connection table size.
>
> I think that the results are quite good: both the number of connections
> per second and the throughput scaled up quite well with the number of
> CPU cores.
>
> num. CPU cores     1          2          4          8          16
> src ports          4,000      4,000      4,000      4,000      4,000
> dst ports          1,000      1,000      1,000      1,000      1,000
> num. conn.         4,000,000  4,000,000  4,000,000  4,000,000  4,000,000
> conntrack t. s.    2^23       2^23       2^23       2^23       2^23
> hash table size    c.t.s      c.t.s      c.t.s      c.t.s      c.t.s
> c.t.s/num.conn.    2.097      2.097      2.097      2.097      2.097
> num. exp.          10         10         10         10         10
> error              100        100        100        1,000      1,000
> cps median         223.5      371.1      708.7      1,341      2,383
> cps min            221.6      367.7      701.7      1,325      2,304
> cps max            226.7      375.9      723.6      1,376      2,417
> cps rel. scale-up  1          0.830      0.793      0.750      0.666
> throughput median  414.9      742.3      1,379      2,336      4,557
> throughput min     413.9      740.6      1,373      2,311      4,436
> throughput max     416.1      746.9      1,395      2,361      4,627
> tp. rel. scale-up  1          0.895      0.831      0.704      0.686
>
> As you can see, the performance of a 16-core machine is about 10x the
> performance of a single-core machine. I think these are very good
> results for the scale-up of a stateful NAT44 implementation.
>
> What do you think?
>
> Best regards,
>
> Gábor
>
> On 8/18/2021 10:52 PM, Gábor LENCSE wrote:
>
> Dear Ole,
>
> Thank you very much for your reply!
>
> Please see my answers inline.
>
> On 8/17/2021 10:52 PM, otroan@employees.org wrote:
>
> > Gábor,
> >
> > Thanks for some great work! I will try to get a more thorough look
> > later, but for a first set of comments.
> >
> > For methodology:
> > - Setting a baseline, e.g. by measuring base IPv4 forwarding, would be
> > useful in establishing how much extra work is involved in doing the
> > translation.
>
> I had the measurement results ready both for IPv4 and IPv6 kernel
> routing (20 experiments, error = 1,000). The table below shows the
> number of all forwarded packets of bidirectional traffic (in million
> frames per second):
>
> Linux kernel routing   IPv4    IPv6
> throughput median      9.471   9.064
> throughput min         9.443   9.029
> throughput max         9.486   9.088
>
> > - Scaling linearly by the number of cores is challenging in these
> > systems. It would be interesting to see the results for 1, 2, 4, 8
> > cores, and not only for the 8-core system.
>
> Until the current measurements, I always did exactly this kind of
> scale-up test. For example, I compared the performance of different
> DNS64 implementations using 1, 2, 4, 8, 16 cores in [2], and the
> performance of different authoritative DNS servers using 1, 2, 4, 8, 16,
> 32 cores in [3]. I have gained some interesting experience with
> switching CPU cores on and off. First, I switched the i-th CPU core
> on/off by writing 1/0 into the /sys/devices/system/cpu/cpu$i/online file
> of the running Linux kernel [2]. Whereas this seemed to work well when
> the query rates were moderate (a few tens of thousands of queries per
> second), it caused problems in the second case, when I used query rates
> of up to 3 million queries per second, so there I instead set the number
> of active CPU cores of the DUT using the maxcpus=n kernel parameter.
>
> [2] G. Lencse and Y. Kadobayashi, "Benchmarking DNS64 Implementations:
> Theory and Practice", Computer Communications (Elsevier), vol. 127,
> no. 1, pp. 61-74, September 1, 2018, DOI: 10.1016/j.comcom.2018.05.005
> http://www.hit.bme.hu/~lencse/publications/ECC-2018-DNS64-BM-for-review.pdf
>
> [3] G. Lencse, "Benchmarking Authoritative DNS Servers", IEEE Access,
> vol. 8, pp. 130224-130238, July 2020. DOI: 10.1109/ACCESS.2020.3009141
> http://www.hit.bme.hu/~lencse/publications/IEEE-Access-2020-AuthDNS-revised.pdf
>
> So now I feel I have the experience to perform such measurements.
>
> I plan to start a series of measurements with 1, 2, 4, 8 cores. To save
> execution time, I need to choose one fixed number of connections. (It
> would take very long to test all possible combinations if I used several
> different numbers of connections!)
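The first core-limiting method described above (writing 1/0 into the per-CPU online file) can be sketched as follows. This is a dry-run sketch: with DRY_RUN=1 (the default here) it only prints the commands; applying them requires root, and cpu0 typically cannot be taken offline on x86. The maxcpus=n alternative is a boot-time kernel parameter, not a runtime command.

```shell
# Keep the first $1 of $2 cores online, take the rest offline (cpu0 stays on).
DRY_RUN=${DRY_RUN:-1}

set_online_cores() {
  want=$1; total=$2
  i=1
  while [ "$i" -lt "$total" ]; do
    if [ "$i" -lt "$want" ]; then v=1; else v=0; fi
    cmd="echo $v > /sys/devices/system/cpu/cpu$i/online"
    if [ "$DRY_RUN" = 1 ]; then echo "$cmd"; else sh -c "$cmd"; fi
    i=$((i + 1))
  done
}

# For very high load tests, booting with the maxcpus=4 kernel parameter
# (on the kernel command line) avoids runtime hotplug altogether.
set_online_cores 4 16   # keep 4 of 16 cores online
```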
> As I expect good scale-up, and so far we have seen only the 8-core
> performance, I expect that the single-core performance is likely between
> 1/6 and 1/8 of it. This means that I should use a moderate number of
> connections, otherwise filling up the conntrack table would take too
> long.
>
> So I plan to use the following parameters: number of connections:
> 4,000,000; src ports: 4,000; dst ports: 1,000; conntrack table size:
> 2^22; hash size: c.t.s/8.
>
> Do you agree with this?
>
> > Regarding the number of connections: you should see a drop-off when
> > the size of the session table is larger than the L3 cache.
>
> Good catch!
>
> Although I cannot directly measure the size of the conntrack table, I
> can record the change of the free memory during the experiments, and
> thus I can estimate when this happens. I do not promise to deal with it
> now, but I plan to use it later on.
>
> > You might also balance maximum bandwidth available with maximum
> > session size. 250G of forwarding per socket on a PCIe 3.0 system.
>
> Do you mean "250G" as a 250Gbps link?
>
> I am sorry, but that is beyond my dreams. The systems I usually use (at
> NICT StarBED, Japan) have 10Gbps NICs. Now I use two HPE servers with
> 10/25Gbps NICs interconnected by 10Gbps DAC cables (at the Széchenyi
> István University, Győr, Hungary), and even though my colleague has
> purchased 25Gbps DAC cables at my request, I do not use them, as my
> current rates are very far from the maximum frame rate (14,880,952 fps)
> of the 10Gbps links.
>
> So it would be nice if someone with higher performance hardware could
> repeat my measurements. Any volunteers?
>
> > I might have read a little too much between the lines in the draft,
> > but I got a feeling (and just that) that the tests were coloured a bit
> > by the behaviour of a particular implementation (iptables).
>
> Yes, my results reflect only the behavior of iptables.
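The quoted 14,880,952 fps figure is the standard line rate of 10 Gigabit Ethernet with minimum-size (64-byte) frames: each frame also occupies 8 bytes of preamble/SFD and 12 bytes of inter-frame gap on the wire. A small check (the function name is ours):

```shell
# Maximum frame rate = link speed / ((frame size + 20 bytes overhead) * 8 bits).
max_frame_rate() {
  # $1 = link speed in bit/s, $2 = frame size in bytes
  awk -v bps="$1" -v fs="$2" 'BEGIN { printf "%d\n", bps / ((fs + 20) * 8) }'
}

max_frame_rate 10000000000 64   # prints 14880952
```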
> On the one hand, if the results of one particular implementation show:
> - poor scale-up, then it does not prove that the technology is bad;
>   other implementations might perform much better,
> - good scale-up, then it proves that the technology is good, but other
>   implementations may still perform much worse.
>
> Of course, our time is rather limited, thus my approach is to test the
> implementations we expect to be good enough. As for NAT44, I expect that
> iptables is among them. If anyone can suggest a better one, I am open to
> trying it. It should be free software for two reasons:
> - I do not have a budget to buy a proprietary implementation,
> - the licenses of some vendors prohibit the publication of benchmarking
>   results.
>
> > You are of course measuring how a particular implementation (or set of
> > implementations) scales, and we as the IETF have to deduce whether the
> > scaling limitations are in the implementation or in the protocol
> > mechanism. We do know that you can build large scale NAT44s and
> > NAT64s, so back to my first point, it might be useful to provide a
> > baseline to give an idea of the additional cost associated with the
> > extra treatment of packets.
>
> Yes, my results (9Mfps vs. 3Mfps) definitely show that stateful NAT44 is
> not without performance costs.
>
> > It would certainly be interesting to run your tool against the VPP
> > implementation of NAT. Here are some NAT performance results from runs
> > with the CSIT benchmarking setup for VPP:
> >
> > https://docs.fd.io/csit/rls2106/report/vpp_performance_tests/packet_throughput_graphs/nat44.html
> > https://docs.fd.io/csit/rls2106/report/vpp_performance_tests/throughput_speedup_multi_core/nat44.html
>
> As far as I could figure out, their stateless tests seem to scale up
> well up to 4 CPU cores, but their stateful tests do not scale up,
> regardless of whether we are speaking about connections per second or
> throughput.
>
> I really wonder how iptables behaves! :-)
>
> Best regards,
>
> Gábor
>
> > Best regards,
> > Ole
> >
> > On 17 Aug 2021, at 14:51, Gábor LENCSE <lencse@hit.bme.hu> wrote:
>
> Dear All,
>
> At IETF 111, I promised to perform scale-up tests for stateful NAT64 and
> also for stateful NAT44. (Our primary focus is stateful NAT64, as it is
> a part of 464XLAT. As there was interest in CGN, I am happy to invest
> some time into it, too. I think the comparison of their scalability may
> be interesting for several people.)
>
> Now I am in the phase of preliminary tests. I mean that I have performed
> tests to explore the behavior of both the Tester (the stateful branch of
> siitperf) and the benchmarked application, to be able to determine the
> conditions for the production tests.
>
> Now I write to ask all interested "IPv6 Operations" mailing list members
> to review my report below from two points of view:
> - Do you consider the methodology sound?
> - Do you think that the parameters are appropriate to provide meaningful
>   results for network operators?
>
> Control question:
> - Will you support the inclusion of the results into
>   draft-ietf-v6ops-transition-comparison and the publication of the
>   draft as an RFC, even if the results contradict your view of the
>   scalability of the stateful technologies?
>
> As I had more experience with iptables than Jool, I started with the
> scale-up tests of stateful NAT44.
>
> Now I give a short description of the test system.
>
> I used two identical HPE ProLiant DL380 Gen10 (DL380GEN10) servers with
> the following configuration:
> - 2 Intel 5218 CPUs (the clock frequency was fixed at 2.3GHz)
> - 256GB (8x32GB) DDR4 SDRAM @ 2666 MHz (accessed quad channel)
> - 2-port BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller (used
>   with 10Gbps DAC cables)
>
> The servers have 4 NUMA nodes (0, 1, 2, 3), and the NICs belong to NUMA
> node 1. All the interrupts caused by the NICs are processed by 8 of the
> 32 CPU cores (the ones that belong to NUMA node 1, that is, cores 8 to
> 15), thus regarding our measurements, the DUT (Device Under Test) is
> more or less equivalent to an 8-core server. :-)
>
> I used Debian Linux 9.13 with a 5.12.14 kernel on both servers.
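On the DUT described above, the NIC's NUMA node can be read from sysfs (e.g. `cat /sys/class/net/eth0/device/numa_node`, where `eth0` is a placeholder interface name). Assuming the common contiguous core enumeration (as on this box, where node 1 of a 32-core, 4-node machine owns cores 8-15), the cores of a node can be derived like this (the helper is ours, a sketch under that assumption):

```shell
# Cores of NUMA node $1, with $2 cores per node, assuming contiguous numbering.
cores_of_node() {
  first=$(( $1 * $2 ))
  last=$(( first + $2 - 1 ))
  seq "$first" "$last" | xargs
}

cores_of_node 1 8   # node 1 of a 32-core, 4-node box: prints "8 9 10 11 12 13 14 15"
```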
> The test setup was the same as our draft describes:
> https://datatracker.ietf.org/doc/html/draft-lencse-bmwg-benchmarking-stateful
>
>                +--------------------------------------+
>      10.0.0.2  |Initiator                    Responder|  198.19.0.2
> +--------------|                Tester                |<-------------+
> | private IPv4 |             [state table]            | public IPv4  |
> |              +--------------------------------------+              |
> |                                                                    |
> |              +--------------------------------------+              |
> |    10.0.0.1  |                 DUT:                 |  198.19.0.1  |
> +------------->|        Stateful NATxy gateway        |--------------+
>   private IPv4 |     [connection tracking table]      | public IPv4
>                +--------------------------------------+
>
> Both the Tester and the DUT were the above-mentioned HPE servers.
>
> I wanted to follow my original plan, included in my IETF 111
> presentation, to measure both the maximum connection establishment rate
> and the throughput using the following numbers of connections: 1
> million, 10 million, 100 million, and 1 billion. However, the DUT became
> unavailable during my tests with 1 billion connections, as the
> connection tracking table exhausted its memory. Thus I have reduced the
> highest number of connections to 500 million.
>
> First, I had to perform the tests for the maximum connection
> establishment rate. To achieve the required growing number of
> connections, I always used 50,000 different source port numbers (from 1
> to 50,000) and increased the destination port numbers as 20, 200, 2,000,
> and 10,000 (instead of 20,000).
>
> As for the size of the connection tracking table, I used powers of 2.
> The very first value was 2^20 = 1,048,576, and then I had to use the
> lowest large enough power, that is, 2^24, etc. The hash size parameter
> was always set to 1/8 of the size of the connection tracking table.
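The sizes above map onto the usual Linux conntrack knobs: the table size goes into the `net.netfilter.nf_conntrack_max` sysctl, and the hash size into the `hashsize` parameter of the nf_conntrack module. A sketch computing one step of the series (2^24 with hash size 1/8 of it); the apply commands are shown as comments only, since they require root on the DUT:

```shell
# One step of the series: table size 2^24, hash size = table size / 8.
ct_size=$(( 1 << 24 ))        # 16,777,216 conntrack entries
hash_size=$(( ct_size / 8 ))  # 2,097,152 hash buckets
echo "conntrack table size: $ct_size, hash size: $hash_size"

# Applying them on the DUT (requires root; shown for reference only):
#   sysctl -w net.netfilter.nf_conntrack_max=$ct_size
#   echo $hash_size > /sys/module/nf_conntrack/parameters/hashsize
```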
> I performed a binary search to determine the maximum connection
> establishment rate. The stopping criterion was expressed by the "error"
> parameter, which is the difference between the upper and lower bounds.
> It was set to 1,000 in all cases except the last one, where it was set
> to 10,000 to save some execution time. The experiments were executed 10
> times, except the last one (only 3 times).
>
> I have calculated the median, the minimum and the maximum of the
> measured connection establishment rates. I hope the values of the table
> below will be readable in my e-mail:
>
> Num. conn.       1,000,000  10,000,000  100,000,000  500,000,000
> src ports        50,000     50,000      50,000       50,000
> dst ports        20         200         2,000        10,000
> conntrack t. s.  2^20       2^24        2^27         2^29
> hash table size  c.t.s/8    c.t.s/8     c.t.s/8      c.t.s/8
> num. exp.        10         10          10           3
> error            1,000      1,000       1,000        10,000
> cps median       1.124      1.311       1.01         0.742
> cps min          1.108      1.308       1.007        0.742
> cps max          1.128      1.317       1.013        0.742
> c.t.s/n.c.       1.049      1.678       1.342        1.074
>
> The cps (connections per second) values are given in million connections
> per second.
>
> The rise of the median to 1.311M at 10 million connections (from 1.124M
> at 1 million connections) is evidently caused by the fact that the size
> of the connection tracking table was quite large compared to the actual
> number of connections. I have included this proportion in the last line
> of the table. Thus only the very first and very last columns are
> directly comparable. If we consider that the maximum connection
> establishment rate decreased only from 1.124M to 0.742M while the total
> number of connections increased from 1M to 500M, I think we can be
> satisfied with the scale-up of iptables.
>
> (Of course, I can see the limitations of my measurements: the error of
> 10,000 was too high; the binary search always finished at the same
> number.)
>
> As I wanted more informative results, I have abandoned the decimal
> increase of the number of connections.
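The binary search with the "error" stopping criterion described above can be sketched like this. The `trial` function is a stand-in of our own for a real elementary measurement step (a timed run at the given rate that either passes or fails); here it simply passes below an assumed true capacity, so the sketch only illustrates the search logic:

```shell
# Binary search for the highest passing rate; stops when upper - lower <= error.
find_max_rate() {
  lower=$1; upper=$2; error=$3
  while [ $(( upper - lower )) -gt "$error" ]; do
    mid=$(( (lower + upper) / 2 ))
    if trial "$mid"; then lower=$mid; else upper=$mid; fi
  done
  echo "$lower"
}

# Stand-in elementary step: passes if the offered rate is at or below capacity.
trial() { [ "$1" -le 742000 ]; }

find_max_rate 0 2000000 1000   # reports a rate within 1,000 cps of capacity
```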
> I rather used binary increase, to ensure that the proportion of the
> number of connections and the size of the connection tracking table
> remains constant.
>
> I had another concern. Earlier, I pointed out that the roles of the
> source and destination port numbers are not completely symmetrical in
> the hash function that distributes the interrupts among the CPU cores
> [1]. Therefore, I decided to increase the source and destination port
> ranges together.
>
> [1] G. Lencse, "Adding RFC 4814 Random Port Feature to Siitperf: Design,
> Implementation and Performance Estimation", International Journal of
> Advances in Telecommunications, Electrotechnics, Signals and Systems,
> vol. 9, no. 3, pp. 18-26, 2020, DOI: 10.11601/ijates.v9i3.291
>
> The results are shown in the table below (first, please see only the
> same lines as in the table above):
>
> Num. conn.         1,562,500  6,250,000  25,000,000  100,000,000  400,000,000
> src ports          2,500      5,000      10,000      20,000       40,000
> dst ports          625        1,250      2,500       5,000        10,000
> conntrack t. s.    2^21       2^23       2^25        2^27         2^29
> hash table size    c.t.s/8    c.t.s/8    c.t.s/8     c.t.s/8      c.t.s/8
> num. exp.          10         10         10          10           5
> error              1,000      1,000      1,000       1,000        1,000
> cps median         1.216      1.147      1.085       1.02         0.88
> cps min            1.21       1.14       1.077       1.015        0.878
> cps max            1.224      1.153      1.087       1.024        0.884
> c.t.s/n.c.         1.342      1.342      1.342       1.342        1.342
> throughput median  3.605      3.508      3.227       3.242        2.76
> throughput min     3.592      3.494      3.213       3.232        2.748
> throughput max     3.627      3.521      3.236       3.248        2.799
>
> Now the maximum connection setup rate deteriorates only very slightly,
> from 1.216M to 1.02M, while the number of connections increases from
> 1,562,500 to 100,000,000. (A somewhat higher degradation can be
> observed only in the last column; I will return to it later.)
>
> The last 3 lines of the table show the median, minimum and maximum
> values of the throughput. As required by RFC 8219, throughput was
> determined using bidirectional traffic, and the duration of the
> elementary steps of the binary search was 60s. (The binary search was
> executed 10 times, except for the last column, where it was done only 5
> times to save execution time.)
>
> Note: commercial testers usually report the total number of frames
> forwarded. Siitperf reports the number of frames per direction; thus, in
> the case of bidirectional tests, the reported value should be multiplied
> by 2 to get the total number of frames per second. I did so, too.
>
> Although RFC 2544/5180/8219 require testing with bidirectional traffic,
> I suspect that unidirectional throughput may also be interesting for
> ISPs, as home users usually have much more download than upload
> traffic...
>
> The degradation of the throughput is also moderate, except in the last
> column. I attribute the higher decrease of the throughput at 400,000,000
> connections (as well as that of the maximum connection establishment
> rate) to NUMA issues. In more detail: this time nearly the entire memory
> of the server was in use, whereas in the previous cases iptables could
> use NUMA-local memory (if it was smart enough to do so). Unfortunately,
> I cannot add more memory to these computers to check my hypothesis, but
> in a few weeks I hope to be able to use some DELL PowerEdge R430
> computers that have only two NUMA nodes and 384GB RAM; see the "P" nodes
> here: https://starbed.nict.go.jp/en/equipment/
>
> Now I kindly ask everybody who is interested in the scale-up tests to
> comment on my measurements, regarding both the methodology and the
> parameters!
>
> I am happy to provide more details, if needed.
>
> I plan to start working on the NAT64 tests in the upcoming weeks. I plan
> to use Jool. I have very limited experience with it, so first I need to
> find out how I can tune its connection tracking table parameters to be
> able to perform fair scale-up tests.
>
> In the meantime, please comment on my above experiments and results, so
> that I may improve the tests to provide convincing results for all
> interested parties!
>
> Best regards,
>
> Gábor
>
> P.S.: If someone would volunteer to repeat my experiments, I would be
> happy to share my scripts and experience, and to provide support for
> siitperf, which is available from:
> https://github.com/lencsegabor/siitperf
> The stateful branch of siitperf is in a very alpha state. It has only
> partial documentation in this paper, which is still under review:
> http://www.hit.bme.hu/~lencse/publications/SFNAT64-tester-for-review.pdf
> (It may be revised or removed at any time.)
> The support for unique pseudorandom source and destination port number
> combinations, which I used for my current tests, is not described there,
> as I invented it after the submission of that paper. (I plan to update
> the paper if I have a chance to revise it. In the unlikely case that the
> paper is accepted as is, I plan to write a shorter paper about the new
> features. For now, the commented source code is the most reliable
> documentation.)
_______________________________________________
v6ops mailing list
v6ops@ietf.org
https://www.ietf.org/mailman/listinfo/v6ops
- [v6ops] Preliminary scale-up tests and results fo… Gábor LENCSE
- Re: [v6ops] Preliminary scale-up tests and result… otroan
- Re: [v6ops] Preliminary scale-up tests and result… Gábor LENCSE
- [v6ops] Scale-up tests of iptables for the number… Gábor LENCSE
- Re: [v6ops] Scale-up tests of iptables for the nu… Lorenzo Colitti
- Re: [v6ops] Scale-up tests of iptables for the nu… Joel M. Halpern
- Re: [v6ops] Scale-up tests of iptables for the nu… Gábor LENCSE
- Re: [v6ops] Scale-up tests of iptables for the nu… otroan
- Re: [v6ops] Scale-up tests of iptables for the nu… Tom Herbert
- Re: [v6ops] Scale-up tests of iptables for the nu… Gabor LENCSE
- Re: [v6ops] Scale-up tests of iptables for the nu… otroan
- Re: [v6ops] Scale-up tests of iptables for the nu… Gábor LENCSE
- [v6ops] Scale-up tests of iptables for the number… Gábor LENCSE