Re: [bmwg] I-D Action: draft-ietf-bmwg-benchmarking-stateful-01.txt
Gabor LENCSE <lencse@hit.bme.hu> Thu, 17 November 2022 13:13 UTC
From: Gabor LENCSE <lencse@hit.bme.hu>
To: "bmwg@ietf.org" <bmwg@ietf.org>
Date: Thu, 17 Nov 2022 22:12:20 +0900
Subject: Re: [bmwg] I-D Action: draft-ietf-bmwg-benchmarking-stateful-01.txt
Dear Eduard,

Thank you very much for your review!

On 11/17/2022 4:02 AM, Vasilenko Eduard wrote:
[...]
> I am still sure about the big dependency between "packets per second" and "new sessions per second".

I definitely agree with it.

> But it would be utterly difficult to specify a profile for mixed traffic - it would be very different between fixed and mobile subscribers.
> Hence, let us test them separately and assume a linear influence on each other: if pps is at 60%, then cps may probably be up to 40% of the tested maximum.
> Does it make sense to specify it in the document?

To be more precise, the situation of a stateful NAT64 or NAT44 gateway is even more complicated. Several different things happen, including:

1) A new connection is established.
2) A packet is transmitted in the upload direction.
3) A packet is transmitted in the download direction.
4) A connection is torn down. (Either by closing a TCP connection or by the timeout of the TCP or UDP "connection".)

To have conditions similar to the traffic of the Internet, all of the above should be considered.

Some further comments:

As for iptables, connection tear down is much more costly (in terms of CPU power) than connection establishment. According to my measurements, in the case of 100M connections, iptables could establish 2.237M connections per second, but it could terminate only 345k connections per second. Please see Table 4 and Table 5 in:
https://datatracker.ietf.org/doc/html/draft-lencse-v6ops-transition-scalability-04

I tested the throughput of the Jool, iptables+tayga, and OpenBSD PF stateful NAT64 solutions using unidirectional traffic in the upload and download directions, and I found the results different. The actual quantities of upload and download traffic can differ very much in the case of a home user! Yet RFC 2544 / RFC 5180 / RFC 8219 require testing throughput with bidirectional traffic. We also kept it in our draft and added testing with unidirectional traffic as OPTIONAL.
Should we perhaps make testing with unidirectional traffic REQUIRED?

After all these considerations: *What should we recommend and why?*

Perhaps an appropriate mix of 1, 2, 3, and 4 could be the desired load for benchmarking a stateful NATxy gateway... But we see a lot of hindrances, including:

- Following the long-established tradition (also required by RFC 2544 / RFC 5180 / RFC 8219), we use UDP for testing. In UDP, there is no such thing as "termination of a connection". Of course, the gateway still handles UDP "connections", but they can be "terminated" only by timeout. And that makes it very hard (perhaps impossible) to use, let us say, 10% connection tear down.
- We could still use a mix of, let us say, 10% of the packets resulting in new connections and 90% of the packets belonging to existing connections. (The ratios could be changed.) However, adding 10% new connections may significantly increase the number of connections during a 60s long throughput test, because we are not able to terminate the same number of old connections. (And the number of connections highly influences the performance of a stateful NATxy gateway; please see the connection scalability results in the above-mentioned draft.)

So currently we cannot propose a better solution than measuring separately:

- connection setup performance (maximum connection establishment rate)
- packet forwarding performance measured with a constant number of connections (throughput with bidirectional traffic; optionally: throughput with unidirectional traffic)
- connection termination performance (connection tear down rate)

Can you propose a repeatable measurement that uses an appropriate mix of 1-4?

> It is cheating to test ports, not the processing engine. Of course, a lightly loaded engine would show much better results.
> Unfortunately, to get meaningful results it is important to overload the whole engine.
> In the case of a hardware device, it is one NPU on the processing line card.
> Vendors share (under NDA) performance figures for it. It is always below 100GE but above 10GE. Hence, many ports may be needed in the 10GE case.
> In the case of a virtual appliance, it is probably a VM with one vCPU of Intel or AMD. One vCPU would not be capable of overloading a 10GE port.
> Typically, the number of sessions is so big (in a Telco environment) that engines scale linearly (both software-based and hardware-based). The test for many engines (CPU cores) is not more useful, but it is more difficult to implement.

I do not really understand the point of the above. It seems to be about a special hardware device that has several cards with perhaps multiple ports... ?

In our simple model, we have a DUT with only two ports, e.g., in the case of NAT64 it looks as follows:

              +--------------------------------------+
    2001:2::2 |Initiator                    Responder| 198.19.0.2
+-------------|                Tester                |<------------+
| IPv6 address|                         [state table]| IPv4 address|
|             +--------------------------------------+             |
|                                                                  |
|             +--------------------------------------+             |
|   2001:2::1 |                 DUT:                 | 198.19.0.1  |
+------------>|        Stateful NAT64 gateway        |-------------+
  IPv6 address|     [connection tracking table]      | IPv4 address
              +--------------------------------------+

     Figure 2: Test setup for benchmarking stateful NAT64 gateways

> IMHO: the discussion of what we are trying to overload is mandatory.

What do you mean by "overload"? If I understand it correctly, we do overload during both the maximum connection establishment rate test and the throughput test in order to find the maximum lossless rate using a binary search.

> How big a session table should be created in phase 1? It may be 10, 10k, or 1M. How to decide?

Yes, this is a very good question. IMHO, if we want to imitate the situation of a commercial device used by an ISP, the number of sessions should begin around 1M and go up until our hardware is unable to handle it. I used this approach, and the upper limit was 800M and 1600M for iptables and Jool, respectively.
(Please see the connection scalability results in the above-mentioned draft in Table 4 and Table 9.)

> Or asking the same from a different direction: how many packets should be sent over every session?

In my measurements, this number was determined by the number of connections and the achieved throughput. For example, if you look at Table 4 and choose the first column (1.56M connections), then the median throughput is 5.326M frames per second. This means that about 319.56M packets were forwarded during the 60s long throughput test, thus about 205 packets belonged to a single connection. However, if you choose the last column (800M connections), then the median throughput is 3.689M frames per second. This means that about 221.34M packets were forwarded during the 60s long throughput test, thus on average fewer than one packet belonged to a single connection. If you do a similar calculation with Jool, you will get an order of magnitude lower values due to its lower throughput.

> Section 2:
> I do not understand why pseudorandom port numbers for every packet were assumed in the basic design.

The aim of using a pseudorandom enumeration of all port number combinations that are possible with the given source and destination port number ranges is twofold:

1) To achieve that all test frames result in a new connection during the preliminary test phase. -- This is needed to measure connection establishment performance.
2) To fill up the connection tracking table of the DUT and the state table of the Tester as soon as possible. -- This is important if the preliminary test phase is performed in preparation for a real test phase.

> It is an unrealistic assumption that somebody would mistakenly generate random packets instead of sessions.
> I could not believe such a mistake. People have been testing stateful devices (FW, LB, NAT) for ages.
> Hence, it does not make sense to warrant against it.
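For illustration, the pseudorandom enumeration of all port number combinations described above could be sketched as follows (a minimal Python sketch of the idea only, not siitperf's actual implementation):

```python
import itertools
import random

def enumerate_four_tuples(src_ports, dst_ports, seed=42):
    """Pre-compute a pseudorandom enumeration of ALL source/destination
    port combinations, so that every frame of the preliminary test phase
    creates a new connection in the DUT (aim 1) and the connection
    tracking table fills up as fast as possible (aim 2)."""
    combos = list(itertools.product(src_ports, dst_ports))
    random.Random(seed).shuffle(combos)  # fixed seed keeps tests repeatable
    return combos

# 100 source ports x 10 destination ports -> 1000 unique connections
combos = enumerate_four_tuples(range(10000, 10100), range(80, 90))
assert len(combos) == 1000 and len(set(combos)) == 1000
```

Because every combination appears exactly once, the number of connections is simply the product of the two port-range sizes (e.g., 40,000 source ports and 10 destination ports give 400,000 connections).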
> Could you rephrase this section: "Of course, it's known that the load for a stateful device should be flow based where many packets from both directions are simulating one session (many packets should look like 1 session)".
> Hardware-based testers like Spirent support it.

I feel that you expect to use a given number of sessions, and the packets should belong to those sessions. Am I right? We do exactly that. But first we establish the connections for all the required sessions in the preliminary test phase, and then we generate packets in the real test phase that belong to those sessions.

> It is just a question of how many packets are in one session and how many sessions.

Yes, it is a good question. There can be multiple approaches:

- If the tests are commissioned by a network operator, then the numbers should be tailored to the statistics of the network of the operator.
- If the tests are performed by an academic researcher (like me), then -- I think -- wide ranges should be examined: starting from a low realistic number up to the hardware limits, as I did in the above-mentioned draft, to provide a general, wide-angle picture of the implementation. :-)

What do you think?

> New session generation could be pseudorandom (primarily on the source port).

Yes, I agree, but RFC 4814 requires pseudorandomness for both SOURCE and DESTINATION port numbers:
https://www.rfc-editor.org/rfc/rfc4814#section-4.5
Please see my considerations in Section 2.3 of
http://www.hit.bme.hu/~lencse/publications/ECC-2022-SFNATxy-Tester-published.pdf

> Section 3: a little inconsistency:
> I guess that a "state table" should be created on the Initiator too (you mentioned only the Responder).

In fact, something like a "state table" is actually created in the Initiator of siitperf for performance considerations. The pseudorandom enumeration of all possible port number combinations happens before the preliminary test phase, and the combinations are stored in an array.
They are just read from there linearly during the preliminary test phase. (As the IP addresses are fixed, one can say that the "state table" is there.) But we do not need it after the preliminary test phase. The Initiator simply uses pseudorandom port numbers in the real test phase: as all possible combinations were enumerated in the preliminary test phase, no new combinations may occur.

> The "connection tracking table" is claimed to be "unknown for the Tester" because it is inside the "NATxy gateway". But then why has it been mentioned in "Initiator"? It is unknown, right?

Do you mean the following text?

   * Initiator: The port of the Tester that may initiate a connection
     through the stateful DUT in the client to server direction.
     Theoretically, it can use any source and destination port numbers
     from the ranges recommended by [RFC4814
     <https://datatracker.ietf.org/doc/html/rfc4814>]: if the used four
     tuple does not belong to an existing connection, the DUT will
     register a new connection into its connection tracking table.

The connection tracking table is mentioned as an explanation of what happens. Yes, the content of the connection tracking table is unknown to both the Initiator and the Responder.

> Section 4.1: if destination port numbers are so condensed around a few numbers in the real Internet, then why "from a few to several hundreds or thousands as needed" in the test?

YES, THIS IS A VERY MUCH VALID QUESTION!

My answer is historical: at the time of designing siitperf (the only stateful NAT64 / NAT44 Tester that supports the draft), I did not want to make too much work for myself, and I used a single fixed IP address pair. Thus, currently the only way to achieve hundreds of millions of connections with siitperf is to increase the destination port number range. Let us see some example numbers.
If we have 40,000 source port numbers, then:

- 10 destination port numbers result in 400,000 connections
- 100 destination port numbers result in 4M connections
- 1000 destination port numbers result in 40M connections

Regarding this design decision of using a single IP address pair, Section 4.4 of our draft currently says:

   1. A single source address and destination address pair is used for
      all tests.

We make this assumption for simplicity. Of course, we are aware that [RFC2544 <https://datatracker.ietf.org/doc/html/rfc2544>] requires testing also with 256 different destination networks. We have discussed it with my co-author, Keiichi Shima, and we believe that:

- On the one hand, the usage of multiple destination NETWORKS is not interesting here, because we do not do router testing.
- On the other hand, the usage of multiple IP ADDRESSES may be appropriate here, especially because of our experience with OpenBSD. Please see slides 11 and 12 of my IETF 115 BMWG presentation:
https://datatracker.ietf.org/meeting/115/materials/slides-115-bmwg-benchmarking-methodology-for-stateful-natxy-gateways-using-rfc-4814-pseudorandom-port-numbers

It was our question whether "*Shall we add the usage of multiple IP addresses as a requirement?*", and _we still have this question open._ Using multiple IP addresses may be useful not only to generate entropy for the hash function to support RSS, but also to generate a high number of network flows using a low(er) number of destination port numbers. What do you think?

However, then we must also convince the authors of RFC 4814 (and the BMWG members) that narrowing down the destination port number range is not a violation of the following requirement in Section 4.5 of RFC 4814:

   In addition, it may be desirable to pick pseudorandom values from a
   selected pool of numbers. Many services identify themselves through
   use of reserved destination port numbers between 1 and 49151
   inclusive.
   Unless specific port numbers are required, it is RECOMMENDED to pick
   randomly distributed destination port numbers between these lower
   and upper boundaries.

> It is probably not important to mention that one MAP subscriber has a very limited number of source ports, because the "Border Relay" (our DUT) would have very many subscribers.

Yes, when testing a MAP BR, it is fair to assume many MAP CEs. But the MAP BR is a stateless device, and thus it is out of scope for our draft. However, the MAP CE is stateful, so it may be in scope.

> Section 4.4: Why do we manipulate ports but keep only one IP address? Real scenarios manipulate IP addresses very much (tens of thousands of subscribers are possible for one NAT gateway). Is it possible to recommend at least source address randomization too? (Destination addresses would be very limited in the wild Internet.)

Please see my answers above: I am not against using multiple IP addresses. However, if the Initiator uses only multiple SOURCE IP addresses, then they will appear only as multiple source port numbers after the stateful NATxy gateway has translated the packet. Therefore, they will not be visible when the Responder generates traffic based on the received four tuples. If the Initiator uses multiple DESTINATION IP addresses, they will "survive" the stateful NATxy translation and will also appear in the traffic generated by the Responder.

>> Important warning: in normal (non-NAT) router testing, the port
>> number selection algorithm, whether it is pseudo-random or enumerated
>> in increasing (or decreasing) order does not affect final results.

> In reality, it affects the results. Depending on the router configuration, the router may use port numbers for hash-based load balancing. It would affect the load distribution over many links.

I think that the order should not matter if the hash function is good enough.

> Section 4.5:
>> In practice, we RECOMMEND the usage of binary search.
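A minimal sketch of such a binary search for the maximum lossless rate (an illustration only; `trial` is a hypothetical callback that performs one trial at the given frame rate and reports whether it was lossless):

```python
def max_lossless_rate(trial, r_min=0, r_max=1_000_000, precision=100):
    """Binary search for the throughput: the highest frame rate
    (in frames per second) at which the trial reports zero frame loss."""
    best = r_min
    while r_max - r_min > precision:
        rate = (r_min + r_max) // 2
        if trial(rate):          # lossless: continue in the upper half
            best = r_min = rate
        else:                    # frames lost: continue in the lower half
            r_max = rate
    return best

# Toy DUT model that starts losing frames above 345,000 fps:
rate = max_lossless_rate(lambda r: r <= 345_000)
assert 344_900 <= rate <= 345_000
```

The default bounds are placeholders; in practice the initial upper bound would be the maximum frame rate of the medium, which is exactly the starting point discussed below.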
> We could listen to the vendor's claims and choose the initial point more smartly.

I am not sure if I understand what you mean. Do you mean that, compared to using 0 and the maximum frame rate for the media as the initial lower bound and the initial upper bound, respectively, one can have a better starting point for the binary search?

> Section 4.5.1:
>> We RECOMMEND median as the summarizing function of the
>> results complemented with the first percentile and the 99th
>> percentile as indices of the dispersion of the results.
>> connections/s 1st perc. (req.)
>> connections/s 99th perc. (req.)
> When I was a student, I was told (around 1987) that the "standard deviation" is better to use for filtering test data.
> The theory is here: https://en.wikipedia.org/?title=3-sigma&redirect=no
> In the case of Microsoft Excel, it is something like this:
> Min=AVERAGE(Array)-STDEV(Array)
> Max=AVERAGE(Array)+STDEV(Array)
> Look here: https://www.ablebits.com/office-addins-blog/calculate-standard-deviation-excel/

Yes, they are very good if the results follow a NORMAL DISTRIBUTION. However, I believe that we may not assume a normal distribution here, because, according to my experience, the distribution of the results of the multiple experiments is not even symmetric. Please consider that, due to some unexpected event, some frames may be lost, and thus there are outliers among the results towards zero. However, there should not be significant outliers in the other direction in a well-working system. (There is no "accidental" high performance.) We chose the median as the summarizing function because it is less sensitive to outliers than the average.

Section 7.2 of RFC 8219 redefines the Latency measurement of RFC 2544 to provide better quality results, and it recommends:

   To account for the variation, the 1st and 99th percentiles of the 20
   iterations MAY be reported in two separated columns.

We followed that approach.
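To illustrate why the median with the 1st and 99th percentiles is preferred here over mean and standard deviation, consider a result set with one loss-induced outlier toward zero (a hypothetical sketch using Python's standard library):

```python
import statistics

def summarize(results):
    """Summarize a series of benchmark results: median plus the 1st and
    99th percentiles as indices of dispersion."""
    q = statistics.quantiles(results, n=100, method='inclusive')
    return statistics.median(results), q[0], q[-1]

# 19 consistent throughput results (Mfps) and one outlier toward zero:
runs = [5.3] * 19 + [0.6]
median, p01, p99 = summarize(runs)
assert median == 5.3                  # the median is robust to the outlier
assert statistics.mean(runs) < 5.1    # the mean is dragged down by it
assert p01 < median <= p99            # the percentiles expose the dispersion
```

With an asymmetric distribution like this, mean minus one standard deviation would misrepresent the typical performance, while the median stays at the value the system actually delivers.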
> Section 4.6
>> * The Initiator counts the received frames, and if all N frames
>> have arrived, then the R frame rate of the maximum connection
>> establishment rate measurement (performed in the preliminary test
>> phase) is raised for the next iteration, otherwise lowered (as
>> well as in the case if test frames were missing in the preliminary
>> test phase).
> I do not remember such a capability on hardware testers (it was many years ago). But hardware testers are mandatory to test an engine that is close to 100Gbps.
> We need somebody from Spirent to judge: is it possible?

It can be easily implemented. It is just that all the elements of the state table of the Responder are to be used for packet generation. The "cheapest" solution is to use them in linear order. (It can be done with siitperf.) A more sophisticated implementation could use them in pseudorandom order. (I have not yet implemented that, but it is only a matter of time and work.)

> Section 4.8: How are you going to delete the connection table of a hardware device? The CLI may return the prompt even before the job is finished. The platform may be asynchronous.
> Moreover, because of the:
>> We are aware that the performance of removing the entire content of
>> the connection tracking table at one time may be different from
>> removing all the entries one by one.
> the value of such a test is very questionable.

Yes, I share all your concerns. However, up to now, I have received consistent results when I used different numbers of connections. You can find them in Table 5 and Table 10 of my before-mentioned draft for iptables and Jool. Since then, I have tested it also for OpenBSD PF.

> Alternatively, it is possible to establish connections very fast (at the same rate that was tested as a maximum) and then continue to send traffic over the first and the last sessions. Sessions would expire according to their creation time. It would be possible to monitor that the tear down time is not worse than the connection establishment time.
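Returning to the Responder's packet generation discussed above: selecting four tuples from the state table in linear or pseudorandom order could be sketched like this (an illustration only, not siitperf's code; the state table entries stand for the four tuples learned in the preliminary test phase):

```python
import random

def responder_tuples(state_table, count, pseudorandom=False, seed=1):
    """Pick `count` entries for the Responder's test frames from the
    state table: either cycling through it linearly (the "cheapest"
    solution) or drawing entries in pseudorandom order."""
    if pseudorandom:
        rng = random.Random(seed)
        return [state_table[rng.randrange(len(state_table))]
                for _ in range(count)]
    return [state_table[i % len(state_table)] for i in range(count)]

table = [(10000 + i, 80) for i in range(4)]     # a tiny state table
assert responder_tuples(table, 6) == [(10000, 80), (10001, 80),
                                      (10002, 80), (10003, 80),
                                      (10000, 80), (10001, 80)]
```

Either way, every generated frame belongs to a connection that already exists in the DUT's connection tracking table, which is exactly what the real test phase requires.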
> I worked for a vendor selling NAT/FW for a decade. I never discussed tear down performance with customers/partners/vendor R&D. I propose deleting it completely.

Well, connection tear down performance was also overlooked in RFC 8219, of which I am a co-author. However, after my experience with iptables, I cannot overlook it any more, because its connection tear down performance is much lower than its connection establishment performance (as I mentioned above). Note: their proportion may depend on the ratio of the "hashsize" and "nf_conntrack_max" parameters.

> Section 4.9:
>> DUT was exhausted and it stopped responding
>> if the DUT collapses
> It should be qualified as a test failure.

I definitely agree with it. In theory, it is easy to do so. But in practice, we need some mechanism to detect it, reboot the server, wait until it is ready, and continue the binary search with the lower half of the interval.

> It is a severe bug when the device is "not responding". The device should drop packets after any resources are exhausted but remain available for management and monitoring.

I agree with you that it should be so. But unfortunately, I experienced the opposite.

Once again, thank you very much for all your work reading and commenting on our draft!!!

Best regards,

Gábor

> Eduard
> -----Original Message-----
> From: bmwg [mailto:bmwg-bounces@ietf.org] On Behalf Of internet-drafts@ietf.org
> Sent: Thursday, October 20, 2022 5:14 AM
> To: i-d-announce@ietf.org
> Cc: bmwg@ietf.org
> Subject: [bmwg] I-D Action: draft-ietf-bmwg-benchmarking-stateful-01.txt
>
>
> A New Internet-Draft is available from the on-line Internet-Drafts directories.
> This draft is a work item of the Benchmarking Methodology WG of the IETF.
>
>         Title           : Benchmarking Methodology for Stateful NATxy Gateways using RFC 4814 Pseudorandom Port Numbers
>         Authors         : Gabor Lencse
>                           Keiichi Shima
>         Filename        : draft-ietf-bmwg-benchmarking-stateful-01.txt
>         Pages           : 25
>         Date            : 2022-10-19
>
> Abstract:
>    RFC 2544 has defined a benchmarking methodology for network
>    interconnect devices. RFC 5180 addressed IPv6 specificities and it
>    also provided a technology update, but excluded IPv6 transition
>    technologies. RFC 8219 addressed IPv6 transition technologies,
>    including stateful NAT64. However, none of them discussed how to
>    apply RFC 4814 pseudorandom port numbers to any stateful NATxy
>    (NAT44, NAT64, NAT66) technologies. We discuss why using
>    pseudorandom port numbers with stateful NATxy gateways is a
>    difficult problem. We recommend a solution limiting the port number
>    ranges and using two phases: the preliminary test phase and the
>    real test phase. We show how the classic performance measurement
>    procedures (e.g. throughput, frame loss rate, latency, etc.) can be
>    carried out. We also define new performance metrics and measurement
>    procedures for maximum connection establishment rate, connection
>    tear down rate and connection tracking table capacity measurements.
>
> The IETF datatracker status page for this draft is:
> https://datatracker.ietf.org/doc/draft-ietf-bmwg-benchmarking-stateful/
>
> There is also an htmlized version available at:
> https://datatracker.ietf.org/doc/html/draft-ietf-bmwg-benchmarking-stateful-01
>
> A diff from the previous version is available at:
> https://www.ietf.org/rfcdiff?url2=draft-ietf-bmwg-benchmarking-stateful-01
>
> Internet-Drafts are also available by rsync at rsync.ietf.org::internet-drafts
>
> _______________________________________________
> bmwg mailing list
> bmwg@ietf.org
> https://www.ietf.org/mailman/listinfo/bmwg