[Iot-directorate] [iotdir] telechat Review for draft-ietf-bmwg-ngfw-performance-13

Date: Sun, 30 Jan 2022 13:16:29 +0100
From: tte@cs.fau.de
To: iot-directorate@ietf.org, evyncke@cisco.com
Cc: draft-ietf-bmwg-ngfw-performance.all@ietf.org, mariainesrobles@googlemail.com, bmwg@ietf.org

Reviewer: Toerless Eckert
Review result: On the right track

Summary:
Thanks a lot for this work. It's an immensely complex and important problem to
tackle. In my time I have only measured router traffic performance, and that was
already an infinite matrix. This looks to me like a problem some order of infinity bigger.

Meaning: however nitpicky my review's feedback about the document may be,
I think that the document in its existing form is already a great advancement in measuring
performance for these security devices, and when in doubt it should be progressed faster rather
than slower, especially because in my (limited) understanding of the market, many security device
vendors will only provide actual feedback once it is an RFC (that community is, I think, overall
more conservative in adopting IETF work, with most not proactively engaging during the draft stage).

But of course: feel free to improve the document with any of the feedback/suggestions
in my review that you feel are useful.

Maybe at a high level, I would most importantly suggest adding more explanations, especially in
an appropriate section about those aspects known NOT to be considered (but potentially
important), so that the applicability of the tests described is better put into
perspective by adopters of the draft for their real-world situations.

Favorite pet topic: Add a requirement to measure the DUT through a power meter and report its
power consumption, so we can start making sure products with lower power consumption see sales
benefits when numbers from this document are reported (see details inline).
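
To make this concrete, a strawman sketch of such an efficiency KPI (Python; the
function name and the numbers are purely illustrative, not from the draft):

    def mbps_per_watt(throughput_mbps: float, avg_power_watts: float) -> float:
        # Hypothetical KPI: measured throughput per watt of power drawn,
        # averaged over the sustain phase of a benchmark run.
        return throughput_mbps / avg_power_watts

    print(mbps_per_watt(10_000, 250))  # 40.0 Mbit/s per watt for a 10 Gbit/s DUT at 250 W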

Formal:
I chose to keep the whole document inline to make it easier for readers to vet
my comments without having to open a copy of the whole document in parallel.

The rest is inline - the email ends with the string EOF (I have seen some email truncation happen).

Thanks!
    Toerless

---
Please fix the following nits - from https://www.ietf.org/tools/idnits
idnits 2.17.00 (12 Aug 2021)

> /tmp/idnits29639/draft-ietf-bmwg-ngfw-performance-13.txt:
> ... 
> 
>   Checking nits according to https://www.ietf.org/id-info/checklist :
>   ----------------------------------------------------------------------------
> 
>   ** The abstract seems to contain references ([RFC3511]), which it
>      shouldn't.  Please replace those with straight textual mentions of the
>      documents in question.
> 
>   == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses
>      in the document.  If these are example addresses, they should be changed.
> 
>   == There are 1 instance of lines with non-RFC3849-compliant IPv6 addresses
>      in the document.  If these are example addresses, they should be changed.
> 
>   -- The draft header indicates that this document obsoletes RFC3511, but the
>      abstract doesn't seem to directly say this.  It does mention RFC3511
>      though, so this could be OK.
> 
> 
>   Miscellaneous warnings:
>   ----------------------------------------------------------------------------
> 
>   == The document seems to lack the recommended RFC 2119 boilerplate, even if
>      it appears to use RFC 2119 keywords. 
> 
>      (The document does seem to have the reference to RFC 2119 which the
>      ID-Checklist requires).
> 
> 

The lines in the following commented copy of the document are from idnits too

When a comment/question is preceded by "Nit:", it indicates that it seems to me
the best answer would be modified draft text.

When a comment/question is preceded by "Q:", I am actually not so sure what the
outcome could be, so an answer by mail would be a start.

2	Benchmarking Methodology Working Group                      B. Balarajah
3	Internet-Draft
4	Obsoletes: 3511 (if approved)                            C. Rossenhoevel
5	Intended status: Informational                                  EANTC AG
6	Expires: 16 July 2022                                         B. Monkman
7	                                                              NetSecOPEN
8	                                                            January 2022

10	    Benchmarking Methodology for Network Security Device Performance
11	                  draft-ietf-bmwg-ngfw-performance-13

13	Abstract

15	   This document provides benchmarking terminology and methodology for
16	   next-generation network security devices including next-generation
17	   firewalls (NGFW), next-generation intrusion prevention systems
18	   (NGIPS), and unified threat management (UTM) implementations.  The

Nit: Why does it have to be next-generation for all example device types
except UTMs, and what does next-generation mean?
I would suggest rewriting the text so the reader does not ask herself these
questions.

18	   (NGIPS), and unified threat management (UTM) implementations.  The
19	   main areas covered in this document are test terminology, test
20	   configuration parameters, and benchmarking methodology for NGFW and
21	   NGIPS.  This document aims to improve the applicability,

I don't live and breathe the security device TLA space, but I start to
suspect a UTM is some platform on which FW and IPS could run as software
modules, and because it's only software you assume the UTM does not have
to be next-gen? I wonder how much of this guesswork/thought process you
want the reader to have, or whether you want to avoid that by being somewhat
clearer...

21	   NGIPS.  This document aims to improve the applicability,
22	   reproducibility, and transparency of benchmarks and to align the test
23	   methodology with today's increasingly complex layer 7 security
24	   centric network application use cases.  As a result, this document
25	   makes [RFC3511] obsolete.

[minor] I kind of wonder if / how obsoleting RFC3511 could/should work.
I understand it when we do a bis of a standard protocol and really don't
want anyone to implement the older version. But unless there is a
similar IETF mandate going along with this draft that says
non-NG FW and non-NG IPS are hereby obsoleted by the IETF, I cannot
see how this draft can obsolete RFC3511, because it simply applies
to a different type of benchmarked entity. And RFC3511 would stay
on forever for whatever we call non-NG.

[minor] At least I think that is the case, unless this document actually does apply
to non-NG FW/IPS as well and can therefore supersede RFC3511 and actually obsolete it. But the
text so far says the opposite.

[major] I observe that RFC3511 asks to measure and report goodput (5.6.5.2), whereas this document
does not mention the term, and the loss in performance of client/server TCP
or QUIC connections through DUT behavior (such as proxying) is at best
covered indirectly by mentioning parameters such as a less-than-5% reduction in
throughput. If this document is superseding RFC3511, I think it should have a very
explicit section discussing goodput - and maybe expanding on it.

Consider, for example, the impact on TCP connection throughput and goodput.
Very likely a DUT proxying TCP connections will have quite a different performance/goodput
impact for a classical web page vs. video streaming. Therefore I am also worried
about sending only average bitrates per session, as opposed to some sessions going
up to e.g. 500Mbps for a video streaming connection (the best commercially available
UHD video streaming today). Those types of sessions might incur a lot of goodput loss
with bad DUTs, but if I understand the test profiles correctly, their per-TCP-connection
throughput will be much less than 100Mbps. If such a range of client session
bitrates is not meant to be tested, it might at least be useful to add a section listing
candidate gaps like this. Another one, for example, is the impact of higher RTT, especially
between DUT and server in the Internet. This mostly challenges TCP window size
operation on DUTs operating as TCP hosts, and also their ability to buffer for retransmissions.
Test equipment IMHO may/should be able to emulate such long RTTs, but this is not included
in this document (RTT is not mentioned).

Besides goodput-related issues, there are a couple of other points in this review that may be too
difficult to fix this late in the development of the document; for any of those
considered to be useful input, maybe add them to a section "out-of-scope (for future versions)
considerations" or the like to capture them.

27	Status of This Memo

29	   This Internet-Draft is submitted in full conformance with the
30	   provisions of BCP 78 and BCP 79.

32	   Internet-Drafts are working documents of the Internet Engineering
33	   Task Force (IETF).  Note that other groups may also distribute
34	   working documents as Internet-Drafts.  The list of current Internet-
35	   Drafts is at https://datatracker.ietf.org/drafts/current/.

37	   Internet-Drafts are draft documents valid for a maximum of six months
38	   and may be updated, replaced, or obsoleted by other documents at any
39	   time.  It is inappropriate to use Internet-Drafts as reference
40	   material or to cite them other than as "work in progress."

42	   This Internet-Draft will expire on 5 July 2022.

44	Copyright Notice

46	   Copyright (c) 2022 IETF Trust and the persons identified as the
47	   document authors.  All rights reserved.

49	   This document is subject to BCP 78 and the IETF Trust's Legal
50	   Provisions Relating to IETF Documents (https://trustee.ietf.org/
51	   license-info) in effect on the date of publication of this document.
52	   Please review these documents carefully, as they describe your rights
53	   and restrictions with respect to this document.  Code Components
54	   extracted from this document must include Revised BSD License text as
55	   described in Section 4.e of the Trust Legal Provisions and are
56	   provided without warranty as described in the Revised BSD License.

58	Table of Contents

60	   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   4
61	   2.  Requirements  . . . . . . . . . . . . . . . . . . . . . . . .   4
62	   3.  Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . .   4
63	   4.  Test Setup  . . . . . . . . . . . . . . . . . . . . . . . . .   4
64	     4.1.  Testbed Configuration . . . . . . . . . . . . . . . . . .   5
65	     4.2.  DUT/SUT Configuration . . . . . . . . . . . . . . . . . .   6
66	       4.2.1.  Security Effectiveness Configuration  . . . . . . . .  12
67	     4.3.  Test Equipment Configuration  . . . . . . . . . . . . . .  12
68	       4.3.1.  Client Configuration  . . . . . . . . . . . . . . . .  12
69	       4.3.2.  Backend Server Configuration  . . . . . . . . . . . .  15
70	       4.3.3.  Traffic Flow Definition . . . . . . . . . . . . . . .  17
71	       4.3.4.  Traffic Load Profile  . . . . . . . . . . . . . . . .  17
72	   5.  Testbed Considerations  . . . . . . . . . . . . . . . . . . .  18
73	   6.  Reporting . . . . . . . . . . . . . . . . . . . . . . . . . .  19
74	     6.1.  Introduction  . . . . . . . . . . . . . . . . . . . . . .  19
75	     6.2.  Detailed Test Results . . . . . . . . . . . . . . . . . .  21
76	     6.3.  Benchmarks and Key Performance Indicators . . . . . . . .  21
77	   7.  Benchmarking Tests  . . . . . . . . . . . . . . . . . . . . .  23
78	     7.1.  Throughput Performance with Application Traffic Mix . . .  23
79	       7.1.1.  Objective . . . . . . . . . . . . . . . . . . . . . .  23
80	       7.1.2.  Test Setup  . . . . . . . . . . . . . . . . . . . . .  23
81	       7.1.3.  Test Parameters . . . . . . . . . . . . . . . . . . .  23
82	       7.1.4.  Test Procedures and Expected Results  . . . . . . . .  25
83	     7.2.  TCP/HTTP Connections Per Second . . . . . . . . . . . . .  26
84	       7.2.1.  Objective . . . . . . . . . . . . . . . . . . . . . .  26
85	       7.2.2.  Test Setup  . . . . . . . . . . . . . . . . . . . . .  27
86	       7.2.3.  Test Parameters . . . . . . . . . . . . . . . . . . .  27
87	       7.2.4.  Test Procedures and Expected Results  . . . . . . . .  28
88	     7.3.  HTTP Throughput . . . . . . . . . . . . . . . . . . . . .  30
89	       7.3.1.  Objective . . . . . . . . . . . . . . . . . . . . . .  30
90	       7.3.2.  Test Setup  . . . . . . . . . . . . . . . . . . . . .  30
91	       7.3.3.  Test Parameters . . . . . . . . . . . . . . . . . . .  30
92	       7.3.4.  Test Procedures and Expected Results  . . . . . . . .  32
93	     7.4.  HTTP Transaction Latency  . . . . . . . . . . . . . . . .  33
94	       7.4.1.  Objective . . . . . . . . . . . . . . . . . . . . . .  33
95	       7.4.2.  Test Setup  . . . . . . . . . . . . . . . . . . . . .  33
96	       7.4.3.  Test Parameters . . . . . . . . . . . . . . . . . . .  34
97	       7.4.4.  Test Procedures and Expected Results  . . . . . . . .  35
98	     7.5.  Concurrent TCP/HTTP Connection Capacity . . . . . . . . .  36
99	       7.5.1.  Objective . . . . . . . . . . . . . . . . . . . . . .  36
100	       7.5.2.  Test Setup  . . . . . . . . . . . . . . . . . . . . .  36
101	       7.5.3.  Test Parameters . . . . . . . . . . . . . . . . . . .  37
102	       7.5.4.  Test Procedures and Expected Results  . . . . . . . .  38
103	     7.6.  TCP/HTTPS Connections per Second  . . . . . . . . . . . .  39
104	       7.6.1.  Objective . . . . . . . . . . . . . . . . . . . . . .  40
105	       7.6.2.  Test Setup  . . . . . . . . . . . . . . . . . . . . .  40
106	       7.6.3.  Test Parameters . . . . . . . . . . . . . . . . . . .  40
107	       7.6.4.  Test Procedures and Expected Results  . . . . . . . .  42
108	     7.7.  HTTPS Throughput  . . . . . . . . . . . . . . . . . . . .  43
109	       7.7.1.  Objective . . . . . . . . . . . . . . . . . . . . . .  43
110	       7.7.2.  Test Setup  . . . . . . . . . . . . . . . . . . . . .  43
111	       7.7.3.  Test Parameters . . . . . . . . . . . . . . . . . . .  43
112	       7.7.4.  Test Procedures and Expected Results  . . . . . . . .  45
113	     7.8.  HTTPS Transaction Latency . . . . . . . . . . . . . . . .  46
114	       7.8.1.  Objective . . . . . . . . . . . . . . . . . . . . . .  46
115	       7.8.2.  Test Setup  . . . . . . . . . . . . . . . . . . . . .  46
116	       7.8.3.  Test Parameters . . . . . . . . . . . . . . . . . . .  46
117	       7.8.4.  Test Procedures and Expected Results  . . . . . . . .  48
118	     7.9.  Concurrent TCP/HTTPS Connection Capacity  . . . . . . . .  49
119	       7.9.1.  Objective . . . . . . . . . . . . . . . . . . . . . .  49
120	       7.9.2.  Test Setup  . . . . . . . . . . . . . . . . . . . . .  49
121	       7.9.3.  Test Parameters . . . . . . . . . . . . . . . . . . .  49
122	       7.9.4.  Test Procedures and Expected Results  . . . . . . . .  51
123	   8.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  52
124	   9.  Security Considerations . . . . . . . . . . . . . . . . . . .  53
125	   10. Contributors  . . . . . . . . . . . . . . . . . . . . . . . .  53
126	   11. Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  53
127	   12. References  . . . . . . . . . . . . . . . . . . . . . . . . .  53
128	     12.1.  Normative References . . . . . . . . . . . . . . . . . .  53
129	     12.2.  Informative References . . . . . . . . . . . . . . . . .  53
130	   Appendix A.  Test Methodology - Security Effectiveness
131	           Evaluation  . . . . . . . . . . . . . . . . . . . . . . .  54
132	     A.1.  Test Objective  . . . . . . . . . . . . . . . . . . . . .  55
133	     A.2.  Testbed Setup . . . . . . . . . . . . . . . . . . . . . .  55
134	     A.3.  Test Parameters . . . . . . . . . . . . . . . . . . . . .  55
135	       A.3.1.  DUT/SUT Configuration Parameters  . . . . . . . . . .  55
136	       A.3.2.  Test Equipment Configuration Parameters . . . . . . .  55
137	     A.4.  Test Results Validation Criteria  . . . . . . . . . . . .  56
138	     A.5.  Measurement . . . . . . . . . . . . . . . . . . . . . . .  56
139	     A.6.  Test Procedures and Expected Results  . . . . . . . . . .  57
140	       A.6.1.  Step 1: Background Traffic  . . . . . . . . . . . . .  57
141	       A.6.2.  Step 2: CVE Emulation . . . . . . . . . . . . . . . .  58
142	   Appendix B.  DUT/SUT Classification . . . . . . . . . . . . . . .  58
143	   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  58

145	1.  Introduction

147	   18 years have passed since IETF recommended test methodology and
148	   terminology for firewalls initially ([RFC3511]).  The requirements
149	   for network security element performance and effectiveness have
150	   increased tremendously since then.  In the eighteen years since

[nit] What is a network security element? Please provide a reference or define the term.
If we are talking about them in this document, why are they not mentioned in the
abstract?

150	   increased tremendously since then.  In the eighteen years since
151	   [RFC3511] was published, recommending test methodology and
152	   terminology for firewalls, requirements and expectations for network
153	   security elements has increased tremendously.  Security function

[nit] This does not parse as correct English to me: "recommending test methodology ...
has increased tremendously". It would if you meant that more and more
test methodologies were recommended, but not if there is an outstanding
need to do so (which this document intends to fill).

[nit] Why does the recommending part apply only to firewalls, and the requirements
and expectations only to security elements?

153	   security elements has increased tremendously.  Security function

[nit] What is a security function? (I know, but I don't know if the reader is
supposed to know.) Aka: provide a reference, add a terminology section, or define it.
Maybe it is easiest to restructure this intro paragraph to start with the
explanation of the evolution from firewalls to network security elements
which support one or more security functions including firewall, intrusion
detection, etc. - and then conclude how this requires this document to
define all the good BMWG stuff it hopefully does.

Although a terminology section is never a bad thing either ;-)

154	   implementations have evolved to more advanced areas and have
155	   diversified into intrusion detection and prevention, threat
156	   management, analysis of encrypted traffic, etc.  In an industry of
157	   growing importance, well-defined, and reproducible key performance
158	   indicators (KPIs) are increasingly needed to enable fair and
159	   reasonable comparison of network security functions.  All these

[nit] Maybe add what to compare - performance, functionality, scale,
flexibility, adjustability - or, if you knowingly discuss only a subset
of these aspects, then maybe still list all the aspects you are aware
of as being of interest to likely readers of this document, and summarize
those that you will and those that you won't cover in this document, so
that readers don't have to keep reading the document hoping to find them described.

160	   reasons have led to the creation of a new next-generation network
161	   security device benchmarking document, which makes [RFC3511]
162	   obsolete.

[nit] As mentioned above, whether or not the "obsoletes" is true is
not yet clear to me.

164	2.  Requirements

166	   The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
167	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
168	   "OPTIONAL" in this document are to be interpreted as described in BCP
169	   14 [RFC2119], [RFC8174] when, and only when, they appear in all
170	   capitals, as shown here.

172	3.  Scope

174	   This document provides testing terminology and testing methodology
175	   for modern and next-generation network security devices that are
176	   configured in Active ("Inline", see Figure 1 and Figure 2) mode.  It

[nit] The word Active does not appear again in the document; instead, the
description on line 261 defines Inline mode as "active", which in my book
makes 176+261 a perfectly circular definition. I would suggest a
terminology section that defines "Inline", for example by also describing the
most likely alternative mode.

177	   covers the validation of security effectiveness configurations of

[nit] security configuration effectiveness?

179	   network security devices, followed by performance benchmark testing.
179	   This document focuses on advanced, realistic, and reproducible
180	   testing methods.  Additionally, it describes testbed environments,

[nit] Are you sure advanced and realistic are meant to characterize the
testing method, rather than the scenario that is being tested? "Reproducible
testing methods for advanced real-world scenarios"?

181	   test tool requirements, and test result formats.

183	4.  Test Setup

185	   Test setup defined in this document applies to all benchmarking tests

[nit] "/Test setup defined/The test setup defined/

186	   described in Section 7.  The test setup MUST be contained within an
187	   Isolated Test Environment (see Section 3 of [RFC6815]).

189	4.1.  Testbed Configuration

191	   Testbed configuration MUST ensure that any performance implications
192	   that are discovered during the benchmark testing aren't due to the

[nit] /aren't/are not/

193	   inherent physical network limitations such as the number of physical
194	   links and forwarding performance capabilities (throughput and
195	   latency) of the network devices in the testbed.  For this reason,
196	   this document recommends avoiding external devices such as switches
197	   and routers in the testbed wherever possible.

199	   In some deployment scenarios, the network security devices (Device
200	   Under Test/System Under Test) are connected to routers and switches,
201	   which will reduce the number of entries in MAC or ARP tables of the
202	   Device Under Test/System Under Test (DUT/SUT).  If MAC or ARP tables
203	   have many entries, this may impact the actual DUT/SUT performance due
204	   to MAC and ARP/ND (Neighbor Discovery) table lookup processes.  This
205	   document also recommends using test equipment with the capability of

[nit] /also/therefore/

206	   emulating layer 3 routing functionality instead of adding external
207	   routers in the testbed.

209	   The testbed setup Option 1 (Figure 1) is the RECOMMENDED testbed
210	   setup for the benchmarking test.

212	   +-----------------------+                   +-----------------------+
213	   | +-------------------+ |   +-----------+   | +-------------------+ |
214	   | | Emulated Router(s)| |   |           |   | | Emulated Router(s)| |
215	   | |    (Optional)     | +----- DUT/SUT  +-----+    (Optional)     | |
216	   | +-------------------+ |   |           |   | +-------------------+ |
217	   | +-------------------+ |   +-----------+   | +-------------------+ |
218	   | |     Clients       | |                   | |      Servers      | |
219	   | +-------------------+ |                   | +-------------------+ |
220	   |                       |                   |                       |
221	   |   Test Equipment      |                   |   Test Equipment      |
222	   +-----------------------+                   +-----------------------+

224	                     Figure 1: Testbed Setup - Option 1

226	   If the test equipment used is not capable of emulating layer 3
227	   routing functionality or if the number of used ports is mismatched
228	   between test equipment and the DUT/SUT (need for test equipment port
229	   aggregation), the test setup can be configured as shown in Figure 2.

231	    +-------------------+      +-----------+      +--------------------+
232	    |Aggregation Switch/|      |           |      | Aggregation Switch/|
233	    | Router            +------+  DUT/SUT  +------+ Router             |
234	    |                   |      |           |      |                    |
235	    +----------+--------+      +-----------+      +--------+-----------+
236	               |                                           |
237	               |                                           |
238	   +-----------+-----------+                   +-----------+-----------+
239	   |                       |                   |                       |
240	   | +-------------------+ |                   | +-------------------+ |
241	   | | Emulated Router(s)| |                   | | Emulated Router(s)| |
242	   | |     (Optional)    | |                   | |     (Optional)    | |
243	   | +-------------------+ |                   | +-------------------+ |
244	   | +-------------------+ |                   | +-------------------+ |
245	   | |      Clients      | |                   | |      Servers      | |
246	   | +-------------------+ |                   | +-------------------+ |
247	   |                       |                   |                       |
248	   |    Test Equipment     |                   |    Test Equipment     |
249	   +-----------------------+                   +-----------------------+

251	                     Figure 2: Testbed Setup - Option 2

[nit] Please elaborate on the "number of used ports", and if possible show it in
Figure 2 by drawing multiple links. I guess that in a common case the test
equipment might provide few but fast ports, whereas the DUT/SUT might provide
more, slower ports, and one would then use external switches as port multiplexers?
Or vice versa? But if such adaptation is performed, I wonder how different
setups might impact the measurements. For example, let's say the Test Equipment
(TE) has a 100Gbps port and the DUT has 4 * 10Gbps ports, so you need on each
side a switch with one 100Gbps and 2 * 10Gbps ports. Would you try to use VLANs into the
TE, or would you just build a single LAN? Any recommendations for the switch
config, and why?

[major] The fact that the left side says only clients and the right side says only
servers is worth some more discussion, especially because the filtering in
Figure 3 also makes me wonder in which direction traffic is meant to be filtered/inspected.
Are you considering the case where clients are responders to (TCP/QUIC/UDP) connections?
For example, the left side is "inside", the DUT is a site firewall to the Internet (right side),
and there is some server on the left side (e.g. SMTP). How about having on the right
an Internet interface and a separate site DMZ interface, and then of course
traffic not only between left and right, but also between those interfaces on the right?

More broadly applicable: dynamic port discovery for ICE/STUN, where you want to permit
inside-to-outside connections (to the STUN server) in order to permit new connections from other
external nodes to go back inside. E.g.: it would be good to have some elaboration about the
type of connections covered by this document. If it's only initiators on the left and
responders on the right, that is fine, but it should be said so, maybe pointing to
the above cases (DMZ, inside servers, STUN/ICE) as not covered by this document.

253	4.2.  DUT/SUT Configuration

255	   A unique DUT/SUT configuration MUST be used for all benchmarking
256	   tests described in Section 7.  Since each DUT/SUT will have its own
257	   unique configuration, users SHOULD configure their device with the
258	   same parameters and security features that would be used in the
259	   actual deployment of the device or a typical deployment in order to
260	   achieve maximum network security coverage.  The DUT/SUT MUST be

[nit] What is a "unique configuration" ? It could be different configurations
across two different DUT but both achieving the same service/filtering, just
difference in syntax, or it could be difference in functional outcome. Would
be good to be more precise what is meant.

[nit] Why would a user choose an actual deployment vs. a typical deployment ?
I am imagining that a user would choose an actual deployment to measure performance
specifically for that deployment but a typical deployment when the DUT would
need to be deployed in different setups but not each of those can be measured
individually, or because the results are meant to be comparable with other
users who may have taken performance numbers. WOuld be good to elaborate a bit
more so readers have a clearer understanding what "actual deployment" and
"typical deployment" means and how/why to pick one over the other.

[nit] I do not understand how the text up to "in order to" justifies that it will
achieve the maximum network security coverage. I also do not know what
"maximum network security coverage" means. If there is a definition, please
provide it. Else introduce it.

260	   achieve maximum network security coverage.  The DUT/SUT MUST be
261	   configured in "Inline" mode so that the traffic is actively inspected
262	   by the DUT/SUT.  Also "Fail-Open" behavior MUST be disabled on the
263	   DUT/SUT.

265	   Table 1 and Table 2 below describe the RECOMMENDED and OPTIONAL sets
266	   of network security feature list for NGFW and NGIPS respectively.
267	   The selected security features SHOULD be consistently enabled on the
268	   DUT/SUT for all benchmarking tests described in Section 7.

270	   To improve repeatability, a summary of the DUT/SUT configuration
271	   including a description of all enabled DUT/SUT features MUST be
272	   published with the benchmarking results.

274	          +============================+=============+==========+
275	          | DUT/SUT (NGFW) Features    | RECOMMENDED | OPTIONAL |
276	          +============================+=============+==========+
277	          | SSL Inspection             |      x      |          |
278	          +----------------------------+-------------+----------+
279	          | IDS/IPS                    |      x      |          |
280	          +----------------------------+-------------+----------+
281	          | Anti-Spyware               |      x      |          |
282	          +----------------------------+-------------+----------+
283	          | Anti-Virus                 |      x      |          |
284	          +----------------------------+-------------+----------+
285	          | Anti-Botnet                |      x      |          |
286	          +----------------------------+-------------+----------+
287	          | Web Filtering              |             |    x     |
288	          +----------------------------+-------------+----------+
289	          | Data Loss Protection (DLP) |             |    x     |
290	          +----------------------------+-------------+----------+
291	          | DDoS                       |             |    x     |
292	          +----------------------------+-------------+----------+
293	          | Certificate Validation     |             |    x     |
294	          +----------------------------+-------------+----------+

[major] This may be bogus because I don't know well enough how, for the purposes
of this document, security devices are expected to inspect HTTP connections
from client to server. Maybe it is a sane approach where the security device
operates as a client-trusted HTTPS proxy, maybe it's one of the more hacky approaches
(faked server certs). But however it works, I think a security device cannot
get away without validating the certificate of the server in a connection. Else
it shouldn't be called a security DUT.

294	          +----------------------------+-------------+----------+
295	          | Logging and Reporting      |      x      |          |
296	          +----------------------------+-------------+----------+
297	          | Application Identification |      x      |          |
298	          +----------------------------+-------------+----------+

300	                      Table 1: NGFW Security Features

[nit] Why are "Web Filtering"..."Certificate Validation" only MAY ?
Please point to a place in the document (or elsewhere) that rationales
the SHOULD/MAY recommendations. Same applies to Table 2.

[nit] 

302	          +============================+=============+==========+
303	          | DUT/SUT (NGIPS) Features   | RECOMMENDED | OPTIONAL |
304	          +============================+=============+==========+
305	          | SSL Inspection             |      x      |          |
306	          +----------------------------+-------------+----------+
307	          | Anti-Malware               |      x      |          |
308	          +----------------------------+-------------+----------+
309	          | Anti-Spyware               |      x      |          |
310	          +----------------------------+-------------+----------+
311	          | Anti-Botnet                |      x      |          |
312	          +----------------------------+-------------+----------+
313	          | Logging and Reporting      |      x      |          |
314	          +----------------------------+-------------+----------+
315	          | Application Identification |      x      |          |
316	          +----------------------------+-------------+----------+
317	          | Deep Packet Inspection     |      x      |          |
318	          +----------------------------+-------------+----------+
319	          | Anti-Evasion               |      x      |          |
320	          +----------------------------+-------------+----------+

322	                      Table 2: NGIPS Security Features

[nit] I ended up scrolling up and down to compare the tables.
It might be useful for other readers like me to merge the tables,
aka put the columns for NGFW and NGIPS into one table.

[nit] Please start with Table 3, as it introduces the security features;
otherwise the two tables above introduce a lot of features without defining them.


324	   The following table provides a brief description of the security
325	   features.

327	    +================+================================================+
328	    | DUT/SUT        | Description                                    |
329	    | Features       |                                                |
330	    +================+================================================+
331	    | SSL Inspection | DUT/SUT intercepts and decrypts inbound HTTPS  |
332	    |                | traffic between servers and clients.  Once the |
333	    |                | content inspection has been completed, DUT/SUT |
334	    |                | encrypts the HTTPS traffic with ciphers and    |
335	    |                | keys used by the clients and servers.          |
336	    +----------------+------------------------------------------------+
337	    | IDS/IPS        | DUT/SUT detects and blocks exploits targeting  |
338	    |                | known and unknown vulnerabilities across the   |
339	    |                | monitored network.                             |
340	    +----------------+------------------------------------------------+
341	    | Anti-Malware   | DUT/SUT detects and prevents the transmission  |
342	    |                | of malicious executable code and any           |
343	    |                | associated communications across the monitored |
344	    |                | network.  This includes data exfiltration as   |
345	    |                | well as command and control channels.          |
346	    +----------------+------------------------------------------------+
347	    | Anti-Spyware   | Anti-Spyware is a subcategory of Anti Malware. |
348	    |                | Spyware transmits information without the      |
349	    |                | user's knowledge or permission.  DUT/SUT       |
350	    |                | detects and block initial infection or         |
351	    |                | transmission of data.                          |
352	    +----------------+------------------------------------------------+
353	    | Anti-Botnet    | DUT/SUT detects traffic to or from botnets.    |
354	    +----------------+------------------------------------------------+
355	    | Anti-Evasion   | DUT/SUT detects and mitigates attacks that     |
356	    |                | have been obfuscated in some manner.           |
357	    +----------------+------------------------------------------------+
358	    | Web Filtering  | DUT/SUT detects and blocks malicious website   |
359	    |                | including defined classifications of website   |
360	    |                | across the monitored network.                  |
361	    +----------------+------------------------------------------------+
362	    | DLP            | DUT/SUT detects and prevents data breaches and |
363	    |                | data exfiltration, or it detects and blocks    |
364	    |                | the transmission of sensitive data across the  |
365	    |                | monitored network.                             |
366	    +----------------+------------------------------------------------+
367	    | Certificate    | DUT/SUT validates certificates used in         |
368	    | Validation     | encrypted communications across the monitored  |
369	    |                | network.                                       |
370	    +----------------+------------------------------------------------+
371	    | Logging and    | DUT/SUT logs and reports all traffic at the    |
372	    | Reporting      | flow level across the monitored network.       |
373	    +----------------+------------------------------------------------+
374	    | Application    | DUT/SUT detects known applications as defined  |
375	    | Identification | within the traffic mix selected across the     |
376	    |                | monitored network.                             |
377	    +----------------+------------------------------------------------+

379	                   Table 3: Security Feature Description

[nit] Why are DDoS and DPI not listed in this table? I just randomly stumbled across
that one, but maybe there are more mismatches with Tables 1 and 2. Please make
sure all Table 1/2 features are mentioned.

[nit] I have about 1000 questions and concerns about this stuff: Are there
actually IETF specifications for how any of these features on the DUT work or
should work, or is this all vendor-proprietary functionality? For anything that
is vendor / market proprietary, how would the TE (Test Equipment)
know what the DUT does, so that it can effectively test it? I imagine that
if there is a difference in how a particular feature functions across different
vendors' DUTs, the same is true for TE, so some TE would have more functional
overlap with a given DUT than others?

[nit (continued)] E.g.: let's say some DUT1 feature, e.g. DLP, is really simple
and therefore not very secure, but that makes it a lot faster than a DUT2 DLP
feature which is a lot more secure. Maybe there is a metric for this security -
like, if I remember correctly from the past, the number of signatures in virus
detection or the like... How would such differences be taken into account in
measurement?

381	   Below is a summary of the DUT/SUT configuration:

383	   *  DUT/SUT MUST be configured in "inline" mode.

385	   *  "Fail-Open" behavior MUST be disabled.

387	   *  All RECOMMENDED security features are enabled.

389	   *  Logging SHOULD be enabled.  DUT/SUT SHOULD log all traffic at the
390	      flow level - Logging to an external device is permissible.

[nit] Does that mean logging of ALL flows, or only of flows that trigger some
security issue? Logging of ALL flows seems like a big performance hog, may be
infeasible in fast deployments, and may need to be tested as a separate case
by itself (but my concern may be outdated).

[nit] If logging is to an external device, it may be useful to indicate such a
logging receiver in Figures 1/2, and ideally have it operate via a link from the DUT that
does not pass test traffic, so that it does not interfere.

392	   *  Geographical location filtering, and Application Identification
393	      and Control SHOULD be configured to trigger based on a site or
394	      application from the defined traffic mix.

[nit] Geographical location filtering does not sound like a generically necessary
or applicable security feature. If you are, for example, a high-tech manufacturer
that sells all over the world, you may appreciate customers visiting your
webserver from countries that happen to also host a lot of botnets. Or is this
document focused on a narrower set of use cases? E.g. a DUT only filtering
anything that cannot be put into the cloud (such as web services)? It would
be good to write up some justification for the GeoLoc SHOULD that would
then help readers better understand when/how to configure it and when/how not to.

396	   In addition, a realistic number of access control rules (ACL) SHOULD
397	   be configured on the DUT/SUT where ACLs are configurable and
398	   reasonable based on the deployment scenario.  This document
399	   determines the number of access policy rules for four different
400	   classes of DUT/SUT: Extra Small (XS), Small (S), Medium (M), and
401	   Large (L).  A sample DUT/SUT classification is described in
402	   Appendix B.

[major] IMHO you cannot put numbers such as those in Figure 3 into the main
text of the document while putting the speed definitions of the four classes into an
Appendix B. It seems clear to me that the numbers in Figure 3 (and probably elsewhere) were
derived from the assumption that the four speed classes are defined as in
Appendix B. Suggestion: inline the text of Appendix B here and mention that numbers
such as those in Figure 3 are derived from the assumption of those XS/S/M/L numbers.
Add (if necessary, else not) that it may be appropriate to choose other numbers for
XS/S/M/L, but if one does that, then the dependent numbers (such as those from Figure 3)
may also need to be re-evaluated.

404	   The Access Control Rules (ACL) defined in Figure 3 MUST be configured
405	   from top to bottom in the correct order as shown in the table.  This
406	   is due to ACL types listed in specificity decreasing order, with
407	   "block" first, followed by "allow", representing a typical ACL based
408	   security policy.  The ACL entries SHOULD be configured with routable
409	   IP subnets by the DUT/SUT.  (Note: There will be differences between
410	   how security vendors implement ACL decision making.)  The configured

[nit] /security vendors/DUT/

[nit] I don't understand what I am supposed to learn from the (Note: ...) sentence.
Rephrase, or remove?

410	   how security vendors implement ACL decision making.)  The configured
411	   ACL MUST NOT block the security and measurement traffic used for the
412	   benchmarking tests.

[nit] what is "security traffic" ? what is "measurement traffic" ?  Don't see these
terms defined before. Those two terms do not immediately click to me. I guess
measured user/client-server traffic vs. test-setup management traffic (including logging) ??
In any case introduce the terms, define them and use them consistently. Whatever they are.

414	                                                       +---------------+
415	                                                       | DUT/SUT       |
416	                                                       | Classification|
417	                                                       | # Rules       |
418	   +-----------+-----------+--------------------+------+---+---+---+---+
419	   |           | Match     |                    |      |   |   |   |   |
420	   | Rules Type| Criteria  |   Description      |Action| XS| S | M | L |
421	   +-------------------------------------------------------------------+
422	   |Application|Application| Any application    | block| 5 | 10| 20| 50|
423	   |layer      |           | not included in    |      |   |   |   |   |
424	   |           |           | the measurement    |      |   |   |   |   |
425	   |           |           | traffic            |      |   |   |   |   |
426	   +-------------------------------------------------------------------+
427	   |Transport  |SRC IP and | Any SRC IP subnet  | block| 25| 50|100|250|
428	   |layer      |TCP/UDP    | used and any DST   |      |   |   |   |   |
429	   |           |DST ports  | ports not used in  |      |   |   |   |   |
430	   |           |           | the measurement    |      |   |   |   |   |
431	   |           |           | traffic            |      |   |   |   |   |
432	   +-------------------------------------------------------------------+
433	   |IP layer   |SRC/DST IP | Any SRC/DST IP     | block| 25| 50|100|250|
434	   |           |           | subnet not used    |      |   |   |   |   |
435	   |           |           | in the measurement |      |   |   |   |   |
436	   |           |           | traffic            |      |   |   |   |   |
437	   +-------------------------------------------------------------------+

[nit] Would suggest removing the word "Any" to minimize misinterpretation.

[nit] These three blocks seem to never get exercised by the actual measurement
traffic, right? So the purpose would then be simply to load up the DUT with
them, in case the DUT implementation is stupid enough to let these cause relevant
performance impacts even when not exercised by traffic. It would be good to write
this down as a rationale after the table, especially because the "Any" had me
confused at first: in a real-world deployment you would of course not include
250 individual application/port/prefix entries; you would just have some simple block-all.

[nit] Even 27 years ago I saw routers acting as firewalls for universities
that had thousands of such ACL entries. Aka: I think these numbers are way too low.

438	   |Application|Application| Half of the        | allow| 10| 10| 10| 10|
439	   |layer      |           | applications       |      |   |   |   |   |
440	   |           |           | included in the    |      |   |   |   |   |
441	   |           |           | measurement traffic|      |   |   |   |   |
442	   |           |           |(see the note below)|      |   |   |   |   |
443	   +-------------------------------------------------------------------+
444	   |Transport  |SRC IP and | Half of the SRC    | allow| >1| >1| >1| >1|
445	   |layer      |TCP/UDP    | IPs used and any   |      |   |   |   |   |
446	   |           |DST ports  | DST ports used in  |      |   |   |   |   |
447	   |           |           | the measurement    |      |   |   |   |   |
448	   |           |           | traffic            |      |   |   |   |   |
449	   |           |           | (one rule per      |      |   |   |   |   |
450	   |           |           | subnet)            |      |   |   |   |   |
451	   +-------------------------------------------------------------------+
452	   |IP layer   |SRC IP     | The rest of the    | allow| >1| >1| >1| >1|
453	   |           |           | SRC IP subnet      |      |   |   |   |   |
454	   |           |           | range used in the  |      |   |   |   |   |
455	   |           |           | measurement        |      |   |   |   |   |
456	   |           |           | traffic            |      |   |   |   |   |
457	   |           |           | (one rule per      |      |   |   |   |   |
458	   |           |           | subnet)            |      |   |   |   |   |
459	   +-----------+-----------+--------------------+------+---+---+---+---+

[major] There should be an explanation of how this is supposed to work, and
it seems there are rules missing:

      The rule on row 438 explicitly permits half the traffic sent by the test
      equipment, so supposedly only the other half has to be checked by the rule
      on row 444. When 444 says "Half of the SRC...", is that half of the total?
      Would that have to be set up so that after 444 we now have 75% of the
      measurement traffic going through? Likewise, does rule 452 then bring the
      total amount of permitted traffic to 87.5%?
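
My reading in code form (Python; this is just my interpretation - the draft
should state it explicitly):

    permitted = 0.5                    # row 438: half of the applications
    permitted += (1 - permitted) / 2   # row 444 -> 0.75, if "half" means half of the rest
    permitted += (1 - permitted) / 2   # row 452 -> 0.875? or does "the rest" bring it to 1.0?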

[nit] Ultimately, we only have "allows" here.
      Is there an assumption that after row 459 there is an implicit deny-anything-else?
      I guess so, but it should be written out explicitly in the table.

461	                       Figure 3: DUT/SUT Access List

463	   Note: If half of the applications included in the measurement traffic
464	   is less than 10, the missing number of ACL entries (dummy rules) can
465	   be configured for any application traffic not included in the
466	   measurement traffic.

468	4.2.1.  Security Effectiveness Configuration

470	   The Security features (defined in Table 1 and Table 2) of the DUT/SUT
471	   MUST be configured effectively to detect, prevent, and report the
472	   defined security vulnerability sets.  This section defines the
473	   selection of the security vulnerability sets from Common

[nit] "from the CVE" ?!

474	   vulnerabilities and Exposures (CVE) list for the testing.  The

[nit] Add a reference for CVE. (Not sure what's the best spec - Wikipedia, cve.org, ...)

475	   vulnerability set SHOULD reflect a minimum of 500 CVEs from no older
476	   than 10 calendar years to the current year.  These CVEs SHOULD be
477	   selected with a focus on in-use software commonly found in business
478	   applications, with a Common vulnerability Scoring System (CVSS)
479	   Severity of High (7-10).

481	   This document is primarily focused on performance benchmarking.
482	   However, it is RECOMMENDED to validate the security features
483	   configuration of the DUT/SUT by evaluating the security effectiveness
484	   as a prerequisite for performance benchmarking tests defined in the

[nit]  /in the/in/

485	   section 7.  In case the benchmarking tests are performed without
486	   evaluating security effectiveness, the test report MUST explain the
487	   implications of this.  The methodology for evaluating security
488	   effectiveness is defined in Appendix A.

490	4.3.  Test Equipment Configuration

492	   In general, test equipment allows configuring parameters in different
493	   protocol layers.  These parameters thereby influence the traffic
494	   flows which will be offered and impact performance measurements.

496	   This section specifies common test equipment configuration parameters
497	   applicable for all benchmarking tests defined in Section 7.  Any
498	   benchmarking test specific parameters are described under the test
499	   setup section of each benchmarking test individually.

501	4.3.1.  Client Configuration

503	   This section specifies which parameters SHOULD be considered while
504	   configuring clients using test equipment.  Also, this section
505	   specifies the RECOMMENDED values for certain parameters.  The values
506	   are the defaults used in most of the client operating systems
507	   currently.

509	4.3.1.1.  TCP Stack Attributes

511	   The TCP stack SHOULD use a congestion control algorithm at client and
512	   server endpoints.  The IPv4 and IPv6 Maximum Segment Size (MSS)
513	   SHOULD be set to 1460 bytes and 1440 bytes respectively and a TX and
514	   RX initial receive windows of 64 KByte.  Client initial congestion
515	   window SHOULD NOT exceed 10 times the MSS.  Delayed ACKs are
516	   permitted and the maximum client delayed ACK SHOULD NOT exceed 10
517	   times the MSS before a forced ACK.  Up to three retries SHOULD be
518	   allowed before a timeout event is declared.  All traffic MUST set the
519	   TCP PSH flag to high.  The source port range SHOULD be in the range
520	   of 1024 - 65535.  Internal timeout SHOULD be dynamically scalable per
521	   RFC 793.  The client SHOULD initiate and close TCP connections.  The
522	   TCP connection MUST be initiated via a TCP three-way handshake (SYN,
523	   SYN/ACK, ACK), and it MUST be closed via either a TCP three-way close
524	   (FIN, FIN/ACK, ACK), or a TCP four-way close (FIN, ACK, FIN, ACK).

[nit] It would be nice to have a reference for where/how these parameters were
determined, and to mention why these parameters were chosen - probably to reflect the
most common current TCP behavior that achieves the best performance?
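
For what it's worth, a minimal sketch of how a test client might pin some of
these attributes on Linux (Python sockets; TCP_MAXSEG and the buffer sizes are
hints the kernel may adjust, and the initial congestion window is a route
attribute rather than a socket option):

    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Clamp the IPv4 MSS to 1460 bytes (section 4.3.1.1).
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG, 1460)
    # Request 64 KByte initial receive/transmit windows.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 64 * 1024)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 64 * 1024)
    # Flush writes immediately, approximating "PSH flag set to high".
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    sock.connect(("198.51.100.10", 80))  # RFC 5737 documentation address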

[minor] The document mentions QUIC in three places but has no section for QUIC
equivalent to the one here for TCP. I would suggest adding a section here,
even if it just says: "Due to the absence of sufficient experience, QUIC parameters
are unspecified. Similarly to TCP, parameters should be chosen that best reflect
state-of-the-art performance results for QUIC client/server traffic".

526	4.3.1.2.  Client IP Address Space

528	   The sum of the client IP space SHOULD contain the following
529	   attributes.

531	   *  The IP blocks SHOULD consist of multiple unique, discontinuous
532	      static address blocks.

534	   *  A default gateway is permitted.

[comment] How is this relevant - what do you expect it to do? What would happen
if you just removed it?

536	   *  The DSCP (differentiated services code point) marking is set to DF
537	      (Default Forwarding) '000000' on IPv4 Type of Service (ToS) field
538	      and IPv6 traffic class field.

540	   The following equation can be used to define the total number of
541	   client IP addresses that will be configured on the test equipment.

543	   Desired total number of client IP = Target throughput [Mbit/s] /
544	   Average throughput per IP address [Mbit/s]

546	   As shown in the example list below, the value for "Average throughput
547	   per IP address" can be varied depending on the deployment and use
548	   case scenario.

550	   (Option 1)  DUT/SUT deployment scenario 1 : 6-7 Mbit/s per IP (e.g.
551	               1,400-1,700 IPs per 10Gbit/s throughput)

553	   (Option 2)  DUT/SUT deployment scenario 2 : 0.1-0.2 Mbit/s per IP
554	               (e.g.  50,000-100,000 IPs per 10Gbit/s throughput)

556	   Based on deployment and use case scenario, client IP addresses SHOULD
557	   be distributed between IPv4 and IPv6.  The following options MAY be
558	   considered for a selection of traffic mix ratio.

560	   (Option 1)  100 % IPv4, no IPv6

562	   (Option 2)  80 % IPv4, 20% IPv6

564	   (Option 3)  50 % IPv4, 50% IPv6

566	   (Option 4)  20 % IPv4, 80% IPv6

568	   (Option 5)  no IPv4, 100% IPv6

[minor] This guidance is IMHO not very helpful. It seems to me the first
guidance should be that the percentage of IPv4 vs. IPv6 addresses is based
on the expected ratio of IPv4 vs. IPv6 traffic in the target deployment,
because with the way the test setup is done, N% IPv4 addresses will also
roughly result in N% IPv4 traffic in the test.

That type of explanation might be very helpful, because the risk is that
readers may think they can derive the percentage of test IPv4/IPv6
addresses from the ratio of IPv4/IPv6 addresses in the target deployment,
but that will very often not work:

For example, in the common dual-stack deployment every client has an IPv4
and an IPv6 address, so it's 50% IPv4, but the actual percentage of IPv4
traffic will very much depend on the application scenario. Some
enterprises may go up to 90% or more IPv6 traffic if the main traffic is
all newer cloud services traffic. And vice versa, it could be as little as
10% IPv6 if all the cloud services are legacy apps in the cloud not
supporting IPv6.
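
In numbers (my own illustration): a tester who knows the target deployment
carries, say, 70% IPv6 traffic would size the emulated address pools from
the traffic ratio, not from the 50/50 dual-stack address ratio:

    total_ips = 1539            # from the throughput equation above
    ipv6_traffic_share = 0.70   # traffic ratio, NOT address ratio
    ipv6_ips = round(total_ips * ipv6_traffic_share)  # 1077
    ipv4_ips = total_ips - ipv6_ips                   # 462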

570	   Note: The IANA has assigned IP address range for the testing purpose
571	   as described in Section 8.  If the test scenario requires more IP
572	   addresses or subnets than the IANA assigned, this document recommends
573	   using non routable Private IPv4 address ranges or Unique Local
574	   Address (ULA) IPv6 address ranges for the testing.

[minor] See comments in Section 8. It might be useful to merge the text of this
paragraph with the one in Section 8, else the addressing recommendations are
somewhat split in the middle.

[minor] It would be prudent to add a disclaimer that this document does
not attempt to determine whether the DUT may embody performance
optimizations for known testing address ranges. Such a disclaimer could be
more general and go at the end of the document, e.g. before the IANA
section: no considerations are made against DUT optimizations for known
test scenarios, including addressing ranges or other test-profile specific
parameters.

576	4.3.1.3.  Emulated Web Browser Attributes

578	   The client emulated web browser (emulated browser) contains
579	   attributes that will materially affect how traffic is loaded.  The

[nit] What does "how traffic is loaded" mean? Rephrase.

580	   objective is to emulate modern, typical browser attributes to improve
581	   realism of the result set.

[nit] /result set/resulting traffic/ ?

583	   For HTTP traffic emulation, the emulated browser MUST negotiate HTTP
584	   version 1.1 or higher.  Depending on test scenarios and chosen HTTP
585	   version, the emulated browser MAY open multiple TCP connections per
586	   Server endpoint IP at any time depending on how many sequential
587	   transactions need to be processed.  For HTTP/2 or HTTP/3, the
588	   emulated browser MAY open multiple concurrent streams per connection
589	   (multiplexing).  HTTP/3 emulated browser uses QUIC ([RFC9000]) as
590	   transport protocol.  HTTP settings such as number of connection per
591	   server IP, number of requests per connection, and number of streams
592	   per connection MUST be documented.  This document refers to [RFC8446]
593	   for HTTP/2.  The emulated browser SHOULD advertise a User-Agent
594	   header.  The emulated browser SHOULD enforce content length
595	   validation.  Depending on test scenarios and selected HTTP version,
596	   HTTP header compression MAY be set to enable or disable.  This
597	   setting (compression enabled or disabled) MUST be documented in the
598	   report.

600	   For encrypted traffic, the following attributes SHALL define the
601	   negotiated encryption parameters.  The test clients MUST use TLS
602	   version 1.2 or higher.  TLS record size MAY be optimized for the

[minor] I would bet SEC review will challenge you to comment on TLS 1.3.
It would make sense to add a sentence stating that the ratio of TLS 1.2
vs. TLS 1.3 traffic should be chosen based on the expected target
deployment and may range from 100% TLS 1.2 to 100% TLS 1.3. In the
absence of known ratios, a 50/50% ratio is RECOMMENDED.

602	   version 1.2 or higher.  TLS record size MAY be optimized for the
603	   HTTPS response object size up to a record size of 16 KByte.  If
604	   Server Name Indication (SNI) is required in the traffic mix profile,
605	   the client endpoint MUST send TLS extension Server Name Indication
606	   (SNI) information when opening a security tunnel.  Each client

[minor] SNI is pretty standard today. I would remove the "if" and make the
whole sentence a MUST.

606	   (SNI) information when opening a security tunnel.  Each client
607	   connection MUST perform a full handshake with server certificate and
608	   MUST NOT use session reuse or resumption.

610	   The following TLS 1.2 supported ciphers and keys are RECOMMENDED to
611	   use for HTTPS based benchmarking tests defined in Section 7.

613	   1.  ECDHE-ECDSA-AES128-GCM-SHA256 with Prime256v1 (Signature Hash
614	       Algorithm: ecdsa_secp256r1_sha256 and Supported group: secp256r1)

616	   2.  ECDHE-RSA-AES128-GCM-SHA256 with RSA 2048 (Signature Hash
617	       Algorithm: rsa_pkcs1_sha256 and Supported group: secp256r1)

619	   3.  ECDHE-ECDSA-AES256-GCM-SHA384 with Secp521 (Signature Hash
620	       Algorithm: ecdsa_secp384r1_sha384 and Supported group: secp521r1)

622	   4.  ECDHE-RSA-AES256-GCM-SHA384 with RSA 4096 (Signature Hash
623	       Algorithm: rsa_pkcs1_sha384 and Supported group: secp256r1)

625	   Note: The above ciphers and keys were those commonly used enterprise
626	   grade encryption cipher suites for TLS 1.2.  It is recognized that
627	   these will evolve over time.  Individual certification bodies SHOULD
628	   use ciphers and keys that reflect evolving use cases.  These choices
629	   MUST be documented in the resulting test reports with detailed
630	   information on the ciphers and keys used along with reasons for the
631	   choices.
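
[comment] A side note for testers: whether an emulated server actually
negotiates one of the recommended suites is easy to sanity-check. A
minimal sketch using Python's ssl module (the address is a placeholder
from the TEST-NET-3 range, not from the draft):

    import socket
    import ssl

    # Pin TLS 1.2 and offer only the two AES128-GCM suites from the
    # list above; s.cipher() reports what was actually negotiated.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE   # test certificates, not real PKI
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.maximum_version = ssl.TLSVersion.TLSv1_2
    ctx.set_ciphers("ECDHE-ECDSA-AES128-GCM-SHA256:"
                    "ECDHE-RSA-AES128-GCM-SHA256")
    with ctx.wrap_socket(socket.create_connection(("203.0.113.10", 443))) as s:
        print(s.cipher())  # (suite name, protocol version, secret bits)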

633	   [RFC8446] defines the following cipher suites for use with TLS 1.3.

635	   1.  TLS_AES_128_GCM_SHA256

637	   2.  TLS_AES_256_GCM_SHA384

639	   3.  TLS_CHACHA20_POLY1305_SHA256

641	   4.  TLS_AES_128_CCM_SHA256

643	   5.  TLS_AES_128_CCM_8_SHA256

645	4.3.2.  Backend Server Configuration

647	   This section specifies which parameters should be considered while
648	   configuring emulated backend servers using test equipment.

650	4.3.2.1.  TCP Stack Attributes

652	   The TCP stack on the server side SHOULD be configured similar to the
653	   client side configuration described in Section 4.3.1.1.  In addition,
654	   server initial congestion window MUST NOT exceed 10 times the MSS.
655	   Delayed ACKs are permitted and the maximum server delayed ACK MUST
656	   NOT exceed 10 times the MSS before a forced ACK.

658	4.3.2.2.  Server Endpoint IP Addressing

660	   The sum of the server IP space SHOULD contain the following
661	   attributes.

663	   *  The server IP blocks SHOULD consist of unique, discontinuous
664	      static address blocks with one IP per server Fully Qualified
665	      Domain Name (FQDN) endpoint per test port.

[minor] The "per FQDN per test port" is likely underspecified/confusing.
How would you recommend to configure the testbed if the same FQDN may be reachable
across more than one DUT server port and the DUT is doing load balancing ?
If that is not supposed to be considered, then it seems as if every FQDN is
supposed to be reachable across only one DUT port, but then the sentence
ikely should just say "per FQDN" (without the "per test port qualification").
Not 100% sure...

[minor] Especially for IPv4, there is obviously a big trend in DCs to save
IPv4 address space by using SNI. Therefore a realistic scenario would be
to have more than one FQDN per IPv4 address, maybe as high as 10:1
(guesswork). In any case I think it is prudent to include testing of such
SNI overload of IP addresses, because it likely can impact performance
(demux of processing state not solely based on the 5-tuple).

667	   *  A default gateway is permitted.  The DSCP (differentiated services

[minor] Again, I wonder why the default gateway adds value to the doc.

667	   *  A default gateway is permitted.  The DSCP (differentiated services
668	      code point) marking is set to DF (Default Forwarding) '000000' on
669	      IPv4 Type of Service (ToS) field and IPv6 traffic class field.

671	   *  The server IP addresses SHOULD be distributed between IPv4 and
672	      IPv6 with a ratio identical to the clients distribution ratio.

674	   Note: The IANA has assigned IP address range for the testing purpose
675	   as described in Section 8.  If the test scenario requires more IP
676	   addresses or subnets than the IANA assigned, this document recommends
677	   using non routable Private IPv4 address ranges or Unique Local
678	   Address (ULA) IPv6 address ranges for the testing.

[minor] Same note as in the client section about moving these addressing
recommendations out.

680	4.3.2.3.  HTTP / HTTPS Server Pool Endpoint Attributes

682	   The server pool for HTTP SHOULD listen on TCP port 80 and emulate the
683	   same HTTP version (HTTP 1.1 or HTTP/2 or HTTP/3) and settings chosen
684	   by the client (emulated web browser).  The Server MUST advertise
685	   server type in the Server response header [RFC7230].  For HTTPS
686	   server, TLS 1.2 or higher MUST be used with a maximum record size of
687	   16 KByte and MUST NOT use ticket resumption or session ID reuse.  The
688	   server SHOULD listen on TCP port 443 for HTTP version 1.1 and 2.  For
689	   HTTP/3 (HTTP over QUIC) the server SHOULD listen on UDP 443.  The
690	   server SHALL serve a certificate to the client.  The HTTPS server
691	   MUST check host SNI information with the FQDN if SNI is in use.
692	   Cipher suite and key size on the server side MUST be configured
693	   similar to the client side configuration described in
694	   Section 4.3.1.3.

696	4.3.3.  Traffic Flow Definition

698	   This section describes the traffic pattern between client and server
699	   endpoints.  At the beginning of the test, the server endpoint
700	   initializes and will be ready to accept connection states including
701	   initialization of the TCP stack as well as bound HTTP and HTTPS
702	   servers.  When a client endpoint is needed, it will initialize and be
703	   given attributes such as a MAC and IP address.  The behavior of the
704	   client is to sweep through the given server IP space, generating a
705	   recognizable service by the DUT.  Sequential and pseudorandom sweep
706	   methods are acceptable.  The method used MUST be stated in the final
707	   report.  Thus, a balanced mesh between client endpoints and server
708	   endpoints will be generated in a client IP and port to server IP and
709	   port combination.  Each client endpoint performs the same actions as
710	   other endpoints, with the difference being the source IP of the
711	   client endpoint and the target server IP pool.  The client MUST use
712	   the server IP address or FQDN in the host header [RFC7230].

[minor] Given the prevalence of SNI-centric server selection, I would
suggest changing server IP to server FQDN and noting that the server IP
is simply derived from the server FQDN. Likewise the server port is
derived from the server protocol, which seems to be just HTTP or HTTPS,
so it's unclear to me where we would get ports different from 80 and 443
(maybe that's mentioned later). Aka: the server port may not be relevant
to mention.


714	4.3.3.1.  Description of Intra-Client Behavior

716	   Client endpoints are independent of other clients that are
717	   concurrently executing.  When a client endpoint initiates traffic,
718	   this section describes how the client steps through different
719	   services.  Once the test is initialized, the client endpoints
720	   randomly hold (perform no operation) for a few milliseconds for
721	   better randomization of the start of client traffic.  Each client
722	   will either open a new TCP connection or connect to a TCP persistence
723	   stack still open to that specific server.  At any point that the
724	   traffic profile may require encryption, a TLS encryption tunnel will
725	   form presenting the URL or IP address request to the server.  If
726	   using SNI, the server MUST then perform an SNI name check with the
727	   proposed FQDN compared to the domain embedded in the certificate.
728	   Only when correct, will the server process the HTTPS response object.
729	   The initial response object to the server is based on benchmarking
730	   tests described in Section 7.  Multiple additional sub-URLs (response
731	   objects on the service page) MAY be requested simultaneously.  This
732	   MAY be to the same server IP as the initial URL.  Each sub-object
733	   will also use a canonical FQDN and URL path, as observed in the
734	   traffic mix used.

[minor] This may be necessary to keep the configuration complexity at
bay, but in practice each particular IP client will likely exhibit quite
different traffic profiles. One may continuously request HTTP video
segments when streaming video. Another one may continuously do WebRTC
(Zoom), and the like. By having every client randomly do all the services
(this is what I figure from the above description), you forego the
important performance aspect of the "worst hit client" if the DUT
exhibits specific issues with specific services (false filtering,
performance degradation, etc.). IMHO it would be great if test equipment
could create different client traffic profiles by segmenting the possible
application space into groups and then assigning new clients randomly to
groups. Besides making it easier to find performance issues, it also
results in more real-world performance, which might be higher. For
example, in a multi-core CPU based DUT there may be heuristics assigning
different clients' traffic to different CPU cores, so that the L1..L3
caches of a CPU core can be better kept focused on the code space for a
particular type of client inspection. (Just guessing.)

736	4.3.4.  Traffic Load Profile

738	   The loading of traffic is described in this section.  The loading of
739	   a traffic load profile has five phases: Init, ramp up, sustain, ramp
740	   down, and collection.

742	   1.  Init phase: Testbed devices including the client and server
743	       endpoints should negotiate layer 2-3 connectivity such as MAC
744	       learning and ARP.  Only after successful MAC learning or ARP/ND
745	       resolution SHALL the test iteration move to the next phase.  No
746	       measurements are made in this phase.  The minimum RECOMMENDED
747	       time for Init phase is 5 seconds.  During this phase, the
748	       emulated clients SHOULD NOT initiate any sessions with the DUT/
749	       SUT, in contrast, the emulated servers should be ready to accept
750	       requests from DUT/SUT or from emulated clients.

752	   2.  Ramp up phase: The test equipment SHOULD start to generate the
753	       test traffic.  It SHOULD use a set of the approximate number of
754	       unique client IP addresses to generate traffic.  The traffic
755	       SHOULD ramp up from zero to desired target objective.  The target
756	       objective is defined for each benchmarking test.  The duration
757	       for the ramp up phase MUST be configured long enough that the
758	       test equipment does not overwhelm the DUT/SUTs stated performance
759	       metrics defined in Section 6.3 namely, TCP Connections Per
760	       Second, Inspected Throughput, Concurrent TCP Connections, and
761	       Application Transactions Per Second.  No measurements are made in
762	       this phase.

764	   3.  Sustain phase: Starts when all required clients are active and
765	       operating at their desired load condition.  In the sustain phase,
766	       the test equipment SHOULD continue generating traffic to constant
767	       target value for a constant number of active clients.  The
768	       minimum RECOMMENDED time duration for sustain phase is 300
769	       seconds.  This is the phase where measurements occur.  The test
770	       equipment SHOULD measure and record statistics continuously.  The
771	       sampling interval for collecting the raw results and calculating
772	       the statistics SHOULD be less than 2 seconds.

774	   4.  Ramp down phase: No new connections are established, and no
775	       measurements are made.  The time duration for ramp up and ramp
776	       down phase SHOULD be the same.

778	   5.  Collection phase: The last phase is administrative and will occur
779	       when the test equipment merges and collates the report data.
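
[comment] For my own understanding I condensed the five phases into a
load-vs-time sketch; the 5 s init and 300 s sustain are the recommended
minimums from above, while the ramp durations are placeholder guesses of
mine:

    INIT, RAMP_UP, SUSTAIN, RAMP_DOWN = 5, 60, 300, 60  # seconds

    def offered_load(t):
        """Fraction of the target objective offered at elapsed time t."""
        if t < INIT:
            return 0.0                      # no client sessions yet
        t -= INIT
        if t < RAMP_UP:
            return t / RAMP_UP              # zero -> target objective
        t -= RAMP_UP
        if t < SUSTAIN:
            return 1.0                      # measurements happen here
        t -= SUSTAIN
        if t < RAMP_DOWN:
            return 1.0 - t / RAMP_DOWN      # existing sessions drain
        return 0.0                          # collection: reporting only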

781	5.  Testbed Considerations

783	   This section describes steps for a reference test (pre-test) that
784	   control the test environment including test equipment, focusing on
785	   physical and virtualized environments and as well as test equipments.
786	   Below are the RECOMMENDED steps for the reference test.

788	   1.  Perform the reference test either by configuring the DUT/SUT in
789	       the most trivial setup (fast forwarding) or without presence of

[nit] Define/explain or provide reference for "fast forwarding".

790	       the DUT/SUT.

[minor] Is the DUT/SUT assumed to operate as a router or as a transparent
L2 switch? Asking because "or without presence" should be amended (IMHO)
to mention that instead of the DUT one would put a router or switch in
its place that is pre-loaded with a config equivalent to that of the DUT
but without any security functions, just passing traffic at rates that
bring the TE to its limits.

792	   2.  Generate traffic from traffic generator.  Choose a traffic
793	       profile used for HTTP or HTTPS throughput performance test with
794	       smallest object size.

796	   3.  Ensure that any ancillary switching or routing functions added in
797	       the test equipment does not limit the performance by introducing
798	       network metrics such as packet loss and latency.  This is
799	       specifically important for virtualized components (e.g.,
800	       vSwitches, vRouters).

802	   4.  Verify that the generated traffic (performance) of the test
803	       equipment matches and reasonably exceeds the expected maximum
804	       performance of the DUT/SUT.

806	   5.  Record the network performance metrics packet loss latency
807	       introduced by the test environment (without DUT/SUT).

809	   6.  Assert that the testbed characteristics are stable during the
810	       entire test session.  Several factors might influence stability
811	       specifically, for virtualized testbeds.  For example, additional
812	       workloads in a virtualized system, load balancing, and movement
813	       of virtual machines during the test, or simple issues such as
814	       additional heat created by high workloads leading to an emergency
815	       CPU performance reduction.

[minor] Add something to test the performance of the logging system.
Without the DUT actually generating logging, this will so far not have
been validated. Maybe the TE can generate logging records? Especially
burst logging from the DUT without loss is important to verify (no loss
of logged events).

817	   The reference test SHOULD be performed before the benchmarking tests
818	   (described in section 7) start.

820	6.  Reporting

[minor] I would swap sections 6 and 7, because it is problematic to read
what's to be reported without knowing what's to be measured first. For
example, when I read 6 first, it was not clear to me if/how you would
test the performance limits, so the report data raised a lot of questions
for me.

Of course, when you actually run the testbed you should have read both
sections first anyway.

822	   This section describes how the benchmarking test report should be
823	   formatted and presented.  It is RECOMMENDED to include two main
824	   sections in the report, namely the introduction and the detailed test
825	   results sections.

827	6.1.  Introduction

829	   The following attributes SHOULD be present in the introduction
830	   section of the test report.

[minor] I'd suggest saying here that the test report needs to include all
information sufficient for independent third-party reproduction of the
test setup, to permit third-party falsification of the test results. This
includes, but may not be limited to, the following...
 

832	   1.  The time and date of the execution of the tests

834	   2.  Summary of testbed software and hardware details
835	       a.  DUT/SUT hardware/virtual configuration

837	           *  This section SHOULD clearly identify the make and model of
838	              the DUT/SUT

840	           *  The port interfaces, including speed and link information

842	           *  If the DUT/SUT is a Virtual Network Function (VNF), host
843	              (server) hardware and software details, interface
844	              acceleration type such as DPDK and SR-IOV, used CPU cores,
845	              used RAM, resource sharing (e.g.  Pinning details and NUMA
846	              Node) configuration details, hypervisor version, virtual
847	              switch version

849	           *  details of any additional hardware relevant to the DUT/SUT
850	              such as controllers

852	       b.  DUT/SUT software

854	           *  Operating system name

856	           *  Version

858	           *  Specific configuration details (if any)

[minor] Any software details necessary and sufficient to reproduce the
software setup of the DUT/SUT.

860	       c.  DUT/SUT enabled features

862	           *  Configured DUT/SUT features (see Table 1 and Table 2)

864	           *  Attributes of the above-mentioned features

866	           *  Any additional relevant information about the features

868	       d.  Test equipment hardware and software

870	           *  Test equipment vendor name

872	           *  Hardware details including model number, interface type

874	           *  Test equipment firmware and test application software
875	              version

877	       e.  Key test parameters

879	           *  Used cipher suites and keys

881	           *  IPv4 and IPv6 traffic distribution
882	           *  Number of configured ACL

884	       f.  Details of application traffic mix used in the benchmarking
885	           test "Throughput Performance with Application Traffic Mix"
886	           (Section 7.1)

888	           *  Name of applications and layer 7 protocols

890	           *  Percentage of emulated traffic for each application and
891	              layer 7 protocols

893	           *  Percentage of encrypted traffic and used cipher suites and
894	              keys (The RECOMMENDED ciphers and keys are defined in
895	              Section 4.3.1.3)

897	           *  Used object sizes for each application and layer 7
898	              protocols

900	   3.  Results Summary / Executive Summary

902	       a.  Results SHOULD resemble a pyramid in how it is reported, with
903	           the introduction section documenting the summary of results
904	           in a prominent, easy to read block.

906	6.2.  Detailed Test Results

908	   In the result section of the test report, the following attributes
909	   SHOULD be present for each benchmarking test.

911	   a.  KPIs MUST be documented separately for each benchmarking test.
912	       The format of the KPI metrics SHOULD be presented as described in
913	       Section 6.3.

915	   b.  The next level of details SHOULD be graphs showing each of these
916	       metrics over the duration (sustain phase) of the test.  This
917	       allows the user to see the measured performance stability changes
918	       over time.

920	6.3.  Benchmarks and Key Performance Indicators

922	   This section lists key performance indicators (KPIs) for overall
923	   benchmarking tests.  All KPIs MUST be measured during the sustain
924	   phase of the traffic load profile described in Section 4.3.4.  All
925	   KPIs MUST be measured from the result output of test equipment.

[minor] Somewhere else in the document I think I remember observing DUT
self-reporting. Shouldn't the self-reporting of the DUT then be vetted as
well, e.g. compared against the TE report data?

927	   *  Concurrent TCP Connections
928	      The aggregate number of simultaneous connections between hosts
929	      across the DUT/SUT, or between hosts and the DUT/SUT (defined in
930	      [RFC2647]).

[minor] Add a reference to the section in RFC 2647 where this is defined.
Also: if you refer to the definition but do not reproduce it, readers
have to pull up RFC 2647 to interpret the results.

932	   *  TCP Connections Per Second

934	      The average number of successfully established TCP connections per
935	      second between hosts across the DUT/SUT, or between hosts and the
936	      DUT/SUT.  The TCP connection MUST be initiated via a TCP three-way
937	      handshake (SYN, SYN/ACK, ACK).  Then the TCP session data is sent.
938	      The TCP session MUST be closed via either a TCP three-way close
939	      (FIN, FIN/ACK, ACK), or a TCP four-way close (FIN, ACK, FIN, ACK),
940	      and MUST NOT by RST.

942	   *  Application Transactions Per Second

944	      The average number of successfully completed transactions per
945	      second.  For a particular transaction to be considered successful,
946	      all data MUST have been transferred in its entirety.  In case of
947	      HTTP(S) transactions, it MUST have a valid status code (200 OK),
948	      and the appropriate FIN, FIN/ACK sequence MUST have been
949	      completed.

951	   *  TLS Handshake Rate

953	      The average number of successfully established TLS connections per
954	      second between hosts across the DUT/SUT, or between hosts and the
955	      DUT/SUT.

957	   *  Inspected Throughput

959	      The number of bits per second of examined and allowed traffic a
960	      network security device is able to transmit to the correct
961	      destination interface(s) in response to a specified offered load.
962	      The throughput benchmarking tests defined in Section 7 SHOULD
963	      measure the average Layer 2 throughput value when the DUT/SUT is
964	      "inspecting" traffic.  This document recommends presenting the
965	      inspected throughput value in Gbit/s rounded to two places of
966	      precision with a more specific Kbit/s in parenthesis.
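
[comment] The recommended presentation then boils down to (my
illustration):

    def format_inspected_throughput(bits_per_second):
        # Gbit/s rounded to two places, more specific Kbit/s in parens.
        return (f"{bits_per_second / 1e9:.2f} Gbit/s "
                f"({bits_per_second / 1e3:.0f} Kbit/s)")

    format_inspected_throughput(9_437_184_000)
    # -> '9.44 Gbit/s (9437184 Kbit/s)'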

968	   *  Time to First Byte (TTFB)

970	      TTFB is the elapsed time between the start of sending the TCP SYN
971	      packet from the client and the client receiving the first packet
972	      of application data from the server or DUT/SUT.  The benchmarking
973	      tests HTTP Transaction Latency (Section 7.4) and HTTPS Transaction
974	      Latency (Section 7.8) measure the minimum, average and maximum
975	      TTFB.  The value SHOULD be expressed in milliseconds.

977	   *  URL Response time / Time to Last Byte (TTLB)

979	      URL Response time / TTLB is the elapsed time between the start of
980	      sending the TCP SYN packet from the client and the client
981	      receiving the last packet of application data from the server or
982	      DUT/SUT.  The benchmarking tests HTTP Transaction Latency
983	      (Section 7.4) and HTTPS Transaction Latency (Section 7.8) measure
984	      the minimum, average and maximum TTLB.  The value SHOULD be
985	      expressed in millisecond.
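
[comment] For readers who, like me, had to think about where the clocks
start: both timers start with the client's connection attempt (the TCP
SYN). A rough single-connection sketch over plain HTTP/1.1 (host, port
and request are placeholders of mine):

    import socket
    import time

    def measure_ttfb_ttlb(host, port, request_bytes):
        # Both timers start when the client begins connecting, i.e.
        # roughly when the TCP SYN is sent, as defined above.
        start = time.monotonic()
        ttfb = None
        with socket.create_connection((host, port)) as s:
            s.sendall(request_bytes)
            while chunk := s.recv(65536):
                if ttfb is None:
                    ttfb = time.monotonic() - start  # first data back
            ttlb = time.monotonic() - start          # last data back
        return ttfb * 1000.0, ttlb * 1000.0          # milliseconds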

[minor] Up to this point I don't think the report would include a
comparison of these KPIs between no-DUT-present vs. DUT-present. Is that
true? How then is the reader of the report meant to be able to vet the
relative impact of the DUT for all these metrics vs. the DUT not being
present?

987	7.  Benchmarking Tests

[minor] I think it would be good to insert here some descriptive and
comparative overview of the tests from the different 7.x sections.

For example, I guess (but don't know from the text) that the 7.1 test
should(?) perform a throughput test for non-HTTP/HTTPS applications, or
else, if all the applications in 7.1 were HTTP/HTTPS, it would duplicate
the results of 7.3 and 7.7, right? Not sure though if/where it is written
out that you therefore want a traffic mix of only non-HTTP/HTTPS
application traffic for 7.1.

If instead the customer-relevant application mix (7.1.1) does include
some percentage of HTTP/HTTPS applications, then shouldn't all the tests,
even those focusing on the HTTP/HTTPS characteristics, also always
include the non-HTTP/HTTPS application flows as a kind of "background"
traffic, even if not measured in the tests of a particular 7.x
sub-section?

[minor] Section 7 is a lot of work to get right. I observe that there is
a lot of procedural replication across the steps. It would be easier to
read if all that duplication was removed and described once - such as the
initial/max/iterative step description. But I can understand how much
work this might be, to then especially extract only the differences for
each 7.x and describe only those 7.x differences there.

989	7.1.  Throughput Performance with Application Traffic Mix

991	7.1.1.  Objective

993	   Using a relevant application traffic mix, determine the sustainable
994	   inspected throughput supported by the DUT/SUT.

996	   Based on the test customer's specific use case, testers can choose
997	   the relevant application traffic mix for this test.  The details
998	   about the traffic mix MUST be documented in the report.  At least the
999	   following traffic mix details MUST be documented and reported
1000	   together with the test results:

1002	      Name of applications and layer 7 protocols

1004	      Percentage of emulated traffic for each application and layer 7
1005	      protocol

1007	      Percentage of encrypted traffic and used cipher suites and keys
1008	      (The RECOMMENDED ciphers and keys are defined in Section 4.3.1.3.)

1010	      Used object sizes for each application and layer 7 protocols

1012	7.1.2.  Test Setup

1014	   Testbed setup MUST be configured as defined in Section 4.  Any
1015	   benchmarking test specific testbed configuration changes MUST be
1016	   documented.

1018	7.1.3.  Test Parameters

1020	   In this section, the benchmarking test specific parameters SHOULD be
1021	   defined.

1023	7.1.3.1.  DUT/SUT Configuration Parameters

1025	   DUT/SUT parameters MUST conform to the requirements defined in
1026	   Section 4.2.  Any configuration changes for this specific
1027	   benchmarking test MUST be documented.  In case the DUT/SUT is
1028	   configured without SSL inspection, the test report MUST explain the
1029	   implications of this to the relevant application traffic mix
1030	   encrypted traffic.

[nit] /SSL inspection/SSL Inspection/ - capitalized in all other places in the doc.

[minor] I am not quite familiar with the details, so I hope a reader
knows what the "MUST explain the implications" means.

[minor] What is the equivalent for TLS (inspection), and why is it not
equally mentioned?

1032	7.1.3.2.  Test Equipment Configuration Parameters

1034	   Test equipment configuration parameters MUST conform to the
1035	   requirements defined in Section 4.3.  The following parameters MUST
1036	   be documented for this benchmarking test:

1038	      Client IP address range defined in Section 4.3.1.2

1040	      Server IP address range defined in Section 4.3.2.2

1042	      Traffic distribution ratio between IPv4 and IPv6 defined in
1043	      Section 4.3.1.2

1045	      Target inspected throughput: Aggregated line rate of interface(s)
1046	      used in the DUT/SUT or the value defined based on requirement for
1047	      a specific deployment scenario

[minor] Maybe add: or based on DUT-specified performance limits (the DUT
may not always provide "line rate" throughput, so the ultimate test would
be to see if/how much of the vendor-promised performance is reachable).

1049	      Initial throughput: 10% of the "Target inspected throughput" Note:
1050	      Initial throughput is not a KPI to report.  This value is
1051	      configured on the traffic generator and used to perform Step 1:
1052	      "Test Initialization and Qualification" described under the
1053	      Section 7.1.4.

1055	      One of the ciphers and keys defined in Section 4.3.1.3 are
1056	      RECOMMENDED to use for this benchmarking test.

1058	7.1.3.3.  Traffic Profile

1060	   Traffic profile: This test MUST be run with a relevant application
1061	   traffic mix profile.

1063	7.1.3.4.  Test Results Validation Criteria

1065	   The following criteria are the test results validation criteria.  The
1066	   test results validation criteria MUST be monitored during the whole
1067	   sustain phase of the traffic load profile.

1069	   a.  Number of failed application transactions (receiving any HTTP
1070	       response code other than 200 OK) MUST be less than 0.001% (1 out
1071	       of 100,000 transactions) of total attempted transactions.

[minor] So this is the right number, as opposed to the 0.01% in A.4...
If you don't intend to fix A.4 (requested there), pls. explain the reason for the
difference.

1073	   b.  Number of Terminated TCP connections due to unexpected TCP RST
1074	       sent by DUT/SUT MUST be less than 0.001% (1 out of 100,000
1075	       connections) of total initiated TCP connections.
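
[comment] Both criteria reduce to the same strict-less-than check; note
that with the 0.001% limit a single failure already fails a run of fewer
than 100,000 attempts (my sketch):

    def meets_validation_criteria(failed_txns, total_txns,
                                  rst_conns, total_conns,
                                  limit=1e-5):  # 0.001%
        # Criterion a: failed transactions; criterion b: unexpected RSTs.
        return (failed_txns < limit * total_txns and
                rst_conns < limit * total_conns)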

1077	7.1.3.5.  Measurement

1079	   Following KPI metrics MUST be reported for this benchmarking test:

1081	   Mandatory KPIs (benchmarks): Inspected Throughput, TTFB (minimum,
1082	   average, and maximum), TTLB (minimum, average, and maximum) and
1083	   Application Transactions Per Second

1085	   Note: TTLB MUST be reported along with the object size used in the
1086	   traffic profile.

1088	   Optional KPIs: TCP Connections Per Second and TLS Handshake Rate

[minor] I would prefer for TCP Connections Per Second to be mandatory
too. It makes it easier to communicate test data with lower-layer folks.
For example, network-layer equipment often has per-5-tuple flow state,
also with build/churn-rate limits, so to match a security SUT with the
other networking equipment this TCP connection rate is quite important.

1090	7.1.4.  Test Procedures and Expected Results

1092	   The test procedures are designed to measure the inspected throughput
1093	   performance of the DUT/SUT at the sustaining period of traffic load
1094	   profile.  The test procedure consists of three major steps: Step 1
1095	   ensures the DUT/SUT is able to reach the performance value (initial
1096	   throughput) and meets the test results validation criteria when it
1097	   was very minimally utilized.  Step 2 determines the DUT/SUT is able
1098	   to reach the target performance value within the test results
1099	   validation criteria.  Step 3 determines the maximum achievable
1100	   performance value within the test results validation criteria.

1102	   This test procedure MAY be repeated multiple times with different IP
1103	   types: IPv4 only, IPv6 only, and IPv4 and IPv6 mixed traffic
1104	   distribution.

1106	7.1.4.1.  Step 1: Test Initialization and Qualification

1108	   Verify the link status of all connected physical interfaces.  All
1109	   interfaces are expected to be in "UP" status.

1111	   Configure traffic load profile of the test equipment to generate test
1112	   traffic at the "Initial throughput" rate as described in
1113	   Section 7.1.3.2.  The test equipment SHOULD follow the traffic load
1114	   profile definition as described in Section 4.3.4.  The DUT/SUT SHOULD
1115	   reach the "Initial throughput" during the sustain phase.  Measure all
1116	   KPI as defined in Section 7.1.3.5.  The measured KPIs during the
1117	   sustain phase MUST meet all the test results validation criteria
1118	   defined in Section 7.1.3.4.

1120	   If the KPI metrics do not meet the test results validation criteria,
1121	   the test procedure MUST NOT be continued to step 2.

1123	7.1.4.2.  Step 2: Test Run with Target Objective

1125	   Configure test equipment to generate traffic at the "Target inspected
1126	   throughput" rate defined in Section 7.1.3.2.  The test equipment
1127	   SHOULD follow the traffic load profile definition as described in
1128	   Section 4.3.4.  The test equipment SHOULD start to measure and record
1129	   all specified KPIs.  Continue the test until all traffic profile
1130	   phases are completed.

1132	   Within the test results validation criteria, the DUT/SUT is expected
1133	   to reach the desired value of the target objective ("Target inspected
1134	   throughput") in the sustain phase.  Follow step 3, if the measured
1135	   value does not meet the target value or does not fulfill the test
1136	   results validation criteria.

1138	7.1.4.3.  Step 3: Test Iteration

1140	   Determine the achievable average inspected throughput within the test
1141	   results validation criteria.  Final test iteration MUST be performed
1142	   for the test duration defined in Section 4.3.4.
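
[comment] The draft leaves the step-3 search strategy open. One obvious
realization is a bounded binary search between the last passing and the
first failing rate; this is my sketch, where run_trial() would execute a
full traffic load profile and check the validation criteria:

    def find_max_inspected_throughput(run_trial, low_pass, high_fail,
                                      resolution=0.02):
        # run_trial(rate) -> True iff all validation criteria were met.
        while (high_fail - low_pass) / high_fail > resolution:
            mid = (low_pass + high_fail) / 2.0
            if run_trial(mid):
                low_pass = mid
            else:
                high_fail = mid
        # Re-run the final iteration at low_pass for the full duration
        # defined in Section 4.3.4.
        return low_pass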

1144	7.2.  TCP/HTTP Connections Per Second

1146	7.2.1.  Objective

1148	   Using HTTP traffic, determine the sustainable TCP connection
1149	   establishment rate supported by the DUT/SUT under different
1150	   throughput load conditions.

1152	   To measure connections per second, test iterations MUST use different
1153	   fixed HTTP response object sizes (the different load conditions)
1154	   defined in Section 7.2.3.2.

1156	7.2.2.  Test Setup

1158	   Testbed setup SHOULD be configured as defined in Section 4.  Any
1159	   specific testbed configuration changes (number of interfaces and
1160	   interface type, etc.)  MUST be documented.

1162	7.2.3.  Test Parameters

1164	   In this section, benchmarking test specific parameters SHOULD be
1165	   defined.

1167	7.2.3.1.  DUT/SUT Configuration Parameters

1169	   DUT/SUT parameters MUST conform to the requirements defined in
1170	   Section 4.2.  Any configuration changes for this specific
1171	   benchmarking test MUST be documented.

1173	7.2.3.2.  Test Equipment Configuration Parameters

1175	   Test equipment configuration parameters MUST conform to the
1176	   requirements defined in Section 4.3.  The following parameters MUST
1177	   be documented for this benchmarking test:

1179	   Client IP address range defined in Section 4.3.1.2

1181	   Server IP address range defined in Section 4.3.2.2

1183	   Traffic distribution ratio between IPv4 and IPv6 defined in
1184	   Section 4.3.1.2

1186	   Target connections per second: Initial value from product datasheet
1187	   or the value defined based on requirement for a specific deployment
1188	   scenario

1190	   Initial connections per second: 10% of "Target connections per
1191	   second" (Note: Initial connections per second is not a KPI to report.
1192	   This value is configured on the traffic generator and used to perform
1193	   the Step1: "Test Initialization and Qualification" described under
1194	   the Section 7.2.4.

1196	   The client SHOULD negotiate HTTP and close the connection with FIN
1197	   immediately after completion of one transaction.  In each test
1198	   iteration, client MUST send GET request requesting a fixed HTTP
1199	   response object size.

1201	   The RECOMMENDED response object sizes are 1, 2, 4, 16, and 64 KByte.

1203	7.2.3.3.  Test Results Validation Criteria

1205	   The following criteria are the test results validation criteria.  The
1206	   Test results validation criteria MUST be monitored during the whole
1207	   sustain phase of the traffic load profile.

1209	   a.  Number of failed application transactions (receiving any HTTP
1210	       response code other than 200 OK) MUST be less than 0.001% (1 out
1211	       of 100,000 transactions) of total attempted transactions.

1213	   b.  Number of terminated TCP connections due to unexpected TCP RST
1214	       sent by DUT/SUT MUST be less than 0.001% (1 out of 100,000
1215	       connections) of total initiated TCP connections.

1217	   c.  During the sustain phase, traffic SHOULD be forwarded at a
1218	       constant rate (considered as a constant rate if any deviation of
1219	       traffic forwarding rate is less than 5%).

1221	   d.  Concurrent TCP connections MUST be constant during steady state
1222	       and any deviation of concurrent TCP connections SHOULD be less
1223	       than 10%. This confirms the DUT opens and closes TCP connections
1224	       at approximately the same rate.

1226	7.2.3.4.  Measurement

1228	   TCP Connections Per Second MUST be reported for each test iteration
1229	   (for each object size).

[minor] Add variance or min/max rates to the report in case the problem
in point d above (line 1221) does exist?
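
Something along these lines (my sketch of the deviation check behind
point d, over the sustain-phase samples):

    def max_deviation_pct(samples):
        # Largest deviation of concurrent-connection samples from their
        # sustain-phase mean; criterion d wants this below 10%.
        mean = sum(samples) / len(samples)
        return max(abs(x - mean) for x in samples) / mean * 100.0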

1231	7.2.4.  Test Procedures and Expected Results

1233	   The test procedure is designed to measure the TCP connections per
1234	   second rate of the DUT/SUT at the sustaining period of the traffic
1235	   load profile.  The test procedure consists of three major steps: Step
1236	   1 ensures the DUT/SUT is able to reach the performance value (Initial
1237	   connections per second) and meets the test results validation
1238	   criteria when it was very minimally utilized.  Step 2 determines the
1239	   DUT/SUT is able to reach the target performance value within the test
1240	   results validation criteria.  Step 3 determines the maximum
1241	   achievable performance value within the test results validation
1242	   criteria.

1244	   This test procedure MAY be repeated multiple times with different IP
1245	   types: IPv4 only, IPv6 only, and IPv4 and IPv6 mixed traffic
1246	   distribution.

1248	7.2.4.1.  Step 1: Test Initialization and Qualification

1250	   Verify the link status of all connected physical interfaces.  All
1251	   interfaces are expected to be in "UP" status.

1253	   Configure the traffic load profile of the test equipment to establish
1254	   "Initial connections per second" as defined in Section 7.2.3.2.  The
1255	   traffic load profile SHOULD be defined as described in Section 4.3.4.

1257	   The DUT/SUT SHOULD reach the "Initial connections per second" before
1258	   the sustain phase.  The measured KPIs during the sustain phase MUST
1259	   meet all the test results validation criteria defined in
1260	   Section 7.2.3.3.

1262	   If the KPI metrics do not meet the test results validation criteria,
1263	   the test procedure MUST NOT continue to "Step 2".

1265	7.2.4.2.  Step 2: Test Run with Target Objective

1267	   Configure test equipment to establish the target objective ("Target
1268	   connections per second") defined in Section 7.2.3.2.  The test
1269	   equipment SHOULD follow the traffic load profile definition as
1270	   described in Section 4.3.4.

1272	   During the ramp up and sustain phase of each test iteration, other
1273	   KPIs such as inspected throughput, concurrent TCP connections and
1274	   application transactions per second MUST NOT reach the maximum value
1275	   the DUT/SUT can support.  The test results for specific test
1276	   iterations SHOULD NOT be reported, if the above-mentioned KPI
1277	   (especially inspected throughput) reaches the maximum value.
1278	   (Example: If the test iteration with 64 KByte of HTTP response object
1279	   size reached the maximum inspected throughput limitation of the DUT/
1280	   SUT, the test iteration MAY be interrupted and the result for 64
1281	   KByte SHOULD NOT be reported.)

1283	   The test equipment SHOULD start to measure and record all specified
1284	   KPIs.  Continue the test until all traffic profile phases are
1285	   completed.

1287	   Within the test results validation criteria, the DUT/SUT is expected
1288	   to reach the desired value of the target objective ("Target
1289	   connections per second") in the sustain phase.  Follow step 3, if the
1290	   measured value does not meet the target value or does not fulfill the
1291	   test results validation criteria.

1293	7.2.4.3.  Step 3: Test Iteration

1295	   Determine the achievable TCP connections per second within the test
1296	   results validation criteria.

1298	7.3.  HTTP Throughput

1300	7.3.1.  Objective

1302	   Determine the sustainable inspected throughput of the DUT/SUT for
1303	   HTTP transactions varying the HTTP response object size.

[nit] At a high level, what is the difference between 7.2 and 7.3? Some
more explanation would be useful. One interpretation I came up with is
that 7.2 measures the performance of, e.g., HTTP connections where each
connection performs a single GET, and 7.3 measures long-lived HTTP
connections in which a high rate of HTTP GETs is performed (so as to
differentiate transactions at the TCP+HTTP level (7.2) from those only
happening at the HTTP level (7.3)). If that is a lucky guess, writing it
out more explicitly might help other similarly guessing readers.

1305	7.3.2.  Test Setup

1307	   Testbed setup SHOULD be configured as defined in Section 4.  Any
1308	   specific testbed configuration changes (number of interfaces and
1309	   interface type, etc.)  MUST be documented.

1311	7.3.3.  Test Parameters

1313	   In this section, benchmarking test specific parameters SHOULD be
1314	   defined.

1316	7.3.3.1.  DUT/SUT Configuration Parameters

1318	   DUT/SUT parameters MUST conform to the requirements defined in
1319	   Section 4.2.  Any configuration changes for this specific
1320	   benchmarking test MUST be documented.

1322	7.3.3.2.  Test Equipment Configuration Parameters

1324	   Test equipment configuration parameters MUST conform to the
1325	   requirements defined in Section 4.3.  The following parameters MUST
1326	   be documented for this benchmarking test:

1328	   Client IP address range defined in Section 4.3.1.2

1330	   Server IP address range defined in Section 4.3.2.2

1332	   Traffic distribution ratio between IPv4 and IPv6 defined in
1333	   Section 4.3.1.2

1335	   Target inspected throughput: Aggregated line rate of interface(s)
1336	   used in the DUT/SUT or the value defined based on requirement for a
1337	   specific deployment scenario
1338	   Initial throughput: 10% of "Target inspected throughput" Note:
1339	   Initial throughput is not a KPI to report.  This value is configured
1340	   on the traffic generator and used to perform Step 1: "Test
1341	   Initialization and Qualification" described under Section 7.3.4.

1343	   Number of HTTP response object requests (transactions) per
1344	   connection: 10

1346	   RECOMMENDED HTTP response object size: 1, 16, 64, 256 KByte, and
1347	   mixed objects defined in Table 4.

1349	           +=====================+============================+
1350	           | Object size (KByte) | Number of requests/ Weight |
1351	           +=====================+============================+
1352	           | 0.2                 | 1                          |
1353	           +---------------------+----------------------------+
1354	           | 6                   | 1                          |
1355	           +---------------------+----------------------------+
1356	           | 8                   | 1                          |
1357	           +---------------------+----------------------------+
1358	           | 9                   | 1                          |
1359	           +---------------------+----------------------------+
1360	           | 10                  | 1                          |
1361	           +---------------------+----------------------------+
1362	           | 25                  | 1                          |
1363	           +---------------------+----------------------------+
1364	           | 26                  | 1                          |
1365	           +---------------------+----------------------------+
1366	           | 35                  | 1                          |
1367	           +---------------------+----------------------------+
1368	           | 59                  | 1                          |
1369	           +---------------------+----------------------------+
1370	           | 347                 | 1                          |
1371	           +---------------------+----------------------------+

1373	                          Table 4: Mixed Objects

[minor] Interesting/useful data. If there were any reference/explanation
of how these numbers were derived, that would be great to add.
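
For what it's worth, since all Table 4 weights are 1, the mix averages
out as follows (my arithmetic):

    sizes_kbyte = [0.2, 6, 8, 9, 10, 25, 26, 35, 59, 347]
    print(sum(sizes_kbyte) / len(sizes_kbyte))  # 52.52 KByte mean object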

1375	7.3.3.3.  Test Results Validation Criteria

1377	   The following criteria are the test results validation criteria.  The
1378	   test results validation criteria MUST be monitored during the whole
1379	   sustain phase of the traffic load profile.

1381	   a.  Number of failed application transactions (receiving any HTTP
1382	       response code other than 200 OK) MUST be less than 0.001% (1 out
1383	       of 100,000 transactions) of attempt transactions.

1385	   b.  Traffic SHOULD be forwarded at a constant rate (considered as a
1386	       constant rate if any deviation of traffic forwarding rate is less
1387	       than 5%).

1389	   c.  Concurrent TCP connections MUST be constant during steady state
1390	       and any deviation of concurrent TCP connections SHOULD be less
1391	       than 10%. This confirms the DUT opens and closes TCP connections
1392	       at approximately the same rate.

1394	7.3.3.4.  Measurement

1396	   Inspected Throughput and HTTP Transactions per Second MUST be
1397	   reported for each object size.

1399	7.3.4.  Test Procedures and Expected Results

1401	   The test procedure is designed to measure HTTP throughput of the DUT/
1402	   SUT.  The test procedure consists of three major steps: Step 1
1403	   ensures the DUT/SUT is able to reach the performance value (Initial
1404	   throughput) and meets the test results validation criteria when it
1405	   was very minimal utilized.  Step 2 determines the DUT/SUT is able to
1406	   reach the target performance value within the test results validation
1407	   criteria.  Step 3 determines the maximum achievable performance value
1408	   within the test results validation criteria.

1410	   This test procedure MAY be repeated multiple times with different
1411	   IPv4 and IPv6 traffic distribution and HTTP response object sizes.

1413	7.3.4.1.  Step 1: Test Initialization and Qualification

1415	   Verify the link status of all connected physical interfaces.  All
1416	   interfaces are expected to be in "UP" status.

1418	   Configure traffic load profile of the test equipment to establish
1419	   "Initial inspected throughput" as defined in Section 7.3.3.2.

1421	   The traffic load profile SHOULD be defined as described in
1422	   Section 4.3.4.  The DUT/SUT SHOULD reach the "Initial inspected
1423	   throughput" during the sustain phase.  Measure all KPI as defined in
1424	   Section 7.3.3.4.

1426	   The measured KPIs during the sustain phase MUST meet the test results
1427	   validation criteria "a" defined in Section 7.3.3.3.  The test results
1428	   validation criteria "b" and "c" are OPTIONAL for step 1.

1430	   If the KPI metrics do not meet the test results validation criteria,
1431	   the test procedure MUST NOT be continued to "Step 2".

1433	7.3.4.2.  Step 2: Test Run with Target Objective

1435	   Configure test equipment to establish the target objective ("Target
1436	   inspected throughput") defined in Section 7.3.3.2.  The test
1437	   equipment SHOULD start to measure and record all specified KPIs.
1438	   Continue the test until all traffic profile phases are completed.

1440	   Within the test results validation criteria, the DUT/SUT is expected
1441	   to reach the desired value of the target objective in the sustain
1442	   phase.  Follow step 3, if the measured value does not meet the target
1443	   value or does not fulfill the test results validation criteria.

1445	7.3.4.3.  Step 3: Test Iteration

1447	   Determine the achievable inspected throughput within the test results
1448	   validation criteria and measure the KPI metric Transactions per
1449	   Second.  Final test iteration MUST be performed for the test duration
1450	   defined in Section 4.3.4.

1452	7.4.  HTTP Transaction Latency

[nit] It would be nice to have explanatory text explaining why 7.4
requires different test runs, as opposed to just measuring the
transaction latency as part of 7.2 and 7.3. I have not tried to compare
the descriptions here in detail to figure out the differences in test
runs, but even if there are differences, why would transaction latency
not also be measured in 7.2 and 7.3 as a metric?

1454	7.4.1.  Objective

1456	   Using HTTP traffic, determine the HTTP transaction latency when DUT
1457	   is running with sustainable HTTP transactions per second supported by
1458	   the DUT/SUT under different HTTP response object sizes.

1460	   Test iterations MUST be performed with different HTTP response object
1461	   sizes in two different scenarios.  One with a single transaction and
1462	   the other with multiple transactions within a single TCP connection.
1463	   For consistency both the single and multiple transaction test MUST be
1464	   configured with the same HTTP version

1466	   Scenario 1: The client MUST negotiate HTTP and close the connection
1467	   with FIN immediately after completion of a single transaction (GET
1468	   and RESPONSE).

1470	   Scenario 2: The client MUST negotiate HTTP and close the connection
1471	   FIN immediately after completion of 10 transactions (GET and
1472	   RESPONSE) within a single TCP connection.

1474	7.4.2.  Test Setup

1476	   Testbed setup SHOULD be configured as defined in Section 4.  Any
1477	   specific testbed configuration changes (number of interfaces and
1478	   interface type, etc.)  MUST be documented.

1480	7.4.3.  Test Parameters

1482	   In this section, benchmarking test specific parameters SHOULD be
1483	   defined.

1485	7.4.3.1.  DUT/SUT Configuration Parameters

1487	   DUT/SUT parameters MUST conform to the requirements defined in
1488	   Section 4.2.  Any configuration changes for this specific
1489	   benchmarking test MUST be documented.

1491	7.4.3.2.  Test Equipment Configuration Parameters

1493	   Test equipment configuration parameters MUST conform to the
1494	   requirements defined in Section 4.3.  The following parameters MUST
1495	   be documented for this benchmarking test:

1497	   Client IP address range defined in Section 4.3.1.2

1499	   Server IP address range defined in Section 4.3.2.2

1501	   Traffic distribution ratio between IPv4 and IPv6 defined in
1502	   Section 4.3.1.2

1504	   Target objective for scenario 1: 50% of the connections per second
1505	   measured in benchmarking test TCP/HTTP Connections Per Second
1506	   (Section 7.2)

1508	   Target objective for scenario 2: 50% of the inspected throughput
1509	   measured in benchmarking test HTTP Throughput (Section 7.3)

1511	   Initial objective for scenario 1: 10% of "Target objective for
1512	   scenario 1"

1514	   Initial objective for scenario 2: 10% of "Target objective for
1515	   scenario 2"

1517	   Note: The Initial objectives are not a KPI to report.  These values
1518	   are configured on the traffic generator and used to perform the
1519	   Step1: "Test Initialization and Qualification" described under the
1520	   Section 7.4.4.

1522	   HTTP transaction per TCP connection: Test scenario 1 with single
1523	   transaction and test scenario 2 with 10 transactions.

1525	   HTTP with GET request requesting a single object.  The RECOMMENDED
1526	   object sizes are 1, 16, and 64 KByte.  For each test iteration,
1527	   client MUST request a single HTTP response object size.

1529	7.4.3.3.  Test Results Validation Criteria

1531	   The following criteria are the test results validation criteria.  The
1532	   Test results validation criteria MUST be monitored during the whole
1533	   sustain phase of the traffic load profile.

1535	   a.  Number of failed application transactions (receiving any HTTP
1536	       response code other than 200 OK) MUST be less than 0.001% (1 out
1537	       of 100,000 transactions) of attempt transactions.

1539	   b.  Number of terminated TCP connections due to unexpected TCP RST
1540	       sent by DUT/SUT MUST be less than 0.001% (1 out of 100,000
1541	       connections) of total initiated TCP connections.

1543	   c.  During the sustain phase, traffic SHOULD be forwarded at a
1544	       constant rate (considered as a constant rate if any deviation of
1545	       traffic forwarding rate is less than 5%).

1547	   d.  Concurrent TCP connections MUST be constant during steady state
1548	       and any deviation of concurrent TCP connections SHOULD be less
1549	       than 10%. This confirms the DUT opens and closes TCP connections
1550	       at approximately the same rate.

1552	   e.  After ramp up the DUT MUST achieve the "Target objective" defined
1553	       in Section 7.4.3.2 and remain in that state for the entire test
1554	       duration (sustain phase).

1556	7.4.3.4.  Measurement

1558	   TTFB (minimum, average, and maximum) and TTLB (minimum, average and
1559	   maximum) MUST be reported for each object size.
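
[comment] For readers building their own tooling, a minimal single-connection
probe for these two KPIs could look like the sketch below (illustrative only;
real test equipment measures this at scale, and the draft's exact TTFB/TTLB
definitions apply):

    import socket, time

    def ttfb_ttlb(host, port, path="/"):
        # Time from connect() start; adjust to the draft's definition as needed.
        t0 = time.monotonic()
        s = socket.create_connection((host, port))
        s.sendall((f"GET {path} HTTP/1.1\r\nHost: {host}\r\n"
                   "Connection: close\r\n\r\n").encode())
        first = None
        while True:
            chunk = s.recv(65536)
            if not chunk:
                break                    # server closed: last byte has been seen
            if first is None:
                first = time.monotonic()
        last = time.monotonic()
        s.close()
        return first - t0, last - t0     # (TTFB, TTLB) in seconds

Minimum, average, and maximum would then be aggregated per object size over many
such samples.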

1561	7.4.4.  Test Procedures and Expected Results

1563	   The test procedure is designed to measure TTFB or TTLB when the DUT/
1564	   SUT is operating close to 50% of its maximum achievable connections
1565	   per second or inspected throughput.  The test procedure consists of
1566	   two major steps: Step 1 ensures the DUT/SUT is able to reach the
1567	   initial performance values and meets the test results validation
1568	   criteria when it was very minimally utilized.  Step 2 measures the
1569	   latency values within the test results validation criteria.

1571	   This test procedure MAY be repeated multiple times with different IP
1572	   types (IPv4 only, IPv6 only and IPv4 and IPv6 mixed traffic
1573	   distribution), HTTP response object sizes and single and multiple
1574	   transactions per connection scenarios.

1576	7.4.4.1.  Step 1: Test Initialization and Qualification

1578	   Verify the link status of all connected physical interfaces.  All
1579	   interfaces are expected to be in "UP" status.

1581	   Configure traffic load profile of the test equipment to establish
1582	   "Initial objective" as defined in Section 7.4.3.2.  The traffic load
1583	   profile SHOULD be defined as described in Section 4.3.4.

1585	   The DUT/SUT SHOULD reach the "Initial objective" before the sustain
1586	   phase.  The measured KPIs during the sustain phase MUST meet all the
1587	   test results validation criteria defined in Section 7.4.3.3.

1589	   If the KPI metrics do not meet the test results validation criteria,
1590	   the test procedure MUST NOT be continued to "Step 2".

1592	7.4.4.2.  Step 2: Test Run with Target Objective

1594	   Configure test equipment to establish "Target objective" defined in
1595	   Section 7.4.3.2.  The test equipment SHOULD follow the traffic load
1596	   profile definition as described in Section 4.3.4.

1598	   The test equipment SHOULD start to measure and record all specified
1599	   KPIs.  Continue the test until all traffic profile phases are
1600	   completed.

1602	   Within the test results validation criteria, the DUT/SUT MUST reach
1603	   the desired value of the target objective in the sustain phase.

1605	   Measure the minimum, average, and maximum values of TTFB and TTLB.

1607	7.5.  Concurrent TCP/HTTP Connection Capacity

[nit] again a summary comparison of the traffic in 7.5 vs. the prior traffic profiles
would be helpful to understand the benefit of these test runs. Is this about any
real-world requirement or more a synthetic performance number for unrealistic HTTP
connections (which would still be a useful number IMHO, just want to know) ?

The traffic profile below is somewhat strange because it defines the rate of GET
requests within a TCP connection based not on real-world application behavior, but
just to create some rate of GET per TCP connection over the steady state.
I guess the goal is something like "measure the maximum sustainable number of TCP/HTTP
connections, whereas each connection carries as little traffic as possible and a
sufficiently low number of HTTP (GET) transactions that the DUT is loaded not so much
with HTTP level inspection, but mostly with HTTP/TCP flow maintenance" ??

In general, describing for each of the 7.x sections upfront the goal and design criteria
of the test runs in those high-level terms is IMHO very beneficial for reviewers to
vet if/how well the detailed description does meet the goals. Otherwise one is somewhat
left puzzling about that question. Aka: enhance the 7.x.1 objective sections with that
amount of detail.

1609	7.5.1.  Objective

1611	   Determine the number of concurrent TCP connections that the DUT/ SUT
1612	   sustains when using HTTP traffic.

1614	7.5.2.  Test Setup

1616	   Testbed setup SHOULD be configured as defined in Section 4.  Any
1617	   specific testbed configuration changes (number of interfaces and
1618	   interface type, etc.)  MUST be documented.

1620	7.5.3.  Test Parameters

1622	   In this section, benchmarking test specific parameters SHOULD be
1623	   defined.

1625	7.5.3.1.  DUT/SUT Configuration Parameters

1627	   DUT/SUT parameters MUST conform to the requirements defined in
1628	   Section 4.2.  Any configuration changes for this specific
1629	   benchmarking test MUST be documented.

1631	7.5.3.2.  Test Equipment Configuration Parameters

1633	   Test equipment configuration parameters MUST conform to the
1634	   requirements defined in Section 4.3.  The following parameters MUST
1635	   be noted for this benchmarking test:

1637	      Client IP address range defined in Section 4.3.1.2

1639	      Server IP address range defined in Section 4.3.2.2

1641	      Traffic distribution ratio between IPv4 and IPv6 defined in
1642	      Section 4.3.1.2

1644	      Target concurrent connection: Initial value from product datasheet
1645	      or the value defined based on requirement for a specific
1646	      deployment scenario.

1648	      Initial concurrent connection: 10% of "Target concurrent
1649	      connection" Note: Initial concurrent connection is not a KPI to
1650	      report.  This value is configured on the traffic generator and
1651	      used to perform the Step1: "Test Initialization and Qualification"
1652	      described under the Section 7.5.4.

1654	      Maximum connections per second during ramp up phase: 50% of
1655	      maximum connections per second measured in benchmarking test TCP/
1656	      HTTP Connections per second (Section 7.2)

1658	      Ramp up time (in traffic load profile for "Target concurrent
1659	      connection"): "Target concurrent connection" / "Maximum
1660	      connections per second during ramp up phase"

1662	      Ramp up time (in traffic load profile for "Initial concurrent
1663	      connection"): "Initial concurrent connection" / "Maximum
1664	      connections per second during ramp up phase"

1666	   The client MUST negotiate HTTP and each client MAY open multiple
1667	   concurrent TCP connections per server endpoint IP.

1669	   Each client sends 10 GET requests requesting 1 KByte HTTP response
1670	   object in the same TCP connection (10 transactions/TCP connection)
1671	   and the delay (think time) between each transaction MUST be X
1672	   seconds.

1674	   X = ("Ramp up time" + "steady state time") /10

1676	   The established connections SHOULD remain open until the ramp down
1677	   phase of the test.  During the ramp down phase, all connections
1678	   SHOULD be successfully closed with FIN.
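
[comment] A worked example of the ramp up and think time arithmetic above
(illustrative sketch, hypothetical inputs):

    # Parameter derivation for Section 7.5.3.2; numbers are hypothetical.
    target_cc = 1_000_000         # Target concurrent connections
    ramp_cps = 50_000             # 50% of CPS measured in Section 7.2
    steady_state_time = 300.0     # seconds, from the traffic load profile

    ramp_up_time = target_cc / ramp_cps                     # 20 s in this example
    think_time = (ramp_up_time + steady_state_time) / 10    # X = 32 s

    # 10 transactions spaced X seconds apart keep every connection open
    # across ramp up plus steady state, as the text above intends.
    print(f"ramp up {ramp_up_time:.0f} s, think time X = {think_time:.1f} s")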

1680	7.5.3.3.  Test Results Validation Criteria

1682	   The following criteria are the test results validation criteria.  The
1683	   Test results validation criteria MUST be monitored during the whole
1684	   sustain phase of the traffic load profile.

1686	   a.  Number of failed application transactions (receiving any HTTP
1687	       response code other than 200 OK) MUST be less than 0.001% (1 out
1688	       of 100,000 transactions) of total attempted transactions.

1690	   b.  Number of terminated TCP connections due to unexpected TCP RST
1691	       sent by DUT/SUT MUST be less than 0.001% (1 out of 100,000
1692	       connections) of total initiated TCP connections.

1694	   c.  During the sustain phase, traffic SHOULD be forwarded at a
1695	       constant rate (considered as a constant rate if any deviation of
1696	       traffic forwarding rate is less than 5%).
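
[comment] The three criteria reduce to simple checks over sustain-phase counters,
e.g. (sketch; names hypothetical, deviation taken here against the sustain-phase
mean):

    def criteria_met(failed_txn, total_txn, rst_conns, total_conns, rate_samples):
        ok_a = failed_txn / total_txn < 1e-5     # a: < 0.001% failed transactions
        ok_b = rst_conns / total_conns < 1e-5    # b: < 0.001% unexpected RSTs
        mean = sum(rate_samples) / len(rate_samples)
        ok_c = all(abs(r - mean) / mean < 0.05 for r in rate_samples)  # c: < 5%
        return ok_a and ok_b and ok_c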

1698	7.5.3.4.  Measurement

1700	   Average Concurrent TCP Connections MUST be reported for this
1701	   benchmarking test.

1703	7.5.4.  Test Procedures and Expected Results

1705	   The test procedure is designed to measure the concurrent TCP
1706	   connection capacity of the DUT/SUT at the sustaining period of
1707	   traffic load profile.  The test procedure consists of three major
1708	   steps: Step 1 ensures the DUT/SUT is able to reach the performance
1709	   value (Initial concurrent connection) and meets the test results
1710	   validation criteria when it was very minimally utilized.  Step 2
1711	   determines the DUT/SUT is able to reach the target performance value
1712	   within the test results validation criteria.  Step 3 determines the
1713	   maximum achievable performance value within the test results
1714	   validation criteria.

1716	   This test procedure MAY be repeated multiple times with different
1717	   IPv4 and IPv6 traffic distribution.

1719	7.5.4.1.  Step 1: Test Initialization and Qualification

1721	   Verify the link status of all connected physical interfaces.  All
1722	   interfaces are expected to be in "UP" status.

1724	   Configure test equipment to establish "Initial concurrent TCP
1725	   connections" defined in Section 7.5.3.2.  Except ramp up time, the
1726	   traffic load profile SHOULD be defined as described in Section 4.3.4.

1728	   During the sustain phase, the DUT/SUT SHOULD reach the "Initial
1729	   concurrent TCP connections".  The measured KPIs during the sustain
1730	   phase MUST meet all the test results validation criteria defined in
1731	   Section 7.5.3.3.

1733	   If the KPI metrics do not meet the test results validation criteria,
1734	   the test procedure MUST NOT be continued to "Step 2".

1736	7.5.4.2.  Step 2: Test Run with Target Objective

1738	   Configure test equipment to establish the target objective ("Target
1739	   concurrent TCP connections").  The test equipment SHOULD follow the
1740	   traffic load profile definition (except ramp up time) as described in
1741	   Section 4.3.4.

1743	   During the ramp up and sustain phase, the other KPIs such as
1744	   inspected throughput, TCP connections per second, and application
1745	   transactions per second MUST NOT reach the maximum value the DUT/SUT
1746	   can support.

1748	   The test equipment SHOULD start to measure and record KPIs defined in
1749	   Section 7.5.3.4.  Continue the test until all traffic profile phases
1750	   are completed.

1752	   Within the test results validation criteria, the DUT/SUT is expected
1753	   to reach the desired value of the target objective in the sustain
1754	   phase.  Follow step 3, if the measured value does not meet the target
1755	   value or does not fulfill the test results validation criteria.

1757	7.5.4.3.  Step 3: Test Iteration

1759	   Determine the achievable concurrent TCP connections capacity within
1760	   the test results validation criteria.

1762	7.6.  TCP/HTTPS Connections per Second

[minor] The one big performance factor that I think is not documented or suggested
to be compared is the cost of certificate (chain) validation for different key-length
certificates used for the TCP/HTTPS connections. The parameters for TLS 1.2 and TLS 1.3
mentioned earlier in the document do not cover that.  I think it would be prudent
to figure out an Internet common minimum (fastest to process) certificate and a
common maximum complexity certificate. The latter one may simply be when revocation
is enabled, e.g.: checking the server certificate against a revocation list.

Just saying because server certificate verification may monopolise connection setup
performance - unless you want to make the argument that it is irrelevant because,
due to the limited number of servers in the test, the DUT is assumed/known to be able
to cache server certificate validation results during the ramp up phase, so it becomes
irrelevant during the steady state phase. But it would be at least good to describe this
in text.
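
As an illustrative client-side sketch of the kind of comparison I mean (the DUT's
validation cost will differ, but the variability is the point; file names are
hypothetical):

    import socket, ssl, time

    def handshake_time(host, port, check_crl=False):
        ctx = ssl.create_default_context(cafile="test-ca.pem")
        if check_crl:
            # CRLs in PEM form load the same way; verify_flags must be set.
            ctx.load_verify_locations("test-ca-crl.pem")
            ctx.verify_flags |= ssl.VERIFY_CRL_CHECK_LEAF
        t0 = time.monotonic()
        with socket.create_connection((host, port)) as raw:
            with ctx.wrap_socket(raw, server_hostname=host):
                pass                 # handshake completes inside wrap_socket
        return time.monotonic() - t0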


1763	7.6.1.  Objective

1765	   Using HTTPS traffic, determine the sustainable SSL/TLS session
1766	   establishment rate supported by the DUT/SUT under different
1767	   throughput load conditions.

1769	   Test iterations MUST include common cipher suites and key strengths
1770	   as well as forward looking stronger keys.  Specific test iterations
1771	   MUST include ciphers and keys defined in Section 7.6.3.2.

1773	   For each cipher suite and key strength, test iterations MUST use a
1774	   single HTTPS response object size defined in Section 7.6.3.2 to
1775	   measure connections per second performance under a variety of DUT/SUT
1776	   security inspection load conditions.

1778	7.6.2.  Test Setup

1780	   Testbed setup SHOULD be configured as defined in Section 4.  Any
1781	   specific testbed configuration changes (number of interfaces and
1782	   interface type, etc.)  MUST be documented.

1784	7.6.3.  Test Parameters

1786	   In this section, benchmarking test specific parameters SHOULD be
1787	   defined.

1789	7.6.3.1.  DUT/SUT Configuration Parameters

1791	   DUT/SUT parameters MUST conform to the requirements defined in
1792	   Section 4.2.  Any configuration changes for this specific
1793	   benchmarking test MUST be documented.

1795	7.6.3.2.  Test Equipment Configuration Parameters

1797	   Test equipment configuration parameters MUST conform to the
1798	   requirements defined in Section 4.3.  The following parameters MUST
1799	   be documented for this benchmarking test:

1801	   Client IP address range defined in Section 4.3.1.2

1803	   Server IP address range defined in Section 4.3.2.2

1805	   Traffic distribution ratio between IPv4 and IPv6 defined in
1806	   Section 4.3.1.2

1808	   Target connections per second: Initial value from product datasheet
1809	   or the value defined based on requirement for a specific deployment
1810	   scenario.

1812	   Initial connections per second: 10% of "Target connections per
1813	   second" Note: Initial connections per second is not a KPI to report.
1814	   This value is configured on the traffic generator and used to perform
1815	   the Step1: "Test Initialization and Qualification" described under
1816	   the Section 7.6.4.

1818	   RECOMMENDED ciphers and keys defined in Section 4.3.1.3

1820	   The client MUST negotiate HTTPS and close the connection with FIN
1821	   immediately after completion of one transaction.  In each test
1822	   iteration, client MUST send GET request requesting a fixed HTTPS
1823	   response object size.  The RECOMMENDED object sizes are 1, 2, 4, 16,
1824	   and 64 KByte.
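
[comment] One client iteration of this test amounts to: TCP connect, TLS
handshake, one GET, drain the response, close with FIN. A minimal sketch
(illustrative only; verification is disabled because a lab DUT with test
certificates is assumed):

    import socket, ssl, time

    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE          # lab test certificates assumed

    def one_https_connection(host, port, path="/1kb.bin"):
        t0 = time.monotonic()
        with socket.create_connection((host, port)) as raw:
            with ctx.wrap_socket(raw, server_hostname=host) as tls:
                hs_done = time.monotonic()   # input to the TLS Handshake Rate KPI
                tls.sendall((f"GET {path} HTTP/1.1\r\nHost: {host}\r\n"
                             "Connection: close\r\n\r\n").encode())
                while tls.recv(65536):
                    pass                     # drain the fixed-size response
        return hs_done - t0                  # per-connection setup time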

1826	7.6.3.3.  Test Results Validation Criteria

1828	   The following criteria are the test results validation criteria.  The
1829	   test results validation criteria MUST be monitored during the whole
1830	   test duration.

1832	   a.  Number of failed application transactions (receiving any HTTP
1833	       response code other than 200 OK) MUST be less than 0.001% (1 out
1834	       of 100,000 transactions) of attempted transactions.

1836	   b.  Number of terminated TCP connections due to unexpected TCP RST
1837	       sent by DUT/SUT MUST be less than 0.001% (1 out of 100,000
1838	       connections) of total initiated TCP connections.

1840	   c.  During the sustain phase, traffic SHOULD be forwarded at a
1841	       constant rate (considered as a constant rate if any deviation of
1842	       traffic forwarding rate is less than 5%).

1844	   d.  Concurrent TCP connections MUST be constant during steady state
1845	       and any deviation of concurrent TCP connections SHOULD be less
1846	       than 10%. This confirms the DUT opens and closes TCP connections
1847	       at approximately the same rate.

1849	7.6.3.4.  Measurement

1851	   TCP connections per second MUST be reported for each test iteration
1852	   (for each object size).

1854	   The KPI metric TLS Handshake Rate can be measured in the test using 1
1855	   KByte object size.

1857	7.6.4.  Test Procedures and Expected Results

1859	   The test procedure is designed to measure the TCP connections per
1860	   second rate of the DUT/SUT at the sustaining period of traffic load
1861	   profile.  The test procedure consists of three major steps: Step 1
1862	   ensures the DUT/SUT is able to reach the performance value (Initial
1863	   connections per second) and meets the test results validation
1864	   criteria when it was very minimally utilized.  Step 2 determines the
1865	   DUT/SUT is able to reach the target performance value within the test
1866	   results validation criteria.  Step 3 determines the maximum
1867	   achievable performance value within the test results validation
1868	   criteria.

1870	   This test procedure MAY be repeated multiple times with different
1871	   IPv4 and IPv6 traffic distribution.

1873	7.6.4.1.  Step 1: Test Initialization and Qualification

1875	   Verify the link status of all connected physical interfaces.  All
1876	   interfaces are expected to be in "UP" status.

1878	   Configure traffic load profile of the test equipment to establish
1879	   "Initial connections per second" as defined in Section 7.6.3.2.  The
1880	   traffic load profile SHOULD be defined as described in Section 4.3.4.

1882	   The DUT/SUT SHOULD reach the "Initial connections per second" before
1883	   the sustain phase.  The measured KPIs during the sustain phase MUST
1884	   meet all the test results validation criteria defined in
1885	   Section 7.6.3.3.

1887	   If the KPI metrics do not meet the test results validation criteria,
1888	   the test procedure MUST NOT be continued to "Step 2".

1890	7.6.4.2.  Step 2: Test Run with Target Objective

1892	   Configure test equipment to establish "Target connections per second"
1893	   defined in Section 7.6.3.2.  The test equipment SHOULD follow the
1894	   traffic load profile definition as described in Section 4.3.4.

1896	   During the ramp up and sustain phase, other KPIs such as inspected
1897	   throughput, concurrent TCP connections, and application transactions
1898	   per second MUST NOT reach the maximum value the DUT/SUT can support.
1899	   The test results for specific test iteration SHOULD NOT be reported,
1900	   if the above mentioned KPI (especially inspected throughput) reaches
1901	   the maximum value.  (Example: If the test iteration with 64 KByte of
1902	   HTTPS response object size reached the maximum inspected throughput
1903	   limitation of the DUT, the test iteration MAY be interrupted and the
1904	   result for 64 KByte SHOULD NOT be reported).

1906	   The test equipment SHOULD start to measure and record all specified
1907	   KPIs.  Continue the test until all traffic profile phases are
1908	   completed.

1910	   Within the test results validation criteria, the DUT/SUT is expected
1911	   to reach the desired value of the target objective ("Target
1912	   connections per second") in the sustain phase.  Follow step 3, if the
1913	   measured value does not meet the target value or does not fulfill the
1914	   test results validation criteria.

1916	7.6.4.3.  Step 3: Test Iteration

1918	   Determine the achievable connections per second within the test
1919	   results validation criteria.
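
[comment] The draft leaves the Step 3 search strategy open; a binary search
between the last passing and first failing load is one plausible approach
(sketch; run_test() is a hypothetical harness hook that returns True when all
validation criteria are met):

    def step3_max_achievable(passing, failing, run_test, resolution=0.01):
        # passing: highest load known to pass; failing: load that failed Step 2
        while (failing - passing) / failing > resolution:
            mid = (passing + failing) / 2
            if run_test(mid):
                passing = mid
            else:
                failing = mid
        return passing       # maximum load meeting the validation criteria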

1921	7.7.  HTTPS Throughput

1923	7.7.1.  Objective

1925	   Determine the sustainable inspected throughput of the DUT/SUT for
1926	   HTTPS transactions varying the HTTPS response object size.

1928	   Test iterations MUST include common cipher suites and key strengths
1929	   as well as forward looking stronger keys.  Specific test iterations
1930	   MUST include the ciphers and keys defined in Section 7.7.3.2.

1932	7.7.2.  Test Setup

1934	   Testbed setup SHOULD be configured as defined in Section 4.  Any
1935	   specific testbed configuration changes (number of interfaces and
1936	   interface type, etc.)  MUST be documented.

1938	7.7.3.  Test Parameters

1940	   In this section, benchmarking test specific parameters SHOULD be
1941	   defined.

1943	7.7.3.1.  DUT/SUT Configuration Parameters

1945	   DUT/SUT parameters MUST conform to the requirements defined in
1946	   Section 4.2.  Any configuration changes for this specific
1947	   benchmarking test MUST be documented.

1949	7.7.3.2.  Test Equipment Configuration Parameters

1951	   Test equipment configuration parameters MUST conform to the
1952	   requirements defined in Section 4.3.  The following parameters MUST
1953	   be documented for this benchmarking test:

1955	   Client IP address range defined in Section 4.3.1.2

1957	   Server IP address range defined in Section 4.3.2.2

1959	   Traffic distribution ratio between IPv4 and IPv6 defined in
1960	   Section 4.3.1.2

1962	   Target inspected throughput: Aggregated line rate of interface(s)
1963	   used in the DUT/SUT or the value defined based on requirement for a
1964	   specific deployment scenario.

1966	   Initial throughput: 10% of "Target inspected throughput" Note:
1967	   Initial throughput is not a KPI to report.  This value is configured
1968	   on the traffic generator and used to perform the Step1: "Test
1969	   Initialization and Qualification" described under the Section 7.7.4.

1971	   Number of HTTPS response object requests (transactions) per
1972	   connection: 10

1974	   RECOMMENDED ciphers and keys defined in Section 4.3.1.3

1976	   RECOMMENDED HTTPS response object size: 1, 16, 64, 256 KByte, and
1977	   mixed objects defined in Table 4 under Section 7.3.3.2.

1979	7.7.3.3.  Test Results Validation Criteria

1981	   The following criteria are the test results validation criteria.  The
1982	   test results validation criteria MUST be monitored during the whole
1983	   sustain phase of the traffic load profile.

1985	   a.  Number of failed application transactions (receiving any HTTP
1986	       response code other than 200 OK) MUST be less than 0.001% (1 out
1987	       of 100,000 transactions) of attempted transactions.

1989	   b.  Traffic SHOULD be forwarded at a constant rate (considered as a
1990	       constant rate if any deviation of traffic forwarding rate is less
1991	       than 5%).

1993	   c.  Concurrent TCP connections MUST be constant during steady state
1994	       and any deviation of concurrent TCP connections SHOULD be less
1995	       than 10%. This confirms the DUT opens and closes TCP connections
1996	       at approximately the same rate.

1998	7.7.3.4.  Measurement

2000	   Inspected Throughput and HTTP Transactions per Second MUST be
2001	   reported for each object size.

2003	7.7.4.  Test Procedures and Expected Results

2005	   The test procedure consists of three major steps: Step 1 ensures the
2006	   DUT/SUT is able to reach the performance value (Initial throughput)
2007	   and meets the test results validation criteria when it was very
2008	   minimally utilized.  Step 2 determines the DUT/SUT is able to reach
2009	   the target performance value within the test results validation
2010	   criteria.  Step 3 determines the maximum achievable performance value
2011	   within the test results validation criteria.

2013	   This test procedure MAY be repeated multiple times with different
2014	   IPv4 and IPv6 traffic distribution and HTTPS response object sizes.

2016	7.7.4.1.  Step 1: Test Initialization and Qualification

2018	   Verify the link status of all connected physical interfaces.  All
2019	   interfaces are expected to be in "UP" status.

2021	   Configure traffic load profile of the test equipment to establish
2022	   "Initial throughput" as defined in Section 7.7.3.2.

2024	   The traffic load profile SHOULD be defined as described in
2025	   Section 4.3.4.  The DUT/SUT SHOULD reach the "Initial throughput"
2026	   during the sustain phase.  Measure all KPI as defined in
2027	   Section 7.7.3.4.

2029	   The measured KPIs during the sustain phase MUST meet the test results
2030	   validation criteria "a" defined in Section 7.7.3.3.  The test results
2031	   validation criteria "b" and "c" are OPTIONAL for step 1.

2033	   If the KPI metrics do not meet the test results validation criteria,
2034	   the test procedure MUST NOT be continued to "Step 2".

2036	7.7.4.2.  Step 2: Test Run with Target Objective

2038	   Configure test equipment to establish the target objective ("Target
2039	   inspected throughput") defined in Section 7.7.3.2.  The test
2040	   equipment SHOULD start to measure and record all specified KPIs.
2041	   Continue the test until all traffic profile phases are completed.

2043	   Within the test results validation criteria, the DUT/SUT is expected
2044	   to reach the desired value of the target objective in the sustain
2045	   phase.  Follow step 3, if the measured value does not meet the target
2046	   value or does not fulfill the test results validation criteria.

2048	7.7.4.3.  Step 3: Test Iteration

2050	   Determine the achievable average inspected throughput within the test
2051	   results validation criteria.  Final test iteration MUST be performed
2052	   for the test duration defined in Section 4.3.4.

2054	7.8.  HTTPS Transaction Latency

2056	7.8.1.  Objective

2058	   Using HTTPS traffic, determine the HTTPS transaction latency when
2059	   DUT/SUT is running with sustainable HTTPS transactions per second
2060	   supported by the DUT/SUT under different HTTPS response object size.

2062	   Scenario 1: The client MUST negotiate HTTPS and close the connection
2063	   with FIN immediately after completion of a single transaction (GET
2064	   and RESPONSE).

2066	   Scenario 2: The client MUST negotiate HTTPS and close the connection
2067	   with FIN immediately after completion of 10 transactions (GET and
2068	   RESPONSE) within a single TCP connection.

2070	7.8.2.  Test Setup

2072	   Testbed setup SHOULD be configured as defined in Section 4.  Any
2073	   specific testbed configuration changes (number of interfaces and
2074	   interface type, etc.)  MUST be documented.

2076	7.8.3.  Test Parameters

2078	   In this section, benchmarking test specific parameters SHOULD be
2079	   defined.

2081	7.8.3.1.  DUT/SUT Configuration Parameters

2083	   DUT/SUT parameters MUST conform to the requirements defined in
2084	   Section 4.2.  Any configuration changes for this specific
2085	   benchmarking test MUST be documented.

2087	7.8.3.2.  Test Equipment Configuration Parameters

2089	   Test equipment configuration parameters MUST conform to the
2090	   requirements defined in Section 4.3.  The following parameters MUST
2091	   be documented for this benchmarking test:

2093	   Client IP address range defined in Section 4.3.1.2

2095	   Server IP address range defined in Section 4.3.2.2
2096	   Traffic distribution ratio between IPv4 and IPv6 defined in
2097	   Section 4.3.1.2

2099	   RECOMMENDED cipher suites and key sizes defined in Section 4.3.1.3

2101	   Target objective for scenario 1: 50% of the connections per second
2102	   measured in benchmarking test TCP/HTTPS Connections per second
2103	   (Section 7.6)

2105	   Target objective for scenario 2: 50% of the inspected throughput
2106	   measured in benchmarking test HTTPS Throughput (Section 7.7)

2108	   Initial objective for scenario 1: 10% of "Target objective for
2109	   scenario 1"

2111	   Initial objective for scenario 2: 10% of "Target objective for
2112	   scenario 2"

2114	   Note: The Initial objectives are not a KPI to report.  These values
2115	   are configured on the traffic generator and used to perform the
2116	   Step1: "Test Initialization and Qualification" described under the
2117	   Section 7.8.4.

2119	   HTTPS transaction per TCP connection: Test scenario 1 with single
2120	   transaction and scenario 2 with 10 transactions

2122	   HTTPS with GET request requesting a single object.  The RECOMMENDED
2123	   object sizes are 1, 16, and 64 KByte.  For each test iteration,
2124	   client MUST request a single HTTPS response object size.

2126	7.8.3.3.  Test Results Validation Criteria

2128	   The following criteria are the test results validation criteria.  The
2129	   Test results validation criteria MUST be monitored during the whole
2130	   sustain phase of the traffic load profile.

2132	   a.  Number of failed application transactions (receiving any HTTP
2133	       response code other than 200 OK) MUST be less than 0.001% (1 out
2134	       of 100,000 transactions) of attempted transactions.

2136	   b.  Number of terminated TCP connections due to unexpected TCP RST
2137	       sent by DUT/SUT MUST be less than 0.001% (1 out of 100,000
2138	       connections) of total initiated TCP connections.

2140	   c.  During the sustain phase, traffic SHOULD be forwarded at a
2141	       constant rate (considered as a constant rate if any deviation of
2142	       traffic forwarding rate is less than 5%).

2144	   d.  Concurrent TCP connections MUST be constant during steady state
2145	       and any deviation of concurrent TCP connections SHOULD be less
2146	       than 10%. This confirms the DUT opens and closes TCP connections
2147	       at approximately the same rate.

2149	   e.  After ramp up the DUT/SUT MUST achieve the "Target objective"
2150	       defined in the parameter Section 7.8.3.2 and remain in that state
2151	       for the entire test duration (sustain phase).

2153	7.8.3.4.  Measurement

2155	   TTFB (minimum, average, and maximum) and TTLB (minimum, average and
2156	   maximum) MUST be reported for each object size.

2158	7.8.4.  Test Procedures and Expected Results

2160	   The test procedure is designed to measure TTFB or TTLB when the DUT/
2161	   SUT is operating close to 50% of its maximum achievable connections
2162	   per second or inspected throughput.  The test procedure consists of
2163	   two major steps: Step 1 ensures the DUT/SUT is able to reach the
2164	   initial performance values and meets the test results validation
2165	   criteria when it was very minimally utilized.  Step 2 measures the
2166	   latency values within the test results validation criteria.

2168	   This test procedure MAY be repeated multiple times with different IP
2169	   types (IPv4 only, IPv6 only and IPv4 and IPv6 mixed traffic
2170	   distribution), HTTPS response object sizes and single, and multiple
2171	   transactions per connection scenarios.

2173	7.8.4.1.  Step 1: Test Initialization and Qualification

2175	   Verify the link status of all connected physical interfaces.  All
2176	   interfaces are expected to be in "UP" status.

2178	   Configure traffic load profile of the test equipment to establish
2179	   "Initial objective" as defined in the Section 7.8.3.2.  The traffic
2180	   load profile SHOULD be defined as described in Section 4.3.4.

2182	   The DUT/SUT SHOULD reach the "Initial objective" before the sustain
2183	   phase.  The measured KPIs during the sustain phase MUST meet all the
2184	   test results validation criteria defined in Section 7.8.3.3.

2186	   If the KPI metrics do not meet the test results validation criteria,
2187	   the test procedure MUST NOT be continued to "Step 2".

2189	7.8.4.2.  Step 2: Test Run with Target Objective

2191	   Configure test equipment to establish "Target objective" defined in
2192	   Section 7.8.3.2.  The test equipment SHOULD follow the traffic load
2193	   profile definition as described in Section 4.3.4.

2195	   The test equipment SHOULD start to measure and record all specified
2196	   KPIs.  Continue the test until all traffic profile phases are
2197	   completed.

2199	   Within the test results validation criteria, the DUT/SUT MUST reach
2200	   the desired value of the target objective in the sustain phase.

2202	   Measure the minimum, average, and maximum values of TTFB and TTLB.

2204	7.9.  Concurrent TCP/HTTPS Connection Capacity

2206	7.9.1.  Objective

2208	   Determine the number of concurrent TCP connections the DUT/SUT
2209	   sustains when using HTTPS traffic.

2211	7.9.2.  Test Setup

2213	   Testbed setup SHOULD be configured as defined in Section 4.  Any
2214	   specific testbed configuration changes (number of interfaces and
2215	   interface type, etc.)  MUST be documented.

2217	7.9.3.  Test Parameters

2219	   In this section, benchmarking test specific parameters SHOULD be
2220	   defined.

2222	7.9.3.1.  DUT/SUT Configuration Parameters

2224	   DUT/SUT parameters MUST conform to the requirements defined in
2225	   Section 4.2.  Any configuration changes for this specific
2226	   benchmarking test MUST be documented.

2228	7.9.3.2.  Test Equipment Configuration Parameters

2230	   Test equipment configuration parameters MUST conform to the
2231	   requirements defined in Section 4.3.  The following parameters MUST
2232	   be documented for this benchmarking test:

2234	      Client IP address range defined in Section 4.3.1.2

2236	      Server IP address range defined in Section 4.3.2.2
2237	      Traffic distribution ratio between IPv4 and IPv6 defined in
2238	      Section 4.3.1.2

2240	      RECOMMENDED cipher suites and key sizes defined in Section 4.3.1.3

2242	      Target concurrent connections: Initial value from product
2243	      datasheet or the value defined based on requirement for a specific
2244	      deployment scenario.

2246	      Initial concurrent connections: 10% of "Target concurrent
2247	      connections" Note: Initial concurrent connection is not a KPI to
2248	      report.  This value is configured on the traffic generator and
2249	      used to perform the Step1: "Test Initialization and Qualification"
2250	      described under the Section 7.9.4.

2252	      Connections per second during ramp up phase: 50% of maximum
2253	      connections per second measured in benchmarking test TCP/HTTPS
2254	      Connections per second (Section 7.6)

2256	      Ramp up time (in traffic load profile for "Target concurrent
2257	      connections"): "Target concurrent connections" / "Maximum
2258	      connections per second during ramp up phase"

2260	      Ramp up time (in traffic load profile for "Initial concurrent
2261	      connections"): "Initial concurrent connections" / "Maximum
2262	      connections per second during ramp up phase"

2264	   The client MUST perform HTTPS transaction with persistence and each
2265	   client can open multiple concurrent TCP connections per server
2266	   endpoint IP.

2268	   Each client sends 10 GET requests requesting 1 KByte HTTPS response
2269	   objects in the same TCP connection (10 transactions/TCP connection)
2270	   and the delay (think time) between each transaction MUST be X
2271	   seconds.

2273	   X = ("Ramp up time" + "steady state time") /10

2275	   The established connections SHOULD remain open until the ramp down
2276	   phase of the test.  During the ramp down phase, all connections
2277	   SHOULD be successfully closed with FIN.

2279	7.9.3.3.  Test Results Validation Criteria

2281	   The following criteria are the test results validation criteria.  The
2282	   Test results validation criteria MUST be monitored during the whole
2283	   sustain phase of the traffic load profile.

2285	   a.  Number of failed application transactions (receiving any HTTP
2286	       response code other than 200 OK) MUST be less than 0.001% (1 out
2287	       of 100,000 transactions) of total attempted transactions.

2289	   b.  Number of terminated TCP connections due to unexpected TCP RST
2290	       sent by DUT/SUT MUST be less than 0.001% (1 out of 100,000
2291	       connections) of total initiated TCP connections.

2293	   c.  During the sustain phase, traffic SHOULD be forwarded at a
2294	       constant rate (considered as a constant rate if any deviation of
2295	       traffic forwarding rate is less than 5%).

2297	7.9.3.4.  Measurement

2299	   Average Concurrent TCP Connections MUST be reported for this
2300	   benchmarking test.

2302	7.9.4.  Test Procedures and Expected Results

2304	   The test procedure is designed to measure the concurrent TCP
2305	   connection capacity of the DUT/SUT at the sustaining period of
2306	   traffic load profile.  The test procedure consists of three major
2307	   steps: Step 1 ensures the DUT/SUT is able to reach the performance
2308	   value (Initial concurrent connection) and meets the test results
2309	   validation criteria when it was very minimally utilized.  Step 2
2310	   determines the DUT/SUT is able to reach the target performance value
2311	   within the test results validation criteria.  Step 3 determines the
2312	   maximum achievable performance value within the test results
2313	   validation criteria.

2315	   This test procedure MAY be repeated multiple times with different
2316	   IPv4 and IPv6 traffic distribution.

2318	7.9.4.1.  Step 1: Test Initialization and Qualification

2320	   Verify the link status of all connected physical interfaces.  All
2321	   interfaces are expected to be in "UP" status.

2323	   Configure test equipment to establish "Initial concurrent TCP
2324	   connections" defined in Section 7.9.3.2.  Except ramp up time, the
2325	   traffic load profile SHOULD be defined as described in Section 4.3.4.

2327	   During the sustain phase, the DUT/SUT SHOULD reach the "Initial
2328	   concurrent TCP connections".  The measured KPIs during the sustain
2329	   phase MUST meet the test results validation criteria "a" and "b"
2330	   defined in Section 7.9.3.3.

2332	   If the KPI metrics do not meet the test results validation criteria,
2333	   the test procedure MUST NOT be continued to "Step 2".

2335	7.9.4.2.  Step 2: Test Run with Target Objective

2337	   Configure test equipment to establish the target objective ("Target
2338	   concurrent TCP connections").  The test equipment SHOULD follow the
2339	   traffic load profile definition (except ramp up time) as described in
2340	   Section 4.3.4.

2342	   During the ramp up and sustain phase, the other KPIs such as
2343	   inspected throughput, TCP connections per second, and application
2344	   transactions per second MUST NOT reach the maximum value that the
2345	   DUT/SUT can support.

2347	   The test equipment SHOULD start to measure and record KPIs defined in
2348	   Section 7.9.3.4.  Continue the test until all traffic profile phases
2349	   are completed.

2351	   Within the test results validation criteria, the DUT/SUT is expected
2352	   to reach the desired value of the target objective in the sustain
2353	   phase.  Follow step 3, if the measured value does not meet the target
2354	   value or does not fulfill the test results validation criteria.

2356	7.9.4.3.  Step 3: Test Iteration

2358	   Determine the achievable concurrent TCP connections within the test
2359	   results validation criteria.

[major] I would really love to see DUT power consumption numbers captured and reported
for the 10% and the maximum achieved rates for the 7.x tests (during steady state).

Energy consumption is becoming a more and more important factor in networking, and the
high-touch operations of security devices are amongst the most power/compute hungry
operations of any network device, with a wide variety depending on how it is implemented.
It is also extremely simple to just plug a power meter into the supply line of the DUT.

This would encourage DUT vendors to reduce power consumption, something that often
can be achieved by just selecting appropriate components (lowest power CPU options, going
FPGA etc. routes).

Personally, I am of course also interested in easily derived performance factors such as
comparing 100% power consumption for the HTTP vs. HTTPS case - the cost of end-to-end
security, that is. If a DUT shows line rate for both HTTP and HTTPS, but with double the
power consumption when using HTTPS, that may even impact deployment - even in small
deployments with a single 19" rack, limited ventilation, and a limited power budget, it
makes a difference whether the DUT draws 100 or 500 W.
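
The derived metric would be trivial to compute, e.g. (hypothetical numbers):

    # Watts per inspected Gbit/s, HTTP vs. HTTPS; numbers are made up.
    http_gbps, http_watts = 40.0, 150.0
    https_gbps, https_watts = 40.0, 310.0

    http_eff = http_watts / http_gbps       # 3.75 W per Gbit/s
    https_eff = https_watts / https_gbps    # 7.75 W per Gbit/s
    print(f"HTTPS costs {https_eff / http_eff:.1f}x the power per Gbit/s here")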


2361	8.  IANA Considerations

2363	   This document makes no specific request of IANA.

2365	   The IANA has assigned IPv4 and IPv6 address blocks in [RFC6890] that
2366	   have been registered for special purposes.  The IPv6 address block
2367	   2001:2::/48 has been allocated for the purpose of IPv6 Benchmarking
2368	   [RFC5180] and the IPv4 address block 198.18.0.0/15 has been allocated
2369	   for the purpose of IPv4 Benchmarking [RFC2544].  This assignment was
2370	   made to minimize the chance of conflict in case a testing device were
2371	   to be accidentally connected to part of the Internet.

[minor] I don't think the second paragraph belongs in an IANA considerations
section. This section is usually reserved only for actions IANA is supposed to
take for this document. I would suggest moving this paragraph to an earlier
section, maybe even simply making one up: "Addressing for tests".
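
For test implementers, the two special-purpose blocks are easy to consume
programmatically, e.g. (illustrative):

    import ipaddress

    v4_bench = ipaddress.ip_network("198.18.0.0/15")   # RFC 2544 / RFC 6890
    v6_bench = ipaddress.ip_network("2001:2::/48")     # RFC 5180

    # Hypothetical client/server sub-ranges carved from the IPv4 block.
    clients = ipaddress.ip_network("198.18.0.0/24")
    servers = ipaddress.ip_network("198.19.0.0/24")
    assert clients.subnet_of(v4_bench) and servers.subnet_of(v4_bench)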

2373	9.  Security Considerations

2375	   The primary goal of this document is to provide benchmarking
2376	   terminology and methodology for next-generation network security
2377	   devices for use in a laboratory isolated test environment.  However,
2378	   readers should be aware that there is some overlap between
2379	   performance and security issues.  Specifically, the optimal
2380	   configuration for network security device performance may not be the
2381	   most secure, and vice-versa.  The cipher suites recommended in this
2382	   document are for test purpose only.  The cipher suite recommendation
2383	   for a real deployment is outside the scope of this document.

2385	10.  Contributors

2387	   The following individuals contributed significantly to the creation
2388	   of this document:

2390	   Alex Samonte, Amritam Putatunda, Aria Eslambolchizadeh, Chao Guo,
2391	   Chris Brown, Cory Ford, David DeSanto, Jurrie Van Den Breekel,
2392	   Michelle Rhines, Mike Jack, Ryan Liles, Samaresh Nair, Stephen
2393	   Goudreault, Tim Carlin, and Tim Otto.

2395	11.  Acknowledgements

2397	   The authors wish to acknowledge the members of NetSecOPEN for their
2398	   participation in the creation of this document.  Additionally, the
2399	   following members need to be acknowledged:

2401	   Anand Vijayan, Chris Marshall, Jay Lindenauer, Michael Shannon, Mike
2402	   Deichman, Ryan Riese, and Toulnay Orkun.

2404	12.  References

2406	12.1.  Normative References

2408	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
2409	              Requirement Levels", BCP 14, RFC 2119,
2410	              DOI 10.17487/RFC2119, March 1997,
2411	              <https://www.rfc-editor.org/info/rfc2119>.

2413	   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2414	              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
2415	              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

2417	12.2.  Informative References

2419	   [RFC2544]  Bradner, S. and J. McQuaid, "Benchmarking Methodology for
2420	              Network Interconnect Devices", RFC 2544,
2421	              DOI 10.17487/RFC2544, March 1999,
2422	              <https://www.rfc-editor.org/info/rfc2544>.

2424	   [RFC2647]  Newman, D., "Benchmarking Terminology for Firewall
2425	              Performance", RFC 2647, DOI 10.17487/RFC2647, August 1999,
2426	              <https://www.rfc-editor.org/info/rfc2647>.

2428	   [RFC3511]  Hickman, B., Newman, D., Tadjudin, S., and T. Martin,
2429	              "Benchmarking Methodology for Firewall Performance",
2430	              RFC 3511, DOI 10.17487/RFC3511, April 2003,
2431	              <https://www.rfc-editor.org/info/rfc3511>.

2433	   [RFC5180]  Popoviciu, C., Hamza, A., Van de Velde, G., and D.
2434	              Dugatkin, "IPv6 Benchmarking Methodology for Network
2435	              Interconnect Devices", RFC 5180, DOI 10.17487/RFC5180, May
2436	              2008, <https://www.rfc-editor.org/info/rfc5180>.

2438	   [RFC6815]  Bradner, S., Dubray, K., McQuaid, J., and A. Morton,
2439	              "Applicability Statement for RFC 2544: Use on Production
2440	              Networks Considered Harmful", RFC 6815,
2441	              DOI 10.17487/RFC6815, November 2012,
2442	              <https://www.rfc-editor.org/info/rfc6815>.

2444	   [RFC6890]  Cotton, M., Vegoda, L., Bonica, R., Ed., and B. Haberman,
2445	              "Special-Purpose IP Address Registries", BCP 153,
2446	              RFC 6890, DOI 10.17487/RFC6890, April 2013,
2447	              <https://www.rfc-editor.org/info/rfc6890>.

2449	   [RFC7230]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
2450	              Protocol (HTTP/1.1): Message Syntax and Routing",
2451	              RFC 7230, DOI 10.17487/RFC7230, June 2014,
2452	              <https://www.rfc-editor.org/info/rfc7230>.

2454	   [RFC8446]  Rescorla, E., "The Transport Layer Security (TLS) Protocol
2455	              Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018,
2456	              <https://www.rfc-editor.org/info/rfc8446>.

2458	   [RFC9000]  Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based
2459	              Multiplexed and Secure Transport", RFC 9000,
2460	              DOI 10.17487/RFC9000, May 2021,
2461	              <https://www.rfc-editor.org/info/rfc9000>.

2463	Appendix A.  Test Methodology - Security Effectiveness Evaluation

[nit] /Evaluation/Test/ - called test in the rest of this doc.

2464	A.1.  Test Objective

2466	   This test methodology verifies the DUT/SUT is able to detect,

[nit] /verifies the/ verifies that the/

2467	   prevent, and report the vulnerabilities.

2469	   In this test, background test traffic will be generated to utilize
2470	   the DUT/SUT.  In parallel, the CVEs will be sent to the DUT/SUT in
2471	   encrypted as well as clear text payload formats using a traffic
2472	   generator.  The selection of the CVEs is described in Section 4.2.1.

2474	   The following KPIs are measured in this test:

2476	   *  Number of blocked CVEs

2478	   *  Number of bypassed (nonblocked) CVEs

2480	   *  Background traffic performance (verify if the background traffic
2481	      is impacted while sending CVEs toward the DUT/SUT)

2483	   *  Accuracy of DUT/SUT statistics in terms of vulnerability
2484	      reporting

2486	A.2.  Testbed Setup

2488	   The same testbed MUST be used for the security effectiveness test as
2489	   well as for the benchmarking test cases defined in Section 7.

2491	A.3.  Test Parameters

2493	   In this section, the benchmarking test specific parameters SHOULD be
2494	   defined.

[nit] /SHOULD/are/ - a requirement against the authors of the document to write
desirable text in the document is not normative.

2496	A.3.1.  DUT/SUT Configuration Parameters

2498	   DUT/SUT configuration parameters MUST conform to the requirements
2499	   defined in Section 4.2.  The same DUT configuration MUST be used for
2500	   the security effectiveness test as well as for the benchmarking test
2501	   cases defined in Section 7.  The DUT/SUT MUST be configured in inline
2502	   mode and all detected attack traffic MUST be dropped and the session

[nit] /detected traffic/detected CVE traffic/ - there is also background traffic, which I guess should not be dropped, right ?

[nit] /the session/its session/ ?

2503	   SHOULD be reset.

2505	A.3.2.  Test Equipment Configuration Parameters

2507	   Test equipment configuration parameters MUST conform to the
2508	   requirements defined in Section 4.3.  The same client and server IP
2509	   ranges MUST be configured as used in the benchmarking test cases.  In
2510	   addition, the following parameters MUST be documented for this
2511	   benchmarking test:

2513	   *  Background Traffic: 45% of maximum HTTP throughput and 45% of
2514	      Maximum HTTPS throughput supported by the DUT/SUT (measured with
2515	      object size 64 KByte in the benchmarking tests "HTTP(S)
2516	      Throughput" defined in Section 7.3 and Section 7.7).

[nit] RECOMMENDED Background Traffic ?

2518	   *  RECOMMENDED CVE traffic transmission Rate: 10 CVEs per second

2520	   *  It is RECOMMENDED to generate each CVE multiple times
2521	      (sequentially) at 10 CVEs per second

2523	   *  Ciphers and keys for the encrypted CVE traffic MUST use the same
2524	      cipher configured for HTTPS traffic related benchmarking tests
2525	      (Section 7.6 - Section 7.9)
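
[comment] The background load derivation above in sketch form (hypothetical
measured inputs):

    # Appendix A.3.2 background load; inputs are hypothetical.
    max_http_gbps = 40.0       # 64 KByte objects, Section 7.3
    max_https_gbps = 18.0      # 64 KByte objects, Section 7.7

    background_http = 0.45 * max_http_gbps      # 18.0 Gbit/s
    background_https = 0.45 * max_https_gbps    # 8.1 Gbit/s
    cve_rate = 10                               # RECOMMENDED CVEs per second
    print(background_http, background_https, cve_rate)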

2527	A.4.  Test Results Validation Criteria

2529	   The following criteria are the test results validation criteria.  The
2530	   test results validation criteria MUST be monitored during the whole
2531	   test duration.

[nit] /criteria are/lists/ - duplication of criteria in sentence.

2533	   a.  Number of failed application transaction in the background
2534	       traffic MUST be less than 0.01% of attempted transactions.

2536	   b.  Number of terminated TCP connections of the background traffic
2537	       (due to unexpected TCP RST sent by DUT/SUT) MUST be less than
2538	       0.01% of total initiated TCP connections in the background
2539	       traffic.

[comment] That is quite high. Shouldn't this at least be 5 nines of
success ? 99.999% -> 0.001% maximum rate of errors ? I thought that's the common-lore
minimum service provider product quality requirement.

2541	   c.  During the sustain phase, traffic SHOULD be forwarded at a
2542	       constant rate (considered as a constant rate if any deviation of
2543	       traffic forwarding rate is less than 5%).

[minor] This seems underspecified. I guess in the ideally behaving DUT case
all background traffic is passed unmodified and all CVE connection traffic is dropped.
So the total amount of traffic with CVE events must be configured to be less than
5% ?! What additional information would this 5% tell me that I do not already
get from a. and b. ? E.g.: if I fail some background connection, then the impact
depends on how big that connection would have been, but it doesn't seem as if
I get new information if a big NetFlix background flow got killed and therefore
5 Gigabyte less background traffic were observed, or if the same happened to
a 200 KByte Amazon shopping connection. It would just cause the DUT to maybe do
less inspection on big flows in fear of triggering false resets on them ?? Is
that what we want from DUTs ?

2545	   d.  False positive MUST NOT occur in the background traffic.

[comment]  I do not understand d. When a background transaction from a. fails,
how is that different from false-positively being classified as a CVE - it would
be dropped then, right ? Or are you saying that a./b. is the case where the
background traffic receives errors from the DUT even though the DUT does NOT
recognize it as a CVE ?  Any example reason why that would happen ?

2547	A.5.  Measurement

2549	   Following KPI metrics MUST be reported for this test scenario:

2551	   Mandatory KPIs:

2553	   *  Blocked CVEs: It SHOULD be represented in the following ways:

2555	      -  Number of blocked CVEs out of total CVEs

2557	      -  Percentage of blocked CVEs

2559	   *  Unblocked CVEs: It SHOULD be represented in the following ways:

2561	      -  Number of unblocked CVEs out of total CVEs

2563	      -  Percentage of unblocked CVEs

2565	   *  Background traffic behavior: It SHOULD be represented in one of
2566	      the following ways:

2568	      -  No impact: Considered as "no impact" if any deviation of
2569	         traffic forwarding rate is less than or equal to 5% (constant
2570	         rate)

2572	      -  Minor impact: Considered as "minor impact" if any deviation of
2573	         traffic forwarding rate is greater than 5% and less than or
2574	         equal to 10% (i.e. small spikes)

2576	      -  Heavily impacted: Considered as "Heavily impacted" if any
2577	         deviation of traffic forwarding rate is greater than 10% (i.e.
2578	         large spikes) or reduced the background HTTP(S) throughput
2579	         greater than 10%
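
[comment] The classification above as a simple check (sketch; max_deviation is
the largest observed relative deviation of the background forwarding rate):

    def classify_impact(max_deviation):
        if max_deviation <= 0.05:
            return "no impact"
        if max_deviation <= 0.10:
            return "minor impact"
        return "heavily impacted"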

[minor] I would prefer reporting the a./b. numbers, e.g.: percentage of
failed background connections. As mentioned before, I find the total background
traffic rate impact a rather problematic/less valuable metric.

2581	   *  DUT/SUT reporting accuracy: DUT/SUT MUST report all detected
2582	      vulnerabilities.

2584	   Optional KPIs:

2586	   *  List of unblocked CVEs

[minor] I think this KPI is a SHOULD or even MUST. Otherwise one can not trace
security impacts (when one does not know which CVE it is). This is still the
security effectiveness appendix, and reporting is not effective without this.

2588	A.6.  Test Procedures and Expected Results

2590	   The test procedure is designed to measure the security effectiveness
2591	   of the DUT/SUT at the sustaining period of the traffic load profile.
2592	   The test procedure consists of two major steps.  This test procedure
2593	   MAY be repeated multiple times with different IPv4 and IPv6 traffic
2594	   distribution.

2596	A.6.1.  Step 1: Background Traffic

2598	   Generate background traffic at the transmission rate defined in
2599	   Appendix A.3.2.

2601	   The DUT/SUT MUST reach the target objective (HTTP(S) throughput) in
2602	   sustain phase.  The measured KPIs during the sustain phase MUST meet
2603	   all the test results validation criteria defined in Appendix A.4.

2605	   If the KPI metrics do not meet the acceptance criteria, the test
2606	   procedure MUST NOT be continued to "Step 2".

2608	A.6.2.  Step 2: CVE Emulation

2610	   While generating background traffic (in sustain phase), send the CVE
2611	   traffic as defined in the parameter section.

2613	   The test equipment SHOULD start to measure and record all specified
2614	   KPIs.  Continue the test until all CVEs are sent.

2616	   The measured KPIs MUST meet all the test results validation criteria
2617	   defined in Appendix A.4.

2619	   In addition, the DUT/SUT SHOULD report the vulnerabilities correctly.

2621	Appendix B.  DUT/SUT Classification

2623	   This document aims to classify the DUT/SUT in four different
2624	   categories based on its maximum supported firewall throughput
2625	   performance number defined in the vendor datasheet.  This
2626	   classification MAY help the user to determine specific configuration
2627	   scale (e.g., number of ACL entries), traffic profiles, and attack
2628	   traffic profiles, scaling those proportionally to DUT/SUT sizing
2629	   category.

2631	   The four different categories are Extra Small (XS), Small (S), Medium
2632	   (M), and Large (L).  The RECOMMENDED throughput values for the
2633	   following categories are:

2635	   Extra Small (XS) - Supported throughput less than or equal to 1Gbit/s

2637	   Small (S) - Supported throughput greater than 1Gbit/s and less than
2638	   or equal to 5Gbit/s

2640	   Medium (M) - Supported throughput greater than 5Gbit/s and less than
2641	   or equal to 10Gbit/s

2643	   Large (L) - Supported throughput greater than 10Gbit/s
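
[comment] The four category boundaries as a lookup (sketch; input is the
datasheet throughput in Gbit/s):

    def classify_dut(gbps):
        if gbps <= 1:
            return "XS"       # Extra Small
        if gbps <= 5:
            return "S"        # Small
        if gbps <= 10:
            return "M"        # Medium
        return "L"            # Large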

2645	Authors' Addresses

2647	   Balamuhunthan Balarajah
2648	   Berlin
2649	   Germany

2651	   Email: bm.balarajah@gmail.com
2652	   Carsten Rossenhoevel
2653	   EANTC AG
2654	   Salzufer 14
2655	   10587 Berlin
2656	   Germany

2658	   Email: cross@eantc.de

2660	   Brian Monkman
2661	   NetSecOPEN
2662	   417 Independence Court
2663	   Mechanicsburg, PA 17050
2664	   United States of America

2666	   Email: bmonkman@netsecopen.org

EOF