Re: [bmwg] [iotdir] telechat Review for draft-ietf-bmwg-ngfw-performance-13

Toerless Eckert <tte@cs.fau.de> Mon, 31 January 2022 09:41 UTC

Date: Mon, 31 Jan 2022 10:40:47 +0100
From: Toerless Eckert <tte@cs.fau.de>
To: Carsten Rossenhoevel <cross@eantc.de>
Cc: iot-directorate@ietf.org, evyncke@cisco.com, draft-ietf-bmwg-ngfw-performance.all@ietf.org, mariainesrobles@googlemail.com, bmwg@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/bmwg/SWSFrTURRhZR-9YSvJ-NbwfbB3Y>
Subject: Re: [bmwg] [iotdir] telechat Review for draft-ietf-bmwg-ngfw-performance-13

On Mon, Jan 31, 2022 at 08:20:00AM +0100, Carsten Rossenhoevel wrote:
>  * NGFW and UTM are fixed terms in the industry. 

The terminology is fine. I just felt it would be fairly easy to rewrite the
intro somewhat so that it properly introduces those terms and how they relate
to each other, instead of expecting them to be known. Regarding your argument
that everybody knows these terms: just mentor a few young folks coming to the
IETF primarily to learn, and then revisit whether your statement holds true
for the target readers you want to address, as opposed to the experts you
work with ;-)

>  * We have had extensive discussions of goodput, throughput, and other
>    terms describing how many packets of which type make it from source
>    to destination.

A summary of what was changed in scope and terminology from RFC 3511 would help.
Such a change-summary section is also quite common (and I'd say required)
for RFCs that are meant to obsolete another RFC. You could simply state there
that goodput has fallen out of fashion, or the like.

Technically, my main concern is just badly behaving TCP stacks in the
NGFW that create unnecessarily many retransmissions in the face of bad
paths to either client or server, but let me check if/what standard
TCP measurements are recommended for that case and get back to you.

>  * Adding a terminology section would not have increased readability
>    for the target audience.
>  * This document cannot and does not aim to introduce the whole world
>    of network security terms.
>  * This is just a  benchmarking methodology document.  We have to assume
>    that readers are network security professionals understanding the basics
>    of today's network security functions.

Is there any other IETF RFC that introduces/explains the network security
device terminology/architecture? If there is no RFC, is there any other
reference you could add?

>    I am not sure how to respond to some of your questions (e.g.
>    about making SSL inspection mandatory

I did not ask for that. I just wondered about the text "MUST explain the implication".


>    , or recommending to implement Web Filtering across the
>    industry [row 294]).

The row 294 comments were not about web filtering.
I was asking for row 294 whether my understanding of your term
"Certificate Validation" is correct, because the way I understand it,
it could never be optional for a security device: the NGFW must
verify the server certificate if it is putting itself in the middle of
a TLS session between client and server.
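
As a minimal illustration of what I mean by server-certificate verification
on the NGFW's server-facing connection (a generic Python sketch of mine,
nothing specified in the draft):

    import socket, ssl

    # The NGFW terminates the client's TLS session and opens its own TLS
    # session to the origin server; on that server-facing connection it
    # should validate the origin server's certificate and hostname.
    ctx = ssl.create_default_context()      # loads the trusted CA roots
    ctx.verify_mode = ssl.CERT_REQUIRED     # reject servers whose cert cannot be validated
    ctx.check_hostname = True               # enforce hostname matching (both are the defaults, shown explicitly)

    with socket.create_connection(("example.com", 443)) as sock:
        with ctx.wrap_socket(sock, server_hostname="example.com") as tls:
            print(tls.getpeercert()["subject"])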

>    DDoS is not a firewall feature (it is an attack type), so it is not
>    listed as a security feature in table 3.

DDoS is listed in Table 1, which is titled "NGFW Security Features".
Maybe you want to rename that Table 1 line entry to something like "DDoS protection".
But in any case there seems to be some NGFW feature related to DDoS,
and I just wonder if/how that feature is reflected in Table 3.

>    DPI is an obsolete term.

Why then is it listed in Table 2? Or do you consider the written-out term
"Deep Packet Inspection" (as in Table 2) to be something different from DPI?
I was just doing consistency checking: Table 1 and Table 2 claim to list
NGFW/NGIPS Security Features, and Table 3 claims to be the "Security Feature Description",
so I was simply trying to check the completeness of Table 3.

>    Any NGFW does some type of deep-packet
>    inspection.  More detailed terms in table 3 describe the different
>    aspects of packet inspections.

It would be easier to read if the terms used across Tables 1, 2, and 3 were consistent.
Maybe put into Table 3 only those terms that apply to Table 2?

>  *   The time of ACL-based routers acting as firewalls is over.
>    Managing thousands of ACL entries is difficult and often bears a
>    security risk in itself.  It is not a typical application scenario
>    today.  Security is achieved by advanced features such as IDS and
>    malware detection.

That would be a good, simple explanation for the rather low tested ACL value;
it should go somewhere into the document, because it, too, is part of the
evolution from RFC 3511.

As an example counterpoint: there are new IETF protocols, such as MUD (RFC 8520), that
ultimately result in automated ACL building, for example generating for each IoT
device on the client side (let's say in a large enterprise) an ACL with a
few entries (defining the permitted traffic flows between this IoT device
and the Internet). So if the enterprise has maybe 4000 IoT devices, it could easily
end up with a (SrcIP, DstIP, Proto, *, DstPort) ACL list of maybe 15000 entries.
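
As a back-of-the-envelope sketch of that arithmetic (the device count and the
rules-per-device value are purely my own assumptions, not numbers from the draft):

    # Rough estimate of ACL size when MUD (RFC 8520) profiles are expanded
    # into 5-tuple ACL entries; all numbers are illustrative assumptions.
    iot_devices = 4000        # assumed number of IoT devices in the enterprise
    rules_per_device = 4      # assumed permitted flows per device (DNS, NTP, vendor cloud, updates)

    total_acl_entries = iot_devices * rules_per_device
    print(f"Estimated (SrcIP, DstIP, Proto, *, DstPort) entries: {total_acl_entries}")
    # -> 16000 entries, i.e. in the range of the ~15000 mentioned above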

If such novel IETF work cannot be put into an NGFW because that new firewall
will not support such long ACLs, that is your prerogative, except that I would
feel a lot safer if that were written down somewhere, such as in the list of
exemptions not covered by this document.

I have seen similar automation of security through combinations of controllers
and firewalls in other areas as well, such as Telephony. 

>  * "security and measurement traffic" in row 411 can be changed if you
>    feel that the use of these descriptive words requires a definition.
>       If you like, we can remove "security and measurement", keeping
>    "traffic".

It seems to me that there are two types of traffic. One is the traffic passing
through the DUT to measure it; the other is all the collateral traffic needed
to make the test setup work. Those two types of traffic just need two defined
terms. Unfortunately, I am not aware of any industry-agreed terms for these
two types of traffic.

> *Obsoleting RFC3511*: RFC3511 shall indeed be obsoleted; we have had
> extensive discussions about it e.g. in IETF110
> <https://datatracker.ietf.org/meeting/110/materials/minutes-110-bmwg-01.pdf>
> and on the BMWG mailing list. Rationales:  1) Allowing the use of more than
> 18-year old benchmarking methodology for the same group of network security
> solutions in parallel to the new one would really confuse the market and not
> be good SDO workmanship.  2) All relevant benchmarks in RFC3511 are
> technically substituted and improved by the new draft; only a few L4
> (TCP/UDP) test cases have deliberately been obsoleted.  These test cases do
> not make any sense for today's NGFWs.  Even old firewalls (pre-NGFW) can be
> tested much more accurate with the new methodology.

I wasn't challenging the goal, I was just wondering whether the document in
its current version has all the bells and whistles normally expected of a
document doing the obsoleting, such as a "diff-from-obsoleted-RFC" section
and explanations like the ones in your paragraph above.

> *Energy efficiency measurements* are of paramount importance - I agree. 
> Large-scale groups in ATIS and other SDOs are working on standardizing
> energy efficiency measurements.  I invite contributors to create a new draft
> for NGFW energy efficiency benchmarks. Unfortunately, attaching a power
> meter is not sufficient and is not a key performance indicator (KPI) by
> itself.

But it does become a very useful KPI for readers of reports as soon as they
compare different products configured with the security/performance feature
set appropriate for their particular use case.

I actually went through similar vetting with routers and customers. They
were effectively weighing Capex (equipment cost, space cost) and Opex
(power consumption) against performance for a particular feature set.

> A firewall with fewer security functions is not better because it
> takes less power per Gigabit than a strong firewall. Balancing these KPIs
> with power usage is a difficult and sensitive task that requires a lot of
> compromises and industry consensus.

I am trying to parse what you say but have difficulties. Are you afraid
that capturing the power consumption number during the steady-state run of
the devices would aggravate people in the industry, because they fear the
power consumption numbers would make their devices look bad, and that you
would therefore prefer for BMWG not to ask for such power measurements?

IMHO, a simple power consumption measurement taken the way I suggest (at
the 10% and maximum-performance measurement points) is a very useful start,
because it can help to motivate vendors to optimize within their existing
designs. For example, CPU-based NGFWs would have more of an incentive to go
beyond a simple PMD (Poll Mode Driver) for their CPU forwarding plane (which
is the worst CPU burner), and to enable low-power modes dynamically wherever
the hardware supports them when only low performance is required. A lot of
firewalls at the edge of the Internet spend a lot of time in which only low
performance is needed, but they have to be bought for peak performance
(think of most enterprise offices outside office hours).
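
As a sketch of how simple such a KPI could be (all numbers are made up by me
for illustration; nothing here is from the draft):

    # Derive a throughput-per-watt figure from power readings taken at the
    # 10% and maximum-performance measurement points of a benchmark run.
    # All values are illustrative assumptions.
    measurements = {
        "10% load": {"throughput_gbps": 1.0,  "power_watts": 180.0},
        "max load": {"throughput_gbps": 10.0, "power_watts": 420.0},
    }

    for point, m in measurements.items():
        print(f"{point}: {m['throughput_gbps'] / m['power_watts']:.4f} Gbit/s per watt")
    # A DUT that scales power down at low load shows a smaller gap between
    # the two efficiency figures than one that idles at full power.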

> I sincerely hope that the IETF
> recognizes the unparalleled importance of energy-efficient standardization
> and provides guidance to all areas and WGs; then the BMWG and NGFW
> benchmarks could well be expanded in future work.

I don't think we are setting a good example by simply pushing off the
task instead of taking care of low-hanging fruit when it offers itself,
as I think it does in this case.

> The *test setups* (Figure 2) are to be used as specified:
> 
>  * The goal of this benchmarking draft is to enable reproducible
>    results in controlled, minimum lab setups.  Of course we could
>    complicate it with mixing trusted and untrusted zones, adding more
>    security zones, etc.  Within the four years of development, the
>    group concluded that adding such complexity would not improve the
>    reproducibility and readibility of results.  "actual deployments"
>    and "typical deployments" relate to the "parameters and security
>    features" (line 258) - not to the network topology setup. 
>    Reproducibility is gained by adequate documentation of the
>    parameters and security features (see section 6 of the draft as
>    well) - not by nailing down specific configurations.  The NGFWs are
>    too diverse to aim for such a goal.

Agreed. But I only came to that realization after going through the whole
document and after not being able to come up with much broader testing
without a lot more complexity.

I think a sentence like the following would be a good explanation in the
text: "The DUT test topology and test traffic flows do not aim to exercise
the variety of real-world traffic flows and security zones often attached
to an NGIPS/NGFW, but to represent the simplest topology in which the
performance of traffic flows through the DUT can be measured."

>  * Of course, measurements with different sets of parameters will yield
>    different results (your comment on line 380).  Detailed reporting
>    (section 6) will allow readers to interpret results correctly.  One
>    potential use of this draft is to establish an external
>    certification program ("NetSecOPEN" initiative).  For such a
>    program, parameter sets need to be defined in more detail.  But the
>    consensus among authors and the WG was that the draft shall not be
>    limited to very specific certification setups.

That's fine. It's again that the text is terse and leaves the reader
guessing at what you explain so much better above.

>  * "maximum security coverage" is a blanket clause, indicating that
>    should focus configurations on best security not only on achieving
>    maximum performance.  This is a typical conflict of goals in network
>    security benchmark testing, specifically if vendors carry out tests.

It seems to me that you could simply delete "in order to achieve maximum network security coverage";
the remaining text would still be perfectly fine, and you would avoid having readers wonder
about an undefined blanket clause.

>  * The technique of aggregating lower-speed interfaces from test
>    equipment for a higher-speed DUT interface is considered common lab
>    knowledge and thus not explained in this draft.
> 
>  * In line 255, the word "unique" could be misunderstood indeed. Maybe
>    the word "single" would explain it better?

Maybe "single common"?
I am not a native English speaker, so please apply your best knowledge
to resolve those language nits if you agree with them ;-)

> *DUT classifications* into XS, S, M, and L were made in the main document to
> ensure this classification bears some weight.  It is important for
> apples-to-apples comparisons of benchmarks. While the requirements for
> number of rules per DUT classification is expected to be stable, the actual
> device scaling will change faster due to innovations.  This is why the DUT
> types are specified in Appendix B.  If you feel differently, please suggest
> text.

It just seems to me that the numbers in Figure 3 are completely dependent on the
XS/S/M/L classifications. If you made all XS/S/M/L devices 10x faster in the
future, you would have to change the numbers in Figure 3 accordingly.
That makes the attempt to put some numbers into an appendix, but not the
numbers that depend on them, somewhat feeble.

Aka: I wouldn't bother with Appendix B. Just inline the text; that also makes
the document easier to read. Alternatively, you may want to move Figure 3 into
Appendix B as well. Those seem to be the two consistent options to me.

And yes, sorry, this is also just structural text nitpicking, nothing
substantial, but hopefully adds to text quality.

> *Test Case descriptions*.
> 
>  * Your comment "Section 7 is a lot of work to get right" is
>    interesting. Procedural replication is intentional in test plans, to
>    make sure that each test case is complete.  Readers do not typically
>    appreciate complex referrals and footnotes when executing test cases
>    (speaking from a few years of experience). Being very descriptive in
>    test case descriptions improves the quality and reproducibility of
>    test execution.

Yes. Indeed. This was mostly from the concerns of a poor reviewer trying to
compare details across tests. Carry on.

(In real test plans I have also often seen big tables spanning test runs,
 which make comparisons easy when you have large printers... oh well ;-)

>  * Test case scale goals are already aligned with promised vs. measured
>    performance., as per the text in rows 1046/1047
>  * Row 1070 relates to foreground measurement traffic, appendix A.4 to
>    background traffic failure rates

Ah, ok. Thanks. But just because it is called background traffic from the
security perspective does not mean to me that a 0.01% failure rate is
appropriate. That traffic could potentially all be business-critical
traffic, and a good amount of it might not be using resilient application
code that recovers well from failures.

E.g.: how did you folks come up with the 0.01%?
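
Just to illustrate what that budget means in absolute terms (the connection
count below is my own assumption, not a number from the draft):

    # What a 0.01% background-traffic failure budget means in absolute terms.
    background_connections = 1_000_000   # assumed background TCP connections per test run
    allowed_failure_rate = 0.0001        # 0.01% as per appendix A.4

    print(f"Allowed failed connections: {background_connections * allowed_failure_rate:.0f}")
    # -> 100 failed connections per run, each of which could be a
    #    business-critical session in a real deployment.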

Cheers
    Toerless

> Best regards, Carsten