Re: [bmwg] WGLC on New version of draft-ietf-bmwg-ngfw-performance

Sarah Banks <sbanks@encrypted.net> Mon, 12 July 2021 21:19 UTC

From: Sarah Banks <sbanks@encrypted.net>
Date: Mon, 12 Jul 2021 14:19:39 -0700
Cc: bmwg@ietf.org
To: bmonkman@netsecopen.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/bmwg/JJE2kZMKkmJs-F_5j47kflnpX_Y>

Hi Brian et al,
    First, my apologies for the delay, and I very much appreciate your patience. I also appreciate the time and effort that went into the reply to my comments, which can be even more difficult to do as a large group :) Please see inline.

Thank you,
Sarah (as a participant)

>> 
>> Hi Sarah,
>> 
>> As I mentioned in the previous message, we will remove reference to 
>> IDS from the draft. Given that, none of the IDS related 
>> comments/questions are being addressed.

SB// Makes sense, thank you.

>> - The draft aims to replace RFC3511, but expands scope past firewalls,
>> to "next generation security devices". I'm not finding a definition of
>> what a "next generation security device" is, nor an exhaustive list of
>> the devices covered in this draft. A list that "includes" is nice, but
>> IMO not enough to cover what would be benchmarked here - I'd prefer to
>> see a definition and an exhaustive list.
>> 
>> [bpm] "We avoid limiting the draft by explicitly adding a list of NG 
>> security devices currently available in the market only. In the 
>> future, there may be more and more new types of NG security devices 
>> that will appear on the market.

SB// I think there are 2 types of devices called out; I'm still not seeing a definition of what a "NG security device" is, and I'm not comfortable with a draft that uses a blanket term to encompass whatever comes later. Who knows what new characteristics a new device would bring? I think the scope here is best limited to the devices we know about today - devices we can point to and say we're applying knowledgeable benchmarking tests against.

>> 
>> [bpm] This draft includes a list of security features that the
>> security device can have (RFC 3511 doesn't have such a list). Also,
>> we will describe in the draft that the security devices must be configured in "in-line" mode.
>> We believe these two points qualify the definition of next
>> generation security.
>> 

SB// I strongly disagree. Well, for active inline devices maybe this is OK, but I disagree that the only way a device can be "NG" is to be active/inline. And if that is the scope, have we gathered all of the items we'd want to actively test for in that case? For example, what about a device's ability to handle traffic when a failure occurs (fail open/closed)? What about alerts and detections, and the whole federation of tests around positives/false positives/false negatives, etc.? I'm on board with expanding the scope, but then we have to do the devices' benchmarking justice, and I feel we're missing a lot here.

>> - What is a NGIPS or NGIDS? If there are standardized definitions
>> to point to, that's fine; otherwise, there's a lot of wiggle room here.
>> 
>> [bpm] See above. We are removing NGIDS from the draft.

SB// Understood, thank you.

>> 
>> - I still have the concern I shared at the last IETF meeting, where 
>> here, we're putting active inline security devices in the same 
>> category as passive devices. On one hand, I'm not sure I'd lump these 
>> three together in the first place; on the other, active inline devices 
>> typically include additional functions to allow administrators to 
>> control what happens to packets in the case of failure, and I don't 
>> see those test cases included here.
>> 
>> [bpm] This draft focuses on "in-line" mode security devices only. We
>> will describe this in section 4 in more detail.

SB// Understood, thank you.

>> 
>> [bpm] Additionally, the draft focuses mainly on performance tests. The
>> DUT must be configured in "fail closed" mode. We will describe this
>> under section 4. Any failure scenario such as "fail open" mode is out of scope.

SB// OK, but I think an RFC that is going to encompass this device under the "NG security devices" classification is missing out on large portions of what customers will want to test. It'll also beg for another draft to cover them, and then I'm not sure we're serving the industry as well as we could. 

>> 
>> - Section 4.1 - it reads as if ANY device in the test setup cannot 
>> contribute to network latency or throughput issues, including the DUTs 
>> - is that what you intended?
>> 
>> [bpm] "Our intention is, if the external devices (routers and 
>> switches) are used in the test bed, they should not negatively impact DUT/SUT performance.
>> To address this, we added a section ( section 5 ""Test Bed 
>> Considerations"") which recommends a pre-test.  We can rename this as 
>> reference test or baseline test. "
>> 

SB// Thank you for the clarification. I think there's still a concern there. Who defines what "negative impact" is? Each extra device is at least one more L2 or L3 hop in the network, and each hop contributes some amount of latency. If these devices don't participate in control plane decisions and are just passively passing data along, then we could consider removing them from the setup and removing the potential skew on results.
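
SB// To make "negative impact" concrete, one option is to define it against the baseline/reference run you mention - something like this sketch (purely illustrative; the numbers are hypothetical, not from the draft):

    # Hypothetical reference-test arithmetic: measure the test bed once
    # without the DUT (tester-to-tester through the ancillary devices),
    # then again with the DUT in path, and attribute the difference.
    baseline_latency_us = 42.0    # ancillary devices only, no DUT
    full_path_latency_us = 180.0  # same path with the DUT inserted

    dut_latency_us = full_path_latency_us - baseline_latency_us
    print(f"DUT-attributable latency: {dut_latency_us:.1f} us")  # 138.0 us

Whatever the baseline run adds on its own is the ancillary devices' contribution, and the draft could define "negative impact" as that contribution exceeding a documented threshold.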


>> - Option 1: It'd be nice to see a specific, clean, recommended test bed.
>> There are options for multiple emulated routers. As a tester, I expect 
>> to see a specific, prescribed test bed that I should configure and
>> test against.
>> 
>> [bpm] The draft describes that Option 1 is the recommended test setup.
>> However, we added emulated routers as optional in Option 1. The reason for
>> that: some types of security devices in some deployment scenarios require
>> routers between the test client/server and the DUT (e.g., NGFW), and some
>> DUT/SUTs don't need a router (e.g., NGIPS).
>> 
>> - Follow on: I'm curious as to the choice of emulated routers here. 
>> The previous text suggests you avoid routers and switches in the topo,
>> but then there are emulated ones here. I'm curious as to what 
>> advantages you think these bring over the real deal, and why they 
>> aren't subject to the same limitations previously described?
>> 
>> [bpm] Compared to a real router, an emulated router offers advantages for
>> L7 testing.
>> 
>> [bpm] - An emulated router doesn't add latency. Even if it adds delay due
>> to the routing process, the test equipment can report the added
>> latency, or take it into account in the latency measurement.
>> 
>> [bpm] - Emulated routers perform the routing function only. But with a "real"
>> router, we are not sure what else the router is doing with the packets.

SB// Maybe I'm missing something here - a device can't perform a function for free, right? Even if its impact is negligible, it's an impact of some sort. We're saying the emulated router is doing the routing - OK - but the same question applies in reverse: how do you know what else the emulated router is doing? If the test gear can call out the latency it introduces, I'd like to see clarification around how it does that - how it distinguishes the latency introduced by Device A versus Device B versus the DUT. Presumably it's a simple subtraction of the tester-reported delay, but the draft should say so.


>> 
>> [bpm] Your question regarding the need for routers:
>> 
>> [bpm] - We avoid impacting the DUT/SUT performance due to ARP or ND 
>> process
>> 
>> [bpm] - Represent a realistic scenario (in a production environment,
>> the security devices will not be directly connected to the clients).
>> 
>> [bpm] - Routing (L3 mode) is commonly used in NG security devices.
>> 
>> [bpm] However, in both figures we mentioned that the router, including the
>> emulated router, is optional. If there is no need to have routing
>> functionality on the test bed (e.g., if we use a very small number of
>> client and server IPs, or the DUT operates in Layer 2 mode), it can be omitted.
>> 
>> [bpm] Also, we described in Option 1 that external devices are needed if
>> there is a need to aggregate the interfaces of the tester or DUT. For
>> example, the DUT has 2 interfaces, but the tester needs to use its 4
>> interfaces to achieve the performance. So here we need a switch/router
>> to aggregate the tester interfaces from 4 to 2.
>> 
>> - In section 4.1 the text calls out Option 1 as the preferred test 
>> bed, which includes L3 routing, but it's not clear why that's needed?
>> 
>> [bpm] See above.
>> 
>> - The difference between Option 1 and Option 2 is the inclusion of 
>> additional physical gear in Option 2 - it's not clear why that's 
>> needed, or why the tester can't simply directly connect the test 
>> equipment to the DUT and remove extraneous devices from potential influence on results?
>> 
>> [bpm] See above.
>> 
>> - Section 4.2, the table for NGFW features - I'm not sure what the
>> difference is between RECOMMENDED and OPTIONAL? (I realize that you
>> might be saying that RECOMMENDED is the "must have enabled" features,
>> whereas OPTIONAL is at your discretion, but I would suggest that you
>> make that clear.)
>> 
>> [bpm] The definitions for OPTIONAL and RECOMMENDED are those given
>> in RFC 2119. We already referenced this under section
>> 2 "Requirements".
>> 

SB// Thanks!

>> - Prescribing a list of features that have to be enabled for the test,
>> or at least more than 1, feels like a strange choice here - I'd have
>> expected test cases that either exercise the specific features one at a
>> time, or suggest several combinations, but ultimately, we'd tell
>> the tester to document WHICH features were enabled, to make the test
>> cases repeatable. This allows the tester to apply the same apples-
>> to-apples configuration to different vendor gear, and omit the 1
>> feature that doesn't exist on a different NGFW (for example), but hold a baseline that could be tested.
>> 
>> - Table 2: With the assumption that NGIPS/IDS are required to have the 
>> features under "recommended", I disagree with this list. For example, 
>> some customers break and inspect at the tap/agg layer of the network - 
>> in this case, the feed into the NGIDS might be decrypted, and there's 
>> no need to enable SSL inspection, for example.
>> 
>> [bpm] IDS is being removed.

SB// OK... I'm not sure this addresses the feedback, though :) An NGFW for sure will do break/inspect as well, right?

>> 
>> - Table 3: I disagree that an NGIDS IS REQUIRED to decrypt SSL. This 
>> behaviour might be suitable for an NGIPS, but the NGIDS is not a bump 
>> on the wire, and often isn't decrypting and re-encrypting the traffic.
>> 
>> [bpm] IDS is being removed.

SB// See comment above.

>> 
>> - Table 3: An NGIDS IMO is still a passive device - it wouldn't be 
>> blocking anything, but agree that it might tell you that it happened after the fact.
>> 
>> [bpm] IDS is being removed.
SB// Thanks!

>> 
>> - Table 3: Anti-evasion definition - define "mitigates". 
>> 
>> [bpm] Not sure why you are asking this, as "mitigate" is not an uncommon
>> term/word.
>> 
>> - Table 3: Web-filtering - not a function of an NGIDS.
>> 
>> [bpm] IDS is being removed.
>> 
>> - Table 3: DLP: Not applicable for an NGIDS.
>> 
>> [bpm] IDS is being removed.
>> 
>> - Can you expand on "disposition of all flows of traffic are logged" -
>> what's meant here specifically, and why do they have to be logged?
>> (Logging, particularly under high loads, will impact the device's own
>> performance marks, and colours output.)
>> 
>> [bpm] We intentionally recommended enabling logging, which will impact
>> the performance. The draft is not aiming to get a high performance
>> number with a minimal DUT/SUT configuration. In contrast, it aims to get a
>> reasonable performance number with a realistic DUT configuration. The
>> realistic configuration can vary based on the DUT/SUT deployment scenario.
>> 
>> [bpm] In most DUT/SUT deployment scenarios or customer
>> environments, logging is enabled as the default configuration.
>> 
>> [bpm] "Disposition of all flows of traffic are logged": means that the 
>> DUT/SUT need to log all the traffic at the flow level not each packet.
>> 
>> [bpm] We will add more clarification for the meaning of "disposition 
>> of all flows of traffic are logged".
>> 

SB// Thanks!
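
SB// One more thought: when you add that clarification, an example record might help readers. A flow-level disposition log could look something like this (hypothetical format, not from the draft):

    src=10.1.1.23:51512 dst=203.0.113.10:443 proto=TCP app=HTTPS
    disposition=allowed bytes_c2s=4312 bytes_s2c=98211 duration_ms=812

That is, one record per flow with its verdict, rather than one per packet.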

>> - ACLs wouldn't apply to an IDS because IDS's aren't blocking traffic 
>> :)
>> 
>> [bpm] IDS is being removed.
>> 
>> - It might be helpful to testers to say something like "look, here's 
>> one suggested set of ACLs. If you're using them, great, reference 
>> that, but otherwise, make note of the ACLs you use, and use the same 
>> ones for repeatable testing".
>> 
>> [bpm] The draft gives guidance on how to choose the ACL rules. We
>> describe here a methodology to create ACLs.
>> 
>> - 4.3.1.1 The doc prescribes specific MSS values for v4/v6 with no
>> discussion around why they're chosen - that color could be useful to
>> the reader.
>> 
>> [bpm] We will add some more clarification that these are the default
>> values currently used in most client operating systems.
>> 

SB// Thanks!
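
SB// For what it's worth, spelling out the arithmetic would make the defaults obvious - they fall straight out of the standard 1500-byte Ethernet MTU:

    IPv4: MSS = 1500 - 20 (IPv4 header) - 20 (TCP header) = 1460 bytes
    IPv6: MSS = 1500 - 40 (IPv6 header) - 20 (TCP header) = 1440 bytes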

>> - 4.3.1.1 - there's a period on the 3rd to last line "(SYN/ACL, ACK). and"
>> that should be changed.
>> 
>> [bpm] Thank you.
>> 
>> - 4.3.1.1 - As a tester with long-time experience with the major test
>> equipment manufacturers, I can't possibly begin to guess which of
>> them would conform to this - or even if they'd answer these questions.
>> How helpful is this section to the non-test houses? I suggest
>> expansion here, ideally either clarifying the scope of what you
>> expect to cover, or noting which (open source/generally available)
>> test tools or emulators could be considered for use as examples.
>> 
>> [bpm] We discussed this section extensively with Ixia and Spirent.
>> This section was developed with significant input from these test
>> tool vendors, in addition to others.

SB// OK, that's really good to know, but there are plenty of us working with, and looking for, more cost-effective alternatives to Ixia and Spirent. :) I think the expansion would be good here.

>> 
>> - 4.3.1.3 - Do the emulated web browser attributes really apply to 
>> testing the NGIPS?
>> 
>> [bpm] Yes, we performed many PoC tests with test tools. Ixia and 
>> Spirent confirmed this.
>> 
>> - 4.3.2.3 - Do you expect to also leverage TLS 1.3 as a configuration 
>> option here?
>> 
>> [bpm] Yes
>> 
>> - 4.3.4 - I'm surprised to see the requirement that all sessions 
>> establish a distinct phase before moving on to the next. You might 
>> clarify why this is a requirement, and why staggering them is specifically rejected?
>> 
>> [bpm] This draft doesn't describe that all sessions establish a 
>> distinct phase before moving on to the next. We will remove the word 
>> "distinct" from the 1st paragraph in section 4.3.4.

SB// Thanks!

>> 
>> [bpm] Unlike Layer 2/3 testing, Layer 7 testing requires several 
>> phases in the traffic load profile. The traffic load profile described 
>> in the draft is the common profile mostly used for Layer 7 testing.
>> 
>> - 5.1 - I like the sentence, but it leaves a world of possibilities
>> open as to how one confirms that the ancillary switching or routing
>> functions didn't limit the performance, particularly for the virtualized components?
>> 
>> [bpm] The sentence says, "Ensure that any ancillary switching or 
>> routing functions between the system under test and the test equipment 
>> do not limit the performance of the traffic generator."
>> 
>> [bpm] Here we discuss the traffic generator performance, and this can
>> be confirmed by doing a reference test.
>> 
>> [bpm] Section 5 recommends a reference test to ensure the
>> traffic generator can reach its maximum desired performance. Based on the
>> reference test results, it can be identified whether an external device
>> had any impact on the traffic generator's performance.
>> 
>> [bpm] We will add more content in section 5 to provide more details 
>> about reference test.
>> 

SB// Thanks!
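
SB// When you add that detail, a concrete acceptance check might help too - something like this sketch (illustrative only; the 10% headroom figure is my assumption, not the draft's):

    # Hypothetical pass/fail check for the traffic generator reference test:
    # measured back-to-back through the ancillary devices (no DUT), the
    # generator should comfortably exceed the maximum load planned for the
    # DUT tests, so the test bed never caps the generator.
    planned_max_gbps = 40.0
    reference_gbps = 44.1  # measured in the reference run, DUT removed

    assert reference_gbps >= 1.1 * planned_max_gbps, \
        "test bed or generator limits throughput; fix before testing the DUT"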

>> - 5.3 - this is a nice assertion but again, how do I reasonably make 
>> the assertion?
>> 
>> [bpm] We will change the word from "Assertion" to "Ensure". Also, we 
>> will add more clarity about reference testing.


SB// Thanks!

>> 
>> - 6.1 - I would suggest that the test report include the configuration 
>> of ancillary devices on both client/server side as well
>> 
>> [bpm] We believe that adding the configuration of the ancillary devices
>> doesn't add more value to the report. Instead of this, we will
>> recommend documenting the behavior of the ancillary devices by
>> doing a reference test. We will add this under section 5, "Test Bed Considerations".
>> 

SB// I think including them assists greatly in the repeatability of the testing, for what it's worth.

>> - 6.3 - Nothing on drops anywhere?
>> 
>> [bpm] Are you referring to packet drops? If you are, there is no 
>> packet loss in stateful traffic. Instead of packet loss, the stateful 
>> traffic has retransmissions.
>> 
>> - 7.1.3.2 - Where are these numbers coming from? How are you 
>> determining the "initial inspected throughput"? Maybe I missed that in 
>> the document overall, but it's not clear to me where these KPIs are 
>> collected? I suggest this be called out.
>> 
>> [bpm] We will add more clarification in the next version. Thank you.

SB// Thanks!

>> 
>> - 7.1.3.3 - what is a "relevant application traffic mix" profile?
>> 
>> [bpm] This is described in section 7.1.1 (2nd paragraph). We will add
>> the word "relevant" in the 1st sentence of the 2nd paragraph, so the
>> sentence will be: "Based on customer use case, users can choose the
>> relevant application traffic mix for this test.  The details about the
>> traffic mix MUST be documented in the report.  At least the following
>> traffic mix details MUST be documented and reported together with the test results:"

SB// A set of example(s) could be helpful. Not required, just helpful.
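
SB// For instance, even a purely illustrative mix would anchor the requirement (the numbers below are hypothetical, not a recommendation):

    Application          % of total throughput
    -----------          ---------------------
    HTTPS                55
    HTTP                 15
    Video streaming      15
    DNS                   5
    Email (SMTP/IMAP)     5
    Other                 5

...plus whatever per-application details (object sizes, cipher suites) another lab would need to reproduce the same mix.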

>> 
>> - 7.1.3.4 - where does this monitoring occur?
>> 
>> [bpm] The monitoring or measurement must occur in the test equipment.
>> Section 4.3.4 describes this.
>> 
>> - 7.1.3.4 - This looks a bit like conformance testing -  Why does item 
>> (b) require a specific number/threshold?
>> 
>> [bpm] These numbers are analogous to the zero-packet-loss criteria
>> for [RFC2544] Throughput, and recognize the additional complexity of
>> application layer performance. This was agreed by the IETF BMWG.
>> 
>> - 9: Why is the cipher suite recommendation for a real deployment
>> outside the scope of this document?
>> 
>> [bpm] Because new cipher suites are frequently developed. Given that
>> the draft will not be easily updated once it is accepted as an RFC, we
>> wanted to ensure there was flexibility to use future-developed cipher suites.
>> 
>> Brian Monkman on behalf of....
>> 
>> Alex Samonte (Fortinet), Amritam Putatunda (Ixia/Keysight), Bala 
>> Balarajah (NetSecOPEN), Carsten Rossenhoevel (EANTC), Chris Brown 
>> (UNH-IOL), Mike Jack (Spirent), Ryan Liles (Cisco), Tim Carlin 
>> (UNH-IOL), Tim Otto (Juniper)
>>