Re: [bmwg] first WGLC on draft-ietf-bmwg-traffic
"MORTON, ALFRED C (AL)" <acm@research.att.com> Tue, 07 October 2014 20:58 UTC
From: "MORTON, ALFRED C (AL)" <acm@research.att.com>
To: "bmwg@ietf.org" <bmwg@ietf.org>
Date: Tue, 07 Oct 2014 16:56:34 -0400
Archived-At: http://mailarchive.ietf.org/arch/msg/bmwg/iP1jwyoVOQM3HMeDrbMZm30i_2o
Authors,

my suggestions follow, embedded in the text and prefaced with ACM:

Al (as a participant)

ACM: Global - decide "remarking" or "re-marking", whichever is more
     correct or consistent with current literature.

ACM: Global - There are no step-by-step procedures, which are provided
     in almost all methodology work in BMWG. IMO, more specifics are
     needed. Section 6 should have several procedures (individual,
     multi-port, combo), or core procedures with customization in the
     specific sections.

ACM: Global - Although "repeatability" is mentioned early on, there are
     no methods or metrics to evaluate repeatability of the results,
     and the topic is never mentioned beyond the introductory material
     and repeatable pattern generation (which is fine, but different).
     Repeatable *results* are the property most needed here. With so
     many stateful performance evaluations in play, this topic requires
     thorough treatment in the text.

...

Constantine                   August 10, 2014                   [Page 3]

Internet-Draft        Traffic Management Benchmarking       August, 2014

1. Introduction

   Traffic management (i.e. policing, shaping, etc.) is an increasingly
   important component when implementing network Quality of Service
   (QoS).

ACM: important component when implementing simple network features to
     improve Quality of Service (QoS).

   There is currently no framework to benchmark these features although
   some standards address specific areas.

ACM: it would be good to mention the standards or RFCs your methods
     characterize here, or in the second paragraph.
ACM: I see you did this later on, maybe say "(see section 1.1)"

   This draft provides a framework to conduct repeatable traffic
   management benchmarks for devices and systems in a lab environment.
   Specifically, this framework defines the methods to characterize the
   capacity of the following traffic management features in network
   devices; classification, policing, queuing / scheduling, and traffic
   shaping.
   This benchmarking framework can also be used as a test procedure to
   assist in the tuning of traffic management parameters before service
   activation. In addition to Layer 2/3 benchmarking, Layer 4 test
   patterns are proposed by this draft in order to more realistically
   benchmark end-user traffic.

ACM: You actually have specific examples of L2, L3, and L4, right?
     it's less ambiguous to name them, rather than refer to layer
     numbers.

1.1. Traffic Management Overview

   In general, a device with traffic management capabilities performs
   the following functions:

   - Traffic classification: identifies traffic according to various
     configuration rules (i.e. VLAN, DSCP, etc.) and marks this traffic
     internally to the network device. Multiple external priorities
     (DSCP, 802.1p, etc.) can map to the same priority in the device.

   - Traffic policing: limits the rate of traffic that enters a network
     device according to the traffic classification. If the traffic
     exceeds the contracted limits, the traffic is either dropped or

ACM: s/contracted/provisioned/ or /configured/ (no need to get the
     lawyers involved)

     remarked and sent onto to the next network device

   - Traffic Scheduling: provides traffic classification within the
     network device by directing packets to various types of queues and
     applies a dispatching algorithm to assign the forwarding sequence
     of packets

   - Traffic shaping: a traffic control technique that actively buffers
     and meters the output rate in an attempt to adapt bursty traffic

ACM: s/meters/smooths/

     to the configured limits

   - Active Queue Management (AQM): monitors the status of internal
     queues and actively drops (or re-marks) packets, which causes
     hosts using congestion-aware protocols to back-off and in turn can
     alleviate queue congestion. Note that AQM is outside of the scope
     of this testing framework.

ACM: I'm interested to see how the AQM item is re-worded following
     feedback from Gorry/Dave/other AQM folks.
     Traffic management has the same effects on Congestion-Aware
     traffic when it drops or re-marks, but obviously the benchmarking
     scenarios are different (and maybe that's the additional point to
     make here, when saying that AQM is out of scope).

   The following diagram is a generic model of the traffic management
   capabilities within a network device. It is not intended to
   represent all variations of manufacturer traffic management
   capabilities, but provide context to this test framework.

   |----------|   |----------------|   |--------------|   |----------|
   |          |   |                |   |              |   |          |
   |Interface |   |Ingress Actions |   |Egress Actions|   |Interface |
   |Input     |   |(classification,|   |(scheduling,  |   |Output    |
   |Queues    |   | marking,       |   | shaping,     |   |Queues    |
   |          |-->| policing or    |-->| active queue |-->|          |
   |          |   | shaping)       |   | management   |   |          |
   |          |   |                |   | re-marking)  |   |          |
   |----------|   |----------------|   |--------------|   |----------|

   Figure 1: Generic Traffic Management capabilities of a Network Device

   Ingress actions such as classification are defined in RFC 4689 and
   include IP addresses, port numbers, DSCP, etc. In terms of marking,
   RFC 2697 and RFC 2698 define a single rate and dual rate, three
   color marker, respectively.

   The MEF specifies policing and shaping in terms of Ingress and
   Egress Subscriber/Provider Conditioning Functions in MEF12.1;
   Ingress and Bandwidth Profile attributes in MEF 10.2 and MEF 26.

ACM: Agree with Bhuvan's comment here, these need to be references to
     the MEF docs, preferably free versions if available.

1.2 DUT Lab Configuration and Testing Overview

ACM: We always have to spell-out DUT at first usage, sorry, it's
     well-known in bmwg, but not IETF-wide.
   The following is the description of the lab set-up for the traffic
   management tests:

   +--------------+     +-------+     +----------+    +-----------+
   | Transmitting |     |       |     |          |    | Receiving |
   | Test Host    |     |       |     |          |    | Test Host |
   |              |-----|  DUT  |---->| Network  |--->|           |
   |              |     |       |     | Delay    |    |           |
   |              |     |       |     | Emulator |    |           |
   |              |<----|       |<----|          |<---|           |
   |              |     |       |     |          |    |           |
   +--------------+     +-------+     +----------+    +-----------+

   As shown in the test diagram, the framework supports uni-directional
   and bi-directional traffic management tests.

ACM: . . ., where the transmitting and receiving roles would be
     reversed on the return path.

   This testing framework describes the tests and metrics for each of
   the following traffic management functions:

   - Policing

   - Queuing / Scheduling

   - Shaping

   The tests are divided into individual tests and rated capacity
   tests. The individual tests are intended to benchmark the traffic
   management functions according to the metrics defined in Section 4.
   The capacity tests verify traffic management functions under full
   load.

ACM: . . . under the load of many simultaneous individual tests and
     their flows.

   This involves concurrent testing of multiple interfaces with the
   specific traffic management function enabled, and doing so to the
   capacity limit of each interface.

ACM: . . . and increasing load to the capacity limit of each interface.

   As an example: a device is specified to be capable of shaping on all
   of it's egress ports. The individual test would first be conducted to

ACM: s/it's/its/

   benchmark the advertised shaping function against the metrics defined

ACM: s/advertised/specified/

   in section 4. Then the capacity test would be executed to test the
   shaping function concurrently on all interfaces and with maximum
   traffic load.

   The Network Delay Emulator (NDE) is a requirement for the TCP
   stateful tests, which require network delay to allow TCP to fully
   open the TCP window.

ACM: .
     . . . to allow TCP to utilize a significant size TCP window in its
     control loop.

   Also note that the Network Delay Emulator (NDE) should be passive in
   nature such as a fiber spool. This is recommended to eliminate the
   potential effects that an active delay element (i.e. test impairment
   generator) may have on the test flows. In the case that a fiber
   spool is not practical due to the desired latency, an active NDE
   must be independently verified to be capable of adding the
   configured delay without loss. In other words, the DUT would be
   removed and the NDE performance benchmarked independently.

   Note the NDE should be used in "full pipe" delay mode. Most NDEs
   allow for per flow delay actions, emulating QoS prioritization. For
   this framework, the NDE's sole purpose is simply to add delay to all
   packets (emulate network latency). So to benchmark the performance
   of the NDE, maximum offered load should be tested against the
   following frame sizes: 128, 256, 512, 768, 1024, 1500, and 9600
   bytes. The delay accuracy at each of these packet sizes can then be
   used to calibrate the range of expected BDPs for the TCP stateful
   tests.

ACM: spell-out Bandwidth Delay Product (BDP) above (it is used before
     the glossary below).

2. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   The following acronyms are used:

ACM: This list should include terms already used, such as AQM.
   BB: Bottleneck Bandwidth

   BDP: Bandwidth Delay Product

   BSA: Burst Size Achieved

   CBS: Committed Burst Size

   CIR: Committed Information Rate

   DUT: Device Under Test

   EBS: Excess Burst Size

   EIR: Excess Information Rate

   NDE: Network Delay Emulator

   SP: Strict Priority Queuing

   QL: Queue Length

   QoS: Quality of Service

   RED: Random Early Discard

   RTT: Round Trip Time

   SBB: Shaper Burst Bytes

   SBI: Shaper Burst Interval

   SR: Shaper Rate

   SSB: Send Socket Buffer

   Tc: CBS Time Interval

   Te: EBS Time Interval

   Ti: Transmission Interval

   TTP: TCP Test Pattern

   TTPET: TCP Test Pattern Execution Time

   WRED: Weighted Random Early Discard

ACM: Just noting the inclusion of terms "RED" and "WRED", which are
     examples of AQM and indicated to be out of scope...

3. Scope and Goals

   The scope of this work is to develop a framework for benchmarking
   and testing the traffic management capabilities of network devices
   in the lab environment. These network devices may include but are
   not limited to:

   - Switches (including Layer 2/3 devices)

   - Routers

   - Firewalls

   - General Layer 4-7 appliances (Proxies, WAN Accelerators, etc.)

   Essentially, any network device that performs traffic management as
   defined in section 1.1 can be benchmarked or tested with this
   framework.

   The primary goal is to assess the maximum forwarding performance
   that a network device can sustain without dropping or impairing
   packets,

ACM: . . . deemed to be within the provisioned traffic limits (?)

   or compromising the accuracy of multiple instances of traffic
   management functions. This is the benchmark for comparison between
   devices.

   Within this framework, the metrics are defined for each traffic
   management test but do not include pass / fail criterion, which is
   not within the charter of BMWG. This framework provides the test
   methods and metrics to conduct repeatable testing, which will
   provide the means to compare measured performance between DUTs.
   As mentioned in section 1.2, this framework describes the individual
   tests and metrics for several management functions. It is also
   within scope that this framework will benchmark each function in
   terms of

ACM: s/this framework/these methods/

   overall rated capacity. This involves concurrent testing of multiple
   interfaces with the specific traffic management function enabled, up
   to the capacity limit of each interface.

   It is not within scope of this framework to specify the procedure
   for testing multiple traffic management functions concurrently. The
   multitudes of possible combinations is almost unbounded and the
   ability to identify functional "break points" would be most times
   impossible.

ACM: Just to clarify, do you mean "testing *multiple configurations*
     of traffic management functions concurrently" is out of scope?
     Certainly there will be multiple traffic shapers in a test,
     perhaps they will all be configured the same, but they are
     different instances of the same shaper.

   However, section 6.4 provides suggestions for some profiles of
   concurrent functions that would be useful to benchmark. The key
   requirement for any concurrent test function is that tests must
   produce reliable and repeatable results.

   Also, it is not within scope to perform conformance testing. Tests
   defined in this framework benchmark the traffic management functions
   according to the metrics defined in section 4 and do not address any
   conformance to standards related to traffic management. Traffic
   management specifications largely do not exist and this is a prime

ACM: I think what you mean here is that the specifications don't
     specify exact behavior or implementation - and the specs that do
     exist allow implementations to vary w.r.t. short term rate
     accuracy or other factors. But there *are* specs, you cited them
     in section 1.1.
   driver for this framework; to provide an objective means to compare
   vendor traffic management functions.

   Another goal is to devise methods that utilize flows with
   congestion-aware transport (TCP) as part of the traffic load and
   still produce repeatable results in the isolated test environment.
   This framework will derive stateful test patterns (TCP or
   application layer) that can also be used to further benchmark the
   performance of applicable traffic management techniques such as
   queuing / scheduling and traffic shaping. In cases where the network
   device is stateful in nature (i.e. firewall, etc.), stateful test
   pattern traffic is important to test along with stateless, UDP
   traffic in specific test scenarios (i.e. applications using TCP
   transport and UDP VoIP, etc.)

ACM: Repeatability is necessary, but you haven't given the reader a
     clue yet about how the methods described here will achieve it, or
     allow the tester to assess repeatability. It's section 3, so it's
     appropriate to add a bit of detail on repeatability now.

   And finally, this framework will provide references to open source
   tools that can be used to provide stateless and/or stateful traffic
   generation emulation.

4. Traffic Benchmarking Metrics

   The metrics to be measured during the benchmarks are divided into
   two (2) sections: packet layer metrics used for the stateless
   traffic testing and segment layer metrics used for the stateful
   traffic testing.

ACM: . . . and transport layer metrics used for the stateful traffic
     testing, such as TCP segments of the byte stream. ("segment"
     threw me for a moment, it's better to provide the context of TCP
     before using the term segment)

4.1. Metrics for Stateless Traffic Tests

   For the stateless traffic tests, the metrics are defined at the
   layer 3 packet level versus layer 2 packet level for consistency.
   Stateless traffic measurements require that sequence number and
   time-stamp be inserted into the payload for lost packet analysis.
   Delay analysis may be achieved by insertion of timestamps directly
   into the packets or timestamps stored elsewhere (packet captures).
   This framework does not specify the packet format to carry sequence
   number or timing information. However, RFC 4689 provides

ACM: please add ref RFC 4737, which is the basis for the Out-of-order
     definition in 4689, and provides more exact definitions and
     discussion.

   recommendations for sequence tracking along with definitions of
   in-sequence and out-of-order packets.

   The following are the metrics to be used during the stateless
   traffic benchmarking components of the tests:

   - Burst Size Achieved (BSA): for the traffic policing and network
     queue tests, the tester will be configured to send bursts to test
     either the Committed Burst Size (CBS) or Excess Burst Size (EBS)
     of a policer or the queue / buffer size configured in the DUT. The
     Burst Size Achieved metric is a measure of the actual burst size
     received at the egress port of the DUT with no lost packets. As an
     example, the configured CBS of a DUT is 64KB and after the burst
     test, only a 63 KB can be achieved without packet loss. Then 63KB
     is the BSA. Also, the average Packet Delay Variation (PDV, see
     below) as experienced by the packets sent at the BSA burst size
     should be recorded.

   - Lost Packets (LP): For all traffic management tests, the tester
     will transmit the test packets into the DUT ingress port and the
     number of packets received at the egress port will be measured.
     The difference between packets transmitted into the ingress port
     and received at the egress port is the number of lost packets as
     measured at the egress port. These packets must have unique
     identifiers such that only the test packets are measured.
ACM: It's not clear if a sample (sub-set) of packets in one traffic
     flow is measured, or a sub-set of the total flows (the packets
     bearing sequence numbers)

     RFC 4737 and RFC 2680 describe the need to establish the time
     threshold to wait before declaring a packet as lost, and this
     threshold MUST be reported with the results.

   - Out of Sequence (OOS): in additions to the LP metric, the test
     packets must be monitored for sequence and the out-of-sequence
     (OOS) packets. RFC 4689 defines the general function of sequence
     tracking, as well as definitions for in-sequence and out-of-order
     packets. Out-of-order packets will be counted per RFC 4737 and
     RFC 2680.

   - Packet Delay (PD): the Packet Delay metric is the difference
     between the timestamp of the received egress port packets and the
     packets transmitted into the ingress port and specified in
     RFC 2285.

   - Packet Delay Variation (PDV): the Packet Delay Variation metric is
     the variation between the timestamp of the received egress port
     packets and specified in RFC 5481.

ACM: We don't want INTER-packet delay variation here, there could be
     substantial IPDV due to shaping and it will be meaningless. We
     want the measurement of PDV in RFC 5481, in my opinion, which is
     variation of one-way delay across many packets in the traffic
     flow.

   - Shaper Rate (SR): the Shaper Rate is only applicable to the
     traffic shaping tests. The SR represents the average egress output
     rate (bps) over the test interval.

   - Shaper Burst Bytes (SBB): the Shaper Burst Bytes is only
     applicable to the traffic shaping tests. A traffic shaper will
     emit packets in different size "trains" (bytes back-to-back). This
     metric characterizes the method by which the shaper emits traffic.
     Some shapers transmit larger bursts per interval, while other
     shapers may transmit a single frame at the CIR rate (two extreme
     examples).
ACM: This metric characterizes shaper behavior, but it is simply
     information for the tester?

   - Shaper Burst Interval (SBI): the interval is only applicable to
     the traffic shaping tests and again is the time between a shaper
     emitted bursts.

ACM: and a burst of 1 packet would apply to the extreme case of a
     shaper sending a CBR stream of single packets.

4.2. Metrics for Stateful Traffic Tests

   The stateful metrics will be based on RFC 6349 TCP metrics and will
   include:

   - TCP Test Pattern Execution Time (TTPET): RFC 6349 defined the TCP
     Transfer Time for bulk transfers, which is simply the measured
     time to transfer bytes across single or concurrent TCP
     connections. The TCP test patterns used in traffic management
     tests will include bulk transfer and interactive applications.
     The interactive patterns include instances such as HTTP business
     applications, database applications, etc. The TTPET will be the
     measure of the time for a single execution of a TCP Test Pattern
     (TTP). Average, minimum, and maximum times will be measured or
     calculated.

     An example would be an interactive HTTP TTP session which should
     take 5 seconds on a GigE network with 0.5 millisecond latency.
     During ten (10) executions of this TTP, the TTPET results might
     be: average of 6.5 seconds, minimum of 5.0 seconds, and maximum of
     7.9 seconds.

   - TCP Efficiency: after the execution of the TCP Test Pattern, TCP
     Efficiency represents the percentage of Bytes that were not
     retransmitted.

                        Transmitted Bytes - Retransmitted Bytes
     TCP Efficiency % = ---------------------------------------  X 100
                                  Transmitted Bytes

     Transmitted Bytes are the total number of TCP Bytes to be
     transmitted including the original and the retransmitted Bytes.
     These retransmitted bytes should be recorded from the sender's
     TCP/IP stack perspective, to avoid any misinterpretation that a
     reordered packet is a retransmitted packet (as may be the case
     with packet decode interpretation).
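ACM: An aside, not for the draft text: the TCP Efficiency formula
     above reduces to a one-liner, which may help implementers of the
     tester. The byte counts here are hypothetical, not from the draft:

```python
def tcp_efficiency_pct(transmitted_bytes: int, retransmitted_bytes: int) -> float:
    """TCP Efficiency %: share of transmitted bytes that were NOT
    retransmissions. transmitted_bytes counts original plus
    retransmitted bytes, per the definition above."""
    return (transmitted_bytes - retransmitted_bytes) / transmitted_bytes * 100.0

# Hypothetical: 100 MB sent in total, of which 2 MB were retransmissions.
print(tcp_efficiency_pct(100_000_000, 2_000_000))  # 98.0
```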
   - Buffer Delay: represents the increase in RTT during a TCP test
     versus the baseline DUT RTT (non congested, inherent latency). RTT
     and the technique to measure RTT (average versus baseline) are
     defined in RFC 6349. Referencing RFC 6349, the average RTT is
     derived from the total of all measured RTTs during the actual test
     sampled at every second divided by the test duration in seconds.

                                      Total RTTs during transfer
     Average RTT during transfer = ------------------------------
                                    Transfer duration in seconds


                      Average RTT during Transfer - Baseline RTT
     Buffer Delay % = ------------------------------------------ X 100
                                     Baseline RTT

     Note that even though this was not explicitly stated in RFC 6349,
     retransmitted packets should not be used in RTT measurements.

     Also, the test results should record the average RTT in
     millisecond across the entire test duration and number of samples.

5. Tester Capabilities

   The testing capabilities of the traffic management test environment
   are divided into two (2) sections: stateless traffic testing and
   stateful traffic testing.

5.1. Stateless Test Traffic Generation

   The test set must be capable of generating traffic at up to the link
   speed of the DUT. The test set must be calibrated to verify that it
   will not drop any packets. The test set's inherent PD and PDV must
   also be calibrated and subtracted from the PD and PDV metrics.

   The test set must support the encapsulation to be tested such as
   VLAN, Q-in-Q, MPLS, etc. Also, the test set must allow control of
   the classification techniques defined in RFC 4689 (i.e. IP address,
   DSCP, TOS, etc. classification).

   The open source tool "iperf" can be used to generate stateless UDP
   traffic and is discussed in Appendix A. Since iperf is a software
   based tool, there will be performance limitations at higher link
   speeds (e.g. GigE, 10 GigE, etc.). Careful calibration of any test
   environment using iperf is important.
   At higher link speeds, it is recommended to use hardware based
   packet test equipment.

ACM: Agree with Dave Taht's comment to incorporate other tools in the
     text, such as netperf.

5.1.1 Burst Hunt with Stateless Traffic

   A central theme for the traffic management tests is to benchmark the
   specified burst parameter of traffic management function, since
   burst parameters of SLAs are specified in bytes. For testing
   efficiency, it is recommended to include a burst hunt feature, which
   automates the manual process of determining the maximum burst size
   which can be supported by a traffic management function.

   The burst hunt algorithm should start at the target burst size
   (maximum burst size supported by the traffic management function)
   and will send single bursts until it can determine the largest burst
   that can pass without loss.

ACM: need to add mention of the inter-burst interval here, and its
     influence on the test results when set improperly (too low).

   If the target burst size passes, then the test is complete. The hunt
   aspect occurs when the target burst size is not achieved; the
   algorithm will drop down to a configured minimum burst size and
   incrementally increase the burst until the maximum burst supported
   by the DUT is discovered. The recommended granularity of the
   incremental burst size increase is 1 KB.

   Optionally for a policer function and if the burst size passes, the
   burst should be increased by increments of 1 KB to verify that the
   policer is truly configured properly (or enabled at all).

5.2. Stateful Test Pattern Generation

   The TCP test host will have many of the same attributes as the TCP
   test host defined in RFC 6349. The TCP test device may be a standard
   computer or a dedicated communications test instrument. In both
   cases, it must be capable of emulating both a client and a server.
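ACM: Looping back to 5.1.1 for a moment: my reading of the hunt
     procedure can be sketched as below (not draft text; the pass/fail
     probe burst_passes stands in for an actual tester run, and the
     inter-burst interval concern I raised above is not modeled):

```python
def burst_hunt(target_kb, min_kb, burst_passes, step_kb=1):
    """Find the largest burst (in KB) that passes without loss: try the
    target burst first, then climb from the configured minimum in
    1 KB steps, as described in section 5.1.1 above."""
    if burst_passes(target_kb):
        return target_kb  # target burst achieved, test complete
    largest = None
    size = min_kb
    while size < target_kb:
        if not burst_passes(size):
            break  # first failing size ends the climb
        largest = size
        size += step_kb
    return largest

# Hypothetical DUT that only sustains bursts up to 63 KB:
print(burst_hunt(64, 1, lambda kb: kb <= 63))  # 63
```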
   For any test using stateful TCP test traffic, the Network Delay
   Emulator (NDE function from the lab set-up diagram) must be used in
   order to provide a meaningful BDP. As referenced in section 2, the
   target traffic rate and configured RTT must be verified
   independently using just the NDE for all stateful tests (to ensure
   the NDE can delay without loss).

   The TCP test host must be capable to generate and receive stateful
   TCP test traffic at the full link speed of the DUT. As a general
   rule of thumb, testing TCP Throughput at rates greater than 500 Mbps
   may require high performance server hardware or dedicated hardware
   based test tools.

   The TCP test host must allow adjusting both Send and Receive Socket
   Buffer sizes. The Socket Buffers must be large enough to fill the
   BDP for bulk transfer TCP test application traffic.

   Measuring RTT and retransmissions per connection will generally
   require a dedicated communications test instrument. In the absence
   of dedicated hardware based test tools, these measurements may need
   to be conducted with packet capture tools, i.e. conduct TCP
   Throughput tests and analyze RTT and retransmissions in packet
   captures.

   The TCP implementation used by the test host must be specified in
   the test results (i.e. OS version, i.e. LINUX OS kernel using TCP
   New Reno, TCP options supported, etc.).

   While RFC 6349 defined the means to conduct throughput tests of TCP
   bulk transfers, the traffic management framework will extend TCP
   test execution into interactive TCP application traffic. Examples
   include email, HTTP, business applications, etc. This interactive
   traffic is bi-directional and can be chatty.

   The test device must not only support bulk TCP transfer application
   traffic but also chatty traffic. A valid stress test SHOULD include
   both traffic types.
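ACM: An aside on the socket buffer sizing rule above: it boils down to
     the BDP product. A quick sketch, with a hypothetical rate and RTT
     (not values from the draft):

```python
def min_socket_buffer_bytes(bottleneck_bw_bps: float, rtt_ms: float) -> float:
    """Smallest send/receive socket buffer that can fill the Bandwidth
    Delay Product (BDP) for a bulk TCP transfer."""
    return bottleneck_bw_bps * rtt_ms / 1000 / 8

# Hypothetical: 1 GigE bottleneck with 25 ms of NDE-emulated RTT.
print(min_socket_buffer_bytes(1e9, 25))  # 3125000.0 bytes (~3 MB)
```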
   This is due to the non-uniform, bursty nature of chatty applications
   versus the relatively uniform nature of bulk transfers (the bulk
   transfer smoothly stabilizes to equilibrium state under lossless
   conditions).

   While iperf is an excellent choice for TCP bulk transfer testing,
   the open source tool "Flowgrind" (referenced in Appendix A) is
   client-server based and emulates interactive applications at the TCP
   layer. As with any software based tool, the performance must be
   qualified to the link speed to be tested. Hardware-based test
   equipment should be considered for reliable results at higher link
   speeds (e.g. 1 GigE, 10 GigE).

5.2.1. TCP Test Pattern Definitions

   As mentioned in the goals of this framework, techniques are defined
   to specify TCP traffic test patterns to benchmark traffic management
   technique(s) and produce repeatable results. Some network devices
   such as firewalls, will not process stateless test traffic which is
   another reason why stateful TCP test traffic must be used.

   An application could be fully emulated up to Layer 7, however this
   framework proposes that stateful TCP test patterns be used in order
   to provide granular and repeatable control for the benchmarks. The
   following diagram illustrates a simple Web Browsing application
   (HTTP).

                     GET url
   Client  ------------------------>   Web
   Web                   200 OK 100ms |
   Browser <------------------------   Server

   In this example, the Client Web Browser (Client) requests a URL and
   then the Web Server delivers the web page content to the Client
   (after a Server delay of 100 millisecond). This asynchronous,
   "request/response" behavior is intrinsic to most TCP based
   applications such as Email (SMTP), File Transfers (FTP and SMB),
   Database (SQL), Web Applications (SOAP), REST, etc. The impact to
   the network elements is due to the multitudes of Clients and the
   variety of bursty traffic, which stresses traffic management
   functions.
   The actual emulation of the specific application protocols is not
   required and TCP test patterns can be defined to mimic the
   application network traffic flows and produce repeatable results.

   There are two (2) techniques recommended by this framework to
   develop standard TCP test patterns for traffic management
   benchmarking.

   The first technique involves modeling, which have been described in
   "3GPP2 C.R1002-0 v1.0" and describe the behavior of HTTP, FTP, and
   WAP applications at the TCP layer. The models have been defined with
   various mathematical distributions for the Request/Response bytes
   and inter-request gap times. The Flowgrind tool (Appendix A)
   supports many of the distributions and is a good choice as long as
   the processing limits of the server platform are taken into
   consideration.

   The second technique is to conduct packet captures of the
   applications to test and then to statefully play the application
   back at the TCP layer.

ACM: "packet capture" and "stateful playback" don't match here, unless
     you are determining some high-level aspects of application
     behavior (number of connections, bytes per connection, sequence of
     connections, etc.). So, this is "transport connection capture"?
     Not sure. . .

   The TCP playback includes the request byte size, response byte size,
   and inter-message gaps at both the client and the server. The
   advantage of this method is that very realistic test patterns can be
   defined based on real world application traffic.

ACM: This sounds like you are building application-layer behavior on
     top of TCP transport, which would be more easily emulated using
     the real application layer communications, if it has the necessary
     sequence numbers and timestamps.

   This framework does not specify a fixed set of TCP test patterns,
   but does provide recommended test cases in Appendix B. Some of these
   examples reflect those specified in
   "draft-ietf-bmwg-ca-bench-meth-04" which suggests traffic mixes for
   a variety of representative application profiles.
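ACM: For what it's worth, my mental model of a playback-style TTP is a
     list of request/response steps like the sketch below. This is
     purely illustrative (the field names and byte counts are mine, not
     the draft's), based on the parameters named above: request byte
     size, response byte size, and inter-message gaps.

```python
from dataclasses import dataclass

@dataclass
class TTPStep:
    """One request/response exchange of a TCP Test Pattern (TTP)."""
    request_bytes: int          # client -> server payload size
    response_bytes: int         # server -> client payload size
    server_delay_s: float       # server think time before the response
    inter_message_gap_s: float  # client gap before the next request

# Illustrative pattern: a small HTTP-like page fetch, loosely mirroring
# the GET / 200 OK exchange in the section 5.2.1 diagram.
web_browsing_ttp = [
    TTPStep(request_bytes=350, response_bytes=100_000,
            server_delay_s=0.100, inter_message_gap_s=2.0),
    TTPStep(request_bytes=500, response_bytes=25_000,
            server_delay_s=0.100, inter_message_gap_s=0.0),
]
```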
Other examples are simply well-known application traffic types.

6. Traffic Benchmarking Methodology

The traffic benchmarking methodology uses the test set-up from section 2
and metrics defined in section 4. Each test should be run for a minimum
test time of 5 minutes.

Each test should compare the network device's internal statistics
(available via command line management interface, SNMP, etc.) to the
measured metrics defined in section 4. This evaluates the accuracy of the
internal traffic management counters under individual test conditions and
capacity test conditions that are defined in each subsection.

6.1. Policing Tests

The intent of the policing tests is to verify the policer performance
(i.e. CIR-CBS and EIR-EBS parameters). The tests will verify that the
network device can handle the CIR with CBS and the EIR with EBS and will
use back-to-back packet testing concepts from RFC 2544 (but adapted to
burst size algorithms and terminology). Also, MEF-14, 19, and 37 provide
some basis for specific components of this test. The burst hunt algorithm
defined in section 5.1.1 can also be used to automate the measurement of
the CBS value.

The tests are divided into two (2) sections: individual policer tests and
then full capacity policing tests. It is important to benchmark the basic
functionality of the individual policer and then proceed into the fully
rated capacity of the device. This capacity may include the number of
policing policies per device and the number of policers simultaneously
active across all ports.

6.1.1 Policer Individual Tests

Policing tests should use stateless traffic. Stateful TCP test traffic
will generally be adversely affected by a policer in the absence of
traffic shaping. So while TCP traffic could be used, it is more accurate
to benchmark a policer with stateless traffic.
The policer test shall test a policer as defined by RFC 4115 or MEF 10.2,
depending upon the equipment's specification.

ACM: s/shall/SHALL/

As an example for RFC 4115, consider a CBS and EBS of 64KB and a CIR and
EIR of 100 Mbps on a 1 GigE physical link (in color-blind mode). A
stateless traffic burst of 64KB would be sent into the policer at the
GigE rate. This equates to approximately a 0.512 millisecond burst time
(64 KB at 1 GigE). The traffic generator must space these bursts to
ensure that the aggregate throughput does not exceed the CIR. The Ti
between the bursts would equal CBS * 8 / CIR = 5.12 milliseconds in this
example.

The metrics defined in section 4.1 shall be measured at the egress port
and recorded.

ACM: s/shall/SHALL/

In addition to verifying that the policer allows the specified CBS and
EBS bursts to pass, the policer test must verify that the policer will
police at the specified CBS/EBS values.

ACM: s/must/MUST/
ACM: also, "policer will police" is ambiguous, . . . "policer will
re-mark or drop excess, and pass traffic at the specified CBS/EBS
values."

For this portion of the test, the CBS/EBS value should be incremented by
1000 bytes higher than the configured CBS and that the egress port
measurements must show that the excess packets are dropped.

ACM: s/should/SHOULD/
ACM: s/the excess/only the excess/
ACM: the above is the last comment in this version
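The burst-time and inter-burst-interval figures in the RFC 4115 example
can be verified with a short calculation. This sketch assumes, as the
0.512 ms figure implies, that 64 KB here means 64,000 bytes:

```python
# Worked check of the RFC 4115 policing example above.
# Assumes 64 KB = 64,000 bytes (consistent with the 0.512 ms figure).
CBS_BYTES = 64_000          # committed burst size
CIR_BPS = 100_000_000       # committed information rate, 100 Mbps
LINK_BPS = 1_000_000_000    # 1 GigE physical link rate

# Time to transmit one CBS-sized burst at line rate:
burst_time_s = CBS_BYTES * 8 / LINK_BPS   # 0.000512 s = 0.512 ms

# Inter-burst interval Ti so aggregate throughput does not exceed CIR:
ti_s = CBS_BYTES * 8 / CIR_BPS            # 0.00512 s = 5.12 ms
```

The 10:1 ratio between Ti and the burst time mirrors the 10:1 ratio
between the link rate and the CIR: the burst occupies the line for one
tenth of each interval.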