Re: [bmwg] first WGLC on draft-ietf-bmwg-traffic

"MORTON, ALFRED C (AL)" <acm@research.att.com> Tue, 07 October 2014 20:58 UTC


Authors,
 my suggestions follow, embedded in the text and prefaced with ACM:

Al
(as a participant)

ACM: Global
decide "remarking" or "re-marking", which ever is more correct
or consistent with current literature.

ACM: Global
There are no step-by-step procedures, although such procedures appear in
almost all other BMWG methodology work. IMO, more specifics are needed.
Section 6 should have several procedures (individual, multi-port,
combo), or core procedures with customization in the specific sections.

ACM: Global
Although "repeatability" is mentioned early on, there are no
methods or metrics to evaluate repeatability of the results,
and the topic is never mentioned beyond the introductory material
and repeatable pattern generation (which is fine, but different).
Repeatable *results* are the property most needed here. With so
many stateful performance evaluations in play, this topic
requires thorough treatment in the text.


...

1. Introduction

   Traffic management (i.e. policing, shaping, etc.) is an increasingly
   important component when implementing network Quality of Service
   (QoS).
ACM:
   important component when implementing simple network features to
improve Quality of Service (QoS).

   There is currently no framework to benchmark these features
   although some standards address specific areas.
ACM:
it would be good to mention the standards or RFCs your methods
characterize here, or in the second paragraph.
ACM:
I see you did this later on, maybe say "(see section 1.1)"

   This draft provides
   a framework to conduct repeatable traffic management benchmarks for
   devices and systems in a lab environment.

   Specifically, this framework defines the methods to characterize the
   capacity of the following traffic management features in network
   devices; classification, policing, queuing / scheduling, and
   traffic shaping.

   This benchmarking framework can also be used as a test procedure to
   assist in the tuning of traffic management parameters before service
   activation. In addition to Layer 2/3 benchmarking, Layer 4 test
   patterns are proposed by this draft in order to more realistically
   benchmark end-user traffic.
ACM:
You actually have specific examples of L2, L3, and L4, right?
It's less ambiguous to name them, rather than refer to layer
numbers.


1.1. Traffic Management Overview

   In general, a device with traffic management capabilities performs
   the following functions:

   - Traffic classification: identifies traffic according to various
    configuration rules (i.e. VLAN, DSCP, etc.) and marks this traffic
    internally to the network device. Multiple external priorities
   (DSCP, 802.1p, etc.) can map to the same priority in the device.
  - Traffic policing: limits the rate of traffic that enters a network
    device according to the traffic classification.  If the traffic
    exceeds the contracted limits, the traffic is either dropped or
ACM:
s/contracted/provisioned/ or /configured/
(no need to get the lawyers involved)

    remarked and sent onto to the next network device
  - Traffic Scheduling: provides traffic classification within the
    network device by directing packets to various types of queues and
    applies a dispatching algorithm to assign the forwarding sequence
    of packets
  - Traffic shaping: a traffic control technique that actively buffers
    and meters the output rate in an attempt to adapt bursty traffic
ACM:
s/meters/smooths/

    to the configured limits
  - Active Queue Management (AQM): monitors the status of internal
    queues and actively drops (or re-marks) packets, which causes hosts
    using congestion-aware protocols to back-off and in turn can
    alleviate queue congestion.  Note that AQM is outside of the scope
        of this testing framework.
ACM:
I'm interested to see how the AQM item is re-worded following feedback
from Gorry/Dave/other AQM folks. Traffic management has the same
effects on Congestion-Aware traffic when it drops or re-marks,
but obviously the benchmarking scenarios are different (and
maybe that's the additional point to make here, when saying that
AQM is out of scope).



   The following diagram is a generic model of the traffic management
   capabilities within a network device.  It is not intended to
   represent all variations of manufacturer traffic management
   capabilities, but provide context to this test framework.

   |----------|   |----------------|   |--------------|   |----------|
   |          |   |                |   |              |   |          |
   |Interface |   |Ingress Actions |   |Egress Actions|   |Interface |
   |Input     |   |(classification,|   |(scheduling,  |   |Output    |
   |Queues    |   | marking,       |   | shaping,     |   |Queues    |
   |          |-->| policing or    |-->| active queue |-->|          |
   |          |   | shaping)       |   | management   |   |          |
   |          |   |                |   | re-marking)  |   |          |
   |----------|   |----------------|   |--------------|   |----------|

   Figure 1: Generic Traffic Management capabilities of a Network Device

   Ingress actions such as classification are defined in RFC 4689 and
   include IP addresses, port numbers, DSCP, etc.  In terms of marking,
   RFC 2697 and RFC 2698 define a single rate and dual rate, three color
   marker, respectively.

   The MEF specifies policing and shaping in terms of Ingress and Egress
   Subscriber/Provider Conditioning Functions in MEF12.1; Ingress and
   Bandwidth Profile attributes in MEF 10.2 and MEF 26.

ACM:
Agree with Bhuvan's comment here, these need to be references
to the MEF docs, preferably free versions if available.


1.2 DUT Lab Configuration and Testing Overview

ACM:
We always have to spell-out DUT at first usage, sorry,
it's well-known in bmwg, but not IETF-wide.

   The following is the description of the lab set-up for the traffic
   management tests:

    +--------------+     +-------+     +----------+    +-----------+
    | Transmitting |     |       |     |          |    | Receiving |
    | Test Host    |     |       |     |          |    | Test Host |
    |              |-----| DUT   |---->| Network  |--->|           |
    |              |     |       |     | Delay    |    |           |
    |              |     |       |     | Emulator |    |           |
    |              |<----|       |<----|          |<---|           |
    |              |     |       |     |          |    |           |
    +--------------+     +-------+     +----------+    +-----------+

   As shown in the test diagram, the framework supports uni-directional
   and bi-directional traffic management tests.
ACM:
 . . ., where the transmitting and receiving roles would be reversed on
the return path.



   This testing framework describes the tests and metrics for each of
   the following traffic management functions:
   - Policing
   - Queuing / Scheduling
   - Shaping

   The tests are divided into individual tests and rated capacity tests.
   The individual tests are intended to benchmark the traffic management
   functions according to the metrics defined in Section 4.  The
   capacity tests verify traffic management functions under full load.
ACM:
. . . under the load of many simultaneous individual tests and their
flows.

   This involves concurrent testing of multiple interfaces with the
   specific traffic management function enabled, and doing so to the
   capacity limit of each interface.
ACM:
                                   . . . and increasing load to the
   capacity limit of each interface.


   As an example: a device is specified to be capable of shaping on all
   of it's egress ports. The individual test would first be conducted to
ACM:
s/it's/its/

   benchmark the advertised shaping function against the metrics defined
ACM:
s/advertised/specified/

   in section 4.  Then the capacity test would be executed to test the
   shaping function concurrently on all interfaces and with maximum
   traffic load.

   The Network Delay Emulator (NDE) is a requirement for the TCP
   stateful tests, which require network delay to allow TCP to fully
   open the TCP window.
ACM:
   . . . to allow TCP to utilize a significant size TCP window in its
   control loop.


   Also note that the Network Delay Emulator (NDE)
   should be passive in nature such as a fiber spool.  This is
   recommended to eliminate the potential effects that an active delay
   element (i.e. test impairment generator) may have on the test flows.
   In the case that a fiber spool is not practical due to the desired
   latency, an active NDE must be independently verified to be capable
   of adding the configured delay without loss.  In other words, the
   DUT would be removed and the NDE performance benchmarked
   independently.

   Note the NDE should be used in "full pipe" delay mode. Most NDEs
   allow for per flow delay actions, emulating QoS prioritization.  For
   this framework, the NDE's sole purpose is simply to add delay to all
   packets (emulate network latency). So to benchmark the performance of
   the NDE, maximum offered load should be tested against the following
   frame sizes: 128, 256, 512, 768, 1024, 1500,and 9600 bytes. The delay
   accuracy at each of these packet sizes can then be used to calibrate
   the range of expected BDPs for the TCP stateful tests.
ACM:
spell-out Bandwidth Delay Product (BDP) above (it is used before
the glossary below).
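
ACM:
For illustration, a rough Python sketch of the BDP arithmetic that the
NDE calibration above feeds into; the rate and delay values below are
placeholders I picked, not values from the draft:

   # Sketch: expected Bandwidth Delay Product (BDP) for a stateful test,
   # given the bottleneck rate and the delay configured on the NDE.
   def bdp_bytes(bottleneck_bps, rtt_seconds):
       """BDP = rate * RTT, expressed in bytes."""
       return bottleneck_bps * rtt_seconds / 8.0

   # Example: 100 Mbps shaped path with 25 ms of emulated round-trip delay.
   bdp = bdp_bytes(100e6, 0.025)           # 312,500 bytes
   print("Expected BDP: %d bytes" % bdp)   # socket buffers and burst sizes
                                           # can be sanity-checked against this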



2. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   The following acronyms are used:

ACM:
This list should include terms already used, such as AQM.

   BB: Bottleneck Bandwidth

   BDP: Bandwidth Delay Product

   BSA: Burst Size Achieved

   CBS: Committed Burst Size

   CIR: Committed Information Rate

   DUT: Device Under Test

   EBS: Excess Burst Size

   EIR: Excess Information Rate

   NDE: Network Delay Emulator

   SP: Strict Priority Queuing

   QL: Queue Length

   QoS: Quality of Service

   RED: Random Early Discard

   RTT: Round Trip Time

   SBB: Shaper Burst Bytes

   SBI: Shaper Burst Interval

   SR: Shaper Rate

   SSB: Send Socket Buffer

   Tc: CBS Time Interval

   Te: EBS Time Interval


   Ti Transmission Interval

   TTP: TCP Test Pattern

   TTPET: TCP Test Pattern Execution Time

   WRED: Weighted Random Early Discard

ACM:
Just noting the inclusion of terms "RED" and "WRED"
which are examples of AQM and indicated to be out of scope...


3. Scope and Goals

   The scope of this work is to develop a framework for benchmarking and
   testing the traffic management capabilities of network devices in the
   lab environment.  These network devices may include but are not
   limited to:
   - Switches (including Layer 2/3 devices)
   - Routers
   - Firewalls
   - General Layer 4-7 appliances (Proxies, WAN Accelerators, etc.)

   Essentially, any network device that performs traffic management as
   defined in section 1.1 can be benchmarked or tested with this
   framework.

   The primary goal is to assess the maximum forwarding performance that
   a network device can sustain without dropping or impairing packets,
ACM:
. . .  deemed to be within the provisioned traffic limits (?)


   or compromising the accuracy of multiple instances of traffic
   management functions. This is the benchmark for comparison between
   devices.

   Within this framework, the metrics are defined for each traffic
   management test but do not include pass / fail criterion, which is
   not within the charter of BMWG.  This framework provides the test
   methods and metrics to conduct repeatable testing, which will
   provide the means to compare measured performance between DUTs.

   As mentioned in section 1.2, this framework describes the individual
   tests and metrics for several management functions. It is also within
   scope that this framework will benchmark each function in terms of
ACM:
s/this framework/these methods/

   overall rated capacity.  This involves concurrent testing of multiple
   interfaces with the specific traffic management function enabled, up
   to the capacity limit of each interface.

   It is not within scope of this framework to specify the procedure for
   testing multiple traffic management functions concurrently.  The
   multitudes of possible combinations is almost unbounded and the
   ability to identify functional "break points" would be most times
   impossible.
ACM:
Just to clarify, do you mean "testing *multiple configurations* of
traffic management functions concurrently" is out of scope?
Certainly there will be multiple traffic shapers in a test;
perhaps they will all be configured the same, but they are
different instances of the same shaper.




   However, section 6.4 provides suggestions for some profiles of
   concurrent functions that would be useful to benchmark.  The key
   requirement for any concurrent test function is that tests must
   produce reliable and repeatable results.

   Also, it is not within scope to perform conformance testing. Tests
   defined in this framework benchmark the traffic management functions
   according to the metrics defined in section 4 and do not address any
   conformance to standards related to traffic management.  Traffic
   management specifications largely do not exist and this is a prime
ACM:
I think what you mean here is that the specifications don't specify
exact behavior or implementation - and the specs that do exist allow
implementations to vary w.r.t. short-term rate accuracy or other factors.
But there *are* specs, you cited them in section 1.1.


   driver for this framework; to provide an objective means to compare
   vendor traffic management functions.

   Another goal is to devise methods that utilize flows with
   congestion-aware transport (TCP) as part of the traffic load and
   still produce repeatable results in the isolated test environment.
   This framework will derive stateful test patterns (TCP or
   application layer) that can also be used to further benchmark the
   performance of applicable traffic management techniques such as
   queuing / scheduling and traffic shaping. In cases where the
   network device is stateful in nature (i.e. firewall, etc.),
   stateful test pattern traffic is important to test along with
   stateless, UDP traffic in specific test scenarios (i.e.
   applications using TCP transport and UDP VoIP, etc.)
ACM:
Repeatability is necessary, but you haven't given the reader a
clue yet about how the methods described here will achieve it, or
allow the tester to assess repeatability.  It's section 3, so it's
appropriate to add a bit of detail on repeatability now.

   And finally, this framework will provide references to open source
   tools that can be used to provide stateless and/or stateful
   traffic generation emulation.

4. Traffic Benchmarking Metrics

   The metrics to be measured during the benchmarks are divided into two
   (2) sections: packet layer metrics used for the stateless traffic
   testing and segment layer metrics used for the stateful traffic
   testing.
ACM:
. . . and transport layer metrics used for the stateful traffic
   testing, such as TCP segments of the byte stream.
("segment" threw me for a moment, it's better to provide the context
of TCP before using the term segment)


4.1.  Metrics for Stateless Traffic Tests

   For the stateless traffic tests, the metrics are defined at the layer
   3 packet level versus layer 2 packet level for consistency.


   Stateless traffic measurements require that sequence number and
   time-stamp be inserted into the payload for lost packet analysis.
   Delay analysis may be achieved by insertion of timestamps directly
   into the packets or timestamps stored elsewhere (packet captures).
   This framework does not specify the packet format to carry sequence
   number or timing information.  However, RFC 4689 provides
ACM:
please add ref RFC 4737, which is the basis for the Out-of-order definition
in 4689, and provides more exact definitions and discussion.


   recommendations for sequence tracking along with definitions of
   in-sequence and out-of-order packets.

   The following are the metrics to be used during the stateless traffic
   benchmarking components of the tests:

   - Burst Size Achieved (BSA): for the traffic policing and network
   queue tests, the tester will be configured to send bursts to test
   either the Committed Burst Size (CBS) or Excess Burst Size (EBS) of
   a policer or the queue / buffer size configured in the DUT.  The
   Burst Size Achieved metric is a measure of the actual burst size
   received at the egress port of the DUT with no lost packets.  As an
   example, the configured CBS of a DUT is 64KB and after the burst test,
   only a 63 KB can be achieved without packet loss.  Then 63KB is the
   BSA.  Also, the average Packet Delay Variation (PDV see below) as
   experienced by the packets sent at the BSA burst size should be
   recorded.

   - Lost Packets (LP): For all traffic management tests, the tester will
   transmit the test packets into the DUT ingress port and the number of
   packets received at the egress port will be measured.  The difference
   between packets transmitted into the ingress port and received at the
   egress port is the number of lost packets as measured at the egress
   port.  These packets must have unique identifiers such that only the
   test packets are measured.
ACM:
It's not clear if a sample (sub-set) of packets in one traffic flow
is measured,
or a sub-set of the total flows (the packets bearing sequence numbers).


   RFC 4737 and RFC 2680 describe the need to establish the time
   threshold to wait before a packet is declared as lost, and this
   threshold MUST be reported with the results.

   - Out of Sequence (OOS): in additions to the LP metric, the test
   packets must be monitored for sequence and the out-of-sequence (OOS)
   packets. RFC 4689 defines the general function of sequence tracking, as
   well as definitions for in-sequence and out-of-order packets.  Out-of-
   order packets will be counted per RFC 4737 and RFC 2680.

   - Packet Delay (PD): the Packet Delay metric is the difference between
   the timestamp of the received egress port packets and the packets
   transmitted into the ingress port and specified in RFC 2285.


   - Packet Delay Variation (PDV): the Packet Delay Variation metric is
   the variation between the timestamp of the received egress port
   packets and specified in RFC 5481.
ACM:
We don't want INTER-packet delay variation here, there could be
substantial IPDV due to shaping and it will be meaningless. We want
the measurement of PDV in RFC 5481, in my opinion, which is variation
of one-way delay across many packets in the traffic flow.
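
ACM:
To make the distinction concrete, a small Python sketch of the
RFC 5481 PDV calculation I have in mind (the delay values below are
an invented example):

   # Sketch: PDV per RFC 5481 = one-way delay of each packet minus the
   # minimum one-way delay observed in the stream (not packet-to-packet
   # differences, which shaping would dominate).
   def pdv_samples(one_way_delays_ms):
       d_min = min(one_way_delays_ms)
       return [d - d_min for d in one_way_delays_ms]

   delays = [10.2, 10.9, 12.4, 10.2, 11.7]    # invented example values
   pdv = pdv_samples(delays)                  # roughly [0.0, 0.7, 2.2, 0.0, 1.5]
   print("max PDV = %.1f ms" % max(pdv))      # 2.2 ms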

   - Shaper Rate (SR): the Shaper Rate is only applicable to the
   traffic shaping tests.  The SR represents the average egress output
   rate (bps) over the test interval.

   - Shaper Burst Bytes (SBB): the Shaper Burst Bytes is
   only applicable to the traffic shaping tests.  A traffic shaper will
   emit packets in different size "trains" (bytes back-to-back).  This
   metric characterizes the method by which the shaper emits traffic.
   Some shapers transmit larger bursts per interval, while other shapers
   may transmit a single frame at the CIR rate (two extreme examples).
ACM:
This metric characterizes shaper behavior, but it is simply information
for the tester?

   - Shaper Burst Interval(SBI):  the interval is only applicable to the
   traffic shaping tests and again is the time between a shaper emitted
   bursts.
ACM:
and a burst of 1 packet would apply to the extreme case of a shaper
sending a CBR stream of single packets.
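
ACM:
As an aside, all three shaper metrics can be derived from one egress
capture; a rough Python sketch follows (the trace format and the 1 ms
burst-gap threshold are my own assumptions, not from the draft):

   # Sketch: derive SR, SBB and SBI from an egress trace of
   # (timestamp_seconds, frame_bytes) tuples.  A new burst "train" is
   # assumed to start whenever the gap to the previous frame exceeds gap_s.
   def shaper_metrics(trace, gap_s=0.001):
       duration = trace[-1][0] - trace[0][0]
       total_bytes = sum(size for _, size in trace)
       sr_bps = total_bytes * 8 / duration          # Shaper Rate (average)

       bursts, burst_starts = [0], [trace[0][0]]
       last_t = trace[0][0]
       for t, size in trace:
           if t - last_t > gap_s:                   # start of a new train
               bursts.append(0)
               burst_starts.append(t)
           bursts[-1] += size
           last_t = t
       sbb = sum(bursts) / len(bursts)              # average Shaper Burst Bytes
       sbi = (None if len(burst_starts) < 2 else    # average Shaper Burst Interval
              (burst_starts[-1] - burst_starts[0]) / (len(burst_starts) - 1))
       return sr_bps, sbb, sbi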


4.2. Metrics for Stateful Traffic Tests

   The stateful metrics will be based on RFC 6349 TCP metrics and will
   include:

   - TCP Test Pattern Execution Time (TTPET): RFC 6349 defined the TCP
   Transfer Time for bulk transfers, which is simply the measured time
   to transfer bytes across single or concurrent TCP connections. The
   TCP test patterns used in traffic management tests will include bulk
   transfer and interactive applications.  The interactive patterns include
   instances such as HTTP business applications, database applications,
   etc.  The TTPET will be the measure of the time for a single execution
   of a TCP Test Pattern (TTP). Average, minimum, and maximum times will
   be measured or calculated.

   An example would be an interactive HTTP TTP session which should take
   5 seconds on a GigE network with 0.5 millisecond latency. During ten (10)
   executions of this TTP, the TTPET results might be: average of 6.5
   seconds, minimum of 5.0 seconds, and maximum of 7.9 seconds.

   - TCP Efficiency: after the execution of the TCP Test Pattern, TCP
   Efficiency represents the percentage of Bytes that were not
   retransmitted.

                          Transmitted Bytes - Retransmitted Bytes

      TCP Efficiency % =  ---------------------------------------  X 100

                                   Transmitted Bytes

   Transmitted Bytes are the total number of TCP Bytes to be transmitted
   including the original and the retransmitted Bytes.  These retransmitted
   bytes should be recorded from the sender's TCP/IP stack perspective,
   to avoid any misinterpretation that a reordered packet is a retransmitted
   packet (as may be the case with packet decode interpretation).
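
ACM:
No issue with the formula; for clarity, the arithmetic as a small
Python sketch (the byte counts are invented):

   # Sketch: TCP Efficiency per the formula above, from sender-side counts.
   def tcp_efficiency_pct(transmitted_bytes, retransmitted_bytes):
       return (transmitted_bytes - retransmitted_bytes) * 100.0 / transmitted_bytes

   # Example: 1,000,000 Bytes transmitted in total, 2,000 of them retransmitted.
   print("%.2f %%" % tcp_efficiency_pct(1000000, 2000))    # 99.80 %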


   - Buffer Delay: represents the increase in RTT during a TCP test
   versus the baseline DUT RTT (non congested, inherent latency).  RTT
   and the technique to measure RTT (average versus baseline) are defined
   in RFC 6349.  Referencing RFC 6349, the average RTT is derived from
   the total of all measured RTTs during the actual test sampled at every
   second divided by the test duration in seconds.

                                         Total RTTs during transfer
         Average RTT during transfer = -----------------------------
                                        Transfer duration in seconds

                        Average RTT during Transfer - Baseline RTT
       Buffer Delay % = ------------------------------------------ X 100
                                    Baseline RTT

    Note that even though this was not explicitly stated in RFC 6349,
    retransmitted packets should not be used in RTT measurements.

    Also, the test results should record the average RTT in millisecond
        across the entire test duration and number of samples.
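
ACM:
Likewise, a small Python sketch of the Buffer Delay computation as I
read it (the RTT samples are invented):

   # Sketch: Buffer Delay % from per-second RTT samples taken during the
   # transfer, against the baseline (uncongested) RTT.
   def buffer_delay_pct(rtt_samples_ms, baseline_rtt_ms):
       avg_rtt = sum(rtt_samples_ms) / len(rtt_samples_ms)
       return (avg_rtt - baseline_rtt_ms) * 100.0 / baseline_rtt_ms

   samples = [25.0, 27.5, 30.0, 32.5]                    # one sample per second
   print("%.1f %%" % buffer_delay_pct(samples, 25.0))    # 15.0 %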

5. Tester Capabilities

    The testing capabilities of the traffic management test environment
    are divided into two (2) sections: stateless traffic testing and
    stateful traffic testing

5.1. Stateless Test Traffic Generation

   The test set must be capable of generating traffic at up to the
   link speed of the DUT.  The test set must be calibrated to verify
   that it will not drop any packets.  The test set's inherent PD and PDV
   must also be calibrated and subtracted from the PD and PDV metrics.
   The test set must support the encapsulation to be tested such as
   VLAN, Q-in-Q, MPLS, etc.  Also, the test set must allow control of
   the classification techniques defined in RFC 4689 (i.e. IP address,
   DSCP, TOS, etc classification).

   The open source tool "iperf" can be used to generate stateless UDP
   traffic and is discussed in Appendix A.  Since iperf is a software
   based tool, there will be performance limitations at higher link
   speeds (e.g. GigE, 10 GigE, etc.).  Careful calibration of any test
   environment using iperf is important.  At higher link speeds, it is
   recommended to use hardware based packet test equipment.

ACM:
Agree with Dave Taht's comment to incorporate other tools
in the text, such as netperf.


5.1.1 Burst Hunt with Stateless Traffic

   A central theme for the traffic management tests is to benchmark the
   specified burst parameter of traffic management function, since burst
   parameters of SLAs are specified in bytes.  For testing efficiency,
   it is recommended to include a burst hunt feature, which automates
   the manual process of determining the maximum burst size which can
   be supported by a traffic management function.

   The burst hunt algorithm should start at the target burst size (maximum
   burst size supported by the traffic management function) and will send
   single bursts until it can determine the largest burst that can pass
   without loss.
ACM:
need to add mention of the inter-burst interval here, and its
influence on the test results when set improperly (too low).

   If the target burst size passes, then the test is
   complete.  The hunt aspect occurs when the target burst size is not
   achieved; the algorithm will drop down to a configured minimum burst
   size and incrementally increase the burst until the maximum burst
   supported by the DUT is discovered.  The recommended granularity
   of the incremental burst size increase is 1 KB.

   Optionally for a policer function and if the burst size passes, the burst
   should be increased by increments of 1 KB to verify that the policer is
   truly configured properly (or enabled at all).
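
ACM:
To illustrate what I would expect the hunt (and the inter-burst
interval I mentioned above) to look like, a rough Python sketch;
send_burst() is a placeholder for the tester's API, not a real tool:

   import time

   # Sketch: burst hunt per section 5.1.1.  send_burst(size) is assumed to
   # transmit one burst and return True when the whole burst arrives at the
   # egress port without loss.  inter_burst_s must be long enough for the
   # policer/queue to drain, or the hunt will under-report the burst size.
   def burst_hunt(target_bytes, min_bytes, send_burst,
                  step=1024, inter_burst_s=0.1):
       if send_burst(target_bytes):
           return target_bytes                  # target passes; test complete
       largest, size = None, min_bytes
       while size <= target_bytes:
           time.sleep(inter_burst_s)            # drain time between bursts
           if send_burst(size):
               largest = size                   # largest lossless burst so far
           size += step                         # 1 KB granularity, per the text
       return largest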

5.2. Stateful Test Pattern Generation

   The TCP test host will have many of the same attributes as the TCP test
   host defined in RFC 6349.  The TCP test device may be a standard
   computer or a dedicated communications test instrument. In both cases,
   it must be capable of emulating both a client and a server.

   For any test using stateful TCP test traffic, the Network Delay Emulator
   (NDE function from the lab set-up diagram) must be used in order to provide a
   meaningful BDP.  As referenced in section 2, the target traffic rate and
   configured RTT must be verified independently using just the NDE for all
   stateful tests (to ensure the NDE can delay without loss).

   The TCP test host must be capable to generate and receive stateful TCP
   test traffic at the full link speed of the DUT.  As a general rule of
   thumb, testing TCP Throughput at rates greater than 500 Mbps may require
   high performance server hardware or dedicated hardware based test tools.

   The TCP test host must allow adjusting both Send and Receive Socket
   Buffer sizes.  The Socket Buffers must be large enough to fill the BDP
   for bulk transfer TCP test application traffic.

   Measuring RTT and retransmissions per connection will generally require
   a dedicated communications test instrument. In the absence of
   dedicated hardware based test tools, these measurements may need to be
   conducted with packet capture tools, i.e. conduct TCP Throughput
   tests and analyze RTT and retransmissions in packet captures.

   The TCP implementation used by the test host must be specified in the
   test results (i.e. OS version, i.e. LINUX OS kernel using TCP New Reno,
   TCP options supported, etc.).



   While RFC 6349 defined the means to conduct throughput tests of TCP bulk
   transfers, the traffic management framework will extend TCP test
   execution into interactive TCP application traffic.  Examples include
   email, HTTP, business applications, etc.  This interactive traffic is
   bi-directional and can be chatty.

   The test device must not only support bulk TCP transfer application
   traffic but also chatty traffic.  A valid stress test SHOULD include
   both traffic types. This is due to the non-uniform, bursty nature of
   chatty applications versus the relatively uniform nature of bulk
   transfers (the bulk transfer smoothly stabilizes to equilibrium state
   under lossless conditions).

   While iperf is an excellent choice for TCP bulk transfer testing, the
   open source tool "Flowgrind" (referenced in Appendix A) is
   client-server based and emulates interactive applications at the TCP
   layer.  As with any software based tool, the performance must be
   qualified to the link speed to be tested.  Hardware-based test equipment
   should be considered for reliable results at higher links speeds (e.g.
   1 GigE, 10 GigE).

5.2.1. TCP Test Pattern Definitions

   As mentioned in the goals of this framework, techniques are defined
   to specify TCP traffic test patterns to benchmark traffic
   management technique(s) and produce repeatable results. Some
   network devices such as firewalls, will not process stateless test
   traffic which is another reason why stateful TCP test traffic must
   be used.

   An application could be fully emulated up to Layer 7, however this
   framework proposes that stateful TCP test patterns be used in order
   to provide granular and repeatable control for the benchmarks. The
   following diagram illustrates a simple Web Browsing application
   (HTTP).

                   GET url

   Client      ------------------------>   Web

   Web             200 OK        100ms |

   Browser     <------------------------    Server




   In this example, the Client Web Browser (Client) requests a URL and
   then the Web Server delivers the web page content to the Client
   (after a Server delay of 100 millisecond).  This asynchronous, "request/
   response" behavior is intrinsic to most TCP based applications such
   as Email (SMTP), File Transfers (FTP and SMB), Database (SQL), Web
   Applications (SOAP), REST, etc.   The impact to the network elements is
   due to the multitudes of Clients and the variety of bursty traffic,
   which stresses traffic management functions.  The actual emulation of
   the specific application protocols is not required and TCP test
   patterns can be defined to mimic the application network traffic flows
   and produce repeatable results.

   There are two (2) techniques recommended by this framework to develop
   standard TCP test patterns for traffic management benchmarking.

   The first technique involves modeling, which have been described in
   "3GPP2 C.R1002-0 v1.0" and describe the behavior of HTTP, FTP, and
   WAP applications at the TCP layer.  The models have been defined
   with various mathematical distributions for the Request/Response
   bytes and inter-request gap times.  The Flowgrind tool (Appendix A)
   supports many of the distributions and is a good choice as long as
   the processing limits of the server platform are taken into
   consideration.
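
ACM:
A minimal Python sketch of what "modeling" means in practice here;
the distributions and parameters below are placeholders of my own,
NOT the values defined in 3GPP2 C.R1002-0:

   import random

   # Sketch: draw request/response sizes and inter-request gaps from
   # mathematical distributions to drive one transaction of a TCP test
   # pattern.  Substitute the modeled distributions for real use.
   def next_transaction():
       request_bytes  = int(random.lognormvariate(6.0, 1.0))   # request size
       response_bytes = int(random.lognormvariate(9.5, 1.5))   # response size
       think_time_s   = random.expovariate(1.0 / 5.0)          # mean 5 s gap
       return request_bytes, response_bytes, think_time_s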

   The second technique is to conduct packet captures of the
   applications to test and then to statefully play the application back
   at the TCP layer.
ACM:
"packet capture" and "stateful playback" don't match here,
unless you are determining some high-level aspects of application
behavior (number of connections, bytes per connection, sequence of
connections, etc.). So, this is "transport connection capture"?
Not sure. . .

   The TCP playback includes the request byte size,
   response byte size, and inter-message gaps at both the client and the
   server.  The advantage of this method is that very realistic test
   patterns can be defined based on real world application traffic.
ACM:
This sounds like you are building application-layer behavior on
top of TCP transport, which would be more easily emulated using
the real application layer communications, if it has the necessary
sequence numbers and timestamps.

   This framework does not specify a fixed set of TCP test patterns, but
   does provide recommended test cases in Appendix B.  Some of these examples
   reflect those specified in "draft-ietf-bmwg-ca-bench-meth-04" which
   suggests traffic mixes for a variety of representative application
   profiles.  Other examples are simply well known application traffic
   types.

6. Traffic Benchmarking Methodology

   The traffic benchmarking methodology uses the test set-up from
   section 2 and metrics defined in section 4.  Each test should be run
   for a minimum test time of 5 minutes.

   Each test should compare the network device's internal statistics
   (available via command line management interface, SNMP, etc.) to the
   measured metrics defined in section 4.  This evaluates the accuracy
   of the internal traffic management counters under individual test
   conditions and capacity test conditions that are defined in each
   subsection.

6.1. Policing Tests

   The intent of the policing tests is to verify the policer performance
   (i.e. CIR-CBS and EIR-EBS parameters). The tests will verify that the
   network device can handle the CIR with CBS and the EIR with EBS and
   will use back-back packet testing concepts from RFC 2544 (but adapted
   to burst size algorithms and terminology).  Also MEF-14,19,37 provide
   some basis for specific components of this test.  The burst hunt
   algorithm defined in section 5.1.1 can also be used to automate the
   measurement of the CBS value.

   The tests are divided into two (2) sections; individual policer
   tests and then full capacity policing tests. It is important to
   benchmark the basic functionality of the individual policer then
   proceed into the fully rated capacity of the device. This capacity may
   include the number of policing policies per device and the number of
   policers simultaneously active across all ports.

6.1.1 Policer Individual Tests

   Policing tests should use stateless traffic. Stateful TCP test traffic
   will generally be adversely affected by a policer in the absence of
   traffic shaping.  So while TCP traffic could be used, it is more
   accurate to benchmark a policer with stateless traffic.

   The policer test shall test a policer as defined by RFC 4115 or
ACM:
s/shall/SHALL/

   MEF 10.2, depending upon the equipment's specification. As an example
   for RFC 4115, consider a CBS and EBS of 64KB and CIR and EIR of
   100 Mbps on a 1GigE physical link (in color-blind mode).  A stateless
   traffic burst of 64KB would be sent into the policer at the GigE rate.
   This equates to approximately a 0.512 millisecond burst time (64 KB at
   1 GigE). The traffic generator must space these bursts to ensure that
   the aggregate throughput does not exceed the CIR.  The Ti between the
   bursts would equal CBS * 8 / CIR = 5.12 millisecond in this example.
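
ACM:
For clarity, the burst-time and Ti arithmetic of this example as a
small Python sketch (64 KB is taken as 64,000 bytes, which matches
the 0.512 millisecond figure in the text):

   # Sketch: timing for the RFC 4115 policer example above.
   cbs_bytes = 64000             # CBS of 64 KB
   cir_bps   = 100e6             # CIR of 100 Mbps
   link_bps  = 1e9               # 1 GigE physical link

   burst_time_s = cbs_bytes * 8 / link_bps   # 0.512 ms to emit one burst
   ti_s         = cbs_bytes * 8 / cir_bps    # Ti = 5.12 ms between bursts keeps
                                             # the average offered rate at the CIR
   print("burst %.3f ms, Ti %.2f ms" % (burst_time_s * 1e3, ti_s * 1e3))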

   The metrics defined in section 4.1 shall be measured at the egress
ACM:
s/shall/SHALL/
   port and recorded.

   In addition to verifying that the policer allows the specified CBS
   and EBS bursts to pass, the policer test must verify that the policer
   will police at the specified CBS/EBS values.
ACM:
s/must/MUST/
also, "policer will police" is ambiguous,
. . . "policer will re-mark or drop
excess, and pass traffic at the specified CBS/EBS values."


   For this portion of the test, the CBS/EBS value should be incremented
ACM:
s/should/SHOULD/

   by 1000 bytes higher than the configured CBS and that the egress port
   measurements must show that the excess packets are dropped.
ACM:
s/the excess/only the excess/

ACM:
the above is the last comment in this version