Re: [bmwg] I-D Action: draft-ietf-bmwg-ca-bench-meth-01.txt

Mike Hamilton <mhamilton@breakingpoint.com> Mon, 12 March 2012 19:56 UTC

Return-Path: <mhamilton@breakingpoint.com>
X-Original-To: bmwg@ietfa.amsl.com
Delivered-To: bmwg@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 1A70D21E80B6 for <bmwg@ietfa.amsl.com>; Mon, 12 Mar 2012 12:56:16 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: 0.401
X-Spam-Level:
X-Spam-Status: No, score=0.401 tagged_above=-999 required=5 tests=[BAYES_00=-2.599, GB_I_INVITATION=-2, GB_SUMOF=5]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id recflu+jXCX2 for <bmwg@ietfa.amsl.com>; Mon, 12 Mar 2012 12:56:14 -0700 (PDT)
Received: from mail.breakingpoint.com (mail.breakingpoint.com [65.36.7.12]) by ietfa.amsl.com (Postfix) with ESMTP id 5373921E8090 for <bmwg@ietf.org>; Mon, 12 Mar 2012 12:56:14 -0700 (PDT)
Received: from EXCHANGE.securitytestsystems.com ([::1]) by EXCHANGE.securitytestsystems.com ([::1]) with mapi id 14.01.0289.008; Mon, 12 Mar 2012 14:56:07 -0500
From: Mike Hamilton <mhamilton@breakingpoint.com>
To: "bmwg@ietf.org" <bmwg@ietf.org>
Thread-Topic: [bmwg] I-D Action: draft-ietf-bmwg-ca-bench-meth-01.txt
Thread-Index: AQHNAIhswmkbJEQGiUSI2j/iwDstQJZnJAIA
Date: Mon, 12 Mar 2012 19:56:06 +0000
Message-ID: <CB83BC6B.ADAF%mhamilton@breakingpoint.com>
In-Reply-To: <20120312194336.5854.36706.idtracker@ietfa.amsl.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
user-agent: Microsoft-MacOutlook/14.14.0.111121
x-originating-ip: [172.16.10.23]
Content-Type: text/plain; charset="us-ascii"
Content-ID: <B172A33B5C2BB64A891239C29F20DAA6@breakingpoint.com>
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [bmwg] I-D Action: draft-ietf-bmwg-ca-bench-meth-01.txt
X-BeenThere: bmwg@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Benchmarking Methodology Working Group <bmwg.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/bmwg>, <mailto:bmwg-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/bmwg>
List-Post: <mailto:bmwg@ietf.org>
List-Help: <mailto:bmwg-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/bmwg>, <mailto:bmwg-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 12 Mar 2012 19:56:16 -0000

There are some fairly extensive updates in the draft in response to Tom
Alexander's comments, as well as comments from Al and Barry Constantine.
Since Tom provided so much detail in his comments, I went ahead and
addressed them one by one, as noted below.  Some comments are not
addressed in this version simply due to time constraints.

Mike

>1. The abstract notes that '.. these scenarios are designed to most
>accurately predict
>performance of these devices when subjected to dynamic traffic patterns
>..' "Most
>accurately" seems exceedingly ambitious; perhaps the authors meant 'more
>accurately'?
>Stating instead: '.. these scenarios are intended to predict the
>performance of these
>devices when subjected to dynamic traffic patterns ..' would be better.
Advice received and implemented in Abstract section.


>2. Most of the introduction appears to have been lifted verbatim from the
>terminology
>draft (or vice versa); thus the same objections apply to the methodology
>introduction
>as to the corresponding terminology sections. See my comments #1, #2 and
>#3 on the
>terminology draft. See RFC 2889 for an example of a condensed
>introduction and scope
>suitable for a methodology.
Reworded the introduction further to incorporate suggestions and feedback.


>3. In Section 2, Scope, rather than stating that something shall be
>explicitly
>stated - which is, by the way, a normative statement - it's better to
>just state it.
Agreed.

>In the third paragraph of this section, it says "These metrics will be
>implementation
>dependent". This does not make sense. The purpose of a BMWG RFC is to
>specify a set of
>metrics that are NOT implementation dependent; if they were, it would not
>be possible
>to compare two different implementations of the same device. (See the
>BMWG charter,
>particularly regarding "black-box testing".)
This was a typo and was previously fixed in the -00 version of the draft.

>In the fourth paragraph, it is unnecessary to state that the document
>does not have
>functional testing as a purpose. There is no BMWG RFC I know of that
>pertains to
>functional testing.
You are making an assumption about what people know about BMWG.
While you may be aware of all the BMWG documents and their intended
purposes, the engineer in the QA lab typically is not.  Many of these
people are not just looking for performance benchmarks; they are also
looking for functional test plans, and IMHO it's important to qualify
this methodology with that fact.

>Also, it seems strange to state up front that various DUT/SUT
>configurations to be
>tested will not be specified. This calls into question the validity of
>the draft. If
>the draft specifies a set of metrics that can be applied to a range of
>devices and
>enables comparison of the devices, should it not specify the
>configurations to be
>tested? Perhaps I am misinterpreting the authors' intent behind this
>statement?
Al and I specifically discussed this comment in detail in Taipei, and BMWG
will not specify any mandated configurations.  The document specifically
states in the setup section that users should configure the DUTs in a
manner consistent with their end use of that equipment, and that the
configuration should be well documented so that a vendor's customer can
reproduce it.  If you're saying that by not specifying one of the likely
infinite combinations of configuration options the draft is rendered
useless, then I must respectfully disagree.

>Also, add "Wireless LAN Controllers" to the list of devices in the Scope.
>Some of
>these devices support ACL capabilities that can be configured to perform
>application-layer snooping and IDS attack suppression.
Done.

>4. In Section 3, Test Setup, it states that the DUT configuration MUST be
>published
>with the results. In Section 2, Scope, it states that the DUT
>configuration SHOULD
>be published with the results. Which is it to be?
All have been converted to SHOULDs based on other feedback; this was
addressed in the -00 version.

>5. With regard to Section 3.3, Traffic Generation Requirements: the draft
>does not
>do the BMWG community any favors by refusing to specify a set of traffic
>mixes. These
>aren't particularly hard to come up with - for example, Appendix A of the
>draft has
>a good start. However, their absence allows vendors free rein to indulge
>in whatever
>"specmanship" they choose, by selecting an application traffic mix best
>suited to
>their devices. Further, the lack of specified application traffic mixes
>means that
>results published by different vendors cannot be compared, which defeats
>a key reason
>for standardizing benchmark testing.
Creating an 'example' mix was actually suggested by multiple people, and
one was added in the -00 version.  While not specifying the mix allows
vendors "free rein" to indulge in whatever mix best suits them, they
would at least have to defend that traffic mix to the users of the
methodology.  A defined example mix is already defended for them, which
they can then optimize for anyway.  I think we're looking at extensive
pros/cons regardless of which way we go.

>If the specification of application traffic mixes is to remain outside
>the scope of
>the draft, then it should be explicitly stated in the Scope section that
>the purpose
>of the draft is to ONLY permit an end-user to compare different devices
>after
>performing his or her own tests from scratch. This is the inevitable
>result of
>encouraging everyone to come up with their own traffic mixes.
Previous discussion covers this I believe.

>I suggest that at least a subset of traffic mixes be specified, and
>adherence to the
>draft requires that everybody MUST publish results using this subset.
>Vendors are free
>to create and publish additional results using additional traffic mixes
>if they so 
>choose.
Great idea.

>6. In Section 3.4, I don't understand what the "concurrent application
>flows" (which
>seems to mean "number of concurrent application-layer connections") has
>to do with the
>discussion that follows. There is no illustration of how the number of
>simultaneously
>established connections has anything to do with physical limits unrelated
>to the
>inherent DUT/SUT capabilities. I can certainly see how bandwidth and
>connections per
>second - referred to here as "application flows per second" - are
>interrelated, but
>what does the total number of connections established over some
>arbitrarily long period
>have to do with anything?
I did not provide an example for this particular case, because I
didn't want to open a can of worms by needing to provide examples for
all permutations of CPS/Concurrent/Tput.  Concurrent flows is not the
number of connections over some arbitrary time; it is the
instantaneous number of connections currently active.  In a constant
bitrate scenario, e.g. SIP/RTP, the application will dictate the
throughput on a per-flow basis.  If the device is only able to track
10 flows, then I'll only be able to get a throughput bitrate equal to
the sum of the individual flows' fixed bitrates.  I can provide this
example in the text if deemed necessary.
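
Roughly, the example I have in mind looks like the sketch below (Python,
purely illustrative; the per-flow bitrate and flow limit are numbers I'm
assuming here, not values from the draft):

    # Illustration only: concurrent flows as a ceiling on throughput for a
    # constant-bitrate application (e.g. an RTP audio stream).
    PER_FLOW_BITRATE_BPS = 64_000   # assume one G.711-style RTP flow, ~64 kbps
    MAX_CONCURRENT_FLOWS = 10       # assume the DUT can only track 10 flows

    # The achievable throughput is just the sum of the fixed per-flow rates:
    max_throughput_bps = MAX_CONCURRENT_FLOWS * PER_FLOW_BITRATE_BPS
    print(f"Achievable throughput: {max_throughput_bps / 1000:.0f} kbps")
    # -> 640 kbps, no matter how fast the physical links are.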

>Further, trying to verify that physical links are not being overloaded,
>by measuring
>the throughput of the device, is not going to work. For example, if a
>device with
>Gigabit Ethernet links nevertheless has inherent data throughput
>limitations and can't
>forward packets for beans, then following the requirements in Section 3.4
>would provide
>a false view of the device. (You could end up reporting that the device
>can support an
>excellent connection rate of 100% of its data forwarding capacity, and
>has GigE links,
>whereas in fact the device might only support 10 packets/second of any
>kind of traffic.)
I'm not sure I follow what you're saying here.  Can you provide more
specifics?

>Most of the discussion in Section 3.4 can be summarized in one sentence:
>"For each
>application flow type taken separately, verify that the offered load in
>connections per
>second does not cause the bandwidth of the physical links belonging to
>the DUT/SUT to
>be exceeded." Maybe the authors should consider replacing the section and
>its imposing
>title ("Network Mathematics") with something simpler?


>As a minor point, note that the example talks about a "single-homed
>device", but the
>accompanying figure clearly shows a device with two ports. Also, we are
>discussing
>internetworking devices, but the term "single-homed" is more appropriate
>to hosts.
>Maybe the term "one-armed device" would be more apropos (along with a
>figure to match).
This was modified to single-path.  I think it makes it more clear what
we're saying.

>7. Section 3.5 is very confusing. I understand the necessity for
>tabulating each
>application flow variant. I don't, however, understand the attributes
>being tabulated.
>What is "Flow Size in Bits"? (How does an application layer flow have a
>fixed size
>in bits?) Why is "Percentage" pre-specified as 25%? (And a percentage of
>what?)
>Why does "Destination Port" always have to be 80? (If you are dealing
>with a NAT box,
>how can you expect to control the TCP ports anyway?) Also, why isn't the
>nature
>of the application flow itself specified? There are a LOT more parameters
>required
>than simply the number of bits, the port and the transport protocol.
Instead of simply saying what isn't sufficient here, would you be able
to provide specifics of what you might be looking for?  I'm not sure
how to address your concerns without specific suggestions.  Otherwise
I'm shooting in the dark.

>8. The statement in Section 3.6 about "Device vendors have been known to
>optimize
>the operation of their devices for easily defined patterns" is completely
>at odds
>with the closely following statement "This methodology makes no
>assumptions about
>flow initiation sequence ...". Isn't this an invitation to the
>"specmanship" that
>the draft seeks to expunge?
The point of this section is to refine the discussion in RFC
3511 (Section 4.5 - Multiple Client/Server Testing), which makes
round-robin traffic generation a MUST.  I'm attempting to explicitly
state that flow initiation should happen in a pseudo-random manner
across ingress ports.  Obviously this wasn't very clear, so I will
update the text in the draft accordingly.
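
As a minimal sketch of the distinction I'm drawing (my own illustration,
assuming four ingress test ports):

    # Round-robin (as RFC 3511 section 4.5 requires) produces an easily
    # recognizable pattern; a seeded pseudo-random choice avoids that while
    # remaining reproducible across test runs.
    import random

    INGRESS_PORTS = ["port1", "port2", "port3", "port4"]   # assumed test ports

    def round_robin(flow_index):
        return INGRESS_PORTS[flow_index % len(INGRESS_PORTS)]

    def pseudo_random(rng):
        return rng.choice(INGRESS_PORTS)

    rng = random.Random(1242)   # fixed seed so the sequence is reproducible
    for i in range(8):
        print(i, round_robin(i), pseudo_random(rng))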

>9. With regard to Section 3.7.1, the range of IP addresses noted in RFC
>2544 covers
>only 128K addresses. 128K addresses is unlikely to stress the forwarding
>tables
>of many devices available today. Further, they are all unicast addresses.
>What if
>we wish to test using multicast flows (e.g., RTP video)? It seems
>unnecessary to
>restrict users of this draft to the RFC 2544 IP address range.
With the suggestion from Al, RFC 1918 private address space may also
be used.  This has been added to the draft.

>10. In section 3.7.3, all devices tested MUST be subject to the same TCP
>options
>settings. Otherwise it may be impossible to compare their performance -
>for example,
>when comparing devices having WAN acceleration capabilities.
Ok.

>11. Section 4.1 is better titled "Maximum Application Session
>Establishment Rate".
>(By the way, the Terminology defines the term as pertaining only to TCP
>connection
>rate, so the methodology doesn't jibe with the terminology.)
Ok.

>In the Procedure for this metric, how can one subject the device to 110%
>of the
>expected maximum, if the maximum happens to be governed by the physical
>link rate
>and not the DUT? (See the example in Section 3.4.)
Good point.

>In the Procedure for this section, it is unclear as to what is meant by
>"each
>subsequent iteration beginning at 5% of expected maximum and increasing
>session
>establishment rate to 10% more than the maximum". If we perform, say, 100
>iterations, what is the point of every iteration using exactly the
>same limits?
Each iteration doesn't use the same limits.  The rest of the sentence
you quote says "increasing session establishment rate to 10% more
than the maximum observed from the previous test run."
I'll reword the sentence to make it clearer.
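
In rough pseudo-code, the intent is something like this (my sketch only,
not normative text):

    # Sketch: each run ramps the offered session establishment rate from 5%
    # of the expected maximum up to 10% above the maximum observed in the
    # previous run.  measure_max_rate() is a stand-in for driving the test
    # tool and reading back the highest rate the DUT/SUT sustained.
    def measure_max_rate(start_rate, ceiling_rate):
        raise NotImplementedError("drive the test tool here")

    def run_iterations(expected_maximum, iterations=3):
        observed = expected_maximum
        for _ in range(iterations):
            start = 0.05 * expected_maximum   # begin at 5% of expected maximum
            ceiling = 1.10 * observed         # ramp to 10% above last observed
            observed = measure_max_rate(start, ceiling)
        return observed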

>By the way, how is this metric applicable to a connectionless application
>layer flow
>such as DNS? (And I won't even get into the issue of defining "RTP
>connections"!)
So this is a great question and I believe I have a very good answer.
Luckily, there is prior IETF art regarding 'Flows', as defined in RFC
2724.  That document provides a very detailed and explicit way to
define 'flows' regardless of the transport layer used.  I'll
incorporate their definitions and use cases going forward.

>In subsection 4.1.4.4: Section 3.13 of RFC 1242 does not define anything
>to do with
>latency. (It defines Policy-based Filtering.) I think the authors are
>referring to
>Section 3.8 of RFC 1242. More importantly, it is not clear how a piece of
>test gear
>is to go about measuring latency of an application-layer connection setup
>packet
>using the terminology/procedure defined by RFC 1242 / RFC 2544 for
>continuous
>network-layer data traffic. The draft should specify "application layer
>latency",
>and the procedure to measure it.
I misquoted the section (it should be 3.8), but that's really a moot
point.  In getting more clarity from Al, RFC 1242 latency is measured
at RFC 2544 throughput, which doesn't apply here since we're not
measuring at RFC 2544 throughput.

>This section also confuses Application Flow Rate and "Session
>Establishment Rate".
>The term "Application Flow Rate" is never defined in the terminology -
>what is it?
>(I suspect they are the same ...)

>12. The whole of Section 4.2 on Application Throughput is broken. As
>noted in the
>comments on the terminology, RFC 1242 defines throughput as "The maximum
>rate at
>which none of the offered frames are dropped by the device". This
>definition is
>zealously defended by BMWG devotees. There is no good reason why this
>definition
>cannot be extended to the application layer. With this in mind, several
>things
>are wrong with Section 4.2:

>a) The concept of "maximum throughput" is nonexistent. Throughput is
>already a
>maximum. You cannot have a maximum of a maximum.

>b) There is no such thing as minimum and average throughput. If you are
>trying to
>measure these things, you are not measuring throughput.

>c) You cannot report packet loss when measuring throughput, because by
>definition
>there isn't any.

>d) The Procedure for Section 4.2 is broken, as it specifies that traffic
>is sent
>at a rate of 30% of the maximum. How can you measure throughput at a
>fixed rate?

>I think the authors are trying to specify forwarding rate or goodput,
>rather than
>a throughput measurement. If so, use the term "forwarding rate", not
>"throughput".
>(Even as a forwarding rate measurement, there are a number of things
>wrong with
>Section 4.2 - but redefining it would be a start.)

>13. Section 4.3 is rather underspecified. Basically, it enables anyone to
>run any
>kind of malicious traffic - in fact, any kind of traffic at all, as the
>definition
>of "malicious" is in the eye of the beholder - and report benchmark
>results. The
>level of malicious traffic is not specified (is it one attack per second?
>one attack
>per hour?). This allows vendors to set up a "malicious traffic handling
>test" with
>an imposing array of "attacks" and then report results that are
>impossible to
>compare to any other.
>The draft should at least attempt to itemize and describe certain
>baseline malicious
>traffic types/attacks and loads to be used, so that a common point of
>comparison can
>be obtained. The benchmarker is free to use additional attacks and
>traffic loads, of
>course, but we need some reference to compare with.
>By the way, the sentence in the Procedure ".. should generate malicious
>traffic
>representative of the final network deployment" is very strange. One does
>not add
>a source of malicious traffic into network deployments, as implied here.
I'll leave this discussion to you and Kenneth in your
security effectiveness draft work.  It has since been removed from the
benchmarking methodology.

>14. Section 4.4 suffers from the same problems as Section 4.3: an
>insufficient level
>of specification that leads to impossible-to-compare results. While it is
>nice to
>specify that the tester should "generate malformed traffic at all
>protocol layers",
>what does this statement actually mean? What is to be modified? Should
>all protocol
>layers be concurrently fuzzed in the same packet? If Vendor A chooses to
>modify just
>the TCP port numbers, while Vendor B artfully fuzzes everything including
>the MAC FCS
>in all of his/her malformed traffic (causing every packet to be summarily
>dropped by
>the interface hardware), are these results comparable?
This will be replaced with an algorithmic process for creating
malformed/fuzzed traffic.  This will clearly illustrate exactly
what's happening.
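
To give a flavor of what I mean by "algorithmic" (purely my own sketch,
not necessarily the process the draft will end up specifying): a seeded,
reproducible mutation pass over otherwise well-formed packets, e.g.:

    # Illustration only: a seeded (and therefore reproducible) byte-mutation
    # pass over an otherwise well-formed packet.
    import random

    def fuzz_packet(packet: bytes, mutations: int, seed: int) -> bytes:
        rng = random.Random(seed)           # same seed -> same malformed packet
        data = bytearray(packet)
        for _ in range(mutations):
            offset = rng.randrange(len(data))
            data[offset] = rng.randrange(256)   # overwrite one byte at random
        return bytes(data)

    # Example: mutate 3 bytes of a dummy 64-byte frame, reproducibly.
    frame = bytes(64)
    assert fuzz_packet(frame, 3, seed=2012) == fuzz_packet(frame, 3, seed=2012)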

>Also, in Section 4.4, no metric values are required to be reported (i.e.,
>with a MUST).
>Thus we can conform to the draft by reporting the test procedure as
>having been
>performed, but there is no need to report any actual results!

>15. The numbers and parameters specified in Appendix A are exceedingly
>confusing. For
>example, what is "Flow Size"? (And how does "Flow Size" have anything to
>do with
>"DNS"?) What is "Flow Percentage" specified in terms of - a percentage of
>the traffic
>in bits/second, a percentage of frames/second, a percentage of total data
>transmitted,
>or what? What exactly is the traffic represented by "Web 1kB"? Is it an
>HTTP Get, an
>HTTP Post, or something else altogether?

>These parameters resemble something that would be picked from some test
>equipment
>GUI screen, rather than a precise specification of traffic. The level of
>"marketology"
>is high. I suggest trading in the "Web 1kB" tags for something more
>explicit, such
>as "HTTP Get Transactions Returning 1 kB Sized HTML Blocks" ... well,
>maybe not quite
>so verbose, but you get the idea.
I'm not sure how the "marketology" is high.  I'm not aware of any
tools that use this terminology.  Regardless, when I complete the
migration to algorithmic application generation, the titles in
the table won't have much meaning other than specifying the input to
the algorithm.


>16. The draft contains numerous typos and spelling errors (for example,
>"understan",
>"phenomina", "simgle-homed", "Configruation"). It should be run through a
>spell checker.
These have been fixed for several draft revisions now.

>17. The draft does not reference the accompanying terminology draft at
>any point.
>Instead, virtually all of the references are the terminology RFC for
>firewall
>benchmarking (RFC 2647). Given that, the value of the terminology draft
>is doubtful.
The terminology draft is still out of date, and will get settled as the
methodology gets more fleshed out.




-----Original Message-----
From: <internet-drafts@ietf.org>
Date: Mon, 12 Mar 2012 12:43:36 -0700
To: <i-d-announce@ietf.org>
Cc: <bmwg@ietf.org>
Subject: [bmwg] I-D Action: draft-ietf-bmwg-ca-bench-meth-01.txt

>
>A New Internet-Draft is available from the on-line Internet-Drafts
>directories. This draft is a work item of the Benchmarking Methodology
>Working Group of the IETF.
>
>	Title           : Benchmarking Methodology for Content-Aware Network
>Devices
>	Author(s)       : Mike Hamilton
>                          Sarah Banks
>	Filename        : draft-ietf-bmwg-ca-bench-meth-01.txt
>	Pages           : 19
>	Date            : 2012-03-12
>
>   This document defines a set of test scenarios and metrics that can be
>   used to benchmark content-aware network devices.  The scenarios in
>   the following document are intended to more accurately predict the
>   performance of these devices when subjected to dynamic traffic
>   patterns.  This document will operate within the constraints of the
>   Benchmarking Working Group charter, namely black box characterization
>   in a laboratory environment.
>
>
>A URL for this Internet-Draft is:
>http://www.ietf.org/internet-drafts/draft-ietf-bmwg-ca-bench-meth-01.txt
>
>Internet-Drafts are also available by anonymous FTP at:
>ftp://ftp.ietf.org/internet-drafts/
>
>This Internet-Draft can be retrieved at:
>ftp://ftp.ietf.org/internet-drafts/draft-ietf-bmwg-ca-bench-meth-01.txt
>
>_______________________________________________
>bmwg mailing list
>bmwg@ietf.org
>https://www.ietf.org/mailman/listinfo/bmwg