Re: [bmwg] An Upgrade to Benchmarking Methodology for Network Interconnect Devices -- Fwd: New Version Notification for draft-lencse-bmwg-rfc2544-bis-00.txt

"MORTON, ALFRED C (AL)" <acm@research.att.com> Sat, 20 June 2020 19:41 UTC

From: "MORTON, ALFRED C (AL)" <acm@research.att.com>
To: Lencse Gábor <lencse@hit.bme.hu>, "bmwg@ietf.org" <bmwg@ietf.org>
Date: Sat, 20 Jun 2020 19:41:18 +0000
Message-ID: <4D7F4AD313D3FC43A053B309F97543CF0108A66F2C@njmtexg5.research.att.com>
References: <158995996438.13925.2934780472900149847@ietfa.amsl.com> <14002442-9713-d474-8012-bca5dcd6976c@hit.bme.hu> <4D7F4AD313D3FC43A053B309F97543CF0108A5BA22@njmtexg5.research.att.com> <598e85fd-cf9b-1cdd-61c0-3a76623145f9@hit.bme.hu> <4D7F4AD313D3FC43A053B309F97543CF0108A5BC52@njmtexg5.research.att.com> <1c81f904-bb24-5f42-2ac4-919913fddf8a@hit.bme.hu>
In-Reply-To: <1c81f904-bb24-5f42-2ac4-919913fddf8a@hit.bme.hu>
Archived-At: <https://mailarchive.ietf.org/arch/msg/bmwg/5rrHkSlhVGWUuLNHsnvxSaLUEfQ>

Hi Gábor,

Please see my long-delayed replies below. Sorry!
I hope the replies are still useful!

Al

From: bmwg [mailto:bmwg-bounces@ietf.org] On Behalf Of Lencse Gábor
Sent: Monday, June 1, 2020 5:44 AM
To: bmwg@ietf.org
Subject: Re: [bmwg] An Upgrade to Benchmarking Methodology for Network Interconnect Devices -- Fwd: New Version Notification for draft-lencse-bmwg-rfc2544-bis-00.txt

Dear Al,

Thank you very much for your reply. Please see my answers inline. (I keep only those parts of the text that I reply to.)
On 2020.05.23 at 19:40, MORTON, ALFRED C (AL) wrote:

As for the non-overlapping areas of our draft and the documents you
cited, I have not found anything in them about our suggestion for
"Improved Throughput and Frame Loss Rate Measurement Procedures using
Individual Frame Timeout".

If you think this one joins well into your efforts to update RFC 2544,
then we could focus on this one first, and deal with some others later
one by one (in different documents).

What do you think of it?

[acm]
Using a constant frame timeout for declaration of Loss (and delay) is
much like the IPPM WG metrics and methods (see RFC 7679 and RFC 7680)
for the production network measurements (Tmax). A constant waiting time for
frames to arrive at the receiver simply excludes frames on an on-going
basis. RFC 2544 has a waiting time at the end of the trial, where the
tester must wait for 2 seconds for buffers to clear but the first frame
sent has the entire trial duration + 2 sec to arrive.

Yes, this is why I feel that some of the frames we count as "received" may be completely useless for the applications.
[acm]
Since we are testing a small part of the end-to-end path between users and their applications (a single DUT in the network, or a few DUTs in a SUT), we have to be careful when setting thresholds for useless frames, both loss and delay.

Perhaps the biggest potential change to the Throughput definition is
whether or not we demand frame correspondence between sender and receiver,
so that we can calculate one-way delay. RFC 2544 and ETSI NFV TST009 only
require equal send and receive frame counts to satisfy the zero loss
criteria in the Throughput benchmark definition. But in ETSI NFV TST009,
the Capacity at X% Loss Ratio metric-variant definition begins to infer
frame correspondence. Later, the definitions of (one-way) Latency and
Delay Variation and Loss allow a Sample Filter, which could be a constant
maximum time-out for individual frames, as you suggest. Note that a Sample
Filter could be applied in post-measurement processing, assuming all the
delays are available.

I have implemented siitperf-pdv (part of siitperf, available from https://github.com/lencsegabor/siitperf) so that, depending on the value of its last parameter (frame timeout), it can do the post-processing in two different ways:
- If the value of frame timeout is 0, then proper PDV measurement is done.
- If the value of frame timeout is higher than zero, then a special throughput (or frame loss rate) measurement is performed instead, where the tester checks the frame timeout for each frame individually: if the measured delay of a frame is longer than the timeout, then the frame is reclassified as lost.
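
To make the second mode concrete, here is a minimal Python sketch of the reclassification step. It is not taken from siitperf; the function names `count_lost` and `frame_loss_ratio`, and the convention that `None` marks a frame that never arrived, are assumptions made only for illustration. It covers only the case where the frame timeout is higher than zero.

```python
from typing import Optional, Sequence


def count_lost(delays_ms: Sequence[Optional[float]], timeout_ms: float) -> int:
    """Count frames that are lost once a per-frame timeout is enforced.

    delays_ms holds one entry per sent frame: the measured one-way delay
    in milliseconds, or None if the frame never arrived (assumed convention).
    """
    lost = 0
    for delay in delays_ms:
        # A frame is lost if it never arrived or if it arrived too late.
        if delay is None or delay > timeout_ms:
            lost += 1
    return lost


def frame_loss_ratio(delays_ms: Sequence[Optional[float]], timeout_ms: float) -> float:
    """Return lost frames divided by sent frames under the timeout."""
    if not delays_ms:
        raise ValueError("at least one frame must have been sent")
    return count_lost(delays_ms, timeout_ms) / len(delays_ms)


if __name__ == "__main__":
    # Made-up delays: one frame missing, one frame later than 100 ms.
    example = [0.5, 1.2, None, 150.0, 99.9]
    print(frame_loss_ratio(example, timeout_ms=100.0))  # prints 0.4
```

Because the delays are stored per frame, the same filter can be applied after the trial has finished, with any timeout value.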

But the real question in my mind now, after looking into the possibilities
to take frame correspondence with a fixed timeout into account, is whether
we can find some examples where adding timeout criteria makes a significant
difference for Throughput measurements that would not be accounted for by
revised/modernized Latency and new Delay Variation Benchmarks and
metric variants?

In other words (as my co-author, Keiichi Shima, expressed it):

"Can defining a per-frame timeout test provide more sense than the delay variation test?"

IMHO, we can answer "yes".

Let us consider the following example. A delay-sensitive application can tolerate at most 100ms delay and 0.01% frame loss. (The "lost" frames also include frames with a delay higher than 100ms.)

Section 7.3.1 of RFC 8219 (https://tools.ietf.org/html/rfc8219#section-7.3.1) defines PDV as follows:

   PDV = D99.9thPercentile - Dmin

   Where:

   o  D99.9thPercentile = the 99.9th percentile (as described in
      [RFC5481]) of the one-way delay for the stream

   o  Dmin = the minimum one-way delay in the stream

There can be two problems with using PDV:
- PDV uses the 99.9th percentile, so even if we measure PDV as 99ms, it gives a guarantee for only 99.9% of the frames: 0.1% of them may have higher than 100ms latency, whereas our system tolerates only 0.01% frame loss (including frames with latency higher than 100ms).
- Dmin is subtracted from the 99.9th percentile, thus the final result of the PDV measurement is not exactly what we need.
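
The two problems above can be illustrated with a small Python sketch. The delay values are made up for the example and are not measurement results. The nearest-rank percentile and the helper names `percentile_nearest_rank`, `pdv_ms` and `late_ratio` are assumptions; RFC 5481 should be consulted for the percentile definition that actually applies.

```python
from typing import Sequence


def percentile_nearest_rank(values: Sequence[float], per_mille: int) -> float:
    """Nearest-rank percentile; per_mille=999 means the 99.9th percentile."""
    ordered = sorted(values)
    # Integer ceiling of n * per_mille / 1000, avoiding float rounding.
    rank = (len(ordered) * per_mille + 999) // 1000
    return ordered[max(rank, 1) - 1]


def pdv_ms(delays_ms: Sequence[float]) -> float:
    """PDV = D99.9thPercentile - Dmin, as quoted from RFC 8219 above."""
    return percentile_nearest_rank(delays_ms, 999) - min(delays_ms)


def late_ratio(delays_ms: Sequence[float], threshold_ms: float) -> float:
    """Fraction of frames whose one-way delay exceeds the threshold."""
    late = sum(1 for d in delays_ms if d > threshold_ms)
    return late / len(delays_ms)


# Synthetic stream of 10000 frames (illustrative numbers only):
# one frame at 5 ms, 9994 frames at 20 ms, five frames at 150 ms.
delays = [5.0] + [20.0] * 9994 + [150.0] * 5

print(pdv_ms(delays))             # prints 15.0
print(late_ratio(delays, 100.0))  # prints 0.0005, i.e. 0.05%
```

In this synthetic stream the PDV is only 15 ms, yet 0.05% of the frames exceed 100 ms, which is above the 0.01% the example application tolerates. The PDV value alone does not reveal that.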
[acm]
I think that you need to evaluate the one-way delay distribution w.r.t. the application threshold of 100ms in your example. This means that any packet with one-way delay >100ms would be declared lost by enforcing the fixed time-out. We don’t need to know the PDV to perform these steps; it is just a simple evaluation of one-way delay against a threshold.

IMHO, our suggested method can provide a better solution:
- The frame timeout parameter of siitperf-pdv should be set to 100ms
- The bash shell script that performs the binary search (and executes siitperf-pdv in every single step) should allow 0.01% frame loss.
Thus the measurement can be easily performed.
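
As a rough illustration of the search described above, here is a Python sketch of a binary search with a loss tolerance. It is a stand-in, not the bash script mentioned in this message: `search_rate`, `run_trial` and `fake_trial` are hypothetical names, `run_trial(rate)` is assumed to return the sent and lost frame counts of one trial (with timed-out frames already counted as lost), and the loss ratio is assumed not to decrease as the rate grows.

```python
from typing import Callable, Tuple


def search_rate(
    run_trial: Callable[[int], Tuple[int, int]],
    low: int,
    high: int,
    max_loss_ratio: float,
) -> int:
    """Return the highest rate in [low, high] that passes, or 0 if none does.

    A trial passes when lost <= max_loss_ratio * sent.
    """
    best = 0
    while low <= high:
        mid = (low + high) // 2
        sent, lost = run_trial(mid)
        if lost <= max_loss_ratio * sent:
            best = mid       # this rate is acceptable, try a higher one
            low = mid + 1
        else:
            high = mid - 1   # too much loss, try a lower rate
    return best


def fake_trial(rate: int) -> Tuple[int, int]:
    """Made-up device model: no loss up to 500000 fps, 0.1% loss above."""
    sent = rate * 60  # 60-second trial
    lost = 0 if rate <= 500_000 else sent // 1000
    return sent, lost


# Allow 0.01% frame loss, as in the example above.
print(search_rate(fake_trial, 1, 1_000_000, 0.0001))  # prints 500000
```

With `max_loss_ratio` set to 0.0 the same loop reduces to a zero-loss search.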

What do you think of it?
[acm]
So, as I alluded to up front, it is very hard to set application-specific thresholds in a useful way when benchmarking a single DUT. We don’t know what the rest of the network will look like, and therefore how much additional delay and loss will be contributed by the “rest of the network”.
I think this is why we have always chosen simple justifications for Benchmark thresholds, like zero loss, but it’s ok to investigate around these thresholds, and it sounds as though you will add that capability to your test tool.

Here’s another point: If you are testing a virtualized network/DUT, you may see substantial benefit from the more advanced Binary Search with Loss Verification in ETSI NFV TST009. The justification appears in the text (clause 12.3.3) and the associated Annexes B and C. See [0]

hope this helps,
Al (participant)

[0] https://docbox.etsi.org/ISG/NFV/Open/Drafts/TST009ed341/NFV-TST009ed341v341_001.docx

Best regards,

Gábor
