Re: [bmwg] An Upgrade to Benchmarking Methodology for Network Interconnect Devices -- Fwd: New Version Notification for draft-lencse-bmwg-rfc2544-bis-00.txt

"MORTON, ALFRED C (AL)" <acm@research.att.com> Sat, 20 June 2020 19:41 UTC

From: "MORTON, ALFRED C (AL)" <acm@research.att.com>
To: Lencse Gábor <lencse@hit.bme.hu>, "bmwg@ietf.org" <bmwg@ietf.org>
Date: Sat, 20 Jun 2020 19:41:18 +0000
Message-ID: <4D7F4AD313D3FC43A053B309F97543CF0108A66F2C@njmtexg5.research.att.com>
References: <158995996438.13925.2934780472900149847@ietfa.amsl.com> <14002442-9713-d474-8012-bca5dcd6976c@hit.bme.hu> <4D7F4AD313D3FC43A053B309F97543CF0108A5BA22@njmtexg5.research.att.com> <598e85fd-cf9b-1cdd-61c0-3a76623145f9@hit.bme.hu> <4D7F4AD313D3FC43A053B309F97543CF0108A5BC52@njmtexg5.research.att.com> <1c81f904-bb24-5f42-2ac4-919913fddf8a@hit.bme.hu>
In-Reply-To: <1c81f904-bb24-5f42-2ac4-919913fddf8a@hit.bme.hu>
Archived-At: <https://mailarchive.ietf.org/arch/msg/bmwg/5rrHkSlhVGWUuLNHsnvxSaLUEfQ>
List-Id: Benchmarking Methodology Working Group <bmwg.ietf.org>

Hi Gábor,

Please see my long-delayed replies below. Sorry!
I hope the replies are still useful!

Al

From: bmwg [mailto:bmwg-bounces@ietf.org] On Behalf Of Lencse Gábor
Sent: Monday, June 1, 2020 5:44 AM
To: bmwg@ietf.org
Subject: Re: [bmwg] An Upgrade to Benchmarking Methodology for Network Interconnect Devices -- Fwd: New Version Notification for draft-lencse-bmwg-rfc2544-bis-00.txt

Dear Al,

Thank you very much for your reply. Please see my answers inline. (I keep only those parts of the text that I reply to.)
On 2020.05.23 at 19:40, MORTON, ALFRED C (AL) wrote:

As for the non-overlapping areas of our draft and the documents you
cited, I have not found anything in them about our suggestion for
"Improved Throughput and Frame Loss Rate Measurement Procedures using
Individual Frame Timeout".

If you think this one fits well with your efforts to update RFC 2544,
then we could focus on this one first, and deal with the others later,
one by one (in different documents).

What do you think of it?

[acm]

Using a constant frame timeout for declaration of Loss (and delay) is
much like the IPPM WG metrics and methods (see RFC 7679 and RFC 7680)
for the production network measurements (Tmax). A constant waiting time for
frames to arrive at the receiver simply excludes frames on an on-going
basis. RFC 2544 has a waiting time at the end of the trial, where the
tester must wait for 2 seconds for buffers to clear but the first frame
sent has the entire trial duration+2sec to arrive.

Yes, this is why I feel that some of the frames we count as "received" may be completely useless for the applications.
[acm]
Since we are testing a small part of the end-to-end path between users and their applications (a single DUT in the network, or a few DUTs in a SUT), we have to be careful when setting thresholds for useless frames, both loss and delay.

Perhaps the biggest potential change to the Throughput definition is
whether or not we demand frame correspondence between sender and receiver,
so that we can calculate one-way delay. RFC 2544 and ETSI NFV TST009 only
require equal send and receive frame counts to satisfy the zero loss
criteria in the Throughput benchmark definition. But in ETSI NFV TST009,
the Capacity at X% Loss Ratio metric-variant definition begins to infer
frame correspondence. Later, the definitions of (one-way) Latency and
Delay Variation and Loss allow a Sample Filter, which could be a constant
maximum time-out for individual frames, as you suggest. Note that a Sample
Filter could be applied in post-measurement processing, assuming all the
delays are available.

I have implemented siitperf-pdv (part of siitperf, available from https://github.com/lencsegabor/siitperf ) so that, depending on the value of its last parameter (frame timeout), it does the post-processing in one of two ways:
- If the value of frame timeout is 0, then a proper PDV measurement is done.
- If the value of frame timeout is higher than zero, then a special throughput (or frame loss rate) measurement is performed instead, where the tester checks the frame timeout for each frame individually: if the measured delay of a frame is longer than the timeout, then the frame is reclassified as lost.
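The reclassification step in the second bullet can be sketched in a few lines of Python. This is only an illustration of the idea, not siitperf's actual code: the function names, the use of None for frames that never arrived, and the sample delays are all assumptions made for the example.

```python
from typing import Optional, Sequence

def count_lost(delays_ms: Sequence[Optional[float]], frame_timeout_ms: float) -> int:
    """Count frames that never arrived (None) or arrived after the timeout.

    Hypothetical helper: None marks a frame with no receive timestamp.
    """
    return sum(1 for d in delays_ms if d is None or d > frame_timeout_ms)

def loss_ratio(delays_ms: Sequence[Optional[float]], frame_timeout_ms: float) -> float:
    """Loss ratio after per-frame timeout reclassification."""
    if not delays_ms:
        raise ValueError("no frames were sent")
    return count_lost(delays_ms, frame_timeout_ms) / len(delays_ms)

# Made-up one-way delays in milliseconds; the third frame never arrived.
delays = [0.8, 1.2, None, 150.0, 2.5]

# With a 100 ms timeout, the 150 ms frame is reclassified as lost.
print(count_lost(delays, 100.0))   # 2
# With a very long timeout, only the frame that never arrived counts as lost.
print(count_lost(delays, 2000.0))  # 1
```

Because the check runs over stored per-frame delays, the same measurement data can be re-evaluated against several timeout values after the trial.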



But the real question in my mind now, after looking into the possibilities
to take frame correspondence with a fixed timeout into account, is whether
we can find some examples where adding timeout criteria makes a significant
difference for Throughput measurements that would not be accounted for by
revised/modernized Latency and new Delay Variation Benchmarks and
metric variants?

In other words (as my co-author, Keiichi Shima, put it):

"Can defining a per-frame timeout test provide more sense than the delay variation test?"

IMHO, we can answer "yes".

Let us consider the following example. A delay-sensitive application can tolerate at most 100ms delay and 0.01% frame loss. (The "lost" frames also include frames with a delay higher than 100ms.)

Section 7.3.1 of RFC 8219 ( https://tools.ietf.org/html/rfc8219#section-7.3.1 ) defines PDV as follows:

   PDV = D99.9thPercentile - Dmin

   Where:

   o  D99.9thPercentile = the 99.9th percentile (as described in
      [RFC5481]) of the one-way delay for the stream

   o  Dmin = the minimum one-way delay in the stream

There can be two problems with using PDV:
- Because PDV uses the 99.9th percentile, even a measured PDV of 99ms gives a guarantee for only 99.9% of the frames; 0.1% of them may have a latency higher than 100ms, whereas our system tolerates only 0.01% frame loss (including frames with a latency higher than 100ms).
- Dmin is subtracted from the 99.9th percentile, thus the final result of the PDV measurement is not exactly what we need.
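Both problems can be seen on a small synthetic example. The Python sketch below uses made-up delays and a nearest-rank percentile, which is an assumption made here for simplicity (RFC 5481 should be consulted for the percentile method actually intended); the function names are hypothetical.

```python
def nearest_rank_percentile(sorted_vals, per_mille):
    """Nearest-rank percentile; per_mille=999 means the 99.9th percentile."""
    n = len(sorted_vals)
    rank = (n * per_mille + 999) // 1000  # integer ceil(n * per_mille / 1000)
    return sorted_vals[max(rank, 1) - 1]

def pdv(delays_ms):
    """PDV = D99.9thPercentile - Dmin, following the RFC 8219 formula above."""
    s = sorted(delays_ms)
    return nearest_rank_percentile(s, 999) - s[0]

def fraction_above(delays_ms, threshold_ms):
    """Fraction of frames whose one-way delay exceeds the threshold."""
    return sum(1 for d in delays_ms if d > threshold_ms) / len(delays_ms)

# Synthetic stream of 100,000 frames (values invented for illustration):
# 1,000 frames at 0.5 ms, 98,950 at 20 ms, and 50 stragglers at 150 ms.
delays = [0.5] * 1000 + [20.0] * 98950 + [150.0] * 50

print(pdv(delays))                    # 19.5 (ms), far below 100 ms
print(fraction_above(delays, 100.0))  # 0.0005, i.e. 0.05% of the frames
```

Here the PDV looks comfortable, yet 0.05% of the frames exceed 100ms, which is five times the 0.01% that the example application tolerates. The PDV value is also relative to Dmin rather than an absolute delay.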
[acm]
I think that you need to evaluate the one-way delay distribution w.r.t. the application threshold of 100ms in your example. This means that any packet with one-way delay >100ms would be declared lost by enforcing the fixed time-out.  We don’t need to know the PDV to perform these steps; it is just a simple evaluation of one-way delay against a threshold.


IMHO, our suggested method can provide a better solution:
- The frame timeout parameter of siitperf-pdv should be set to 100ms
- The bash shell script that performs the binary search (and executes siitperf-pdv in every single step) should allow 0.01% frame loss.
Thus the measurement can be easily performed.
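The search itself can be sketched as follows. The original is a bash script driving siitperf-pdv; the Python version below is only an illustration under stated assumptions: run_trial is a hypothetical stand-in for one trial with the 100ms frame timeout already applied, the simulated DUT and its 750,000 fps capacity are invented, and the loss ratio is assumed to grow monotonically with the offered rate.

```python
from typing import Callable

def highest_passing_rate(run_trial: Callable[[int], float],
                         lo_fps: int, hi_fps: int,
                         max_loss_ratio: float = 0.0001) -> int:
    """Binary search for the highest rate whose loss ratio stays within tolerance.

    run_trial(rate) is assumed to return the loss ratio of one trial, with
    frames later than the per-frame timeout already counted as lost.
    Returns 0 if no rate in [lo_fps, hi_fps] passes.
    """
    best = 0
    while lo_fps <= hi_fps:
        mid = (lo_fps + hi_fps) // 2
        if run_trial(mid) <= max_loss_ratio:
            best = mid           # trial passed: try a higher rate
            lo_fps = mid + 1
        else:
            hi_fps = mid - 1     # trial failed: try a lower rate
    return best

def simulated_trial(rate_fps: int) -> float:
    """Invented DUT model: lossless up to 750,000 fps, excess frames are lost."""
    capacity_fps = 750_000
    if rate_fps <= capacity_fps:
        return 0.0
    return (rate_fps - capacity_fps) / rate_fps

print(highest_passing_rate(simulated_trial, 1, 1_000_000))       # 750075
print(highest_passing_rate(simulated_trial, 1, 1_000_000, 0.0))  # 750000
```

With the 0.01% tolerance the search settles slightly above the lossless rate; setting the tolerance to zero gives the classic zero-loss criterion.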

What do you think of it?
[acm]
So, as I alluded to up front, it is very hard to set application-specific thresholds in a useful way when benchmarking a single DUT.  We don’t know what the rest of the network will look like, and therefore how much additional delay and loss will be contributed by the “rest of the network”.
I think this is why we have always chosen simple justifications for Benchmark thresholds, like zero loss, but it’s ok to investigate around these thresholds, and it sounds as though you will add that capability to your test tool.

Here’s another point: If you are testing a virtualized network/DUT, you may see substantial benefit from the more advanced Binary Search with Loss Verification in ETSI NFV TST009. The justification appears in the text (clause 12.3.3) and the associated Annexes B and C. See [0]

hope this helps,
Al (participant)

[0] https://docbox.etsi.org/ISG/NFV/Open/Drafts/TST009ed341/NFV-TST009ed341v341_001.docx



Best regards,

Gábor