Re: [bmwg] New version of the draft Methodology for VNF Benchmarking Automation

Steven Van Rossem <> Wed, 13 November 2019 15:43 UTC


Hi Raphael and authors of the vnfbench-draft,

At imec/Ghent University we are happy to see work on VNF benchmarking 
being proposed at the IETF. The work in this draft is very complementary 
to our own research; most of the functional blocks and methodology steps 
it mentions coincide closely with our own implementations!

Please consider the following remarks and lessons learned from our 
reading of the draft and the YANG models:

Section 4.2:
We follow essentially the same approach in our VNF tests.
Our work in [2] would exemplify a ‘Method’: the paper describes a 
procedure to choose online which VNF parameters to use next in a ‘Test’.

Section 4.2.1 Deployment:
Section 5.1 Deployment Scenarios:
Section 6.1.4 Environment:
In our opinion there should be a clear separation between the initial 
'Deployment' of the benchmarking setup and the description of the 
actual automated testing. Currently the initial orchestration seems to 
be an inseparable part of the benchmarking methodology/model, or is 
this a wrong assumption?

Instead, we imagine two distinct steps:

1. All of the VNFs under test, traffic generators, probes, … are deployed 
first using available orchestration methods (e.g. each of these 
functional blocks is a container or a VM).

2. The benchmarking methodology then comes in as a complement to the 
MANO/orchestration platform, or as a functional layer on top of 
orchestration, exploiting, for example, any monitoring framework 
already present in the platform.

Care should be taken that there is no overlap with existing deployment 
descriptors. Everything related to deployment is probably already 
present in the native descriptors of the orchestration platforms 
(OpenStack, Kubernetes, MANO using HEAT, TOSCA, YANG, ETSI …).
We would therefore limit the parameters in the VNF-BD to only these:
- workload or resource allocation settings which are actively modified 
by the Agents (e.g. a range of vCPU allocations), i.e. only the deltas 
compared to the initial orchestration;
- benchmark-specific lifecycle configurations of the probers or VNFs 
which are not possible in the initial orchestration (such as start/stop 
of the test workload);
- configuration settings only known after initial orchestration (e.g. 
allocated IP addresses).
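To make the "deltas only" suggestion concrete, here is a rough Python sketch of such a minimal VNF-BD fragment. All field names are invented for illustration; they are not taken from the draft's YANG model:

```python
# Hypothetical delta-only VNF-BD fragment: only what differs from the
# initial orchestration is listed. Field names are illustrative.
vnf_bd_deltas = {
    "resource_deltas": {
        # range of vCPU allocations the Agents will sweep over
        "vcpus": [1, 2, 4],
    },
    "lifecycle": {
        # benchmark-specific actions not expressible at orchestration time
        "prober_start": "start_workload.sh",
        "prober_stop": "stop_workload.sh",
    },
    "post_orchestration": {
        # values only known after deployment, filled in at runtime
        "vnf_mgmt_ip": None,
    },
}

def runtime_fill(bd, mgmt_ip):
    """Fill in post-orchestration values once the platform reports them."""
    bd["post_orchestration"]["vnf_mgmt_ip"] = mgmt_ip
    return bd

runtime_fill(vnf_bd_deltas, "10.0.0.5")
```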

On the other hand, if all orchestration info is needed in the VNF-BD, is 
it then expected that the benchmarking tool contains an adaptor which 
translates the VNF-BD directives to the API of the targeted 
orchestration platform? It would be interesting to know your view on 
this and to elaborate on it in the draft.
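By "adaptor" we imagine something like the following Python sketch: a thin translation layer per platform. Class and method names are invented for illustration, and the Kubernetes call is only stubbed out:

```python
from abc import ABC, abstractmethod

class OrchestratorAdaptor(ABC):
    """Translates platform-agnostic VNF-BD directives to a concrete API."""
    @abstractmethod
    def apply_resources(self, vnf_id: str, vcpus: int) -> None: ...

class KubernetesAdaptor(OrchestratorAdaptor):
    def __init__(self):
        self.applied = {}  # records intent instead of calling the real API
    def apply_resources(self, vnf_id, vcpus):
        # a real implementation would patch the pod's resource limits here
        self.applied[vnf_id] = {"cpu": str(vcpus)}

adaptor = KubernetesAdaptor()
adaptor.apply_resources("vnf-1", 4)
```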

*Specific remarks on the VNF-BD model:*

  * Could bandwidth also be included as a resource, perhaps referring to
    a link?
  * Is “Number of tests” redundant information? I think it could be
    derived from the number of different parameter values defined in the
    ranges to be tested. In the case of adaptive sampling (e.g. [2]), the
    number of tests can even be unknown at the start. Many different stop
    criteria are possible for automated benchmarking, such as maximum
    allowed duration, minimum required confidence interval, or minimum
    required model accuracy. So I would not consider only a fixed
    benchmarking time or a fixed number of trials/tests, as this can be
    dynamic...
  * Should the order (and dependencies) in which the configuration
    commands are executed be specified somehow? For example: how to
    define that first the resource allocation should be done, then the
    server needs to start, and finally the clients can send requests?
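As a small illustration of a dynamic stop criterion, this Python sketch keeps sampling until the relative 95% confidence interval on the mean is tight enough, or a trial budget runs out. The thresholds are arbitrary example values:

```python
import math
import statistics

def run_until_stable(sample_fn, max_trials=50, rel_ci=0.05, min_trials=5):
    """Sample until the 95% CI half-width on the mean drops below
    rel_ci * mean, or until max_trials trials have run."""
    samples = []
    while len(samples) < max_trials:
        samples.append(sample_fn())
        if len(samples) >= min_trials:
            mean = statistics.mean(samples)
            sem = statistics.stdev(samples) / math.sqrt(len(samples))
            if 1.96 * sem <= rel_ci * abs(mean):
                break  # confidence criterion met: stop early
    return samples
```

With a perfectly repeatable measurement this stops right at `min_trials`; with noisy measurements it keeps going until the interval tightens or the budget is spent.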


*Specific remarks on the VNF-PP model:*

  * The VNF-PP is explained as an output artifact. But where is the
    input given that specifies which metrics should be included in the
    VNF-PP in the first place?
    Consider defining all the metrics in the VNF-PP as *input* for the
    benchmarking (this is the way we have done it in our work [1] [2]).
    This way the VNF-PP can define the structure and attributes of each
    metric (such as unit, name, query to get it from the monitor
    database, …).
    As output, the measured values of each defined metric can then be
    filled in to the VNF-PP files.
  * Or is it the intention to define the metrics explicitly in the
    VNF-BD? I currently do not recognize any field in the VNF-BD to
    define a list of metrics to be gathered via the listeners.
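The metrics-as-input idea can be sketched as follows in Python. The attribute names and queries are invented examples in the spirit of our work [1][2], not fields from the draft's models:

```python
# Metrics are declared up front (name, unit, query); measured values are
# filled in afterwards, turning the same structure into the output artifact.
vnf_pp_metrics = [
    {"name": "cpu_load", "unit": "%",
     "query": "avg(rate(cpu[1m]))", "values": []},
    {"name": "latency", "unit": "ms",
     "query": "p99(rtt)", "values": []},
]

def record(metrics, measurements):
    """Append one measurement round, keyed by metric name."""
    for m in metrics:
        m["values"].append(measurements[m["name"]])

record(vnf_pp_metrics, {"cpu_load": 63.0, "latency": 1.8})
```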

Very complementary to the automation of VNF benchmarking, our work in 
[1] proposes that the different metrics/settings used during the 
benchmarking process can be grouped into four categories:

- resource allocation metrics (res)
- workload configuration metrics (wl)
- VNF configuration metrics (cfg)
- performance metrics (perf)

Grouping the metrics like this brings the following benefits:

- a more structured approach for the VNF-BD and VNF-PP to list all 
related metrics
- it helps to model the resulting measurements into an analytical 
performance prediction model:

f(wl, res, cfg) = perf

where the VNF performance is predicted using the wl, res and cfg 
measurements as independent variables or features. Boundary values for 
the independent variables (res, wl, cfg) can be given in the VNF-BD; 
test traffic for benchmarking should then be generated between those 
boundaries. The construction of such a prediction model can also 
influence the benchmarking procedure online, by choosing which parameter 
setting to execute in the next test, as exemplified in [2].
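As a toy illustration of fitting f(wl, res, cfg) = perf, the Python sketch below uses ordinary least squares on synthetic data (the data-generating model and all coefficients are invented for the example; our actual work uses richer models):

```python
import numpy as np

# Synthetic measurements: throughput rises with vCPUs (res), falls with
# offered load (wl), and gets a bonus from a config flag (cfg).
rng = np.random.default_rng(0)
wl = rng.uniform(100, 1000, 50)    # workload: offered requests/s
res = rng.integers(1, 8, 50)       # resources: vCPUs allocated
cfg = rng.integers(0, 2, 50)       # config: cache off/on
perf = 50 * res - 0.02 * wl + 10 * cfg + rng.normal(0, 1, 50)

# Fit linear coefficients for f(wl, res, cfg) = perf with least squares.
X = np.column_stack([wl, res, cfg, np.ones_like(wl)])
coef, *_ = np.linalg.lstsq(X, perf, rcond=None)

def predict(wl_v, res_v, cfg_v):
    """Predict perf for an unseen (wl, res, cfg) combination."""
    return coef @ np.array([wl_v, res_v, cfg_v, 1.0])
```

Once fitted, such a model can score candidate (wl, res, cfg) points and steer which one to benchmark next.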

Please also consider the following ancillary features, which mitigate 
the overall profiling time (I think these further extend the points in 
6.3.2 Automated Execution and 6.4 Particular Cases):

  * measurement stability detection:
    detect when metrics have stabilized, then take a sample. This is
    more efficient than a fixed measurement period per workload. [1]
  * parallelization of tests:
    datacenters often have many identical compute nodes available. To
    speed up testing, one can run the profiling test on several
    identical compute nodes, each with a different resource allocation
    or workload, and later combine the results. See also our work in
    [2].
  * dynamic sampling approach:
    using the structured definition of metrics, the benchmarking tool
    can decide more optimally and autonomously how to iterate over
    possible workload and resource configurations [2]. Ideally, the
    iteration through different benchmarking tests is steered by a
    combination of pre-defined values (included as expert knowledge:
    values which a service developer considers interesting to profile)
    and sample values decided online by the profiling tool. From the
    previous measurements, the tool should be able to propose the next
    workload and resource allocation to sample. I think this relates to
    the 'Method' mentioned under '4.2 Benchmarking Procedures'.
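The stability-detection idea can be sketched in a few lines of Python: declare a metric stable when the coefficient of variation over a sliding window drops below a threshold, then take the sample. Window size and tolerance are arbitrary example values:

```python
import statistics

def stable(window, rel_tol=0.02):
    """Stable when stdev/|mean| over the window falls below rel_tol."""
    mean = statistics.mean(window)
    return mean != 0 and statistics.stdev(window) / abs(mean) < rel_tol

def sample_when_stable(stream, window_size=5, rel_tol=0.02):
    """Consume readings until the last window_size values are stable,
    then return their mean as the measurement sample."""
    window = []
    for reading in stream:
        window.append(reading)
        if len(window) > window_size:
            window.pop(0)
        if len(window) == window_size and stable(window, rel_tol):
            return statistics.mean(window)
    return None  # never stabilized; the caller should apply a timeout
```

Compared to a fixed measurement period, this spends time on a workload only as long as its metrics are still moving.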

Our work in [1], [2] explains and makes use of a ‘lighter-weight’ form 
of descriptor, but the main idea and basic structure are similar. We 
also define Agents to actively change configurations, and a metrics 
descriptor similar to the VNF-PP which lists all the metrics to be 
gathered during the tests. We also make heavy use of custom workflows 
to start/stop the workload and modify resource allocations, but this 
seems to be handled by the YANG model (user-defined key-value 
parameters would be a must in the model, but this seems supported).


Steven Van Rossem -
Wouter Tavernier -

imec / Ghent University

PS: Referenced articles:

[1] Steven Van Rossem, Wouter Tavernier, Didier Colle, Mario Pickavet 
and Piet Demeester. “Profile-based Resource Allocation for Virtualized 
Network Functions.” IEEE TNSM journal. doi: 10.1109/TNSM.2019.2943779

[2] Steven Van Rossem, Wouter Tavernier, Didier Colle, Mario Pickavet 
and Piet Demeester. “Optimized Sampling Strategies to Model the 
Performance of Virtualized Network Functions.” Under review. Preprint 
available at:

On 6/11/2019 20:23, Raphael Vicente Rosa wrote:
> Hi BMWG,
> We had reviews on the draft since the last IETF meeting, as listed below:
> - Improvements based on Luis Contreras comments
> - Structured VNF Performance Profile section
> Associated with the draft content, we have done the following activities:
> - Developed Yang models for VNF-BD and VNF-PP [1]
> - Utilized VNF-BD  in both reference implementations, comparison tests 
> [2] (check in [2] the jupyter notebooks named analysis.ipynb in 
> gym-bench-02/results and upb-vnf-bench-02/results)
> We are going to present the draft updates and next activities in the 
> next IETF/BMWG meeting.
> Best regards,
> The Authors
> [1] 
> [2] 
> A new version of I-D, draft-rosa-bmwg-vnfbench-05.txt
> has been successfully submitted by Raphael Vicente Rosa and posted to the
> IETF repository.
> Name:           draft-rosa-bmwg-vnfbench
> Revision:       05
> Title:          Methodology for VNF Benchmarking Automation
> Document date:  2019-11-04
> Group:          Individual Submission
> Pages:          27
> URL:
> Status:
> Htmlized:
> Htmlized:
> Diff:
> Abstract:
>    This document describes a common methodology for the automated
>    benchmarking of Virtualized Network Functions (VNFs) executed on
>    general-purpose hardware.  Specific cases of automated benchmarking
>    methodologies for particular VNFs can be derived from this document.
>    Two open source reference implementations are reported as running
>    code embodiments of the proposed, automated benchmarking methodology.
> _______________________________________________
> bmwg mailing list

Steven Van Rossem
IMEC - Ghent University
Department of Information Technology (INTEC)
iGent, Technologiepark 126, B-9052 Gent (Zwijnaarde)