Re: [bmwg] An improved version of our I-D: draft-lencse-bmwg-benchmarking-stateful-02

Gábor LENCSE <lencse@hit.bme.hu> Sun, 07 November 2021 21:39 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/bmwg/2NDvOIFnLjKtZY2GMcNhBYNR7gY>

Dear Al,

Thank you very much for your comments! And special thanks for sending 
them to me in advance, thus giving me time to think about them. :-)

First of all, let me mention that at the time of writing the "02"
version of the draft, I had experience only with iptables (stateful
NAT44). Since then, I have gained some experience with Jool (stateful
NAT64).


On 11/6/2021 9:10 PM, MORTON JR., AL wrote:
>
> Hi Gábor,
>
> Thanks for your recent updates to the “stateful” GW benchmarking 
> draft, and for sharing your test results with BMWG and v6OPS mailing 
> lists.
>
> I have a few comments, as a participant, and we can discuss during the 
> IETF-112 session:
>
> Sec4.2
>
> It seems that the preliminary test phase establishes the size of the 
> state table in the DUT.
>
> Q: (note to check – is the state table capacity mentioned as a 
> benchmark, later in the memo?
>

It is not mentioned as a benchmark. As for iptables, I can tune the size
of the connection tracking table nearly arbitrarily; memory capacity
seems to be the only limit. (Thus I did not feel it necessary at the
time of writing version "02".)

Details:

I can tune two values:
nf_conntrack_max -- It specifies the maximum number of entries in the
state table.
hashsize -- It specifies the number of entries in the hash table. (As far
as I remember, all of them are linked list heads. The elements of the
lists store the parameters of the connections.)

This source https://ixnfo.com/en/tuning-nf_conntrack.html recommends
hashsize=nf_conntrack_max/8 (which would result in linked lists of 8
elements on average), and my experience shows that a larger hashsize
ensures higher performance. (When I could, I used
hashsize=nf_conntrack_max.) The value of hashsize is limited by the size
of the memory. E.g. having 384GB of RAM, I could set it to 2^28, but I
could not set it to 2^29. (I got an error message.) With hashsize=2^28,
I used nf_conntrack_max=2^30, and the memory usage was somewhat below
300GB. (Using another computer, I once accidentally exhausted its 256GB
of RAM.)
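
To make the relationship between the two values concrete, here is a
minimal sketch (not part of the draft) that computes the average linked
list length of a completely full table as nf_conntrack_max / hashsize.
The function name and the 2^20 example value are assumptions chosen only
for illustration; the 2^28 and 2^30 values are the ones mentioned above.

```python
def average_chain_length(nf_conntrack_max: int, hashsize: int) -> float:
    """Average number of entries per hash bucket when the table is full."""
    if hashsize <= 0:
        raise ValueError("hashsize must be positive")
    return nf_conntrack_max / hashsize


# Ratio recommended by the cited page: hashsize = nf_conntrack_max / 8
# (2**20 is an assumed, purely illustrative table size.)
print(average_chain_length(2**20, 2**20 // 8))  # 8.0

# Settings used on the machine with 384GB of RAM
print(average_chain_length(2**30, 2**28))       # 4.0

# hashsize = nf_conntrack_max: one entry per bucket on average
print(average_chain_length(2**20, 2**20))       # 1.0
```

Shorter lists mean fewer elements to walk per lookup, which is consistent
with the observation that a larger hashsize gave higher performance.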

> No, it is not. Then)
>
> Q: Is the state table capacity measurable? It seems useful to know 
> during testing, at least.
>

As we did not find any parameter for tuning the state table size of
Jool, it may be useful to measure it. However, measuring it does not
seem to be simple. Why?

When I measured the maximum connection establishment rate, I set both
the size of the connection tracking table and the timeout so that the
only possible reason for the failure of the test was that the rate of
sending the preliminary test frames (resulting in new connections) was
too high.

If I do not know the size of the connection tracking table and I would
like to measure it, then I would do a binary search as follows:

Send N preliminary test frames, all with different source port number
and destination port number combinations, in the preliminary phase.
Send N test frames in the real test phase only in the "reverse"
direction (from the Responder to the Initiator), using up all the
elements of the state table of the Tester.

If the frame rate is "not too high" both in the preliminary test phase 
and in the real test phase, and the timeout is large enough, then the 
failure of the test indicates the exhaustion of the connection tracking 
table of the DUT. Thus its size may be determined by a binary search.
(Note: It is not trivial how to ensure the frame rate conditions without 
measuring the maximum connection establishment rate and throughput, 
because their measurement would require the knowledge of the size of the 
connection tracking table.)
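
The binary search described above could be sketched as follows. This is
only an illustration under stated assumptions: trial_passes(n) is a
hypothetical callback standing for one complete run (preliminary phase
plus real test phase with N = n), and make_simulated_dut is a stand-in
that replaces a real DUT with a fixed capacity. The sketch also assumes
that the frame rate and timeout conditions above hold, so that a trial
fails only when the connection tracking table is exhausted.

```python
from typing import Callable


def find_table_capacity(trial_passes: Callable[[int], bool],
                        lo: int, hi: int) -> int:
    """Return the largest n in [lo, hi] for which trial_passes(n) is True.

    Assumes trial_passes(lo) is True and that passing is monotonic:
    if a trial with n connections passes, every smaller n passes too.
    """
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always progresses
        if trial_passes(mid):
            lo = mid       # table still had room for mid connections
        else:
            hi = mid - 1   # table was exhausted below mid connections
    return lo


def make_simulated_dut(capacity: int) -> Callable[[int], bool]:
    """Hypothetical stand-in for a real two-phase test against a DUT."""
    def trial_passes(n: int) -> bool:
        return n <= capacity
    return trial_passes


dut = make_simulated_dut(capacity=1_000_000)
print(find_table_capacity(dut, lo=1, hi=2**24))  # 1000000
```

With a real DUT, each call of trial_passes would be a full test run, so
the number of runs grows only with the logarithm of the search range.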

> Also:
>
> *The connections do not time out in the DUT even during the
>
> beginning of the real test phase.
>

In fact, this statement simply "remained" at the end of Section 4.2 of
version "02" from version "00", whereas more specific criteria are given
in Section 4.3. It should either be deleted or rewritten as:

    *  The connections do not time out in the DUT even during the
       real test phase; please see the details of the timeout
       settings in Section 4.3.



> comment: this time-out might be a parameter to configure, or to know 
> and ensure that the time between test phases does not exceed. It’s 
> expressed as a statement of fact above, but it’s really a requirement 
> (to configure) for testing.
>

Yes, it can be ensured as stated in Section 4.3.

> Sec4.3
>
> *setting the UDP timeout of the NATxy gateway to a value higher
>
> than the length of the preliminary phase.
>

This is for achieving the "first extreme situation", which is used for 
measuring the maximum connection establishment rate. (In this case there 
is no real test phase.)


> It seems that this is the minimum timeout, and is only safe if traffic 
> on the first connection established during the preliminary phase 
> appears immediately when the “real” test phase begins.
>
> The “safer” specification of the UDP timeout seems to be:
>
> *setting the UDP timeout of the NATxy gateway to a value higher
>
> than the length of the preliminary phase and the real phase combined.
>

Something similar is written about how to achieve the "second extreme
situation", which is used for all measurements in the real test phase:

    *  setting the UDP timeout of the NATxy gateway to a value higher
       than the length of the preliminary phase plus the gap between the
       two phases plus the length of the real test phase.


For various reasons, there is a gap between the two phases. (There is a 
timeout at the end of the preliminary phase, plus the transmitters and 
receivers have to be stopped after the end of the preliminary phase and 
started before the real test phase, and that requires nonzero time, 
especially in the case of software testers.)
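
As a small worked example of the timeout condition for the "second
extreme situation", the sketch below adds up the three durations. The
function name, the safety margin and the numeric values are assumptions
for illustration only; they do not come from the draft.

```python
def min_udp_timeout(preliminary_s: float, gap_s: float, real_s: float,
                    margin_s: float = 1.0) -> float:
    """UDP timeout (in seconds) exceeding preliminary phase + gap + real phase.

    margin_s is an assumed safety margin that makes the result strictly
    higher than the sum, as the condition requires.
    """
    if min(preliminary_s, gap_s, real_s) < 0 or margin_s <= 0:
        raise ValueError("durations must be non-negative, margin positive")
    return preliminary_s + gap_s + real_s + margin_s


# Hypothetical durations: 10 s preliminary phase, 2 s gap, 60 s real test phase
print(min_udp_timeout(10.0, 2.0, 60.0))  # 73.0

# "First extreme situation": no real test phase, only the preliminary phase
print(min_udp_timeout(10.0, 0.0, 0.0))   # 11.0
```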


> (I don’t think Sec4.7 offers the above as a solution, just cautions 
> and the loss problem.)
>

When rewriting Section 4.3 for version 02, we did not check its
consistency with Section 4.7.1. Unfortunately, Section 4.7.1 still
reflects the old measurement method. (It used a lower timeout value
than is now specified in Section 4.3 for the real test phase.) The first
paragraph of Section 4.7.1 can surely be deleted due to the solution
provided in Section 4.3. (And I will think about the rest of Section
4.7.1, too.)

Even the loss problem presented in Section 4.7.2 is somewhat mitigated
by the new method, which uses a higher timeout. Of course, zero loss is
a MUST in the preliminary phase, but non-zero loss may be tolerated in
the real test phase, as the timeout is now set to a value high enough
that refreshing is not needed during the real test phase.

Lesson learnt: I should have read the entire draft after modifying 
section 4.3 to recognize the inconsistency with the other parts! We will 
correct all these things in the next version.

Thank you very much for your thorough reading!

Best regards,

Gábor

> regards,
>
> Al
>
> (as a participant)
>
>
