Re: [bmwg] Benchmarking MPLS Forwarding

Curtis Villamizar <curtis@occnc.com> Mon, 10 December 2012 20:59 UTC

Message-Id: <201212102058.qBAKwNo8046934@gateway1.orleans.occnc.com>
To: "Jay Karthik (jakarthi)" <jakarthi@cisco.com>
From: Curtis Villamizar <curtis@occnc.com>
In-reply-to: Your message of "Wed, 28 Nov 2012 02:18:53 GMT." <C6009DF6226EF44B90DD04DC2DFF768B1509971F@xmb-aln-x10.cisco.com>
Date: Mon, 10 Dec 2012 15:58:22 -0500
Cc: Al Morton <acmorton@att.com>, "bmwg@ietf.org" <bmwg@ietf.org>
Subject: Re: [bmwg] Benchmarking MPLS Forwarding
Precedence: list
Reply-To: curtis@occnc.com

Jay, Al,

Thanks for looking at the mpls-forwarding draft.  Jay - thank you for
the comments.

See inline.

Curtis

In message <C6009DF6226EF44B90DD04DC2DFF768B1509971F@xmb-aln-x10.cisco.com>
"Jay Karthik (jakarthi)" writes:
> 
> Al,
>  
> I took a look at this draft and in my opinion, most (if not all) of
> the clarifications in section 1.1 would need to be benchmarked
> appropriately.  Many of these scenarios described are beyond the scope
> of previous work (MPLS Forwarding Compliance and Performance
> Requirements/RFC5695) which focused on label stack depth of 1.

Label stack depth is one variable that has not been considered so far
in BMWG.

Another is a look at packet forwarding rate vs packet size with a fine
granularity of packet size.  When doing so, a sawtooth is observed.
In my experience, the sawtooth is almost always due to memory width
and an optimization toward packets with 64 byte payloads (payload
padded in ethernet, plus header yielding on the order of 84 byte
times) and then multiples of some power of two, often 64.  It is
common to store an internal representation of destination in a
relatively small number of bytes, such as 16, and then store the
payload separately in other banks of RAM.  The result is that the
worst performance is that experienced by packets with payloads that
are a
little larger than 64 bytes.  This is bad news if carrying something
like pseudowires that may have small payloads, but a little larger
than 64 bytes.
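
As a rough illustration of the sawtooth, here is a minimal sketch.  It
is not taken from the draft or from any particular implementation.
The assumptions are mine: payload memory is written in fixed 64 byte
cells, the memory budget is sized to exactly sustain line rate at 64
byte frames, and the whole frame is treated as the stored payload.
The function names are hypothetical.

```python
import math

# Assumed on-the-wire overhead per Ethernet frame, in bytes:
# preamble plus start-of-frame delimiter (8) and minimum inter-frame gap (12).
WIRE_OVERHEAD = 8 + 12

def wire_byte_times(frame_bytes):
    """Byte times one frame occupies on the wire (frames padded to 64)."""
    return max(frame_bytes, 64) + WIRE_OVERHEAD

def cells_needed(frame_bytes, cell_bytes=64):
    """Fixed-width memory cells needed to store the frame (assumed model)."""
    return max(1, math.ceil(frame_bytes / cell_bytes))

def fraction_of_line_rate(frame_bytes, cell_bytes=64):
    """Fraction of line rate sustained if the cell rate is the only limit.

    Assumes the cell budget allows exactly one cell per minimum-size
    frame time, i.e. line rate at 64 byte frames and nothing to spare.
    """
    budget = wire_byte_times(frame_bytes) / wire_byte_times(64)
    return min(1.0, budget / cells_needed(frame_bytes, cell_bytes))

# Print a few sizes on either side of each cell boundary.
for size in (64, 65, 96, 128, 129, 192, 193, 256):
    print(size, cells_needed(size), round(fraction_of_line_rate(size), 3))
```

Under these assumptions the rate drops each time the frame size
crosses a multiple of 64 and recovers as the size grows toward the
next multiple, with the deepest drop just above 64 bytes.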

> The entire section 4 (Forwarding Compliance and Performance Testing)
> could potentially be a new work item.

I agree but also think it might be worth breaking this up functionally
into more than one work item.  There has also been discussion off list
aimed at expanding on the pseudowire requirements and adding mpls-tp
oam forwarding requirements (currently the impact on POP and counters
is called out, but not the OAM itself).

It is nice to hear that there is interest in BMWG.  It might be best
though to wait for this work to settle a bit in MPLS so as not to be
working toward too much of a moving target.

> Here are some comments on draft-villamizar-mpls-forwarding-00.

I'm reading -01-preview1, but I think it is the same here.

> If T#1 in section 4 fails, the max supported depth should be
> determined for a given implementation.

The current text says "While packets with more than a dozen label
entries are unlikely to be used in any practical scenario today, it
is useful to know if limitations exists."

So we appear to be in full agreement.  If this is not clear enough,
please suggest new wording.

> Subcase T#3b needs to be modified. While benchmarking, a device is
> either a tester/tool or a DUT. Specifically, the procedure described
> should apply to any DUT and not a router that has certain capability.

I think the issue is the sentence "One way to accomplish this is to
use a router with higher priority set on the interfaces on which small
packets are sent to it."  That router is not the DUT.  It is another
router front ending the DUT and providing the merged flow of traffic.
I can make that more clear by moving the entire suggested test setup
to an appendix with an ascii line drawing and referring to it from the
main body of the document.

The point here is that router A (front end) gets packets on more than
one interface and sends them out one interface.  Router B, the DUT,
receives the packets complete with bursts of small packets on one
interface.  Router A has two or more forwarding engines working on the
problem and therefore is more likely to not bog down its
forwarding engines.  The output side of router A could also be a
bottleneck but only if the issue is purely output side buffering and
is not related to the decision engine.  Clearly more than a paragraph
in the main part of the document is needed.

> T#5 is a generic functional validation that could apply to most cases
> (eg T#1). Same comment applies to other cases, which validates
> functionality or tests conformance (eg T#10). These tests could be
> part of the validation procedure but may not represent a benchmark.

The point in T#5 is that we've seen equipment in which the on-chip
counters are small enough that if all traffic went to the same LSP,
the counters would have to be polled in under 2 seconds, under 1
second if you prefer to avoid a rollover by half the counter capacity.
Since you don't know a priori which LSP is going to have a long burst
of activity, a brute force solution would require polling all of the
counters every second to update the 64 bit external DRAM version.
Some architectures are close to unworkable due to very small on chip
counters.  This really is a performance issue.  There is no reason
that the on-chip counters must be 64 bit, as long as there is an
efficient way to offload the counters and get them into 64 bit
counters in DRAM for the purpose of returning MIB values.
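
To make the arithmetic concrete, here is a minimal sketch of the
rollover time of a narrow byte counter and of folding polled values
into a 64 bit total.  The counter widths and line rates are
hypothetical example values, not figures from any specific equipment,
and the class and function names are mine.

```python
def rollover_seconds(counter_bits, line_rate_bps):
    """Seconds for a byte counter to wrap if one LSP gets the full line rate."""
    bytes_per_second = line_rate_bps / 8
    return (2 ** counter_bits) / bytes_per_second

class ExtendedCounter:
    """64 bit software total fed by polling a narrow hardware counter.

    Correct only if the hardware counter wraps at most once between
    polls, which is why the polling interval must stay under the
    rollover time.
    """

    def __init__(self, hw_bits):
        self.mask = (1 << hw_bits) - 1
        self.last = 0
        self.total = 0

    def poll(self, hw_value):
        # Modular subtraction gives the increment even across one wrap.
        delta = (hw_value - self.last) & self.mask
        self.last = hw_value
        self.total = (self.total + delta) & ((1 << 64) - 1)
        return self.total

# Hypothetical example: a 32 bit byte counter at 10 Gb/s and at 100 Gb/s.
for rate in (10e9, 100e9):
    print(rate, rollover_seconds(32, rate))
```

The brute force approach corresponds to calling poll() on every
counter at an interval shorter than rollover_seconds() for the
narrowest counter at the highest line rate.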

I can see your argument that T#10 is really a basic conformance test,
however it is now under "Multipath Capabilities and Performance" and
while it is conformance, it is conformance specific to multipath.  I
think T#10 is fine where it is.

> A few more test scenarios could be created to measure the max label
> stack size with PW control word, flow label and Entropy label. This is
> different from T#1 where the focus is to measure in general, the
> maximum stack size supported by DUT, instead of characterizing the max
> supported in different scenarios.

Andy Malis and I have had a long email thread off list.  Expect to see
a lot more in the 01 draft on pseudowire.

> -Jay
>  
> On 11/8/12 9:51 AM, "Al Morton" <acmorton@att.com> wrote:
>  
> >Folks,
> >
> >Curtis suggested I take a look at this (I did),
> >http://tools.ietf.org/html/draft-villamizar-mpls-forwarding-00#page-11
> >and I think others should take a look, too.
> >
> >Food for thought (at least) as we consider new work.
> >
> >regards,
> >Al
> >bmwg chair 

Thanks again for the comments.  It would be best to keep this on the
MPLS mailing list for now and return to BMWG later.  That said, the
MPLS WG list has been silent on this but I have been getting quite a
few comments in private email.  (Mostly from service providers.)

Curtis
