RE: [bmwg] Is the BMWG a proper home for this I-D?

<sorry for the blank reply>

At the time the IPPM was "spun off" of the BMWG, the general understanding
was that BMWG would look at "things" more or less in isolation, although
that concept is somewhat elastic.  IPPM would look at end to end network
kinds of performance issues.

It seems clear to me that convergence is a performance characteristic of
complete networks, and primarily an end to end characteristic, or at least
a core element to core element one.

My thought would be that this is interesting and useful work but not
necessarily for BMWG.

Jim McQuaid

-----Original Message-----
From: Howard C. Berkowitz [mailto:hcb@gettcomm.com]
Sent: Tuesday, April 06, 2004 7:43 PM
To: bmwg@ietf.org
Subject: Re: [bmwg] Is the BMWG a proper home for this I-D?

>Supporters of the BMWG:
>
>A [long] while back, Russ White approached the BMWG with an 
>individually submitted I-D
>that presented considerations when measuring network convergence:
>
>http://www.ietf.org/internet-drafts/draft-white-network-benchmark-00.txt
>
>The draft seeks to highlight: ...considerations that testers should 
>be aware of when
>attempting to measure network convergence using various methods. 
>(Adapted from the I-D's
>abstract.)
>
>Not being a pure terminology or methodological statement, or 
>constraining itself to
>laboratory evaluations, the topic might not be a perfect fit as BMWG 
>output.  On the
>other hand, the community might benefit from a memo on this topic.
>
>Please offer your comments in two dimensions: a) does the topic 
>merit future effort, and
>b) is the BMWG the best home for this effort (as opposed to another 
>WG or progressing the
>work as an Individually submitted I-D)?  (You might have to read the 
>WG Charter to determine
>part B. :-)

I'm not quite sure that I'm ready to answer in the specific 
dimensions, but I can make some comments. I'll add a dimension: I 
have always believed that it would be highly desirable for the IETF 
to generate more frameworks, tutorials, etc., that help introduce 
newbies to a technology, and also help keep the salesdroids from 
running amuck.

>
>
>Network Working Group                                           R. White
>Internet Draft                                                 V. Manral
>Expiration Date: August 2003                                    R. Adams
>File Name: draft-white-network-benchmark-00.txt            February 2003
>
>   Considerations in Benchmarking Routing Protocol Network Convergence
>                   draft-white-network-benchmark-00.txt
>
>1. Motivation
>
>    As the ability to benchmark components within a network appears to be
>    coming under greater scrutiny, and specifications are being written
>    to standardize ways to measure the performance of individual
>    components within given frameworks, the next level of benchmarking
>    has not been approached, that of measuring the performance of
>    networks. But what is meant when we say the performance of a network,
>    from the perspective of routing protocols? Various tests have been
>    used in the past to measure the convergence of a network, some of
>    which actually measure completely different things than others.
>
>    It's important to attempt to examine the measurement of network
>    convergence in a way that exposes these differences, and helps
>    vendors, end users, and those in the research community have some
>    common ground when discussing network convergence.
>
>
>2. A Problem of Definitions
>
>    As we examine the issues and concepts surrounding the measurement of
>    network performance in terms of convergence, we find that most of the
>    basic problems we face surround defining the terms in use. For
>    instance, what is convergence, exactly? What is a network? In the
>    following sections, we discuss each of these concepts, and attempt to
>    address each one.
>
>
>2.1. Networks
>
>    In its most nominal form, a network is composed of a group of devices
>    interconnected in some way, which send data over these
>    interconnections for various purposes. But, when we discuss the
>    concept of routing protocol convergence within a network, the
>    definition needs to be more precise. For instance, since hosts do
>    not, generally, participate in routing, should they be considered a
>    part of the network when benchmarking the performance of a routing
>    protocol?  The obvious answer appears to be a resounding no, but, in
>    some possible test types, hosts which do not participate in routing
>    play a large part in the test itself.
>
>    When considering tests in which hosts participate as traffic or route
>    generators, then, we must consider the impact these hosts have on the
>    test results, although we may not consider them a part of the network
>    we are measuring the performance of.
>
>
>2.2. Convergence
>
>    Convergence is probably one of the hardest words in networking to
>    define.

:-) you mean it DOESN'T mean running voice, data, and video on the 
same network?

>Just about everyone who has worked on networks for a period
>    of time knows what it means, but no-one can explain it sufficiently to
>    someone who doesn't understand how a network works for it to be
>    understood. In fact, this is because there are several different
>    meanings attributed to convergence, and which meaning is intended
>    depends on the context in which the word is set. Convergence can
>    mean:
>
>
>    o    The time at which all the routing protocol processes running on
>         devices which participate in routing in the network agree on the
>         best path to each reachable destination in the network.
>
>    o    The time at which the best path to each reachable destination in
>         the network has been loaded into some local table which may then
>         be used to forward packets (the routing information base, or
>         RIB).
>
>    o    The time at which each router in the network has built the
>         tables necessary to actually forward packets through the
>         network, so that a packet transmitted from one part of the
>         network would actually reach any given reachable destination
>         within the network.
>
>    For instance, on a Cisco router, show ip ospf stats would allow the
>    tester to see the time of the last completed SPF, show ip route would
>    allow the tester to see what routes are installed in the RIB, and
>    show ip cef would allow the tester to see the forwarding information
>    which has been built from the RIB. Each test designed to measure the
>    performance of routing protocols within a network must determine
>    which type of convergence is being measured, if that measurement is
>    acceptable to the information being gathered, and which test will
>    actually measure the desired type of convergence.

Not to criticize, because we consciously restricted our scope to a 
single router in draft-ietf-bmwg-conterm-05.txt, leaving single-AS as 
future work and Internet-wide as outside our scope, but is this 
definition not specific to single routers?

Should the paper include any definitions that apply to groups of 
routers?  Can the convergence of a group of routers, perhaps under 
the constraint that they start from a known state, receive some set 
of updates, then run to convergence, be described adequately in terms 
of the convergence of the last router to meet single-router 
convergence criteria?
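
As a minimal sketch of that "last router" idea, assuming each router
reports a timestamp at which it met its single-router criterion and
that all timestamps share one clock (the names and numbers below are
hypothetical, not taken from the draft):

```python
from typing import Mapping


def group_convergence_time(per_router: Mapping[str, float],
                           t_start: float) -> float:
    """Elapsed time until the LAST router met its single-router criterion.

    per_router maps a router name to the time (seconds, common clock)
    at which that router satisfied the chosen convergence criterion.
    t_start is the time at which the set of updates was injected.
    """
    if not per_router:
        raise ValueError("need at least one router")
    return max(per_router.values()) - t_start


# Hypothetical observations for a three-router group.
observed = {"r1": 12.4, "r2": 13.1, "r3": 12.9}
elapsed = group_convergence_time(observed, t_start=10.0)
print(f"group converged after {elapsed:.1f} s")
```

Whether the maximum is an adequate description is exactly the open
question; the sketch only shows what that definition would compute.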

>
>3. Polling Devices in a Network

By "the network", are you now suggesting a routing domain rather than 
a single router?

>
>    One common way to measure network convergence is to poll the devices
>    in the network, using some command supplied within the routing
>    software, to determine when particular events have occurred, or
>    particular pieces of information have reached all the routers in the
>    network. Polling eliminates the need for the clock of each device
>    within the network to be synchronized for the test to have meaningful
>    results. However, there are some issues with polling devices within
>    the network which need to be addressed in any test which polls
>    devices for this information; the first is the rate at which polling
>    takes place.
>
>    If, in a test, you are attempting to measure some parameter to within
>    one second of its occurrence, then you would need to poll at a rate
>    much higher than once per second.
>
>     test starts here
>      |
>      |                   event occurs here
>      |                   |
>      v                   v
>    -+----------+----------+----------+---
>     ^          ^          ^          ^
>     |          |          |          |
>     0 seconds  1 second   2 seconds  3 seconds
>
>    For instance, in this time line, suppose a polling event is set up
>    which takes place every second. An event is started just after some
>    polling event takes place, but the polling process doesn't recognize
>    the test as starting until the 1 second poll. An event occurs just
>    before the 2 second poll, and the polling process detects this at the
>    2 second poll. The polling process would indicate that from the time
>    the event started until the time the event has finished, one second
>    has elapsed. In reality, closer to two seconds has elapsed.
>
>    The interval of the polling process can be reduced until the
>    measurement is felt to be accurate, but it should be no more than
>    half of the desired accuracy. Common practice actually shows that it
>    should be about one tenth of the desired accuracy.
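
The timeline example above can be reproduced with a small sketch. It
assumes an idealized poller that fires exactly at 0, interval,
2*interval, ... and notices a change at the first poll at or after it
happens; the function names are illustrative only. Under that
assumption the measured duration can differ from the true one by up to
one polling interval in either direction.

```python
import math


def next_poll(t: float, interval: float) -> float:
    """Time of the first poll at or after t (polls at 0, interval, ...)."""
    return math.ceil(t / interval) * interval


def measured_duration(start: float, end: float, interval: float) -> float:
    """Duration as seen by the poller: both edges snap to the next poll."""
    return next_poll(end, interval) - next_poll(start, interval)


# Event starts just after the 0 s poll and ends just before the 2 s poll.
true_start, true_end = 0.01, 1.99
for interval in (1.0, 0.5, 0.1):
    seen = measured_duration(true_start, true_end, interval)
    error = (true_end - true_start) - seen
    print(f"interval={interval:>4} s  measured={seen:.2f} s  error={error:.2f} s")
```

With a one second interval the poller reports about one second for an
event that really took close to two; shrinking the interval shrinks
the error accordingly.
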
>
>    A second consideration when polling for network events is the
>    performance of the device running the polling process. If the process
>    cannot poll each device at the scheduled interval, or the polling is
>    "jittered" (the time between each actual poll varies by some amount),
>    the accuracy of the tests will be called into question. The amount of
>    jitter introduced by the polling device, and the rate at which the
>    device can effectively poll, should be measured in some way, and this
>    measurement should be taken into account when designing tests which
>    rely on polling.

But are there not cases where some jittering is desirable to avoid 
unexpected weak synchronization effects (assuming the measurement 
itself has some Heisenbergian flavor)? In those cases, is it not wise 
to lengthen the test interval, or do multiple measurement runs with 
different seeds to a pseudorandom number generator, so the effect of 
jitter nulls out statistically?
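
A rough sketch of the multiple-runs idea follows. It makes a
simplifying assumption that is not in the draft: instead of jittering
every poll, each run gives the whole polling schedule a random phase
drawn from a seeded generator. Any single run is still off by up to
one interval, but the mean over many seeds lands close to the true
duration. All names and values are hypothetical.

```python
import math
import random
import statistics


def measured_with_phase(start: float, end: float,
                        interval: float, phase: float) -> float:
    """Duration seen by a poller firing at phase, phase+interval, ..."""
    def next_poll(t: float) -> float:
        return math.ceil((t - phase) / interval) * interval + phase
    return next_poll(end) - next_poll(start)


def mean_over_seeds(start: float, end: float,
                    interval: float, seeds) -> float:
    """Average the measured duration over one run per seed."""
    results = []
    for seed in seeds:
        rng = random.Random(seed)          # one independent run per seed
        phase = rng.uniform(0.0, interval)  # random schedule offset
        results.append(measured_with_phase(start, end, interval, phase))
    return statistics.mean(results)


# True duration is 1.98 s; the poller only fires once per second.
estimate = mean_over_seeds(0.01, 1.99, 1.0, range(2000))
print(f"mean of 2000 runs: {estimate:.2f} s")
```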

>    Finally, when polling devices to determine when a network event
>    occurs, issues with serialization must be considered. Most devices
>    which would be used for polling will not be able to poll several
>    devices within the network at once, and will thus serialize the
>    polling of devices.
>
>     p1   p3  p5  p7  p9
>      | p2| p4| p6| p8| p10
>      | | | | | | | | | |
>      v v v v v v v v v v
>    -+----------+----------+----------+---
>     ^          ^          ^          ^
>     |          |          |          |
>     0 seconds  1 second   2 seconds  3 seconds
>
>    Suppose, for instance, that a single device is polling ten devices in
>    the network. If it can poll five devices per second, it will take a
>    full two seconds for it to detect any event on all ten devices,
>    giving an effective accuracy of about four seconds. The amount of time
>    required for a polling device to serialize through all the devices it
>    is polling needs to be considered when polling a very large number of
>    devices.

Or whether there is a point at which realistic measurement requires 
multiple polling devices.
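
To put numbers on that, here is a back-of-envelope sketch. It assumes
every poller sustains the same rate, the devices are split evenly, and
the pollers run in parallel; the function names are made up for
illustration.

```python
import math


def sweep_time(devices: int, polls_per_second: float,
               pollers: int = 1) -> float:
    """Seconds for the busiest poller to visit all of its devices once."""
    per_poller = math.ceil(devices / pollers)
    return per_poller / polls_per_second


def pollers_needed(devices: int, polls_per_second: float,
                   max_sweep: float) -> int:
    """Smallest number of pollers whose sweep fits within max_sweep."""
    capacity = math.floor(polls_per_second * max_sweep)
    if capacity < 1:
        raise ValueError("one poll does not fit in max_sweep")
    return math.ceil(devices / capacity)


# The draft's example: ten devices, five polls per second, one poller.
print(sweep_time(10, 5))            # full sweep with a single poller
# How many pollers to get the sweep down to half a second?
n = pollers_needed(10, 5, max_sweep=0.5)
print(n, sweep_time(10, 5, pollers=n))
```

Splitting the work this way brings back the problem polling was meant
to avoid: the pollers' own clocks then have to agree with each other.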

>
>
>4. Tests to Measure Routing Protocols Convergence
>
>    In this section, we will outline some of the various tests which have
>    been used in the past to measure routing protocols convergence within
>    a network, and discuss some aspects of these tests.

It occurs to me that there is something of value here that probably 
isn't benchmarking, but certainly is relevant to the Operations Area. 
For want of a better term, I'll call it walking the reachability 
tables in a domain.

I'm currently doing some work with automated evaluation of student 
configurations, and part of that involves generating a meaningful 
script to check the control plane on multiple routers, at the point 
where a step of a scenario is complete. We ask ourselves the question 
of whether it is truly necessary to do the equivalent of 
ping-every-nonlocal-interface from each router in the domain, or if 
there is some meaningful way to define a smaller and more practical 
subset.

>
>
>4.1. Determining When Each Device has Received Information About All
>    Reachable Destinations
>
>    In link state protocols, information is flooded throughout the
>    network; discovering when each router in the network has received
>    this information is an important consideration in network
>    convergence. Slower flooding times will, of course, mean slower
>    network convergence overall, thus flooding performance directly
>    impacts overall routing protocols performance in the network.
>
>    There are three methods which can be used to determine when the
>    flooding of information has been completed.

A clarification here:  as I read this, I have a sense that you think of
black box testing as done by external stimulus/response devices, while
external packet monitoring is done by inference from live traffic streams.
Another way of putting this is that black box is active while monitoring
is passive.

Do I misconstrue what you mean?

>
>
>4.1.1. Black Box Polling
>
>[snip]
>4.1.2. White Box Output
>
>[snip]
>
>4.1.3. External Packet Monitoring

[snip]

>
>4.2. Determining When Each Device has Finished Finding the Best Path to
>    Each Reachable Destination
>
>    This is, probably, the most difficult measurement to take in a
>    network, since there are no known black box ways of determining when
>    a device has finished computing the best path to each destination in
>    the network. The only possible way of measuring this time is to use
>    output from the devices in the network to provide this information.
>
>    It's possible to poll each device periodically, examining output
>    provided by the devices, to determine when each device has calculated
>    the best path to each destination in the network. This method is
>    subject to the limitations described in the section on polling
>    devices, above.
>
>    It's also possible to rely on some event driven output of each device
>    in the network. For this to yield accurate results, the time clocks
>    of all the devices in the network must be closely synchronized.

While this may be "ideal" convergence, it does imply what I'll 
loosely call best exit routing policy.  There may be a perfectly 
reasonable earlier point where a router can reach all destinations 
and do some, although not ideally performing, work, as opposed to 
when it has the best path to all destinations.

[snip]
>
>
>4.3.1. The Various Elements of Performance Cannot Be Separated
>
>    Using this sort of testing, there is no way to separate the
>    performance of a routing protocol from the performance of the
>    interaction between the routing protocol and the forwarding engine,
>    nor from the performance of the forwarding engine itself.

Here I disagree, if I take a broad look at "routing". Your statement 
is probably true if we limit ourselves to packet forwarding, even 
though it doesn't quite fit systems with a single control plane but 
multiple forwarders.

It is probably reasonable to talk of the IP control plane in some 
isolation, if the control information is then used to set up (G)MPLS.

What I am not seeing is conditions of workload and test -- issues 
such as convergence of N routes in topologies where the router under 
test has M or 2M peers.

_______________________________________________
bmwg mailing list
bmwg@ietf.org
https://www1.ietf.org/mailman/listinfo/bmwg
