[Ospf-manet] Dual experimental drafts

I do not object to proceeding with two experimental
drafts, but I doubt that we have valid reasons for doing so.
Most of the reasons given are based on misconceptions
and are not valid, as I will explain in this message.
All of these issues can be discussed and addressed in the
next few months, so it may be better to allow more time
to address them before a decision is made.

One of my concerns is that if we proceed with two experimental
drafts, then we will be doing exactly what Bill Fenner said
we shouldn't do, since the two proposals are two ways to
solve the same problem, as argued below.  Maybe someone can
tell me that this is OK and that I misunderstood what
Bill Fenner said?

There are three main misconceptions that I discuss below.

Misconception 1:  MDRs perform better for some scenarios (e.g., dense
networks) and ORs perform better for other scenarios (e.g., sparse
networks or when adjacency reduction is not desired).

This is not supported by the evidence: ORs have not been shown to
perform better than MDRs in either sparse or dense networks.  For example, the Milcom 2006
paper by Phil Spagnolo and Tom Henderson considered both sparse
and dense networks, and states:
"we consistently observed that MDR produces less overhead than a
comparably configured OR/SP implementation".

Misconception 2:  ORs are simpler than MDRs when there is no adjacency
reduction.

The MDR draft was written to emphasize scalability and thus does not
explicitly specify how it should be implemented without adjacency
reduction.  Future versions of the MDR draft will do this, and will
show that it is as simple as the OR solution.  For example, just as
any non-active OR can perform backup flooding, any non-MDR can also
perform backup flooding, making it unnecessary to run the Backup MDR
selection algorithm.  (However, as discussed below, this may not
be the best solution for MDRs or ORs.)

In fact, in this case the MDR solution will actually be *simpler*
than the OR solution, because the OR solution requires that
each router advertise its selected MPRs in Hellos, whereas
MDRs are not advertised in Hellos because they are self-selected.
(I could also discuss how this extra step required for MPRs results
in slower response to topology changes, but that has been discussed
before.)

Moreover, even if BMDR selection is performed, the draft allows
the use of any algorithm that attempts to find disjoint paths
from the neighbor Rmax to the other neighbors.
(The algorithm need not always find disjoint paths even when they
exist, which allows simpler algorithms to be used.)
Appendix B.2 describes a "complex" algorithm and a simpler algorithm,
and other good simple algorithms also exist.

If ORs are used with smart peering (SP), and if biconnected
adjacencies are desired (for robustness), then the OR/SP
solution will also need an algorithm for computing disjoint
paths to all neighbors.
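To make the disjoint-path requirement concrete, here is a minimal
Python sketch of one standard way to test it: counting internally
node-disjoint paths with unit-capacity max-flow on a node-split graph
(Menger's theorem).  It is not the "complex" or simpler algorithm of
Appendix B.2, which are not reproduced here.  The function names
disjoint_paths and lacking_two_paths, the adjacency-dict input, and the
idea of running it on a router's neighbor graph are assumptions made
for illustration only.

```python
from collections import defaultdict, deque

def disjoint_paths(adj, src, dst, limit=2):
    """Count internally node-disjoint src->dst paths, stopping at `limit`.

    Each node v is split into (v, "in") -> (v, "out") with capacity 1,
    so no intermediate node can be shared by two paths.
    """
    cap = defaultdict(int)
    nbrs = defaultdict(set)

    def add_edge(u, v, c):
        cap[(u, v)] += c
        nbrs[u].add(v)
        nbrs[v].add(u)  # reverse (residual) direction starts at capacity 0

    for v in adj:
        add_edge((v, "in"), (v, "out"), 1)
    for u in adj:
        for v in adj[u]:
            add_edge((u, "out"), (v, "in"), 1)

    source, sink = (src, "out"), (dst, "in")
    flow = 0
    while flow < limit:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in nbrs[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break
        # Push one unit of flow along the path that was found.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1
    return flow

def lacking_two_paths(adj, rmax):
    """Hypothetical helper: nodes with fewer than two disjoint paths from rmax."""
    return {u for u in adj if u != rmax and disjoint_paths(adj, rmax, u) < 2}

# A 4-node ring: every node has two disjoint paths from R.
ring = {"R": {"A", "B"}, "A": {"R", "X"}, "B": {"R", "X"}, "X": {"A", "B"}}
# A chain R - A - X: only one path to each node.
chain = {"R": {"A"}, "A": {"R", "X"}, "X": {"A"}}

print(disjoint_paths(ring, "R", "X"))  # 2
print(lacking_two_paths(ring, "R"))    # set()
print(lacking_two_paths(chain, "R"))   # contains both A and X
```

A simple algorithm that "need not always find disjoint paths even when
they exist" could replace the exact max-flow search above; the sketch
only shows what the exact test computes.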

Misconception 3:  ORs are more robust because any non-active OR
(i.e., any non-MPR neighbor) can perform backup flooding, whereas
Backup MDRs are selected only to ensure biconnected coverage.

First of all, it is easy to allow any non-MDR to perform backup
flooding (as an option), if one believes this will improve robustness.
However, as Tom Henderson wrote in his message of 10/10/06:

"So I don't think it has ever been established that non-active ORs
significantly improve robustness, or that the overhead associated with
them is best spent there and not doing something else such as reducing
the Hello interval."

Allowing all routers to perform backup flooding will generate more
overhead and thus reduce scalability.  But Cisco avoided this by
changing the recommended values of the parameters PushBackInterval
and RxmtInterval such that only a small percentage of non-active ORs
are allowed to perform backup flooding.  However, this is not the
best approach as discussed below.

Tom Henderson wrote:
"As we reported in our other report last March, the Cisco results
presented at IETF-64 had the PushBackInterval at a large fraction of
the RxmtInterval, larger than what was recommended in the draft,
thereby significantly deemphasizing the firing of non-active relays"

This actually results in a random selection of the non-active
ORs that fire.  With RxmtInterval = 9 sec, PushBackInterval = 7 sec,
and a maximum jitter of 7 sec (so that the actual delay is random
between 7 and 14 sec), a given non-active OR fires only if its delay
falls between 7 and 9 sec, so the probability that it is allowed to
fire is 2/7, or roughly 0.25.  If there are only 4 possible
non-active ORs, then on average only one will fire, and with
probability 0.75^4 = 0.316, none of them will fire!
Also note that this scheme is very sensitive to the amount
of jitter that is used!
This is why it is better to reduce the number of backup
relays deterministically (as in MDR) rather than randomly.
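The arithmetic above can be checked with a short Python sketch.  It
assumes, as the numbers in this message imply, that each non-active OR
draws its delay uniformly from [PushBackInterval, PushBackInterval +
jitter] and fires only if that delay expires before RxmtInterval, and
that the ORs draw independently.  The function names are hypothetical,
and the Monte Carlo estimate is only a sanity check of the closed form.

```python
import random

def p_fire(rxmt, pushback, jitter):
    """Probability that a delay uniform on [pushback, pushback + jitter]
    expires before rxmt (the assumed firing condition)."""
    window = min(max(rxmt - pushback, 0), jitter)
    return window / jitter

def p_none_fire(p, n):
    """Probability that none of n independent non-active ORs fires."""
    return (1 - p) ** n

def simulate_none_fire(rxmt, pushback, jitter, n, trials=100_000, seed=1):
    """Monte Carlo estimate of p_none_fire under the same assumptions."""
    rng = random.Random(seed)
    none = 0
    for _ in range(trials):
        fired = sum(pushback + rng.uniform(0, jitter) < rxmt for _ in range(n))
        if fired == 0:
            none += 1
    return none / trials

p = p_fire(rxmt=9, pushback=7, jitter=7)
print(f"p(fire)           = {p:.3f}")                     # 0.286 (= 2/7)
print(f"none fire, p=0.25 = {p_none_fire(0.25, 4):.3f}")  # 0.316
print(f"none fire, p=2/7  = {p_none_fire(p, 4):.3f}")     # 0.260
print(f"simulated, p=2/7  = {simulate_none_fire(9, 7, 7, 4):.3f}")

# Sensitivity to jitter: the same 2-sec window gives very different odds.
for jitter in (2, 4, 7, 14):
    print(jitter, round(p_fire(9, 7, jitter), 3))
```

Using the exact 2/7 instead of the rounded 0.25, the chance that no
non-active OR fires is about 0.26 rather than 0.316, which does not
change the point.  The jitter loop shows the firing probability
dropping from 1.0 to about 0.14 as the jitter grows from 2 to 14 sec.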

In addition, using such a large value of PushBackInterval
causes a significant delay in performing the backup flood.
In contrast, the equivalent parameter for MDR (BackupWaitInterval)
has a default value of 0.5 sec, resulting in very fast
backup flooding.

This argues in favor of selecting backup/redundant relays
deterministically.  But if one is using (neighbor-selected)
MPRs/ORs, then backup MPRs must be advertised in Hellos
along with primary MPRs, which would further increase the
complexity.  This is another advantage of MDRs, since MDRs and
BMDRs are self-selected and thus are not advertised in Hellos.

In summary, allowing all routers to perform backup flooding
has not been shown to improve performance, and it will increase
overhead unless some mechanism (random or deterministic) is used
to limit the number of routers that do so.  A deterministic mechanism (as in MDR)
is better because it guarantees redundancy, allows
fast backup flooding, and is not sensitive to the
amount of jitter.

Richard

_______________________________________________
Ospf-manet mailing list
Ospf-manet@ietf.org
https://www1.ietf.org/mailman/listinfo/ospf-manet