Re: Comments on draft-spagnolo-manet-ospf-design-00.txt

Richard Ogier <ogier@EARTHLINK.NET> Thu, 29 July 2004 06:39 UTC

Date: Wed, 28 Jul 2004 23:39:10 -0700
Reply-To: Mailing List <OSPF@PEACH.EASE.LSOFT.COM>
Sender: Mailing List <OSPF@PEACH.EASE.LSOFT.COM>
From: Richard Ogier <ogier@EARTHLINK.NET>
Subject: Re: Comments on draft-spagnolo-manet-ospf-design-00.txt
To: OSPF@PEACH.EASE.LSOFT.COM
Precedence: list

Gary,

Please see my comments below.

Gary Pei wrote:

>Dear Richard,
>Thank you very much for your detailed comments. Please see my in-line
>comments.
>
>On Mon, 26 Jul 2004 09:07:06 -0700, Richard Ogier <ogier@EARTHLINK.NET>
>wrote:
>
>>Please consider my comments below for the draft
>>http://www.ietf.org/internet-drafts/draft-spagnolo-manet-ospf-design-00.txt
>>to be presented next week in San Diego.
>>
>>Overall, the draft presents some nice analysis, simulation results,
>>and conclusions, and I was happy to see that ideas from my draft
>>http://www.ietf.org/internet-drafts/draft-ogier-manet-ospf-extension-01.txt
>>(recently updated) were considered. However, I disagree with some of
>>the conclusions as noted below.
>>
>>My comments are divided into the following 5 items:
>>
>>1. Simulation Model
>>2. Originator-based LSA or ACK suppression
>>3. Simulation results for reliable vs. unreliable flooding
>>4. Reliable flooding without ACKs, using periodic DDESC packets
>>5. Differential Hellos and differential LSAs
>>
>>Items 2 and 3 are a bit long, so I first summarize them, and later
>>provide more details in items 2a and 3a.
>>
>>1. Simulation Model
>>
>>Although the analysis considered up to 100 nodes, the simulation
>>results considered only 20 nodes, which is quite limited.
>>The high mobility case (for 20 nodes) had a neighbor change rate of
>>0.14, or one neighbor change every 7.14 sec on average.
>>If there are 100 nodes and thus 5 times as many neighbors, then the
>>neighbor change rate should increase to about 0.70, or one neighbor
>>change every 1.43 sec, so that an LSA would be generated almost every
>>MinLSInterval (5 sec), which would trigger the "periodic update" mode
>>(possibly with non-ackable LSAs) in any of the hybrid flooding schemes
>>we have considered (whether or not prediction of neighbor changes is
>>used). Thus, your results are highly dependent on your choice of
>>number of nodes and mobility level.
>>
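[Editor's note: the scaling arithmetic in the quoted paragraph can be checked with a short sketch. The 0.14 changes/sec and MinLSInterval = 5 sec figures come from the discussion above; the 5x density scaling is the assumption under debate, not an established fact.]

```python
# Sketch of the neighbor-change-rate scaling argument above.
# Figures from the discussion: 20 nodes, 0.14 neighbor changes/sec.
# Assumption under debate: 5x as many neighbors => 5x the change rate.

MIN_LS_INTERVAL = 5.0  # seconds (MinLSInterval)

rate_20 = 0.14                      # changes/sec, 20-node high-mobility case
mean_interval_20 = 1 / rate_20      # ~7.14 sec between neighbor changes

rate_100 = 5 * rate_20              # ~0.70 changes/sec if density scales 5x
mean_interval_100 = 1 / rate_100    # ~1.43 sec between neighbor changes

print(round(mean_interval_20, 2))   # 7.14
print(round(mean_interval_100, 2))  # 1.43
# An LSA would then be generated almost every MinLSInterval:
print(mean_interval_100 < MIN_LS_INTERVAL)  # True
```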
>
>The neighbor change rate is more closely related to node density
>than to the network size. It is not necessarily true that 100 nodes will
>have a 5 times higher neighbor change rate. In fact, I expect that the
>neighbor change rate remains constant if we keep the density the same
>for both the 100-node and 20-node networks.
>

*If* you keep the density constant, then I agree. I was talking about
the case where the density (average number of neighbors) increases
fivefold. For example, Figure 1 assumes that avg-neighbors = #nodes/2.
I don't think we want to assume that the number of neighbors will
never exceed 10!

>
>
>>In addition, a larger number of nodes would result in larger Hellos,
>>thus making differential Hellos more beneficial in a highly mobile
>>network (in which a smaller Hello interval would be used). As indicated
>>in your analysis (Figure 1), using 100 nodes and a 2 second Hello
>>interval results in about 62 times as much Hello overhead as using 20
>>nodes and a 10 second Hello interval (when full Hellos are used).
>>
>
>Figure 1 assumes the area is the same for all network sizes.
>The hello overhead is really affected by the density, not by the
>network size. Thus, a dense network may benefit from differential hellos.
>
We agree on this. (When I said a larger number of nodes, I again meant
a larger number of neighbor nodes.)
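[Editor's note: the "about 62 times" figure quoted above can be roughly reproduced under illustrative packet-size assumptions. The ~64 bytes of fixed per-Hello overhead and 4 bytes per listed neighbor below are my guesses, not numbers from the draft; the avg-neighbors = #nodes/2 relation is the Figure 1 assumption.]

```python
# Rough reproduction of the ~62x Hello-overhead ratio discussed above.
# Assumed sizes (hypothetical): ~64 bytes fixed per Hello, 4 bytes per
# listed neighbor; avg-neighbors = #nodes/2 as in Figure 1.

FIXED = 64   # bytes of fixed overhead per Hello (assumption)
PER_NBR = 4  # bytes per neighbor listed in a full Hello (assumption)

def hello_overhead(nodes, hello_interval):
    """Network-wide Hello bytes/sec: each node sends one full Hello
    listing nodes/2 neighbors every hello_interval seconds."""
    size = FIXED + PER_NBR * (nodes // 2)
    return nodes * size / hello_interval

ratio = hello_overhead(100, 2) / hello_overhead(20, 10)
print(round(ratio, 1))  # 63.5, close to the "about 62 times" quoted above
```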

>
>
>>Also, the only measure you used for data packets was delivery ratio,
>>which was about .78 in your high mobility scenario.
>>MANET simulations often assume that link-layer failure detection
>>is used, which typically results in a delivery ratio close to 0.99
>>for even faster mobility than you considered. It would be good
>>to run simulations that assume link-layer failure detection is used,
>>and that consider other performance measures such as delay.
>>
>>
>
>I believe that the delivery ratios (as well as delay) are very scenario
>dependent. In the simulation, each node sends packets to *every* other
>node using a CBR traffic generator; the packet losses are due to the
>following reasons:
>(1) no possible route physically exists at the time of delivery;
>(2) packets are dropped because the link exceeds the retransmission
>attempt limits due to fading etc. that affects an existing link;
>Note that in this case, regardless of whether you have link-layer
>failure detection or not, the packet will be dropped.
>

Not necessarily. It is possible for the packet to be rerouted on
another link, which we did in our ns-2 simulations of TBRPF. Also,
link-layer failure detection allows much faster response to link
failures (e.g., a few milliseconds) than waiting for
RouterDeadInterval (several seconds).

>
>(3) route-change dissemination latency causes packets to be routed via
>failed links. In this case, it may be true that link-layer failure
>detection can speed up route convergence if an efficient layer 2/layer 3
>interface is implemented. However, PTMP relies only on the Hello
>protocol to detect link changes. Thus link-layer failure detection will
>not help unless the OSPF wireless extension considers the link-layer
>failure indication from the link layer. So link-layer failure detection
>may only help case 3.
>
>I don't understand where 0.99 comes from. Sure, 0.99 scenarios exist,
>but they are neither general nor typical.
>

I meant 0.99 or better, which we typically obtained in our TBRPF
simulations when link-layer failure detection was used, even when
nodes were moving at 20 m/s.

>
>
>
>>2. Originator-based LSA or ACK suppression (summary)
>>
>>Although I mentioned receiver-based ACK suppression in version 00
>>of my draft, I later stated that I preferred originator-based
>>methods (described in version 01 of my draft), so I will focus on
>>such methods. However, I will also mention that your simulation
>>result in Figure 10, which shows increased overhead for my ACK
>>suppression method, does not use the parameters that I would use.
>>(I would have used a threshold parameter that suppresses ACKs only
>>when a new LSA is originated almost every MinLSInterval - not true
>>in your case - which should not increase overhead. However, this is
>>not important since we both prefer originator-based methods.)
>>
>>Regarding originator-based methods, we seem to agree that using
>>an option bit to indicate a non-ackable LSA can be useful and
>>should be investigated.
>>
>>However, I disagree with you that using an exponential moving average
>>(EMA) to measure the rate of LSA changes is not a good approach.
>>
>
>Please see my comments under 2.a below.
>
>>The following bullet from your draft refers to this approach:
>>
>>  "The approach adapts to the link changes passively by predicting
>>   the future LSA change based upon the LSA's history.  In the MANET
>>   environment, it is difficult to predict future link changes based
>>   on the past.  Looking at both fixed and exponentially weighted
>>   windows with different window sizes and weights, we could not
>>   predict when future LSAs would be created.  In fact the
>>   distribution of interarrival times looked almost as if the
>>   interarrivals were exponentially distributed (modulo the
>>   MinLSInterval)."
>>
>>My goal was to measure the long-term or medium-term average
>>rate of LSA changes, and to base the decision of using reliable
>>or periodic flooding on such a measurement.  (The threshold should
>>be chosen so that periodic flooding is used only if it generates
>>less overhead than reliable flooding.)  Thus, I was interested in
>>a quasi-static scheme based on an average rate, rather than a scheme
>>that reacts dynamically to short-term variations in the (exponentially
>>distributed) interarrival times. Your prediction scheme is of the
>>latter type, which tries to do better than the quasi-static approach
>>by making a short-term prediction of the interarrival times.
>>
>>Although using prediction of LSA changes can result in better
>>performance when the LSA change rate happens to be near the threshold
>>as in your case (so that your predicted measurement frequently
>>crosses the threshold), using an EMA based on LSA history can also
>>be quite effective in reducing overhead (when used by the originator
>>to decide whether to flood LSAs periodically vs. reliably) if
>>done properly, and could be a good choice if it is decided that
>>using a prediction technique is too complex.
>>
>>Simulations could be conducted to compare these two approaches
>>for the hybrid reliable/unreliable LSA flooding method described
>>in item 2a below (and in my updated draft), but we need to
>>consider a large range of scenarios, and avoid drawing conclusions
>>based only on a few scenarios.
>>
>>
>>3. Simulation results for reliable vs. unreliable flooding (summary)
>>
>>Your simulation results in Figure 17 compare the overhead
>>for reliable vs. unreliable flooding, where the reliable
>>flooding incorporates several of the proposed optimization
>>techniques (source-independent CDS, source LSA suppression,
>>multicast ACKs, and retransmit timer backoff).
>>Your results show that, for the high mobility case, unreliable
>>flooding generates much less overhead than reliable flooding,
>>with no significant difference in delivery ratio.
>>
>>As I explain below (item 3a), a hybrid reliable/unreliable flooding
>>method can be used that performs the same as unreliable (periodic)
>>flooding when mobility is high, and never performs significantly worse
>>than unreliable flooding.  Also, reliable flooding (or a hybrid
>>solution) will generate much less overhead than periodic flooding
>>in *non-mobile* wireless networks, which is an important
>>advantage in low-bandwidth sensor networks, for example.
>>(Thus, the fact that unreliable flooding overhead is independent
>>of mobility can be a *bad* thing, for non-mobile networks.)
>>
>
>Please see my comments in 3a. BTW, I don't think it is a good idea to
>run OSPF in sensor networks.
>


Even so, it would make sense to run OSPF in non-mobile wireless
networks, and periodic flooding would be inefficient in such networks.

>
>
>>Also, I discuss below (item 3a) other possible ways to reduce overhead
>>in reliable flooding. For example, the CDS algorithm you are using may
>>not be the best one. I have been running simulations to compare
>>different CDS algorithms, and may announce my results soon. Maybe you
>>can run simulations to evaluate the algorithm that I found to be the
>>most promising.  Other people may also suggest CDS algorithms, so
>>it would be good to compare several of them.
>>It is known that a good CDS will not result in more transmissions
>>of a given LSA than MPRs, i.e., a source-independent CDS
>>will be no larger than a source-dependent CDS (based on MPRs).
>>
>>
>>4. Reliable flooding without ACKs, using periodic DDESC packets
>>
>>In Section 6.5 of your draft, you mentioned the methods of INRIA [12]
>>and Ogier [5] to achieve reliable flooding without ACKs, based on
>>periodic sequence numbers, similar to IS-IS.
>>However, in Sections 7 and 9.2 you mentioned only [12] for
>>adjacency forming.  What I suggested in my draft is a simple
>>application of the IS-IS method of periodic Sequence Numbers packets,
>>which translates to *periodic* DDESC packets in OSPF that are
>>*broadcast* to all neighbors. This method is equivalent to using
>>LSA NACKs instead of LSA ACKs.  (The NACKs would actually be the LSR
>>packets sent in response to DDESC packets.)
>>If a CDS is used, then only CDS nodes need to send full DDESC
>>packets. (If MPRs are used, then each MPR only needs to include
>>a subset of LSA headers in the DDESC packets it sends, as described
>>in my draft.)  I agree with you that [12] is a big departure from
>>OSPF, but what I propose is less of a departure from OSPF since
>>it does not require any new message types. (An option bit could be
>>used to distinguish a periodic DDESC packet from a regular one.)
>>
>>Also, you propose using INRIA's method [12] only for external LSAs,
>>while using ACKs or unreliable flooding for internal LSAs, the
>>rationale being that there will be more external LSAs than internal
>>LSAs. (Although the manet could be configured as a stub area, you do
>>not mention this in your draft.)
>>However, it would be nice to have a single unified solution that
>>is as scalable for internal LSAs (for very large manets) as for
>>external LSAs. The method of periodic DDESC packets that I propose
>>is one possible way to achieve such a unified solution.
>>Simulations should be conducted to compare these alternatives.
>>
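[Editor's note: a minimal sketch of the periodic-DDESC idea described above, under illustrative assumptions. The names and data structures below are mine, not from either draft: a CDS node periodically broadcasts the (LSA ID, sequence number) headers it holds, and a receiver answers with an LSR, playing the role of a NACK, only for entries that are missing or newer than its own copies.]

```python
# Sketch of reliable flooding without ACKs via periodic DDESC packets.
# A periodic DDESC carries (lsa_id, seq) headers; a receiver answers
# with an LSR (acting as a NACK) only for missing or stale entries.
# Names and structures are illustrative, not from either draft.

def build_periodic_ddesc(lsdb):
    """CDS node: broadcast headers of all ackable LSAs it holds."""
    return [(lsa_id, seq) for lsa_id, seq in lsdb.items()]

def build_lsr(my_lsdb, ddesc_headers):
    """Receiver: request only LSAs that are missing or newer than mine.
    This LSR plays the role of a NACK; no per-LSA ACKs are needed."""
    return [lsa_id for lsa_id, seq in ddesc_headers
            if my_lsdb.get(lsa_id, -1) < seq]

# Example: node B missed LSA "rtr-3" and holds a stale "rtr-1".
a_lsdb = {"rtr-1": 4, "rtr-2": 2, "rtr-3": 7}
b_lsdb = {"rtr-1": 3, "rtr-2": 2}
print(build_lsr(b_lsdb, build_periodic_ddesc(a_lsdb)))  # ['rtr-1', 'rtr-3']
```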
>>
>>5. Differential Hellos and differential LSAs
>>
>>As discussed above, Hellos can generate a significant amount of
>>overhead in dense, highly mobile networks where a relatively
>>small Hello interval is desirable. (That is why we used differential
>>Hellos in TBRPF.)  Therefore, I think that the design should include
>>differential Hellos.
>>
>>For the same reason, I think that differential LSAs should be
>>given serious consideration, since this can also reduce overhead
>>significantly in dense, highly mobile networks.
>>(That is why we used differential LSAs in TBRPF.)
>>Of course, simulations are needed to evaluate this.
>>
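[Editor's note: the differential-Hello idea above can be sketched as follows. This is a simplified illustration in the spirit of TBRPF's differential Hellos, not the actual TBRPF encoding; the resynchronization period is an arbitrary placeholder. Each Hello reports only neighbors added or lost since the last report, with an occasional full Hello for resynchronization.]

```python
# Sketch of differential Hellos: report only neighbor-set changes,
# sending a full Hello every FULL_EVERY intervals to resynchronize.
# Simplified illustration, not the actual TBRPF packet format.

FULL_EVERY = 5  # send a full Hello every 5th interval (illustrative)

def make_hello(prev_nbrs, cur_nbrs, seqno):
    """Return ('full', all neighbors) or ('diff', added, lost)."""
    if seqno % FULL_EVERY == 0:
        return ("full", sorted(cur_nbrs))
    added = sorted(cur_nbrs - prev_nbrs)
    lost = sorted(prev_nbrs - cur_nbrs)
    return ("diff", added, lost)

# In a dense network, only the (small) change set is sent most of the time:
prev = {"A", "B", "C", "D"}
cur = {"A", "B", "D", "E"}
print(make_hello(prev, cur, seqno=3))  # ('diff', ['E'], ['C'])
```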
>>----------------
>>
>>I will now provide more details for items 2 and 3 above.
>>
>>2a. Originator-based LSA or ACK suppression (details)
>>
>>As I said above, using prediction of neighbor changes can be helpful,
>>but simply using an EMA to measure the rate of neighbor changes
>>can also be quite effective if done properly. In fact, I believe
>>it can achieve the same reduction in overhead as using prediction
>>(if done properly). Using prediction may help to reduce LSA latency,
>>by allowing an LSA to be originated earlier if it is predicted that
>>no further link change is likely to occur soon.
>>(However, your draft concludes that LSA latency is not a problem
>>even if unreliable, periodic flooding is used.)
>>
>>In the hybrid reliable/unreliable flooding method that I suggest
>>(described in my updated draft) the originator decides
>>when to generate periodic non-ackable LSAs (unreliable/periodic mode)
>>vs. event-driven ackable LSAs (reliable/event-driven mode), using
>>an EMA that estimates the probability that a link change will
>>occur within the next MinLSInterval (which I think is the same
>>as your LSA_WAIT_TIME, which I assume is 5 seconds).
>>When the EMA goes above a certain threshold (say 0.9), the
>>originator goes to unreliable mode and generates a non-ackable
>>LSA periodically every PERIODIC_WAIT_TIME (say 10 seconds).
>>Since the LSAs are non-ackable, no ACKs or retransmitted LSAs
>>are sent in this case, so going to unreliable mode can only
>>decrease overhead. The originator goes to reliable mode when
>>the EMA goes below the threshold (possibly a different threshold
>>for hysteresis). The last LSA sent in periodic mode must be
>>ackable, so that subsequent LSAs need to be originated only
>>when there is a link change.
>>
>
>I think we are talking about the same thing, based on your description
>above. Here you suggest estimating the probability. In the simulation,
>the link change time is estimated. If the link change interarrival time
>is indeed exponentially distributed, the probability that a link change
>will occur in the next T seconds is independent of history and should
>be constant. I don't see why an EMA is a good approach in this case.
>

If it is a constant, then I am simply suggesting using an EMA to
measure this constant. (But of course, it will actually change as
the number of neighbors changes.) As I said in item 2, I suggest
using a quasi-static method to decide whether to use reliable or
unreliable flooding, depending on the average interarrival time.
Thus, reliable flooding will be used if link changes are infrequent,
and unreliable/periodic flooding will be used if link changes are
frequent.
The threshold is chosen so that unreliable flooding is used only if
it generates less overhead than reliable flooding. This seems like
a good approach to me. You are claiming that it helps to predict
future link changes (which is a more dynamic approach and can result
in frequent transitions between periodic and event-driven flooding),
but I am claiming that (if the EMA approach is done correctly),
prediction will not result in significantly less overhead (but could
improve LSA latency). Simulations are needed to compare these
methods. (You compared using prediction to using an EMA based on
LSA history, but not for the method I am now proposing.) Also,
simulations should be run for a wider range of scenarios (e.g.,
different densities and node speeds). If such simulations show
that using prediction helps significantly, then maybe the
additional complexity is justified. Otherwise, the simpler
quasi-static approach may be preferable. (However, methods
that avoid LS ACKs completely, such as the method of periodic
DDESC packets, may be more efficient than any of these adaptive
flooding methods.)
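[Editor's note: the quasi-static EMA decision described above might be sketched as follows. The EMA weight and both thresholds are illustrative placeholders, not values from either draft.]

```python
# Sketch of the EMA-based reliable/unreliable mode decision.
# Once per MinLSInterval, the originator updates an EMA estimate of
# the probability that a link change occurs within the next interval,
# and switches flooding mode with hysteresis. Values are illustrative.

ALPHA = 0.1        # EMA weight (illustrative)
UP_THRESH = 0.9    # go to unreliable/periodic mode above this
DOWN_THRESH = 0.7  # return to reliable mode below this (hysteresis)

class ModeEstimator:
    def __init__(self):
        self.ema = 0.0           # estimated P(link change within interval)
        self.unreliable = False  # current flooding mode

    def observe_interval(self, change_occurred):
        """Called once per MinLSInterval with whether a link changed."""
        sample = 1.0 if change_occurred else 0.0
        self.ema = (1 - ALPHA) * self.ema + ALPHA * sample
        if not self.unreliable and self.ema > UP_THRESH:
            self.unreliable = True   # originate periodic non-ackable LSAs
        elif self.unreliable and self.ema < DOWN_THRESH:
            self.unreliable = False  # last periodic LSA must be ackable
        return self.unreliable

est = ModeEstimator()
for _ in range(40):        # sustained link changes, one per interval
    est.observe_interval(True)
print(est.unreliable)      # True: high change rate => periodic mode
```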

>
>
>>Frankly, I was assuming that PERIODIC_WAIT_TIME = MinLSInterval,
>>since I don't think it is a good idea to increase LSA latency
>>when link changes are frequent.  If this causes too much
>>overhead, then the period used for unreliable mode can be
>>adjusted based on the measured overhead of originated LSAs,
>>using an EMA. (If the main goal is to control overhead, then
>>I don't think it helps much to use prediction.)
>>
>>
>>3a. Simulation results for reliable vs. unreliable flooding (details)
>>
>>In item 3 above, I claimed that a hybrid reliable/unreliable flooding
>>method can be used that performs the same as unreliable flooding when
>>mobility is high, and never performs significantly worse than
>>unreliable flooding. This can be achieved using something like the
>>hybrid method described in item 2a above, except for the DDESC and LSR
>>overhead, which I discuss below.  The key is for the originator-based
>>method to go to reliable/event-driven mode ONLY when doing so will
>>generate less overhead than unreliable/periodic mode.
>>
>>The LSR overhead was negligible in your results of Figure 17, but
>>the DDESC overhead (which was a significant 7.81 kbps) can
>>be reduced by using one or more of the following techniques:
>>
>>- Only CDS nodes need to send a full DDESC (as described in my draft).
>>  Since a CDS typically consists of less than 10% of the
>>  nodes, this could reduce DDESC overhead by 90% or more.
>>
>>- A DDESC packet only needs to contain headers of ACKable LSAs, and
>>  thus could be EMPTY if all LSAs are non-ackable (in unreliable mode).
>>
>>- If the method of sending periodic DD packets is used (similar to
>>  IS-IS, as suggested in my draft), then only CDS nodes would send
>>  periodic DDESC packets.
>>
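[Editor's note: the first bullet's savings estimate can be made concrete. The 7.81 kbps figure is the Figure 17 DDESC overhead quoted above; the 10% CDS fraction is the rough figure from the same bullet, not a measured value.]

```python
# Rough effect of the first technique above: if only CDS nodes
# (~10% of all nodes) send full DDESC packets, DDESC overhead drops
# by ~90%. 7.81 kbps is the Figure 17 value quoted above; 0.10 is
# the rough CDS fraction from the bullet, not a measured value.

ddesc_kbps = 7.81    # measured DDESC overhead (Figure 17)
cds_fraction = 0.10  # fraction of nodes in a typical CDS (rough figure)

reduced = ddesc_kbps * cds_fraction
print(round(reduced, 2))  # 0.78 kbps if only CDS nodes send full DDESCs
```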
>
>I agree that adaptive flooding is the right approach. The key
>difficulty, however, is the conditions under which each node decides to
>change its flooding mode. I think these conditions should be identified
>and quantified. It seems that no current proposal addresses this issue.
>
We have discussed several ideas that can be compared via simulations.
These include source-based LSA suppression, with or without non-ackable
LSAs, and with or without prediction; and methods based on IS-IS that
avoid ACKs (e.g., periodic DDESC packets).

Regards,
Richard Ogier

>
>
>Regards,
>Gary Pei
>