Re: [manet] New draft: "Routing Loop Issue in Mobile Ad Hoc Networks (MANETs)"

Elektra <onelektra@gmx.net> Fri, 05 June 2009 13:10 UTC

Return-Path: <onelektra@gmx.net>
X-Original-To: manet@core3.amsl.com
Delivered-To: manet@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id A69023A6D4F for <manet@core3.amsl.com>; Fri, 5 Jun 2009 06:10:40 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.599
X-Spam-Level:
X-Spam-Status: No, score=-2.599 tagged_above=-999 required=5 tests=[BAYES_00=-2.599]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id o9Wby-09t+Fk for <manet@core3.amsl.com>; Fri, 5 Jun 2009 06:10:37 -0700 (PDT)
Received: from mail.gmx.net (mail.gmx.net [213.165.64.20]) by core3.amsl.com (Postfix) with SMTP id 359D63A6D44 for <manet@ietf.org>; Fri, 5 Jun 2009 06:10:37 -0700 (PDT)
Received: (qmail invoked by alias); 05 Jun 2009 13:10:39 -0000
Received: from brln-4dbc3631.pool.einsundeins.de (EHLO x.localnet) [77.188.54.49] by mail.gmx.net (mp057) with SMTP; 05 Jun 2009 15:10:39 +0200
X-Authenticated: #2746672
X-Provags-ID: V01U2FsdGVkX188uAc1uhA8IFaAucWlPM2YssVHyQA99I6z95LKPM zYlQrDcMq6wUjs
From: Elektra <onelektra@gmx.net>
Organization: open-mesh.net
To: manet@ietf.org
Date: Fri, 05 Jun 2009 15:10:38 +0200
User-Agent: KMail/1.11.2 (Linux/2.6.28-11-generic; KDE/4.2.2; i686; ; )
References: <4A1BB562.7000104@net.ie.niigata-u.ac.jp> <4A25027E.3070503@net.ie.niigata-u.ac.jp> <4A28E99A.6050905@net.ie.niigata-u.ac.jp>
In-Reply-To: <4A28E99A.6050905@net.ie.niigata-u.ac.jp>
MIME-Version: 1.0
Content-Type: Text/Plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200906051510.39239.onelektra@gmx.net>
X-Y-GMX-Trusted: 0
X-FuHaFi: 0.49
Subject: Re: [manet] New draft: "Routing Loop Issue in Mobile Ad Hoc Networks (MANETs)"
X-BeenThere: manet@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: Mobile Ad-hoc Networks <manet.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/manet>, <mailto:manet-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/manet>
List-Post: <mailto:manet@ietf.org>
List-Help: <mailto:manet-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/manet>, <mailto:manet-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 05 Jun 2009 13:10:40 -0000

Hello all!

I'd dare to state that transient routing loops are normal in Olsr and other 
Link State protocols in a wireless environment with fluctuating link quality as 
long as TCs are communicated via broadcasts. (Using unicasts instead doesn't 
seem like a viable solution).

Since the topology data base has to be synced amongst adjacent nodes via a 
lossy medium to avoid loops (up to 3 hops range I'd say) I have drawn and 
tested the following conclusions when hacking on RFC3626 for the Freifunk 
community mesh network in Berlin together with Tomas Lopatic and others in the 
years 2004 and 2005:

- when we tested RFC3626 with participants of the Wizards of OS 3 conference 
the results were devastating in a ~25 node mesh

- MPRs are contra-indicated if you want to avoid loops - the redundancy of 
multiple paths propagating TC information is desperately needed in a 
productive environment where payload saturates the links. Otherwise TC data 
bases get out of sync locally and things become very, very messy

- MPRs don't help much in terms of scalability as long as every TC has a 
global TTL

- RFC3626's hysteresis is a no-no - especially in combination with MPRs. MPRs 
are selected which frequently get kicked out by hysteresis - the problem of 
syncing TC information gets even worse & the protocol needs to negotiate the 
MPR set again.

- A simple fish-eye algorithm worked much better in terms of scalability and 
allowed us to send TC updates more often than Hellos without suffocating our 
city wide mesh network of +400 nodes (mostly Linksys WRT54G) at the time. 
Hence TC updates were redundant which drastically mitigated the looping 
problem. Still under very high network load we had transient routing loops. 
This is still a issue. The mesh grew up to ~650 nodes in Berlin without 
problems - however we were pushing the CPU load to the limits before the code 
was optimized by the Olsr-ng project. 

- Minimum hop count metric optimizes packet loss, so the first thing we did was 
adding a metric based on ETX which communicates link quality data with every 
Hello message to single hop neighbors (LQ/NLQ)

The experience of the Freifunk people with OLSR is described in more detail 
here: https://www.open-mesh.org/wiki/the-olsr-story Note that the document is 
summing up the experience until mid 2005.

Our work was/is miserably documented to say the least. As an excuse I can say 
that we were focused on getting the mesh network usable for the community. The 
modifications are present in the Olsr implementation from olsr.org and can be 
switched on and off via the configuration file. Sample configurations come with 
the source code or can be found in the wiki pages of the wireless communities.

I don't know any real-life network using the RFC3626 configuration in a 
productive environment. If you do know, let me know. However there are 
community networks in Tanzania, South Africa, India, Europe and elsewhere 
successfully using the Olsr daemon with the modifications. 

The modified Olsr daemon has been tested with up to 850 nodes in real life. It 
is working on low-end consumer APs running Openwrt with moderate CPU load 
thanks to the OLSR-NG project which has optimized the code after I ceased 
being active in the Olsr.org development. 

We have started a new community-driven mesh protocol development site in 2006 
- open-mesh.net. The name of our current protocol project is BATMAN. 
Generation three of the algorithm is described in a document available here: 
http://tools.ietf.org/html/draft-wunderlich-openmesh-manet-routing-00

A reviewer has claimed that the protocol allegedly would loop and consume too 
much CPU and battery power for a MANET protocol. We didn't bother to defend 
the draft because we have better use for our time. Please reprieve me from 
protocol discussion dogfights.

Note that the document is referring to mark III of the algorithm, while we are 
currently working on designing mark 5. Implementations of mark 4 are available 
from our web site and as opkg-packages in Openwrt.

A comparison between Batman mark IV (Batman-experimental branch) and the 
modified (!) OLSR daemon is available here: 
http://wirelessafrica.meraka.org.za/wiki/images/9/98/Batman_ifip.pdf

BATMAN is a stigmeric approach which has proven to be loop-free and scalable. 
There are two competing implementations and change sets for OSI Layer 3 and 
one for "Layer 2.5". The largest real-life deployments have ~120 nodes - we 
are currently facing the difficulty to establish the new protocol in the 
community alongside the modified OLSR, because people hesitate to change their 
"running system" and known "standard" for something new. And some people love 
to defend "their" protocol. Germans tend to say: "The apple in the hand is 
better than the sparrow on the roof." No matter how cute the sparrow is ;-)

Cheers,
Elektra