Re: Autonomous System Sanity Protocol

Jon Crowcroft <J.Crowcroft@cs.ucl.ac.uk> Mon, 28 April 1997 08:50 UTC

To: big-internet@munnari.oz.au
Cc: J.Crowcroft@cs.ucl.ac.uk
Subject: Re: Autonomous System Sanity Protocol
In-Reply-To: Your message of "Sun, 27 Apr 1997 21:07:15 PDT." <199704280407.VAA02740@chimp.jnx.com>
Date: Mon, 28 Apr 1997 09:25:26 +0100
Message-Id: <860.862215926@cs.ucl.ac.uk>
From: Jon Crowcroft <J.Crowcroft@cs.ucl.ac.uk>
Precedence: bulk


there seems to be a spate of routing problems and a gamut of solutions,
with different time frames of applicability....

1. this incident (egp/igp/egp specific) has been well addressed by per
and tony....

in the longer run, it seems to me that there are several pieces to any
reasonably engineered systems design for stable, disaster-resistant
multi-provider routing:

1/ trust chains -
i) you need to sign updates with a hash (RC4?) and
propagate aggregated routes with this, and then use PGP trust chaining -
unfortunately, this won't scale, or in fact deal with the specific
current problem.....
ii) as part of this, you need not only route settlements, but
penalties...i.e. $$$ for trouble caused, as well as connectivity
provided...
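
a minimal sketch of the "sign updates with a hash" step in 1/ i), under
loud assumptions: a keyed hash (HMAC-SHA256 from the python standard
library) stands in for whatever hash would really be used, the two peers
are assumed to already share a key, and the update format (origin AS plus
a list of prefixes) is hypothetical. PGP trust chaining is not shown.

```python
import hashlib
import hmac


def canonical_update(origin_as, prefixes):
    """Serialise an update deterministically so both peers hash the same bytes.

    The "AS|prefix,prefix" layout is a hypothetical format for illustration.
    """
    return (str(origin_as) + "|" + ",".join(sorted(prefixes))).encode("ascii")


def sign_update(key, origin_as, prefixes):
    """Return a hex HMAC-SHA256 tag over the canonical form of the update."""
    return hmac.new(key, canonical_update(origin_as, prefixes),
                    hashlib.sha256).hexdigest()


def verify_update(key, origin_as, prefixes, tag):
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign_update(key, origin_as, prefixes), tag)
```

a receiver would call verify_update() before accepting or re-propagating
an aggregate; a shared key per peering is exactly the part that does not
scale, which is the objection raised in i).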

2/ use engineering statistics - e.g. in the mbone, we don't propagate
route updates containing more than a plausible number of
entries...this is heavy handed and manual/operator intensive, but
there are a number of route pathologies (e.g. vern paxson's work
identifies their salient parameters and dynamics) which could be used
to either
i) set alarms
ii) automate sanity filters....
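
the size check described in 2/ could be sketched roughly as below. the
function name, the limit of 1000 entries and the 80% alarm level are all
hypothetical values chosen for illustration, not figures from the mbone
or from any router implementation.

```python
def check_update(prefixes, max_entries=1000, alarm_fraction=0.8):
    """Classify one route update by its size alone.

    Returns "drop" if the update has more than max_entries entries,
    "alarm" if it is above alarm_fraction of that limit, else "accept".
    Both thresholds are hypothetical defaults.
    """
    n = len(prefixes)
    if n > max_entries:
        return "drop"      # ii) automated sanity filter
    if n > alarm_fraction * max_entries:
        return "alarm"     # i) warn an operator, but still propagate
    return "accept"
```

a real filter would presumably set the limit per peer, and could use the
pathology parameters mentioned above rather than a bare entry count.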

3/ on a general (jnc type) note, there does seem to be a problem
"solving" the larger problem elegantly - piecewise refinement goes
just so far, but a scheme to colour areas and exchange maps should be
a lot more resistant to some collapses (though i don't really see how
either nimrod or radia's byzantine routing solve the problem of an
accidentally or deliberately misconfigured AS)..........the only way you
do that is to use "safety in numbers" or redundancy...
i) provide hints for routing from an alternate source (e.g. put GPS
receivers in all border routers, and log their geographic position as part
of BGP, and make sure that a reasonable percentage of the relevant
nets advertised are actually "within" a border - yes i know lots are
not, but a sanity check (to ring alarms) should be derivable from
this)....
ii) actually combine link state and distance vector - normally each
LS router receives a bunch of LSAs from all other routers in an area - in
distance vector, you get a route table from all neighbour routers -
if you got a part of the hierarchical route table from all other
routers, then you could _vote_ (yes i know it doesn't scale, but
you can be selective about who you need backup info from.....or
trigger it based on engineering alarms from 2/...)
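
the two redundancy ideas in 3/ i) and ii) can be sketched together, with
heavy assumptions: the "border" is approximated by a simple lat/lon
bounding box that does not cross the 180 degree meridian, each advertised
net is assumed to come with one (lat, lon) position, and the vote is a
plain majority over whatever sources you chose to ask. all names and
thresholds here are hypothetical.

```python
from collections import Counter


def fraction_inside(border, net_positions):
    """Fraction of advertised nets whose position lies inside the border.

    border is (min_lat, max_lat, min_lon, max_lon); net_positions is a
    list of (lat, lon) pairs. An empty list counts as 0.0.
    """
    if not net_positions:
        return 0.0
    min_lat, max_lat, min_lon, max_lon = border
    inside = sum(1 for lat, lon in net_positions
                 if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon)
    return inside / len(net_positions)


def geo_alarm(border, net_positions, min_fraction=0.5):
    """Ring an alarm when too few advertised nets are "within" the border."""
    return fraction_inside(border, net_positions) < min_fraction


def vote(claims, quorum=0.5):
    """Majority vote over what each backup source claims for one prefix.

    claims is a list of hashable values (e.g. the origin AS each source
    reports). Returns the winning value only if strictly more than
    quorum of the sources agree, otherwise None.
    """
    if not claims:
        return None
    winner, count = Counter(claims).most_common(1)[0]
    return winner if count > quorum * len(claims) else None
```

vote() returning None is the interesting case: it means the sources
disagree enough that the update should be held or an alarm raised,
rather than believed.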

4/ multicast and rsvp are going to cause even more problems than
unicast messups like this (because of the denial of service problems
and multiplier effects they both have) unless the underlying system
has some strong sanity checks..... what do you _expect_ as a provider?
what constraints does that put on what updates you should receive,
and from whom?

 jon