Re: Autonomous System Sanity Protocol

Noel Chiappa <jnc@ginger.lcs.mit.edu> Mon, 28 April 1997 14:29 UTC

Date: Mon, 28 Apr 1997 10:14:07 -0400
From: Noel Chiappa <jnc@ginger.lcs.mit.edu>
Message-Id: <9704281414.AA28893@ginger.lcs.mit.edu>
To: J.Crowcroft@cs.ucl.ac.uk, big-internet@munnari.oz.au
Subject: Re: Autonomous System Sanity Protocol
Cc: jnc@ginger.lcs.mit.edu
Precedence: bulk

    From: Jon Crowcroft <J.Crowcroft@cs.ucl.ac.uk>

    ii) as part of this, you need not only route settlements, but
    penalties...i.e. $$$ for trouble caused

The problem with this is that I think this latest case follows a typical
model for many "large-scale" technological failures (not that this one is on
the same absolute scale as most, I admit): one or more "hidden" faults
elsewhere in the system, faults which were already present but had never
actually been triggered, are "excited" (in the technical sense) by what would
otherwise be a small-scale "trigger" event.

In our latest event, the fact that "bad" routes continued to circulate in the
system for some hours *after* the site that initiated them had been completely
disconnected does appear to show that we have an event in that class. Surely
it was *independent* faults elsewhere in the system, nothing to do with the
site which initiated the event, which caused the event to be as severe as it
was.

If so, it's unfair to charge all the cost to the initiator - and any attempt
to do so would probably wind up in the legal system.

    this is heavy handed and manual/operator intensive

A technique which is going to be less and less applicable as the network
grows.


    i don't really see how either nimrod or radia's byzantine routing solve
    the problem of an accidentally or deliberately misconfigured AS

I can't follow this - can you please give some examples of misconfiguration
that you think this wouldn't handle? With PKS signing of all statements of
the form "I'm router r19, and I'm connected to topology piece X", how can
anyone initiate an update which cannot be verified? Each half of that
statement has to be signed with the right private key.
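
To make the signing idea concrete, here is a minimal sketch, not a real
design. It assumes textbook RSA over small Mersenne primes (a toy, insecure,
chosen only so the example runs with the standard library), and it assumes
that the router and the topology piece each sign the whole statement with
their own private key. The key names, the helper functions and the "attacker"
key are all hypothetical.

```python
import hashlib

E = 65537  # public exponent shared by all toy keys


def make_key(p, q):
    """Build a toy RSA key from two primes; returns (n, d): n public, d private."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))  # modular inverse, Python 3.8+
    return n, d


def digest(msg, n):
    """Hash the statement and reduce it into the range of the modulus."""
    h = hashlib.sha256(msg.encode()).digest()
    return int.from_bytes(h, "big") % n


def sign(msg, key):
    """Sign with the private half of the key."""
    n, d = key
    return pow(digest(msg, n), d, n)


def verify(msg, sig, n):
    """Check a signature using only the public modulus."""
    return pow(sig, E, n) == digest(msg, n)


def accept(msg, router_sig, piece_sig, router_pub, piece_pub):
    """Accept the update only if BOTH halves are vouched for by the right key."""
    return verify(msg, router_sig, router_pub) and verify(msg, piece_sig, piece_pub)


# Hypothetical keys (toy sizes, insecure): one per router, one per topology piece.
ROUTER_R19 = make_key(2**61 - 1, 2**89 - 1)
PIECE_X = make_key(2**107 - 1, 2**127 - 1)
ATTACKER = make_key(2**61 - 1, 2**127 - 1)

STATEMENT = "I'm router r19, and I'm connected to topology piece X"
```

A receiver that knows only the two public moduli can then call
`accept(STATEMENT, sign(STATEMENT, ROUTER_R19), sign(STATEMENT, PIECE_X),
ROUTER_R19[0], PIECE_X[0])`. Replacing either signature with one made under
`ATTACKER`, or altering the statement after signing, should make `accept`
return False.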

Which is not to say that this solves all possible problems; we can list them
and discuss the countermeasures to them once we have this issue sorted out.

    actually combine link state and distance vector ... if you got a part of
    the hierarchical route table from all other routes, then you could _vote_

Again, I'm not following this - how would you use various neighbours' routing
tables when it came to selecting paths (remember, I'm assuming explicit/
unitary path selection, *not* hop-by-hop)?

Anyway, even if you are using a hop-by-hop design, such systems depend for
their correct operation on everyone reaching the *same* conclusion on what
the exact path is from X to Y (otherwise loops can form). The easiest way to
do this is to ensure that everyone i) has the same data, and ii) is using the
exact same algorithm to turn that data into routes. (Unless you can use some
high-powered proof to show that the results will always be the same, e.g. if
everyone is required to compute the optimal path across the graph, it's OK to
use different algorithms to do so, since that final state is precisely
defined, mathematically, so how you get there is not crucial.) So, whatever
your algorithm is, it had better have that property, otherwise you lose...
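
A small sketch of the loop problem, under assumptions of my own: the
five-node topology, the link costs and the two "algorithms" (minimise total
cost versus minimise hop count) are made up for illustration. Every router has
the same data, but if A and B optimise for different things, each hands the
packet for D to the other.

```python
import heapq

# Hypothetical topology: undirected links with costs, known to every router.
LINKS = {("A", "B"): 1, ("A", "D"): 10, ("B", "C"): 1, ("C", "E"): 1, ("E", "D"): 1}


def neighbours(node):
    """Yield (neighbour, link cost) for every link touching node."""
    for (u, v), cost in LINKS.items():
        if u == node:
            yield v, cost
        elif v == node:
            yield u, cost


def next_hop(src, dst, use_cost):
    """Dijkstra from src; return the first hop on the best path to dst.

    use_cost=True minimises total link cost, use_cost=False minimises hops.
    """
    best = {src: 0}
    heap = [(0, src, None)]  # (distance so far, node, first hop taken)
    while heap:
        dist, node, first = heapq.heappop(heap)
        if node == dst:
            return first
        if dist > best[node]:
            continue  # stale heap entry
        for nbr, cost in neighbours(node):
            nd = dist + (cost if use_cost else 1)
            if nd < best.get(nbr, float("inf")):
                best[nbr] = nd
                heapq.heappush(heap, (nd, nbr, nbr if first is None else first))
    return None


def forward(src, dst, policy):
    """Forward hop-by-hop; each router applies its own policy[router].

    Returns (path, looped), stopping as soon as a router is revisited.
    """
    path = [src]
    while path[-1] != dst:
        hop = next_hop(path[-1], dst, policy[path[-1]])
        if hop in path:
            return path + [hop], True
        path.append(hop)
    return path, False


ALL_COST = {r: True for r in "ABCDE"}
ALL_HOPS = {r: False for r in "ABCDE"}
MIXED = dict(ALL_COST, B=False)  # B alone minimises hop count
```

With `ALL_COST` or `ALL_HOPS`, `forward("A", "D", ...)` reaches D (by
different paths, which is fine, since everyone agrees). With `MIXED`, A sends
to B because A-B-C-E-D is cheapest, and B sends straight back to A because
B-A-D is the fewest hops.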

	Noel