Re: [rrg] Constraints due to the need for widespread voluntary adoption

jnc@mercury.lcs.mit.edu (Noel Chiappa) Fri, 04 December 2009 17:34 UTC

To: rrg@irtf.org
Message-Id: <20091204173433.3A9DA6BE583@mercury.lcs.mit.edu>
Date: Fri, 04 Dec 2009 12:34:33 -0500
From: jnc@mercury.lcs.mit.edu
Cc: jnc@mercury.lcs.mit.edu
Subject: Re: [rrg] Constraints due to the need for widespread voluntary adoption

    > From: Patrick Frejborg <pfrejborg@gmail.com>

    > here I do get headache

Sorry... :-)

    > I think this is an architectural question.

The point you raise immediately below - very definitely yes! :-)

    > In order to get this working you need to combine two architectures -
    > the routing architecture and a mapping database architecture.

Well, not necessarily. Here is how I look at the problem (sorry if this is a
bit of a diversion, but I want to explain the framework in which I am
thinking about this problem - it helps me avoid headaches :-).

To me, designs which encapsulate packets, send them across some existing
substrate, and then decapsulate them are best thought of as a new
packet-switching layer, one built on top of an existing PS layer. (This is a
circumstance we have seen before, e.g. in IP over X.25 networks, or over the
ARPANET, etc, etc.)

Now, packet-switching systems all have common problems they have to solve:
selecting the next hop, making sure that next-hop is up and reachable (i.e.
packets from the first box can successfully get to the second box across
whatever is linking the two), etc.

So, the new encapsulating layer system has to solve all these classical
packet-switching problems. To do so, it can either build its own 'native'
mechanisms (i.e. direct inter-device exchanges among the set of encapsulating
and/or decapsulating devices), or it can try to 'tap into' existing mechanisms
at the layer below to perform these functions.

Which way to go is a complex question, which includes questions like 'how
expensive is a native mechanism' (a particular concern for overlay systems,
which may have thousands of 'direct' neighbours at their level, unlike
lower-level systems built directly on the hardware, which likely have much
smaller numbers of direct neighbours); 'is the information in the lower layer
even accessible' (in some cases it is not, such as the ARPANET); 'can the
lower-layer system really do what I need' (and sometimes the answer is 'no' -
or at least not 'yes, with a high enough level of reliability for my
purposes'); etc.
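
To make that tradeoff concrete, here is a rough Python sketch of the two
options behind a common 'is this device reachable?' question. (Purely
illustrative - the class names and mechanisms are mine, invented for this
message, not taken from any particular proposal.)

    from abc import ABC, abstractmethod
    import subprocess

    class LivenessSource(ABC):
        # Answers: 'can I currently reach this encapsulating/decapsulating
        # device across whatever links us?'
        @abstractmethod
        def is_reachable(self, address):
            ...

    class NativeProbe(LivenessSource):
        # 'Native' mechanism: the overlay probes the device directly (here
        # approximated by a single ping; a real design would do better).
        def is_reachable(self, address):
            result = subprocess.run(["ping", "-c", "1", address],
                                    capture_output=True)
            return result.returncode == 0

    class LowerLayerRouting(LivenessSource):
        # 'Tap into' the layer below: trust whatever reachability information
        # the underlying routing system exposes (fed in here as a plain set).
        def __init__(self, reachable_addresses):
            self.reachable_addresses = reachable_addresses

        def is_reachable(self, address):
            return address in self.reachable_addresses

The point is only that the encapsulating layer needs an answer to that
question; where it gets the answer from is an engineering choice.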

So, in this framework, your question can be recast as a series of questions,
such as 'is the information in the lower layer's routing system accessible to the
higher layer' (yes); 'does the lower-level's routing architecture provide all
the information I need at the higher level' (depending on the level of
reliability you need for things like reachability, perhaps not); etc.


    > You have a mapping database but that database have no information about
    > an ETR's availability ... unless you integrate the current routing
    > architecture with the mapping database. If the link between the ETR and
    > DFZ is lost the mapping database need to know that, i.e. the routing
    > protocol must inform the database.

There are two _separate_ functions happening at the encapsulating layer:
path-selection, and neighbour liveness. Many (most?) routing systems do not
clearly separate these functions, but this all becomes easier to think about
- and engineer for - if you do.

Now, clearly, there is a feedback from the second to the first: there is no
point in selecting a neighbour as a next hop if you cannot reach it. However,
particularly in an encapsulating system (which has a mapping subsystem that
might not be as dynamic as one would like for path selection), there is value
to separating the two.

The mapping output produces a set of 'plausible' next hops for that ultimate
destination. Each member of that set is tested to see if it is reachable. If
not, it is discarded as a plausible next hop for that ultimate destination
(in fact, for all of them). This test can either use some existing
lower-level mechanism
(e.g. the routing, at the layer below), or a 'native' mechanism at the
encapsulation system layer - e.g. a direct 'ping'.

There is no absolute need to update the mapping system's data to indicate
that one of the 'next hops' it lists is down, provided that there is a
liveness check before actually using any of those listed potential next hops.
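
In code-sketch form (again purely illustrative, with made-up names), the
separation looks something like this:

    from dataclasses import dataclass

    @dataclass
    class Mapping:
        eid_prefix: str        # the ultimate destination's prefix
        candidate_etrs: list   # 'plausible' next hops, in preference order

    def select_next_hop(mapping, is_reachable):
        # Pick the first listed candidate that passes the liveness check;
        # 'is_reachable' is whatever mechanism the encapsulating layer uses
        # (a native probe, or information tapped from the layer below).
        for etr in mapping.candidate_etrs:
            if is_reachable(etr):
                return etr
        return None   # no listed ETR currently reachable

    # The mapping entry itself is never touched when an ETR goes down; the
    # liveness check simply steers traffic to the secondary ETR.
    m = Mapping("198.51.100.0/24", ["192.0.2.1", "203.0.113.1"])
    currently_up = {"203.0.113.1"}
    print(select_next_hop(m, lambda etr: etr in currently_up))  # 203.0.113.1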


    > after that the database members needs to inform their ITRs so that the
    > ITRs that have ongoing sessions to the affected ETR will flush their
    > cache and replace the entry to the secondary ETR. So you will have a
    > redistribution from routing protocol -> mapping database -> routing
    > protocol, instead of having BGP churn you could have cache churn
    > instead.

It is indeed the case that if a mapping is updated, entities which have
cached copies of that mapping need to find out that the mapping has been
updated. This is not an insoluble engineering problem, and the exact details
of the solution will depend on where copies may be cached: e.g. only in
devices which are directly communicating with the entities named in the
mapping, e.g. ITRs; or perhaps in other places as well.
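
One possible engineering answer (only one of several, and again sketched with
invented names rather than any particular proposal's mechanism): give each
cached mapping a version number, and have the mapping system send holders a
small 'this has changed' notification, after which the stale entry is dropped
and re-fetched on next use:

    class MappingCache:
        def __init__(self, fetch_mapping):
            self._fetch = fetch_mapping   # callback into the mapping system
            self._entries = {}            # eid_prefix -> (version, etr_list)

        def lookup(self, eid_prefix):
            # Return the cached candidate ETRs, fetching on a miss.
            if eid_prefix not in self._entries:
                self._entries[eid_prefix] = self._fetch(eid_prefix)
            return self._entries[eid_prefix][1]

        def notify_updated(self, eid_prefix, new_version):
            # The mapping system announces a newer version: drop the stale
            # entry, so the next lookup re-fetches the current one.
            cached = self._entries.get(eid_prefix)
            if cached is not None and cached[0] < new_version:
                del self._entries[eid_prefix]

    # Usage (made-up data): after the notification, the next lookup re-fetches.
    store = {"198.51.100.0/24": (1, ["192.0.2.1", "203.0.113.1"])}
    cache = MappingCache(lambda p: store[p])
    print(cache.lookup("198.51.100.0/24"))   # -> ['192.0.2.1', '203.0.113.1']
    store["198.51.100.0/24"] = (2, ["203.0.113.1"])   # updated upstream
    cache.notify_updated("198.51.100.0/24", 2)
    print(cache.lookup("198.51.100.0/24"))   # -> ['203.0.113.1']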


    > If you prefer to avoid the redistribution you could let the routing
    > architecture take care to inform the ITR of ETRs availability.

See above comments about two separate functions, etc.

    > But that will have negative impact on the DFZ. Today the EID in a
    > multi-homing solution usually creates a /20 prefix - regardless of how
    > many ISP connections it uses. But by using ETRs each attachment point
    > will create a /32 entry in the DFZ

You seem to be assuming that the only way for encapsulating-layer devices to
perform the kind of control functionality they need at their layer is by
using lower-layer mechanisms (e.g. routing). This is not necessary. There
does not need to be a separate entry for each ETR in the routing (as would
indeed be true if the lower-layer routing were the _only_ mechanism available
to the higher layer).

    > So it seems that you have to do a redistribution of routing protocol ->
    > mapping database -> routing protocol

At an abstract architectural level, when one sees/proposes such complex
interactions (particularly involving real-time feedback) between various
subsystems, one's 'architectural bad idea alarm bell' should go off... That
kind of thing is prone to problems, as it's hard to model how it will
operate, particularly in complex configurations.

It's better to adopt a design philosophy in which the interactions are
simpler, and do not involve dependency loops. The approach above (the mapping
provides a set of potential next hops, and another mechanism selects which
ones are actually reachable) does that.


    > And the current DNS system is quite slow to update

Which would indicate that keeping the _mappings_ in the DNS would not be
good, if you want to be able to change them, and have the changes propagated
quickly.

There has been discussion of hybrid systems in which the DNS instead stores
information about the entities which are authoritative for a given mapping,
not the mappings themselves; the distribution of mappings themselves is part
of the new encapsulation system. This does mean that authoritative mapping
servers cannot be quickly added/removed, but their dynamicity is likely to be
fairly low.

This allows questions such as the cache updating problem you raised above,
etc to be handled; since that part of the overall mechanism is in the new
subsystem, it can be designed accordingly, to meet whatever performance goals
are desirable.
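
Again in sketch form, with both layers faked by in-memory tables (a real
system would use actual DNS records and a mapping-distribution protocol; the
names below are made up): the slow-changing DNS-like layer only answers 'who
is authoritative for this prefix', and the mappings themselves come from the
new, faster subsystem:

    # Slow layer (DNS-like): changes only when map servers are added/removed.
    authoritative_server_for = {
        "198.51.100.0/24": "mapserver-a.example.net",
    }

    # Fast layer (the new subsystem): the mapping servers themselves, which
    # can push updated mappings on whatever timescale the design calls for.
    mappings_held_by = {
        "mapserver-a.example.net": {
            "198.51.100.0/24": ["192.0.2.1", "203.0.113.1"],
        },
    }

    def resolve_mapping(eid_prefix):
        # Two-step lookup: DNS-style pointer first, then the mapping itself.
        server = authoritative_server_for[eid_prefix]
        return mappings_held_by[server][eid_prefix]

    print(resolve_mapping("198.51.100.0/24"))  # ['192.0.2.1', '203.0.113.1']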


    > what happened in Sweden was an engineering issue but to get the
    > services restored was due to the architecture of the system

I am unfamiliar with this case, but will review it - study of history is
perhaps the single most important tool for a system architect.

    > one hour or longer is really not what multi-homed enterprises are
    > expecting in failure cases

Indeed, and understandably. However, a correct combination of architecture
and engineering should provide much better performance at a 'reasonable'
cost.

	Noel