Re: [rrg] IRON-RANGER scalability and support for packets from non-upgraded networks

Robin Whittle <rw@firstpr.com.au> Thu, 18 March 2010 02:50 UTC

Short version:   Continuing discussions:

                    Is OSPF suitable for the I-R overlay?  BGP is
                    highly decentralised, but I am not sure this is
                    the case with OSPF.

                    Fred suggests  Virtual Router Redundancy Protocol
                    (VRRP) RFC 5798 for the task of "DEL" routers
                    registering their EID prefixes with a handful of
                    VP routers.  My initial impression is that this
                    won't work because it requires multicast - which
                    I think would be impossible or at least
                    unscalable on a global overlay network as I-R
                    requires.

                    I give some names to the various roles I think
                    IRON routers might have, and consider what
                    combinations of roles might be valid.



Hi Fred,

Continuing our interesting conversation on the design of IRON-RANGER,
you wrote:

>> I assumed that these "DITR-like" routers were not necessarily VP routers.
> 
> Correct; these routers (IDMs) may also be VP routers on
> the IRON but need not be. So, we have three classes of
> IRON routers: 1) VP routers, 2) IDMs, and 3) both.

I understand the population of IRON routers can be classified
according to their roles.  I am giving some new names to these roles.

I think it is important to invent names for concepts in a new design,
otherwise we have to use phrases which are longer, and might be
written in different ways when they really refer to the same thing.

   DEL Delivers packets to one or more nearby end-user networks, and
       so needs to register the one or more EID prefixes this
       involves with VP routers (VPRs).  (Typically 2 VPRs, but
       I tend to think of 2, 3 or 4 or so for robustness.)


   LFR Local Forwarding Router.  Advertises a few prefixes covering
       all I-R "edge" space in the local routing system of the
       network it is located in. For the purposes of discussion, I
       will assume this is an ISP network, but it could be the
       network of a large end-user network which has its own AS and
       participates in the interdomain routing system.

       This router is not advertising anything outside the ISP
       network - so it is not "advertising I-R edge space in the
       DFZ".

       Any packets addressed to I-R edge space (that is to any
       EID prefix used by an I-R-using end-user network) will
       go to this router rather than to the ISP's BRs.  So this
       router needs to tunnel them to one of the VPRs for the VP their
       destination address is within.  That VPR doesn't have to be the
       "closest" of the two or more VPRs, but if you use BGP in the
       overlay network then it typically will be the "closest" in BGP
       terms - which would be desirable, to reduce path lengths and
       delays for the one or more initial packets which go via the
       VPR.  That VPR will tunnel the packet to the IRON router
       playing the DEL role.  The VPR will also send a mapping to
       this LFR-role router which will subsequently tunnel further
       traffic packets whose destination address matches the EID
       prefix in the mapping to one of the IRON routers which are
       playing the DEL role for this prefix.

       So this LFR role involves the IRON router knowing the address
       of at least one VPR for every VP.  It finds this out, in the
       current design, from the BGP best path it gets from the I-R
       overlay BGP system.  However, it doesn't tunnel the packet via
       that overlay system - it tunnels it via the Internet.

       (This means the VPR must use its Internet IP address for its
       VP advertisements on the overlay network.)


   IDM IRON Default Mapper.  As for LFR, except this router is a
       BR of the ISP and advertises all I-R edge space to
       neighbouring ISPs - that is, to the DFZ.

       As you wrote below, IDMs also advertise "default" on the
       I-R overlay ("on the IRON") - but I don't understand the
       purpose of this.


   VPR Virtual Prefix Routers.  These advertise one or more Virtual
       Prefixes (VPs) on the I-R overlay.  Each VP covers multiple
       (thousands to millions in principle) individual EID prefixes,
       each of which is used by an end-user network via one or more
       IRON routers playing the DEL role.

       See LFR above for a description of the responsibilities of
       VPR routers regarding traffic packets.

       The VPR role also involves accepting registrations from DEL-
       role routers for all the EIDs covered by the VP.  The
       mechanisms for doing so are currently undefined, since we
       have been discussing the inability of the BGP overlay system
       to tell each DEL router the addresses of all VPRs for the
       VP which matches the DEL router's EID prefix.  The new
       arrangements may involve (at least I suggested this as a
       possibility) the VPRs for a given VP working together to
       share registration information.  I figure they will be run
       by the one organization, or at least this VPR role for a given
       VP will be controlled by the organization which runs the VP -
       so they will presumably be coordinated in some way.  This
       would probably mean they don't need to automatically discover
       the other VPR-role routers which are handling a given VP.
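To check that I have the LFR and VPR responsibilities straight, here
is a minimal Python sketch of the forwarding decision an LFR-role
router makes.  It is only my own model, not anything from the I-R
drafts: the LFR class, its method names and all the addresses (from
the 2001:db8::/32 documentation prefix) are hypothetical, the overlay
control plane is reduced to a dictionary mapping each VP to the
address of one VPR, and the tunneling itself is not modelled.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class LFR:
    """Hypothetical model of an IRON router playing the LFR role."""
    # VP -> Internet address of one VPR for it (e.g. from the overlay's best path).
    vp_to_vpr: dict
    # EID prefix -> DEL-role router addresses, learned from VPR mapping replies.
    mapping_cache: dict = field(default_factory=dict)

    @staticmethod
    def _longest_match(table, dst):
        # Longest-prefix match over a dict keyed by ipaddress network objects.
        matches = [net for net in table if dst in net]
        return max(matches, key=lambda net: net.prefixlen, default=None)

    def next_tunnel_endpoint(self, dst: str):
        """Return the address to tunnel a packet for dst to, or None if not edge space."""
        addr = ipaddress.ip_address(dst)
        eid = self._longest_match(self.mapping_cache, addr)
        if eid is not None:
            return self.mapping_cache[eid][0]   # cached mapping: straight to a DEL router
        vp = self._longest_match(self.vp_to_vpr, addr)
        if vp is None:
            return None                         # not I-R edge space
        return self.vp_to_vpr[vp]               # no mapping yet: send via a VPR

    def install_mapping(self, eid_prefix: str, del_routers: list) -> None:
        """Cache the mapping a VPR sent back after forwarding an initial packet."""
        self.mapping_cache[ipaddress.ip_network(eid_prefix)] = list(del_routers)


lfr = LFR(vp_to_vpr={ipaddress.ip_network("2001:db8:8000::/33"): "2001:db8:0:1::1"})
print(lfr.next_tunnel_endpoint("2001:db8:8001::5"))   # 2001:db8:0:1::1 (the VPR)
lfr.install_mapping("2001:db8:8001::/48", ["2001:db8:0:2::7"])
print(lfr.next_tunnel_endpoint("2001:db8:8001::5"))   # 2001:db8:0:2::7 (a DEL router)
```

Until a mapping arrives, packets for the EID prefix go via the VPR.
Once the VPR's mapping is cached, later packets matching it are
tunneled straight to the DEL-role router, so the VPR drops out of the
traffic path.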


As best I understand it, any IRON router can perform the DEL role -
it's just a matter of somehow configuring it to initiate the
registration process for an EID prefix for an end-user network it
can deliver packets to.

As far as I know, IRON routers are typically not DFZ routers and are
(to a rough approximation) not BRs - so they typically perform the
LFR role as well.   (A BR could still perform the LFR role, by not
advertising the I-R edge space to other ASes - only within its own
network.)  It's just that a router which is not a BR can't advertise
the edge prefixes in the DFZ, and so can't perform the IDM role.

However, a subset of IRON routers are BRs and are also configured to
perform the IDM role.  While a router could perform purely this IDM
role and not advertise the edge prefixes locally, I will assume this
would not be typical.

A VPR need not be a BR.  It need not perform any other roles, but I
guess it typically would perform some, such as DEL.

Assuming that all IRON routers will, or could, perform the DEL role,
here are the various combinations:

   LFR?   IDM?   VPR?   BR?      Notes (every row also plays DEL)

 0 -      -      -      Maybe    Just playing the DEL role.

 1 -      -      VPR    Maybe    Also playing the VPR role.

 2 -      IDM    -      Yes      Just playing DEL and IDM roles  -
                                 but for some non-obvious reason not
                                 advertising I-R edge space to local
                                 routing system.

 3 -      IDM    VPR    Yes      As for 2, but also VPR role.


 4 LFR    -      -      Maybe    DEL and accepting packets from the
                                 local network too.

 5 LFR    -      VPR    Maybe    As for 1, but also accepting packets
                                 from the local network.

 6 LFR    IDM    -      Yes      As for 2, but also accepting packets
                                 from the local network.

 7 LFR    IDM    VPR    Yes      As for 3, but also accepting packets
                                 from the local network.
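The row numbers above happen to be a 3-bit code: row = 4*LFR + 2*IDM +
VPR.  Here is a short Python sketch (my own, with a hypothetical
role_table() function) which regenerates the table from the one
constraint I rely on: only a BR can perform the IDM role.

```python
from itertools import product


def role_table():
    """Enumerate LFR/IDM/VPR combinations for an IRON router that can also do DEL."""
    rows = []
    # product() yields (lfr, idm, vpr) in binary order, so the index n
    # equals 4*lfr + 2*idm + vpr, matching the row numbers in the table.
    for n, (lfr, idm, vpr) in enumerate(product((False, True), repeat=3)):
        # Only a BR can advertise I-R edge space into the DFZ, so IDM implies BR.
        br = "Yes" if idm else "Maybe"
        rows.append((n, lfr, idm, vpr, br))
    return rows


for n, lfr, idm, vpr, br in role_table():
    print(n, "LFR" if lfr else "-", "IDM" if idm else "-", "VPR" if vpr else "-", br)
```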


>> Here is my understanding on what you just wrote:
>>
>>> The more I think about it, the more these specialized
>>> VP routers
>>
>> I think you mean the "DITR-like" routers are VP routers. Later you
>> refer to these as "IRON Default Mappers (IDMs)".  I had assumed they
>> either were not VP routers, or that they need not be VP routers.
> 
> The latter - IDMs need not also be VP routers, but they
> could be.

OK.


>> However, this part:
>>
>>> On the IRON, they advertise "default"
>>
>> makes no sense to me.  I don't recall any IRON router advertising
>> "default" on the IRON overlay network.  I understand that a VP router
>> advertises its one or more VPs.
> 
> Yes; this is new. By having the IDMs connected to the DFZ
> advertise "default" on the IRON, other IRON routers that do
> not connect to the DFZ can discover a nearby IDM that can
> reach the non-upgraded IPv6 Internet.

Assuming all IRON routers are IPv6 routers, why would they need to
find another IRON router via the overlay network which could deliver
packets to any IPv6 address?

I think the reasoning for this must come from your mixed IPv4 / IPv6
plans, which I have tried to avoid thinking about so far.

Can you explain more about your vision for this?


>>>> They are going to be busy, depending on where they are located, the
>>>> traffic patterns, how many of them there are etc.   So they need to
>>>> be able to handle the cached mapping of some potentially large number
>>>> of I-R end-user network prefixes.
>>>
>>> In the case of IPv6, I think whether the IRON Default
>>> Mappers (IDMs) will be very busy depends on how large
>>> the IPv6 DFZ becomes. In my understanding, the IPv6 DFZ
>>> is not very big yet. So, if most IPv6 growth occurs in
>>> the IRON and not in the IPv6 DFZ the packet forwarding
>>> load on the IDMs might not be so great.
>>
>> This would only be true if you could convince most networks adopting
>> IPv6 to adopt I-R at the same time.
> 
> Well, now is the time to put forward the case for
> handling new IPv6 growth in the IRON instead of in
> the IPv6 DFZ. Otherwise, once growth in the IPv6
> DFZ takes off and we start to see significant PI
> addressing and multihoming, we will eventually
> end up in the same boat we are in with the IPv4
> DFZ today.

OK.  But I still prefer Ivip for IPv6 since it will be able to give
end-user networks, or their appointees, real-time control of
tunneling behavior.  This will be advantageous for real-time
responsive inbound TE and for quickly getting all traffic packets to
the newly selected TTR (Translating Tunnel Router) in TTR Mobility -
so the MN can quickly drop the tunnel it made to the previous TTR.


>>> The term "bubbles" came from teredo (RFC4380). Maybe we can
>>> think of a better term to use for IRON-RANGER?
>>
>> OK.  I don't think "bubbles" is appropriate for the registration
>> methods you have described so far, or that I have suggested.
> 
> OK. How about Channel Queries (CQs)?

I don't see any "channels" and it doesn't look like a "query".

In my nomenclature, it is a DEL router registering an EID prefix (I
think this is the term you use in I-R) with a VPR because this VPR is
one of the typically two or more VPRs which handle this VP.

What about "EID Registration Message" - ERM?


>>>> I am definitely not going to try to think about mixed IPv4/v6
>>>> implementations of I-R.  I can handle thinking about purely IPv4 and
>>>> purely IPv6.
>>>
>>> I choose to think of mixed IPv4/IPv6 for at least three
>>> reasons:
>>>
>>> 1) We already have global deployment of IPv4, and that won't
>>>    go away overnight when IPv6 begins to deploy.
>>
>> I agree.
>>
>>> 2) IPv4 is fully built-out, so new growth will come via IPv6.
>>
>> I don't agree with this at all.  I think there's plenty of scope for
>> more growth in the IPv4 Internet.  Fig. 11 at:
>>
>>   http://www.potaroo.net/tools/ipv4/
>>
>> shows 130 /8s worth of space is currently advertised.  Fig. 5 shows
>> this in more detail.  Of the /8s up to 223, a handful can't be used
>> (127, 0 maybe).  There are still a bunch of /8s which are
>> unadvertised.  As time progresses, this space will be too valuable to
>> use internally, probably inefficiently - so I expect quite a lot of
>> that will be made available and advertised too.
> 
> OK, but how bad would it be if we just let IPv4 address
> depletion run out under the current system, then jack up
> to IPv6 in parallel to handle PI addressing and multihoming?

I am interested in exploring the IRON-RANGER design - including for
mixed IPv4 and IPv6, because I find this stuff generally interesting
and a good way to learn about scalable routing.  Maybe at the end of
this process I might think that IRON-RANGER is practical and in some
ways desirable compared to Ivip or LISP, which are the only two I
consider potentially practical or desirable at present.  (msg06219)

A likely outcome is that this process will prompt me to think of
improvements to Ivip - since previous improvements to Ivip came from
thinking about other proposals, not from a conscious effort to
improve Ivip.

However, I think it is wildly unrealistic to assume that IPv4 will
die or become anything but *the* Internet everyone relies upon for a
very long time, perhaps forever.  I am not saying this is a good thing.

If you can articulate your vision for mixed IPv4 and IPv6 IRON-RANGER
operation, I can go along with it.  But I don't believe at all that
IPv6 will take over from IPv4 for most end-users before 2020.  As I
mentioned, there's still a lot of unused advertised space - and (I
assume) unused unadvertised - global unicast IPv4 address space.

I can't envisage a situation where it will be better to sell ordinary
(non-mobile) users purely an IPv6 service, without even behind-NAT
IPv4 connectivity, than to sell them a service which is either a
single global unicast IPv4 address or behind-NAT IPv4.

Mobile users could be different, since many functions and services
suitable for hand-held cellphone-like devices could be done via IPv6
- and since there would always be an option to tunnel through IPv6 to
an IPv4 NAT box so people can run client-style IPv4 applications on
their MN when they want to.


>> Then there are ways of using space more efficiently, as Ivip, LISP
>> and probably IRON-RANGER could do, by slicing and dicing it into much
>> smaller chunks than is possible with the /24 limit on prefixes in the
>> DFZ.
> 
> OK.

So to me, a successful implementation of IRON-RANGER would be as good
as Ivip or LISP in enabling really high levels of address utilization
in IPv4.  This will considerably extend the ability of IPv4 to handle
new users, including new end-user networks which need real global
unicast space (not behind-NAT) because they are running servers.


>> I think that most growth in Internet usage will occur in the IPv4
>> Internet for at least the rest of this decade.  The only time it
>> would make sense to use IPv6 instead of direct IPv4 or IPv4 behind
>> NAT would be for some service where it wasn't important to be able to
>> connect to IPv4.  At present, you couldn't sell any such service. I
>> guess that it may be possible to do this for large IP cell-phone
>> deployments where there are enough IPv6 services available to do a
>> reasonable subset of what people want in a hand-held device, and
>> where tunneling to a server which provides behind-NAT IPv4
>> connectivity would also be possible.
> 
> I agree that the IPv4 Internet is not only not going away
> but also continuing to grow. But, I still think that users
> will want to have both IPv4 (behind NAT if necessary) and
> IPv6 as we move forward from here.

At present, there's only one scenario in which I can imagine there
being a real demand among non-mobile customers for IPv6.  Let's say
that one or more large mobile phone companies decide to make their
new, or existing, 3G systems work with each MN having its own global
unicast IPv6 address (or perhaps /64).   This would enable direct
host-to-host connectivity between any of these MNs.  (Though carriers
typically want to avoid this, to stop people running VoIP instead of
using their voice call services, for which they charge more than
they can for basic IP connectivity.)

Now let's say there are hundreds of millions or billions of these
MNs, each with its own global unicast IPv6 address.  That address
could be stable as long as the MN is in the one carrier network.  If
it roams to another network, it would probably get another address.
However, the TTR Mobility system would fix this - and give each MN
its own permanent /64, no matter how it connected to the Net, as long
as it is via IPv6.  (I do not currently plan any connections between
Ivip or TTR Mobility for IPv4 and IPv6 - best to keep them as
separate systems.)

In this situation, people on non-mobile networks would have a genuine
reason to get native IPv6 connectivity.  Firstly, they might want to
sell or give services to these MN users.  Secondly, from home, they
might want to run a web-cam, file sharing, VPN or whatever which the
MN could access directly, on a host-to-host basis, without mucking
around with IPv4.

So I can imagine this trend happening - but only once there are a
substantial number of ordinary users with native IPv6 connectivity.
I guess this is most likely to occur with cell-phones.


>>> 3) IPv6 addresses can embed IPv4 addresses such that there
>>>    is stateless address mapping between an EID nexthop and
>>>    an RLOC.
>>
>> Can you explain this with an example?  I can't clearly envisage what
>> you mean.
> 
> I mean, if the IPv6 EID FIB includes entries with a next-hop
> address such as: 'fe80::5efe:V4ADDR' (i.e., an IPv6 address
> with embedded IPv4 address), then V4ADDR can be statelessly
> extracted as the RLOC address of the ETR.

So the "mapping", which the LFR-role and IDM-role routers get from
the VP router, is actually telling them to tunnel subsequent traffic
packets to an IPv4 address?   That would only work if every LFR-role
and IDM-role router had IPv4 access - unless you were to establish
special routers to act as gateways for delivering to IPv4 addresses,
which is not out of the question.

Also, an IPv6 VPR would need to be able to do the same thing - tunnel
an IPv6 traffic packet to a DEL-role router which is actually on an
IPv4 address, but which is nonetheless delivering packets to an
end-user network which uses an IPv6 EID.

This could be done, I guess, but there are messy PMTUD problems to
solve.  I prefer not to think about such things, but for now can
imagine you might want to do this, and that you could devise a way of
doing it.
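For what it is worth, here is how I understand the stateless
extraction you describe, as a Python sketch.  I am assuming the
ISATAP interface identifier format of RFC 5214, where the 64-bit
identifier is 0000:5EFE (or 0200:5EFE when the IPv4 address is
globally unique) followed by the 32-bit IPv4 address.  The function
name rloc_from_next_hop() is hypothetical.

```python
import ipaddress

# RFC 5214 ISATAP interface identifiers: 0000:5EFE or 0200:5EFE ("u" bit set
# when the embedded IPv4 address is globally unique), then the IPv4 address.
ISATAP_IID_PREFIXES = {0x00005EFE, 0x02005EFE}


def rloc_from_next_hop(next_hop: str):
    """Statelessly extract the IPv4 RLOC embedded in an ISATAP-style IPv6 next hop.

    Returns an IPv4Address, or None if the interface identifier is not ISATAP-style.
    """
    bits = int(ipaddress.IPv6Address(next_hop))
    # Bits 32..63 (counting from the least significant end) hold the 5EFE marker.
    if ((bits >> 32) & 0xFFFFFFFF) not in ISATAP_IID_PREFIXES:
        return None
    # The low 32 bits are the IPv4 address of the ETR / DEL-role router.
    return ipaddress.IPv4Address(bits & 0xFFFFFFFF)


print(rloc_from_next_hop("fe80::5efe:192.0.2.33"))      # 192.0.2.33
print(rloc_from_next_hop("fe80::200:5efe:c633:6407"))   # 198.51.100.7
print(rloc_from_next_hop("fe80::1"))                    # None
```

So no mapping lookup is needed to find the RLOC: it is simply the low
32 bits of the next-hop address.  This does nothing about the PMTUD
problems I mention above.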



>>>> There are two reasons an IRON router M might need to know about which
>>>> other IRON routers A, B and C advertise a given VP:
>>>>
>>>>  1 - When M has a traffic packet.  (M is either an ordinary IRON
>>>>      router and advertises the I-R "edge" space in its own network
>>>>      or it is a "DITR-like" router advertising this space in the
>>>>      DFZ.)  M needs to tunnel the packet to one of these VP routers.
>>>>
>>>>      The VP router will tunnel it to the IRON router Z it chooses as
>>>>      the best one to deliver the packet to the destination network
>>>>      and will send a "mapping" packet to M which will cache this
>>>>      information and from then on tunnel packets matching the
>>>>      end-user network prefix in the "mapping" to Z (or some other
>>>>      IRON router like Z, if there were two or more in the "mapping").
>>>>
>>>>      In this case, M needs only the address of one of the A, B or C
>>>>      routers.  Ideally it would have the address of the closest one -
>>>>      but it doesn't matter too much if it has the address of a more
>>>>      distant one.  That would involve a somewhat longer trip to the
>>>>      VP router, and perhaps a longer or shorter trip from there to Z.
>>>>      (This would typically be shorter than the path taken through
>>>>      LISP-ALT's overlay network.)
>>>>
>>>>      After M gets the "mapping", it tunnels traffic packets to Z - so
>>>>      the distance to the VP router no longer affects the path of
>>>>      traffic packets.
>>>>
>>>>      In this case, BGP on the overlay would be perfectly good - since
>>>>      it provides the best path to one of A, B or C - typically that
>>>>      of the "closest" (in BGP terms).
>>>>
>>>>
>>>>  2 - When M is one of potentially multiple IRON routers which
>>>>      delivers packets to a given end-user network - packets whose
>>>>      destination address matches a given end-user network prefix P.
>>>>
>>>>      M needs to "blow bubbles" (highly technical term from this
>>>>      R&D phase of IRON-RANGER) to A, B and C.  The most obvious
>>>>      way to do this is for M to be able to know, via the overlay
>>>>      network the addresses of all VP routers which advertise a given
>>>>      VP.  There may be two or three or a few more of these.  They
>>>>      could be anywhere in the world.
>>>>
>>>>      BGP does not appear to be a suitable mechanism for this, since
>>>>      its "best path" basic functions would only provide M with
>>>>      the IP address of one of A, B and C.
>>>>
>>>>      You could do it with BGP, by having A, B and C all know about
>>>>      each other, and with all three sending everything they get to
>>>>      the others.  This is not too bad in scaling terms for two,
>>>>      three or four such VP routers.
>>>>
>>>>      Then, M sends its registration to one of them - whichever it
>>>>      gets the address of via the BGP of the overlay network - and
>>>>      A, B and C compare notes so they all get the registration.
>>>>
>>>>      I will call this the "VP router flooding system".
>>>
>>> This is a nice idea. If I get what you are suggesting, each
>>> IRON router that advertises the same VP (e.g., VP(x)) would
>>> need to engage in a routing protocol instance with one
>>> another to track all of the PI prefix registrations. The
>>> problem I have with it is that that would make for perhaps
>>> 10^5 or more of these little routing protocol instances as
>>> well as lots and lots of manually-configured peering
>>> arrangements between the IRON routers that advertise VP(x).
>>
>> Something like this - but I am not sure what you mean by "routing
>> protocol instance".  I understand that the two or three VP routers
>> for any one VP "P" do need to cooperate and share their various
>> registrations.  You could either create a fresh protocol to do this,
>> or push into service some existing protocol, including perhaps a
>> routing protocol.
> 
> We haven't brought the Virtual Router Redundancy Protocol (VRRP)
> into discussion yet [RFC5798], but we might want to consider
> looking at this as a way of providing fault tolerance for VP
> routers. I'm not sure whether VRRP would also support load
> balancing between the multiple routers, but it seems like
> fault tolerance is the dominant consideration.

I agree - fault tolerance is more important than load balancing at
this stage of the design, though some form of load balancing might be
possible and desirable too.

I don't want to try to read this RFC in order to imagine how it might
work with I-R, so if you can describe how it would work, that would
be good.


> Using VRRP also reduces the "fanout" of VP-advertising routers
> to just a single RLOC address, and so makes for less complexity
> in ferrying CQs around the IRON.

But if all VPRs are on the one IP address, this would radically alter
the nature of the overlay network.  Also a single router might be VPR
for multiple VPs - so I can't see how this would work.

A quick look into this RFC:

  http://tools.ietf.org/html/rfc5798#section-5.1.1.2

indicates that it relies on multicast.  I think VRRP is intended for
multiple routers in a single local network, where multicast could be
done.  I can't imagine how you could scalably implement multicast on
the I-R overlay network.

I think this illustrates our differing design approaches.  I think
you tend to view the subsystems from a very high level - and if it
looks like one might do the trick, you consider it.  I immediately
want to know whether it is possible to do such things, and in this
case, it took me a few minutes with a protocol I had never heard of
to find a "lower level" detail which seems to preclude its use in the
way you intend.

I am not suggesting my approach is always the best - because I think
it is important to brainstorm ideas and think loosely for a while.
Too much "no, it can't be done" thinking too soon results in there
being nothing to explore.


>> You haven't specified anything other than manual configuration for
>> how an IRON router becomes a VP router.  VP routers have extra
>> workload, so whoever runs such a router must have a reason to do
>> this, probably involving payment of money in some way from the
>> end-user networks whose EID prefixes are covered by this VP.
> 
> Yes. End-users have to pay either a one-time or
> recurring cost for their PI prefixes.

OK - but what about the costs of running the IDMs, which will handle
widely varying traffic loads from one EID to the next, with these
loads generally having little correlation with the amount of space in
the EID?


>> If there are two or three IRON routers acting as VP routers for a
>> given VP, then some organisation is responsible for that VP, is
>> collecting payments as described above and is therefore the one
>> organisation driving the existence of these two or three VP routers.
>>  So manual configuration seems OK to me - I don't think there needs
>> to be a fancy automated system by which one VP router for a given VP
>> "P" would auto-discover any other VP router for "P" in the whole I-R
>> system.  However, these VP routers for the one VP do need to work
>> together to share registrations, and to quickly detect when one or
>> more of the set becomes unreachable.
> 
> VRRP maybe?

Since it appears to involve multicast, maybe not.

It shouldn't be too hard to develop a protocol by which a handful of
VPRs work together.  Maybe some existing protocols can be used as
part of this.
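As a starting point, here is a minimal Python sketch of the "VP router
flooding system" described earlier, using my suggested "EID
Registration Message" (ERM) term.  Everything in it is hypothetical:
the ERM fields, the VPR class, the manually configured peer list and
the one-hop forwarding rule.  It ignores transport, authentication of
DEL routers and detection of unreachable peers.  The point is that it
needs only unicast between a handful of configured peers, not
multicast.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ERM:
    """Hypothetical EID Registration Message from a DEL-role router."""
    eid_prefix: str
    del_router: str
    lifetime: int  # seconds until the registration expires unless refreshed


@dataclass(eq=False)
class VPR:
    """Hypothetical VPR sharing registrations with the other VPRs for its VP."""
    name: str
    # Manually configured VPRs for the same VP (no auto-discovery, no multicast).
    peers: list = field(default_factory=list, repr=False)
    # (EID prefix, DEL router) -> absolute expiry time
    registrations: dict = field(default_factory=dict)

    def receive(self, erm: ERM, now: int, from_peer: bool = False) -> None:
        self.registrations[(erm.eid_prefix, erm.del_router)] = now + erm.lifetime
        if not from_peer:
            # One-hop flood: pass the ERM to each peer, which does not re-forward it.
            for peer in self.peers:
                peer.receive(erm, now, from_peer=True)

    def del_routers_for(self, eid_prefix: str, now: int) -> list:
        """Return the DEL routers with an unexpired registration for eid_prefix."""
        return sorted(d for (e, d), expiry in self.registrations.items()
                      if e == eid_prefix and expiry > now)


a, b, c = VPR("A"), VPR("B"), VPR("C")
for v in (a, b, c):
    v.peers = [p for p in (a, b, c) if p is not v]

# A DEL router sends its ERM to B, whichever VPR it happened to learn of.
b.receive(ERM("2001:db8:8001::/48", "2001:db8:0:2::7", lifetime=600), now=0)
print(a.del_routers_for("2001:db8:8001::/48", now=300))  # ['2001:db8:0:2::7']
print(c.del_routers_for("2001:db8:8001::/48", now=700))  # [] (expired, not refreshed)
```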


>>> For these reasons, I believe it is better for IRON router
>>> M to know about all three of A, B and C and direct bubbles
>>> to each of them. I think we can achieve this using OSPF
>>> with the NBMA link model in the IRON overlay.
>>
>> OK - but I guess that means not running BGP.  I don't know anything
>> about OSPF or its scaling properties.  BGP has no central
>> coordination - something which is understandably attractive to many
>> people.  Does OSPF have central coordination, single points of
>> failure etc.?
> 
> In this case, central coordination would be through
> maintenance of the domainname-to-RLOC mappings for
> the FQDN "isatapv2.net". In other words, when a new
> IDM comes into existence its RLOC address gets added
> to the DNS RR's for "isatapv2.net". In the same way,
> when an existing IDM is decommissioned its RLOC address
> is removed.
> 
> Currently, "isatapv2.net" is registered to me. Do you
> trust me to maintain it properly? :^}

Sure!

Whoever runs it needs to have some fancy way of recognising or
rejecting attempts to register whatever needs to be in this branch of
the DNS.

But how is OSPF structured?  BGP is flat and egalitarian, with links
between nearby routers all that is required - and of course care
about which prefixes are advertised.

You could chop the whole BGP-based interdomain routing system into
two or more pieces and they would keep running, just fine - although
of course each only with a subset of the prefixes.

A quick look at:

  http://en.wikipedia.org/wiki/OSPF

and the IPv4 RFC:

  http://tools.ietf.org/html/rfc2328#page-19

indicates that a large OSPF network is organised into various areas.
How would you do this for the IRON-RANGER overlay network?  Don't
OSPF and ISIS require more centralised administration, such as to
structure the whole system into sub-systems and to give certain
routers particular roles, on which other routers depend?

I haven't read the OSPF article, but my impression is that it is a
valuable resource, with Wbenton:

  http://en.wikipedia.org/wiki/User:Wbenton-test

contributing many things, not least a formidable table and diagram of
interdependencies between RFCs.  The diagram looks like it needs its
own routing protocol!


>>> Please note: the EID-based IRON overlay is configured over
>>> the DFZ, which is using BGP to disseminate RLOC-based
>>> prefix information. So, it is BGP in the underlay and
>>> OSPF in the overlay - weird, but I think it works.
>>
>> Yes the DFZ uses BGP and the overlay uses . . . originally I-R used
>> BGP (a separate instance of BGP in each such router).  Also, IRON
>> routers don't need to be DFZ routers and in many or most cases are
>> not DFZ (BR) routers - but they all communicate via tunnels which are
>> carried between networks via the ordinary Internet (using the DFZ).
>>
>> I guess these tunnels between IRON routers will need to be manually
>> configured, since they are typically between physically and
>> topologically nearby routers.
> 
> No manual config needed; the IRON is just a gigantic NBMA
> link, and can use automatic tunneling the same as for VET
> and ISATAP.

But it is important for IRON routers to run their new BGP instance
with neighbouring IRON routers which are generally physically or
topologically close.  Otherwise, the "distance" metrics in the
overlay network won't resemble the real "distance" to the other
routers, and your routers playing the LFR or IDM role won't
automatically discover the address of the "closest" VPR for a given VP.

These tunnels surely need to be manually configured - and that
defines the membership in the I-R overlay network and its structure
for the purposes of its BGP (or OSPF?) control plane.
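
For reference, the automatic tunneling in ISATAP (RFC 5214) works by embedding the IPv4 tunnel endpoint in the IPv6 interface identifier, after the 32-bit string 0000:5EFE (0200:5EFE, with the "u" bit set, when the IPv4 address is globally unique).  Here is a minimal Python sketch of that addressing, assuming a /64 prefix.  It illustrates ISATAP only, not how VET or IRON actually form addresses, and the prefixes and addresses are documentation examples.  It shows why no per-peer configuration is needed to reach a given router, but also that nothing in the address says which IRON routers are topologically close, which is what the overlay's BGP (or OSPF) adjacencies would need.

```python
import ipaddress

def isatap_address(prefix: str, ipv4: str, globally_unique: bool = True) -> ipaddress.IPv6Address:
    """Form an ISATAP-style address: /64 prefix + 0000:5EFE (or 0200:5EFE) + IPv4."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("ISATAP interface identifiers assume a /64 prefix")
    v4 = int(ipaddress.IPv4Address(ipv4))
    # Upper 32 bits of the interface identifier; 0x0200 sets the "u" bit.
    upper = 0x02005EFE if globally_unique else 0x00005EFE
    iid = (upper << 32) | v4
    return ipaddress.IPv6Address(int(net.network_address) | iid)

def tunnel_endpoint(addr: str) -> ipaddress.IPv4Address:
    """Recover the IPv4 tunnel endpoint embedded in the low 32 bits."""
    iid = int(ipaddress.IPv6Address(addr)) & ((1 << 64) - 1)
    # Ignore the "u" bit when checking for the 00-00-5E-FE marker.
    if (iid >> 32) & 0xFDFFFFFF != 0x00005EFE:
        raise ValueError("not an ISATAP-style interface identifier")
    return ipaddress.IPv4Address(iid & 0xFFFFFFFF)
```

So isatap_address("2001:db8:1:2::/64", "192.0.2.1") gives 2001:db8:1:2:200:5efe:c000:201, and tunnel_endpoint() on that address recovers 192.0.2.1 without any lookup or configured peer list.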


>>>>>> Also, this is just for 10 minute registrations.  I recall that the 10
>>>>>> minute time is directly related to the worst-case (10 minute) and
>>>>>> average (5 minute) multihoming service restoration time, as per our
>>>>>> previous discussions.  I think that these are rather long times.
>>>>>
>>>>> Well, let's touch on this a moment. The real mechanism
>>>>> used for multihoming service restoration is Neighbor
>>>>> Unreachability Detection. Neighbor Unreachability
>>>>> Detection uses "hints of forward progress" to tell if
>>>>> a neighbor has gone unreachable, and uses a default
>>>>> staletime of 30sec after which a reachability probe
>>>>> must be sent. This staletime can be cranked down even
>>>>> further if there needs to be a more timely response to
>>>>> path failure. This means that the PI prefix-refreshing
>>>>> "bubbles" can be spaced out much longer - perhaps 1 every
>>>>> 10hrs instead of 10min. (Maybe even 1 every 10 days!)
>>>>
>>>> OK, I am not sure if I ever knew the details of "Neighbor
>>>> Unreachability Detection" - but shortening the time for these
>>>> mechanisms raises its own scaling problems.
>>>>
>>>> Can you give some examples of how this would work?
>>>
>>> I want to go back on this notion of extended inter-bubble
>>> intervals, and return to something shorter like 600sec
>>> or even 60sec. There needs to be a timely flow of bubbles
>>> in case one or a few IRON routers goes down and needs to
>>> have its PI prefix registrations refreshed.
>>
>> OK - I will stay tuned for further details.
> 
> Bringing VRRP into consideration could be a
> contributing factor in how long the bubble (er, CQ)
> interval needs to be.

I regard the whole question of registering EIDs with VPRs as being
undecided until you propose an exact mechanism.
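
As I understand RFC 4861, the "staletime of 30sec" mentioned above is REACHABLE_TIME: after 30 seconds without a confirmation, the next packet moves the entry through DELAY (5 seconds) and PROBE (3 unicast Neighbor Solicitations, 1 second apart) before the neighbour is given up on.  Here is a simplified Python sketch of that timing, assuming the host sends to the neighbour continuously (so STALE is passed through immediately) and ignoring the random 0.5 to 1.5 factor RFC 4861 applies to ReachableTime.  With the defaults it gives a worst-case detection time of about 38 seconds, and it shows the scaling cost of shortening it: a smaller REACHABLE_TIME means probing every active neighbour more often.

```python
from dataclasses import dataclass

# RFC 4861 default constants; ReachableTime randomisation is omitted here.
REACHABLE_TIME = 30.0          # seconds a confirmation keeps the entry REACHABLE
DELAY_FIRST_PROBE_TIME = 5.0   # wait for upper-layer hints before probing
MAX_UNICAST_SOLICIT = 3        # unicast Neighbor Solicitations before giving up
RETRANS_TIMER = 1.0            # seconds between solicitations

@dataclass
class NeighborEntry:
    last_confirmed: float  # time of the last "hint of forward progress" or solicited NA

    def confirm(self, now: float) -> None:
        """Record a reachability confirmation (e.g. a TCP ACK or solicited NA)."""
        self.last_confirmed = now

    def state(self, now: float) -> str:
        """NUD state, assuming traffic to this neighbour is sent continuously."""
        idle = now - self.last_confirmed
        if idle < REACHABLE_TIME:
            return "REACHABLE"
        idle -= REACHABLE_TIME
        if idle < DELAY_FIRST_PROBE_TIME:
            return "DELAY"
        idle -= DELAY_FIRST_PROBE_TIME
        if idle < MAX_UNICAST_SOLICIT * RETRANS_TIMER:
            return "PROBE"
        # RFC 4861 deletes the entry at this point; labelled here for clarity.
        return "UNREACHABLE"

def worst_case_detection(reachable_time: float = REACHABLE_TIME) -> float:
    """Seconds from the last confirmation until the neighbour is given up on."""
    return reachable_time + DELAY_FIRST_PROBE_TIME + MAX_UNICAST_SOLICIT * RETRANS_TIMER
```

With REACHABLE_TIME cut to 5 seconds, worst_case_detection(5.0) gives 13 seconds, at the cost of six times as many confirmations or probes per active neighbour.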


>>>> At present, I can see these choices for this registration mechanism:
>>>>
>>>>   1 - Keep BGP as the overlay protocol and use my proposed "VP router
>>>>       flooding system".
>>>>
>>>>   2 - Retain your current plan of each IRON router like M needing to
>>>>       know the addresses of all the routers handling a given VP (A, B
>>>>       and C) which BGP can't do.  So you could:
>>>>
>>>>       2a - keep BGP and add some other mechanism.  Maybe M sends a
>>>>            message to the one of A, B or C it has a best path to,
>>>>            requesting the full list of all routers A, B and C which
>>>>            handle a given VP.  When M gets the list, it sends
>>>>            registration "bubbles" to the routers on the list.  This
>>>>            needs to be repeated from time to time to discover
>>>>            new VP routers.
>>>>
>>>>       2b - use something different from BGP which provides all the
>>>>            A, B and C router addresses to every IRON router, such as
>>>>            M.  This needs to dynamically change as A, B and C die and
>>>>            are restarted, or joined by others.
>>>
>>> Right - I am still leaning toward OSPF with its NBMA
>>> link model capabilities. The good news is that the
>>> IRON topology itself should be relatively stable, so
>>> not much churn due to dynamic updates.
>>
>> OK.  Since the IRON routers have their own IP addresses and are
>> generally in networks multihomed by existing BGP techniques, then any
>> outages don't affect the IRON routers' IP addresses or their
>> tunneling arrangements.  There would still be transitory breaks in
>> connectivity, before the BGP multihoming arrangements kick in.  If
>> you could ignore those by some means in the overlay's routing system
>> (BGP or OSPF) then yes, the IRON routers should be pretty stable.
> 
> With VRRP, probably even more so.

Or with your own purpose-designed protocol involving one, two or a
few more IRON routers in their DEL-roles registering the one EID with
two or maybe a few more VPRs.
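
Option 2a could work something like the following.  This is a hypothetical Python sketch of my own, not anything in the IRON-RANGER drafts: the Registrar class, the query_vp_routers and send_bubble callbacks and the rediscover_every parameter are all invented for illustration.  M asks its best-path VP router for the full list of routers handling the VP, sends a registration bubble to each, and re-queries every few refresh rounds to discover new VP routers.  The helper at the end is simple arithmetic: the average bubble load on one VPR is the number of EID prefixes in its VP times the number of DEL routers registering each one, divided by the refresh interval.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Registrar:
    """Hypothetical option 2a: M registers one EID prefix with every
    router (A, B, C ...) which handles the covering VP."""
    eid_prefix: str
    vp: str
    query_vp_routers: Callable[[str], list[str]]  # ask the best-path VP router for the full list
    send_bubble: Callable[[str, str], None]        # send a registration bubble: (router, eid_prefix)
    refresh_interval: float = 600.0                # seconds between bubble rounds
    rediscover_every: int = 6                      # re-query the VP router list every N rounds
    vp_routers: list[str] = field(default_factory=list)
    _rounds: int = field(default=0, repr=False)

    def tick(self) -> list[str]:
        """One refresh round: re-learn the VP router list if due, then register with all."""
        if self._rounds % self.rediscover_every == 0:
            self.vp_routers = sorted(set(self.query_vp_routers(self.vp)))
        for router in self.vp_routers:
            self.send_bubble(router, self.eid_prefix)
        self._rounds += 1
        return self.vp_routers

def bubbles_per_second_at_vpr(eid_prefixes: int, del_routers_per_eid: int, interval: float) -> float:
    """Average registration load on one VPR for its VP."""
    return eid_prefixes * del_routers_per_eid / interval
```

With, say, 10,000 EID prefixes in a VP and two DEL routers per prefix (arbitrary figures), that is about 33 bubbles per second per VPR with a 600 second interval, and about 333 per second with a 60 second interval.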

  - Robin