Re: [rrg] IRON-RANGER scalability and support for packets from non-upgraded networks

"Templin, Fred L" <Fred.L.Templin@boeing.com> Fri, 19 March 2010 19:22 UTC

From: "Templin, Fred L" <Fred.L.Templin@boeing.com>
To: Robin Whittle <rw@firstpr.com.au>, RRG <rrg@irtf.org>
Date: Fri, 19 Mar 2010 12:21:38 -0700

Hi Robin,

> -----Original Message-----
> From: Robin Whittle [mailto:rw@firstpr.com.au]
> Sent: Thursday, March 18, 2010 6:38 PM
> To: RRG
> Cc: Templin, Fred L
> Subject: Re: [rrg] IRON-RANGER scalability and support for packets from non-upgraded networks
>
> Hi Fred,
>
> I will try to use your simpler arrangement for naming the IRON
> routers, but I still find it easiest to think and write in terms of
> some roles I mentioned, which have no exact name in your arrangement.

OK; I speak more to this below.

> When I referred to "BR" I mean a router in an AS which also connects
> to other ASes - so I was referring to its status in the interdomain
> routing system, not whether it was a "Border" router from the
> perspective of the I-R overlay network.  You use "BR" to refer to the
> router's status within the I-R network.

Actually, the IRON BR is a router that connects an enterprise
network to a different enterprise network - whether within
the same AS or in a different AS.

> I only spent a few minutes looking at VRRP - and without trying to
> understand how it uses multicast, I thought maybe multicast of any
> kind would be difficult or impossible on the I-R overlay network.  I
> wasn't suggesting this was definitely the case.

The assumption I have been making is that there is
a shared link in an underlying network over which the
VRRP multicasts are exchanged. This would not affect
the IRON in any way. But, I believe we can relax the
role of VRRP to where it can be considered as only
an optimization and not a fundamental underpinning
of the architecture.

> You wrote, in part:
>
> > Consider that all IRON
> > routers are IRON border routers (IBRs), in that they
> > connect zero or more EID-based enterprise networks to
> > the IRON. Each IBR:
> >
> >   - participates in the IRON overlay routing protocol
> >   - advertises zero or more VPs into the routing protocol
> >   - connects zero or more EID-based enterprise networks to
> >     the IRON
> >   - may or may not connect the IRON to the DFZ
> >
> > I choose to view this latter category as "gateways" from
> > the IRON to the DFZ, so I will call these IBGs. So,
> > we now simply have only IBRs and IBGs.
>
> OK.  But those which advertise VPs have many extra responsibilities,
> so I think it is important to think of something like a VP Router, or
> the VP role or something as a distinct entity which some IBRs or IBGs
> have and which the rest don't.

Right; that's why I say an IBR can service _zero or more_
VPs.

> > What I am calling "Border Router (BR)" is any router that
> > can be used for getting off the IRON and onto either an EID-
> > based enterprise network or onto the DFZ.
>
> "getting off the IRON" implies to me that packets sent along the I-R
> overlay would be handled by this IRON router and forwarded to the
> EID-based enterprise network (what I referred to as an "EID-using
> end-user network") or towards some arbitrary host on the Internet
> (IPv4 and/or IPv6?) via forwarding to other routers (the DFZ).
>
> I don't understand this because the I-R overlay network doesn't
> handle traffic packets.

Not so - the IRON is used both for control plane and forwarding
plane, and both use ITE->ETE tunneling.

> It only handles BGP best path communications
> between the IRON routers so each IRON router can discover the IP
> address (the Internet address and I-R overlay addresses are the same)
> of routers which advertise in the I-R overlay some particular prefix.
>  The only IRON routers in the I-R overlay which advertise anything
> into the overlay are those which advertise a VP.  If two or more
> advertise a VP, then with BGP, every IRON router will get a best path
> to one of these VP routers, and so find out the IP address of one of
> them.

The IRON is used for:

  - BGP exchanges of VPs so that the next hop addresses
    of all IBRs that hold VPs can be determined
  - initial data packets sent to an IBR that holds a VP
    that covers the EID destination address prefix
  - secure redirection from the VP-holding IBR to an
    IBR that has a more-specific route to the final
    destination
  - subsequent forwarding plane traffic from an IBR's
    ITE to the ETE of an IBR that has a more-specific
    route to the final destination
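
As a rough illustration of the last three items in this list, here is a toy Python sketch of an ITE that first sends packets toward the VP-holding IBR and then switches to a more-specific next hop once a redirect arrives. The class name, method names, and addresses are all hypothetical, and the security of the redirect is not modeled at all.

```python
import ipaddress

class Ite:
    """Toy ingress tunnel endpoint: VP routes learned via BGP, plus redirects."""

    def __init__(self, vp_table):
        # vp_table maps a VP (as a string) to the RLOC of an IBR holding it.
        self.vp_table = {ipaddress.ip_network(p): r for p, r in vp_table.items()}
        # Filled in by redirects: more-specific EID prefix -> RLOC of the ETE.
        self.redirects = {}

    def next_hop(self, dst):
        """Pick the tunnel endpoint for dst, preferring redirected routes."""
        addr = ipaddress.ip_address(dst)
        for table in (self.redirects, self.vp_table):
            matches = [p for p in table
                       if p.version == addr.version and addr in p]
            if matches:
                # Longest-prefix match within the table.
                return table[max(matches, key=lambda p: p.prefixlen)]
        return None

    def on_redirect(self, prefix, rloc):
        """Record a redirect received from the VP-holding IBR."""
        self.redirects[ipaddress.ip_network(prefix)] = rloc

ite = Ite({"4000::/32": "192.0.2.1"})
print(ite.next_hop("4000:0:0:100::1"))   # initial packet goes to the VP holder
ite.on_redirect("4000:0:0:100::/56", "198.51.100.7")
print(ite.next_hop("4000:0:0:100::1"))   # later packets go to the redirected ETE
```

Destinations that no redirect covers keep going to the VP holder, which is the behavior the list above describes for initial packets.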

> No traffic packets are sent over the I-R network.  When an IRON
> router performing the IBG or IBR role tunnels a packet to an IRON
> router which advertises a VP in the overlay, the tunnel is via the
> Internet.

The IRON is an overlay over the Internet, and it is used
for both control and forwarding plane.

> So I don't understand what you mean by:
>
>    > getting off the IRON and onto either an EID-
>    > based enterprise network or onto the DFZ.

Hopefully what I said above helped.

> (But see potential explanation a few paragraphs down.)
>
> > In this sense,
> > any router that "sinks" EID-addressed packets that belong
> > to neither an EID-based enterprise network nor the
> > DFZ is also considered as a BR.
>
> I don't see how this would be needed.  No IRON router receives
> traffic packets on the overlay - the overlay is simply a mechanism by
> which IRON routers discover the "nearest" IRON router which is
> advertising a VP.  As I wrote previously (and as is quoted below in
> points 1 and 2), there are two reasons for doing this.  One is to
> tunnel a traffic packet to that VP router.  The other is to register
> an EID prefix with that VP router and with the other VP routers which
> advertise the VP which covers this EID prefix.

Let's say an IBR holds the VP 4000::/32. Of this VP, the
IBR has delegated many ::/56's to customers but many others
remain undelegated (e.g., 4000::/56). Now, suppose the IBR
receives a packet with destination address 4000:0:0:0::1.
Since this destination matches an undelegated prefix, the
IBR must know enough to discard the packet and send back
a "network unreachable" message of some sort. This is all
I mean when I say that the IBR "sinks" the
packet.
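
A minimal Python sketch of this "sink" decision follows, assuming the VP from the example above and two made-up delegated /56s. The customer names and the returned action strings are purely illustrative.

```python
import ipaddress

# Hypothetical state held by one VP-holding router (values are illustrative).
VP = ipaddress.ip_network("4000::/32")
DELEGATED = {
    ipaddress.ip_network("4000:0:0:100::/56"): "customer-A",
    ipaddress.ip_network("4000:0:0:200::/56"): "customer-B",
}

def handle(dst: str) -> str:
    """Return what the router does with a packet addressed to dst."""
    addr = ipaddress.ip_address(dst)
    if addr not in VP:
        return "not-my-vp"
    for prefix, customer in DELEGATED.items():
        if addr in prefix:
            return f"forward:{customer}"
    # Inside the VP but no delegated /56 covers it: sink the packet and
    # report the destination as unreachable.
    return "discard+unreachable"

print(handle("4000:0:0:0::1"))     # falls in the undelegated 4000::/56
print(handle("4000:0:0:100::1"))   # falls in a delegated /56
```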

> >> However, a subset of IRON routers are BRs and are also configured to
> >> perform the IDM role.  While a router could perform purely this IDM
> >> role and not advertise the edge prefixes locally, I will assume this
> >> would not be typical.
> >
> > All IRON routers are BRs (IBRs). Some IBRs are also
> > gateways for getting off the IRON and onto the DFZ.
> > These are called IBGs.
>
> Oh - you mean the IBG advertises all I-R "edge" prefixes, (perhaps as
> one or a few prefixes in IPv6, though probably many would be required
> for IPv4) and that this means it acts like an Ivip DITR or LISP PTR -
> accepting traffic packets sent by other ASes which lack their own
> IRON routers and then tunneling them firstly to a VP router, and
> then (after the VP router sends back "mapping") to an IRON router
> which can deliver the packet to its destination network.

Correct.

> I wouldn't describe this as "getting off the IRON and onto the DFZ" -
> except in terms of the flow of advertisements of routes.  I tend to
> think more in terms of the flow of packets rather than the flow of
> information about routes to particular prefixes.

Flow of packets is what I am talking about. Data packets
traverse the IRON the same way control messages do.

> >> A VPR need not be a BR.  It need not perform any other roles, but I
> >> guess it typically would perform some, such as DEL.
> >>
> >> Assuming that all IRON routers will, or could, perform the DEL role,
> >> here are the various combinations:
> >>
> >>    LFR?   IDM?   VPR?   BR?
> >>
> >>  0 -      -      -      Maybe    Just playing the DEL role.
> >>
> >>  1 -      -      VPR    Maybe    Also playing the VPR role.
> >>
> >>  2 -      IDM    -      Yes      Just playing DEL and IDM roles  -
> >>                                  but for some non-obvious reason not
> >>                                  advertising I-R edge space to local
> >>                                  routing system.
> >>
> >>  3 -      IDM    VPR    Yes      As for 2, but also VPR role.
> >>
> >>
> >>  4 LFR    -      -      Maybe    DEL and accepting packets from the
> >>                                  local network too.
> >>
> >>  5 LFR    -      VPR    Maybe    As for 1, but also accepting packets
> >>                                  from the local network.
> >>
> >>  6 LFR    IDM    -      Yes      As for 2, but also accepting packets
> >>                                  from the local network.
> >>
> >>  7 LFR    IDM    VPR    Yes      As for 3, but also accepting packets
> >>                                  from the local network.
> >
> > This gets way too complex, and I believe is greatly
> > simplified by what I said above.
>
> Yes, but now I have to write "an IRON router which advertises a VP"
> rather than "a VPR" which means the same thing - or an "IRON router
> which delivers packets to an EID-using enterprise network" rather
> than  a "DEL" router.

OK, but I really don't want a long list of disassociated
acronyms which are hard for the reader to remember. I'd
really rather stick with a single "base" TLA and have
variants of it based on function. From now on, I will
use IRON Router (IR) as the base acronym and extend
it with parenthesized functions as follows:

  IR(VP) - an IR that advertises one or more VPs into
           the IRON BGP RIB

  IR(GW) - an IR that connects the IRON to the DFZ

  IR(EID) - an IR that connects a customer EID network

  IR(RR) - an IR that is a BGP Route Reflector in the IRON

Note that it is possible for a single IR to serve
*more than one* of these roles, but I think that it
is sufficient to discuss one function at a time so
I don't think we will need even more complex TLAs.
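
One way to picture "a single base acronym with parenthesized functions" is as a set of role flags on one router type. The Python sketch below is only an illustration under that assumption. The `Role` and `label` names are invented here, and the combined label it prints is not proposed nomenclature.

```python
from enum import Flag, auto

class Role(Flag):
    """Functions an IRON Router (IR) may perform; a router may hold several."""
    VP = auto()    # advertises one or more VPs into the IRON BGP RIB
    GW = auto()    # connects the IRON to the DFZ
    EID = auto()   # connects a customer EID network
    RR = auto()    # acts as a BGP Route Reflector in the IRON

def label(roles: Role) -> str:
    """Render a role set in the IR(...) style used above."""
    names = [r.name for r in Role if r in roles]
    return "IR(" + ",".join(names) + ")" if names else "IR"

print(label(Role.VP))
print(label(Role.VP | Role.GW))
```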

> >>>> Here is my understanding on what you just wrote:
> >>>>
> >>>>> The more I think about it, the more these specialized
> >>>>> VP routers
> >>>>
> >>>> I think you mean the "DITR-like" routers are VP routers. Later you
> >>>> refer to these as "IRON Default Mappers (IDMs)".  I had assumed they
> >>>> either were not VP routers, or that they need not be VP routers.
> >>>
> >>> The latter - IDMs need not also be VP routers, but they
> >>> could be.
> >>
> >> OK.
> >>
> >>
> >>>> However, this part:
> >>>>
> >>>>> On the IRON, they advertise "default"
> >>>>
> >>>> makes no sense to me.  I don't recall any IRON router advertising
> >>>> "default" on the IRON overlay network.  I understand that a VP router
> >>>> advertises its one or more VPs.
> >>>
> >>> Yes; this is new. By having the IDMs connected to the DFZ
> >>> advertise "default" on the IRON, other IRON routers that do
> >>> not connect to the DFZ can discover a nearby IDM that can
> >>> reach the non-upgraded IPv6 Internet.
> >>
> >> Assuming all IRON routers are IPv6 routers, why would they need to
> >> find another IRON router via the overlay network which could deliver
> >> packets to any IPv6 address?
> >
> > Because all IBRs have full knowledge of all VPs advertised
> > in the IRON,
>
> Yes, but with BGP in the overlay, they only get a best path to one of
> the multiple routers which advertise a VP.

The way the IRON BGP is going to work is as follows. There
will be a well-known FQDN, e.g., "isatapv2.net", which
resolves to a list of IP addresses - possibly many. Each
IP address is the RLOC of an IR(RR) per RFC4456. Quoting
from RFC4456, Section 6:

   "In a simple configuration, the backbone could be divided into many
   clusters.  Each RR would be configured with other RRs as Non-Client
   peers (thus all the RRs will be fully meshed).  The Clients will be
   configured to maintain IBGP session only with the RR in their
   cluster.  Due to route reflection, all the IBGP speakers will receive
   reflected routing information."

and this is precisely the arrangement that will be used
for the IRON "backbone". In particular, upon startup each
IR(RR) resolves the name "isatapv2.net", and forms a full
mesh of BGP sessions with all other IR(RR)s.

Then, when each IR(VP) starts up it likewise resolves the
name "isatapv2.net" and forms a BGP session with one or
a few IR(RR)s that are "nearby" (the IR(VP) can decide
for itself what constitutes "nearby"). The IR(VP)s then
advertise their VPs into the IRON, and the IR(RR)s ensure
that all VPs are reflected to all IR(VP)s.

After all of the VPs have been disseminated to all of the
IR(VP)s, then the IRON-RANGER data packet forwarding and
route optimization can be coordinated in the forwarding
plane. That's it.
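
To make the session topology concrete, here is a small Python sketch of which BGP sessions would exist under this arrangement. It does not perform any DNS lookup or speak BGP: the hard-coded address list stands in for whatever the well-known name resolves to, and the cost table is a hypothetical stand-in for however a router decides what is "nearby".

```python
from itertools import combinations

# Stand-in for the address list that the well-known name would resolve to.
# These RLOCs are documentation addresses, not real route reflectors.
RR_RLOCS = ["192.0.2.1", "198.51.100.1", "203.0.113.1"]

def rr_mesh(rrs):
    """Every pair of route reflectors peers as non-clients (full mesh)."""
    return list(combinations(sorted(rrs), 2))

def client_sessions(client, rrs, cost, k=2):
    """A client IR(VP) peers with the k route reflectors it rates as nearest."""
    nearest = sorted(rrs, key=lambda rr: cost[(client, rr)])[:k]
    return [(client, rr) for rr in nearest]

# Hypothetical cost table (for example, measured round-trip times in ms).
COST = {
    ("ir-vp-a", "192.0.2.1"): 12,
    ("ir-vp-a", "198.51.100.1"): 80,
    ("ir-vp-a", "203.0.113.1"): 35,
}

print(rr_mesh(RR_RLOCS))
print(client_sessions("ir-vp-a", RR_RLOCS, COST))
```

With n route reflectors the mesh needs n(n-1)/2 sessions, while each additional client adds only k sessions.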

> > but only some IBRs have knowledge of prefixes
> > advertised within the DFZ.
>
> I don't think any IRON routers need to know what prefixes are
> advertised in the DFZ, since they don't forward packets to DFZ routers.

IR(GW)s forward data packets from the IRON into the DFZ.

> > This latter class is known as
> > IBGs, and they advertise "default" into the IRON.
>
> Yes, but this is for the purpose of being like an Ivip DITR or LISP
> PTR, as described above.  They don't forward packets to DFZ routers
> so as far as I know, they don't need to know what prefixes their DFZ
> router neighbours are advertising best paths for.

Yes they do. Again, the IRON is used for both the control
and forwarding planes.

> >> I think the reasoning for this must come from your mixed IPv4 / IPv6
> >> plans, which I have tried to avoid thinking about so far.
> >>
> >> Can you explain more about your vision for this?
> >
> > My reasons for thinking so strictly about mixed IPv4
> > and IPv6 was the nice property of stateless address
> > mapping when only an IPv6 address is known and not the
> > corresponding IPv4 address. However, with a routing
> > protocol now in use in the IRON we have state - so, my
> > rationale no longer applies.
>
> OK - but isn't "stateless address mapping" what is contemplated below
> for these?:
>
>     IPv6-EID/IPv4-RLOC

Yes, because an IPv4-RLOC address can fit inside of
an IPv6-EID address.

>     IPv4-EID/IPv6-RLOC

No, because an IPv6-RLOC address cannot fit inside
of an IPv4-EID address.
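
The asymmetry is easy to see in a short Python sketch. It borrows the 'fe80::5efe:V4ADDR' form quoted further down in this message. The function names are made up for illustration, and 192.0.2.1 is a documentation address.

```python
import ipaddress

# fe80::5efe:0:0 is the fe80::5efe:V4ADDR form with the IPv4 bits zeroed.
BASE = int(ipaddress.IPv6Address("fe80::5efe:0:0"))

def embed(v4: str) -> ipaddress.IPv6Address:
    """Place a 32-bit IPv4 RLOC in the low 32 bits of an IPv6 address."""
    return ipaddress.IPv6Address(BASE | int(ipaddress.IPv4Address(v4)))

def extract(v6) -> ipaddress.IPv4Address:
    """Statelessly recover the IPv4 RLOC from the low 32 bits."""
    return ipaddress.IPv4Address(int(ipaddress.IPv6Address(v6)) & 0xFFFFFFFF)

eid_next_hop = embed("192.0.2.1")
print(eid_next_hop)
print(extract(eid_next_hop))
```

Going the other way would mean squeezing a 128-bit RLOC into a 32-bit EID, which cannot be done without some external state.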

> > With this in mind, IRON
> > applies equally well for IPv6-EID/IPv6-RLOC, IPv4-EID/
> > IPv4-RLOC and IPv6-EID/IPv4-RLOC (however, I need to
> > think more about IPv4/IPv6).
>
> IPv6-EID/IPv6-RLOC  This is what would happen if I-R was purely
>                     for IPv6.
>
> IPv4-EID/IPv4-RLOC  This is what would happen if I-R was purely
>                     for IPv4.
>
> IPv6-EID/IPv4-RLOC  I understand this as the ability of the mapping
>                     (which the VP router always has, as developed

VP router == IR(VP) in my nomenclature.

>                     the potentially multiple "DEL" router

DEL == IR(EID) in my nomenclature.

>                     registrations for a given EID prefix, and which
>                     it sends as "mapping" - AKA route redirection -
>                     to any IRON router which tunnels a traffic packet
>                     to this VP router) to tell IRON routers to
>                     somehow tunnel IPv6 traffic packets to an IRON
>                     router (this is where I want to use my DEL term)
>                     which will deliver them to an EID-using
>                     enterprise network which is an IPv6 network, but
>                     which presumably doesn't have native IPv6
>                     connectivity since this delivering role IRON
>                     router ("DEL"!) is on an IPv4 address.
>
>                     So this is to support isolated IPv6 networks
>                     which use I-R "edge" (EID) space and which
>                     receive their incoming packets via an IPv4
>                     service.  (Or potentially a multihomed such
>                     network where one or more of its "DEL" routers
>                     is on IPv4, rather than IPv6.)
>
> IPv4-EID/IPv6-RLOC  This would be an I-R "edge" using (EID)
>                     network which lacked IPv4 connectivity (at
>                     least for this particular "DEL" router) and
>                     so which accepted incoming packets via such
>                     a router on an IPv6 address.
>
>                     Maybe a MN doing IPv4 applications when it is
>                     physically connected only to an IPv6 network.
>
>                     Or a non-mobile network which for some reason
>                     only has an IPv6 service, but wants to run
>                     IPv4 space, using I-R "edge" EID space.
>
>
> >>>>>> They are going to be busy, depending on where they are located, the
> >>>>>> traffic patterns, how many of them there are etc.   So they need to
> >>>>>> be able to handle the cached mapping of some potentially large number
> >>>>>> of I-R end-user network prefixes.
> >>>>>
> >>>>> In the case of IPv6, I think whether the IRON Default
> >>>>> Mappers (IDMs) will be very busy depends on how large
> >>>>> the IPv6 DFZ becomes. In my understanding, the IPv6 DFZ
> >>>>> is not very big yet. So, if most IPv6 growth occurs in
> >>>>> the IRON and not in the IPv6 DFZ the packet forwarding
> >>>>> load on the IDMs might not be so great.
> >>>>
> >>>> This would only be true if you could convince most networks adopting
> >>>> IPv6 to adopt I-R at the same time.
> >>>
> >>> Well, now is the time to put forward the case for
> >>> handling new IPv6 growth in the IRON instead of in
> >>> the IPv6 DFZ. Otherwise, once growth in the IPv6
> >>> DFZ takes off and we start to see significant PI
> >>> addressing and multihoming, we will eventually
> >>> end up in the same boat we are in with the IPv4
> >>> DFZ today.
> >>
> >> OK.  But I still prefer Ivip for IPv6 since it will be able to give
> >> end-user networks, or their appointees, real-time control of
> >> tunneling behavior.  This will be advantageous for real-time
> >> responsive inbound TE and for quickly getting all traffic packets to
> >> the newly selected TTR (Translating Tunnel Router) in TTR Mobility -
> >> so the MN can quickly drop the tunnel it made to the previous TTR.
> >
> > I will have to finally take the time to understand Ivip.
> > I will try to do so soon so I can converse with you on
> > more even terms.
>
> OK - I would really appreciate this.

OK.

> >>>>> The term "bubbles" came from teredo (RFC4380). Maybe we can
> >>>>> think of a better term to use for IRON-RANGER?
> >>>>
> >>>> OK.  I don't think "bubbles" is appropriate for the registration
> >>>> methods you have described so far, or that I have suggested.
> >>>
> >>> OK. How about Channel Queries (CQs)?
> >>
> >> I don't see any "channels" and it doesn't look like a "query".
> >>
> >> In my nomenclature, it is a DEL router registering an EID prefix (I
> >> think this is the term you use in I-R) with a VPR because this VPR is
> >> one of the typically two or more VPRs which handle this VP.
> >>
> >> What about "EID Registration Message" - ERM?
> >
> > I was thinking "Prefix Control Messages (PCMs)", but I like
> > yours slightly better. I will give it more thought.
>
> OK.
>
>
>
> >> However, I think it is wildly unrealistic to assume that IPv4 will
> >> die or become anything but *the* Internet everyone relies upon for a
> >> very long time, perhaps forever.  I am not saying this is a good thing.
> >>
> >> If you can articulate your vision for mixed IPv4 and IPv6 IRON-RANGER
> >> operation, I can go along with it.  But I don't believe at all that
> >> IPv6 will take over from IPv4 for most end-users before 2020.  As I
> >> mentioned, there's still a lot of unused advertised space - and (I
> >> assume) unused unadvertised - global unicast IPv4 address space.
> >>
> >> I can't envisage a situation where it will be better to sell ordinary
> >> (non-mobile) users purely an IPv6 service, without even behind-NAT
> >> IPv4 connectivity, than to sell them a service which is either a
> >> single global unicast IPv4 address or behind-NAT IPv4.
> >>
> >> Mobile users could be different, since many functions and services
> >> suitable for hand-held cellphone-like devices could be done via IPv6
> >> - and since there would always be an option to tunnel through IPv6 to
> >> an IPv4 NAT box so people can run client-style IPv4 applications on
> >> their MN when they want to.
> >
> > For many of the reasons you have mentioned, I am going
> > to back down and say that IRON-RANGER can be agnostic to
> > whatever IPvX/IPvY protocol combination gets used. I
> > still believe that the expanded address space of IPv6
> > will eventually steer new growth toward IPv6, but I
> > won't be so brave as to guess a timeframe for this.
> >
> > Still, one of the salient features of IRON-RANGER is
> > support for IPv6 transition.
>
> OK.  I understand there is potential value in the two crossover cases
> mentioned above:
>
>    IPv6-EID/IPv4-RLOC
>    IPv4-EID/IPv6-RLOC
>
> but for now I am imagining Ivip development being completely separate
> for IPv4 and IPv6.  Before finalising any protocols, I would be keen
> to investigate possible interworking - so I am not saying it is
> impossible or a bad idea.  Just that in this stage of development I
> am keeping the two systems entirely independent.

OK, but from a legacy perspective IRON-RANGER is descended
from ISATAP which deals only with IPv6-EID/IPv4-RLOC.
(ISATAP is itself descended from mechanisms such as
RFC2529, RFC4213, etc.)

> >>>> Then there are ways of using space more efficiently, as Ivip, LISP
> >>>> and probably IRON-RANGER could do, by slicing and dicing it into much
> >>>> smaller chunks than is possible with the /24 limit on prefixes in the
> >>>> DFZ.
> >>>
> >>> OK.
> >>
> >> So to me, a successful implementation of IRON-RANGER would be as good
> >> as Ivip or LISP in enabling really high levels of address utilization
> >> in IPv4.  This will considerably extend the ability of IPv4 to handle
> >> new users, including new end-user networks which need real global
> >> unicast space (not behind-NAT) because they are running servers.
> >
> > Can do.
>
> OK.  Ivip will be able to slice and dice IPv4 space into 1, 2, 3, 4,
> 5, 6, 7 or any integer number of IPv4 addresses in a single micronet.
>  It is not restricted to "prefixes".
>
> LISP and IRON-RANGER work in prefixes, but can still do 1, 2, 4, 8
> IPv4 addresses, which will be pretty much as good for finely slicing
> the space and mapping it to wherever it is to be used.
>
> I predict the vast majority of IPv4 micronets (EID prefixes for LISP
> and I-R) will be less than 256 IPv4 addresses.  If you look at the
> huge preponderance of /24 currently advertised in the DFZ:
>
>   http://bgp.potaroo.net/as2.0/bgp-active.html
>
>   /19  18191    5.74%
>   /20  22214    7.00%
>   /21  22356    7.05%
>   /22  28914    9.12%
>   /23  28732    9.06%
>   /24 166028   52.35%
>
> I conclude that the great majority of end-user networks find 256 IP
> addresses sufficient.  Since this is the smallest number of addresses
> they can advertise in the DFZ (and have the best-paths propagated
> through the whole DFZ) I believe it is reasonable to assume that many
> would be happy with 128 addresses, 64, 32 or whatever.  Perhaps quite
> a few would be happy with 1 or 4 addresses.
>
> At present, these /24s only take up a fraction of a percent of the
> IPv4 advertised address space, so arguably they are not wasting much
> space now, by being forced to be 256 addresses when less would suit
> the needs of the advertising networks.  However, a good scalable
> routing solution will be catering to many more end-user networks than
> those which currently advertise their own prefixes in the DFZ.
> Assuming the new kind of "scalable" "edge" space of LISP, Ivip or I-R
> has few or no performance problems, then it could be widely used and
> be used in just the right quantities required, without wasting much
> space.

All right. I think I understand the concept of IPv4
micronets now. IRON-RANGER can handle this too.
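
For a sense of scale, this Python sketch splits one /24 into the prefix sizes Robin mentions (8, 4, 2 and 1 addresses). The block 192.0.2.0/24 is a documentation prefix and the helper name is invented here. Nothing in it is specific to IRON-RANGER.

```python
import ipaddress

def split(block: str, new_len: int):
    """Split an IPv4 block into equal prefixes of the given length."""
    return list(ipaddress.ip_network(block).subnets(new_prefix=new_len))

for new_len in (29, 30, 31, 32):
    parts = split("192.0.2.0/24", new_len)
    size = parts[0].num_addresses
    print(f"/{new_len}: {len(parts)} prefixes of {size} address(es) each")
```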

> >>>> I think that most growth in Internet usage will occur in the IPv4
> >>>> Internet for at least the rest of this decade.  The only time it
> >>>> would make sense to use IPv6 instead of direct IPv4 or IPv4 behind
> >>>> NAT would be for some service where it wasn't important to be able to
> >>>> connect to IPv4.  At present, you couldn't sell any such service. I
> >>>> guess that it may be possible to do this for large IP cell-phone
> >>>> deployments where there are enough IPv6 services available to do a
> >>>> reasonable subset of what people want in a hand-held device, and
> >>>> where tunneling to a server which provides behind-NAT IPv4
> >>>> connectivity would also be possible.
> >>>
> >>> I agree that the IPv4 Internet is not only not going away
> >>> but also continuing to grow. But, I still think that users
> >>> will want to have both IPv4 (behind NAT if necessary) and
> >>> IPv6 as we move forward from here.
> >>
> >> At present, there's only one scenario in which I can imagine there
> >> being a real demand among non-mobile customers for IPv6.  Let's say
> >> that one or more large mobile phone companies decides to make their
> >> new, or existing, 3G systems work with each MN having its own global
> >> unicast IPv6 address (or perhaps /64).   This would enable direct
> >> host-to-host connectivity between any of these MNs.  (Though carriers
> >> typically want to avoid this, to stop people running VoIP and instead
> >> to use their voice call services, for which they charge more than
> >> they can for basic IP connectivity).
> >>
> >> Now let's say there are hundreds of millions or billions of these
> >> MNs, each with its own global unicast IPv6 address.  That address
> >> could be stable as long as the MN is in the one carrier network.  If
> >> it roams to another network, it would probably get another address.
> >> However, the TTR Mobility system would fix this - and give each MN
> >> its own permanent /64, no matter how it connected to the Net, as long
> >> as it is via IPv6.  (I do not currently plan any connections between
> >> Ivip or TTR Mobility for IPv4 and IPv6 - best to keep them as
> >> separate systems.)
> >>
> >> In this situation, people on non-mobile networks would have a genuine
> >> reason to get native IPv6 connectivity.  Firstly, they might want to
> >> sell or give services to these MN users.  Secondly, from home, they
> >> might want to run a web-cam, file sharing, VPN or whatever which the
> >> MN could access directly, on a host-to-host basis, without mucking
> >> around with IPv4.
> >>
> >> So I can imagine this trend happening - but only once there are a
> >> substantial number of ordinary users with native IPv6 connectivity.
> >> I guess this is most likely to occur with cell-phones.
> >
> > I honestly don't know what the drivers will be, Robin,
> > but I still believe (and I still believe that the *IETF*
> > believes) that IPv6 is where we need to go in the long
> > run. Again, however, I agree with you that IPv4 will
> > still be around for a very long time.
>
> None of us know anything about the future - we have to make do with
> educated guesses.
>
> I agree we should plan for widespread IPv6 adoption.  I am only
> arguing against assumptions such as:
>
>    IPv6 widespread adoption will begin real soon now.
>
>    IPv4 usage is near its peak - so there's no need to plan for
>    it to be more widely used, to solve its scaling problem etc.
>
> For an example of the latter position, and implicitly the first, see
> Tony Li's recent message:
>
>   http://www.ietf.org/mail-archive/web/rrg/current/msg06192.html
>
>      IPv4 is done.  Over.  Cooked. Fully toast.  It will either
>      enter a black market where we deaggregate and no proposal
>      will help, or we shift to v6 and v4 is irrelevant.  In
>      either case, we're not in time to do anything significant
>      for v4.  And we still need a v6 solution, that's clearly
>      higher priority.
>
>
> >>>>> 3) IPv6 addresses can embed IPv4 addresses such that there
> >>>>>    is stateless address mapping between an EID nexthop and
> >>>>>    an RLOC.
> >>>>
> >>>> Can you explain this with an example?  I can't clearly envisage what
> >>>> you mean.
> >>>
> >>> I mean, if the IPv6 EID FIB includes entries with a next-hop
> >>> address such as: 'fe80::5efe:V4ADDR' (i.e., an IPv6 address
> >>> with embedded IPv4 address), then V4ADDR can be statelessly
> >>> extracted as the RLOC address of the ETR.
> >>
> >> So the "mapping", which the LFR-role and IDR-role routers get from
> >> the VP router is actually telling them to tunnel subsequent traffic
> >> packets to an IPv4 address?   That would only work if every LFR-role
> >> and IDR-role router had IPv4 access - unless you were to establish
> >> special routers to act as gateways for delivering to IPv4 addresses,
> >> which is not out of the question.
> >
> > Public IPv4 RLOCs that are routable within the IPv4 DFZ
> > is what I am suggesting.
>
> Yes, but if you are able to specify this in the mapping sent by a VP
> router to an IRON router which is accepting traffic packets and
> tunneling them (initially to the VP router, and then to whichever
> "DEL" role router it decides to from the mapping sent by the VP
> router in response to this initially tunneled packet), then it will
> only work if all these IRON routers (IBR and IBG in your terminology)
> can tunnel packets to any IPv4 "RLOC" address.  Yet these are all
> IRON routers which are on IPv6 addresses.
>
> They would need either direct IPv4 connectivity to do this, or a
> means of forwarding the packet, in tunneled form, to some other IPv6
> router which could send them to the IPv4 address of the "DEL" role
> IRON router.

Direct connectivity to the IPv4 Internet is required and assumed.

> Neither of these things is in the current design, as far as I know.

I'll try to make the assumption explicit.
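
A minimal sketch of the stateless extraction discussed above, for
a next hop of the form 'fe80::5efe:V4ADDR'. The function name
extract_rloc and the marker check are illustrative assumptions,
not part of any IRON specification; the only thing taken from the
discussion is that V4ADDR sits in the low 32 bits after "5efe".

```python
import ipaddress

# The "5efe" marker that precedes the embedded IPv4 address in
# next hops such as fe80::5efe:V4ADDR (assumed layout for this sketch).
ISATAP_MARKER = 0x5EFE


def extract_rloc(next_hop: str) -> ipaddress.IPv4Address:
    """Return the IPv4 address embedded in the low 32 bits of next_hop.

    Raises ValueError if the interface identifier does not carry
    the 5efe marker, i.e. no IPv4 RLOC is embedded.
    """
    addr = int(ipaddress.IPv6Address(next_hop))
    interface_id = addr & 0xFFFFFFFFFFFFFFFF   # low 64 bits
    marker = (interface_id >> 32) & 0xFFFF     # bits just above V4ADDR
    if marker != ISATAP_MARKER:
        raise ValueError(f"{next_hop} has no embedded IPv4 RLOC")
    return ipaddress.IPv4Address(interface_id & 0xFFFFFFFF)


if __name__ == "__main__":
    # 192.0.2.1 is a documentation address used purely as an example.
    print(extract_rloc("fe80::5efe:192.0.2.1"))
```

No lookup or per-destination state is involved: the RLOC is
recovered from the next-hop address alone, which is why the
router doing the extraction must itself be able to reach the
IPv4 Internet.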

> >> Also, an IPv6 VPR would need to be able to do the same thing - tunnel
> >> an IPv6 traffic packet to a DEL-role router which is actually on an
> >> IPv4 address, but which is nonetheless delivering packets to an
> >> end-user network which uses an IPv6 EID.
> >>
> >> This could be done, I guess, but there are messy PMTUD problems to
> >> solve.  I prefer not to think about such things, but for now can
> >> imagine you might want to do this, and that you could devise a way of
> >> doing it.
> >
> > SEAL should help.
>
> OK.
>
>
> >>>>>> There are two reasons an IRON router M might need to know about which
> >>>>>> other IRON routers A, B and C advertise a given VP:
> >>>>>>
> >>>>>>  1 - When M has a traffic packet.  (M is either an ordinary IRON
> >>>>>>      router and advertises the I-R "edge" space in its own network
> >>>>>>      or it is a "DITR-like" router advertising this space in the
> >>>>>>      DFZ.)  M needs to tunnel the packet to one of these VP routers.
> >>>>>>
> >>>>>>      The VP router will tunnel it to the IRON router Z it chooses as
> >>>>>>      the best one to deliver the packet to the destination network
> >>>>>>      and will send a "mapping" packet to M which will cache this
> >>>>>>      information and from then on tunnel packets matching the
> >>>>>>      end-user network prefix in the "mapping" to Z (or some other
> >>>>>>      IRON router like Z, if there were two or more in the "mapping").
> >>>>>>
> >>>>>>      In this case, M needs only the address of one of the A, B or C
> >>>>>>      routers.  Ideally it would have the address of the closest one -
> >>>>>>      but it doesn't matter too much if it has the address of a more
> >>>>>>      distant one.  That would involve a somewhat longer trip to the
> >>>>>>      VP router, and perhaps a longer or shorter trip from there to Z.
> >>>>>>      (This would typically be shorter than the path taken through
> >>>>>>      LISP-ALT's overlay network.)
> >>>>>>
> >>>>>>      After M gets the "mapping", it tunnels traffic packets to Z - so
> >>>>>>      the distance to the VP router no longer affects the path of
> >>>>>>      traffic packets.
> >>>>>>
> >>>>>>      In this case, BGP on the overlay would be perfectly good - since
> >>>>>>      it provides the best path to one of A, B or C - typically that
> >>>>>>      of the "closest" (in BGP terms).
> >>>>>>
> >>>>>>
> >>>>>>  2 - When M is one of potentially multiple IRON routers which
> >>>>>>      delivers packets to a given end-user network - packets whose
> >>>>>>      destination address matches a given end-user network prefix P.
> >>>>>>
> >>>>>>      M needs to "blow bubbles" (highly technical term from this
> >>>>>>      R&D phase of IRON-RANGER) to A, B and C.  The most obvious
> >>>>>>      way to do this is for M to be able to know, via the overlay
> >>>>>>      network the addresses of all VP routers which advertise a given
> >>>>>>      VP.  There may be two or three or a few more of these.  They
> >>>>>>      could be anywhere in the world.
> >>>>>>
> >>>>>>      BGP does not appear to be a suitable mechanism for this, since
> >>>>>>      its "best path" basic functions would only provide M with
> >>>>>>      the IP address of one of A, B and C.
> >>>>>>
> >>>>>>      You could do it with BGP, by having A, B and C all know about
> >>>>>>      each other, and with all three sending everything they get to
> >>>>>>      the others.  This is not too bad in scaling terms for two,
> >>>>>>      three or four such VP routers.
> >>>>>>
> >>>>>>      Then, M sends its registration to one of them - whichever it
> >>>>>>      gets the address of via the BGP of the overlay network - and
> >>>>>>      A, B and C compare notes so they all get the registration.
> >>>>>>
> >>>>>>      I will call this the "VP router flooding system".
> >>>>>
> >>>>> This is a nice idea. If I get what you are suggesting, each
> >>>>> IRON router that advertises the same VP (e.g., VP(x)) would
> >>>>> need to engage in a routing protocol instance with one
> >>>>> another to track all of the PI prefix registrations. The
> >>>>> problem I have with it is that that would make for perhaps
> >>>>> 10^5 or more of these little routing protocol instances as
> >>>>> well as lots and lots of manually-configured peering
> >>>>> arrangements between the IRON routers that advertise VP(x).
> >>>>
> >>>> Something like this - but I am not sure what you mean by "routing
> >>>> protocol instance".  I understand that the two or three VP routers
> >>>> for any one VP "P" do need to cooperate and share their various
> >>>> registrations.  You could either create a fresh protocol to do this,
> >>>> or push into service some existing protocol, including perhaps a
> >>>> routing protocol.
> >>>
> >>> We haven't brought the Virtual Router Redundancy Protocol (VRRP)
> >>> into discussion yet [RFC5798], but we might want to consider
> >>> looking at this as a way of providing fault tolerance for VP
> >>> routers. I'm not sure whether VRRP would also support load
> >>> balancing between the multiple routers, but it seems like
> >>> fault tolerance is the dominant consideration.
> >>
> >> I agree - fault tolerance is more important than load balancing at
> >> this stage of the design, though some form of load balancing might be
> >> possible and desirable too.
> >
> > VRRP says that load balancing is possible, but AFAICT
> > leaves it out of scope.
>
> OK.
>
> I imagine you would want load balancing with generally the "nearest"
> VP router being used, when an IBR or IBG router tunnels the initial
> one or more traffic packets (before it gets "mapping" from the VP
> router).
>
> The use of both nearest and the load sharing would generally make the
> system work better.  Also, when one VP router dies, if you had three,
> it would only affect on average 1/3 of the IBR and IBG routers.
>
> Load sharing would be vital for scaling purposes - even if VRRP
> somehow handled the robustness problem, there's no way you want all
> the initial packets for any set of EID prefixes having to go to just
> one physical router in the Net.

As I mentioned from the beginning, I am going to relax the
requirement for VRRP as an underpinning of the architecture.
Instead, VRRP can be used as an optimization if desired, but
the architecture will allow for a single VP to be advertised
by multiple IR(VP)s. That would satisfy the scenario you
are describing above.

> >> I don't want to try to read this RFC in order to imagine how it might
> >> work with I-R, so if you can describe how it would work, that would
> >> be good.
> >
> > I touched on this above, which is just about as deep
> > as my understanding goes. In a nutshell, with VRRP
> > each router shares the same IP address, and each
> > router maintains synchronized state. One of the
> > routers is chosen as the primary, and the others
> > are designated as backups. If the primary fails,
> > one of the backups takes over sort of like an
> > uninterruptible power supply.
>
> I can't see how you can have this shared IP address arrangement for
> IRON routers which are going to be in different places, and therefore
>  in different parts of the topology.  The diverse placement is
> essential for robustness - and for the goals of load-sharing with
> generally shorter paths.
>
> In the I-R overlay system, each IRON router which advertised a VP
> does so giving its IP address as being the same as the IP address it
> uses in the Internet.  This is because IRON routers playing your IBR
> or IBG roles tunnel traffic packets directly (via the Internet, not
> the I-R overlay network) to the VP router.
>
> So I don't see how you could have multiple such routers, which must
> be on different IP addresses on the Internet, behaving on the I-R
> overlay as if they all had the one IP address.

From now on, for the base case there will be multiple IR(VP)s
for each VP; each with its own IP address. However, each IR(VP)
could also be cloned using VRRP, so there may also be more than
one IR(VP) per IP address for fault tolerance.

> >>> Using VRRP also reduces the "fanout" of VP-advertising routers
> >>> to just a single RLOC address, and so makes for less complexity
> >>> in ferrying CQs around the IRON.
> >>
> >> But if all VPRs are on the one IP address, this would radically alter
> >> the nature of the overlay network.  Also a single router might be VPR
> >> for multiple VPs - so I can't see how this would work.
> >
> > No, it doesn't alter the overlay network in any way.
>
> I just wrote about how I can't see how it could work.  At some stage,
>  if you adopt VRRP, I guess you will explain exactly how it will work
> in the context of the I-R overlay.

Again, I am relaxing the use of VRRP to only being
an optimization.

> >> A quick look into this RFC:
> >>
> >>   http://tools.ietf.org/html/rfc5798#section-5.1.1.2
> >>
> >> indicates that it relies on multicast.  I think VRRP is intended for
> >> multiple routers in a single local network, where multicast could be
> >> done.  I can't imagine how you could scalably implement multicast on
> >> the I-R overlay network.
> >
> > No - not multicast over the I-R overlay network;
> > link-local multicast on an underlying link.
>
> I haven't read the VRRP RFC, but I don't understand how you could
> scalably do any multicast on the I-R overlay network.  It is a bunch
> of IRON routers, using their Internet IP addresses, with tunnels
> between them purely (at present, as best I understand it) for the
> purpose of handling BGP messages in this overlay network.  No other
> packets flow in the overlay network itself.  So I am not sure what
> "link-local" would mean in this context
>
>   http://en.wikipedia.org/wiki/Link-local_address
>
> It means within a local, physical, IPv6 network, or within
> 169.254.0.0/16 for IPv4.
>
> I understand there is an I-R overlay BGP instance for IPv6 and
> another one, involving the same (or mainly the same) routers, for
> IPv4.  The participating routers use their Internet IP addresses on
> these overlay networks, so I can't see how either IPv4 or IPv6
> "link-local" addressing or multicast could be done.

I touched on this above, so it would only serve to
confuse to try to explain it in more detail here.

> >> I think this illustrates our differing design approaches.  I think
> >> you tend to view the subsystems from a very high level - and if it
> >> looks like one might do the trick, you consider it.  I immediately
> >> want to know whether it is possible to do such things, and in this
> >> case, it took me a few minutes with a protocol I had never heard of
> >> to find a "lower level" detail which seems to preclude its use in the
> >> way you intend.
> >>
> >> I am not suggesting my approach is always the best - because I think
> >> it is important to brainstorm ideas and think loosely for a while.
> >> Too much "no, it can't be done" thinking too soon results in there
> >> being nothing to explore.
> >
> > I'm somewhat amazed by this assessment. I am very much
> > a "bottom-up" designer by nature, as can be seen in VET
> > and SEAL. Higher-level architecture descriptions are not
> > my strongest suit, but I guarantee you that everything I
> > describe has a path toward something that can actually
> > be implemented.
>
> If I was going to suggest VRRP as a solution, I would also point out
> how its mechanisms would work in the intended scenario.  Since it is
> not at all obvious how you can scalably do any kind of multicast on
> the overlay network, and since VRRP apparently relies on multicast,
> and on some other things such as the routers sharing the one IP
> address, I would have accompanied plans for VRRP with an explanation
> for how these things could in fact be done in the overlay network.
>
> But that's fine - I regard this as a brainstorming phase of design,
> so it's OK to consider things just because they look like they might
> do the job, without assuming that they really can do it.  Even if
> they can't, it might lead to a line of thought which turns out to be
> of lasting value.

OK.

> >>>> You haven't specified anything other than manual configuration for
> >>>> how an IRON router becomes a VP router.  VP routers have extra
> >>>> workload, so whoever runs such a router must have a reason to do
> >>>> this, probably involving payment of money in some way from the
> >>>> end-user networks whose EID prefixes are covered by this VP.
> >>>
> >>> Yes. End-users have to pay either a one-time or
> >>> recurring cost for their PI prefixes.
> >>
> >> OK - but what about the costs of running the IDMs, which will handle
> >> widely varying traffic loads from one EID to the next, with these
> >> loads generally having little correlation with the amount of space in
> >> the EID?
> >
> > Somehow this cost has to be factored into EID prefix
> > registry business sector's cost of doing business.
> > After all, if all the EID prefix registries did was
> > run VP routers and serve up EID prefixes, then the
> > IRON would be detached from the DFZ and kept apart
> > from a huge set of content on the Internet. So, it
> > seems like each EID prefix registry should also be
> > required to stand up an IBG.
>
> Yes - IBGs are the equivalents of Ivip DITRs and LISP PTRs.  I think
> they will need to monitor traffic on these and charge the destination
> networks accordingly.  Ivip anticipates this, but so far LISP and
> IRON-RANGER don't.

Again with the new terminology, IBG == IR(GW). But,
I'm not sure about any business model that would charge
the end-users specifically for the use of the IR(GW).
I think the business model is to charge the end-users
for the use of the VP, and then amortize the cost of
operating an IR(GW) into the VP costs passed on to
the end-user.

> >>>> If there are two or three IRON routers acting as VP routers for a
> >>>> given VP, then some organisation is responsible for that VP, is
> >>>> collecting payments as described above and is therefore the one
> >>>> organisation driving the existence of these two or three VP routers.
> >>>>  So manual configuration seems OK to me - I don't think there needs
> >>>> to be a fancy automated system by which one VP router for a given VP
> >>>> "P" would auto-discover any other VP router for "P" in the whole I-R
> >>>> system.  However, these VP routers for the one VP do need to work
> >>>> together to share registrations, and to quickly detect when one or
> >>>> more of the set becomes unreachable.
> >>>
> >>> VRRP maybe?
> >>
> >> Since it appears to involve multicast, maybe not.
> >
> > I'm pretty sure it will work.
>
> OK.
>
>
> >> It shouldn't be too hard to develop a protocol by which a handful of
> >> VPRs work together.  Maybe some existing protocols can be used as
> >> part of this.
> >
> > I really don't want to require any adjunct protocols
> > that aren't already standardized.
>
> I agree, but maybe there's nothing already in existence which will do
> the job, or do it as efficiently as a purpose-built protocol.

The question is how does an IR(EID) forward EID Prefix
Advertisements (PAs) to multiple IR(VP)s? The answer I
am planning to use is "one at a time". In other words,
I am going to ask the IR(EID) to forward each PA to
each and every IR(VP) that advertises a matching VP.
So, no purpose-built protocol between the IR(VP)s.
Does that work for you?
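
The "one at a time" rule above can be sketched as follows. This
is only an illustration under stated assumptions: the table
layout (VP prefix mapped to a list of IR(VP) addresses), the
function names, and the send callback are all hypothetical, and
the prefixes are documentation values.

```python
import ipaddress


def pa_targets(eid_prefix, vp_table):
    """List every IR(VP) whose advertised VP covers eid_prefix.

    vp_table maps a VP prefix string to the addresses of the
    IR(VP)s advertising it (assumed structure for this sketch).
    """
    eid = ipaddress.ip_network(eid_prefix)
    targets = []
    for vp, routers in vp_table.items():
        vp_net = ipaddress.ip_network(vp)
        # subnet_of() raises TypeError across IP versions, so check first.
        if eid.version == vp_net.version and eid.subnet_of(vp_net):
            targets.extend(routers)
    return targets


def send_prefix_advertisements(eid_prefix, vp_table, send):
    """Send one separate PA to each matching IR(VP), one at a time."""
    for router in pa_targets(eid_prefix, vp_table):
        send(router, eid_prefix)


if __name__ == "__main__":
    table = {
        "2001:db8::/32": ["192.0.2.1", "198.51.100.7"],
        "2001:db9::/32": ["203.0.113.9"],
    }
    send_prefix_advertisements(
        "2001:db8:1234::/48", table,
        lambda router, prefix: print(f"PA for {prefix} -> {router}"),
    )
```

The cost of this choice is that the IR(EID) must learn every
IR(VP) for a matching VP, in exchange for needing no
synchronization among the IR(VP)s themselves.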

> >>>>> For these reasons, I believe it is better for IRON router
> >>>>> M to know about all three of A, B and C and direct bubbles
> >>>>> to each of them. I think we can achieve this using OSPF
> >>>>> with the NBMA link model in the IRON overlay.
> >> A quick look at:
> >>
> >>   http://en.wikipedia.org/wiki/OSPF
> >>
> >> and the IPv4 RFC:
> >>
> >>   http://tools.ietf.org/html/rfc2328#page-19
> >>
> >> indicates that a large OSPF network is organised into various areas.
> >>
> >> How would you do this for the IRON-RANGER overlay network?  Don't
> >> OSPF and ISIS require more centralised administration, such as to
> >> structure the whole system into sub-systems and to give certain
> >> routers particular roles, on which other routers depend?
> >
> > My understanding is that the set of designated routers
> > determines each OSPF area. The name "isatapv2.net" is
> > essentially the list of designated routers for the entire
> > IRON as a single area. But admittedly, I need to do a
> > deeper dive into OSPF to prove that this is feasible.
>
> OK.

OSPF looks like a dead end due to its requirements for
using areas for very large networks. I want everything
to be one big, flat network with a backbone of IR(RR)s.
I think BGP with Route Reflectors fits the bill well.

> >> I haven't read the OSPF article, but my impression is that it is a
> >> valuable resource, with Wbenton:
> >>
> >>   http://en.wikipedia.org/wiki/User:Wbenton-test
> >>
> >> contributing many things, not least a formidable table and diagram of
> >> interdependencies between RFCs.  The diagram looks like it needs its
> >> own routing protocol!
> >
> > I appreciate all of these links, and will go chase
> > them down.
>
> OK.
>
> >>>>> Please note: the EID-based IRON overlay is configured over
> >>>>> the DFZ, which is using BGP to disseminate RLOC-based
> >>>>> prefix information. So, it is BGP in the underlay and
> >>>>> OSPF in the overlay - weird, but I think it works.
> >>>>
> >>>> Yes the DFZ uses BGP and the overlay uses . . . originally I-R used
> >>>> BGP (a separate instance of BGP in each such router).  Also, IRON
> >>>> routers don't need to be DFZ routers and in many or most cases are
> >>>> not DFZ (BR) routers - but they all communicate via tunnels which are
> >>>> carried between networks via the ordinary Internet (using the DFZ).
> >>>>
> >>>> I guess these tunnels between IRON routers will need to be manually
> >>>> configured, since they are typically between physically and
> >>>> topologically nearby routers.
> >>>
> >>> No manual config needed; the IRON is just a gigantic NBMA
> >>> link, and can use automatic tunneling the same as for VET
> >>> and ISATAP.
> >>
> >> But it is important for IRON routers to run their new BGP instance
> >> with neighbouring IRON routers which are generally physically or
> >> topologically close.  Otherwise, the "distance" metrics in the
> >> overlay network won't resemble the real "distance" to the other
> >> routers, and your routers playing the LFR or IDM role won't
> >> automatically discover the address of the "closest" VPR for a given VP.
> >
> > Do you mean distance as in hopcount? Because, every IRON
> > router is a neighbor on the link - i.e., hopcount is 1
> > always.
>
> I meant the "distance" metrics of BGP - each best-path offered by a
> neighbouring router is assessed according to the number of ASes it
> contains, and then, subject to local policy, the one with the lowest
> number of ASes is usually chosen - with this being offered to all the
> neighbours, with an additional AS added (or a few, to make it less
> attractive, if this is desired).
>
> I assume that in the BGP overlay each IRON router uses as its AS the
> AS it is within on the Internet.

The IRs actually use IBGP within the IRON, so everything
is within a *single* AS. (Yes, that's a very big AS, but
that's the way it is.) IR(GW)s use EBGP within the DFZ.

> >> These tunnels surely need to be manually configured - and that
> >> defines the membership in the I-R overlay network and its structure
> >> for the purposes of its BGP (or OSPF?) control plane.
> >
> > Automatic tunneling is the goal I am working toward.
>
> But if you have 100k IRON routers, how does any one IRON router
> decide which small subset of these to create tunnels to?

By resolving the name "isatapv2.net".

> You can't have all 100,000 IRON routers tunneling to the 99,999 other
> IRON routers.

No; the 100k IRs tunnel to one or a few IR(RR)s among
perhaps 100 or so IR(RR)s around the world. Each IR(RR)
is fully meshed to all other IR(RR)s.
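
A quick arithmetic check of the numbers above. The figure of two
IR(RR)s per IR is an assumption standing in for "one or a few";
the 100k and 100 figures are the ones used in this exchange.

```python
def full_mesh_links(n: int) -> int:
    """Number of pairwise tunnels in a full mesh of n routers."""
    return n * (n - 1) // 2


IRS = 100_000        # IRON routers, figure from the discussion
RRS = 100            # IR(RR)s, "perhaps 100 or so"
RRS_PER_IR = 2       # assumption for "one or a few"

# Every IR tunneling to every other IR: 4,999,950,000 tunnels.
flat_mesh = full_mesh_links(IRS)

# IR(RR) full mesh (4,950) plus each IR's tunnels to its
# chosen IR(RR)s (200,000): 204,950 tunnels in total.
reflector_design = full_mesh_links(RRS) + IRS * RRS_PER_IR

if __name__ == "__main__":
    print(flat_mesh, reflector_design)
```

Under these assumptions the per-router burden falls on the
IR(RR)s (99 mesh peers plus roughly 2,000 clients each on
average), while an ordinary IR keeps only a couple of tunnels.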

> If you are going to use BGP to provide each IRON router with a
> best-path to the "nearest"  VP router of several VP routers
> advertising a given VP, then you need these tunnels to be with
> topologically nearby routers.   I can't imagine how you could do this
> automatically.

When each IR that needs to form a tunnel with an IR(RR)
comes up, it checks through the list of all IR(RR) IP
addresses and picks one or a few that are "nearby". It
is up to each IR to decide what is "nearby".
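
One way an IR might make this choice, sketched under loud
assumptions: "nearby" is taken here to mean lowest measured
round-trip time, and the measurements are supplied by the caller.
Neither the metric nor the function name pick_nearby comes from
the design itself, which leaves the definition to each IR.

```python
from typing import Dict, List, Optional


def pick_nearby(rtt_ms: Dict[str, Optional[float]], count: int = 2) -> List[str]:
    """Return up to `count` IR(RR) addresses with the lowest RTT.

    rtt_ms maps each IR(RR) address (e.g. from resolving the
    IR(RR) name) to a measured RTT in milliseconds, or None if
    the probe got no answer. Unreachable entries are skipped.
    """
    reachable = {addr: rtt for addr, rtt in rtt_ms.items() if rtt is not None}
    return sorted(reachable, key=reachable.get)[:count]


if __name__ == "__main__":
    # Documentation addresses and made-up RTTs, for illustration only.
    probes = {
        "192.0.2.10": 80.0,
        "198.51.100.20": 12.5,
        "203.0.113.30": None,
        "192.0.2.40": 40.0,
    }
    print(pick_nearby(probes))
```

Any other local notion of "nearby" (AS path, administrative
preference, a static list) could be substituted for the RTT map
without changing the rest of the procedure.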

> Also, if I ran an AS with one or more IRON routers, I would want to
> manually configure which IRON routers in other ASes each one tunneled
> to, rather than trusting some automagic system to do this.  There
> will be real flows of BGP messages over those tunnels, and I would
> want most or all of them to be with ASes I had a zero-cost peering
> arrangement with.

How about all IRs within the same AS using IBGP? That
would keep the AS peering costs at zero always.

> >>>>>>>> Also, this is just for 10 minute registrations.  I recall that the 10
> >>>>>>>> minute time is directly related to the worst-case (10 minute) and
> >>>>>>>> average (5 minute) multihoming service restoration time, as per our
> >>>>>>>> previous discussions.  I think that these are rather long times.
> >>>>>>>
> >>>>>>> Well, let's touch on this a moment. The real mechanism
> >>>>>>> used for multihoming service restoration is Neighbor
> >>>>>>> Unreachability Detection. Neighbor Unreachability
> >>>>>>> Detection uses "hints of forward progress" to tell if
> >>>>>>> a neighbor has gone unreachable, and uses a default
> >>>>>>> staletime of 30sec after which a reachability probe
> >>>>>>> must be sent. This staletime can be cranked down even
> >>>>>>> further if there needs to be a more timely response to
> >>>>>>> path failure. This means that the PI prefix-refreshing
> >>>>>>> "bubbles" can be spaced out much longer - perhaps 1 every
> >>>>>>> 10hrs instead of 10min. (Maybe even 1 every 10 days!)
> >>>>>>
> >>>>>> OK, I am not sure if I ever knew the details of "Neighbor
> >>>>>> Unreachability Detection" - but shortening the time for these
> >>>>>> mechanisms raises its own scaling problems.
> >>>>>>
> >>>>>> Can you give some examples of how this would work?
> >>>>>
> >>>>> I want to go back on this notion of extended inter-bubble
> >>>>> intervals, and return to something shorter like 600sec
> >>>>> or even 60sec. There needs to be a timely flow of bubbles
> >>>>> in case one or a few IRON routers go down and need to
> >>>>> have their PI prefix registrations refreshed.
> >>>>
> >>>> OK - I will stay tuned for further details.
> >>>
> >>> Bringing VRRP into the consideration could have a
> >>> contributing factor to how long the bubble (er, CQ)
> >>> interval needs to be.
> >>
> >> I regard the whole question of registering EIDs with VPRs as being
> >> undecided until you propose an exact mechanism.
> >
> > The mechanism is periodic transmission of signed router
> > advertisements with credentials that prove ownership of
> > the advertised prefixes. These are what I have formerly
> > called "bubbles", but as discussed above we should
> > probably try for a new name.
>
> OK - but depending on your choice of BGP or OSPF I understand you
> will devise a mechanism, potentially using VRRP.

Right. The IR(EID) will forward a separate PA to each
IR(VP) that advertised a VP that covers the EID prefix.

> >>>>>> At present, I can see these choices for this registration mechanism:
> >>>>>>
> >>>>>>   1 - Keep BGP as the overlay protocol and use my proposed "VP router
> >>>>>>       flooding system".
> >>>>>>
> >>>>>>   2 - Retain your current plan of each IRON router like M needing to
> >>>>>>       know the addresses of all the routers handling a given VP (A, B
> >>>>>>       and C) which BGP can't do.  So you could:
> >>>>>>
> >>>>>>       2a - keep BGP and add some other mechanism.  Maybe M sends a
> >>>>>>            message to the one of A, B or C it has a best path to,
> >>>>>>            requesting the full list of all routers A, B and C which
> >>>>>>            handle a given VP.  When M gets the list, it sends
> >>>>>>            registration "bubbles" to the routers on the list.  This
> >>>>>>            needs to be repeated from time-to-time to discover
> >>>>>>            new VP routers.
> >>>>>>
> >>>>>>       2b - use something different from BGP which provides all the
> >>>>>>            A, B and C router addresses to every IRON router, such as
> >>>>>>            M.  This needs to dynamically change as A, B and C die and
> >>>>>>            are restarted, or joined by others.
> >>>>>
> >>>>> Right - I am still leaning toward OSPF with its NBMA
> >>>>> link model capabilities. The good news is that the
> >>>>> IRON topology itself should be relatively stable, so
> >>>>> not much churn due to dynamic updates.
> >>>>
> >>>> OK.  Since the IRON routers have their own IP addresses and are
> >>>> generally in networks multihomed by existing BGP techniques, then any
> >>>> outages don't affect the IRON routers' IP addresses or their
> >>>> tunneling arrangements.  There would still be transitory breaks in
> >>>> connectivity, before the BGP multihoming arrangements kick in.  If
> >>>> you could ignore those by some means in the overlay's routing system
> >>>> (BGP or OSPF) then yes, the IRON routers should be pretty stable.
> >>>
> >>> With VRRP, probably even more so.
> >>
> >> Or with your own purpose-designed protocol involving one, two or a
> >> few more IRON routers in their DEL-roles registering the one EID with
> >> two or maybe a few more VPRs.
> >
> > I'd really prefer not to do that if at all possible.
> > I think VRRP fits.
>
> OK - if you can make VRRP do the trick, then of course that is
> preferable to devising a new protocol.

We touched on this several times above. VRRP is an
option, and devising a new protocol is not required.

> Thanks for the continuing discussion.

Thanks - Fred
fred.l.templin@boeing.com

>   - Robin