Re: [rrg] IRON-RANGER scalability and support for packets from non-upgraded networks

"Templin, Fred L" <Fred.L.Templin@boeing.com> Thu, 18 March 2010 19:15 UTC

From: "Templin, Fred L" <Fred.L.Templin@boeing.com>
To: Robin Whittle <rw@firstpr.com.au>, RRG <rrg@irtf.org>
Date: Thu, 18 Mar 2010 12:15:39 -0700

Hi Robin,

Thanks for the continued discussion:

> -----Original Message-----
> From: Robin Whittle [mailto:rw@firstpr.com.au]
> Sent: Wednesday, March 17, 2010 7:51 PM
> To: RRG
> Cc: Templin, Fred L
> Subject: Re: [rrg] IRON-RANGER scalability and support for packets from non-upgraded networks
>
> Short version:   Continuing discussions:
>
>                     Is OSPF suitable for the I-R overlay?  BGP is
>                     highly decentralised, but I am not sure this is
>                     the case with OSPF.

This clearly needs further investigation, which I will
be working on from my end. I fully believe there is a good
fit to be found.

>                     Fred suggests  Virtual Router Redundancy Protocol
>                     (VRRP) RFC 5798 for the task of "DEL" routers
>                     registering their EID prefixes with a handful of
>                     VP routers.  My initial impression is that this
>                     won't work because it requires multicast - which
>                     I think would be impossible or at least
>                     unscalable on a global overlay network as I-R
>                     requires.

The multicast required is only link-local multicast on
a shared link that connects the multiple VP routers.
The "link" could be an ordinary physical link, or it
could even be a mesh of tunnels of some form. But, the
multicast scope is link-local only - it is not IRON-wide
nor Internet-wide multicast. I think VRRP fits the purpose.
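
To make the scope point concrete, here is a minimal sketch (an
illustration, not part of any I-R specification). It checks that
the VRRP group addresses, 224.0.0.18 for IPv4 and ff02::12 for
IPv6, fall in link-local multicast ranges that routers do not
forward off the link. The function name and the test addresses
are assumptions made for the example.

```python
from ipaddress import ip_address, ip_network

# Multicast ranges that are never forwarded beyond the local link:
# the IPv4 Local Network Control Block and IPv6 link-local scope (ff02::/16).
LINK_LOCAL_MCAST = (ip_network("224.0.0.0/24"), ip_network("ff02::/16"))


def is_link_local_multicast(group: str) -> bool:
    """Return True if 'group' is a multicast address confined to one link."""
    addr = ip_address(group)
    return any(
        addr.version == net.version and addr in net for net in LINK_LOCAL_MCAST
    )


# VRRP group addresses (IPv4 and IPv6) versus wider-scope groups.
for group in ("224.0.0.18", "ff02::12", "239.1.1.1", "ff0e::1"):
    print(group, is_link_local_multicast(group))
```

Only the first two addresses report True, which is why the VP
routers need a shared link (physical or a tunnel mesh) between
them and nothing more.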

>                     I give some names to the various roles I think
>                     IRON routers might have, and consider what
>                     combinations of roles might be valid.

I don't want to add lots of new acronyms for this, as it
rapidly becomes too complex to follow while the situation
itself is really quite simple. Consider that all IRON
routers are IRON border routers (IBRs), in that they
connect zero or more EID-based enterprise networks to
the IRON. Each IBR:

  - participates in the IRON overlay routing protocol
  - advertises zero or more VPs into the routing protocol
  - connects zero or more EID-based enterprise networks to
    the IRON
  - may or may not connect the IRON to the DFZ

I choose to view this latter category as "gateways" from
the IRON to the DFZ, so I will call these IBGs. So,
we now have only IBRs and IBGs.
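
The classification above can be sketched as a single record type
with one flag for DFZ connectivity. This is only an illustration:
the class name, field names and example prefixes are hypothetical
and do not come from any I-R document.

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Network, IPv6Network, ip_network
from typing import List, Union

Network = Union[IPv4Network, IPv6Network]


@dataclass
class IronBorderRouter:
    """Hypothetical model of an IBR; all names here are illustrative."""
    name: str
    virtual_prefixes: List[Network] = field(default_factory=list)  # zero or more VPs
    eid_networks: List[Network] = field(default_factory=list)      # zero or more EID networks
    connects_to_dfz: bool = False                                   # True makes it an IBG

    @property
    def is_vp_router(self) -> bool:
        return bool(self.virtual_prefixes)

    @property
    def is_gateway(self) -> bool:
        return self.connects_to_dfz

    def kind(self) -> str:
        # Every IRON router is an IBR; an IBG is an IBR that also reaches the DFZ.
        return "IBG" if self.is_gateway else "IBR"


plain = IronBorderRouter("r1", eid_networks=[ip_network("2001:db8:1::/48")])
vp_gateway = IronBorderRouter(
    "r2", virtual_prefixes=[ip_network("2001:db8::/32")], connects_to_dfz=True
)
print(plain.kind(), plain.is_vp_router)            # IBR False
print(vp_gateway.kind(), vp_gateway.is_vp_router)  # IBG True
```

The point of the sketch is that "VP router" and "gateway" are
properties of an IBR rather than separate router classes.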

> Hi Fred,
>
> Continuing our interesting conversation on the design of IRON-RANGER,
> you wrote:
>
> >> I assumed that these "DITR-like" routers were not necessarily VP routers.
> >
> > Correct; these routers (IDMs) may also be VP routers on
> > the IRON but need not be. So, we have three classes of
> > IRON routers: 1) VP routers, 2) IDMs, and 3) both.
>
> I understand the population of IRON routers can be classified
> according to their roles.  I am giving some new names to these roles.
>
> I think it is important to invent names for concepts in a new design,
> otherwise we have to use phrases which are longer, and might be
> written in different ways when they really refer to the same thing.
>
>    DEL Delivers packets to one or more nearby end-user networks, and
>        so needs to register the one or more EID prefixes this
>        involves with VP routers (VPRs).  (Typically 2 VPRs, but
>        I tend to think of 2, 3 or 4 or so for robustness.)
>
>
>    LFR Local Forwarding Router.  Advertises a few prefixes covering
>        all I-R "edge" space in the local routing system of the
>        network it is located in. For the purposes of discussion, I
>        will assume this is an ISP network, but it could be the
>        network of a large end-user network which has its own AS and
>        participated in the interdomain routing system.
>
>        This router is not advertising anything outside the ISP
>        network - so it is not "advertising I-R edge space in the
>        DFZ".
>
>        Any packets addressed to I-R edge space (that is to any
>        EID prefix used by an I-R-using end-user network) will
>        go to this router rather than to the ISP's BRs.  So this
>        router needs to tunnel it to one of the VPRs for the VP this
>        EID address is within.  That VPR doesn't have to be the
>        "closest" of the two or more VPRs, but if you use BGP in the
>        overlay network then it typically will be the "closest" in BGP
>        terms - which would be desirable, to reduce path lengths and
>        delays for the one or more initial packets which go via the
>        VPR.  That VPR will tunnel the packet to the IRON router
>        playing the DEL role.  The VPR will also send mapping to
>        this LFR-role router which will subsequently tunnel further
>        traffic packets whose destination address matches the EID
>        prefix in the mapping to one of the IRON routers which are
>        playing the DEL role for this prefix.
>
>        So this LFR role involves the IRON router knowing the address
>        of at least one VPR for every VP.  It finds this out, in the
>        current design, from the BGP best path it gets from the I-R
>        overlay BGP system.  However, it doesn't tunnel the packet via
>        that overlay system - it tunnels it via the Internet.
>
>        (This means the VPR must use its Internet IP address for its
>        VP advertisements on the overlay network.)

These are roles belonging to IBRs.
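
The forwarding behaviour in the quoted LFR description (first
packets go via a VP router, later packets go directly once a
mapping has been received) can be sketched as a small cache. This
is an assumption-laden illustration: the class, its methods and
the router labels are hypothetical, and the actual mapping
message format is not defined in this thread.

```python
from ipaddress import ip_address, ip_network


class MappingCache:
    """Hypothetical per-IBR cache: EID prefix -> router delivering that prefix."""

    def __init__(self, vp_routes):
        # vp_routes: list of (VP network, VP router) learned from the overlay.
        self.vp_routes = list(vp_routes)
        self.mappings = {}

    def install_mapping(self, eid_prefix, delivering_router):
        """Record a mapping received from a VP router."""
        self.mappings[ip_network(eid_prefix)] = delivering_router

    def next_hop(self, destination):
        """Prefer a cached mapping; otherwise fall back to the VP router."""
        dest = ip_address(destination)
        for prefix, router in self.mappings.items():
            if dest in prefix:
                return router
        for vp, vp_router in self.vp_routes:
            if dest in vp:
                return vp_router
        return None


cache = MappingCache([(ip_network("2001:db8::/32"), "vpr-1")])
print(cache.next_hop("2001:db8:abcd::1"))   # vpr-1 (no mapping yet)
cache.install_mapping("2001:db8:abcd::/48", "del-z")
print(cache.next_hop("2001:db8:abcd::1"))   # del-z (mapping cached)
```

Cache expiry and the choice among several delivering routers are
left out; the sketch only shows why the path via the VP router
matters for the initial packets alone.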

>    IDM IRON Default Mapper.  As for LFR, except this router is a
>        BR of the ISP and advertises all I-R edge space to
>        neighbouring ISPs - that is, to the DFZ.
>
>        As you wrote below, IDMs also advertise "default" on the
>        I-R overlay ("on the IRON") - but I don't understand the
>        purpose of this.

This is the role of the IBG, which is really just an IBR
that also happens to be a "default mapper". Hence, I
would like to deprecate the term "IDM".

>    VPR Virtual Prefix Routers.  These advertise one or more Virtual
>        Prefixes (VPs) on the I-R overlay.  Each VP covers multiple
>        (thousands to millions in principle) individual EID prefixes,
>        each of which is used by an end-user network via one or more
>        IRON routers playing the DEL role.
>
>        See LFR above for a description of the responsibilities of
>        VPR routers regarding traffic packets.
>
>        The VPR role also involves accepting registrations from DEL-
>        role routers for all the EIDs covered by the VP.  The
>        mechanisms for doing so are currently undefined, since we
>        have been discussing the inability of the BGP overlay system
>        to tell each DEL router the addresses of all VPRs for the
>        VP which matches the DEL router's EID prefix.  The new
>        arrangements may involve (at least I suggested this as a
>        possibility) the VPRs for a given VP working together to
>        share registration information.  I figure they will be run
>        by the one organization, or at least this VPR role for a given
>        VP will be controlled by the organization which runs the VP -
>        so they will presumably be coordinated in some way.  This
>        would probably mean they don't need to automatically discover
>        the other VPR-role routers which are handling a given VP.

Again, any IBR can also be a "VP Router".
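
Since the registration mechanism is still undefined, the following
is only a sketch of one possibility raised in the quoted text: the
VP routers for a given VP share each registration they receive.
Everything here (class name, method names, the peer list) is
hypothetical.

```python
class VPRouter:
    """Hypothetical VP router that shares EID registrations with its peers."""

    def __init__(self, name):
        self.name = name
        self.peers = []          # other VP routers serving the same VP
        self.registrations = {}  # EID prefix -> set of delivering routers

    def register(self, eid_prefix, delivering_router, from_peer=False):
        self.registrations.setdefault(eid_prefix, set()).add(delivering_router)
        # Forward only registrations received first-hand, so that
        # peers do not echo the same registration back and forth.
        if not from_peer:
            for peer in self.peers:
                peer.register(eid_prefix, delivering_router, from_peer=True)


a, b, c = VPRouter("A"), VPRouter("B"), VPRouter("C")
a.peers, b.peers, c.peers = [b, c], [a, c], [a, b]

# The delivering router M registers with whichever VP router it can reach.
a.register("2001:db8:abcd::/48", "M")
print(all(r.registrations == {"2001:db8:abcd::/48": {"M"}} for r in (a, b, c)))
```

With manually configured peers this needs no discovery step, at
the cost of the per-VP peering arrangements discussed later in
this message.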

> As best I understand it, any IRON router can perform the DEL role -
> it's just a matter of somehow configuring it to initiate the
> registration process for an EID prefix for an end-user network it
> can deliver packets to.

Right.

> As far as I know, IRON routers are typically not DFZ routers and are
> (to a rough approximation) not BRs - so they typically perform the
> LFR role as well.   (A BR could still perform the LFR role, by not
> advertising the I-R edge space to other ASes - only within its own
> network.)  It's just that a router which is not a BR can't advertise
> the edge prefixes in the DFZ, and so can't perform the IDM role.

What I am calling "Border Router (BR)" is any router that
can be used for getting off the IRON and onto either an
EID-based enterprise network or the DFZ. In this sense,
any router that "sinks" EID-addressed packets that belong
to neither an EID-based enterprise network nor the DFZ is
also considered a BR.

> However, a subset of IRON routers are BRs and are also configured to
> perform the IDM role.  While a router could perform purely this IDM
> role and not advertise the edge prefixes locally, I will assume this
> would not be typical.

All IRON routers are BRs (IBRs). Some IBRs are also
gateways for getting off the IRON and onto the DFZ.
These are called IBGs.

> A VPR need not be a BR.  It need not perform any other roles, but I
> guess it typically would perform some, such as DEL.
>
> Assuming that all IRON routers will, or could, perform the DEL role,
> here are the various combinations:
>
>    LFR?   IDM?   VPR?   BR?
>
>  0 -      -      -      Maybe    Just playing the DEL role.
>
>  1 -      -      VPR    Maybe    Also playing the VPR role.
>
>  2 -      IDM    -      Yes      Just playing DEL and IDM roles  -
>                                  but for some non-obvious reason not
>                                  advertising I-R edge space to local
>                                  routing system.
>
>  3 -      IDM    VPR    Yes      As for 2, but also VPR role.
>
>
>  4 LFR    -      -      Maybe    DEL and accepting packets from the
>                                  local network too.
>
>  5 LFR    -      VPR    Maybe    As for 1, but also accepting packets
>                                  from the local network.
>
>  6 LFR    IDM    -      Yes      As for 2, but also accepting packets
>                                  from the local network.
>
>  7 LFR    IDM    VPR    Yes      As for 3, but also accepting packets
>                                  from the local network.

This gets way too complex, and I believe is greatly
simplified by what I said above.

> >> Here is my understanding on what you just wrote:
> >>
> >>> The more I think about it, the more these specialized
> >>> VP routers
> >>
> >> I think you mean the "DITR-like" routers are VP routers. Later you
> >> refer to these as "IRON Default Mappers (IDMs)".  I had assumed they
> >> either were not VP routers, or that they need not be VP routers.
> >
> > The latter - IDMs need not also be VP routers, but they
> > could be.
>
> OK.
>
>
> >> However, this part:
> >>
> >>> On the IRON, they advertise "default"
> >>
> >> makes no sense to me.  I don't recall any IRON router advertising
> >> "default" on the IRON overlay network.  I understand that a VP router
> >> advertises its one or more VPs.
> >
> > Yes; this is new. By having the IDMs connected to the DFZ
> > advertise "default" on the IRON, other IRON routers that do
> > not connect to the DFZ can discover a nearby IDM that can
> > reach the non-upgraded IPv6 Internet.
>
> Assuming all IRON routers are IPv6 routers, why would they need to
> find another IRON router via the overlay network which could deliver
> packets to any IPv6 address?

Because all IBRs have full knowledge of all VPs advertised
in the IRON, but only some IBRs have knowledge of prefixes
advertised within the DFZ. This latter class is known as
IBGs, and they advertise "default" into the IRON.
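
As a rough sketch of why the "default" advertisement helps: an IBR
that does longest-prefix matching over the overlay routes will
pick a VP router when the destination falls inside a VP, and an
IBG otherwise. The table contents and router labels below are
made up for the example.

```python
from ipaddress import ip_address, ip_network


def lookup(overlay_table, destination):
    """Longest-prefix match; returns the router advertising the best route."""
    dest = ip_address(destination)
    matches = [
        (net, router)
        for net, router in overlay_table
        if dest.version == net.version and dest in net
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


overlay_table = [
    (ip_network("2001:db8:1000::/36"), "ibr-a"),  # a VP advertised by an IBR
    (ip_network("2001:db8:2000::/36"), "ibr-b"),  # another VP
    (ip_network("::/0"), "ibg-1"),                # "default" advertised by an IBG
]

print(lookup(overlay_table, "2001:db8:1234::1"))  # ibr-a
print(lookup(overlay_table, "2a00::1"))           # ibg-1
```

An IBR without DFZ routes therefore needs no extra state to reach
the non-upgraded Internet; the default route leads it to an IBG.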

> I think the reasoning for this must come from your mixed IPv4 / IPv6
> plans, which I have tried to avoid thinking about so far.
>
> Can you explain more about your vision for this?

My reason for thinking so strictly about mixed IPv4
and IPv6 was the nice property of stateless address
mapping when only an IPv6 address is known and not the
corresponding IPv4 address. However, with a routing
protocol now in use in the IRON we have state, so my
rationale no longer applies. With this in mind, IRON
applies equally well to IPv6-EID/IPv6-RLOC,
IPv4-EID/IPv4-RLOC and IPv6-EID/IPv4-RLOC (however, I
need to think more about IPv4/IPv6).

> >>>> They are going to be busy, depending on where they are located, the
> >>>> traffic patterns, how many of them there are etc.   So they need to
> >>>> be able to handle the cached mapping of some potentially large number
> >>>> of I-R end-user network prefixes.
> >>>
> >>> In the case of IPv6, I think whether the IRON Default
> >>> Mappers (IDMs) will be very busy depends on how large
> >>> the IPv6 DFZ becomes. In my understanding, the IPv6 DFZ
> >>> is not very big yet. So, if most IPv6 growth occurs in
> >>> the IRON and not in the IPv6 DFZ the packet forwarding
> >>> load on the IDMs might not be so great.
> >>
> >> This would only be true if you could convince most networks adopting
> >> IPv6 to adopt I-R at the same time.
> >
> > Well, now is the time to put forward the case for
> > handling new IPv6 growth in the IRON instead of in
> > the IPv6 DFZ. Otherwise, once growth in the IPv6
> > DFZ takes off and we start to see significant PI
> > addressing and multihoming, we will eventually
> > end up in the same boat we are in with the IPv4
> > DFZ today.
>
> OK.  But I still prefer Ivip for IPv6 since it will be able to give
> end-user networks, or their appointees, real-time control of
> tunneling behavior.  This will be advantageous for real-time
> responsive inbound TE and for quickly getting all traffic packets to
> the newly selected TTR (Translating Tunnel Router) in TTR Mobility -
> so the MN can quickly drop the tunnel it made to the previous TTR.

I will have to finally take the time to understand Ivip.
I will try to do so soon so I can converse with you on
more even terms.

> >>> The term "bubbles" came from Teredo (RFC 4380). Maybe we can
> >>> think of a better term to use for IRON-RANGER?
> >>
> >> OK.  I don't think "bubbles" is appropriate for the registration
> >> methods you have described so far, or that I have suggested.
> >
> > OK. How about Channel Queries (CQs)?
>
> I don't see any "channels" and it doesn't look like a "query".
>
> In my nomenclature, it is a DEL router registering an EID prefix (I
> think this is the term you use in I-R) with a VPR because this VPR is
> one of the typically two or more VPRs which handle this VP.
>
> What about "EID Registration Message" - ERM?

I was thinking "Prefix Control Messages (PCMs)", but I like
yours slightly better. I will give it more thought.

> >>>> I am definitely not going to try to think about mixed IPv4/v6
> >>>> implementations of I-R.  I can handle thinking about purely IPv4 and
> >>>> purely IPv6.
> >>>
> >>> I choose to think of mixed IPv4/IPv6 for at least three
> >>> reasons:
> >>>
> >>> 1) We already have global deployment of IPv4, and that won't
> >>>    go away overnight when IPv6 begins to deploy.
> >>
> >> I agree.
> >>
> >>> 2) IPv4 is fully built-out, so new growth will come via IPv6.
> >>
> >> I don't agree with this at all.  I think there's plenty of scope for
> >> more growth in the IPv4 Internet.  Fig. 11 at:
> >>
> >>   http://www.potaroo.net/tools/ipv4/
> >>
> >> shows 130 /8s worth of space is currently advertised.  Fig. 5 shows
> >> this in more detail.  Of the /8s up to 223, a handful can't be used
> >> (127, 0 maybe).  There are still a bunch of /8s which are
> >> unadvertised.  As time progresses, this space will be too valuable to
> >> use internally, probably inefficiently - so I expect quite a lot of
> >> that will be made available and advertised too.
> >
> > OK, but how bad would it be if we just let IPv4 address
> > depletion run out under the current system, then jack up
> > to IPv6 in parallel to handle PI addressing and multihoming?
>
> I am interested in exploring the IRON-RANGER design - including for
> mixed IPv4 and IPv6, because I find this stuff generally interesting
> and a good way to learn about scalable routing.  Maybe at the end of
> this process I might think that IRON-RANGER is practical and in some
> ways desirable compared to Ivip or LISP, which are the only two I
> consider potentially practical or desirable at present.  (msg06219)
>
> A likely outcome is that this process will prompt me to think of
> improvements to Ivip - since previous improvements to Ivip came from
> thinking about other proposals, not from a conscious effort to
> improve Ivip.
>
> However, I think it is wildly unrealistic to assume that IPv4 will
> die or become anything but *the* Internet everyone relies upon for a
> very long time, perhaps forever.  I am not saying this is a good thing.
>
> If you can articulate your vision for mixed IPv4 and IPv6 IRON-RANGER
> operation, I can go along with it.  But I don't believe at all that
> IPv6 will take over from IPv4 for most end-users before 2020.  As I
> mentioned, there's still a lot of unused advertised space - and (I
> assume) unused unadvertised - global unicast IPv4 address space.
>
> I can't envisage a situation where it will be better to sell ordinary
> (non-mobile) users purely an IPv6 service, without even behind-NAT
> IPv4 connectivity, than to sell them a service which is either a
> single global unicast IPv4 address or behind-NAT IPv4.
>
> Mobile users could be different, since many functions and services
> suitable for hand-held cellphone-like devices could be done via IPv6
> - and since there would always be an option to tunnel through IPv6 to
> an IPv4 NAT box so people can run client-style IPv4 applications on
> their MN when they want to.

For many of the reasons you have mentioned, I am going
to back down and say that IRON-RANGER can be agnostic to
whatever IPvX/IPvY protocol combination gets used. I
still believe that the expanded address space of IPv6
will eventually steer new growth toward IPv6, but I
won't be so brave as to guess a timeframe for this.

Still, one of the salient features of IRON-RANGER is
support for IPv6 transition.

> >> Then there are ways of using space more efficiently, as Ivip, LISP
> >> and probably IRON-RANGER could do, by slicing and dicing it into much
> >> smaller chunks than is possible with the /24 limit on prefixes in the
> >> DFZ.
> >
> > OK.
>
> So to me, a successful implementation of IRON-RANGER would be as good
> as Ivip or LISP in enabling really high levels of address utilization
> in IPv4.  This will considerably extend the ability of IPv4 to handle
> new users, including new end-user networks which need real global
> unicast space (not behind-NAT) because they are running servers.

Can do.

> >> I think that most growth in Internet usage will occur in the IPv4
> >> Internet for at least the rest of this decade.  The only time it
> >> would make sense to use IPv6 instead of direct IPv4 or IPv4 behind
> >> NAT would be for some service where it wasn't important to be able to
> >> connect to IPv4.  At present, you couldn't sell any such service. I
> >> guess that it may be possible to do this for large IP cell-phone
> >> deployments where there are enough IPv6 services available to do a
> >> reasonable subset of what people want in a hand-held device, and
> >> where tunneling to a server which provides behind-NAT IPv4
> >> connectivity would also be possible.
> >
> > I agree that the IPv4 Internet is not only not going away
> > but also continuing to grow. But, I still think that users
> > will want to have both IPv4 (behind NAT if necessary) and
> > IPv6 as we move forward from here.
>
> At present, there's only one scenario in which I can imagine there
> being a real demand among non-mobile customers for IPv6.  Let's say
> that one or more large mobile phone companies decides to make their
> new, or existing, 3G systems work with each MN having its own global
> unicast IPv6 address (or perhaps /64).   This would enable direct
> host-to-host connectivity between any of these MNs.  (Though carriers
> typically want to avoid this, to stop people running VoIP and instead
> to use their voice call services, for which they charge more than
> they can for basic IP connectivity).
>
> Now let's say there are hundreds of millions or billions of these
> MNs, each with its own global unicast IPv6 address.  That address
> could be stable as long as the MN is in the one carrier network.  If
> it roams to another network, it would probably get another address.
> However, the TTR Mobility system would fix this - and give each MN
> its own permanent /64, no matter how it connected to the Net, as long
> as it is via IPv6.  (I do not currently plan any connections between
> Ivip or TTR Mobility for IPv4 and IPv6 - best to keep them as
> separate systems.)
>
> In this situation, people on non-mobile networks would have a genuine
> reason to get native IPv6 connectivity.  Firstly, they might want to
> sell or give services to these MN users.  Secondly, from home, they
> might want to run a web-cam, file sharing, VPN or whatever which the
> MN could access directly, on a host-to-host basis, without mucking
> around with IPv4.
>
> So I can imagine this trend happening - but only once there are a
> substantial number of ordinary users with native IPv6 connectivity.
> I guess this is most likely to occur with cell-phones.

I honestly don't know what the drivers will be, Robin,
but I still believe (and I still believe that the *IETF*
believes) that IPv6 is where we need to go in the long
run. Again, however, I agree with you that IPv4 will
still be around for a very long time.

> >>> 3) IPv6 addresses can embed IPv4 addresses such that there
> >>>    is stateless address mapping between an EID nexthop and
> >>>    an RLOC.
> >>
> >> Can you explain this with an example?  I can't clearly envisage what
> >> you mean.
> >
> > I mean, if the IPv6 EID FIB includes entries with a next-hop
> > address such as: 'fe80::5efe:V4ADDR' (i.e., an IPv6 address
> > with embedded IPv4 address), then V4ADDR can be statelessly
> > extracted as the RLOC address of the ETR.
>
> So the "mapping", which the LFR-role and IDM-role routers get from
> the VP router is actually telling them to tunnel subsequent traffic
> packets to an IPv4 address?   That would only work if every LFR-role
> and IDM-role router had IPv4 access - unless you were to establish
> special routers to act as gateways for delivering to IPv4 addresses,
> which is not out of the question.

Public IPv4 RLOCs that are routable within the IPv4 DFZ
are what I am suggesting.
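
The stateless mapping mentioned in the quoted text (a next-hop of
the form 'fe80::5efe:V4ADDR') can be sketched as below. The
function name and the example address 192.0.2.1 are assumptions
for illustration; only the address layout comes from the
discussion above.

```python
from ipaddress import IPv4Address, IPv6Address

V4_MARKER = 0x00005EFE  # the '0000:5efe' bits preceding the embedded IPv4 address


def embedded_rloc(next_hop: str) -> IPv4Address:
    """Extract the IPv4 RLOC embedded in a 'fe80::5efe:V4ADDR' next-hop."""
    value = int(IPv6Address(next_hop))
    if (value >> 32) & 0xFFFFFFFF != V4_MARKER:
        raise ValueError(f"{next_hop} does not embed an IPv4 address")
    # The low-order 32 bits are the IPv4 address itself; no lookup is needed.
    return IPv4Address(value & 0xFFFFFFFF)


print(embedded_rloc("fe80::5efe:192.0.2.1"))  # 192.0.2.1
print(embedded_rloc("fe80::5efe:c000:201"))   # 192.0.2.1
```

No table is consulted at any point, which is the sense in which
the mapping is stateless.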

> Also, an IPv6 VPR would need to be able to do the same thing - tunnel
> an IPv6 traffic packet to a DEL-role router which is actually on an
> IPv4 address, but which is nonetheless delivering packets to an
> end-user network which uses an IPv6 EID.
>
> This could be done, I guess, but there are messy PMTUD problems to
> solve.  I prefer not to think about such things, but for now can
> imagine you might want to do this, and that you could devise a way of
> doing it.

SEAL should help.

> >>>> There are two reasons an IRON router M might need to know about which
> >>>> other IRON routers A, B and C advertise a given VP:
> >>>>
> >>>>  1 - When M has a traffic packet.  (M is either an ordinary IRON
> >>>>      router and advertises the I-R "edge" space in its own network
> >>>>      or it is a "DITR-like" router advertising this space in the
> >>>>      DFZ.)  M needs to tunnel the packet to one of these VP routers.
> >>>>
> >>>>      The VP router will tunnel it to the IRON router Z it chooses as
> >>>>      the best one to deliver the packet to the destination network
> >>>>      and will send a "mapping" packet to M which will cache this
> >>>>      information and from then on tunnel packets matching the
> >>>>      end-user network prefix in the "mapping" to Z (or some other
> >>>>      IRON router like Z, if there were two or more in the "mapping").
> >>>>
> >>>>      In this case, M needs only the address of one of the A, B or C
> >>>>      routers.  Ideally it would have the address of the closest one -
> >>>>      but it doesn't matter too much if it has the address of a more
> >>>>      distant one.  That would involve a somewhat longer trip to the
> >>>>      VP router, and perhaps a longer or shorter trip from there to Z.
> >>>>      (This would typically be shorter than the path taken through
> >>>>      LISP-ALT's overlay network.)
> >>>>
> >>>>      After M gets the "mapping", it tunnels traffic packets to Z - so
> >>>>      the distance to the VP router no longer affects the path of
> >>>>      traffic packets.
> >>>>
> >>>>      In this case, BGP on the overlay would be perfectly good - since
> >>>>      it provides the best path to one of A, B or C - typically that
> >>>>      of the "closest" (in BGP terms).
> >>>>
> >>>>
> >>>>  2 - When M is one of potentially multiple IRON routers which
> >>>>      delivers packets to a given end-user network - packets whose
> >>>>      destination address matches a given end-user network prefix P.
> >>>>
> >>>>      M needs to "blow bubbles" (highly technical term from this
> >>>>      R&D phase of IRON-RANGER) to A, B and C.  The most obvious
> >>>>      way to do this is for M to be able to know, via the overlay
> >>>>      network the addresses of all VP routers which advertise a given
> >>>>      VP.  There may be two or three or a few more of these.  They
> >>>>      could be anywhere in the world.
> >>>>
> >>>>      BGP does not appear to be a suitable mechanism for this, since
> >>>>      its "best path" basic functions would only provide M with
> >>>>      the IP address of one of A, B and C.
> >>>>
> >>>>      You could do it with BGP, by having A, B and C all know about
> >>>>      each other, and with all three sending everything they get to
> >>>>      the others.  This is not too bad in scaling terms for two,
> >>>>      three or four such VP routers.
> >>>>
> >>>>      Then, M sends its registration to one of them - whichever it
> >>>>      gets the address of via the BGP of the overlay network - and
> >>>>      A, B and C compare notes so they all get the registration.
> >>>>
> >>>>      I will call this the "VP router flooding system".
> >>>
> >>> This is a nice idea. If I get what you are suggesting, each
> >>> IRON router that advertises the same VP (e.g., VP(x)) would
> >>> need to engage in a routing protocol instance with one
> >>> another to track all of the PI prefix registrations. The
> >>> problem I have with it is that that would make for perhaps
> >>> 10^5 or more of these little routing protocol instances as
> >>> well as lots and lots of manually-configured peering
> >>> arrangements between the IRON routers that advertise VP(x).
> >>
> >> Something like this - but I am not sure what you mean by "routing
> >> protocol instance".  I understand that the two or three VP routers
> >> for any one VP "P" do need to cooperate and share their various
> >> registrations.  You could either create a fresh protocol to do this,
> >> or push into service some existing protocol, including perhaps a
> >> routing protocol.
> >
> > We haven't brought the Virtual Router Redundancy Protocol (VRRP)
> > into discussion yet [RFC5798], but we might want to consider
> > looking at this as a way of providing fault tolerance for VP
> > routers. I'm not sure whether VRRP would also support load
> > balancing between the multiple routers, but it seems like
> > fault tolerance is the dominant consideration.
>
> I agree - fault tolerance is more important than load balancing at
> this stage of the design, though some form of load balancing might be
> possible and desirable too.

VRRP says that load balancing is possible, but AFAICT
leaves it out of scope.

> I don't want to try to read this RFC in order to imagine how it might
> work with I-R, so if you can describe how it would work, that would
> be good.

I touched on this above, which is just about as deep
as my understanding goes. In an nutshell, with VRRP
each router shares the same IP address, and each
router maintains synchronized state. One of the
routers is chosen as the primary, and the others
are designated as backups. If the primary fails,
one of the backups takes over sort of like an
uninterruptible power supply.
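
A minimal sketch of that primary/backup behavior, as a toy
model only. The class and function names are made up for
illustration; the priority-based election and the
Master_Down_Interval formula follow RFC 5798, but nothing
here is an IRON or VRRP implementation.

```python
from dataclasses import dataclass


@dataclass
class VrrpRouter:
    """Hypothetical stand-in for one router in a VRRP group."""
    name: str
    priority: int       # 1..254 for backups; 255 is the address owner
    alive: bool = True


def master_down_interval(priority, adver_interval_cs=100):
    """Centiseconds a backup waits before declaring the master down.

    RFC 5798: Skew_Time = ((256 - priority) * Master_Adver_Interval) / 256
              Master_Down_Interval = 3 * Master_Adver_Interval + Skew_Time
    """
    skew_time = ((256 - priority) * adver_interval_cs) // 256
    return 3 * adver_interval_cs + skew_time


def elect_master(routers):
    """Pick the live router with the highest priority, or None."""
    live = [r for r in routers if r.alive]
    return max(live, key=lambda r: r.priority) if live else None


group = [VrrpRouter("A", 200), VrrpRouter("B", 150), VrrpRouter("C", 100)]
primary = elect_master(group)       # router A while it is alive
group[0].alive = False              # A fails
takeover = elect_master(group)      # router B takes over
```

Higher-priority backups get a shorter skew time, so the
best backup times out first and claims the shared address.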

> > Using VRRP also reduces the "fanout" of VP-advertising routers
> > to just a single RLOC address, and so makes for less complexity
> > in ferrying CQs around the IRON.
>
> But if all VPRs are on the one IP address, this would radically alter
> the nature of the overlay network.  Also a single router might be VPR
> for multiple VPs - so I can't see how this would work.

No, it doesn't alter the overlay network in any way.

> A quick look into this RFC:
>
>   http://tools.ietf.org/html/rfc5798#section-5.1.1.2
>
> indicates that it relies on multicast.  I think VRRP is intended for
> multiple routers in a single local network, where multicast could be
> done.  I can't imagine how you could scalably implement multicast on
> the I-R overlay network.

No - not multicast over the I-R overlay network;
link-local multicast on an underlying link.

> I think this illustrates our differing design approaches.  I think
> you tend to view the subsystems from a very high level - and if it
> looks like one might do the trick, you consider it.  I immediately
> want to know whether it is possible to do such things, and in this
> case, it took me a few minutes with a protocol I had never heard of
> to find a "lower level" detail which seems to preclude its use in the
> way you intend.
>
> I am not suggesting my approach is always the best - because I think
> it is important to brainstorm ideas and think loosely for a while.
> Too much "no, it can't be done" thinking too soon results in there
> being nothing to explore.

I'm somewhat amazed by this assessment. I am very much
a "bottom-up" designer by nature, as can be seen in VET
and SEAL. Higher-level architecture descriptions are not
my strongest suit, but I guarantee you that everything I
describe has a path toward something that can actually
be implemented.

> >> You haven't specified anything other than manual configuration for
> >> how an IRON router becomes a VP router.  VP routers have extra
> >> workload, so whoever runs such a router must have a reason to do
> >> this, probably involving payment of money in some way from the
> >> end-user networks whose EID prefixes are covered by this VP.
> >
> > Yes. End-users have to pay either a one-time or
> > recurring cost for their PI prefixes.
>
> OK - but what about the costs of running the IDMs, which will handle
> widely varying traffic loads from one EID to the next, with these
> loads generally having little correlation with the amount of space in
> the EID?

Somehow this cost has to be factored into the EID prefix
registry business sector's cost of doing business.
After all, if all the EID prefix registries did was
run VP routers and serve up EID prefixes, then the
IRON would be detached from the DFZ and kept apart
from a huge set of content on the Internet. So, it
seems like each EID prefix registry should also be
required to stand up an IBG.

> >> If there are two or three IRON routers acting as VP routers for a
> >> given VP, then some organisation is responsible for that VP, is
> >> collecting payments as described above and is therefore the one
> >> organisation driving the existence of these two or three VP routers.
> >>  So manual configuration seems OK to me - I don't think there needs
> >> to be a fancy automated system by which one VP router for a given VP
> >> "P" would auto-discover any other VP router for "P" in the whole I-R
> >> system.  However, these VP routers for the one VP do need to work
> >> together to share registrations, and to quickly detect when one or
> >> more of the set becomes unreachable.
> >
> > VRRP maybe?
>
> Since it appears to involve multicast, maybe not.

I'm pretty sure it will work.

> It shouldn't be too hard to develop a protocol by which a handful of
> VPRs work together.  Maybe some existing protocols can be used as
> part of this.

I really don't want to require any adjunct protocols
that aren't already standardized.

> >>> For these reasons, I believe it is better for IRON router
> >>> M to know about all three of A, B and C and direct bubbles
> >>> to each of them. I think we can achieve this using OSPF
> >>> with the NBMA link model in the IRON overlay.
> >>
> >> OK - but I guess that means not running BGP.  I don't know anything
> >> about OSPF or its scaling properties.  BGP has no central
> >> coordination - something which is understandably attractive to many
> >> people.  Does OSPF have central coordination, single points of
> >> failure etc.?
> >
> > In this case, central coordination would be through
> > maintenance of the domainname-to-RLOC mappings for
> > the FQDN "isatapv2.net". In other words, when a new
> > IDM comes into existence its RLOC address gets added
> > to the DNS RR's for "isatapv2.net". In the same way,
> > when an existing IDM is decommissioned its RLOC address
> > is removed.
> >
> > Currently, "isatapv2.net" is registered to me. Do you
> > trust me to maintain it properly? :^}
>
> Sure!
>
> Whoever runs it needs to have some fancy way of recognising or
> rejecting attempts to register whatever needs to be in this branch of
> the DNS.
>
> But how is OSPF structured?  BGP is flat and egalitarian, with links
> between nearby routers all that is required - and of course care
> about which prefixes are advertised.
>
> You could chop the whole BGP-based interdomain routing system into
> two or more pieces and they would keep running, just fine - although
> of course each only with a subset of the prefixes.
>
> A quick look at:
>
>   http://en.wikipedia.org/wiki/OSPF
>
> and the IPv4 RFC:
>
>   http://tools.ietf.org/html/rfc2328#page-19
>
> indicates that a large OSPF network is organised into various areas.
>  How would you do this for the IRON-RANGER overlay network?  Don't
> OSPF and ISIS require more centralised administration, such as to
> structure the whole system into sub-systems and to give certain
> routers particular roles, on which other routers depend?

My understanding is that the set of designated routers
determines each OSPF area. The name "isatapv2.net" is
essentially the list of designated routers for the entire
IRON as a single area. But admittedly, I need to do a
deeper dive into OSPF to prove that this is feasible.
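
To make the DNS part concrete, here is a rough sketch of how
an IRON router might turn the name into a list of RLOC
addresses. The function name idm_rlocs and the injectable
resolve parameter are assumptions for illustration;
socket.getaddrinfo is the standard-library resolver call.
How that list would feed OSPF is still the open question
above.

```python
import socket


def idm_rlocs(name="isatapv2.net", resolve=socket.getaddrinfo):
    """Return the sorted, de-duplicated addresses published for `name`.

    `resolve` defaults to the system resolver; a fake resolver can be
    passed in for testing without touching the network.
    """
    records = resolve(name, None)
    # Each record is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({sockaddr[0] for _, _, _, _, sockaddr in records})
```

Adding or decommissioning an IDM then shows up as a change
in the returned list the next time the name is resolved.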

> I haven't read the OSPF article, but my impression is that it is a
> valuable resource, with Wbenton:
>
>   http://en.wikipedia.org/wiki/User:Wbenton-test
>
> contributing many things, not least a formidable table and diagram of
> interdependencies between RFCs.  The diagram looks like it needs its
> own routing protocol!

I appreciate all of these links, and will go chase
them down.

> >>> Please note: the EID-based IRON overlay is configured over
> >>> the DFZ, which is using BGP to disseminate RLOC-based
> >>> prefix information. So, it is BGP in the underlay and
> >>> OSPF in the overlay - weird, but I think it works.
> >>
> >> Yes the DFZ uses BGP and the overlay uses . . . originally I-R used
> >> BGP (a separate instance of BGP in each such router).  Also, IRON
> >> routers don't need to be DFZ routers and in many or most cases are
> >> not DFZ (BR) routers - but they all communicate via tunnels which are
> >> carried between networks via the ordinary Internet (using the DFZ).
> >>
> >> I guess these tunnels between IRON routers will need to be manually
> >> configured, since they are typically between physically and
> >> topologically nearby routers.
> >
> > No manual config needed; the IRON is just a gigantic NBMA
> > link, and can use automatic tunneling the same as for VET
> > and ISATAP.
>
> But it is important for IRON routers to run their new BGP instance
> with neighbouring IRON routers which are generally physically or
> topologically close.  Otherwise, the "distance" metrics in the
> overlay network won't resemble the real "distance" to the other
> routers, and your routers playing the LFR or IDM role won't
> automatically discover the address of the "closest" VPR for a given VP.

Do you mean distance as in hopcount? Because, every IRON
router is a neighbor on the link - i.e., hopcount is 1
always.

> These tunnels surely need to be manually configured - and that
> defines the membership in the I-R overlay network and its structure
> for the purposes of its BGP (or OSPF?) control plane.

Automatic tunneling is the goal I am working toward.

> >>>>>> Also, this is just for 10 minute registrations.  I recall that the 10
> >>>>>> minute time is directly related to the worst-case (10 minute) and
> >>>>>> average (5 minute) multihoming service restoration time, as per our
> >>>>>> previous discussions.  I think that these are rather long times.
> >>>>>
> >>>>> Well, let's touch on this a moment. The real mechanism
> >>>>> used for multihoming service restoration is Neighbor
> >>>>> Unreachability Detection. Neighbor Unreachability
> >>>>> Detection uses "hints of forward progress" to tell if
> >>>>> a neighbor has gone unreachable, and uses a default
> >>>>> staletime of 30sec after which a reachability probe
> >>>>> must be sent. This staletime can be cranked down even
> >>>>> further if there needs to be a more timely response to
> >>>>> path failure. This means that the PI prefix-refreshing
> >>>>> "bubbles" can be spaced out much longer - perhaps 1 every
> >>>>> 10hrs instead of 10min. (Maybe even 1 every 10 days!)
> >>>>
> >>>> OK, I am not sure if I ever knew the details of "Neighbor
> >>>> Unreachability Detection" - but shortening the time for these
> >>>> mechanisms raises its own scaling problems.
> >>>>
> >>>> Can you give some examples of how this would work?
> >>>
> >>> I want to go back on this notion of extended inter-bubble
> >>> intervals, and return to something shorter like 600sec
> >>> or even 60sec. There needs to be a timely flow of bubbles
> >>> in case one or a few IRON routers go down and need to
> >>> have their PI prefix registrations refreshed.
> >>
> >> OK - I will stay tuned for further details.
> >
> > > Bringing VRRP into consideration could be a
> > > contributing factor in how long the bubble (er, CQ)
> > > interval needs to be.
>
> I regard the whole question of registering EIDs with VPRs as being
> undecided until you propose an exact mechanism.

The mechanism is periodic transmission of signed router
advertisements with credentials that prove ownership of
the advertised prefixes. These are what I have formerly
called "bubbles", but as discussed above we should
probably try for a new name.
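
A minimal sketch of that registration loop, under loud
assumptions: an HMAC over a shared key stands in for the
real credentials (which are not specified here), the 600sec
lifetime is just the interval discussed earlier, and the
names VpRouter and sign_advertisement are hypothetical.
There is no replay protection in this toy version.

```python
import hashlib
import hmac
import ipaddress


def sign_advertisement(key: bytes, prefix: str, seq: int) -> bytes:
    """Tag a (prefix, sequence number) pair; HMAC is a placeholder credential."""
    msg = f"{ipaddress.ip_network(prefix)}|{seq}".encode()
    return hmac.new(key, msg, hashlib.sha256).digest()


class VpRouter:
    """Toy VP router holding soft-state PI prefix registrations."""

    def __init__(self, keys, lifetime=600):
        self.keys = keys            # prefix -> key of its legitimate owner
        self.lifetime = lifetime    # seconds a registration stays valid
        self.registrations = {}     # prefix -> (rloc, expiry time)

    def receive(self, prefix, rloc, seq, tag, now):
        """Accept the advertisement only if the tag verifies."""
        key = self.keys.get(prefix)
        if key is None:
            return False
        expected = sign_advertisement(key, prefix, seq)
        if not hmac.compare_digest(expected, tag):
            return False
        self.registrations[prefix] = (rloc, now + self.lifetime)
        return True

    def lookup(self, prefix, now):
        """Return the registered RLOC, or None if missing or expired."""
        entry = self.registrations.get(prefix)
        if entry is None or entry[1] <= now:
            return None
        return entry[0]
```

The registration is soft state: if the advertisements stop,
the entry simply ages out after one lifetime.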

> >>>> At present, I can see these choices for this registration mechanism:
> >>>>
> >>>>   1 - Keep BGP as the overlay protocol and use my proposed "VP router
> >>>>       flooding system".
> >>>>
> >>>>   2 - Retain your current plan of each IRON router like M needing to
> >>>>       know the addresses of all the routers handling a given VP (A, B
> >>>>       and C) which BGP can't do.  So you could:
> >>>>
> >>>>       2a - keep BGP and add some other mechanism.  Maybe M sends a
> >>>>            message to the one of A, B or C it has a best path to,
> >>>>            requesting the full list of all routers A, B and C which
> >>>>            handle a given VP.  When M gets the list, it sends
> >>>>            registration "bubbles" to the routers on the list.  This
> >>>>            needs to be repeated from time-to-time to discover
> >>>>            new VP routers.
> >>>>
> >>>>       2b - use something different from BGP which provides all the
> >>>>            A, B and C router addresses to every IRON router, such as
> >>>>            M.  This needs to dynamically change as A, B and C die and
> >>>>            are restarted, or joined by others.
> >>>
> >>> Right - I am still leaning toward OSPF with its NBMA
> >>> link model capabilities. The good news is that the
> >>> IRON topology itself should be relatively stable, so
> >>> not much churn due to dynamic updates.
> >>
> >> OK.  Since the IRON routers have their own IP addresses and are
> >> generally in networks multihomed by existing BGP techniques, then any
> >> outages don't affect the IRON routers' IP addresses or their
> >> tunneling arrangements.  There would still be transitory breaks in
> >> connectivity, before the BGP multihoming arrangements kick in.  If
> >> you could ignore those by some means in the overlay's routing system
> >> (BGP or OSPF) then yes, the IRON routers should be pretty stable.
> >
> > > With VRRP, probably even more so.
>
> Or with your own purpose-designed protocol involving one, two or a
> few more IRON routers in their DEL-roles registering the one EID with
> two or maybe a few more VPRs.

I'd really prefer not to do that if at all possible.
I think VRRP fits.

Thanks - Fred
fred.l.templin@boeing.com

>   - Robin