Re: [rrg] IRON (RANGER): description and draft critique

"Templin, Fred L" <Fred.L.Templin@boeing.com> Wed, 10 February 2010 00:21 UTC

From: "Templin, Fred L" <Fred.L.Templin@boeing.com>
To: Robin Whittle <rw@firstpr.com.au>, RRG <rrg@irtf.org>
Date: Tue, 09 Feb 2010 16:21:58 -0800

Hi Robin,

See below for follow-up:

> -----Original Message-----
> From: Robin Whittle [mailto:rw@firstpr.com.au]
> Sent: Monday, February 08, 2010 10:24 PM
> To: RRG
> Cc: Templin, Fred L
> Subject: IRON (RANGER): description and draft critique
>
> Here is my current understanding of Fred Templin's IRON Core-Edge
> Separation scalable routing proposal.  Its proper name (msg05979) is
> "IRON-RANGER", but I am using "IRON" for short.
>
> The proposal was called RANGER, but RANGER is an over-arching system
> capable of many things, and there is a new ID IRON to explain how
> RANGER, SEAL and VET are used for scalable routing:
>
>   http://tools.ietf.org/html/draft-templin-iron-00
>
> My understanding is incomplete, and so has questions and suggestions.
>
> At the end, I have a draft critique.  I am relying on Fred to review
> all this, suggest corrections etc.  Then I hope to be able to
> finalise the critique.
>
> I think IRON has some interesting characteristics, including being
> able to handle packets without the "initial packet delays" (actually
> "initial packets being dropped, and then later ones being tunneled")
> of LISP-ALT.  IRON also operates without a mapping system in the
> usual sense of the word.  There is a two-stage arrangement by which
> initial packets get to the destination network, which is replaced by
> a direct path after that.
>
> I don't think IRON would be as good as Ivip, but I suggest that
> anyone interested in Core-Edge Separation architectures would find it
> intriguing.
>
>   - Robin
>
>
>
> The reference documents are, in order of importance:
>
> Discussions between Fred and me recently.  Generally the later ones
> are more relevant, but the one marked ** is where Fred gave the best
> initial account of IRON.
>
>   RANGER and SEAL critique
>   http://www.ietf.org/mail-archive/web/rrg/current/msg05796.html RW
>   http://www.ietf.org/mail-archive/web/rrg/current/msg05803.html   FT
>   http://www.ietf.org/mail-archive/web/rrg/current/msg05806.html RW
>   http://www.ietf.org/mail-archive/web/rrg/current/msg05807.html   FT
>   http://www.ietf.org/mail-archive/web/rrg/current/msg05810.html RW
> **http://www.ietf.org/mail-archive/web/rrg/current/msg05815.html   FT
>   http://www.ietf.org/mail-archive/web/rrg/current/msg05817.html RW
>   http://www.ietf.org/mail-archive/web/rrg/current/msg05889.html RW
>   http://www.ietf.org/mail-archive/web/rrg/current/msg05937.html   FT
>
> I haven't yet replied to Fred's last message, but we have been
> communicating off-list too.  He has since written the IRON ID, so the
> following explanation is really a response to that ID and the last
> message above.
>
>   See also the RFC-to-be from:
>   http://tools.ietf.org/html/draft-templin-ranger-09
>
>   http://tools.ietf.org/html/draft-russert-rangers-01
>   http://tools.ietf.org/html/draft-templin-intarea-vet-06
>
> Regarding SEAL tunneling with PMTUD, see draft-templin-intarea-seal-08,
> my recent message, and whatever Fred writes about it:
>
>   Re: [rrg] IRON: SEAL summary V2
>   http://www.ietf.org/mail-archive/web/rrg/current/msg05982.html
>
> The IRON ID and most of RANGER uses IPv6 examples.  I will use IPv4,
> in part because I want to know how it would work with IPv4.
>
>
>
> Virtual Prefixes (VPs)
> ----------------------
>
> IRON uses a subset of the global unicast space called "edge" space -
> the remainder is "core" space.  Please see
>
>   CES & CEE are completely different (graphs)
>   http://www.ietf.org/mail-archive/web/rrg/current/msg05865.html
>
> for a general description of how CES architectures achieve scalable
> routing.
>
> "Edge" space in IRON is made of multiple Virtual Prefixes (VPs), each
> of which is handled by one, or perhaps several IRON routers.  For a
> given VP, the one (or more) such routers is (are) known as the VP
> router(s).
>
> Not all IRON routers handle VPs, and a single IRON router could
> handle multiple VPs.  For simplicity, in most of the following
> discussion, a single VP router is assumed for each VP.  In previous
> discussions, this was the router in Seattle.
>
> It is not clear to me how IRON is to be introduced so that each End
> User Network (EUN) which was using edge address space could have the
> benefits - portability, multihoming and inbound TE (and supposedly
> mobility, though I don't know how) for all incoming packets, when not
> all ISPs and other networks (PI EUNs connecting straight to the DFZ)
> had adopted IRON.  So below I assume 100% adoption of IRON by all
> ISPs and any other networks connecting directly to the DFZ.
>
> The sum total of all these VPs constitutes "edge" space - and all of
> it can be divided very finely into individual prefixes for EUNs which
> use this space.  It is not clear what the limits are for IPv4, but I
> guess within IPv4 it would be divisible to prefixes as long as /32
> (single IPv4 address).  For IPv6, the longest prefix IRON would
> handle is /56 (Fred mentioned this off-list).
>
> As far as I know, according to the IDs and discussions so far, these
> VPs of "edge" space are neither advertised by DFZ routers directly,
> nor are they covered by any prefixes advertised in the DFZ.
>
> With Ivip, the MAB (Mapped Address Block) prefixes are advertised in
> the DFZ by DITRs and in LISP, the same things (which have no name)
> are advertised in the DFZ by PTRs.   However, if these VPs were
> advertised by one or ideally more IRON routers in the DFZ, then this
> would enable all packets, including those sent from non-upgraded
> networks, to be handled through the IRON system - so all adopters of
> the IRON "edge" ("EID") space would then get benefits of portability
> and multihoming for all incoming traffic.
>
> The VPs referred to here are not necessarily isolated.
>
> For instance, there could be four VPs on contiguous prefixes:
>
>    33.44.0.0 / 16
>    33.45.0.0 / 16
>    33.46.0.0 / 16
>    33.47.0.0 / 16
>
> which might each be handled within the IRON system by separate IRON
> routers.  To advertise this in the DFZ, an IRON router would only
> advertise a single prefix:
>
>    33.44.0.0 / 14
>
> Therefore, the VPs could be more numerous than the number of prefixes
> to be advertised in the DFZ, if this were adopted.
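
The aggregation in this example can be checked with a short Python
sketch. It uses only the standard-library ipaddress module; the
variable names are illustrative and nothing here comes from the
IRON ID itself.

```python
import ipaddress

# The four contiguous /16 VPs from the example above.
vps = [ipaddress.ip_network(p) for p in
       ("33.44.0.0/16", "33.45.0.0/16", "33.46.0.0/16", "33.47.0.0/16")]

# collapse_addresses() merges adjacent networks into the smallest
# set of covering prefixes.
covering = list(ipaddress.collapse_addresses(vps))
print(covering)  # prints [IPv4Network('33.44.0.0/14')]
```

Four VPs in the IRON RIB therefore need only one covering prefix, if
covering prefixes were ever advertised.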
>
> Generally speaking, the more VPs there are the less each VP router
> needs to do, in terms of handling packets, having more-specific
> routes in its FIB etc.
>
> The more VPs there are, generally the greater the number of prefixes
> in the RIBs and FIBs of all IRON routers.  My guess is that there
> should be no more than a hundred thousand or so - which is presumably
> something BGP can handle.
>
> However, in the IRON ID, Fred may have implied a very much lower
> number of VPs, because he mentions (page 5) IPv6 /8 prefixes.  Even
> if the whole of IPv6's address space was used for global unicast
> address space, this would imply no more than 256 VPs.  I would have
> thought that 10,000 to 100,000 or 200,000 VPs would be a better way
> of spreading the load over multiple VP IRON routers.

I don't want to stipulate an exact maximum prefix length,
but O(10K) - O(100K) VPs in the IRON RIB sounds reasonable.
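
To put rough numbers on that, here is a small sketch of the
arithmetic. It assumes, purely for illustration, that a whole address
space is cut into equal-length VPs; neither the ID nor this thread
requires that, and the function names are hypothetical.

```python
def vp_count(prefix_len: int) -> int:
    """Number of equal-sized VPs if a whole space is cut at prefix_len."""
    return 2 ** prefix_len

def prefix_len_for(target_vps: int) -> int:
    """Shortest prefix length that yields at least target_vps VPs."""
    return (target_vps - 1).bit_length()

print(vp_count(8))              # the /8 case mentioned above
print(prefix_len_for(10_000))   # length needed for O(10K) VPs
print(prefix_len_for(100_000))  # length needed for O(100K) VPs
```

Under that assumption, /8 VPs give 256 entries, while O(10K) to
O(100K) VPs correspond to prefix lengths of roughly /14 to /17.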

> In (msg05937) Fred mentions: "BAA::/16" as an example of a VP - so
> this anticipates there being many more than a hundred of them.
>
> As long as "edge" space could be covered by some lower number of
> prefixes for advertising in the DFZ (such as 50,000) I think this
> would be fine.  However, the IRON proposal as I understand it does
> not anticipate advertising covering prefixes for "edge" space in the DFZ.
>
>
> IRON routers
> ------------
>
> IRON routers are not necessarily DFZ routers.  However, they are
> probably located topologically close to DFZ routers, near the borders
> of ISP networks and of other large networks such as PI-using
> corporations and universities etc. who may have their own DFZ
> routers, or who connect to the DFZ via one or more ISPs.
>
> In principle it would be possible to implement an IRON router in a
> DFZ router, but the intention of IRON is that the IRON routers are
> not DFZ routers.

Not necessarily; IRON routers can also be DFZ routers but
then they would participate in two separate BGP instances;
one for core RLOC prefixes and the other for edge EID
prefixes (this latter being the IRON BGP control plane).

The purpose in mentioning that IRON routers need not also
be DFZ routers was simply to show an incremental deployment
strategy.

> IRON routers connect to internal routing systems of the networks of
> ISPs, and EUNs which advertise their own PI space in the DFZ -
> including those who do so directly, with their own DFZ routers and
> without using an ISP.
>
> IRON routers do not participate in the DFZ control plane.  They have
> their own BGP implementation and these are linked in sessions with
> other IRON routers to form the IRON BGP control plane (my term).
>
> The IRON BGP control plane is completely separate from the DFZ
> control plane.
>
> As far as I know, every IRON router must advertise the complete set
> of VPs (that is, the totality of the IRON-managed "edge" space) in
> the local routing systems of whatever ISP, corporation, university
> etc. they are located in.  As noted above, this would probably be a
> fraction of the number of VPs, since I assume that many VPs could be
> aggregated into shorter prefixes.
>
> Fred wrote in (msg05937):
>
> > I was actually thinking that the IRON routers would only advertise
> > "default" into the local routing system, but they could just as
> > well advertise 42.0.0.0 /16 if they wanted to.
>
> I think the IRON routers must advertise only the prefixes which cover
> all the "edge" space.  They couldn't advertise the default route,
> since this always leads "towards the rest of the Internet" until we
> get to a router which has no such default - a DFZ router - because
> some prefixes of "the rest of the Internet" have best paths out one
> interface and other prefixes have best paths out one or more other
> interfaces.
>
> IRON routers need a peer connection to one or more internal or
> Border (typically DFZ) Routers  by which they can advertise the
> VP prefixes.  They also need an IP address by which they can send and
> receive packets from other IRON routers - potentially from any other
> IRON router in the world.
>
> They do not need a connection to any DFZ router.  As far as I
> understand IRON, there is no provision for them advertising the VP
> prefixes in the DFZ - however, as noted above, if some of IRON
> routers did this, this would be acting like DITRs or PTRs.
>
> IRON routers discover (in Fred's description) other nearby IRON
> routers, such as those in nearby ISPs, corporate networks etc.  I am
> unclear about multiple IRON routers in a single ISP, corporation etc.
> linking to each other.
>
> I guess that IRON routers could best be implemented, initially at
> least, as software in a server - though in the future these functions
> could be added to routers from the major vendors.
>
> Fred describes IRON routers discovering nearby routers via PRLs
> (Possible Router Lists) which are part of RANGER, or via some
> DNS-based methods.  I am interested in understanding IRON with as
> little as possible of RANGER, since I find RANGER very open-ended,
> complex and hard to understand.  To me, it would be acceptable if
> each IRON router was manually configured with the IP addresses of a
> handful of "nearby" IRON routers.

Manual config is always possible. Autoconfiguration can
also be used when it is available, but static manual
config should be the base case.

> IRON routers set up their BGP sessions over VET/SEAL tunnels, using
> the internal "VET interface" construct.  I don't clearly understand
> VET, but I view it as some kind of software construct by which
> packets can be sent to remote devices - in this case other IRON
> routers, via SEAL tunnels, which are in themselves unidirectional,
> but which can be used in both directions to make a 2 way link.
>
> From the point of view of the IRON router, every other IRON router in
> the world is a "single hop" away, via VET - because the VET
> "interface" tunnels packets going outwards and receives tunnel
> packets coming in, for all IRON routers, just as if they were all
> directly connected (from BGP's point of view) to the non-physical VET
> "interface".
>
> So if an IRON router A has an IP address of another IRON router B, it
> can send it packets out the VET interface, and receive them from B as
> well.
>
> There is no need to establish a SEAL tunnel before sending any
> packets using such a tunnel.  When an IRON router A with address
> 22.33.44.55 sends a packet to an IRON router B, with address
> 66.77.88.99, it does so via its internal VET interface which uses
> SEAL to tunnel the packets, using the outer header destination
> address 66.77.88.99.  This is then forwarded out of the IRON router,
> into the local routing system, where it is (typically) forwarded to a
> DFZ router, various other DFZ routers and eventually (perhaps through
> some internal routers of the network in which B is located) to B
> using its 66.77.88.99 IP address.
>
> This tunnel behaves like a physical link, since via the VET
> interface, a packet can be sent from A to B which is not addressed to
> B - traffic packets can be sent just like they could be put out of a
> point-to-point link from one router to another.  However, the
> "link" is a tunnel, typically across the DFZ, with SEAL's PMTUD
> mechanisms.
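
The tunneling described here can be pictured with a minimal Python
sketch. It is a conceptual model only, not the SEAL wire format: the
class and field names are hypothetical, the sending host's address is
an invented documentation address, and the SEAL_ID value is arbitrary.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    payload: bytes

@dataclass(frozen=True)
class Tunneled:
    outer_src: str  # address of the encapsulating IRON router
    outer_dst: str  # address of the decapsulating IRON router
    seal_id: int    # per-tunnel identifier carried with the packet
    inner: Packet   # original packet, carried unchanged

def encapsulate(inner: Packet, my_addr: str, peer_addr: str,
                seal_id: int) -> Tunneled:
    return Tunneled(my_addr, peer_addr, seal_id, inner)

def decapsulate(t: Tunneled) -> Packet:
    return t.inner

# Router A (22.33.44.55) tunnels a traffic packet to router B (66.77.88.99).
inner = Packet(src="198.51.100.7", dst="43.0.56.78", payload=b"hello")
tunneled = encapsulate(inner, "22.33.44.55", "66.77.88.99", seal_id=1)
```

The inner destination (43.0.56.78) differs from the outer destination
(66.77.88.99), which is what lets the tunnel behave like a
point-to-point link between A and B.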
>
> The BGP sessions are made over these tunnels using the VET interface.
>
> According to the ID, these BGP sessions should be with IRON routers
> nearby.  However, I think that if there is only one VP router for
> each VP, it doesn't matter what the structure of the IRON BGP links
> is.
>
> Multiple IRON routers "owning" a VP are possible - I think the
> word "selected" means one or more such routers handling a single VP.
> Then, I think it would be important (but not absolutely essential)
> for each IRON router to know the IP address of the nearest one of the
> multiple VP routers for each VP.  This would only be possible if each
> IRON router generally had BGP sessions with the IRON routers of
> "nearby" ASNs other than its own - and if the global system of IRON
> routers had each one using the ASN of the network it was operating
> within.  "Nearby" means close according to the physical links between
> DFZ routers.  Only then would BGP's natural path selection mechanisms
> provide a given IRON router A with the IP address of the genuinely
> closest of multiple VP routers which were all advertising the same VP
> in the IRON BGP control plane.
>
>
> The New Zealand - Seattle example continued
> -------------------------------------------
>
> To continue the example from the previous discussion, a sending host
> (SH) in the North Island of New Zealand sends a packet to an edge
> address of a multihomed IRON-edge-address-using EUN of a tour company
>  in the Fox Glacier township.  The tour company's EID prefix is
> 43.0.56.76 /30 and the packet is addressed to 43.0.56.78.
>
> The tour company has this space multihomed via some kind of router at
> its site which connects to two ISPs in the South Island: ISP-4 and
> ISP-5.  The ISP-4 link is via a fixed IP address DSL link with the
> address 33.22.22.33.  There's probably only a single fibre or cable
> going to this remote and marvellous part of New Zealand.  (Every
> establishment has its own generators because trees regularly fall
> down and bring down the power line, causing blackouts on a very
> frequent basis.)
>
> Let's imagine that ISP-5 has a 3G data network there and the tour
> company also has a suitable modem, with a fixed IP address service
> for this, on 55.66.66.55.  Or perhaps there is an expensive, slow,
> high-latency geosynchronous satellite service.  Normally, the tour
> company prefers data to come in via the DSL line.
>
> Somehow, in ISP-4 there is an IRON router D which can forward packets
> for the 43.0.56.76 /30 prefix to the tour-company's router via the
> DSL service.  Likewise ISP-5 has an IRON router E which can forward
> packets addressed to this prefix to the 3G modem.
>
> In this example, one of the thousands of VPs is 43.0.0.0 /16 - and
> this covers the EID prefix of the tour company.  In this example,
> only one IRON router advertises this VP in the IRON BGP control plane
> - a router B in Seattle.
>
> There must be some direct or indirect commercial relationship between
> the tour company and the ISP - or whatever kind of company it is -
> which runs the Seattle router.  The Seattle router "owns" this VP,
> which means its owners pay for its upkeep and connectivity - which
> means they must be paid directly or indirectly to do this by
> potentially thousands of companies such as the Fox Glacier tour
> company.  Maybe this is a branch office of a glacier tour company in
> Washington state - and they rented a larger set of "edge" space from
> the Seattle ISP, which was renting space in 43.0.0.0 /16 to thousands
> of EUNs.  These EUNs could be anywhere in the world. They do not need
> to be connected to the Seattle ISP to be able to use this "edge"
> space, which is managed by the IRON system.
>
>
> The packet from the North Island host is forwarded in the network of
> the Auckland ISP towards its IRON router A.  (Maybe it has more than
> one, but this will do.)  This is because A is advertising to the
> local routing system all the prefixes which cover IRON's "edge"
> address space, including a prefix such as 43.0.0.0 /16 or 43.0.0.0
> /14 which covers 43.0.56.78.
>
> The IRON router A may have BGP neighbours in the North and South
> Island, and perhaps a neighbour in Australia, Fiji or Los Angeles.
>
> The IRON routers form a globally connected system - all via VET/SEAL
> tunnels - to create their own BGP control plane.   By this means, the
> IRON router finds the best path for packets matching the VP 43.0.0.0
> /16 - and this best path is towards the IP address of the Seattle
> router - which is IRON router B.
>
> Generally, each IRON router in its RIB and FIB has a minimum set of
> things:
>
>   1 - The best paths for all the VPs.
>
>   2 - Best path for prefixes which cover the IP addresses of its
>       IRON BGP control plane neighbours.
>
> They may also have additional routes in their FIB alone, for two
> reasons, which are explained below.  One is a complete set of
> "more-specifics" in the VP router(s) FIB (not RIB) - all the EUN
> prefixes in that VP, of which there could be tens of thousands.  The
> other is individual such prefixes installed temporarily in the FIBs
> of IRON routers near the sending host, as a result of receiving a
> SEAL redirect message from a VP router it just tunneled a packet to.
>
> By means which are not at all clear to me, the Seattle router B has
> securely installed in its FIB (but not RIB) a prefix for 43.0.56.76
> /30 with a best path leading to the IP address of the IRON router D.
>  I am not sure how multihoming service restoration works in IRON,
> which I think must be a crucial function of this "registration" process.
>
>   See in msg05980 the mention of "bubbles".  Fred described in an
>   off-list message how the Fox Glacier township router could
>   propagate its prefix upwards in the routing system by means
>   of Router Advertisements.  I don't really understand these, and
>   as far as I know they are part of IPv6 only.  Hopefully he will
>   explain this better, especially for IPv4.
>
> The FIB of the Seattle router has an additional set of prefixes - a
> complete set of prefixes such as just described for all the other
> "edge"-using EUNs whose "edge" space is within the 43.0.0.0 /16
> prefix.  This could be thousands or tens of thousands of prefixes,
> since many EUNs will be fine with a single IPv4 address at each of
> their sites.  In principle, this /16 could have 2^16 separate EID
> prefixes - so this is a substantial addition to the FIB of the
> Seattle router.  In this example, so far, the Seattle router B is the
> only IRON router to be the VP router for this 43.0.0.0 /16 prefix.
>
> The IRON router A in Auckland finds that the packet matches the
> 43.0.0.0 /16 or 43.0.0.0 /14 prefix in its FIB, and that the best BGP
> path for packets matching this prefix ends in an IP address which is
> one of the IRON routers - since this best-path came via one of its
> IRON BGP neighbours.
>
> Through the magic of VET (which means I assume this can be done, but
> I don't exactly understand how) the A router tunnels the traffic
> packet to the Seattle router.  This means the encapsulated packet has
> the Seattle router's address as its outer destination address - and
> the A router forwards it to the local routing system, where it is
> forwarded towards a DFZ router, and so forwarded to the Seattle IRON
> router B, just like any other packet.
>
> The continually active tunnels between IRON BGP control plane
> neighbours primarily carry BGP messages.  These tunnels could also
> carry a traffic packet, tunnelled as just described.  Then the tunnel
> would already have been established, so SEAL would have state for it
> at both ends and would have figured out the PMTU in both directions.
>
> If we assume that the Auckland router A had never sent a packet to
> the Seattle router B, then this packet marks the beginning of a
> one-way tunnel from A to B, so the A router's SEAL tunneling software
> would instantiate new variables for the SEAL state for router B.
> This includes choosing a random 32 bit value for the first SEAL_ID
> value.  Subsequent packets will use values one more than the last.
>
> When the packet arrives at the Seattle router, it is decapsulated and
> emerges from the VET interface, to be handled by the FIB.
>
> It is possible that a packet sent to the Seattle router is addressed
> to a host in an EUN directly connected to that Seattle router.  In
> this case, as usual, this is not true.

Not necessarily. There may be significant portions
of the Seattle router's VP that are "at home" and
really use the Seattle router as their point of
attachment to the IRON. For those, the packet is
simply delivered and no redirect is sent.
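
A rough sketch of that decision at the VP router follows. The FIB
contents, the "at home" prefix 43.0.1.0/24, the symbolic router name
"D" and the drop-on-no-match behavior are all assumptions made for
illustration.

```python
import ipaddress

AT_HOME = "at-home"  # marker: the EUN attaches directly to this VP router

# Hypothetical more-specifics held by VP router B for 43.0.0.0/16.
fib_b = {
    ipaddress.ip_network("43.0.56.76/30"): "D",      # tour company via D
    ipaddress.ip_network("43.0.1.0/24"): AT_HOME,    # invented local EUN
}

def handle_at_vp_router(dst: str, fib: dict) -> tuple:
    """Return (action, next_hop, send_redirect) for a decapsulated packet."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in fib if addr in net]
    if not matches:
        return ("drop", None, False)          # assumed behavior
    best = max(matches, key=lambda net: net.prefixlen)
    next_hop = fib[best]
    if next_hop == AT_HOME:
        return ("deliver", None, False)       # no redirect is sent
    return ("tunnel", next_hop, True)         # tunnel onward and redirect

remote = handle_at_vp_router("43.0.56.78", fib_b)
local = handle_at_vp_router("43.0.1.9", fib_b)
```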

> The Seattle router's FIB has a more-specific prefix which matches
> this destination address - the prefix 43.0.56.76 /30 which has a best
> path to IRON router D in the South Island - the one which has the DSL
> link to the tour company.
>
> The Seattle router now tunnels the packet to the router D.  This is
> on page 6:
>
> Translating the sentence:
>
>    'C' then forwards the packet to an IRON router 'D' which
>    connects the RANGER network where 'E' currently resides.
>
> to represent the current example:
>
>    The Seattle router 'B' then forwards the packet to an IRON router
>    'D' which connects the ISP-4 network where the tour company
>    currently prefers its packets to be delivered.
>
> However, "forward" in this sentence is not, as far as I know,
> ordinary forwarding in the DFZ.  The previous reference to "forward"
> was "forwards the packet via VET automatic tunneling" - so I think
> the second usage also implies VET automatic tunneling:
>
>    IRON router 'B' then consults its FIB and discovers a VP that
>    covers the 'E' prefix, then forwards the packet via VET automatic
>    tunneling to an IRON router 'C' that owns the VP.
>
> translated:
>
>    Auckland ISP IRON router 'A' then consults its FIB and discovers a
>    VP 43.0.0.0 /16 that covers the destination address 43.0.56.78,
>    then forwards the packet via VET automatic tunneling to an IRON
>    router 'B' in Seattle that owns the VP.

Right.
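
For concreteness, here is a minimal longest-prefix-match sketch of
what router A does. The dictionary-based FIB and the symbolic next-hop
names are illustrative only; a real FIB would use a trie or similar.

```python
import ipaddress

def lookup(fib: dict, dst: str):
    """Return the next hop of the longest matching prefix, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in fib if addr in net]
    if not matches:
        return None
    return fib[max(matches, key=lambda net: net.prefixlen)]

# Router A initially knows only the VP and its owner.
fib_a = {ipaddress.ip_network("43.0.0.0/16"): "B"}
before = lookup(fib_a, "43.0.56.78")

# A temporarily installed more-specific takes precedence over the VP.
fib_a[ipaddress.ip_network("43.0.56.76/30")] = "D"
after = lookup(fib_a, "43.0.56.78")
```

Before the more-specific is installed the packet goes to B; afterwards
the same destination resolves to D, while other addresses in
43.0.0.0/16 still resolve to B.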

> So I think the Seattle router B uses VET tunneling to "forward" the
> packet to the IRON router D in the South Island - which will deliver
> it to the tour company's DSL service.
>
> The most obvious problem with this is that the packet had to traverse
> the Pacific Ocean and the Equator back and forth to get from the
> North to the South Island.
>
> This is where the RANGER "route optimization" comes into play.
>
> But how does the Seattle router B get the packet to router D in the
> ISP-4 of the South Island?
>
> I thought that B would use VET tunneling to D.
>
> However, what Fred told me about router discovery made me think that
> perhaps the tour company router, via D, does some kind of "bubble
> blowing" process by which D winds up with an FIB entry for the
> 43.0.56.76 /30 prefix, with a best path which leads to intermediate
> routers including D.
>
> I don't know how this would work for IPv6, much less IPv4 - or how it
> would scale considering there will be millions of EUN "edge"
> prefixes, like the one used by the tour company in the South Island.

It would work very similarly to Teredo [RFC4380].

> Route optimization
> ------------------
>
> The B router in Seattle will send back a SEAL message, via a SEAL
> tunnel from B to A, to the A router in the North Island.  This tells
> the A router that for any packets addressed to the 43.0.56.76 /30
> prefix, it should no longer forward them on the path to the B router
> in Seattle, but should forward them directly to the IRON router D in
> the South Island.
>
> This is, in effect, a route redirect message.  It would also come
> with a caching time.

Right.

> This results in the installation of a "more-specific" prefix in the
> FIB of the A router in the North Island.  This has precedence over
> the 43.0.0.0 /16 or 43.0.0.0 /14 prefix which all IRON routers have.
>
> As best I understand Fred's plans, the A router will have a locally
> configured STALETIME, such as 120 seconds.  I understand that if no
> traffic packets use this new "more-specific" FIB entry within any 120
> second period, then it will be deleted.
>
> I understand that the A router also caches a SEAL_ID with this - the
> SEAL_ID which came with the redirect message, which itself was copied
> from the initial traffic packet which A sent to B.  So this SEAL_ID,
> which A generated, enabled A to authenticate the SEAL redirect message.
>
> I think it could also be used to authenticate a second redirect
> message from B, but as far as I know, B would not send such a
> message, at least in respect of the initial traffic packet.
>
> Now, as long as traffic packets keep arriving for this prefix less
> than 120 seconds after each other, and as long as the redirect's
> cache time has not expired - and as long as nothing else happens -
> the A router in the North Island will tunnel packets to the D router
> in the South Island, and all will be well.
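
The two timers and the SEAL_ID check described above can be sketched
as follows. The names are hypothetical, the 600 second redirect
lifetime is an invented value, and time is passed in explicitly to
keep the example deterministic.

```python
from dataclasses import dataclass
from typing import Optional, Set

STALETIME = 120.0  # seconds; locally configured on A (example value)

@dataclass
class RedirectEntry:
    prefix: str        # more-specific prefix, e.g. "43.0.56.76/30"
    rloc: str          # IRON router to tunnel to directly
    seal_id: int       # SEAL_ID echoed in the redirect
    expires_at: float  # install time plus the redirect's cache time
    last_used: float   # last time a traffic packet used this entry

    def usable(self, now: float) -> bool:
        fresh = now < self.expires_at
        active = (now - self.last_used) < STALETIME
        return fresh and active

    def touch(self, now: float) -> None:
        self.last_used = now

def accept_redirect(outstanding: Set[int], seal_id: int, prefix: str,
                    rloc: str, lifetime: float,
                    now: float) -> Optional[RedirectEntry]:
    # Only a redirect echoing a SEAL_ID that A itself sent is accepted.
    if seal_id not in outstanding:
        return None
    return RedirectEntry(prefix, rloc, seal_id, now + lifetime, now)

entry = accept_redirect({1234}, 1234, "43.0.56.76/30", "D", 600.0, now=0.0)
```

When usable() returns False, A would drop the entry and let the next
packet follow the VP route to B again.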
>
> If the D router becomes unreachable, or if it cannot reach the router
> in the tour company (say the prodigious rainfall and stiff winds
> bring down another tree and pull down a fibre cable line which the
> DSL service depends upon), then the A router will delete this
> more-specific entry and its cached SEAL_ID.  This would only occur if
> the D router sent a destination unreachable message to the A router,
> or if the D router was somehow unreachable - but that would require
> some other router to send a destination unreachable, I think, since I
> understood that all IRON routers are presumed to be reachable via the
> VET interface.

Neighbor Unreachability Detection is used to detect
unreachable RLOCs so that other RLOCs can be tried.

> The next time a packet arrives at the A router, with a destination
> address matching the 43.0.56.76 /30 prefix, the A router will once
> again tunnel the packet to the B router in Seattle and the process
> will begin again.
>
> However, by now - by some means I don't fully understand - the B
> router in Seattle knows that the packet should be tunneled to the E
> router in the South Island, which uses a 3G link or whatever to the
> tour company's network.  So that is where the packet is sent by B,
> and the A router gets a redirect to the E router, rather than the D
> router.

Here is a piece that you may be missing. The initial
redirect from B would name *both* D and E as RLOCs
of IRON routers that know how to reach the Fox Glacier
network. So, if D goes down A will try E before it
gives up on the route altogether.

So, B needs to discover *all* of the other IRON routers
that service the Fox Glacier network - not just one.
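
Roughly, A's behavior after such a redirect could be sketched like
this. The reachable() callback stands in for the outcome of Neighbor
Unreachability Detection, and treating the list order as preference
order is an assumption.

```python
from typing import Callable, Optional, Sequence

def pick_rloc(rlocs: Sequence[str],
              reachable: Callable[[str], bool]) -> Optional[str]:
    """Return the first RLOC from the redirect that is still reachable."""
    for rloc in rlocs:
        if reachable(rloc):
            return rloc
    # No RLOC left: give up on the route and fall back to the VP router.
    return None

redirect_rlocs = ["D", "E"]  # both named in the single redirect from B

normal = pick_rloc(redirect_rlocs, lambda r: True)
d_down = pick_rloc(redirect_rlocs, lambda r: r != "D")
all_down = pick_rloc(redirect_rlocs, lambda r: False)
```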

> Somehow:
>
>   1 - The VP router (B in Seattle) already knew about both D and E as
>       being IRON routers which could accept packets addressed to the
>       43.0.56.76 /30.
>
>   2 - The VP router initially knew that both D and E were reachable,
>       and that they could reach the tour company's router.
>
>   3 - The VP router knew that the D router was preferred over the E
>       router.  (I don't know if this is possible via Router
>       Advertisements.)
>
>   4 - After the outage, the VP router was told that D could not be
>       used any more, so it altered the path in its FIB for the more
>       specific route 43.0.56.76 /30 to point to the E router instead.

It doesn't need to be told that D can't be used anymore.
The periodic RAs coming from D would cease, and the cache
time for the RLOC for this FIB entry would expire. But,
the FIB entry would still be retained because E is still
reachable.
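
A minimal sketch of that soft-state behavior at B, assuming
each RA simply refreshes a per-RLOC expiry time. The class
name, method names and lifetime values are made up for
illustration and are not taken from VET or SEAL:

```python
class FibEntry:
    """Per-prefix soft state at the VP router (hypothetical)."""

    def __init__(self, prefix):
        self.prefix = prefix
        self.expiry = {}  # RLOC -> time at which its cache entry expires

    def on_router_advertisement(self, rloc, now, lifetime):
        # Each periodic RA pushes the RLOC's expiry further out.
        self.expiry[rloc] = now + lifetime

    def purge(self, now):
        # Drop RLOCs whose RAs have stopped; keep the entry while any remain.
        self.expiry = {r: t for r, t in self.expiry.items() if t > now}
        return bool(self.expiry)


entry = FibEntry("43.0.56.76/30")
entry.on_router_advertisement("D", now=0, lifetime=30)
entry.on_router_advertisement("E", now=0, lifetime=30)
entry.on_router_advertisement("E", now=20, lifetime=30)  # D has gone silent
print(entry.purge(now=40), sorted(entry.expiry))  # True ['E']
```

Only when purge() returns False would B remove the FIB
entry for the prefix itself.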

> Let's say the outage happened a minute after the first packet, and
> by some means the VP router in Seattle found out about it 10 seconds
> later.  Could the VP router send a second redirect to the A router?
> I guess it could, but as far as I know, this is not part of IRON.

No second redirects. Once the first redirect is sent
out, the IRON router that receives the redirect is
left to its own devices.

> The caching time in the redirect is to prevent the A router from
> sending packets for too long according to the redirect, when it
> should periodically forget the redirect and let the next packet(s) go
> to the VP router in Seattle, and await any redirect which results.
>
> The STALETIME value is to reduce unwanted clutter in the A router's
> FIB in the absence of them actually being used.

And also to more quickly discover any changes that may
have occurred at B without having to first try to reach
Fox Glacier directly using stale information.

> Multiple VP routers
> -------------------
>
> I understand there can be multiple routers such as the one in Seattle
> which advertise the 43.0.0.0 /16 Virtual Prefix in the IRON BGP
> control plane.
>
> This would have three advantages at least:
>
>   1 - The load for this prefix would be spread over more than one
>       VP router.
>
>   2 - There would be natural failure recovery - if the Seattle
>       router was down, whatever IRON routers had a path to it for
>       this prefix would adapt by choosing a path to another VP
>       router advertising the same prefix.
>
>   3 - Generally, subject to conditions discussed below, the A router
>       would find the closest of multiple VP routers - so reducing
>       total path lengths and delays for the first packet or packets.
>
>       There could be a flurry of packets sent from A to B before
>       B's redirect gets to A - especially if one or more of the
>       redirect packets are lost.  So the B router in Seattle
>       would need to get all those packets to the correct D or E
>       router.

Right, but that's how redirects work on any link.
There could be multiple initial packets that travel
over the dogleg route before the direct route gets
installed.

> However, now the D and E routers need to communicate their
> "ownership" of 43.0.56.76 /30 to multiple VP routers all over the
> world.  Likewise their lack of ability to handle packets for this
> prefix if there is an outage.

There would never be a need to explicitly communicate
an outage after the initial reachability information
was propagated. IRON routers that receive the redirects
will be able to learn of outages via Neighbor
Unreachability Detection and/or ICMP Destination
Unreachables from other IRON routers.

> These VP routers could be anywhere in the world.
>
> So how does the proposed "blowing bubbles" method (I think based on
> IPv6 or RANGER Router Advertisements) scale properly?  Does it happen
> over the IRON BGP control plane only - or is it somehow a process
> which happens outside this?  The EUN router in the tour company
> office is not part of this control plane.
>
> I understand that this process is a continual one - the D and E
> routers need to keep doing it, based on some caching time in the VP
> routers, I guess.

Yes.

> There are going to be millions of these EUN prefixes, and for each
> one, if it is multihomed to two ISPs, there are going to be two IRON
> routers "blowing bubbles" in a manner which will continually reach
> one or more IRON VP routers anywhere in the world.

Yes.

> The selection of the "closest" VP router depends on the tunneled BGP
> neighbour links between all IRON routers generally following the
> "nearby" rule, based on the underlying physical topology over which
> DFZ BGP routers conduct their sessions.
>
> If there was only a single VP router for a given prefix of "edge"
> space such as 43.0.0.0 /16, then it doesn't matter how the IRON
> routers are connected.  It would be fine for a New Zealand IRON
> router to tunnel to IRON routers in Moscow, London and South Africa.
>
>
>
> What if?
> --------
>
> The above structure is interesting and unique.  TIDR had all the DFZ
> BRs (not transit DFZ routers) communicating via a second BGP
> instance - so it doesn't really solve one of the crucial parts of the
> routing scaling problem: reducing load on the DFZ control plane.  But
> IRON involves new routers, in similar places to DFZ routers,
> communicating in a way which does not burden the DFZ control plane at
> all.
>
> IRON uses a data-driven method of gaining "mapping" while also
> delivering the initial packet - without excessive delay and without a
> fancy new network such as the ALT network.
>
> The "map reply" is the SEAL redirect message.
>
> Why not forget about most of these IRON routers and simply have the
> VP router advertise its prefix in the DFZ?  Because then it can't
> send redirects to the routers closer to the sending host, since those
> routers are just ordinary routers, are not ready to accept such
> things, and because the VP router wouldn't be able to know their IP
> address.
>
> Why not have large numbers of VP routers?  This depends on how the D
> and E routers, and most or all other IRON routers handling millions
> of EUN "edge" prefixes, communicate their aliveness and IP address to
> the multiple VP routers.
>
> If there were a hundred VP routers, then maybe there wouldn't need to
> be any redirects - since one of them would be close enough to the
> path between the A router and either D or E for the system to work
> fine.  This degenerates into LISP with hundreds or tens of thousands
> of PTRs - and no other ITRs.  (Or Ivip with all DITRs, where every
> DITR advertises all the "edge" space, as MABs in the DFZ).  In both
> cases, the dominant problem would then be getting the "mapping" to
> these tens of thousands of routers, for the millions of EUN "edge"
> prefixes in a scalable, secure, fashion fast enough for multihoming
> service restoration controlled by the IRON routers which deliver
> packets to the EUNs.
>
>
>
> Draft Critique
> --------------
>
> I hope Fred will be able to comment on this - after he does, I will
> revise it and then hopefully move on to other proposals.
>
> This is about 750 words.  I can try chopping it down to 500 once I
> hear from Fred.   I will be making an ID of the full versions of all
> critiques which do not make it into the RRG Report, so a non-chopped
> down version can be in that ID.
>
>
>
>
> IRON-RANGER (hereafter "IRON") uses principles from RANGER, VET and
> SEAL to construct a Core-Edge Separation scalable routing solution.
> Separate IRON networks would be used for IPv4 and IPv6, but perhaps
> they could be combined in some way if this was desired.
>
> IRON does not have a mapping system such as that of LISP or Ivip.
>
> A single global network of IRON routers communicate over tunnels,
> each using their own BGP instance, to form the IRON BGP control
> plane.  This is unrelated to the DFZ's BGP control plane.  While each
> IRON router advertises all "edge" prefixes in the routing system of
> the networks they are based in (of ISPs and large corporations,
> universities etc.), the current IDs do not call for them to advertise
> any such prefixes in the DFZ.

I'm not sure I exactly followed the above. True, the
IRON routers have to advertise all VPs into the edge
networks. But, if there is one large prefix from which
all VPs are derived (e.g., 4::/3), then the IRON routers
only need to advertise that one prefix. (Similarly, if
there are only a few large prefixes then only those
prefixes need to be advertised.)

The IRON BGP control plane on the other hand carries
all VPs - there should be no more than O(10k) - O(100k)
of these.

> Therefore, as currently described,
> IRON could only support packets sent by all hosts if it was adopted
> by all such networks.  However, IRON could easily be adapted to do
> this by having multiple widely-dispersed IRON routers advertise the
> complete set of "edge" prefixes in the DFZ.
>
> Each IRON router processes packets addressed to "edge" addresses by
> forwarding them to a particular IRON router which, inside the IRON
> BGP control plane, advertises a particular Virtual Prefix.  There may
> be one or more of these VP routers for a given prefix, and the number
> of VP prefixes for the entire "edge" subset of the global unicast
> address space would be limited, in part, by the ability of the IRON
> BGP control plane to handle this number of prefixes.
>
> IRON routers peer with topologically nearby IRON routers to be their
> BGP neighbours.  When the traffic packet arrives at the VP router, it
> is forwarded (via a tunnel again?) to the IRON router which can
> deliver the packet to the destination network.

Yes; via a tunnel.

> The VP router also sends a SEAL redirect router to the first IRON
                                           ^^^^^^
                                       extraneous word

> router and thereafter, that first IRON router tunnels the packets
> directly to the IRON router which connects to the destination
> end-user network.
>
> The VP router's FIB has more-specific routes for each end-user
> network prefix which is covered by this VP.
>
> There are unresolved scaling questions regarding:
>
>   1 - The ability of the initial IRON router to handle in its
>       FIB the temporarily installed more-specific routes due
>       to the redirect messages it receives from VP routers.
>
>   2 - Likewise, questions of FIB and/or route processor ability
>       to handle the churn in these, since they will typically
>       last for seconds or minutes, before having to be withdrawn
>       and perhaps replaced after a further redirect.

Neighbor Unreachability Detection (perhaps using the
SEAL explicit acknowledgement mechanism) can be used to
detect failures and withdraw information on a more
timely basis.

>   3 - The number of VP routers - more than one would be necessary
>       for robustness.
>
>   4 - The ability of the VP routers to discern which of the multiple
>       advertising IRON routers had the highest priority for use
>       in a multihoming scenario when both were advertising the one
>       end-user network "edge" prefix.

Route information carried in the Router Advertisements
and Redirects would contain metrics.
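
For example, if each RLOC carries a numeric metric, the
choice between D and E could look like the sketch below.
The assumption that a lower metric means "more preferred",
and the function name preferred_rloc, are mine and are not
specified in the drafts:

```python
def preferred_rloc(candidates, reachable):
    """Pick the reachable RLOC with the lowest metric, or None.

    candidates: dict mapping RLOC -> metric (lower is assumed better)
    reachable:  set of RLOCs currently believed reachable
    """
    live = [(metric, rloc) for rloc, metric in candidates.items()
            if rloc in reachable]
    return min(live)[1] if live else None


metrics = {"D": 10, "E": 20}  # hypothetical values: DSL preferred over 3G
print(preferred_rloc(metrics, {"D", "E"}))  # D
print(preferred_rloc(metrics, {"E"}))       # E
```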

>   5 - The scaling problems inherent in these IRON routers advertising
>       their collectively millions of end-user "edge" prefixes all
>       over the IRON network, since the one or more VP routers could
>       be located anywhere with respect to these advertising IRON
>       routers.

The IRON routers only advertise their VPs in the control
plane. They inform other IRON routers of more-specifics
in the data plane and only on-demand of actual traffic.
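
To make the split concrete, here is a small sketch of A's
forwarding table using longest-prefix match. The /16 stands
for a VP learned through the IRON BGP control plane and the
/30 for a more-specific installed on demand by a redirect.
The IronFib class is hypothetical; the prefixes are the ones
from the example in this thread:

```python
import ipaddress


class IronFib:
    """Toy longest-prefix-match table (illustration only)."""

    def __init__(self):
        self.routes = {}  # ip_network -> next-hop name

    def add(self, prefix, next_hop):
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def lookup(self, addr):
        ip = ipaddress.ip_address(addr)
        matches = [net for net in self.routes if ip in net]
        if not matches:
            return None
        return self.routes[max(matches, key=lambda net: net.prefixlen)]


fib = IronFib()
fib.add("43.0.0.0/16", "B")      # VP route from the control plane
print(fib.lookup("43.0.56.77"))  # B (first packet goes via the VP router)
fib.add("43.0.56.76/30", "D")    # more-specific learned from the redirect
print(fib.lookup("43.0.56.77"))  # D (later packets go direct)
```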

>   6 - The speed with which VP routers can learn of outages detected
>       by the IRON routers which are capable of delivering packets to
>       the end-user networks.

Neighbor Unreachability Detection, e.g., based on SEAL
explicit acknowledgements.

>
> IRON is not yet described in sufficient detail for these questions to
> be answered.  It is not clear how, or if, it would implement load
> sharing or other forms of inbound TE.

VET specifies a method for load sharing across
multiple ETRs using equal-cost multipath.
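
One common way to realize equal-cost multipath is to hash
each flow onto one of the candidate ETRs, so that packets of
a flow stay on one path. The sketch below assumes that
approach; the function name and flow tuple layout are
illustrative and are not taken from the VET text:

```python
import hashlib


def ecmp_pick(etrs, flow):
    """Map a flow tuple onto one of several equal-cost ETRs."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(etrs)
    return etrs[index]


etrs = ["D", "E"]
flow = ("198.51.100.7", "43.0.56.77", 6, 40000, 80)  # src, dst, proto, ports
# The same flow always maps to the same ETR; different flows may differ.
print(ecmp_pick(etrs, flow) == ecmp_pick(etrs, flow))  # True
```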

> Nor is it clear what approach
> to mobility the system would adopt, or how this would scale to
> billions of mobile devices.

Using IPv6 as an example, mobility is based on a /56
or coarser granularity and not on a finer granularity
such as a /64 or /128. So, IRON takes care of *mobile sites*,
while mobile end systems are accommodated via other
mechanisms such as HIP, MIPv6, etc.

> There is no current description of the business relationships between
> the various users and operators of routers - so it is difficult to
> envisage business arrangements in which costs are generally borne by
> those who benefit, without unfair burdens being placed on any
> participants.  Nor is there a description of how IRON could be
> introduced so as to provide portability, multihoming etc. for all
> packets received by an adopting network, before all networks have
> their own IRON routers.
>
> IRON is a novel CES architecture in an early stage of its design
> process.  It can be decentralised in every respect, and uses
> data-driven "redirect" messages as a form of mapping distribution.
> However, it is not yet clear how the VP routers learn the mapping for
> the end-user prefixes in their VP.  If this can be done in a secure,
> fast and scalable fashion - then IRON may be worth considering as a
> scalable routing system, at least for providing portability and
> multihoming to non-mobile end-user networks.

VET specifies how the end-user prefixes are pushed to
the IRON routers that own the VPs from which they are
derived. IRON uses RANGER, VET and SEAL.

Thanks for all of your time and effort on this,

Fred
fred.l.templin@boeing.com