Re: [rrg] Anycast in the core architecture - sep. OK; elim not?
Robin Whittle <rw@firstpr.com.au> Thu, 23 April 2009 04:10 UTC
Short version: Summarising and extending my previous message to include more about anycast with a core-edge separation scheme, and exploring it with a core-edge elimination scheme. Replying to Bill about how global anycast would work with core-edge elimination.

In summary:

Global anycast has its uses, but it is highly unscalable, since there needs to be a separate prefix advertised in the DFZ for every set of anycast servers - for each anycast IP address.

A core-edge separation scheme could achieve the same benefits by using anycast *ETRs*. This has no benefits over the conventional BGP-based approach and is equally unscalable.

A core-edge elimination scheme *might* be able to do global anycast too - but it would be more complex than the conventional BGP approach. There would be no scaling benefits - it would be just as unscalable as the conventional BGP approach. However, a core-edge elimination scheme, once fully adopted, does not allow end-user networks to have their own PI prefixes, so it would no longer be possible to use the conventional BGP approach.

I will write a separate message on Disaster Recovery.

Hi Bill,

To summarise and extend:

   http://www.ietf.org/mail-archive/web/rrg/current/msg04897.html

For core-edge separation schemes:

1 - The routing system - the BGP-based DFZ and probably some other routers, plus the ITRs, ETRs etc. of the core-edge separation scheme - can't tell the difference between host unicast and host anycast. This is another way of saying they will work fine for packets in both scenarios.

2 - However, host anycast alone with a core-edge separation scheme does not have the same effect on packet flow in the DFZ as conventional BGP-based host anycast, in which each host has its own separate BGP router.

The main, perhaps only, reasons people are interested in conventional BGP-based host anycast are (AFAIK) as follows. These all depend on the normal behaviour of the BGP routing system.
   a - "Shortest" (generally, in BGP terms) path to the nearest router which advertises the prefix of its anycast host.

   b - Automatic failure recovery, as long as the router stops advertising the prefix if the one or more hosts using this prefix die. If so, the other BGP routers will soon get the packets which would have gone to this one.

   c - Load sharing over many hosts, in geographically and topologically diverse sites, which gives the system high capacity and great resistance to failure without involving DNS in any way, since it always responds to the one IP address. This is also extremely important as a way of achieving high total bandwidth to survive DoS attacks with floods of incoming packets.

   It may also be desired and possible to:

   d - Infer something about the sending host's location from which of the anycast BGP routers received the packet (as you are doing with your project, which you mentioned at the end of msg04894) - but AFAIK this information is not used for the most prominent use of BGP-based host anycast: root nameservers.

Host anycast with a core-edge separation scheme and a single (unicast) ETR (Fig 9, as noted below) involves no significant difference in the way packets traverse the DFZ compared to conventional BGP-based unicast (Fig 3).

Fig 1 (msg04897) shows BGP-based host unicast, with two border routers advertising the prefix. This is ordinary operation for many hosts today, which use PA space from an ISP with two DFZ border routers, each with its own upstream link to separate parts of the DFZ.

Fig 3 shows core-edge separation with a unicast host and a unicast ETR, with two border routers advertising the ETR's prefix. This is likewise basic, ordinary core-edge separation operation, as with LISP, APT, Ivip or TRRP - where the ETR address is in a PA prefix of an ISP with two BRs.

Fig 9 - if we ignore the bottom half - is just like Fig 3 with a unicast ETR, but there is host anycast at the one local network the ETR connects to.
So using a core-edge separation scheme to do host anycast with a single (unicast) ETR (rather than anycast ETRs) brings none of the benefits of conventional BGP-based anycast, because the ITR->ETR tunnel is conventional unicast.

3 - To achieve the goals listed above - a, b, c and perhaps d - with a core-edge separation scheme, it is necessary to have anycast *ETRs*. This is the only way of achieving the desired packet paths through the DFZ, using the DFZ routers to automatically send each packet to the nearest router of an ETR which handles the EID (micronet) prefix of the multiple anycast hosts.

Fig 11 shows conventional BGP-based host anycast:

              R2------R5------>DH1
             /  \     |
    SH---->R1    R4---R6
             \  /     |
              R3------R7------>DH2
                      |
              R9------R10----->DH3
             /  \
   SH2----R12--R13          Four destination hosts, each
             \  /           one anycast in a global sense.
              R14-----R11----->DH4

                     Fig 11

To achieve the same DFZ packet flow patterns with a core-edge separation scheme, you would use anycast ETRs, as in Fig 8:

       ITRs                ETRs
              R2------R5-->E1->DH1
             /  \     |                  Each of four ISPs has an
    SH-I1->R1    R4---R6                 anycast ETR. So each ISP's
             \  /     |                  router (R5, R7, R10 & R11)
              R3------R7-->E2->DH2       is advertising the same
                      |                  prefix.
              R9------R10->E3->DH3
             /  \
   SH2-I2-R12--R13          Four destination hosts, each
             \  /           one anycast in a global sense.
              R14-----R11->E4->DH4

                     Fig 8

This uses anycast ETRs (all with the same "RLOC" = Locator address) and anycast hosts (all with the same EID = Identifier = micronet address).

You could also achieve the same DFZ packet paths with four anycast ETRs, all on the same address, each with a VPN or a dedicated link of some kind to a data centre:

              R2------R5-->E1
             /  \     |      \
    SH-I1->R1    R4---R6      \
             \  /     |        \
              R3------R7-->E2---\
                      |          [Data center]
              R9------R10->E3---/
             /  \               /
   SH2-I2-R12--R13             /
             \  /             /
              R14-----R11->E4

                     Fig 12

Here you could have a single host, which is therefore globally unicast. You could also have multiple hosts on the same address at the data centre, which are therefore locally anycast.
Depending on how the data centre's router linked the VPNs to the hosts, you could make the system behave like Fig 8, or you could have all the incoming packets go to one router, which forwards them to the "closest" (or whichever next hop the IGP chooses) of the locally anycast hosts. You could also have some kind of load distributor which behaves like a single host but spreads packets out to a server farm, ideally keeping all packets of a given session on the same server, so TCP etc. works fine.

4 - So with a core-edge separation scheme, you can get the benefits of conventional BGP-based anycast, but only by using anycast ETRs. Whether these anycast ETRs have their own (anycast) hosts (Fig 8) or link in some way to one or to multiple hosts (Fig 12) is a separate issue.

5 - Conventional BGP-based anycast (Fig 11) is even more unscalable than ordinary PI prefixes for end-user networks, for at least two reasons:

   A - Each router needs to withdraw the prefix from the DFZ as soon as its host dies. Therefore, the prefix can only realistically be used for a single host (or perhaps a load-balanced host farm) at each location. Maybe each such host uses multiple IP addresses, each of which is globally anycast, but whether it has one or multiple such addresses doesn't alter the fact that the router needs to stop advertising the entire prefix if the host or host farm dies. (Likewise, for Fig 12, if the VPN link dies.) Therefore, the prefix can't be shared by multiple end-user networks, or perhaps even by one end-user network trying to run two separate anycast servers from each router - each such host (or host farm) needs its own prefix, so the router can withdraw it if that host or host farm dies.
   B - Even if the above were not true, it is unlikely that two separate end-user networks would want to run their anycast hosts at exactly the same sites. Therefore, conventional BGP-based anycast is just as unscalable as any other end-user PI prefix, with the additional constraint that if the network wanted to run two anycast servers at each site, it would need to advertise two separate prefixes at each site.

6 - While a core-edge separation scheme *could* be used to achieve similar "anycast" packet flows in the DFZ by using anycast ETRs, this is no more scalable than conventional BGP-based host anycast, for the same reasons as in point 5, with "ETR" in place of "host or host farm".

7 - So AFAIK, there is no benefit at all in using a core-edge separation scheme with anycast ETRs to achieve the same or similar DFZ packet flow patterns as conventional BGP-based anycast. The KISS principle therefore rules out using core-edge separation schemes for this purpose.

8 - If the potentially anycast hosts are in a single data centre, then you could use conventional BGP techniques as you do with your project (Fig 9), or you could use a core-edge separation scheme (Figs 10 & 12) - but the KISS principle favours the conventional BGP approach.

In short, a core-edge separation scheme can be used to achieve the same benefits as conventional BGP-based anycast and/or fancier VPN and data-centre based arrangements like your project. However, there are no scalability benefits in doing so, and the system would probably be more complex than using ordinary BGP methods.
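To make goals a and b, and the withdrawal problem in point 5 A, a little more concrete, here is a toy Python sketch. It assumes a made-up topology loosely modelled on Fig 11, uses fewest hops as a stand-in for BGP best-path selection, and invents the service names "dns" and "web" purely for illustration. It is not how any real router decides.

```python
import heapq

# Toy topology loosely modelled on Fig 11. All links cost 1 hop; the exact
# link layout is an illustrative assumption, not a real network.
LINKS = [
    ("SH", "R1"), ("R1", "R2"), ("R1", "R3"), ("R2", "R4"), ("R3", "R4"),
    ("R2", "R5"), ("R4", "R6"), ("R5", "R6"), ("R6", "R7"), ("R3", "R7"),
    ("R7", "R10"), ("R9", "R10"), ("R9", "R12"), ("R9", "R13"),
    ("R12", "R13"), ("R12", "R14"), ("R13", "R14"), ("R14", "R11"),
    ("SH2", "R12"),
]

# Routers in front of the four anycast sites (DH1..DH4 in Fig 11).
ANYCAST_ROUTERS = ["R5", "R7", "R10", "R11"]


def build_graph(links):
    """Adjacency sets for an undirected graph."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def hop_counts(graph, source):
    """Dijkstra with unit link costs: hops from source to every reachable router."""
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue  # stale queue entry
        for nbr in graph[node]:
            if d + 1 < dist.get(nbr, float("inf")):
                dist[nbr] = d + 1
                heapq.heappush(queue, (d + 1, nbr))
    return dist


def nearest_site(graph, source, advertisers):
    """Closest router still advertising the anycast prefix (ties broken by name).

    Fewest hops stands in for BGP best-path selection, which really weighs
    local preference, AS_PATH length, MED and more.
    """
    dist = hop_counts(graph, source)
    reachable = [r for r in advertisers if r in dist]
    return min(reachable, key=lambda r: (dist[r], r)) if reachable else None


def advertising_routers(prefix_services, alive):
    """Routers that keep advertising a prefix carrying prefix_services.

    BGP can only withdraw a whole prefix, so a router keeps the route only
    while every service behind that prefix at its site is alive.
    """
    return [r for r in ANYCAST_ROUTERS
            if all(alive.get((r, s), False) for s in prefix_services)]


graph = build_graph(LINKS)
alive = {(r, s): True for r in ANYCAST_ROUTERS for s in ("dns", "web")}

web_before = nearest_site(graph, "SH", advertising_routers(("web",), alive))
alive[("R5", "web")] = False  # the web host behind R5 dies (goal b)
web_after = nearest_site(graph, "SH", advertising_routers(("web",), alive))
# DNS on its own prefix stays at R5; DNS sharing a prefix with web is
# dragged away from R5 when that shared prefix is withdrawn (point 5 A).
dns_own_prefix = nearest_site(graph, "SH", advertising_routers(("dns",), alive))
dns_shared_prefix = nearest_site(graph, "SH", advertising_routers(("dns", "web"), alive))
print(web_before, web_after, dns_own_prefix, dns_shared_prefix)  # R5 R7 R5 R7
```

When the web host behind R5 dies, SH's web traffic moves to R7. DNS traffic on its own prefix stays at R5, but DNS sharing the withdrawn prefix is pulled away from a perfectly healthy DNS server. That is why each anycast host (or host farm) ends up needing its own prefix, and therefore its own DFZ entry.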
I didn't write anything specific about core-edge elimination schemes, in which hosts have one or more potentially unstable "locator" addresses (much like a MIP Care-of Address, I think), but in which applications identify the hosts by persistent Identifier addresses, and so are able to maintain sessions while each host:

   1 - Is mobile - rapidly gaining and losing locator addresses.

   2 - Is portable - the host retains its identifier address whenever a new ISP is used, giving the host a new locator address.

   3 - Needs to use a previously unused locator address from ISP-2 because the one currently in use (from ISP-1) has stopped working - for instance in multihoming service restoration.

HIP is an example of a core-edge elimination scheme. HIP is not a practical solution to the routing scaling problem, because it requires host changes (to the stack and, I think, the API and applications), and because the multihoming, TE and portability it could bring only work with other HIP-upgraded hosts.

Since all core-edge elimination schemes involve some new addressing arrangements (HIP uses its own namespace for host identifiers), I think none of them are practical solutions to the routing scaling problem, because we must rely on voluntary adoption to introduce the solution to most, and ideally all, end-user networks over a period of years:

   http://www.firstpr.com.au/ip/ivip/RRG-2009/constraints/

Since AFAIK no-one else has done so, I will now try to imagine how a core-edge elimination scheme (as I understand it) could do something resembling conventional BGP-based anycast.

There are no ITRs or ETRs in a core-edge elimination scheme - just today's basic routing system, but with fancier hosts which can keep operating, maintain sessions etc. despite using one or more potentially unstable locator addresses.

The only benefits of anycast, as far as I can see, are those listed above: a, b, c and perhaps d.
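To make the identifier/locator distinction a little more concrete, here is a minimal Python sketch of a correspondent host's session in a hypothetical core-edge elimination scheme. The `Packet` and `Session` names are illustrative assumptions, the addresses come from the RFC 5737 documentation ranges, and real schemes such as HIP also authenticate the identifier cryptographically, which is omitted here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Packet:
    src_locator: str      # routable (layer 3) address; may change at any time
    src_identifier: str   # persistent identity seen by the upper layers


@dataclass
class Session:
    """Hypothetical session state held by a correspondent host.

    The session is bound to the peer's identifier, never to its locator, so it
    survives the peer moving, changing ISP or failing over to a new locator.
    """
    peer_identifier: str
    known_locators: set = field(default_factory=set)

    def accept(self, pkt: Packet) -> bool:
        if pkt.src_identifier != self.peer_identifier:
            return False  # a different host: not part of this session
        self.known_locators.add(pkt.src_locator)  # learn the peer's current locator
        return True


s = Session("host-A")
via_isp1 = s.accept(Packet("192.0.2.10", "host-A"))    # original locator, ISP-1
via_isp2 = s.accept(Packet("198.51.100.7", "host-A"))  # ISP-1 failed; new locator from ISP-2
stranger = s.accept(Packet("203.0.113.5", "host-B"))   # a different host altogether
print(via_isp1, via_isp2, stranger)  # True True False
```

The third case is the crux for anycast: if a correspondent host's packets were rerouted to a replica with a different identifier, it would simply see a stranger.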
To achieve these anycast benefits, you need to replicate the patterns of DFZ packet travel, in which packets typically travel to the "nearest" router of a functioning anycast host. This relies on BGP or whatever other routing systems are involved. To achieve this, a core-edge elimination scheme needs to have multiple routers advertising the same prefix (or at least a prefix which includes the "anycast" address). In a core-edge elimination scheme, this is a locator address. So all the "anycast" hosts need the same locator address.

I will refer to any ordinary host which is communicating with one or more of the "anycast" hosts as a Correspondent Host (CH). This has one or more potentially unstable unicast locator addresses and a single, globally unique identifier.

For any CH to be able to communicate with the one or more "anycast" hosts, it needs to be able to regard that set of hosts as behaving like one conventional host (unless perhaps the core-edge elimination scheme has special provisions for implementing something equivalent to anycast). Therefore, all the "anycast" hosts must have the same identifier. This is because at any time, the CH's packets could be delivered to a different "anycast" host than the one they reached a moment before. This could be due to:

   1 - The CH starts sending packets from a different locator address (for instance one from another ISP, so its packets enter the DFZ somewhere else).

   2 - Something in the routing system changes, so the packets are now sent to a different router which is advertising the prefix of the "anycast" hosts. This includes changes due to one of the "anycast" hosts going down or coming up.

Note that this precludes something which would ordinarily be acceptable for hosts in a core-edge elimination scheme: one of the "anycast" hosts changing its locator address, and therefore requiring its router (or perhaps some other router) to advertise a different prefix for its new locator address.
This can't work, because the only way this "anycast" scheme can achieve the desired pattern of packet flow in the DFZ is for all the hosts to have the same locator address.

So it looks like you *could* use a core-edge elimination scheme to achieve DFZ packet flows similar to those of conventional BGP-based anycast, and so achieve goals a, b, c and perhaps d (with a VPN or some other link to a central data centre). However, you would have to accept at least the following restrictions:

   1 - You have to organise all the hosts to have the same locator and identifier. (So how do you talk to them for admin purposes? Can a host have multiple identifiers, one of which is the same for all these hosts and another which is unique?)

   2 - As with 5 A above, you need to devote an entire BGP prefix to this particular set of anycast servers.

   3 - As noted below, you actually need a prefix specific to the end-user network, which is at odds with how a core-edge elimination scheme is supposed to work.

Therefore, any attempt to use a core-edge elimination scheme for "anycast" is:

   1 - Probably more complex and constrained than the conventional approach.

   2 - Without any of the scaling benefits which ordinarily come from a core-edge elimination scheme.

   3 - Just as unscalable as conventional BGP-based host anycast.

The scaling benefits of a core-edge elimination scheme result from all hosts in end-user networks getting at least one address from each of their one or more ISPs - where those addresses always come from the large (lots of addresses = short prefix) prefixes which belong to the ISPs. In other words: end-user networks do not have their own prefixes, and work perfectly well on PA addresses from the ISPs' well-aggregated (highly scalable) prefixes. For instance, one ISP site may have one or a few such prefixes, which may cover millions of IPv4 addresses or bazillions of IPv6 /64s.
Each such prefix is a single entry in the DFZ routing table, but supports the needs of potentially thousands or millions of end-user networks.

In summary, assuming the goals of "global host anycast" a, b, c and perhaps d:

   Conventional BGP approach as used today:

      Works fine. Requires a dedicated prefix for each such set of anycast hosts (point 5 A above). Therefore it is unscalable to a high degree - even more so than an ordinary end-user PI prefix, since this end-user PI prefix for anycast hosts needs to be dedicated to the one set of hosts: the one set on a (typically) single IP address.

   Core-edge separation, using anycast ETRs:

      Works about as well as the conventional approach, but is equally unscalable for the same reasons. Since it is more complex, it is probably best to avoid it and use the conventional BGP approach instead.

   Core-edge elimination:

      Could be made to work, but is more complex than the conventional approach and just as bad in terms of scaling. However, if the Internet was somehow converted to a core-edge elimination approach, it would be impossible to use the conventional BGP approach, in which there is no distinction between identifier and locator (the IP address means both). So adoption of a core-edge elimination approach would probably make global anycasting more difficult than at present. In contrast, a core-edge separation scheme could do it, in a probably more difficult manner - without getting in the way of using the simpler BGP approach.

*** This all assumes the core-edge separation scheme is not intended to prohibit end-user networks from having their own conventional PI prefixes, as is required to do conventional BGP anycast or its anycast ETR equivalent. IOW, the "separation" is only partial and is never intended to be complete.

I recall that a goal of APT (as currently defined - maybe this will change) is the eventual complete separation of the "core" address space (ISP prefixes for ITRs, ETRs and Default Mappers etc.)
and the "edge" address space (EIDs for end-user network hosts):

   http://tools.ietf.org/html/draft-jen-apt-01#section-3

The definitions of "transit space" and "delivery space" imply a complete separation of the global unicast address space into two classes. So "transit" addresses are RLOCs for ITRs and ETRs, and "delivery" space addresses are EIDs for end-user network hosts, their internal routers etc. The definitions include two separate routing scopes or "areas". I understand this as meaning complete separation, since there would be no route for a packet from any "transit" address to any "delivery" address or vice-versa. These would be two separate subsets, in various interspersed prefixes, of the current global unicast address range. I think this is an unnecessary and probably unachievable goal.

Another aim of APT (IIRC, but I can't easily find a reference) is to improve security by somehow trusting packets sent from the "core" (transit) subset of the address space differently from those sent from the "edge" (delivery) space. I think this would be very difficult to do robustly, since a single attacker with control of a host with an ISP's (transit) address could do whatever they liked in the "core".

Unlike APT, Ivip (and, I think, LISP) does not have a specific goal of complete separation. So with Ivip, and I think LISP, it would always be possible for an end-user network to use a PI prefix for a set of globally anycast hosts with the conventional BGP approach, or to use the prefix for the anycast ETR approach.

It seems that any global use of anycast is inherently costly to the DFZ routing system, since it involves a specific prefix for every set of anycast servers. This is fine for a limited number of sets which are widely regarded as being important to everyone - such as the DNS root servers.
However, if everyone and his dog (Aussie phrase) starts doing it, for purposes such as your project (end of msg04894), then the burden they all place on the DFZ control plane would come under increasing scrutiny.

You wrote, in:

   http://www.ietf.org/mail-archive/web/rrg/current/msg04901.html

> Hi Robin,
>
> What's the difference between anycast and unicast? Is it a
> difference in the forwarding or data planes? Or is it a difference
> in how the respondent machine(s) understand the address?
>
> It's the latter of course.

Yes, but the reason most people want to do it on a global scale is the way the packets flow in the DFZ - to the "nearest" (in BGP terms) router advertising the prefix which encompasses an operational host.

> In strategy B,

Core-edge elimination schemes:

   http://bill.herrin.us/network/rrgarchitectures.html
   http://www.firstpr.com.au/ip/ivip/rrgarch/
   http://tools.ietf.org/html/draft-irtf-rrg-recommendation-02

> the host's understanding of
> each packet moves out of the packet forwarding system entirely and
> into the map used by the respective hosts.

Yes - the upper levels of the stack and all the applications don't care about the "locator" address with which packets are sent and received. All they care about is the "identifier" of the hosts concerned.

> A strategy-B host literally
> doesn't care what layer-3 addresses were attached to the received
> packet.

Like I said!

> The address in the routing system only retains semantics
> associated with the forwarding process itself. So, as long as the
> mapping system adequately supports something that looks like
> anycast, the whole strategy B system does as well.

This is a rather mathematical, rule-based way of looking at it - which doesn't seem to recognise some pertinent principles.

AFAIK, the only reasons you want anycast on a global basis are reasons a, b, c and perhaps d, listed above in point 2.
These all relate to the way the packets travel in the DFZ, which is determined by the conventional routing system and the *locator* (layer 3) destination address of the packets. So to achieve these benefits with a core-edge elimination scheme, all the routers need to advertise a prefix which contains the single *locator* IP address of all the anycast hosts AND (AFAIK) all those hosts need to have the same identifier.

AFAIK, having multiple hosts with the same locator is a violation of the general principles behind core-edge elimination schemes. So too, perhaps even more so, is having multiple hosts with the same identifier. Yet it brings you no benefits over the conventional BGP approach - and worse still, AFAIK, with a core-edge elimination scheme in place and used by all hosts, you won't be able to use the conventional BGP approach.

APT, it seems, once fully deployed, would also preclude the conventional BGP approach or the anycast ETR approach, unless you fashioned your end-user network as an ISP, got yourself an RLOC (core = transit) prefix and set up the equivalent of an ISP network, with Default Mappers, ETRs, ITRs etc., in a unified manner somehow located at your several chosen sites.

I will respond to what you wrote about disaster recovery in a separate thread.

  - Robin