Re: [rrg] Anycast in the core architecture

Re: [rrg] Anycast in the core architecture - sep. OK; elim not?

Robin Whittle <rw@firstpr.com.au> Thu, 23 April 2009 04:10 UTC

Message-ID: <49EFEA00.9080403@firstpr.com.au>
Date: Thu, 23 Apr 2009 14:09:36 +1000
From: Robin Whittle <rw@firstpr.com.au>
Organization: First Principles
To: RRG <rrg@irtf.org>
References: <3c3e3fca0904190913s72f519a8p3cab3ec22e73d379@mail.gmail.com> <49EB54EC.5060801@tony.li> <3c3e3fca0904191632n1b93b9fy48f3b94f4ce518bd@mail.gmail.com> <49EC0200.9060600@tony.li> <3c3e3fca0904201453i614ea451h58f09d50b09365c4@mail.gmail.com> <49ED604C.2020407@tony.li> <3c3e3fca0904211453pe3b51as1c282a495fa274d2@mail.gmail.com> <49EE9718.9060100@firstpr.com.au>
In-Reply-To: <49EE9718.9060100@firstpr.com.au>
Subject: Re: [rrg] Anycast in the core architecture - sep. OK; elim not?
Precedence: list

Short version: Summarising and extending my previous message to
               include more about anycast with a core-edge
               separation scheme and to explore it with
               a core-edge elimination scheme.

               Replying to Bill about how global anycast would
               work with core-edge elimination.

               In summary:

                  Global anycast has its uses, but it is highly
                  unscalable, since there needs to be a separate
                  prefix advertised in the DFZ for every set of
                  anycast servers - for each anycast IP address.

                  A core-edge separation scheme could achieve the
                  same benefits, by using anycast *ETRs*.  This
                  has no benefits over the conventional BGP-based
                  approach and is equally unscalable.

                  A core-edge elimination scheme *might* be able
                  to do global anycast too - but it would be more
                  complex than with the conventional BGP approach.
                  There would be no scaling benefits - it would
                  be just as unscalable as the conventional BGP
                  approach.  However, a core-edge elimination
                  scheme, once fully adopted, does not allow
                  end-user networks to have their own PI prefixes.
                  So it would not be possible to use the conventional
                  BGP approach.

               I will write a separate message on Disaster Recovery.

Hi Bill,

To summarise and extend:

  http://www.ietf.org/mail-archive/web/rrg/current/msg04897.html

For core-edge separation schemes:

  1 - The routing system - the BGP-based DFZ and probably some other
      routers, plus the ITRs, ETRs etc. of the core-edge separation
      scheme - can't tell the difference between host unicast and
      host anycast.  This is another way of saying they will handle
      packets correctly in both scenarios.


  2 - However, host anycast alone with a core-edge separation scheme
      does not have the same effect on packet flow in the DFZ as
      conventional BGP-based host anycast, in which each host has its
      own separate BGP router.

      The main, perhaps only, reasons people are interested in
      conventional BGP-based host anycast are (AFAIK) as follows.
      These are all dependent on the normal behaviour of the BGP
      routing system.

      a - "Shortest" (generally, in BGP terms) path to the nearest
          router which advertises the prefix of its anycast host.

      b - Automatic failure recovery, as long as the router stops
          advertising the prefix if the one or more hosts using
          this prefix die.  If so, the other BGP routers will
          soon get the packets which would have gone to this one.

      c - Load sharing over many hosts, in geographically and
          topologically diverse sites, which gives the system
          a high capacity and a great resistance to failure
          without involving DNS in any way, since the service always
          responds on the one IP address.  This is also extremely
          important as a way of achieving high total bandwidth
          to survive DoS attacks with floods of incoming packets.

      It may also be desired and possible to:

      d - Infer something about the sending host's location from
          which of the anycast BGP routers received the packet (as
          you are doing with your project, which you mentioned at the
          end of msg04894) - but AFAIK this information is not used
          for the most prominent BGP-based host anycast usage: root
          nameservers.

      Host-anycast with a core-edge separation scheme and a single
      (unicast) ETR (Fig 9 as noted below) involves no significant
      difference in the way packets traverse the DFZ compared to
      conventional BGP-based unicast (Fig 3).

      Fig 1 (msg04897) shows BGP-based host unicast - with two border
      routers advertising the prefix.  This is ordinary operation
      for many hosts today, which use PA space on an ISP with
      two DFZ border routers each with its own upstream link to
      separate parts of the DFZ.

      Fig 3 shows core-edge separation with a unicast host and a
      unicast ETR, with two border routers advertising the ETR's
      prefix.  This is likewise basic, ordinary, core-edge
      separation operation as with LISP, APT, Ivip or TRRP - where
      the ETR address is in a PA prefix of an ISP with two BRs.

      Fig 9 - if we ignore the bottom half - is just like Fig 3
      with a unicast ETR, but there is host anycast at the
      one local network the ETR connects to.

      So using a core-edge separation scheme to do host-anycast
      with a single (unicast) ETR (rather than anycast ETRs) brings
      none of the benefits of conventional BGP-based anycast, because
      the ITR->ETR tunnel is conventional unicast.


  3 - To achieve the goals listed above - a, b, c and perhaps d -
      with a core-edge separation scheme, it is necessary to have
      anycast *ETRs*.  This is the only way of achieving the
      desired packet paths through the DFZ and of using the DFZ
      routers to automatically send the packet to the nearest
      router of an ETR which handles the EID (micronet) prefix
      of the multiple anycast hosts.

      Fig 11 shows conventional BGP-based host anycast:

                  R2------R5------>DH1
                 /  \     |
        SH---->R1   R4---R6
                 \  /     |
                  R3------R7------>DH2
                   |
                  R9------R10----->DH3
                 /  \
        SH2----R12--R13                Four destination hosts, each
                 \  /                  one anycast in a global sense.
                  R14-----R11----->DH4
        Fig 11

      To achieve the same DFZ packet flow patterns with a core-edge
      separation scheme, you would use anycast ETRs as with Fig 8:

           ITRs                ETRs

                  R2------R5-->E1->DH1
                 /  \     |            Each of four ISPs has an
        SH-I1->R1   R4---R6            anycast ETR.  So each ISP's
                 \  /     |            router (R5, R7, R10 & R11)
                  R3------R7-->E2->DH2 is advertising the same
                   |                   prefix.
                  R9------R10->E3->DH3
                 /  \
        SH2-I2-R12--R13                Four destination hosts, each
                 \  /                  one anycast in a global sense.
                  R14-----R11->E4->DH4
        Fig 8

      This arrangement uses anycast ETRs (all with the same "RLOC" =
      Locator address) and anycast hosts (all with the same EID =
      Identifier = micronet address).


      You could also achieve the same DFZ packet paths with four
      anycast ETRs, all on the same address, each with a VPN or a
      dedicated link of some kind to a data centre:

                  R2------R5-->E1
                 /  \     |      \
        SH-I1->R1   R4---R6       \
                 \  /     |        \
                  R3------R7-->E2---\
                   |                [Data center]
                  R9------R10->E3---/
                 /  \              /
        SH2-I2-R12--R13           /
                 \  /            /
                  R14-----R11->E4
        Fig 12

      where you could have a single host, which is therefore
      globally unicast.  You could also have multiple hosts
      on the same address at the data centre, which are therefore
      locally anycast.  Depending on how the data centre's
      router linked the VPNs to the hosts, you could make the
      system behave like Fig 8, or you could have all the
      incoming packets go to one router, which forwards each
      packet to the "closest" of the locally anycast hosts
      (by whatever metric the IGP uses to choose the next hop).

      You could also have some kind of load distributor
      which behaves like a single host but spreads packets
      out to a server farm, ideally sending all packets of a
      given session to the same server, so TCP etc. works fine.


  4 - So with a core-edge separation scheme, you can get the
      benefits of conventional BGP-based anycast, but only
      by using anycast ETRs.  Whether these anycast ETRs
      serve their own (anycast) hosts (Fig 8) or link in
      some way to one or to multiple hosts (Fig 12) is a
      separate issue.


  5 - Conventional BGP-based anycast (Fig 11) is even more
      unscalable than ordinary PI prefixes for end-user
      networks, for two reasons at least:

        A - Each router needs to withdraw the prefix if
            its host dies.  Therefore, the prefix can
            only realistically be used by one host (or
            perhaps a load-balanced host farm) at each
            location, and the router needs to withdraw
            the prefix from the DFZ as soon as that host
            or host farm dies.

            Maybe each such host uses multiple IP
            addresses, each of which is globally anycast,
            but whether it has one or multiple such
            addresses doesn't alter the fact that the router
            needs to stop advertising the entire prefix if
            the host or host farm dies.  (Likewise, for
            Fig 12, if the VPN link dies.)

            Therefore, the prefix can't be shared by
            multiple end-user networks, or perhaps even
            by the one end-user network trying to run
            two separate anycast servers from each router -
            each such host (or host farm) needs its own
            prefix so the router can withdraw it if the
            host or host farm dies.

        B - Even if the above were not true, it is unlikely
            that two separate end-user networks would want
            to run their anycast hosts at exactly the same
            set of sites.

      Therefore, conventional BGP-based anycast is at least as
      unscalable as any other end-user PI prefix, with the
      additional constraint that if the network wanted to
      run two anycast servers at each site, it would
      need to advertise two separate prefixes at each site.


  6 - While a core-edge separation scheme *could* be used
      to achieve similar "anycast" packet flows in the
      DFZ, by using anycast ETRs, this is no more scalable
      than conventional BGP-based host anycast, for the
      same reasons as in point 5, with "ETR" in place of
      "host or host farm".


  7 - So AFAIK, there is no benefit at all in using a core-edge
      separation scheme with anycast ETRs to achieve the
      same or similar DFZ packet flow patterns as conventional
      BGP-based anycast.  The KISS principle therefore rules
      out the use of core-edge separation schemes to achieve
      the DFZ packet flow benefits of conventional BGP-based
      anycast.


  8 - If the potentially anycast hosts are in a single data
      centre, then you could use conventional BGP techniques
      as you do with your project (Fig 9) or you could use
      a core-edge separation scheme to do it (Figs 10 & 12)
      - but the KISS principle favours the conventional BGP
      approach.
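Goals a and b in point 2 depend only on how BGP steers packets towards whichever router advertising the prefix is "nearest", and on what happens when one of them withdraws it.  The toy model below is a minimal sketch to make that concrete, assuming the Fig 11 topology as read from the diagram, plain hop count as a stand-in for BGP's far richer path selection, and ties broken by router name.  None of this is how real BGP routers actually decide.

```python
from collections import deque

# Router links of Fig 11, read off the ASCII diagram (undirected).
LINKS = [
    ("R1", "R2"), ("R1", "R3"), ("R2", "R4"), ("R3", "R4"),
    ("R2", "R5"), ("R4", "R6"), ("R5", "R6"), ("R6", "R7"),
    ("R3", "R7"), ("R3", "R9"), ("R9", "R10"), ("R9", "R12"),
    ("R9", "R13"), ("R12", "R13"), ("R12", "R14"), ("R13", "R14"),
    ("R14", "R11"),
]


def build_graph(links):
    """Turn a list of links into an adjacency map."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def hop_counts(graph, start):
    """Breadth-first search: hop count from start to each reachable router."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def nearest_anycast(graph, start, advertisers):
    """Pick the advertising router that traffic from start would reach.

    Hop count stands in for BGP path selection; ties go to the router
    whose name sorts first. Returns None if nobody advertises the prefix.
    """
    dist = hop_counts(graph, start)
    candidates = [(dist[r], r) for r in advertisers if r in dist]
    return min(candidates)[1] if candidates else None


graph = build_graph(LINKS)
up = {"R5", "R7", "R10", "R11"}           # routers advertising the anycast prefix
print(nearest_anycast(graph, "R1", up))   # SH's traffic: R5 (2 hops, ties with R7)
up.discard("R5")                          # DH1 dies, so R5 withdraws the prefix
print(nearest_anycast(graph, "R1", up))   # R7 takes over
print(nearest_anycast(graph, "R12", up))  # SH2's traffic: R10 (2 hops, ties with R11)
```

The same model applies unchanged to Fig 8 if the advertising routers are taken to be those in front of the anycast ETRs, which is the sense in which the ETR approach reproduces the BGP approach's packet flows.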

In short, a core-edge separation scheme can be used to achieve the
same benefits as conventional BGP-based anycast and/or fancier VPN
and data-centre based arrangements like your project.  However, there
are no scalability benefits in doing so, and the system would
probably be more complex than using ordinary BGP methods.
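The load distributor mentioned in point 3, for data-centre arrangements like Fig 12, must keep each session on one server.  One common technique, shown here as a minimal sketch rather than anything the schemes under discussion specify, is to hash each packet's 5-tuple and use the result to choose a server.  The server names and addresses are hypothetical.

```python
import hashlib
from typing import NamedTuple


class Flow(NamedTuple):
    """Transport 5-tuple identifying one session, e.g. one TCP connection."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str


def pick_server(flow, servers):
    """Map a flow to one server; every packet of the flow gets the same answer."""
    key = "|".join(str(part) for part in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["srv-a", "srv-b", "srv-c"]  # hypothetical server farm behind one address
flow = Flow("192.0.2.10", 40000, "198.51.100.53", 53, "tcp")
chosen = pick_server(flow, servers)
assert all(pick_server(flow, servers) == chosen for _ in range(100))
print(flow.src_ip, "->", chosen)
```

Plain modulo hashing remaps most flows whenever a server is added or removed.  Consistent or rendezvous hashing reduces that churn.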


I didn't write anything specific about core-edge elimination schemes,
in which the hosts have one or more, potentially unstable, "locator"
addresses (much like a MIP Care of Address, I think) but in which the
applications identify the hosts with persistent Identifier addresses
and so are able to maintain sessions while each host:

  1 - Is mobile - rapidly gaining and losing locator addresses.

  2 - Is portable - the host retains its identifier address whenever
      a new ISP is used, giving the host a new locator address.

  3 - Needs to use a previously unused locator address from ISP-2
      due to the currently used one (from ISP-1) ceasing to function
      - for instance in multihoming service restoration.

HIP is an example of a core-edge elimination scheme.  HIP is not a
practical solution to the routing scaling problem, because it
requires host changes (to the stack and, I think, to the API and
applications), and because the multihoming, TE and portability
benefits it could bring only work with similarly HIP-upgraded hosts.

Since all core-edge elimination schemes involve some new addressing
arrangements (HIP uses a separate namespace for host identifiers), I
think none of them are practical solutions to the routing scaling
problem, since we must rely on voluntary adoption to introduce this
solution to most and ideally all end-user networks over a period of
years:

  http://www.firstpr.com.au/ip/ivip/RRG-2009/constraints/


Since AFAIK no-one else has done so, I will now try to imagine how a
core-edge elimination scheme (as I understand it) could do
something resembling conventional BGP-based anycast.

There are no ITRs or ETRs in a core-edge elimination scheme.  Just
today's basic routing system, but with fancier hosts which can keep
operating, maintain sessions etc., despite using one or more
potentially unstable locator addresses.

The only benefits of anycast, as far as I can see, are those listed
above: a, b, c and perhaps d.  To achieve these, you need to
replicate the patterns of DFZ packet travel, in which packets
typically travel to the "nearest" router of a functioning anycast host.

This relies on BGP or whatever other routing systems are involved.
To achieve this, a core-edge elimination scheme needs to have
multiple routers advertising the same prefix (or at least a prefix
which includes the "anycast" address).

In the core-edge elimination scheme, this is a locator address.  So
all the "anycast" hosts need the same locator address.

I will refer to each ordinary host which is communicating with one
or more of the "anycast" hosts as a Correspondent Host (CH).  Each CH
has one or more, potentially unstable, unicast locator addresses and
a single, globally unique, identifier.

For any CH to be able to communicate with the one or more "anycast"
hosts, it needs to be able to regard that set of hosts as behaving
like one conventional host.  (Unless perhaps the core-edge
elimination scheme has special provisions for implementing something
equivalent to anycast.)  Therefore, all the "anycast" hosts must have
the same identifier.  This is because at any time, the CH could have
its packets sent to a different "anycast" host than the one the
packets were sent to a moment before.
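The requirement that all the "anycast" hosts share one identifier can be illustrated with a toy CH whose sessions are keyed by the peer's identifier alone.  This is a minimal sketch, assuming a hypothetical packet format carrying just an identifier, a locator and a payload; it ignores how a real scheme such as HIP would authenticate identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src_identifier: str  # stable identity, what upper layers see
    src_locator: str     # routable address, may change at any time
    payload: str


class CorrespondentHost:
    """Toy CH whose sessions are keyed by the peer's identifier only."""

    def __init__(self):
        self.sessions = {}  # peer identifier -> list of received payloads

    def open_session(self, peer_identifier):
        self.sessions[peer_identifier] = []

    def receive(self, pkt):
        """Return True if pkt belongs to an existing session."""
        session = self.sessions.get(pkt.src_identifier)
        if session is None:
            return False  # unknown identifier: no session to deliver to
        session.append(pkt.payload)  # the locator plays no part here
        return True


ch = CorrespondentHost()
ch.open_session("ID-anycast")
# A reply from the anycast host at one site...
print(ch.receive(Packet("ID-anycast", "203.0.113.1", "part 1")))  # True
# ...then routing shifts and the host at another site answers.
print(ch.receive(Packet("ID-anycast", "203.0.113.1", "part 2")))  # True
# A host on the same locator but with its own identifier is not recognised.
print(ch.receive(Packet("ID-host-2", "203.0.113.1", "part 3")))   # False
```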

This could be due to:

  1 - The CH starts sending packets from a different locator address.

  2 - Something in the routing system changes, so the packets are now
      sent to a different router which is advertising the prefix of
      the "anycast" hosts.  This includes changes due to one of the
      "anycast" hosts going down, or coming up.

Note that this precludes something which would ordinarily be
acceptable for hosts in a core-edge elimination scheme:

         One of the "anycast" hosts changing its locator address and
         therefore requiring its router (or perhaps some other
         router) to advertise a different prefix for its new locator
         address.

This can't work, because the only way this "anycast" scheme can
achieve the desired pattern of packet flow in the DFZ is for all
the hosts to have the same locator address.


So it looks like you *could* use a core-edge elimination scheme to
achieve similar DFZ packet flows as with conventional BGP-based
anycast and so achieve goals a, b, c and perhaps d (with a VPN or
some other link to a central data-centre).

However, you would have to accept the following restrictions, at least:

  1 - You have to organise all the hosts to have the same locator
      and identifier.  (So how do you talk to them for admin
      purposes?  Can a host have multiple identifiers, one of
      which is the same for all these hosts and another which is
      unique?)

  2 - As with 5 A above, you need to devote an entire BGP prefix
      to this particular set of anycast servers.

  3 - As noted below, you actually need an end-user network specific
      prefix, which is at odds with how a core-edge elimination
      scheme is supposed to work.

Therefore, any attempt to use a core-edge elimination scheme for
"anycast":

  1 - Is probably more complex and constrained than the conventional
      approach.

  2 - Does not allow any of the scaling benefits which ordinarily
      come from a core-edge elimination scheme.

  3 - Is just as unscalable as conventional BGP-based host anycast.

The scaling benefits of a core-edge elimination scheme result from
all hosts in end-user networks getting at least one address from each
of their one or more ISPs - where those addresses always come from
the large (lots of addresses = short prefix) prefixes which belong to
the ISPs.  In other words: end-user networks do not have their own
prefixes and work perfectly well on PA addresses from the ISP's well
aggregated (highly scalable) prefixes.  For instance, one ISP site
may have one or a few such prefixes which may have millions of IPv4
addresses or bazillions of IPv6 /64s.  Each such prefix is a single
entry in the DFZ routing table but supports the needs of potentially
thousands, millions or whatever end-user networks.
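A quick check of the arithmetic behind this, using hypothetical ISP aggregates (a private IPv4 range and the IPv6 documentation prefix stand in for real ISP address space):

```python
import ipaddress

# Hypothetical ISP aggregates: a private IPv4 range and the IPv6
# documentation prefix stand in for real ISP address space.
v4_aggregate = ipaddress.ip_network("10.0.0.0/10")
v6_aggregate = ipaddress.ip_network("2001:db8::/32")

v4_addresses = v4_aggregate.num_addresses
v6_slash64s = 2 ** (64 - v6_aggregate.prefixlen)  # /64 subnets inside the /32

print(f"{v4_aggregate}: {v4_addresses:,} addresses, 1 DFZ route")
# 10.0.0.0/10: 4,194,304 addresses, 1 DFZ route
print(f"{v6_aggregate}: {v6_slash64s:,} /64s, 1 DFZ route")
# 2001:db8::/32: 4,294,967,296 /64s, 1 DFZ route
```

A globally anycast service, by contrast, costs the DFZ one prefix of its own however few addresses it actually uses.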


In summary, assuming the goals of "global host anycast" a, b, c and
perhaps d:

   Conventional BGP approach as used today:

      Works fine.  Requires a single prefix for each such set of
      anycast hosts (point 5 A above).  Therefore, is unscalable to a
      high degree - even more so than an ordinary end-user PI prefix,
      since this end-user PI prefix for anycast hosts needs to be
      dedicated to the one set of hosts: the one set on a (typically)
      single IP address.

   Core-edge separation, using anycast ETRs:

      Works about as well as the conventional approach, but is
      equally unscalable for the same reasons.  Since it is more
      complex, it is probably best to avoid it and use the
      conventional BGP approach instead.

   Core-edge elimination:

      Could be made to work, but is more complex than the
      conventional approach and is just as bad in terms of
      scaling.  However, if the Internet were somehow converted
      to a core-edge elimination approach, it would be impossible
      to use the conventional BGP approach, in which there is no
      distinction between identifier and locator (the IP address
      means both).

So adoption of a core-edge elimination approach would probably make
global anycasting more difficult than at present.

In contrast, a core-edge separation scheme could do it, probably in
a more complex manner, without getting in the way of using the
simpler BGP approach.

***  This is all assuming the core-edge separation scheme is not
     intended to prohibit end-user networks having their own
     conventional PI prefixes, as is required to do conventional
     BGP anycast or its anycast ETR equivalent.  IOW, the
     "separation" is only partial and is not intended to ever be
     complete.

     I recall that a goal of APT (as currently defined - maybe this
     will change) is the eventual complete separation of the "core"
     address space (ISP prefixes for ITRs, ETRs and Default Mappers
     etc.) and the "edge" address space (EIDs for end-user network
     hosts).

        http://tools.ietf.org/html/draft-jen-apt-01#section-3

     The definitions of "transit space" and "delivery space" imply
     a complete separation of the global unicast address space into
     two classes.  So "transit" addresses are RLOCs for ITRs and
     ETRs and "delivery" space addresses are EIDs for end-user
     network hosts and their internal routers etc.  The definitions
     include two separate routing scopes or "areas".  I understand
     this as meaning complete separation, since there would be
     no route for a packet from any "transit" address to any
     "delivery" address or vice-versa.

     These would be two separate subsets, in various interspersed
     prefixes, in the current global unicast address range.

     I think this is an unnecessary and probably unachievable goal.

     Another aim of APT (IIRC, but I can't easily find a reference)
     is to improve security by somehow trusting packets sent from the
     "core" (transit) subset of the address space in a different
     manner from those sent from the "edge" (delivery) space.  I
     think this would be very difficult to do robustly, since a
     single attacker with control of a host with an ISP's (transit)
     address could do whatever they liked in the "core".

     Ivip, and I think LISP, unlike APT, do not have a specific goal
     of complete separation.  So with Ivip and I think LISP, it
     would always be possible for an end-user network to use a PI
     prefix for a set of globally anycast hosts using the
     conventional BGP approach, or to use such a prefix for the
     anycast ETR approach.

It seems that any global use of anycast is inherently costly to the
DFZ routing system, since it involves a specific prefix for every
set of anycast servers.

This is fine for a limited number of sets which are widely regarded
as being important to everyone, such as the DNS root servers.
However, if everyone and his dog (Aussie phrase) starts doing it, for
purposes such as your project (end of msg04894), then the burden all
these place on the DFZ control plane would come under increasing
scrutiny.



You wrote, in:

  http://www.ietf.org/mail-archive/web/rrg/current/msg04901.html

> Hi Robin,
>
> What's the difference between anycast and unicast? Is it a
> difference in the forwarding or data planes? Or is it a difference
> in how the respondent machine(s) understand the address?
>
> It's the latter of course.

Yes, but the reason most people want to do it on a global scale is
because of the way the packets flow in the DFZ - to the "nearest" (in
BGP terms) router advertising the prefix which encompasses an
operational host.


> In strategy B,

Core-edge elimination schemes:

  http://bill.herrin.us/network/rrgarchitectures.html
  http://www.firstpr.com.au/ip/ivip/rrgarch/
  http://tools.ietf.org/html/draft-irtf-rrg-recommendation-02

> the host's understanding of
> each packet moves out of the packet forwarding system entirely and
> into the map used by the respective hosts.

Yes - the upper levels of the stack and all the applications don't
care about the "locator" address with which packets are sent and
received.  All they care about is the "identifier" of the hosts
concerned.

> A strategy-B host literally
> doesn't care what layer-3 addresses were attached to the received
> packet.

Like I said!

> The address in the routing system only retains semantics
> associated with the forwarding process itself. So, as long as the
> mapping system adequately supports something that looks like
> anycast, the whole strategy B system does as well.

This is a rather mathematical, rule-based, way of looking at it -
which doesn't seem to recognise some pertinent principles.

AFAIK, the only reasons you want anycast on a global basis are those
reasons a, b, c and perhaps d listed above in point 2.  These all
relate to the way the packets travel in the DFZ, which is determined
by the conventional routing system and the *locator* (layer 3)
destination address of the packets.

So to achieve these benefits with a core-edge elimination scheme, all
the routers need to advertise a prefix which contains the single
*locator* IP address of all the anycast hosts AND (AFAIK) all those
hosts need to have the same identifier.
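The two conditions above can be written down as a check over the sites.  This is a minimal sketch with a hypothetical Site record (advertised prefix, host locator, host identifier) and documentation addresses; it only restates the constraints and is not part of any real core-edge elimination scheme.

```python
import ipaddress
from typing import NamedTuple


class Site(NamedTuple):
    advertised_prefix: str  # prefix the site's router announces in BGP
    host_locator: str       # locator address configured on the site's host
    host_identifier: str    # identifier the host presents to CHs


def anycast_conditions_met(sites):
    """Check both conditions for elimination-style "anycast".

    1. Every site's advertised prefix contains one shared locator.
    2. Every host presents the same identifier.
    """
    locators = {s.host_locator for s in sites}
    identifiers = {s.host_identifier for s in sites}
    if len(locators) != 1 or len(identifiers) != 1:
        return False
    shared = ipaddress.ip_address(locators.pop())
    return all(shared in ipaddress.ip_network(s.advertised_prefix) for s in sites)


sites = [Site("192.0.2.0/24", "192.0.2.53", "ID-anycast") for _ in range(4)]
print(anycast_conditions_met(sites))  # True
sites[3] = Site("192.0.2.0/24", "192.0.2.54", "ID-anycast")  # host with its own locator
print(anycast_conditions_met(sites))  # False
```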

AFAIK, having multiple hosts with the same locator is a violation of
the general principles behind core-edge elimination schemes.
Likewise, perhaps even more so, is the concept of multiple hosts
having the same identifier.

Yet it brings you no benefits over the conventional BGP approach -
and worse still, AFAIK, with a core-edge elimination scheme in
place and used by all hosts, you won't be able to use the
conventional BGP approach.

APT, it seems, once fully deployed, would also preclude the
conventional BGP approach or the anycast ETR approach, unless you
fashioned your end-user network as an ISP, got yourself an RLOC (core
= transit) prefix and set up the equivalent of an ISP network, with
Default Mappers, ETRs and ITRs etc. in a unified manner somehow
located at your several chosen sites.

I will respond to what you wrote about disaster recovery in a
separate thread.


 - Robin
