Re: grow: anycast ops draft

Pekka Savola <pekkas@netcore.fi> Mon, 08 November 2004 12:25 UTC

Date: Mon, 08 Nov 2004 14:23:06 +0200
From: Pekka Savola <pekkas@netcore.fi>
To: Joe Abley <jabley@isc.org>
cc: grow@lists.uoregon.edu
Subject: Re: grow: anycast ops draft
In-Reply-To: <18A3F56E-2C41-11D9-928C-000D93B24C7A@isc.org>
Message-ID: <Pine.LNX.4.61.0411081422130.9056@netcore.fi>
References: <18A3F56E-2C41-11D9-928C-000D93B24C7A@isc.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset="US-ASCII"; format="flowed"
Sender: owner-grow@lists.uoregon.edu
Precedence: bulk

On Mon, 1 Nov 2004, Joe Abley wrote:
> Kurtis Lindqvist and I squeezed this draft through before the cut-off:
>
>  http://www.ietf.org/internet-drafts/draft-kurtis-anycast-bcp-00.txt

I see Geoff already responded with a lot of good comments.  Some 
comments from me..

Overall I thought this was a very good and useful document, definitely
something we should be producing.  A loud 'yes' for the adoption of this as
a WG item!

Obviously, as with all -00 I-Ds, there's a lot that can be improved.
Details below.

substantial
-----------

    Where a routing advertisement from a node corresponds to a single
    Service Address, this coupling might be such that availability of the
    service triggers the route advertisement, and non-availability of the
    service triggers a route withdrawal.  This can be achieved using
    routing protocol implementations on the same servers which provide
    the service being distributed.

==> The last sentence is a necessary but not sufficient condition.  The draft
should also expand the text (by at least a paragraph) to say how the routing
protocol (or some other means) monitors the "health" of the service.

That is, the fact that a node participates in the routing protocol
guarantees nothing about the health of the service, only about the health
of the node.  This is vital with anycast, as we don't want requests to be
blackholed when the service does not work but the node is still
operational.
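
To illustrate the kind of coupling meant here (a rough sketch only, not
text from the draft): the announce/withdraw decision should be driven by a
probe of the service itself, not by the routing daemon merely being up.
The function name and the probe callback below are hypothetical; a real
deployment would use a genuine service request (e.g. a DNS query) as the
probe and hand the result to whatever routing daemon is in use.

```python
from typing import Callable


def next_route_action(announced: bool, probe: Callable[[], bool]) -> str:
    """Decide what to do with the Service Address route.

    `probe` is a hypothetical callback that exercises the service itself
    and returns True only if it received a valid answer.
    """
    try:
        healthy = probe()
    except Exception:
        # A probe that raises is treated the same as a failed probe.
        healthy = False

    if healthy and not announced:
        return "announce"
    if not healthy and announced:
        return "withdraw"
    return "none"
```

In practice one would probably also want some damping (e.g. several
consecutive failures before withdrawing) to avoid flapping the route.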

    Where multiple Service Addresses are covered by the same covering
    route, there is no longer a tight coupling between the advertisement
    of that route and the individual services associated with the covered
    host routes.  The resulting impact on signaling availability of
    individual services is discussed in Section 4.4.1.

==> the draft goes on at significant length to describe hacks to work around
this problem, to be able to deploy multiple services at the same anycast
prefix.

The draft needs to explicitly state, in a separate paragraph, a simpler
alternative: "just don't do that".  It's sufficient for DNS at least.  Get a
different prefix for every service.  Assuming that something around
/24 is sufficient, there are still plenty of those.

That's much simpler operationally than running tunneling across different
nodes.
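
As a small illustration of "one prefix per service" (a sketch under
assumptions: the address block and the service names are made up, and /24
is just the size suggested above), each service gets its own covering
route carved out of a larger block:

```python
import ipaddress


def allocate_service_prefixes(block, services, new_prefix=24):
    """Give every service its own prefix out of `block`.

    `block` and `services` are hypothetical inputs; service names are
    assumed to be unique.
    """
    subnets = ipaddress.ip_network(block).subnets(new_prefix=new_prefix)
    allocation = dict(zip(services, subnets))
    if len(allocation) < len(services):
        raise ValueError("block too small for one prefix per service")
    return allocation


# Example with a private block standing in for real address space.
plan = allocate_service_prefixes("10.20.0.0/22", ["dns", "ntp"])
for name, prefix in plan.items():
    print(name, prefix)
```

With this layout, withdrawing the route for one service never affects the
reachability of another.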

    For services which are distributed across the global Internet using
    BGP, equal-cost paths are normally not a consideration: BGP's exit
    selection algorithm usually selects a single, consistent exit for a
    single destination regardless of whether multiple candidate paths
    exist.  Implementations of BGP exist that support multi-path exit
    selection, however, and corner cases where dual selected exits route
    to different nodes are possible.  Analysis of the likely incidence of
    such corner cases for particular distributions of Anycast Nodes are
    recommended for services which involve multi-packet transactions.

==> I fear that BGP ECMP has to become more prominent here.  AFAICS,
it's implemented and used, especially by stub ASes which, for some reason,
want this kind of more evenly balanced load sharing.

4.4.5  Reverse Path Forwarding Checks


    Reverse Path Forwarding (RPF) checks, first described in [8], are
    commonly deployed as part of ingress interface packet filters on
    routers in the global Internet in order to deny packets whose source
    addresses are spoofed (see also [10]).  Deployed implementations of
    RPF make available two modes of operation: a loose mode, and a strict
    mode.


    Strict-mode RPF checks can cause non-spoofed packets to be denied
    when they originate from multi-homed site, since selected paths might
    legitimately not correspond with the ingress interface of non-spoofed
    packets from the multi-homed site.  A collection of anycast nodes
    deployed across the Internet is largely indistinguishable from a
    distributed, multi-homed site to the routing system, and hence this
    risk also exists for anycast nodes, even if individual nodes are not
    multi-homed.


    Care should be taken to ensure that strict-mode RPF is not enabled in
    peer networks connecting to anycast nodes.

==> I think there are a lot of problems with this text, and you may not have
thought it through.  A few comments:

  - the first and second paragraphs should reference RFC3704, which
discusses *exactly* these problems, and also introduces different
flavours of RPF.

  - the second and third paragraphs are not really accurate.  It depends on
how the preferences are selected.  If you hook up an anycast node to its
first-hop ISP, almost always the ISP will consider that exit preferable, so
strict RPF works.  (Where it may not work is further upstream, but there
strict RPF-like techniques are not applied in any case..)
Even if strict RPF did not work, you could still use
feasible path RPF as described in RFC3704.
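
To make the difference between the RPF flavours concrete, here is a toy
model (a sketch only: the routing table, the interface names and the
function names are invented, and real routers obviously do this in the
forwarding plane).  Strict mode accepts a packet only if the best route
back to the source points out of the ingress interface; feasible path
mode also accepts any alternative path; loose mode only requires that some
route to the source exists.

```python
import ipaddress

# Hypothetical routing state of one router: per prefix, the interface of
# the best path and the interfaces of all feasible (best + alternative)
# paths.
ROUTES = {
    "192.0.2.0/24": {
        "best": "if-peer-a",
        "feasible": {"if-peer-a", "if-peer-b"},
    },
}


def lookup(src, routes):
    """Longest-prefix match for a source address, or None."""
    addr = ipaddress.ip_address(src)
    matches = [
        (ipaddress.ip_network(prefix), route)
        for prefix, route in routes.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def rpf_accepts(src, ingress_if, routes, mode):
    """Return True if a packet from `src` on `ingress_if` passes RPF."""
    route = lookup(src, routes)
    if route is None:
        return False  # no route at all: every mode drops the packet
    if mode == "strict":
        return route["best"] == ingress_if
    if mode == "feasible":
        return ingress_if in route["feasible"]
    if mode == "loose":
        return True
    raise ValueError(f"unknown RPF mode: {mode}")
```

A packet from the anycast prefix arriving on "if-peer-b" is dropped in
strict mode here but passes in feasible path mode, which is the case
discussed above.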

The way I see it, the biggest issue with RPF-like techniques is when people
don't employ them, and use static access lists instead.  As anycast
addresses might not conform to the local topology, the guy/gal who has
written the ACLs might have forgotten the anycast prefix(es) completely.

    As is described in  having the Anycast Node avoid black-holing
    traffic in the event of a failure on the software or subsystem
    providing the service should be avoided.  As described, this can be
    done with withdrawing the announcement of the prefix corresponding to
    the service address, or the covering route.  However, the nodes could
    also try and handle the failure in a number of ways.  This can be
    with as also previously described tunneling to other instances of the
    Anycasted service, and using a IGP over the tunnels, route incoming
    client queries to the other destination.  The Anycasted node could
    also contain separate systems for trying to restart the service in
    question, and if successful again re-announce the service prefix.

==> this seems like a major operational pain in the ass, and a way to add
mysterious errors in the process.

Consider for example the effect of tunneling on MTU, as discussed in
draft-savola-mtufrag-tunneling-02.txt.  If the anycast is used in such a
manner (non-DNS/UDP) that the MTU is maxed out, the tunneling likely
will cause problems in some cases.  Does the tunnel implementation
fragment -- if so, even with the DF bit set for v4, and what about v6?  If
it does not, and just sends ICMP errors back, the service mysteriously fails
(under these rare self-healing conditions, making it very difficult to
debug) for those poor bastards who are inappropriately filtering the ICMP
"packet too big" messages.
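
The arithmetic behind this is simple (a sketch with assumed numbers: a
1500-byte link MTU and the minimal header sizes for each encapsulation,
i.e. no IP options and no optional GRE fields; the names are made up):

```python
# Bytes added in front of the original packet by the encapsulation.
ENCAP_OVERHEAD = {
    "ipip4": 20,   # IPv4-in-IPv4: one extra IPv4 header
    "gre4": 24,    # GRE over IPv4: IPv4 header + 4-byte basic GRE header
    "ip6ip6": 40,  # IPv6-in-IPv6: one extra IPv6 header
}


def tunnel_mtu(link_mtu, encap):
    """Largest inner packet that fits without fragmentation."""
    return link_mtu - ENCAP_OVERHEAD[encap]


def fits(packet_len, link_mtu, encap):
    """True if an inner packet of `packet_len` bytes fits in the tunnel."""
    return packet_len <= tunnel_mtu(link_mtu, encap)


for encap in ENCAP_OVERHEAD:
    print(encap, tunnel_mtu(1500, encap), fits(1500, 1500, encap))
```

So a client that sends full 1500-byte packets will always exceed the
tunnel MTU, and whether that works depends entirely on fragmentation at
the tunnel endpoint or on the ICMP error getting back to the sender.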

All in all, I'd think tunneling is a bad option here and we should work
around that by recommending appropriate allocation policies.  It's
still a good idea to discuss the problems of tunneling though -- I'd suggest
an appendix for that, if it's too prominent in the spec.

6.  Security Considerations

==> this section probably needs to discuss, in a short paragraph, issues
relating to availability.  A badly implemented anycast solution has a good
chance of decreasing availability rather than increasing it.
As any anycast node is capable of blackholing its
users, it's often also good sense to provide multiple independent
addresses/services to provide the same service.  E.g., there should be
a couple of addresses in DNS for service X, which would be anycasted by
different sets of nodes.
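
From the client's point of view this looks roughly as follows (a sketch
only: the function name, the addresses and the send_query callback are
hypothetical, and a real resolver does considerably more than this).  If
the node behind the first address blackholes the query, the client still
gets an answer from the independently anycasted second address:

```python
def query_with_fallback(addresses, send_query):
    """Try each service address in turn; return (address, answer).

    `send_query` is a hypothetical callback that sends one request to the
    given address and raises OSError (e.g. TimeoutError) on failure.
    """
    last_error = None
    for addr in addresses:
        try:
            return addr, send_query(addr)
        except OSError as exc:
            # Blackholed or unreachable: move on to the next address.
            last_error = exc
    raise ConnectionError("all service addresses failed") from last_error
```

This only helps if the addresses are served by different sets of nodes;
two addresses announced by the same nodes fail together.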






semi-editorial
--------------
    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
    "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
    document are to be interpreted as described in RFC 2119.

==> if you use these keywords (and I'm not sure whether it's a good idea,
because this is an operational document, not a document describing how
to make interoperable implementations), you must also add a normative
reference to it.

    Anycast Node: an internally-connected collection of hosts and routers
       which together provide service for an anycast service address.

==> the way you say this is confusing '.. hosts and routers', and can be
read two ways:
  1) "one instance providing a particular anycast service", or
  2) "an anycast service, including all the nodes that implement it"
I think you mean the latter, but it requires rewording to make that clear. 
I suggest you don't use the plural form.

==> please also add linebreaks between different definitions ?

    4.  Triangulation of traffic sources, in the case of attack (or
        query) traffic which incorporates spoofed source addresses;

==> this is too brief, it's not clear what you mean here; please expand.

    Some services have very short transaction times, and may even be
    carried out using a single packet request and a single packet reply
    in some cases (the DNS is an example of this).

==> s/DNS/DNS using UDP/

    Some anycast deployments have very predictable routing systems, which
    can remain stable for long periods of time (e.g.  anycast within an
    IGP, where node selection changes only occur as a response to node
    failures).  Other deployments have far less predictable
    characteristics (e.g.  a densely-deployed array of nodes across the
    global Internet).

==> you make this sound as if IGPs would have intrinsic characteristics to
make them typically stable compared to BGP.  I'd consider adding:
s/IGP/well-managed IGP, as described in Section 4.4.3,/

    When a service is anycast within an IGP the routing system is
    typically under the control of the same organization who is providing
    the service, and hence the relationship between service transaction
    characteristics and network stability are likely to be relatively
    well-understood.  This technique is consequently applicable to a
    larger number of applications than Internet-wide anycast service
    distribution (see Section 4.1).

==> as above, to provide truth in advertising, this section needs to discuss
or provide a pointer to 4.4.3 and some other appropriate sections on why IGPs
may not be better than inter-domain unless they have been designed properly.


   As is described in  having the Anycast Node avoid black-holing

==> missing reference here..

7.  Protocol Considerations


    This document does not impose any protocol considerations.

==> is this section needed?

9  References

==> split to informative/normative.



editorial
---------
Lindqvist & Abley        Expires April 22, 2005                 [Page 1]
Internet-Draft                  Anycast                     October 2004

==> the abbrev should be something a bit more descriptive than plain
"Anycast" ?

  Since it is usually a requirement that a single
    client-server interaction is carried out between a client the same
    server node for the duration of the transaction, it follows that the
    routing system's node selection decision ought to be stable for an
    order of magnitude longer than the expected transaction time, if the
    service is to be provided reliably.

==> s/client the same/client and the same/ ?

    o  A root server service might be distributed throughout the Internet
       with nodes located in regions with poor external connectivity, to
       ensure that the DNS functions adequately within the region during
       times of external network failure.

==> s/root/root DNS/

   well-documented examples of import polices which filter on RIR

==> spell out RIR ?

    AS:es at multiple locations and could therefor propagate the same

==> s/AS:/AS/
==> s/therefor/therefore/

    When Anycast services are deployed across networks operated by
    others, their reachability is dependent on routing polices and
    topology changes (planned and unplanned) which are unpredictable and
    sometimes difficult to identify.

==> only if there were routing polices, but no.. s/polices/policies/ :)

    Codependencies are avoided by making each node as autonomous and

==> s/Code/Co-de/ (easier to read because doesn't start as "code") ?

    facilities on the Internet such as RIPE's Routing
    Information Service [14] and the University of
           Oregon Route
    Views Project [15].

==> some screwed up formatting here..

-- 
Pekka Savola                 "You each name yourselves king, yet the
Netcore Oy                    kingdom bleeds."
Systems. Networks. Security. -- George R.R. Martin: A Clash of Kings