Re: grow: anycast ops draft
Pekka Savola <pekkas@netcore.fi> Mon, 08 November 2004 12:25 UTC
Date: Mon, 08 Nov 2004 14:23:06 +0200
From: Pekka Savola <pekkas@netcore.fi>
To: Joe Abley <jabley@isc.org>
cc: grow@lists.uoregon.edu
Subject: Re: grow: anycast ops draft
In-Reply-To: <18A3F56E-2C41-11D9-928C-000D93B24C7A@isc.org>
Message-ID: <Pine.LNX.4.61.0411081422130.9056@netcore.fi>
References: <18A3F56E-2C41-11D9-928C-000D93B24C7A@isc.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset="US-ASCII"; format="flowed"
Sender: owner-grow@lists.uoregon.edu
Precedence: bulk
On Mon, 1 Nov 2004, Joe Abley wrote:
> Kurtis Lindqvist and I squeezed this draft through before the cut-off:
>
> http://www.ietf.org/internet-drafts/draft-kurtis-anycast-bcp-00.txt

I see Geoff already responded with a lot of good comments. Some comments from me..

Overall I thought this was a very good and useful document, definitely something we should be producing. A loud 'yes' for the adoption of this as a WG item!

Obviously, as with all -00 I-Ds, there's a lot which can be improved. Below.

substantial
-----------

   Where a routing advertisement from a node corresponds to a single Service Address, this coupling might be such that availability of the service triggers the route advertisement, and non-availability of the service triggers a route withdrawal. This can be achieved using routing protocol implementations on the same servers which provide the service being distributed.

==> The last sentence is a required but not sufficient condition. The draft should also expand the text (at least a paragraph) to say how the routing protocol (or some other means) monitors the "health" of the service. That is, just because the node participates in the routing protocol, it does not offer a guarantee of the health of the service, just about the health of the node. And this is vital with anycast, as we don't want the requests to be blackholed if the service does not work, but the nodes are still operational.

   Where multiple Service Addresses are covered by the same covering route, there is no longer a tight coupling between the advertisement of that route and the individual services associated with the covered host routes. The resulting impact on signaling availability of individual services is discussed in Section 4.4.1.

==> the draft goes on at significant length to describe hacks to work around this problem, to be able to deploy multiple services at the same anycast prefix.
The draft needs to explicitly state, in a separate paragraph, a simpler alternative: "just don't do that". It's sufficient for DNS at least. Get a different prefix for every service. Assuming that something around /24 is sufficient, there are still plenty of those. That's much simpler operationally than running tunneling across different nodes.

   For services which are distributed across the global Internet using BGP, equal-cost paths are normally not a consideration: BGP's exit selection algorithm usually selects a single, consistent exit for a single destination regardless of whether multiple candidate paths exist. Implementations of BGP exist that support multi-path exit selection, however, and corner cases where dual selected exits route to different nodes are possible. Analysis of the likely incidence of such corner cases for particular distributions of Anycast Nodes are recommended for services which involve multi-packet transactions.

==> I fear that the BGP ecmp has to become more prominent here. AFAICS, it's implemented and used especially for stub ASs which, for some reason, want to provide this kind of more balanced load balancing.

   4.4.5 Reverse Path Forwarding Checks

   Reverse Path Forwarding (RPF) checks, first described in [8], are commonly deployed as part of ingress interface packet filters on routers in the global Internet in order to deny packets whose source addresses are spoofed (see also [10]). Deployed implementations of RPF make available two modes of operation: a loose mode, and a strict mode.

   Strict-mode RPF checks can cause non-spoofed packets to be denied when they originate from multi-homed site, since selected paths might legitimately not correspond with the ingress interface of non-spoofed packets from the multi-homed site.
   A collection of anycast nodes deployed across the Internet is largely indistinguishable from a distributed, multi-homed site to the routing system, and hence this risk also exists for anycast nodes, even if individual nodes are not multi-homed. Care should be taken to ensure that strict-mode RPF is not enabled in peer networks connecting to anycast nodes.

==> I think there are a lot of problems with this text, and you may not have thought it through. A few comments:

 - the first and second paragraph should be referencing RFC3704, which is discussing *exactly* these problems, and also introduces different flavours of doing RPF.

 - the second and third paragraphs are not really accurate. It depends on how the preferences are selected. If you hook up an anycast node to its first-hop ISP, almost always the ISP will consider that exit preferable, so strict RPF works. (Where it may not work is further upstream, but there strict RPF-like techniques are not applied in any case..)

Even if strict RPF did not work, you could still use feasible path RPF as described in RFC3704.

The way I see it, the biggest issue with RPF-like techniques is when people don't employ them, and use static access lists instead. As anycast addresses might not conform to the local topology, the guy/gal who has written the ACLs might have forgotten the anycast prefix(es) completely.

   As is described in having the Anycast Node avoid black-holing traffic in the event of a failure on the software or subsystem providing the service should be avoided. As described, this can be done with withdrawing the announcement of the prefix corresponding to the service address, or the covering route. However, the nodes could also try and handle the failure in a number of ways. This can be with as also previously described tunneling to other instances of the Anycasted service, and using a IGP over the tunnels, route incoming client queries to the other destination.
   The Anycasted node could also contain separate systems for trying to restart the service in question, and if successful again re-announce the service prefix.

==> this seems like a major operational pain in the ass, and a way to add mysterious errors in the process. Consider for example the effect of tunneling on MTU, as discussed in draft-savola-mtufrag-tunneling-02.txt. If the anycast is used in such a manner (non-DNS/UDP) that the MTU is maxed out, the tunneling likely will cause problems in some cases. Does the tunnel implementation do fragmenting -- if so, even with DF bit set for v4, and what about v6? If not, and it just sends ICMP errors back, the service just mysteriously fails (under these rare self-healing conditions, making it very difficult to debug) for those poor bastards who are inappropriately filtering the ICMP packet too big messages.

All in all, I'd think tunneling is a bad option here and we should work around that by recommending appropriate allocation policies. It's still a good idea to discuss the problems of tunneling though -- I'd suggest an appendix for that, if it's too prominent in the spec.

   6. Security Considerations

==> this section probably needs to discuss, with a short paragraph, issues relating to availability. A badly implemented anycast solution has a good chance to decrease the availability rather than increase it. As any anycast node is capable of blackholing its users, it's often also good sense to provide multiple independent addresses/services to achieve the same service. E.g., there should be a couple of addresses in DNS for service X, which would be anycasted by different sets of nodes.

semi-editorial
--------------

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
==> if you use these keywords (and I'm not sure whether it's a good idea, because this is an operational document, not a document describing how to make interoperable implementations), you must also add a normative reference to it.

   Anycast Node: an internally-connected collection of hosts and routers which together provide service for an anycast service address.

==> the way you say this is confusing '.. hosts and routers', and can be read two ways:

 1) "one instance providing a particular anycast service", or
 2) "an anycast service, including all the nodes that implement it"

I think you mean the latter, but it requires rewording to make that clear. I suggest you don't use the plural form.

==> please also add linebreaks between different definitions ?

   4. Triangulation of traffic sources, in the case of attack (or query) traffic which incorporates spoofed source addresses;

==> this is too brief, it's not clear what you mean here; please expand.

   Some services have very short transaction times, and may even be carried out using a single packet request and a single packet reply in some cases (the DNS is an example of this).

==> s/DNS/DNS using UDP/

   Some anycast deployments have very predictable routing systems, which can remain stable for long periods of time (e.g. anycast within an IGP, where node selection changes only occur as a response to node failures). Other deployments have far less predictable characteristics (e.g. a densely-deployed array of nodes across the global Internet).

==> you make this sound as if IGPs would have intrinsic characteristics to make them typically stable compared to BGP. I'd consider adding: s/IGP/well-managed IGP, as described in Section 4.4.3,/

   When a service is anycast within an IGP the routing system is typically under the control of the same organization who is providing the service, and hence the relationship between service transaction characteristics and network stability are likely to be relatively well-understood.
   This technique is consequently applicable to a larger number of applications than Internet-wide anycast service distribution (see Section 4.1).

==> as above, to provide truth in advertising, this section needs to discuss or provide pointer to 4.4.3 and some other appropriate sections on why IGPs may not be better than inter-domain unless they have been designed properly.

   As is described in having the Anycast Node avoid black-holing

==> missing reference here..

   7. Protocol Considerations

   This document does not impose any protocol considerations.

==> is this section needed?

   9 References

==> split to informative/normative.

editorial
---------

   Lindqvist & Abley        Expires April 22, 2005        [Page 1]
   Internet-Draft                 Anycast             October 2004

==> the abbrev should be something a bit more descriptive than plain "Anycast" ?

   Since it is usually a requirement that a single client-server interaction is carried out between a client the same server node for the duration of the transaction, it follows that the routing system's node selection decision ought to be stable for an order of magnitude longer than the expected transaction time, if the service is to be provided reliably.

==> s/client the same/client and the same/ ?

   o A root server service might be distributed throughout the Internet with nodes located in regions with poor external connectivity, to ensure that the DNS functions adequately within the region during times of external network failure.

==> s/root/root DNS/

   well-documented examples of import polices which filter on RIR

==> spell out RIR ?

   AS:es at multiple locations and could therefor propagate the same

==> s/AS:/AS/
==> s/therefor/therefore/

   When Anycast services are deployed across networks operated by others, their reachability is dependent on routing polices and topology changes (planned and unplanned) which are unpredictable and sometimes difficult to identify.

==> only if there were routing polices, but no..
s/polices/policies/ :)

   Codependencies are avoided by making each node as autonomous and

==> s/Code/Co-de/ (easier to read because doesn't start as "code") ?

   facilities on the Internet such as RIPE's Routing Information Service [14] and the University of Oregon Route Views Project [15].

==> some screwed up formatting here..

-- 
Pekka Savola                 "You each name yourselves king, yet the
Netcore Oy                    kingdom bleeds."
Systems. Networks. Security. -- George R.R. Martin: A Clash of Kings
_________________________________________________________________
web user interface: http://darkwing.uoregon.edu/~llynch/grow.html
web archive:        http://darkwing.uoregon.edu/~llynch/grow/
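
The first substantial comment above asks how the routing side learns about the health of the service rather than just the health of the node. A minimal sketch of one way to couple the two, assuming a periodic probe that exercises the service itself (for DNS, say, a real query sent to the Service Address) and some external hook that actually announces or withdraws the prefix. The class name RouteGate, its thresholds and the action strings are hypothetical; none of them come from the draft or from any routing daemon.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RouteGate:
    """Decide whether the service prefix should currently be advertised.

    Hypothetical helper: it only turns probe results into decisions.
    Acting on a decision (talking to the routing daemon) is left to the caller.
    """
    fail_threshold: int = 3   # consecutive failed probes before withdrawing
    ok_threshold: int = 2     # consecutive good probes before (re-)announcing
    announced: bool = False   # start withdrawn until the service proves healthy
    _fails: int = 0
    _oks: int = 0

    def observe(self, service_ok: bool) -> Optional[str]:
        """Feed one probe result; return 'announce', 'withdraw', or None."""
        if service_ok:
            self._oks += 1
            self._fails = 0
            if not self.announced and self._oks >= self.ok_threshold:
                self.announced = True
                return "announce"
        else:
            self._fails += 1
            self._oks = 0
            if self.announced and self._fails >= self.fail_threshold:
                self.announced = False
                return "withdraw"
        return None


# Example run: the service comes up, fails three probes in a row, then recovers.
gate = RouteGate()
probes = [True, True, True, False, False, False, True, True]
actions = [gate.observe(p) for p in probes]
print(actions)
```

The thresholds add some hysteresis, so a single lost probe does not flap the route; the values 3 and 2 are arbitrary and would need tuning against the probe interval and the transaction times of the service.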