Re: grow: anycast ops draft

Kurt Erik Lindqvist <kurtis@kurtis.pp.se> Tue, 16 November 2004 23:13 UTC

In-Reply-To: <Pine.LNX.4.61.0411081422130.9056@netcore.fi>
References: <18A3F56E-2C41-11D9-928C-000D93B24C7A@isc.org> <Pine.LNX.4.61.0411081422130.9056@netcore.fi>
Cc: grow@lists.uoregon.edu, Joe Abley <jabley@isc.org>
From: Kurt Erik Lindqvist <kurtis@kurtis.pp.se>
Subject: Re: grow: anycast ops draft
Date: Wed, 17 Nov 2004 00:02:20 +0100
To: Pekka Savola <pekkas@netcore.fi>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


	Pekka,

some comments and sorry for the late reply.

On 2004-11-08, at 13.23, Pekka Savola wrote:

> On Mon, 1 Nov 2004, Joe Abley wrote:
>> Kurtis Lindqvist and I squeezed this draft through before the cut-off:
>>
>>  http://www.ietf.org/internet-drafts/draft-kurtis-anycast-bcp-00.txt
> something we should be producing.  A loud 'yes' for the adoption of
> this as a WG item!

I saw Dave already sent that out.

> Obviously, as with all -00 I-Ds, there's a lot which can be improved. 
> Below.

I am sure! :-)

> substantial
> -----------
>
>    Where a routing advertisement from a node corresponds to a single
>    Service Address, this coupling might be such that availability of the
>    service triggers the route advertisement, and non-availability of the
>    service triggers a route withdrawal.  This can be achieved using
>    routing protocol implementations on the same servers which provide
>    the service being distributed.
>
> ==> The last sentence is a required but not sufficient condition.  The
> draft should also expand the text (at least a paragraph) to say how the
> routing protocol (or some other means) monitors the "health" of the
> service.

Geoff had the same comment and I sent him some suggested new text.
Please let me know if you think that is sufficient or if you think
we should add more text.
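
To make the coupling concrete, here is a minimal sketch (not text from
the draft) of a monitor that turns service probe results into
announce/withdraw decisions. The class name HealthGate and the
thresholds fail_after and recover_after are assumptions made up for
illustration; how the probe is done and how the decision reaches the
routing protocol are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class HealthGate:
    """Turns a stream of probe results into announce/withdraw decisions.

    Hypothetical helper: the thresholds add hysteresis so that a single
    lost probe does not flap the route.
    """
    fail_after: int = 3      # consecutive failures before withdrawing
    recover_after: int = 5   # consecutive successes before re-announcing
    announced: bool = False
    _fails: int = 0
    _oks: int = 0

    def observe(self, probe_ok: bool) -> str:
        """Record one probe result; return 'announce', 'withdraw' or 'hold'."""
        if probe_ok:
            self._oks += 1
            self._fails = 0
            if not self.announced and self._oks >= self.recover_after:
                self.announced = True
                return "announce"
        else:
            self._fails += 1
            self._oks = 0
            if self.announced and self._fails >= self.fail_after:
                self.announced = False
                return "withdraw"
        return "hold"
```

The point of the two counters is that the route follows the service
only after the state has been stable for a while, which keeps probe
noise out of the routing system.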

>    Where multiple Service Addresses are covered by the same covering
>    route, there is no longer a tight coupling between the advertisement
>    of that route and the individual services associated with the covered
>    host routes.  The resulting impact on signaling availability of
>    individual services is discussed in Section 4.4.1.
>
> ==> the draft goes on at significant length to describe hacks to work
> around this problem, to be able to deploy multiple services at the same
> anycast prefix.
>
> The draft needs to explicitly state, in a separate paragraph, a simpler
> alternative: "just don't do that".  It's sufficient for DNS at least.
> Get a different prefix for every service.  Assuming that something around
> /24 is sufficient, there are still plenty of those.
>
> That's much simpler operationally than running tunneling across
> different nodes.

I think it is fair that we should point out the risks of the approach
and the added complexity. I will try to write something up on the pros
and cons of tunneling.

>    For services which are distributed across the global Internet using
>    BGP, equal-cost paths are normally not a consideration: BGP's exit
>    selection algorithm usually selects a single, consistent exit for a
>    single destination regardless of whether multiple candidate paths
>    exist.  Implementations of BGP exist that support multi-path exit
>    selection, however, and corner cases where dual selected exits route
>    to different nodes are possible.  Analysis of the likely incidence of
>    such corner cases for particular distributions of Anycast Nodes are
>    recommended for services which involve multi-packet transactions.
>
> ==> I fear that the BGP ecmp has to become more prominent here.  AFAICS,
> it's implemented and used especially for stub ASs which, for some reason,
> want to provide this kind of more balanced load balancing.

I am not really sure how you feel ecmp should be emphasized even more?
It's a full chapter on its own....
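
As a rough illustration of why multi-path selection matters for
multi-packet transactions, the sketch below contrasts per-flow hashing
with per-packet round-robin over two equal-cost exits. It is not from
the draft: the node names, the choice of hash and the function names
are all made up for the example, and real implementations choose
their own hash inputs.

```python
import hashlib
from itertools import cycle

# Hypothetical anycast nodes reached through two equal-cost exits.
NODES = ["node-a", "node-b"]


def per_flow(src, sport, dst, dport, proto="tcp"):
    """Pick a node from a hash of the flow identifiers.

    Every packet of the same flow hashes to the same node, so a
    multi-packet transaction stays on one node while routing is stable.
    """
    key = f"{src}:{sport}>{dst}:{dport}/{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return NODES[digest[0] % len(NODES)]


def per_packet(n_packets):
    """Round-robin each packet over the exits, ignoring flow identity.

    Consecutive packets of one transaction alternate between nodes.
    """
    rr = cycle(NODES)
    return [next(rr) for _ in range(n_packets)]
```

With per_packet() a three-packet exchange already touches both nodes,
which is the corner case the quoted text recommends analysing.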

> 4.4.5  Reverse Path Forwarding Checks

> ==> I think there are a lot of problems with this text, and you may not
> have thought it through.  A few comments:
>
>  - the first and second paragraph should be referencing RFC3704, which
> is discussing *exactly* these problems, and also introduces different
> flavours of doing RPF.

Ok, good point.

>  - the second and third paragraphs are not really accurate.  It depends
> on how the preferences are selected.  If you hook up an anycast node to
> its first-hop ISP, almost always the ISP will consider that exit
> preferable, so strict RPF works.  (Where it may not work is further
> upstream, but there strict RPF-like techniques are not applied in any
> case..)  Even if strict RPF did not work, you could still use feasible
> path RPF as described in RFC3704.

The case I had in mind I described in the mail to Geoff. Does that make
the case clearer? I agree we should update the text to make it
clearer though.
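
For readers who want the difference between the two RFC3704 flavours
mentioned above spelled out, here is a small sketch. The routing table
contents, the interface names and the function names are invented for
the example; a real router does this check in the forwarding path, not
in Python.

```python
import ipaddress

# Hypothetical routing state:
# prefix -> (interface of the best route, interfaces of all feasible routes)
ROUTES = {
    ipaddress.ip_network("192.0.2.0/24"): ("eth0", {"eth0", "eth1"}),
    ipaddress.ip_network("198.51.100.0/24"): ("eth1", {"eth1"}),
}


def lookup(src):
    """Longest-prefix match for a source address; None if there is no route."""
    addr = ipaddress.ip_address(src)
    matches = [net for net in ROUTES if addr in net]
    if not matches:
        return None
    return ROUTES[max(matches, key=lambda net: net.prefixlen)]


def strict_rpf(src, in_iface):
    """Accept only if the packet arrived on the best-route interface."""
    route = lookup(src)
    return route is not None and route[0] == in_iface


def feasible_rpf(src, in_iface):
    """Accept if the packet arrived on any interface with a feasible route."""
    route = lookup(src)
    return route is not None and in_iface in route[1]
```

A packet from 192.0.2.10 arriving on eth1 fails the strict check here
but passes the feasible path check, because eth1 carries an alternative
route to that prefix even though it is not the preferred one.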

> The way I see it, the biggest issue with RPF-like techniques is when
> people don't employ them, and use static access lists instead.  As
> anycast addresses might not conform to the local topology, the guy/gal
> who has written the ACLs might have forgotten the anycast prefix(es)
> completely.

I am assuming that if you connect an anycast service, you will also
accept traffic to those prefixes. If you point the gun at your foot and
pull the trigger, you will shoot yourself in the foot...

>    As is described in  having the Anycast Node avoid black-holing
>    traffic in the event of a failure on the software or subsystem
>    providing the service should be avoided.  As described, this can be
>    done with withdrawing the announcement of the prefix corresponding to
>    the service address, or the covering route.  However, the nodes could
>    also try and handle the failure in a number of ways.  This can be
>    with as also previously described tunneling to other instances of the
>    Anycasted service, and using a IGP over the tunnels, route incoming
>    client queries to the other destination.  The Anycasted node could
>    also contain separate systems for trying to restart the service in
>    question, and if successful again re-announce the service prefix.
>
> ==> this seems like a major operational pain in the ass, and a way to
> add mysterious errors in the process.
>
> Consider for example the effect of tunneling on MTU, as discussed in
> draft-savola-mtufrag-tunneling-02.txt.  If the anycast is used in such a
> manner (non-DNS/UDP) that the MTU is maxed out, the tunneling likely
> will cause problems in some cases.  Does the tunnel implementation do
> fragmenting -- if so, even with DF bit set for v4, and what about v6?
> If not, and just sends ICMP errors back, the service just mysteriously
> fails (under these rare self-healing conditions, making it very difficult
> to debug) for those poor bastards who are inappropriately filtering the
> ICMP packet too big messages.

I think adding a tunneling consideration section might be a good idea 
though. Will do.
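
The arithmetic behind the MTU concern is simple enough to sketch. The
overheads below are the basic outer-header sizes without options; the
table and function names are made up for the example, and which
encapsulation a deployment would actually use is an assumption.

```python
# Encapsulation overhead in bytes (outer headers only, no options).
OVERHEAD = {
    "ipip": 20,     # IPv4-in-IPv4: one extra IPv4 header
    "gre": 24,      # outer IPv4 header (20) + basic GRE header (4)
    "ip6ip6": 40,   # IPv6-in-IPv6: one extra IPv6 header
}


def tunnel_mtu(path_mtu, encap):
    """Largest inner packet that fits through the tunnel unfragmented."""
    return path_mtu - OVERHEAD[encap]


def fits(packet_len, path_mtu, encap):
    """True if an inner packet of packet_len needs no fragmentation."""
    return packet_len <= tunnel_mtu(path_mtu, encap)
```

So a client sending full 1500-byte packets over a 1500-byte path will
not fit once the traffic is tunneled to another node, and whether it
notices depends on fragmentation or on the ICMP errors getting through.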

> 6.  Security Considerations
>
> ==> this section probably needs to discuss, with a short paragraph,
> issues relating to availability.  A badly implemented anycast solution
> has a good chance to decrease the availability rather than increase it.
> As any anycast node is capable of blackholing its users, it's often also
> good sense to provide multiple independent addresses/services to achieve
> the same service.  E.g., there should be a couple of addresses in DNS for
> service X, which would be anycasted by different sets of nodes.

I see your point, but is that really something for a security 
considerations section?

> semi-editorial
> --------------
>    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
>    "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
>    document are to be interpreted as described in RFC 2119.
>
> ==> if you use these keywords (and I'm not sure whether it's a good idea
> because this is an operational document, not a document describing how
> to make interoperable implementations), you must also add a normative
> reference to it.

I removed it.

>    Anycast Node: an internally-connected collection of hosts and routers
>       which together provide service for an anycast service address.
>
> ==> the way you say this is confusing '.. hosts and routers', and can be
> read two ways:
>  1) "one instance providing a particular anycast service", or
>  2) "an anycast service, including all the nodes that implement it"
> I think you mean the latter, but it requires rewording to make that
> clear. I suggest you don't use the plural tense.

I think I will leave this to the native English speaker of the two
authors :-)

> ==> please also add linebreaks between different definitions ?

I'll see if I can get xml2rfc to do this :-)

>    4.  Triangulation of traffic sources, in the case of attack (or
>        query) traffic which incorporates spoofed source addresses;
>
> ==> this is too brief, it's not clear what you mean here; please expand.

Ok, good point.

>    Some services have very short transaction times, and may even be
>    carried out using a single packet request and a single packet reply
>    in some cases (the DNS is an example of this).
>
> ==> s/DNS/DNS using UDP/

Ok.

>    Some anycast deployments have very predictable routing systems, which
>    can remain stable for long periods of time (e.g.  anycast within an
>    IGP, where node selection changes only occur as a response to node
>    failures).  Other deployments have far less predictable
>    characteristics (e.g.  a densely-deployed array of nodes across the
>    global Internet).
>
> ==> you make this sound as if IGPs would have intrinsic characteristics
> to make them typically stable compared to BGP.  I'd consider adding:
> s/IGP/well-managed IGP, as described in Section 4.4.3,/

ok.

>    When a service is anycast within an IGP the routing system is
>    typically under the control of the same organization who is providing
>    the service, and hence the relationship between service transaction
>    characteristics and network stability are likely to be relatively
>    well-understood.  This technique is consequently applicable to a
>    larger number of applications than Internet-wide anycast service
>    distribution (see Section 4.1).
>
> ==> as above, to provide truth in advertising, this section needs to
> discuss or provide pointer to 4.4.3 and some other appropriate sections
> on why IGPs may not be better than inter-domain unless they have been
> designed properly.

Ok.

>
>
>   As is described in  having the Anycast Node avoid black-holing
>
> ==> missing reference here..

Fixed.
>

> 7.  Protocol Considerations
>
>
>    This document does not impose any protocol considerations.
>
> ==> is this section needed?

Uhm, I thought so.

>
> 9  References
>
> ==> split to informative/normative.

Well, removing the normative language (actually only the definitions) 
should have resolved this.

> editorial
> ---------
> Lindqvist & Abley        Expires April 22, 2005                 [Page 1]
> Internet-Draft                  Anycast                     October 2004
>
> ==> the abbrev should be something a bit more descriptive than plain
> "Anycast" ?

Ok.

>  Since it is usually a requirement that a single
>    client-server interaction is carried out between a client the same
>    server node for the duration of the transaction, it follows that the
>    routing system's node selection decision ought to be stable for an
>    order of magnitude longer than the expected transaction time, if the
>    service is to be provided reliably.
>
> ==> s/client the same/client and the same/ ?

ok.

>
>    o  A root server service might be distributed throughout the Internet
>       with nodes located in regions with poor external connectivity, to
>       ensure that the DNS functions adequately within the region during
>       times of external network failure.
>
> ==> s/root/root DNS/

Ok.

>
>   well-documented examples of import polices which filter on RIR
>
> ==> spell out RIR ?

Done.

>
>    AS:es at multiple locations and could therefor propagate the same
>
> ==> s/AS:/AS/
> ==> s/therefor/therefore/

ok.

>
>    When Anycast services are deployed across networks operated by
>    others, their reachability is dependent on routing polices and
>    topology changes (planned and unplanned) which are unpredictable and
>    sometimes difficult to identify.
>
> ==> only if there were routing polices, but no.. s/police/policies/ :)

Bummer! :-)

>    Codependencies are avoided by making each node as autonomous and
>
> ==> s/Code/Co-de/ (easier to read because doesn't start as "code") ?

ok.

>
>    facilities on the Internet such as RIPE's Routing
>    Information Service [14] and the University of
>           Oregon Route
>    Views Project [15].
>
> ==> some screwed up formatting here..

Fixed.

Thanks!

- - kurtis -

-----BEGIN PGP SIGNATURE-----
Version: PGP 8.1

iQA/AwUBQZqHAKarNKXTPFCVEQJsfgCg8VMBgAk20NRw0cOYn/ccLyP35FAAoNyO
nwDK+AyZm86YsRTVXw/nTNhi
=Cgg2
-----END PGP SIGNATURE-----
