Re: [rrg] [lisp] LISP Map Server I-D & updated draft-farinacci-lisp - 2 stages of caching mapping
Robin Whittle <rw@firstpr.com.au> Mon, 09 March 2009 08:41 UTC
Message-ID: <49B4D641.9090803@firstpr.com.au>
Date: Mon, 09 Mar 2009 19:41:37 +1100
From: Robin Whittle <rw@firstpr.com.au>
Organization: First Principles
To: RRG <rrg@irtf.org>
References: <49AEBC00.9070306@firstpr.com.au> <673472F6-3BEB-4DD0-A1F6-66AA9E90EE41@cisco.com>
In-Reply-To: <673472F6-3BEB-4DD0-A1F6-66AA9E90EE41@cisco.com>
Cc: lisp@ietf.org
Warning: Long message ahead!

Suggested changes to the I-D to make it clearer that the Map-Server
does not cache.

Map-Servers needing to make GRE tunnels to a large number (hundreds,
thousands) of level 1 ALT aggregation routers - and likewise those
routers needing to make GRE tunnels to large numbers of Map-Servers.

Two Map-Servers for an end-user network's two ETRs will generally
link directly (via a GRE tunnel) to the same level 1 ALT aggregation
router. There, the router will (I guess) send packets for this
network's EID prefix only to one of these, since they both have the
same AS hop count of 1. Which one will depend on the AS number of
the ISP.

How can an ALT router, such as this busy level 1 aggregation router,
detect the failure or unreachability of a Map-Server? I guess it
would find out when it tries to send a Map-Request message to the
Map-Server via the GRE tunnel. But is there a keepalive arrangement
so the level 1 router could detect the disappearance of this
Map-Server before it sends a Map-Request?

More on potential problems with caching Map-Resolvers.


Hi Dino,

Thanks for your reply. Before I respond to what you wrote, I want
to point out some ways I think the I-D:

  http://tools.ietf.org/html/draft-fuller-lisp-ms-00

could be improved, assuming that Map Servers never do any caching,
which is what I understand from your reply:

> The ETRs are registering their EID-prefixes more so than the
> mapping. Just an FYI, if that wasn't clear. Map-Servers don't
> answer Map-Requests because they wouldn't be authoritative.


   2 Introduction

   There are two types of operation for a LISP Map-Server: as a
   Map-Resolver, which accepts Map-Requests from an ITR and
   "resolves" the EID-to-RLOC mapping using the distributed mapping
 * database, and as a Map-Server, which learns authoritative
 * EID-to-RLOC mappings from an ETR and publish them in the database.
   A single device may implement one or both types of operation.

 * Conceptually, LISP Map-Servers share some of the same basic
 * configuration and maintenance properties as Domain Name System
 * (DNS) [RFC1035] servers and caching resolvers. With this in
 * mind, this specification borrows familiar terminology (resolver
 * and server) from the DNS specifications.

This text:

   Map-Server, which learns authoritative EID-to-RLOC mappings
   from an ETR and publish them in the database.

made me think from the start that Map-Servers are told by the ETR
what the current mapping is for the ETR's EIDs, and then, in an
abstract way, "publish" them globally. In the case of ALT, this
meant, to my way of thinking, responding to queries and answering
them based on the saved (cached) mapping information provided from
time to time by ETRs.

Then the next section, indicating that Map-Servers are analogous to
DNS servers, at least in some ways, reinforced my misunderstanding.

Again here, it is quite explicit that the Map-Server stores
("learns") the mapping:

   3. Definition of Terms

   Map-Server: a network infrastructure component which learns
   EID-to-RLOC mapping entries from an authoratative source
   (typically, an ETR, though static configuration or another
   out-of-band mechanism may be used). A Map-Server publishes
   these mappings in the distributed mapping database.

However, I realise the above could be interpreted in a different
way. For instance, instead of the ETR telling the Map-Server the
mapping for some EID, it simply tells it that the ETR is an
authoritative query server for mapping requests concerning this EID.
"Publishing" then, at least in the case of ALT, means that the ALT
network part of the Map-Server "announces" the EID on the ALT
network. Then, queries come to the Map-Server and it passes them on
to the ETR, as you described in your reply.

Actually, with ALT, the Map-Server doesn't simply "announce" the EID
over its existing BGP links with other ALT routers (including
potentially other Map-Servers).
As far as I understand ALT, in a fully operational deployment with
10M EIDs etc., due to the "highly aggregated" nature of the ALT
network, it doesn't at all resemble a hodge-podge of geographically
close connections to other ALT routers. I think the Map-Server
would need to establish GRE tunnels to many first level ALT routers,
which could be all over the world. I will write some more about
this in a separate message on the scaling of the ALT network.

If I had read this definition more carefully, I would have figured
out that Map-Servers don't cache mapping information or answer
mapping queries:

   Map-Register message: a LISP message sent by an ETR to a
   Map-Server to register its associated EID prefixes. In addition
   to the set of EID prefixes to register, the message includes one
   or more RLOCs to be be used by the Map-Server when forwarding
   Map-Requests (re-formatted as Encapsulated Map-Requests) received
   through the database mapping system.

However . . . when I looked at the Map-Register message, and when I
look at it now:

  http://tools.ietf.org/html/draft-farinacci-lisp-12#section-6.1.6

(BTW, "randomly selected UDP port number." should mention "source".)

the message is identical to the Map-Reply message, except that the
type field is set to 3 instead of 2. The descriptive text is
simply:

   The definition of each field of the Map-Register can be found in
   the Map-Reply section.

So I think pretty much anyone reading this would think that in the
Map-Register message, the ETR is telling the Map-Server the complete
mapping for an EID, including Locator Reach Bits, TTL (caching
time), potentially multiple RLOCs each with Priority and Weight etc.

In a recent message:

  http://www.ietf.org/mail-archive/web/rrg/current/msg04584.html

you wrote:

   The registration service is used so an ETRs at a site can tell
   the database mapping system that they are available to answer
   Map-Requests. They are not really registering mappings.
   They are registering the EID-prefixes they are authoriatively
   going to answer for. The list of RLOCs in a Map-Register message
   don't have to be all the RLOCs used to encapsulate data to the
   site. It's just a list of RLOCs willing to answer Map-Requests
   for the site.

It would be good to add a description like this to the I-D.

I would have thought that a Map-Register message from an ETR would
only register itself, and would only register it in respect of the
one Map-Server it was sent to. If you have two Map-Servers in two
ISPs, and two ETRs in the end-user network, then - assuming each ETR
has a single RLOC address from one ISP - it doesn't make sense to
tell the Map-Server of one ISP that it should preferentially forward
Map-Requests to a different ETR than the one on this ISP's RLOC
address, for instance the ETR which is on the other ISP's RLOC
address. Such forwarded messages would be sent out of the first
ISP, to the second ISP and then through the PE-CE link to the
end-user network.

Para 1 and 2 of the Basic Overview give a better impression of the
Map-Server forwarding Map-Request messages to the ETR. However, I
think you could make it more explicit by stating something like
this, maybe after para 1:

   Map-Servers do not answer Map-Queries or store mapping
   information. They receive Map-Queries from the distributed
   mapping database system and forward these to the ETR which
   registered itself as being an authoritative query server for
   the EID which matches the address in the Map-Query. The ETR
   then sends a Map-Reply directly to the RLOC address contained
   in the Map-Query message.

I think the use of "publish" may not be ideal - maybe "announce"
would be better.
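
To make the behaviour I am suggesting concrete, here is a minimal
Python sketch of a non-caching Map-Server as I understand it from
your description. It is only an illustration, not anything from
the I-Ds: the class name, the method names, the tuple it returns
and the RLOC addresses (taken from documentation ranges) are all my
own assumptions. The only state it keeps is "which ETR RLOCs will
answer for which EID prefix"; it never stores or returns a mapping.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class MapServer:
    """Hypothetical non-caching Map-Server: registers prefixes, forwards requests."""

    # EID prefix -> RLOCs of ETRs willing to answer Map-Requests for it.
    registrations: dict = field(default_factory=dict)

    def map_register(self, eid_prefix: str, etr_rlocs: list) -> None:
        """Record which ETR RLOCs answer for eid_prefix (no mapping is stored)."""
        self.registrations[ipaddress.ip_network(eid_prefix)] = list(etr_rlocs)

    def forward_map_request(self, eid: str, itr_rloc: str):
        """Return (etr_rloc, itr_rloc): where to forward, and where the ETR replies."""
        address = ipaddress.ip_address(eid)
        matches = [p for p in self.registrations if address in p]
        if not matches:
            return None  # no ETR has registered for this EID
        best = max(matches, key=lambda p: p.prefixlen)  # longest-prefix match
        return self.registrations[best][0], itr_rloc


ms_a = MapServer()
ms_a.map_register("55.44.33.0/24", ["192.0.2.1"])  # ETR-A's RLOC (made up)
print(ms_a.forward_map_request("55.44.33.7", "198.51.100.9"))
```

The Map-Reply itself would be generated by the ETR at the first
address in the returned pair and sent directly to the second.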
I think this text could be improved, because someone who wrongly
assumes that the Map-Server caches the full mapping of an EID and
answers queries could find their impression largely confirmed by:

   (5.2) An ETR which uses a Map-Server to publish its EID-to-RLOC
   mappings does not need to participate further in the mapping
   database protocol(s).

It would be easy to look at this and think, wrongly, that the ETR
tells the Map-Server the full mapping of its EIDs, and then doesn't
need to do anything more. In fact, it does need to accept queries
and send responses, which are arguably both instances of
"participating in the mapping database protocols".

5.3 is quite specific about the Map-Server forwarding requests to
the appropriate ETR. I must not have clearly understood this when I
first read it.

A question I didn't raise in my initial message is how the
Map-Server can know, quickly and reliably, that an ETR which has
registered itself is no longer available.

Firstly, how can the ETR tell the Map-Server that an EID it
previously registered is no longer one it handles?

Secondly, if the ETR dies, or becomes unreachable to the Map-Server,
how does the Map-Server detect this, and what decision-making
algorithms does it employ before withdrawing the advertisement of
this ETR's registered EIDs from the ALT network?

One thing which confused me on my second reading was:

   5.4. Map-Resolver Processing

   In response to an Encapsulated Map-Request, a Map-Resolver
   decapsulates the message then checks its local database of
 ? mapping entries (statically configured, cached, or learned
 ? from associated ETRs). If it finds a matching entry, it
   returns a non-authoratative LISP Map-Reply with the known
   mapping.
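
Here is how I read the 5.4 processing, as a small Python sketch.
The function name, the dictionary format of the local database and
the forward_to_alt callback are hypothetical, and the "forward on a
miss" branch is my reading of your description of the Map-Resolver
putting the Map-Request on the ALT network, not something stated in
the quoted text.

```python
import ipaddress


def resolve(local_db, eid, forward_to_alt):
    """Answer from the local database if possible, otherwise pass the request on.

    local_db maps ipaddress networks (EID prefixes) to lists of RLOCs.
    """
    address = ipaddress.ip_address(eid)
    matches = [prefix for prefix in local_db if address in prefix]
    if matches:
        best = max(matches, key=lambda p: p.prefixlen)  # longest-prefix match
        # A locally held entry can only give a non-authoritative reply.
        return {"authoritative": False,
                "eid_prefix": str(best),
                "rlocs": local_db[best]}
    return forward_to_alt(eid)


# Made-up cache entry and a stub standing in for the ALT network.
cache = {ipaddress.ip_network("55.44.33.0/24"): ["192.0.2.1", "203.0.113.1"]}
on_miss = lambda eid: {"forwarded": eid}

hit = resolve(cache, "55.44.33.7", on_miss)
miss = resolve(cache, "10.1.2.3", on_miss)
print(hit["authoritative"], hit["rlocs"])
print(miss)
```

The sketch only covers the "cached" source of entries; it says
nothing about where statically configured or ETR-learned entries
would come from, which is what the questions below are about.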
I understand that if it is a caching Map-Resolver, it may have the
requested mapping in its cache, but I don't understand the other two
mechanisms:

  Static configuration

     This would mean the Map-Resolver was somehow configured
     to return mapping replies for some EID - yet it is not an
     ETR, and how could this make sense, since all mapping is
     supposed to come from an ETR?

  learned from associated ETRs

     I don't understand what this could be, since ETRs have no
     links to the Map-Resolver, and since you haven't defined
     links between a Map-Resolver and any nearby or integrated
     Map-Server.


You wrote:

> Thanks for your comments Robin.
>
> The main point of draft-fuller-lisp-ms-00.txt is to create a API of
> sorts for LISP sites. So they can use a set of primitives regardless of
> the mapping database system deployed.

OK.

> By doing this, the cost of managing an xTR goes way down. No GRE
> tunnels, no BGP. Simply Map-Request, Map-Reply, and Map-Register
> primitives.

Yes - it greatly reduces the need for stability and configuration
for each ITR and ETR, since the ALT network is going to need to be
carefully managed. Now only the Map-Resolvers and Map-Servers need
to be managed so carefully to be a part of the ALT network.

However, the original ALT network (without Map-Resolvers and
Map-Servers):

   ITRs
   ALT routers
   ETRs

was pretty simple in terms of the number of network elements. The
ITRs and ETRs also had to be ALT routers, but since I think you can
cobble together an ALT router from existing functional blocks in a
suitably flexible software- or hardware-based router, this still
means you have just 3 types of device in your entire core-edge
separation system. That is enviable simplicity!

Now you have five:

   ITRs
   Map Resolvers
   ALT routers
   Map Servers
   ETRs

types of device, which is not so elegant. Still, I think any well
designed core-edge separation scheme is going to have quite a few
functional elements.
Ivip has:

   ITRs (all caching)
   (Optional caching query servers - QSCs)
   Full database query servers - QSDs
   ETRs
   Replicators for the fast push of mapping to the QSDs

Also, there is a single system of Launch servers which drive the
Replicator system, and the RUAS organisations which own and run the
Launch servers. Reachability probing is external to Ivip, so there
would be other systems and perhaps standards for that too.

Your ITRs no longer need ALT router functions, but your
Map-Resolvers and Map-Servers do. Still, I think it is good, since
you need a lot of ITRs and ETRs, and now they can be simpler and
talk happily to a smaller number of Map-Resolvers and Map-Servers
without needing to be on any stable address and without needing to
be known to the ALT network.

When I wrote the message you replied to, I wrongly assumed that
LISP's ETRs were at the ISPs. Now I know that ETRs, and I guess
most ITRs, are at the end-user networks.

(Proxy Tunnel Routers - PTRs - are not in end-user networks. It is
not clear who would run them, but they would advertise prefixes
containing lots of EIDs in the DFZ, attracting packets addressed to
those EID prefixes. They would send their mapping queries to nearby
Map-Resolvers, I guess.)

>> My understanding of and comments on this are:
>>
>> Instead of ITRs and ETRs needing to act as routers in the ALT
>> network, they communicate via the ordinary Internet with Map Servers,
>> which are routers on the ALT network. This will greatly reduce the
>> complexity and configuration difficulties of ITR and ETRs.
>
> Yes, that is right. ITRs send encapsulated Map-Requests to Map-Resolvers
> via the Map-Resolver's RLOC address.

OK.

> ETRs get encapsulated Map-Requests from Map-Servers via the ETR's
> RLOC address only after the ETR Map-Registers to the Map-Server.

Oh - I had misunderstood this.
Now I realise the Map-Server is a connection system to forward
mapping requests from the ALT network to particular ETRs, which have
previously securely registered themselves with the Map-Server. I
understand the ETR sends the map reply straight back to the
Map-Resolver via the Internet.

>> These Map Server devices are implicitly local to the ITRs and ETRs in
>> a given network and are intended to be used only by those ITRs and
>> ETRs. They are always on RLOC (stable, globally reachable, non
>> LISP-mapped) addresses.
>
> Don't know what you mean by local. But if you meant the Map-Server is
> colocated with ITRs, that is not true. The Map-Server would typically
> not be at the site but in the Internet infrastructure somewhere. Most
> likely in an service provider, an interconnect provider, a RIR, or a
> third-party.

OK - I was assuming the ITRs and ETRs were at ISPs, which is where I
assumed the Map-Resolvers and Map-Servers are.

I am trying to picture this. I assume a multihomed end-user network
with a bunch of ITRs and two upstream ISPs. I assume the
Map-Resolvers and Map-Servers are at ISP-A and ISP-B.

Each ETR and ITR in the end-user network needs to be on an RLOC
address. For simplicity I will assume there are two ETRs and two
ITRs: one ETR and one ITR for the single RLOC IP address which each
ISP gives to the network.

Now, each ETR has an RLOC address for receiving encapsulated
packets, from one ISP only. Assuming two separate ETRs A and B,
ETR-A is going to be on RLOC-IP-A and will register with ISP-A's
Map-Server. The same applies to the other ETR, ETR-B, which is on
RLOC-IP-B and registers itself with ISP-B's Map-Server.

The Map-Replies from each ETR will go out their respective links to
their "own" ISP. The reply can't very well go out the other ISP's
link because the source address wouldn't match.
(Alternatively, I guess both ETRs could send from both addresses,
RLOC-IP-A and RLOC-IP-B, but when would an ETR receive a Map-Request
from one ISP and send the reply out on the other ISP's link? Maybe
it could, for outgoing load sharing or similar reasons.)

This sounds OK. If ISP-B dies, the link to ISP-B dies or ETR-B
dies, then I guess the ITRs in the networks of sending hosts will
figure this out by some means and instead tunnel traffic packets to
ETR-A on RLOC-IP-A.

So in my understanding, each ETR is dedicated to the RLOC address
and Map Server of its "own" ISP.

ITRs would need to follow a similar pattern, at least in terms of
map requests. Your I-D indicates that the ITR is configured with a
single address of the Map Resolver it queries. This may be an
anycast address in the ISP's network, which sounds good to me.

I think each ITR is going to be similarly dedicated to its own ISP
and the RLOC-IP address that ISP provides:

   End-user    RLOC        Physical   ISP
   network     addresses   link

                                ISP-A ALT router ..\
                                   /                \
   ITR-A------ RLOC-IP-A ------- Map-Resolver-A....BR-A==>} DFZ
                                                          }
                                                          }
                                ISP-B ALT router ..\      }
                                   /                \     }
   ITR-B------ RLOC-IP-B ------- Map-Resolver-B....BR-B==>} DFZ

I guess this is OK. You would need to ensure that within the
end-user network, outgoing traffic to EID addresses was internally
shared between the two ITRs in some way, as long as both links were
up, to spread the outgoing load. Also, once one link dies, the
corresponding ITR has to stop accepting outgoing packets and these
need to go to the other ITR.

I guess in many implementations, there will only really be one CE
router, with links to both ISPs, implementing the ETR-A and ETR-B
functions all within itself. Likewise, probably, the ITR-A and
ITR-B functions.

>> There are two functions which may be combined in the one device:
>>
>>   Map Resolver (MR)
>>
>>      Accepts a mapping query from an ITR and (usually) sends the
>>      ITR a mapping reply.
>>      (The exception is if the MR doesn't
>>      have the information and sends the query verbatim to some
>>      other device, which will answer the query directly to the
>>      ITR.)
>
> Right, we want to experiment with Map-Resolver caching but want to do
> that as a second phase in the implementation.

OK.

> So the Map-Resolver gets
> the Map-Request from the ITR which now puts it on the LISP-ALT network.
> If there is another mapping database service, it could be used.
>
> This way we can make the mapping database service modular and don't need
> the sites to participate in it directly.

Yes.

>>      MRs can be caching or non-caching. More on that below.
>>
>>      ITRs are intended to be configured with a single address for
>>      their local MR. This would raise questions of robustness if
>>      not for the next item:
>>
>>      Multiple MRs in a local network (such as an ISP network or
>>      I guess any end-user network which has ITRs) can be configured
>>      on the one anycast address. This way, the ITR's request will
>>      be forwarded to the nearest currently active MR. All
>>      communication is via single packets, not via TCP. Presumably
>>      the MRs will also have their own unique addresses so they can
>>      be managed via TCP.
>
> Right.
>
>>      I think the MR is an important improvement to LISP-ALT, since
>
> It's not an improvement to the LISP-ALT mapping database, but a
> Map-Resolver can be a LISP-ALT router/system, a NERD system, or a
> CONS/DHT system.

OK. It seems that NERD has been pretty much forgotten for a while
now, except by Eliot, but now you are mentioning it as a mapping
distribution system worth considering.

NERD, to date:

  http://tools.ietf.org/html/draft-lear-lisp-nerd-04

is defined by every ITR getting the full mapping database, via slow
push - actually, ITR-initiated downloads of sections of the mapping
database and/or updates to the mapping of various sections of the
EID address space.
But by suggesting a combination of NERD and Map Resolvers, you are
now proposing a version of LISP very different from NERD and very
different from any other form of LISP. You are proposing a local
full-database query system, as used by APT and Ivip.

APT and LISP-NERD-MR use slow push of the entire mapping database.
Ivip uses fast (a few seconds) push of the entire mapping database,
with instant cache updates from the full database query server to
any ITR which recently (within the caching time) was sent the
mapping for this micronet (EID) in a map reply.

I think CONS has been on the back burner for a while. I never did
understand how it worked. Likewise Distributed Hash Tables. BTW,
your reference in draft-farinacci-lisp-12 to the DHT I-D doesn't
work: draft-mathy-lisp-dht-00 is not in the IETF system. Google
finds a copy at:

  http://inl.info.ucl.ac.be/publications/lisp-dht-towards-dht-map-identifiers-locators

but this is from 2008-02 and as far as I know, it has not been in
the IETF system.

I think ALT looks like a better idea - but like CONS it is still a
global query server network, and so I think it is going to be
fragile and slow compared to having a full database local query
server.

>>      it enables an ITR to be a much more casual and unstable concept
>>      than was the case when all ITRs needed to participate in the
>>      ALT network as routers (AFAIK). This means that ITRs can be
>>      added easily, without having to configure anything.
>
> True.
>
>>      It also means (though this is my suggestion, not from the LISP
>>      team) that an ITR function could easily be implemented in a
>>      sending host, assuming it was not behind NAT. I guess the
>>      sending host would need to be on an RLOC address - which rules
>>      out this idea for sending hosts in end-user networks. Ivip's
>>      ITR in sending host function (ITFH) requires the host to be
>>      on a non-NAT address which can be and ordinary or a Scalable PI
>>      address - RLOC or EID in LISP parlance.
>
> True, however, it would increase the number of locators for a site. That
> is the EID to RLOC ratio would be 1-to-1. And the mapping database would
> be orders of magnitude larger!

Yes. As currently defined, your ITRs must be on non-mapped (not
EID) ordinary BGP-routed "RLOC" addresses. So you can't have ITRs
in the sending hosts of end-user networks, because the whole idea of
LISP or any other core-edge separation architecture is to have all
end-user hosts on EID space.

So your ITRs need to be special devices either in the end-user
network, or in the ISP, with a direct link to their particular ISP
link, on the one or perhaps more RLOC addresses that ISP gives them.

Because you anticipate ITRs and ETRs communicating somewhat - such
as an ETR receiving a Solicit-Map-Request message which it needs to
pass on to the ITR, which will request fresh mapping - I guess this
means the ITRs need to be in the end-user network too.

Probably, if a big end-user site such as a university has 10 ITRs
for ISP-A, then ISP-A needs to give the network at least 10 separate
RLOC addresses, one for each ITR.

>>   Map Server (MS)
>>
>>      Is a router on the ALT network and accepts secure messages from
>>      one or more ETRs. (Secret key pairs to secure these.) ETRs
>>      are (typically, or always?) the authoritative source of mapping
>>      information in LISP.
>
> Right. The ETRs are registering their EID-prefixes more so than the
> mapping. Just an FYI, if that wasn't clear. Map-Servers don't answer
> Map-Requests because they wouldn't be authoritative.
>
>>      ETRs can be on any RLOC address and use ordinary packets to
>>      communicate with the MS.
>
> Yes, they send Map-Register messages from one of their local RLOCs.
>
>>      My understanding is that the MS announces the appropriate
>>      prefixes on the ALT network - one for every EID the ETR
>>      tells it.
>
> Right, but if the Map-Server is at an aggregation boundary, the specific
> EID-prefix won't be announced but the configured aggregate in the
> Map-Server would.

OK - but I don't understand how the ALT-router part of a Map-Server
is going to be part of the highly aggregated ALT network. To be
highly aggregated, you need a strict, upside-down tree-like
splitting of the address space over more and more routers as you get
to lower and lower levels. At level 1, you have ALT routers which
handle all the packets for some pretty small subset of the entire
space. They only connect upwards to the level 2 routers, each of
which aggregates the space of, for instance, 16 or 64 or whatever of
the level 1 routers.

I will write more on this in another thread.

>>      Ignoring MSes for a moment, I have never understood how this
>>      would work with two ETRs in two separate ISPs handling the same
>>      EID.
>
> Multiple ETRs reside at the same site not in the SP network.

OK. I understand this now.

>>      Both ETRs would be routers on the ALT network and would
>>      announce the same prefix. So where do packets go to? I guess
>>      to either.
>
> Within their aggregation level, there are two paths for Map-Requests to
> travel to the site. It's the upstream BGP routers that decide which path
> to take. They would take shortest path based on AS-path hop-count.
> Recall that each LISP-ALT router is doing "eBGP".

OK. ISP-A and ISP-B both have Map-Servers - MS-A and MS-B. The
end-user network has two ETRs: ETR-A and ETR-B. ETR-A registers
with MS-A and ETR-B registers with MS-B - as they would need to,
according to the RLOC address each ETR gets from one ISP or the
other.

Now, for simplicity, assuming the end-user network had an EID prefix
55.44.33.00/24, both ISPs' MSes need to advertise this same prefix
on the ALT network.

I am trying to imagine the ALT network topology. By your
specification it is "highly aggregated".
Therefore, it does not replicate the pattern of Internet routers -
physically adjacent (geographically, but bridged with fibre links,
as well as being directly connected in data centres) routers having
links between themselves in a pretty random-looking arrangement,
with the connections bearing no relation to the addresses which the
routers advertise in the DFZ.

With the ALT network, the connections between routers can be of
arbitrary geographic length via GRE tunnels - including having a
neighbour anywhere in the world, involving a tunnel which physically
travels over a dozen ASes and twice as many routers.

However, due to the highly aggregated nature of the ALT network
(which is essential to ensuring the shortest number of ALT routers
between the ITR and the ETR, so Map-Requests get to the ETR ASAP)
you don't just have ALT routers setting up GRE tunnels to a handful
of other ALT routers in nearby ISPs. Since end-user EIDs are
portable and can be used anywhere in the world, you can't assume any
efficiency gains in the ALT network based on assumptions that EIDs
of a certain address range are all going to be used in any one
geographic area.

Since each ISP's Map-Servers (I guess they would have one, a few or
a dozen or so) are all handling a wide, essentially random,
assortment of EIDs, I think there would be little or no value in
them having GRE tunnels to neighbouring ISPs' Map-Servers. If there
was, this would surely not be sufficient connectivity to give the
shortest path for Map-Request messages to reach the Map-Server.
Some might come from nearby ISPs, but others would come only from
whatever high level of the ALT hierarchy was fully meshed. So these
messages would need to come via an ALT router at level 1 of the
hierarchy.

This means the one Map-Server will need to make GRE tunnels to a
large number of these level 1 ALT routers, which would be
distributed physically all over the Net.
For your ALT network to be highly aggregated, somewhere there needs
to be one ALT router which handles, for instance, 55.44.16.00/12.
As far as I know, the LISP team has not explained how the ALT
network can be both highly aggregated and robust against single
points of failure - for a realistic large-scale deployment handling
10 million physically scattered EID prefixes, 100 million, a billion
etc.

I am sure what you wrote is correct. But if there was a single
aggregating router for 55.44.16.00/12 - which is how I understand
the network would be if it is to be highly aggregated - then that
router will have GRE tunnels directly from MS-A and MS-B.

The AS hop-count is going to be over the ALT network, ignoring
physical DFZ routers which carry the tunnel packets. I would expect
both MS-A and MS-B to have the same AS hop count in the level 1 ALT
router which handles 55.44.16.00/12, since they are direct
neighbours of this level 1 aggregation router. Then, from what I
recall about BGP, all the packets from that level 1 aggregation
router would be sent to the MS-A ALT router or the MS-B ALT router
according to which ISP has the lowest AS number.

So in this example, one of the Map-Servers would get all the
queries. That isn't necessarily bad. If MS-A is getting all the
queries and it dies, then pretty quickly the level 1 aggregation ALT
router will sense this and its BGP implementation will direct
queries to the GRE tunnel which leads to MS-B.

So in this failure example, I think the ability of the ALT network
to continue responding to mapping requests moment-to-moment depends
on the BGP implementation of this level 1 router.

I know little about GRE. How does an ALT router at one end of a GRE
tunnel find out quickly if the ALT router at the other end is dead
or unreachable? The only way I can imagine this is with regular
keep-alive packets going each way.
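
To show what I mean, here is a rough Python sketch of the level 1
router's choice between MS-A and MS-B, combined with a keep-alive
timeout. Everything in it is an assumption for illustration: the
class and function names, the private AS numbers, the timestamps and
the 90 second hold time. In particular, the "lowest AS number"
tie-break is only my recollection; as far as I know, the BGP-4
specification (RFC 4271) breaks such ties on the lowest BGP
Identifier and then the lowest peer address, so that sort key should
be read as a placeholder for whatever the real tie-break is.

```python
from dataclasses import dataclass


@dataclass
class Neighbour:
    name: str              # label for the Map-Server at the far end of the tunnel
    as_number: int         # hypothetical private AS numbers
    as_path_len: int       # AS hops to the EID prefix via this neighbour
    last_keepalive: float  # time (seconds) the last keep-alive arrived


def pick_neighbour(neighbours, now, hold_time=90.0):
    """Choose where to send Map-Requests for one EID prefix."""
    # A neighbour is treated as dead once its keep-alives stop for hold_time.
    alive = [n for n in neighbours if now - n.last_keepalive <= hold_time]
    # Shortest AS path first; lowest AS number as the assumed tie-break.
    return min(alive, key=lambda n: (n.as_path_len, n.as_number), default=None)


ms_a = Neighbour("MS-A", 64501, 1, last_keepalive=100.0)
ms_b = Neighbour("MS-B", 64502, 1, last_keepalive=100.0)

print(pick_neighbour([ms_a, ms_b], now=120.0).name)  # both alive: MS-A
ms_b.last_keepalive = 200.0                           # only MS-B keeps sending
print(pick_neighbour([ms_a, ms_b], now=220.0).name)  # MS-A timed out: MS-B
```

With a timeout like this, the failover delay is bounded by the hold
time, and the router has to track one keep-alive timer per tunnel.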
But then, you could have a thousand GRE tunnels per ALT router, such
as from the level 1 router to the Map-Servers of 500 end-user
networks whose EIDs match this router's aggregation range. I guess
this could be rather traffic-intensive and CPU-intensive to
maintain.

Does the level 1 aggregation router need to do full BGP to each of
these Map-Servers? If the Map-Servers are not cross-linked to other
ALT routers, but only receive packets and send them to ETRs, then I
guess each Map-Server can have a single-homed link to the ALT
network. Then, the BGP activity per neighbour (each of 1000 GRE
tunnels) would be pretty minimal for the ALT router, and likewise
minimal for each Map-Server.

But then, how can the system be robust with a single level 1 ALT
aggregation router?

>>      But then the ETRs somehow need to coordinate
>>      themselves, or be coordinated by something else, so they act
>>      in a unified manner. Then, as long as both were reachable and
>>      working properly, it wouldn't matter which ETR got the query.
>
> Right, but they don't need to coordinate. All they need is to be
> consistently configured to Map-Register the same EID-prefix.

As I understand LISP, the ETRs for a given end-user network
definitely need to be coordinated in some way, since they need to
send out the same mapping replies. Also, with locator reachability
bits (or the versioning alternative) they need to send out
consistent messages to ITRs in this regard too.

However, now I understand the ETRs are owned by and located in the
destination network, I see it is no problem to coordinate them.

>>      The same problem seems to apply with MSes. There would be two
>>      ETRs in two separate ISPs and each would presumably (for
>>      robustness in a multihoming situation and probably for security
>>      reasons) have its own MS in its own ISP network.
>
> No, not true.

OK - the Map-Servers simply pass on queries to the ETR which
registers with them for that EID range.
So they don't cache mapping information or answer queries themselves.
Therefore they don't need to be coordinated, except to the extent
already provided for by the ETRs securely registering themselves and
each MS then announcing that EID prefix on the ALT network. MS-A and
MS-B can do this fine without communicating with each other or even
knowing about each other.

>> So now we have two ETRs and two MSes which need to be
>> coordinated. The two MSes both announce the one EID prefix
>> on the ALT network. Yet they are supposed to still be
>> coordinated during outages.
>
> The 2 Map-Servers will converge into a topology that will aggregate the
> site's Registered EID-prefix so we can have a smaller ALT core. Smaller
> meaning, a small number of EID-prefixes needing to be stored in the core
> of the ALT network.

This is easy in a test network, or with a few tens of thousands of
end-user networks. However you haven't described how it would work for
the full-scale deployment with 100 million end-user networks.

>> However this is resolved, I think it is a big improvement for
>> LISP to have MSes, since it reduces the cost, complexity,
>> management effort etc. for ETRs similarly to how MRs do the
>> same for ITRs.
>>
>> Both these functions can presumably be performed quite adequately by
>> software devices, such as a COTS server with suitable software.
>> There doesn't have to be any hardware router FIB etc. AFAIK.
>
> Yep, that is true.

OK.

>> This would enable hardware routers to assume ITR and ETR
>> responsibilities without them also needing all the software and
>> configuration, stable address etc. to be an ALT router. Also, by
>> decreasing the total number of ALT routers, this simplifies the ALT
>> network.
>
> Yes, we thought so too.

OK.
>> I gather from this new I-D, and from what I read in:
>>
>>   http://www.lisp4.net/docs/lisp-ausnog02.ppt
>>
>> that the current test network and the intention for the future is not
>> to send traffic packets on the ALT network. This approach was
>> initially an option, with the intention that the ALT network would
>> forward the initial packet(s) to the correct ETR, which would then
>> forward it to the destination network, while also recognising it as a
>> map request and so would send a map reply message to the ITR.
>
> Right that is correct. The implementation supports both sending
> Map-Requests and Data-Probes on the ALT network, but we default to
> Map-Requests and might possibly deprecate Data-Probes.

OK.

>> I recall from somewhere that the ITR typically sends out a few
>> mapping requests, just in case one of them is dropped.
>
> Well no, we rate-limit Map-Requests but they are triggered when a source
> at the site sends data. However, we can play with this to see what works
> well.

OK.

>> When the ITR
>> connects directly to the ALT network, these packets presumably
>> usually traverse the entire global ALT network until they are
>> delivered to one or more (probably just all to one) ETR which
>> responds. I guess the ETR sends multiple replies, but maybe not.
>> The reply goes to the ITR via the ordinary Internet.
>
> Map-Replies are rate-limited as well.

OK.

>> Removing these potentially long and voluminous traffic packets from
>> the ALT network seems like a good idea to me. There may well be
>> security benefits in doing so too. Below, I assume the ALT network
>> only carries mapping requests, and that the map replies go back from
>> whatever answers them (an ETR connected to ALT network, or more
>> likely a Map Server) via a direct ordinary Internet packet to the
>> device which made the query (perhaps a directly connected ITR or more
>> likely a Map Resolver).
>
> Yes, this is true.

OK.

>> A Caching Map Resolver?
>>
>> If the MR caches, then it has the potential to significantly reduce
>> the traffic on the ALT network. This is due to two or more ITRs in a
>> given ISP network wanting the same mapping, and the second and
>> subsequent ones getting it directly from the local caching MR.
>
> Yes, this was Noel's idea with CONS. It is worth experimenting.

OK - and now with ALT.

>> This also has the potential to eliminate, for the second and
>> subsequent ITRs which need this mapping, the major problem of
>> "LISP-ALT's initial packet delays", so much debated on the RRG in
>> recent months.
>
> Well, I'm not so sure. If you point an ITR to an RLOC of a Map-Resolver,
> you take the shortest path to it. But if you had a GRE tunnel to the
> same box, the GRE tunnel destination would be the same RLOC. So the path
> would be the same. But you couldn't run an anycast Map-Resolver service
> because the eBGP connections that ran over the GRE tunnels would reset.
> So I guess this is an improvement.

What you wrote seems to me to be about something different to my
intended meaning.

I meant that if there are two sending hosts in some end-user network,
or even in any of the end-user networks whose ITRs are using a single
Map-Resolver at some ISP, then one host requests mapping for EID prefix
NNN and after a second or two or whatever, hopefully less (the "long
path" problem) the Map Resolver gets the mapping response and nearly
instantly sends a mapping response with the same information to the
ITR.

Now, assuming the Map-Resolver caches this mapping, some other ITR
requests from it mapping for the same EID. Now the Map Resolver doesn't
need to generate a Map-Request and wait for it to traverse the ALT
network. It has the cached mapping and sends the reply back to the
second ITR within a few milliseconds. This second ITR therefore has no
significant delay in getting the mapping.
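The two-ITR scenario just described can be sketched in a few lines of
Python. This is only an illustration of the cache-hit versus cache-miss
paths, not anything taken from the LISP drafts. The class name and the
`query_alt` callback are hypothetical, the cache is keyed directly by
EID prefix (no longest-prefix match), and expiry is ignored.

```python
from typing import Callable, Dict


class CachingMapResolver:
    """Hypothetical caching Map-Resolver: answers repeat queries locally."""

    def __init__(self, query_alt: Callable[[str], str]) -> None:
        self._query_alt = query_alt        # slow path: Map-Request over the ALT network
        self._cache: Dict[str, str] = {}   # EID prefix -> mapping
        self.alt_queries = 0               # how many requests went over the ALT

    def resolve(self, eid_prefix: str) -> str:
        # Second and subsequent ITRs are answered from the local cache.
        if eid_prefix in self._cache:
            return self._cache[eid_prefix]
        # First ITR pays the "long path" delay across the ALT network.
        self.alt_queries += 1
        mapping = self._query_alt(eid_prefix)
        self._cache[eid_prefix] = mapping
        return mapping
```

Only the first request for a given prefix increments `alt_queries`,
which is the traffic reduction on the ALT network argued for above.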
Ideally - and I don't know whether your ITRs are meant to do this -
that second ITR would buffer the first traffic packet, rather than drop
it (as I understand your ITRs do at present) and then tunnel it to the
ETR within the few milliseconds it takes to get the mapping reply from
the Map Resolver. So for the second ITR, there would be no significant
delay in traffic packets at all - and no dropped traffic packets in
this instance at least.

(Of course, if your LISP network Map-Resolvers all used NERD rather
than ALT or CONS - joining the local full-database query server throng
with APT and Ivip - all your ITRs would be configured to buffer their
initial traffic packets for 100ms or so, awaiting the Map-Reply. Unless
something goes wrong, this would involve no significant initial packet
delays whatsoever: the few milliseconds it takes to get the mapping
from the nearby full database query server = Map-Resolver is not, I
think, significant.)

>> There is nothing in draft-fuller-lisp-ms-00 to describe this caching
>> behavior.
>>
>> The caching time of map replies is specified in units of one minute:
>>
>> draft-farinacci-lisp-12:
>>
>>    Record TTL: The time in minutes the recipient of the Map-Reply
>>    will store the mapping.
>
> That detail will come in a later draft.

OK.

>> Let's say at time T = 0 minutes, ITR-A sends a map request to MR-1,
>> which has no mapping for the EID prefix which matches the EID address
>> in the request message. MR-1 sends its own map request message (with
>> its own nonce) onto the ALT network which forwards it to either the
>
> Well, that's not the way it works. The ITR sends an encapsulated
> Map-Request to the Map-Resolver. The Map-Resolver strips the outer
> header and then forwards the Map-Request on the ALT. The source address
> is the ITR RLOC address and the destination address is the EID that
> caused the map-cache fault on the ITR.

OK - this is a Map-Resolver without caching.
As long as you have no caching, then the Map-Resolver doesn't
contribute to the resolution of the "long path" problem.

  LISP-ALT's long path problem yet again  2008-12-24
  http://www.ietf.org/mail-archive/web/rrg/current/msg04097.html

>> single Map Server which advertises the matching EID prefix on the ALT
>> network, or to one of the multiple such Map Servers, or perhaps to the
>> directly ALT-connected ETR(s) which do the same.
>
> Correct.

OK.

>> That device sends the mapping reply back to MR-1 directly via the
>> Internet. The reply is secured by returning MR-1's nonce.
>
> No, it would go to the ITR because in the Map-Request payload there is
> an "ITR RLOC" field. This is quite important because if that Map-Request
> was an IPv6 Map-Request with an IPv6 outer header, and since the
> LISP-ALT network we have deployed is dual-stack, the IPv6 Map-Request is
> forwarded on the ALT, but the ETR may not (and probably not) have an IPv6
> path back to the ITR. So if the "ITR RLOC" field is encoded with an IPv4
> RLOC, the ETR sends a Map-Reply back with an IPv4 header.
>
> In the entire LISP design we treat IPv4 and IPv6 equally and try to
> enhance IPv6 connectivity by using IPv4 outer headers or IPv4 RLOCs when
> encapsulating.
>
> Today, two IPv6-only sites can open an IPv6 TCP connection to each other
> if they run LISP and use IPv4 locators.

My discussion assumes a caching role for the Map-Resolver. Without
that, its contribution is mainly to make it easier to get a lot of ITRs
working without each one being a part of the ALT network. This is good,
but it does nothing to reduce the "long path" delay problem.

>> Let's say the mapping reply comes back with a 90 minute caching time.
>>
>> MR-1 sends to ITR-A a map reply, with ITR-A's request's nonce, with
>> the fresh mapping information and a caching time of 90 minutes.
>> Now MR-1 can encapsulate packets to its choice of ETRs, based on the
>> fresh mapping it has received and whatever it has determined about
>> reachability of those ETRs, and of the ETRs' ability to get packets
>> to the destination network.
>
> No, no, no. The Map-Resolver does not encapsulate any packets. Remember
> the ALT has no data going over it.

OK - I meant to write "ITR-A can encapsulate packets to ..."

> If the Map-Resolver is caching Map-Replies and the ITR sends a
> Map-Request with A=0, then the Map-Resolver can respond with a
> Map-Reply. If the ITR sends a Map-Request with A=1, the Map-Resolver
> must forward the Map-Request over the ALT so an authoritative Map-Reply
> can be returned by the ETR.

OK. This is the Authoritative bit in the Map-Request message:

  http://tools.ietf.org/html/draft-farinacci-lisp-12#section-6.1.2

In my example I am assuming the ITR trusts the caching ability of the
Map Resolver and would prefer a quick reply (and no more burden on the
ALT network, the Map Server or the ETR) to waiting longer for an
authoritative reply from the ETR. So the ITR would set the A bit to
zero.

The authoritative reply would be fresh with the full length of caching
time, but I figure that in most instances, whatever remained of the
caching time in the Map Resolver's cached mapping would be sufficient
for the ITR.

>> Later, at T = 85 minutes, ITR-B sends a mapping request to MR-1 for
>> an address which matches this same EID prefix. MR-1 can use its
>> cached information and send a reply within a few milliseconds. This
>> means ITR-B's traffic will not be delayed by any significant amount.
>>
>> What caching time will be in that reply to ITR-B? I assume it will
>> be 5 minutes. If it would be 90 minutes, ITR-B could be running for
>> a long time to come on stale mapping information.
>
> We haven't figured that out yet. We don't want to create an impression
> that a cacher of a Map-Reply can use any TTL it wants.
> We want to make it mandatory to respect the ETR's value.

OK.

>> Assuming ITR-A no longer needs this EID's mapping, but ITR-B keeps
>> needing to tunnel packets addressed to this EID, then at T=90
>> minutes, ITR-B will want mapping information again.
>>
>> Should ITR-B request the mapping again at T = 88 minutes, in
>> readiness for probably needing it in 1 minute's time?
>
> It could, but the reasons to time out the map-cache entry are to keep
> the cache small and to be resilient, to some extent for locator-set
> changes at the ETR site.

OK.

>> This would seem like a generally reasonable approach if it prompted
>> MR-1 to get fresh mapping information, but why should MR-1 do this?
>> Would MR-1 need to look at the original caching time and how much
>> has expired to decide whether it should, by some algorithm, request
>> fresh mapping? But what if the mapping hadn't changed in the distant
>> Map Server, but the ETR was going to change it two minutes later?
>
> One of the problems I see with caching in the Map-Resolver is if the
> map-cache entry does have a locator-set change and the ETR asks all
> cachers to send Map-Requests (it does this by setting the SMR-bit for
> active flows), the Map-Resolvers cannot get updated because they are not
> seeing data.
> However, I have a solution for this because it will be the ITR that
> sends A=1 Map-Requests with an SMR-bit set. That can tell the
> Map-Resolver to ask for the Map-Reply back to update its cache. I know
> there are security issues with this but it's one way of doing it.

OK. I have just written a message about the Versioning approach
working in all circumstances, but Solicit-Map-Request (SMR) not working
when the sending host is not on an EID address:

  LISP Versioning vs. Solicit-Map-Request
  http://www.ietf.org/mail-archive/web/rrg/current/msg04585.html

quoting this part of your reply.

> There are also details how a Map-Resolver asks to get the Map-Reply
> back.
> We want to do this in a stateless manner in the Map-Resolver. So
> we might have to preserve the ITR RLOC's address in the Map-Request but
> instruct the ETR where to send the Map-Reply. We have some ideas and
> want to think about it before changing packet formats.

By "stateless" do you mean you don't want the Map-Resolver to have to
remember which ITR to reply to when it gets some Map-Reply back? Then,
I guess, if the Map-Reply came back from the ETR to the Map-Resolver
with the ITR's address embedded, the Map-Resolver could figure out from
that Map-Reply, without any state, to send the mapping on to that ITR.

>> If ITR-B waited until T = 90 or a little later before requesting
>> fresh mapping, then unless MR-1 had already got fresh mapping in the
>> last minute or two, then there would presumably be a delay in the ITR
>> being able to handle traffic for this EID, since it would take some
>> time for MR-1's second mapping request to traverse the ALT network
>> and generate a reply to MR-1.
>>
>> There are various scenarios, but I think there are potential
>> difficulties with caching times running out in three locations now
>> rather than one.
>
> Yeah, we can't get too tricky about manipulating TTLs. DNS has been
> fraught with problems due to TTL issues. If anyone has advice about
> this, it would be nice to hear about it.

OK.

>> Previously, it was simple (despite the scaling problems of lots of
>> ITRs peppering an ETR for mapping, not to mention them all trying to
>> decide reachability for this and other ETRs):
>>
>>   ITR  query ----------->  ETR
>>        <----------- reply
>>  cache
>>
>>
>> Now we have:
>>
>>   ITR  query ----->  Map       query ----->  Map
>>        <----- reply  Resolver  <----- reply  Server  <--Register-  ETR
>>
>>  Cache               Cache                   Cached, in a sense,
>>                                              controlled by messages
>>                                              from the ETR(s) whenever
>>                                              they can reach the
>>                                              Map Server and decide
>>                                              to send a Map Register
>>                                              message.
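The T = 85 minute example and the A-bit rule discussed earlier can be
put together in a short Python sketch. It takes the assumption made in
the example, that the resolver hands out only the remaining part of the
ETR's Record TTL. The thread leaves that policy undecided, so this is
one possible reading rather than specified behavior. All names here are
hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CachedMapping:
    mapping: str
    received_min: int     # when the Map-Reply reached the resolver, in minutes
    record_ttl_min: int   # Record TTL chosen by the ETR, in minutes

    def remaining_ttl(self, now_min: int) -> int:
        # Never extends past the expiry time implied by the ETR's value.
        return max(0, self.received_min + self.record_ttl_min - now_min)


def answer_from_cache(entry: Optional[CachedMapping], now_min: int,
                      a_bit: int) -> Optional[Tuple[str, int]]:
    """Return (mapping, ttl) if the resolver may answer locally, else None.

    None means the Map-Request has to be forwarded over the ALT network.
    """
    # A=1 asks for an authoritative reply, so the cache must not be used.
    if a_bit == 1 or entry is None:
        return None
    ttl = entry.remaining_ttl(now_min)
    if ttl == 0:
        return None   # entry has expired; fetch fresh mapping
    return (entry.mapping, ttl)
```

For a mapping received at T = 0 with a 90 minute Record TTL, a request
at T = 85 with A=0 is answered from the cache with a TTL of 5 minutes,
while the same request with A=1 is forwarded over the ALT.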
>
> Right, but what if we used the first case and have the ETR schedule an
> update to the Map-Resolver? Or what if the ITR updated the Map-Resolver?
> Not sure yet. And not even sure how much RTT will buy us with
> Map-Resolver caching.

OK.

>> I think this raises more complex problems with:
>>
>> 1 - How to avoid cache times running out at ITRs
>>     which are going to be tunneling packets addressed
>>     to this EID after the cache time expires.
>>
>>     Such a situation will cause a traffic delay unless
>>     the local Map Resolver has recently got fresh mapping.
>
> But I think this issue has continually been exaggerated. The
> Map-Request delay is not for a lot of packets and will be relatively
> rare I imagine.

I and others still think it is a serious problem. If people had the
choice between two routers, one of which occasionally delayed new
sessions by a second or two and one that didn't, then there would need
to be compelling reasons to buy the first one.

End-users who can only get PA space, without multihoming now, might be
attracted to LISP-ALT space, despite the delays. However any larger
end-user network would definitely not want to switch their BGP-managed
PI prefixes over to LISP-ALT management if it resulted in this inferior
behavior.

It is not just a TCP session being delayed by a fraction of a second,
or a second or two (or several seconds if the lone request and response
packet is lost in the global ALT network). The delays will sometimes
affect initial attempts to reach a DNS server, since end-user networks
will be running their own DNS servers and they will be on EID spaces,
not necessarily the same EID prefix as the host whose address is being
looked up.

Likewise, the delay could be two delays, not counting DNS. Firstly a
delay in getting a packet from host X to host Y (on an EID address).
Then, if X is also on an EID address, Y's ITR could have a delay in
getting the mapping for X.
The delays are not necessarily simply bounded by the response time of
the ALT network. If the ITR buffers the packet and waits, and if the
response arrives before the sending host tries again, this would be
optimal. However, if the ITR drops the packet, figuring the response
will come too late to be worth sending the original packet, then the
delay time is more likely to be a function of how long it takes the
sending host to time out and try again.

Sending a delayed packet around the time of the second packet would
probably cause confusion and unwanted responses, an argument for ITRs
dropping all packets they have no mapping for. But with a caching
Map-Resolver, maybe the ITR can get the mapping in a few milliseconds -
so it should buffer the packet and send it when the mapping arrives.

It cannot be assured the sending host will retry. It might just try
the communication to a different host, in a different EID, and so be
subject to the same potential delay problems which could easily outlast
the host's time-out value.

If something like LISP-ALT was widely implemented, sending hosts might
be tempted to generate a flurry of closely spaced packets, in order
that one or more of them would be sent as soon as the mapping arrives
in the ITR. Since the sending host can't anticipate whether the ITR has
the mapping, could get it in a few milliseconds, or might have to wait
for it for fractions of a second or more (or forever, if the request or
reply is lost) then I think this scenario might encourage undesirable
host behavior trying to milk the fastest performance out of the
uncertain ITR mapping situation.

>> 2 - How to minimise unnecessary map requests by Map
>>     Resolvers trying to anticipate ITRs making such
>>     requests, but actually requesting fresh mapping from
>>     the distant Map Server when the ITR doesn't need it.
>
> Right, Map-Resolver caching could be more trouble than it is worth.

I wasn't suggesting this was necessarily the case.
If you could contemplate converting LISP to a local full database query
system with NERD-like full push to Map Resolvers, I think the much
lesser change from ALT of having caching in the map resolvers (local
caching query servers) is worth contemplating.

>> Without further complications, the Map Resolver can't know whether
>> the one or more ITRs which requested the mapping for an EID are still
>> handling traffic for that EID. So it can't very well request fresh
>> mapping towards the end of its expiry, just in case an ITR wants it.
>> To do so would approximately double the volume of map requests
>> traversing the ALT network, since it is reasonable to assume, with
>> longish caching times, that the original caching time will generally
>> suffice for the needs of the one or more ITRs served by the Map
>> Resolver. (This would not be true with a busy Map Resolver and
>> popular EIDs many ITRs are sending packets to.)
>>
>> Without some elaboration of the request protocol, ITR-B at T = 85
>> minutes can't ask the Map Resolver to get fresh mapping and send it a
>> new reply - unless there is some algorithm in the Map Resolver such
>> as: "If the cached mapping is 90% of the way to its expiry time, do
>> not answer the new request from the cache, but send a fresh map
>> request and then answer the query if and when the new reply arrives."
>>
>> To do so would effectively shorten all the caching times.
>
> Well, if the mappings don't change, longer TTLs will help the
> Map-Request load on the ALT. If there are frequent changes and you want
> fast convergence to them, then you use more resources.
> That is the tradeoff.

OK.

>> At present, there is only one kind of map request message from an ITR
>> to a Map Server - implicitly an urgent request.
>>
>> If there was a second kind:
>>
>>   "This ITR has mapping for this EID which will expire in some
>>    time period (specified) soon, and requests the Map Resolver
>>    to get fresh mapping from the Map Server now, and to send
>>    a reply once this arrives."
>
> There is no reason why the ITR cannot send a Map-Request directly to the
> RLOC of the ETR. It does have a set of them he can try. And the nonce
> will protect against ETR spoof attacks.

OK - but that wouldn't help update the cache of a caching Map-Resolver.

>> then I think these problems would be resolvable with less trouble and
>> less need for choices based on limited information.
>>
>>   - Robin
>
> Thanks again for your comments Robin,
> Dino

Thanks for responding - in detail!

  - Robin