[rrg] LISP-ALT mobility and scaling to 10M, 100M, 1B etc. EIDs?
Robin Whittle <rw@firstpr.com.au> Tue, 10 March 2009 02:42 UTC
Message-ID: <49B5D32F.2040804@firstpr.com.au>
Date: Tue, 10 Mar 2009 13:40:47 +1100
From: Robin Whittle <rw@firstpr.com.au>
Organization: First Principles
User-Agent: Thunderbird 2.0.0.19 (Windows/20081209)
MIME-Version: 1.0
To: RRG <rrg@irtf.org>
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
Subject: [rrg] LISP-ALT mobility and scaling to 10M, 100M, 1B etc. EIDs?

Hi David,

I am replying to part of what you wrote in:

  LISP Map Server I-D & updated draft-farinacci-lisp- 2 stages of caching mapping

I have not yet succeeded in prompting you or any other LISP-ALT folks
to describe how the ALT network will scale to handle however many
EIDs, Map-Servers, Map-Resolvers etc. there would be in a full-scale
deployment.

Here is another, more explicit, attempt. I could go into more detail
still, but the following should suffice.

By describing more fully my best imagined understanding of how ALT
could scale and then pointing out the problems I see in this, I hope
you and your colleagues will point out why my critique doesn't apply -
most likely by providing a fully detailed description of how the whole
ALT network would really work, and work well.

You wrote:

> Robin,
>
>>    Map-Servers needing to make GRE tunnels to a large
>>    number (hundreds, thousands) of level 1 ALT
>>    aggregation routers - and likewise those routers
>>    needing to make GRE tunnels to large numbers of
>>    Map-Servers.
>
> Actually, those numbers (100s, 1000s) are speculation.

In the absence of more detailed plans from the LISP team, all I can do
is speculate.

> There are many ways folks might deploy this technology
> (e.g, with hierarchy of some kind, as one example). So
> lets see what how people actually deploy stuff before
> speculating on what they might do (and I would strike the
> not-so-parenthetical comment from future discussions).

The scaling problems will only occur when you get to a million EIDs,
10 million, a billion or whatever. I understand you are planning on
developing some experimental LISP-ALT, Map-Server (and -Resolver) RFCs
by mid-2010:

  http://www.ietf.org/mail-archive/web/lisp/current/msg00251.html

The only way you will get the scaling problems is with a real global
deployment, so these problems won't arise in any trial deployment
arising from experimental I-Ds and RFCs.

I don't find your:

  "So lets see what how people actually deploy stuff before
   speculating ..."

at all convincing for a proposal you presumably think has a chance of
being the best possible solution to the routing and addressing scaling
problem.

The ALT I-D:

  http://tools.ietf.org/html/draft-fuller-lisp-alt-00

of 2007-10 was published 3 months after the first APT and Ivip I-Ds.
Now you are preparing to race ahead with ALT in an IETF WG. That's
fine, since it is for experimental purposes. But right from the start,
before you published the I-D, surely you would have been aware of the
scaling problems of the ALT network, which I attempt to describe later
in this message.

The sole justification for ALT or any other global query server
approach (CONS and TRRP) over a scheme with the full mapping database
in query servers at each ISP (APT and Ivip, or LISP-NERD with the full
database in each ITR) is that ALT can scale to arbitrarily large
numbers of EIDs.

Indeed, ALT, CONS and TRRP can scale arbitrarily in this regard. But
there's no problem with scaling APT's or Ivip's local query servers to
10 million or so EIDs, end-user networks etc. Even with the complex
multi-RLOC mapping format of APT (which resembles that of LISP) and
even with long IPv6 EIDs and RLOCs, the complete mapping data could
easily fit in a few gigabytes of RAM in today's COTS servers.

For an ISP, the rate (average and peak) of mapping updates is not a
problem either. Nor is the cost of the update stream's bandwidth. The
entire mapping database would be no bigger than an HD movie download.
So if there was 10% churn per day, the cost to the ISP of getting the
updates per day would be their wholesale cost of 0.5 to 1.0 Gbytes of
incoming traffic, which is trivial.

If the frequent criticism of other proposals such as APT and Ivip from
the LISP-ALT camp is to have any credibility, you need to show that
LISP-ALT will be scalable to some much higher number of EIDs, end-user
networks etc.
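
As a rough check of the sizing argument above, here is a minimal
Python sketch. The 500 and 1000 bytes-per-mapping figures are my own
assumptions, chosen only so the totals land in the "few gigabytes" and
"0.5 to 1.0 Gbytes" ranges mentioned above; they are not taken from
the APT, Ivip or LISP specifications. The function names are
hypothetical.

```python
GB = 10**9  # decimal gigabyte, for rough sizing only


def mapping_db_bytes(eids: int, bytes_per_mapping: int) -> int:
    """Total size of a full mapping database held at one ISP."""
    return eids * bytes_per_mapping


def daily_update_bytes(db_bytes: int, churn_percent_per_day: int) -> int:
    """Bytes of mapping updates received per day at the given churn."""
    return db_bytes * churn_percent_per_day // 100


# Assumed figures: 10M EIDs, 500 or 1000 bytes per mapping, 10% churn/day.
small_db = mapping_db_bytes(10_000_000, 500)
large_db = mapping_db_bytes(10_000_000, 1000)
small_updates = daily_update_bytes(small_db, 10)
large_updates = daily_update_bytes(large_db, 10)

print(small_db / GB, large_db / GB)            # database size in GB
print(small_updates / GB, large_updates / GB)  # daily updates in GB
```

Under these assumptions the database is 5 to 10 GB and the daily
update stream is 0.5 to 1.0 GB. Different per-mapping sizes scale the
result linearly.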

Before you do this, however, I think you need to demonstrate a
realistic demand scenario for larger numbers, such as 100M, 1B, 5B or
10B.

I propose that we consider "~~10M" (meaning 5, 10, 20 maybe 30
million) as the absolute maximum number of fixed (non-mobile) end-user
networks which will ever conceivably want or need multihoming or
portability.

The LISP-ALT plans seem to assume this conventional kind of end-user
network - a fixed network, with two or more links to two or more ISPs.
That's fine, but it seems you are not planning for whatever other
classes of end-user network would make up the balance of 100M, 1B, 5B
etc.

I propose we agree to something like this:

  1 - If and when a core-edge separation approach to the routing and
      addressing scaling problem ever has significantly more EIDs
      than ~~10M, those in excess of this figure will be for mobile
      end-user networks, which physically consist of a portable
      device, with some mixture of wired and wireless Ethernet and
      other forms of wireless connectivity.

  2 - Such mobile end-user networks are likely to have only one EID
      prefix (micronet).

  3 - Therefore, only those core-edge separation schemes which
      provide some Mobility benefits by providing Mobile Nodes (MNs)
      with their own EID (micronet) will need to scale beyond ~~10M
      EIDs etc.

Does that sound reasonable? I will proceed on the assumption it is.

I think the LISP-ALT team needs to show how LISP-ALT could provide
mobility benefits to 1B or whatever MNs before you criticise other
schemes for being unable to scale to the high number of EIDs expected
- implicitly 100M, 1B, 5B, 10B etc.

How would LISP-ALT support mobility? You clearly can't have the ETRs
in the end-user network because its Care of Address (CoA) is
constantly changing. This is at odds with the recently stated LISP-ALT
team position that the ETRs are definitely in the end-user network.

Nor can you support mobility with the ETRs in the "ISP" - the access
network.
This is because a MN is frequently changing access networks and may
have no lasting business relationship at all with its access networks.
(For instance WiFi hot-spots, or plugging a laptop into the network at
a friend's home, which gives the laptop an address behind NAT via DSL
or whatever.)

I figured this out in June 2007 and proposed the TTR (Translating
Tunnel Router) architecture:

  http://www.ietf.org/mail-archive/web/ram/current/msg01518.html

Steve Russert and I wrote this up fully in August last year:

  http://www.firstpr.com.au/ip/ivip/TTR-Mobility.pdf

We put the ETR near the MN, but not in the access network. We call
this new kind of ETR a TTR. The MN owner pays a company which has a
network of TTRs. As far as the core-edge separation scheme is
concerned, they are identical to any other ETR - in terms of being
able to tunnel packets to the end-user network.

The MN makes a 2 way encrypted tunnel to a nearby (eg. 1000km or less)
TTR and uses it as an ITR too. Then the mapping of the MN's one or
more EIDs (micronets) is changed so all ITRs tunnel packets to this
TTR.

Most people seem to assume:

  (map-encap + mobility) = mapping change for every
                           change of access network

but that is on the assumption that the ETR must be in the ISP (access
network) or in the end-user network (MN).

TTR mobility does not involve rapid changes to mapping. Most people
don't move more than 1000km (or whatever) very often. There's no
absolute need for a mapping change when they do - but by using a TTR
which is closer to wherever they move to, overall path lengths are
reduced.

LISP-ALT can work with the TTR Mobility architecture. However, you
will need to make something other than the ETRs the authoritative
source of mapping. In practice, the MN owner whose EID is being mapped
will enable the TTR company to control their mapping. This should be
fine - the TTR company will probably run its own Map-Servers and so be
a part of the ALT network.
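
To illustrate why TTR mobility decouples mapping changes from changes
of access network, here is a minimal Python sketch. It is only one
possible policy, under my own simplifying assumptions: positions are
points on a line measured in km, the MN keeps its current TTR until it
is more than 1000km away, and it then picks the nearest TTR site. The
function name and the example positions are hypothetical and are not
part of the TTR Mobility paper.

```python
MAX_TTR_DISTANCE_KM = 1000  # the "1000km or less" figure used above


def count_mapping_changes(attach_points_km, ttr_sites_km):
    """Count mapping changes as a MN attaches at successive locations.

    Every attachment gives the MN a new care-of address, but the
    mapping only changes when the MN selects a different TTR.
    """
    current_ttr = None
    mapping_changes = 0
    for position in attach_points_km:
        too_far = (
            current_ttr is None
            or abs(position - current_ttr) > MAX_TTR_DISTANCE_KM
        )
        if too_far:
            # Assumed policy: move to the nearest TTR site.
            current_ttr = min(ttr_sites_km, key=lambda s: abs(s - position))
            mapping_changes += 1
    return mapping_changes


# Seven attachments (seven care-of addresses), three TTR sites.
attachments = [0, 5, 20, 300, 900, 2500, 2510]
ttr_sites = [0, 2000, 4000]
changes = count_mapping_changes(attachments, ttr_sites)
print(len(attachments), "attachments,", changes, "mapping changes")
```

In this example the MN attaches seven times but its mapping changes
only twice: once when it first selects a TTR and once after the long
move.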

I think Ivip would support mobility better than APT, because Ivip's
essentially real-time control of ITR behavior enables a rapid
selection of a new TTR compared to the delays inherent in the slow
push mapping distribution system of APT.

I think LISP-ALT could work quite well with the TTR mobility
architecture. Mapping changes are not frequent, but it would be best
if they were propagated quickly. The current ETR (TTR) could prompt
currently active ITRs to update their mapping after the mapping has
changed to the new TTR. So LISP-ALT could have a fast response to
mapping changes like Ivip.

I think you have not yet established a realistic demand scenario for
LISP-ALT handling more than ~~10M EIDs. Until you do so (and I think
you can, with TTR mobility) I don't think you should criticise other
schemes for not being able to scale to 100M or more EIDs.

Before making such criticisms, I think you should also describe in
detail how the ALT network is going to scale to 100M, 1B or whatever
EIDs. Presumably, for most of these EIDs, the end-user network will
only have one or at most a few EIDs, so you are also discussing how
the system would scale to about this number of separate end-user
networks.

As far as I can see, you need to discuss this in terms of LISP-ALT
supporting at most ~~10M fixed networks (the type you seem to assume
in ALT development at present) and the balance of the 100M, 1B etc.
being mobile end-user networks.

Ignoring for a moment where the ETRs are and what devices are the
authoritative sources of mapping, here are the scaling challenges I
think the LISP-ALT team needs to address.


1 - The challenge of ensuring robustness in a highly aggregated
    network.

The ALT network is defined as being highly aggregated. This is the
only way to ensure (on average) a minimal number of ALT routers in the
path from one ITR (or Map-Resolver), up the ALT hierarchy to whatever
level it is fully meshed, and then down the hierarchy to the ETR
(perhaps via a Map-Server).
(Not all Map-Requests will need to ascend to the fully meshed level,
but enough of them will for this to be a major limiting factor in
overall performance of the ALT network.)

High aggregation is easy to achieve:

  0 - Make any one ALT router the sole one which handles some prefix.

  1 - Make it advertise that prefix to its single upstream router as
      a single entity (so the constituent longer prefixes don't
      propagate above to the next level).

  2 - Make sure all the lower level routers in the hierarchy which
      handle the longer prefixes that prefix is split into are its
      neighbours. This is easy to do with GRE tunnels, so the ALT
      routers don't need to be physically close or connected.

The hierarchy is then a simple upside-down tree - and at some level
near the theoretical top, all the routers at that level are fully
meshed - directly connected to each other.

The question now is how do you achieve this maximal level of
aggregation and make it robust against the failure of any one router
or link?

With this simple model:

  Level

  N + 1                     X  a.b.0.0/16
                            |
          ------------------------------------------
          |                 |                      |
  N       A a.b.0.0/20      B a.b.16.0/20   ...    P a.b.240.0/20
          |
          -----------------------------
          |                 |
  N - 1   Q a.b.1.0/24      R a.b.2.0/24   ...

the router at each level aggregates a bunch of prefixes - 16 in the
above example - and presents them as one larger block of space
(shorter prefix) to the level above.

This network has optimal aggregation - but how can it tolerate
failures?

Assuming the GRE tunnels are statically defined ...

  (If they are dynamically established, you need a really secure
   system for orchestrating this.)

... the system can't reconfigure itself if the B router fails, to
somehow put in place a replacement for B, in a totally different
location. (A different location is needed, since the outage may be a
persistent failure, lasting more than a few seconds, which affects
wherever B was located.)
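
The single-parent tree in the diagram, and what happens when B fails,
can be sketched in Python with the standard ipaddress module. This is
only an illustration under my own assumptions: 10.1.0.0/16 is a
hypothetical stand-in for a.b.0.0/16, the router names A to P follow
the diagram, and the lookup function is not any real ALT or BGP
behavior.

```python
import ipaddress

# Hypothetical stand-in for the a.b.0.0/16 block aggregated by router X.
block_x = ipaddress.ip_network("10.1.0.0/16")

# Routers A..P each solely handle one of the sixteen /20s below X.
children = dict(zip("ABCDEFGHIJKLMNOP", block_x.subnets(new_prefix=20)))


def next_hop_below_x(dest, failed=frozenset()):
    """Return the level N router X would forward to, or None.

    With strict aggregation there is exactly one child per prefix, so
    a failed child leaves its whole /20 unreachable even though X
    still advertises the covering /16 upward.
    """
    addr = ipaddress.ip_address(dest)
    for name, prefix in children.items():
        if addr in prefix:
            return None if name in failed else name
    return None


print(children["B"], children["P"])
print(next_hop_below_x("10.1.17.5"))                 # handled by B
print(next_hop_below_x("10.1.17.5", failed={"B"}))   # no alternative path
print(next_hop_below_x("10.1.1.1", failed={"B"}))    # A is unaffected
```

With B marked as failed, requests for anything in B's /20 have nowhere
to go, while requests for the other fifteen /20s are unaffected.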

Do you propose that each "N - 1" router below A to P have additional,
already established, GRE tunnels to router X? That doesn't sound
scalable.

What if some router Q below B, on level "N - 1", can no longer reach
B? Do you suggest every router such as Q and R also have GRE tunnels
to some other level N routers? Maybe this is possible, with the non-B
level N routers somehow not advertising the longer prefix from Q and R
etc. unless Q and R cannot reach B?

I am trying to make up answers to these questions on your behalf. It
would be better if the LISP-ALT team explained how this would work in
a fully deployed system serving 100M, 1B etc. EIDs.

For some number of EIDs, such as 100M or 1B, this would involve giving
realistic details about:

  a - How many GRE tunnels and so BGP neighbours each ALT router
      might have.

  b - The aggregation factor per level.

  c - Whether Map-Resolvers and Map-Servers are single-homed to a
      single ALT router, or whether they are multihomed and so have
      to take part in the full BGP of the ALT network.

  d - How many levels there would be.

  e - How many ALT routers there would be in the highest level - the
      one where all at that level connect to each other in a fully
      meshed manner.

  f - How many separate ETRs and ITRs the Map-Resolvers and
      Map-Servers might handle.

  g - Estimates of peak and average query volumes in a fully deployed
      (100M, 1B, 5B etc.) network.

  h - Therefore, estimates of traffic volumes at the top level, where
      quite a lot of the packets will need to ascend to in order to
      descend towards their destination.

  i - Likely business scenarios for who would own and run all these
      ALT routers. This includes how they would be paid for the
      traffic flowing on them, and where geographically, these
      routers would be located.

  j - Likely scenarios for physical locations of Map-Servers
      regarding the EIDs of the ETRs in their networks and the
      physical location of the lowest level ALT router for the
      encompassing prefix of that EID - which I guess the Map-Server
      will need to make a GRE tunnel to.


2 - The long path problem:

Based on the above, you should also be able to estimate average and
worst-case travel times, according to the number of ALT levels and
routers the request needs to traverse before it gets to the ETR.

With the geographical information and some estimate of the underlying
structure of the DFZ between the ALT routers, you should be able to
estimate actual travel times and the cumulative sensitivity of packet
loss to the number of DFZ routers in the entire path.

More on this problem at:

  LISP-ALT's long path problem yet again 2008-12-24
  http://www.ietf.org/mail-archive/web/rrg/current/msg04097.html

Many people are concerned about some initial packets being delayed
until the ITR gets a mapping reply from the global ALT network.

I understand a typical scenario would be to drop the initial packet if
there was no mapping, and wait for the sending host to send another
packet, by which time hopefully the mapping would have arrived. In
that case, the delay would be longer than the mapping reply delay.
Also, when the sending host tries again, it may not try the same
destination host - it might try some other one, in which case the
whole dropped packet and delay process could repeat itself as often as
the sending host tries to send packets to different destination hosts.

Also, this behavior of the ITR dropping a first packet and waiting for
a retry might encourage application programmers to fire off a series
of packets, to minimise the time after the mapping arrives before one
of them is tunneled to the correct ETR.

Alternatively, you could have ITRs buffer all packets and send each
one whenever the mapping arrives.
(Sometimes, due to dropped packets in the ALT network, a reply will
never arrive to a single request.) But then, the ITR may be sending a
packet well after the sending host sent it, which can lead to
duplicate packets, extra reply traffic from the destination host, and
other forms of waste and confusion.

This long-path problem doesn't just affect a single packet. It can
affect the first packets in each direction between two hosts on EID
addresses.

Also, since end-user networks will often run their own DNS servers, it
could delay DNS lookups too. There can be a delay in getting the DNS
request to the DNS server when the DNS server is on an EID the ITR has
no mapping for. If the request comes from a host on an EID address,
the DNS server's ITR will probably also need to do a mapping lookup.

So you could have the delay problem four or more times in a row. This
is assuming that the .com.au server is not on an EID address, which is
reasonable. Hosts A and B are on EID addresses.

  A -------------------> .com.au DNS server
  A <------------- DELAY .com.au DNS server

  A DELAY -------------> xxx.com.au DNS server
  A <------------- DELAY xxx.com.au DNS server

  A DELAY -------------> www.xxx.com.au
  A <------------- DELAY www.xxx.com.au

Sorry to break your reply up with such a long suggestion.

> BTW, the RRG (or IETF, or ITU-T, or ATIS, ...) can
> specify any technology they want.

Sure, but before anyone can be confident of LISP-ALT being a viable
solution to the routing scaling problem, or even the best solution,
they would want the designers to describe at least one way these
fundamental challenges can be decisively resolved.

> However, SPs if deploy
> it, they will deploy it in the way that makes most
> business sense for their particular predicament, and that
> may or may not bear any relationship to what is said in
> SDO deliberations. And unfortunately, we have precious
> little input from those folks our (SDO) fora.

Yes, but before any Standards Development Organisations, the IETF
included, will be motivated to take LISP-ALT seriously as a practical
solution to the routing scaling problem, you will need to show
convincing plans for every aspect of its operation when fully
deployed.

Since you criticise other approaches for not being able to scale, you
must have some figure like 100M, 1B, 5B or whatever EIDs in mind. So I
think you need to show technically how LISP-ALT could scale like this,
and remain robust.

Then I think you need to describe at least one set of business
arrangements where everyone involved is sufficiently motivated to play
their part - in a way in which the great majority of existing and new
end-user networks will voluntarily adopt LISP-ALT-mapped address
space.

>>    Two Map-Servers for an end-user network's two ETRs
>>    will generally link directly (via a GRE tunnel) to
>>    the same level 1 ALT aggregation router. There,
>>    the router will (I guess) send packets for this
>>    network's EID prefix only to one of these, since they
>>    both have the same AS hop count of 1. Which will
>>    depend on the AS number of the ISP.
>
> I don't know what level 1 ALT means. The ALT doesn't
> carry level identifiers (we thought about this in CONS
> but discarded it for various reasons). And again, just
> like everything else, you build scalability with
> hierarchy. Like every hierarchy, there will be a "top
> level", but not everyone will or needs to connect to such
> a top level.

I meant "level 1" as the lowest in the inverted tree structure I
depicted above.

I will respond to the rest of your reply in the original thread.

  - Robin