Re: [rrg] LMS alternative critique
Charrie Sun <charriesun@gmail.com> Sat, 06 March 2010 08:31 UTC
In-Reply-To: <4B85E519.3080609@firstpr.com.au>
References: <4B7F9E39.2030800@firstpr.com.au> <4eb512451002240117y4fe3a056r6376981034c9ca5@mail.gmail.com> <4B85E519.3080609@firstpr.com.au>
Date: Sat, 06 Mar 2010 16:31:01 +0800
Message-ID: <4eb512451003060031g3e83e81cxf53afac06f23656d@mail.gmail.com>
From: Charrie Sun <charriesun@gmail.com>
To: Robin Whittle <rw@firstpr.com.au>
Cc: RRG <rrg@irtf.org>
Subject: Re: [rrg] LMS alternative critique
Hi Robin:

Sorry to reply so late; here is my response inline.

2010/2/25 Robin Whittle <rw@firstpr.com.au>

> Hi Letong,
>
> Thanks for your reply, in which you wrote:
>
> > Hi Robin:
> >
> > Thank you for your critique. My response is inline.
>
> > Layered Mapping System
> > ----------------------
> >
> > LMS is a step towards designing a complete Core-Edge Separation routing scaling solution for both IPv4 and IPv6, somewhat along the lines of LISP-ALT.
> >
> > There are insufficient details in the proposal to evaluate how well the basic infrastructure of ITRs and ETRs would work, considering the unknown nature of mapping delays,
> >
> > We did a simulation of LMS using real data collected from a campus border router to show that, when ITRs are equipped with the two-stage cache mechanism (as stated in our proposal), the request hop counts are considerably small (94% no hop, i.e. cache hits; 5.5% two hops; 0.5% four hops). These hops are logical, as mapping servers talk using tunnels, while the delay between two random nodes may not be unacceptable (some have estimated it at 115 ms [1]). The redundancy of logical mapping servers may help to reduce the delays between two mapping servers, since a mapping server may choose a nearby server to communicate with.
> >
> > [1]: S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker, "A scalable content-addressable network," in Proc. of ACM SIGCOMM '01, San Diego, CA, USA, Aug. 2001, pp. 161-172.
>
> I am not sure how this paper:
>
>   http://www.cs.cornell.edu/People/francis/p13-ratnasamy.pdf
>
> relates to the LMS proposal or to an ALT-like structure. Nor have you mentioned how your system would incorporate multiple physical MSes (Map Servers) presumably reachable by your system in ways which are topologically different both in terms of LMS routers and their tunnels and the actual paths these tunnels take on DFZ and other routers and their physical data links.

Firstly, I think adding redundant physical nodes for a mapping server is an engineering issue; the DNS system has demonstrated the success of using redundant nodes. Secondly, I may have made a wrong statement earlier: LMS only handles mapping requests and does not forward traffic packets for ITRs. Thus mapping servers need not maintain tunnels between them, but only need to know the locations of their logical neighbours. This greatly relieves their burden and avoids many of the problems you mention later about the servers' capacity.

> Because your structure is highly compressed, with few levels, I am not surprised that you can get small numbers of hops. ITR caching will of course mean that for a sequence of two to many thousands of packets sent from one host to another, there would be a single mapping lookup or perhaps no lookup due to the ITR already having the mapping it needs in its cache.
>
> When I wrote:
>
> > There are insufficient details in the proposal to evaluate how well the basic infrastructure of ITRs and ETRs would work, considering the unknown nature of mapping delays,
>
> I meant to infer that the "mapping delays" are unknown, in part because I don't understand how the system would work at all - how routers could maintain millions of BGP sessions and tunnels, how redundant routers and MSes could be accommodated etc.

Mapping servers may only need to maintain BGP sessions with: (1) the redundant servers of their logical neighbours; (2) the redundant servers of their own served logical node. The latter may form a full mesh, and I do not think that is infeasible.

> > reliability of the global mapping system,
> >
> > The reliability of the global mapping system lies in its redundancy and efficient communications between mapping servers. The test needs a real environment and actual implementation; however, I do not see a logic fault in the layered mapping system. Perhaps we can draw lessons from DNS, which runs effectively.
>
> I think you are just providing vague references to DNS, when you need to show exactly how your newly proposed structure would work, be robust against single points of failure, be possible to implement with realistic amounts of RAM, CPU power etc.

I will study it later.

> > the problems of Path MTU Discovery (due to tunneling via encapsulation)
> >
> > The MTU issue is inherent in the map-and-encap scheme. However, address rewriting needs intermediate routers to change the packet headers, which is insecure and brings many issues such as checksum conformity; comprising both core and edge addresses in one form only handles the problem by shrinking the size of the namespace. You have proposed to estimate the MTU along the tunnels, yet I do not know whether that is effective.
>
> If you want to have a complete proposal for scalable routing, I think you need to show exactly how you would solve any problems such as PMTUD for the tunnels you are using. I think you need to specify exactly what tunneling protocols you would use, and how they handle PMTUD for IPv4 and IPv6, including when these tunnels themselves are going through other tunnels you can't control or foresee.
>
> > and difficulties with an ITR reliably probing reachability of the destination network via multiple ETRs.
> >
> > The ITR receives multiple mappings in the response, each with a different priority. The ITR can select the most preferred ETR to forward packets to, while using the others as backups.
>
> Yes - this is the basic LISP approach. But how does the ITR decide which ETRs can currently send packets to the destination network? The LISP people have been grappling with this since LISP's inception three or so years ago. The ITR doesn't have a way of probing through the ETR to a host or router in the destination network, since it doesn't have any information on what IP address to test, or how to probe reachability to them.

An ITR only needs to know whether it can reach the ETR it is concerned with (and this is traceable). Whether the ETR connects to the destination edge network is the job of the mapping servers, which authenticate the mapping information. Mapping servers should also check frequently whether the link between an ETR and the destination edge network is currently usable. ITRs could get this information from the mapping servers, or, if an ITR has cached out-of-date information, a timeout would remind it of the possible failure of the ETR, and it could turn to another ETR.

> With Ivip, the ITRs make no such decisions. The mapping is a single ETR address. The end-user whose network this is either controls the mapping themselves, or appoints a multihoming monitoring company to probe reachability however the end-user likes, and then controls the mapping to restore service in the event of a failure of an ETR, its ISP or the link from the ETR to the destination network.

I think publicly run mapping servers do exactly what the "multihoming monitoring company" in your proposal does. The only difference is that the former reply to ITRs with multiple mapped ETRs, while the latter chooses one and replies to the ITR. The selection of an ETR at the ITR obeys local policy, and it also gives the ITR more choices.

> > I do not know whether the following relates to your question; I just write it down to improve the clarity: A mapping server is the authoritative source of the mapping information for the edge addresses it is in charge of. It should (and could) know the connectivity of ETRs and its charged edge addresses in time. When an ITR caches the locator information of mapping servers which it thinks may be useful (as suggested in the two-stage cache mechanism), the ITR can periodically request the mapping information it is interested in from those mapping servers, to get the current reachability information.
>
> You could do this. In this scenario there are no options in the mapping - the mapping would contain a single ETR address, and the mapping server somehow tests reachability and changes the mapping accordingly.

Why can the mapping servers not then notify the ITR of the multiple workable ETRs, letting the ITR decide which ETR to send packets to?

> But then, in order to achieve, for instance, 30 second response times to outages, every ITR in the world would need to request mapping every 30 seconds (as long as it is still handling packets addressed to the destination network).

In our cache mechanism a cache entry's expiration time is refreshed whenever a packet for that destination arrives before the entry times out. Today's Internet traffic is characterized by short-term bursts, and the distribution of packet inter-arrival times (to the same destination) is heavy-tailed, meaning that while most of the intervals are very short (in our trace, 80% of the intervals are less than 10 seconds), the remaining intervals may be very large (on the order of 10^3 seconds). Thus the case of packets arriving constantly at 30-second intervals would be very rare.

> This will place a huge load on your system, especially since I think you are sending traffic packets through your network of tunnels and routers, as data-driven mapping requests, and so the packets get delivered via the network.

No, in my scheme the mapping system only handles requests. Now I think ITRs dropping the first packet is reasonable, since in our experiment, using the two-stage cache mechanism, the number of dropped packets is moderately low (dozens of packets per second on our campus border router).

> Even if you just sent short mapping requests, rather than data packets, you would still have an enormous load on the mapping network and the map servers.

Our experiments show that the request load on the mapping system would be relatively low. With the two-stage cache mechanism the load on the upper-layer map servers is especially low, dropping to nearly zero.

> LISP doesn't do this. It returns mapping information with longer caching times than the desired times for multihoming service restoration. The mapping consists of multiple ETR addresses with priorities. Then the difficulty is how the ITRs decide which of the ETR addresses to use. As noted above, Ivip is completely different - and separates out the reachability testing and mapping decision-making from the CES architecture itself, making it the responsibility of the end-user network.
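To make the two-stage cache mechanism and the priority-based ETR selection discussed above concrete, here is a minimal sketch in Python. All class, callback and variable names are invented for illustration and are not part of the LMS proposal; the 300-second timeout and 30,000-entry limit are the figures given later in this thread, and eviction is deliberately simplified.

    import time

    # Hypothetical sketch only: names and callbacks are invented, not taken
    # from the LMS proposal. When the cache is full, new entries are simply
    # not added (a real ITR would evict).

    MAPPING_TTL = 300        # seconds (the 5-minute cache timeout quoted below)
    MAX_ENTRIES = 30000      # cache size limit quoted below

    class TwoStageItrCache:
        def __init__(self, query_leaf_server, query_layered_system):
            # Stage 1: eid_prefix -> (expiry_time, [(priority, etr_address), ...])
            self.mapping_cache = {}
            # Stage 2: eid_prefix -> locator of the responsible leaf mapping server
            self.server_locator_cache = {}
            self.query_leaf_server = query_leaf_server        # direct query, no extra logical hops
            self.query_layered_system = query_layered_system  # query routed via the LMS tree

        def lookup(self, eid_prefix):
            now = time.time()
            entry = self.mapping_cache.get(eid_prefix)
            if entry is not None and entry[0] > now:
                # Cache hit: refresh the expiry time, as argued above, so that
                # bursty traffic to the same destination causes no re-requests.
                mappings = entry[1]
                self.mapping_cache[eid_prefix] = (now + MAPPING_TTL, mappings)
                return mappings
            # Stage-1 miss: if we know the responsible leaf server, ask it directly.
            server = self.server_locator_cache.get(eid_prefix)
            if server is not None:
                mappings = self.query_leaf_server(server, eid_prefix)
            else:
                # Otherwise fall back to the layered system; the reply also tells
                # us which leaf server is authoritative for this prefix.
                mappings, server = self.query_layered_system(eid_prefix)
                self.server_locator_cache[eid_prefix] = server
            if len(self.mapping_cache) < MAX_ENTRIES:
                self.mapping_cache[eid_prefix] = (now + MAPPING_TTL, mappings)
            return mappings

    def choose_etr(mappings, is_reachable):
        # Prefer the reachable ETR with the best (lowest) priority value;
        # the remaining ETRs act as backups if that one later times out.
        for _priority, etr in sorted(mappings):
            if is_reachable(etr):
                return etr
        return None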
> > Most of the proposal concerns a new global, distributed, mapping query system based on the ALT concepts of each node in the tree-like structure being a router, using tunnels to other such nodes, and all such nodes using BGP to develop best paths to neighbours.
> >
> > By a series of mathematical transformations the designers arrive at certain design choices for IPv4 and IPv6. For IPv4, if the entire address space was to be mapped (and at present there is no need to do so beyond 224.0.0.0) there would be 256 Layer 2 nodes. Therefore, each such node would be the authoritative query server for mapping for however much "edge" space was used in an entire /8 of 16 million IPv4 addresses. This is a wildly unrealistic demand of any single physical server, considering that there are likely to be millions of separate "edge" "EID" (LISP) prefixes in each such /8.
> >
> > We make a constraint study of the processing capability of mapping servers. Assume `Pa' is the percentage of mappings that are requested per second and `N' is the total number of edge blocks; thus `Pa*N' is the total number of requests sent into the mapping system per second. A mapping server can forward `R' requests per second. In an L-layered system, a request may traverse `L+1' MNs in the worst case. Assuming requests are distributed evenly among `M' leaf mapping servers, we have the constraint:
> >
> >     R * M > Pa * N * (L + 1).    (1)
> >
> > `Pa' is estimated to be less than 0.001 [2], we set `N' here to be 2^32 in the IPv4 case, `L' is 2, so the right part of (1) is O(10^6). Note that a single router can forward 10^8 packets per second [2],
>
> That is a big router if it is handling potentially long traffic packets. Also, average traffic levels probably need to be 20% or less of the peak rate the system can handle, so as not to drop too many packets when the traffic level briefly fluctuates to 5 or 10 times higher than average.

The left part of the equation is O(10^8) or more, and the right part is O(10^6). I think it can handle peak traffic which may be 10 times higher than average.

> > thus we see no problem with the leaf mapping servers handling requests from all ITRs. Our own simulations validate this (LMS: Section 3.2.3). Moreover, the mechanism of redundancy for mapping servers and the two-stage cache mechanism can further relieve the burden on each mapping server. The caching of locators of leaf mapping servers can especially relieve the load on the root mapping servers.
> >
> > [2]: H. Luo, Y. Qin, H. Zhang. A DHT-based Identifier-to-locator Mapping Approach for a Scalable Internet. IEEE Transactions on Parallel and Distributed Systems, vol. 20, no. 10, 2009.
>
> As far as I know, you have not provided details of how you would achieve this redundancy. It's not good enough, I think, to point to a paper and expect readers to understand how that paper, written for another purpose, could be applied to your proposal. I think you need to explain the principles, give detailed examples, and then show how the whole thing could scale and still be robust to some specified level of mass adoption which reflects the greatest level of queries, traffic or whatever you think the system would ever have to cope with.

I think the provision of redundancy depends heavily on the actual implementation. As a student, I do not know the engineering issues in depth. But I believe the architecture design should focus on getting its basic ideas correct, rather than getting into engineering issues at the design stage. The mention of DNS merely claims that redundancy _can_ be provided. I will learn more about the redundancy mechanisms if necessary.

> > A single address in a single prefix of such "edge" space could be extremely intensively used, with tens or hundreds of thousands of hosts connecting to it in any 1 hour period - and in many instances, each such connection resulting in a mapping query.
> >
> > If such an arrangement were feasible, there's no obvious reason why BGP and tunnels would be used at all, since each ITR could easily be configured with, or automatically discover, the IP addresses of all these nodes.
> >
> > In the IPv4 case caching the locator information of all mapping servers in an ITR _is_ reasonable; that's why we think FIRMS [3] could work well in the IPv4 case.
>
> OK - so why do the 224, 256 or whatever small number of mapping servers and their routers need to be connected with BGP?
>
> I don't know how a single mapping server could handle all the requests for a busy /8. If the entire /8 was "edge" space, there are 16 million IPv4 addresses. Many end-user networks will only need one or a few IPv4 addresses, so there could easily be 10 million separately mapped "EID prefixes" (to use LISP terminology). Some of these might have many mapping requests per second. I don't see how your mathematically derived decision to have a single map server handle so many EID prefixes is scalable.

I think the processing capability constraint study has answered this question: R * M > Pa * N * (L + 1) (1), where `Pa' is estimated to be less than 0.001 [2], `N' is 2^32 in the IPv4 case and `L' is 2, so the right part of (1) is O(10^6), while a single router can forward 10^8 packets per second [2].

> > However, if IPv6 is used for edge addresses, there are many more mapping servers. Storing all their locators would become infeasible.
> >
> > [3]: M. Menth, M. Hartmann, M. Hofling. FIRMS: a Future InteRnet Mapping System. EuroView2008, 2008.
> > http://www3.informatik.uni-wuerzburg.de/~menth/Publications/papers/Menth09-FIRMS.pdf
>
> As I wrote, I don't think your mathematically derived decision about how to structure the IPv6 system would scale well either. I think you need to explain exactly how these things will work in terms of bits and bytes, RAM and CPU power, delay times in fibre at 200km per millisecond, actual details of network structure, physical and topological locations of all the elements etc.
>
> > A more realistic arrangement might be to have the "edge" space broken into a larger number of sections, such as 2^22 (4 million) divisions of 2^10 IPv4 addresses each. If (and even this is questionable, though it would frequently be true) each such division could be managed by a single node (a single authoritative query server), then it would be perfectly feasible for each ITR to automatically discover the IP addresses of all 2^22 potential query servers. In practice, in some cases, no such query server would be required, since none of those 1024 IPv4 addresses were "edge" space. In other cases, due to low enough actual query rates, a single server could handle queries for multiple sets of 1024 IPv4 ranges of "edge" space.
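As a quick sanity check of the numbers being argued over here, constraint (1) can be evaluated directly. This is back-of-envelope Python only, using the figures quoted in this thread; the per-leaf-server numbers at the end rest on the proposal's own assumption that requests are spread evenly over the M leaf servers, which is the very assumption being questioned.

    # Back-of-envelope evaluation of constraint (1): R * M > Pa * N * (L + 1),
    # using only the numbers quoted in this exchange.

    Pa = 0.001      # upper bound on the fraction of edge blocks requesting mapping per second [2]
    N = 2 ** 32     # total edge blocks in the IPv4 worst case
    L = 2           # layers, so a request crosses at most L + 1 mapping nodes
    R = 10 ** 8     # requests per second a single router is said to forward [2]
    M = 256         # leaf mapping servers, one per /8

    demand = Pa * N * (L + 1)    # ~1.3e7 requests/s system-wide, worst case
    capacity = R * M             # ~2.6e10 requests/s in aggregate

    print(f"right side Pa*N*(L+1) = {demand:.2e} requests/s")
    print(f"left side  R*M        = {capacity:.2e} requests/s")
    print("constraint (1) holds" if capacity > demand else "constraint (1) fails")

    # Illustrative per-leaf view, under the proposal's even-distribution assumption:
    per_leaf_average = demand / M          # ~5e4 requests/s at one leaf server
    per_leaf_peak = 10 * per_leaf_average  # allowing peaks ~10x the average, as discussed
    print(f"one leaf server: average ~{per_leaf_average:.1e}/s, peak ~{per_leaf_peak:.1e}/s, capacity {R:.0e}/s")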
> > A simple ground-up engineering evaluation would thus produce a much more practical solution than the highly contrived top-down mathematical model - which considered only storage requirements for mapping data in each query server, and not the volume of queries.
> >
> > Firstly, we did not provide a thorough analysis of the processing capability of mapping servers; secondly, what we did in the proposal is just a constraint study, to show that the layered structure is scalable in providing an efficient mapping service while keeping the storage and processing load acceptable. The arrangement is a specific example, while the actual number of layers and prefix width may well vary in practice across different parts of the world.
>
> OK - I think your mathematical model didn't include enough detail to anticipate real operational conditions, such as peak numbers of requests or the actual density of EID prefixes, which will be far more numerous than prefixes which are advertised in the DFZ today.
>
> > The IPv6 arrangement of two layers (a third layer is the single top node) seems even more unrealistic. Although the IPv6 address space is likely to remain sparsely used for a long time, due to its inherent vastness, the LMS plan calls for up to 2^24 Layer 2 nodes, each with up to 2^24 Layer 3 nodes underneath. This proposal seems wildly unrealistic as stated - since each such node would need to accommodate up to 2^24 + 1 tunnels and BGP sessions with its neighbours.
> >
> > Firstly, by a calculation similar to that in the IPv4 case, the processing constraint can be met in the IPv6 case; moreover, one important virtue of LMS is that it can be incrementally constructed. A mapping server need not be constructed if none of its charged edge addresses are used. This especially makes sense in the IPv6 case.
>
> I agree - much of the IPv6 space is unused, so this reduces the total size of the system you are proposing.
>
> > As IPv6 addresses become widely used, we do not see it as impractical for a node to be able to accommodate 2^24 tunnels and sessions with neighbours.
>
> I think that if you looked into the software, CPU processing and state requirements of tunnels and BGP communications you would arrive at a figure more like 2^6 or maybe 2^8. I think your proposal is highly theoretical and that your understanding of what is practical is about 100,000 times greater than what is actually practical.
>
> > Furthermore, the top node (Layer 1) has to cope with most query packets, since there is no suggestion that Layer 2 nodes would be fully or even partially meshed.
> >
> > Using the two-stage cache mechanism, our experiment validates that the requests sent to the root node would be sharply reduced (nearly zero from our router simulated as an ITR). This is because ITRs would directly query the responsible bottom-layer nodes if they cache the locator information of those nodes (the hit rate of this cache is high). Moreover, the success of DNS may also support the feasibility of the tree structure.
>
> In IPv4, obviously, an ITR could cache the addresses of all 224 map servers. In IPv6, an ITR could cache some larger number, but you can't rely on your broad-brush mathematical model to tell you at what levels of the address hierarchy you need map servers, because it is so dependent on how the "edge" space is assigned, and the nature of the traffic to the EID prefixes in that space.
>
> I think a "bottom-up" analysis, starting with some explicit but realistic assumptions of address usage, traffic patterns etc. would provide meaningful constraints on how high in the inverted tree you can place map servers before any one map server would be overloaded. Then you could think about how you would implement multiple map servers with the same responsibilities for redundancy and/or load sharing. Only then could you start to design the overall structure by which queries are sent to these map servers. This would be realistic and meaningful, if the assumptions were realistic.
>
> The highly theoretical, inadequately detailed, top-down approach you have taken produces results which are at odds with the actual capabilities of servers and routers, and not necessarily grounded in foreseeable traffic patterns in a new system of millions of EIDs, which is different from the current situation which produced the traffic patterns used in your modelling. Starting with a few numbers and throwing them into a highly contrived formula, which is what you have done, does not produce results which properly respect the real engineering constraints on a practical solution.

I recognize that what I could do is just a constraint study, to find out what the layered mapping system would look like. The realization of the layer structure depends on real circumstances, and the number of layers and the prefix width of each node may differ from network to network. I have revised my presentation to give a constraint study on both the storage and the processing capability constraints, and find that three layers are enough for IPv6 edge addresses. But I do not agree that the data are at odds with the actual capabilities of routers: the O(10^8) per-second processing capability of a router is an industry figure, and the assumption of a few GB of memory for routers is reasonable. The case of peak traffic is worth studying, though we do not yet have a clear answer.

> > Even with some unspecified caching times, the prodigious query rates considered in section 3.2.2 cannot be suitably handled by either of the IPv4 or IPv6 structures in the proposal.
> >
> > The cache timeout is 5 minutes and the cache size limit is 30,000 entries. I am sorry for omitting this information. As stated, we extrapolate the query rate from one ITR to the whole world (according to the proportion of our campus address space to the whole IPv4 space), and we see that the processing capability constraint can be satisfied.
>
> Maybe so, but what of the future when there are far more EID prefixes than there are DFZ prefixes today? With the plan you mentioned above, your 5 minute caching time means multihoming service restoration times could be as long as 5 minutes plus whatever time it takes the mapping servers to figure out there is a problem and change the mapping accordingly. I think this time is too long. You can shorten it to 30 seconds, and then the levels of queries to map servers go up by a factor of 6 - which completely changes the limits on how much address space any one map server can be responsible for.
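To put the caching-time trade-off in concrete terms, a small worked example follows. Only the 300-second and 30-second timeouts come from the discussion above; the prefix and ITR counts are invented assumptions, and the worst case deliberately ignores the refresh-on-hit behaviour argued for earlier.

    # Rough illustration of the caching-time trade-off: the shorter the ITR
    # cache timeout (i.e. the faster multihoming restoration can be), the
    # higher the worst-case query rate a map server must absorb.

    def worst_case_requests_per_second(eid_prefixes, itrs_per_prefix, cache_timeout_s):
        # Worst case: every interested ITR re-requests the mapping for every
        # active prefix once per timeout interval.
        return eid_prefixes * itrs_per_prefix / cache_timeout_s

    eid_prefixes_per_leaf = 1_000_000   # assumed EID prefixes served by one busy leaf map server
    itrs_per_prefix = 200               # assumed ITRs concurrently sending to each prefix

    for timeout in (300, 30):
        rate = worst_case_requests_per_second(eid_prefixes_per_leaf, itrs_per_prefix, timeout)
        print(f"timeout {timeout:3d} s -> worst-case ~{rate:.1e} requests/s at this leaf server")

    # Cutting the timeout by a factor k raises this worst-case load by the same
    # factor k, which is the heart of the disagreement about restoration times.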
> > While there is reference to redundant hardware for each node, there is no discussion of how to implement this. This is one of the problems which so far has bedevilled LISP-ALT: how to add nodes to this highly aggregated network structure so that there is no single point of failure. For instance, how in the IPv6 system could there be two physical nodes, each performing the role of a given Level 2 node, in topologically diverse locations - without adding great complexity and greater numbers of tunnels and BGP sessions?
> >
> > I do not think adding physical nodes to provide redundancy would complicate the system much.
>
> You haven't explained how you would do it in the LMS structure.
>
> > DNS uses mirrors and can provide redundancy effectively.
>
> DNS is not a routing system - and you are proposing a routing system like ALT, with different decisions about levels of aggregation, but also with ITRs caching addresses of map servers they discover initially via traffic packets acting as map requests traversing the LMS overlay network.

Letting the mapping system carry traffic packets may burden the system heavily, and it is not recommended in the early stage of LISP implementation. I think LMS does similar work to DNS, except that it is core routers, not end users, that launch the mapping requests.

> A domain is delegated, normally, to two or more topologically separated authoritative query servers, and delegation is achieved by passing on the IP addresses of these servers.
>
> You can create two or more redundant mapping servers and place them in physically and topologically different locations. Now the task is to show how the LMS routers handle this doubled number of map servers, and automatically propagate their reachability up and down the LMS system with BGP, in a scalable fashion, without there being any single point of failure such as reliance on any particular LMS router. How would you structure LMS so that for every router, there is another which will automatically take its place if it fails? You can't point to DNS as a solution, since it doesn't involve routers.

The redundant nodes for a logical mapping server form a mesh and regularly inform each other of their reachability. When a node fails, the other nodes inform the nodes of the logical neighbours that care about this reachability information. This limited update propagation is acceptable for the mapping servers.

> > However, the practical issues should be found and solved through actual implementation. Thus a thorough and large-scale experiment (like the LISP interworking) is much needed.
>
> The LISP test network is, and always will be, too small to display the scaling problems which are obvious for ALT or for LMS.
>
> The whole idea of good architecture is to take everything into consideration, especially all the physical details (which are sometimes described disparagingly as merely "engineering details") and devise a plan which looks like it can scale to the massive sizes required for a fully adopted routing scaling solution. Only when you have such a design is it worth experimenting, I think.
>
> To say "It looks like we don't know how to make it work on a large enough scale to be a real solution - so let's try building it on a small scale, and then ideally a larger scale and see if we can figure out how to make it work." is the antithesis of good design. I think the LISP team did something like this. They place great importance on actually trying things. I am not opposed to trying things, but when an architecture such as ALT has very obvious failings, which were mentioned within months of its announcement:
>
>   ALT's strong aggregation often leads to *very* long paths
>   http://www.ietf.org/mail-archive/web/rrg/current/msg01171.html
>
> it would have been better to create a better design rather than experimenting with one which had obvious problems with no apparent solution.
>
> > The suggestion (section 5) that core (DFZ) routers not maintain a full FIB, but rather hold a packet which does not match any FIB prefix, pay for a mapping lookup, await the result, update the FIB (a frequently expensive operation) and then forward the packet - is also wildly unrealistic.
> >
> > If the mapping system already existed, or highly configured routers provided mapping services, why would it be unrealistic for routers which cannot afford to hold the global routing table to discard the specific routes they are not interested in and query for them when needed?
>
> Any router which dropped or delayed packets like this would not be usable in the DFZ. DFZ routers need to reliably and instantly forward any packet whose destination address matches one of the prefixes advertised in the DFZ. If it can't do this, no-one would want to forward packets to that router.

But in the future some core routers may be unable to maintain the full global routing table, which is exactly what the scalability problem suggests.

> > It is important to research alternative approaches when existing methods are perceived as facing serious problems, as is the case with LISP-ALT. In this case, the proposed solution is not likely to be any improvement on any ALT arrangement which is likely to arise by a more hand-crafted design methodology.
> >
> > The LMS proposal, as it currently stands, is far too incomplete to be considered suitable for further development ahead of some other proposals. It represents the efforts of a creative team to improve on LISP-ALT, and does not necessarily mean that all such attempts at improvement would lead to such impractical choices.
> >
> > Waiting for your reply. Thank you.
> >
> > Best wishes,
> > Letong
>
> OK - thanks for your reply.
>
>   - Robin

Thank you for your patience and enlightened suggestions.

Best wishes,
Letong