Re: [rrg] LMS alternative critique
Charrie Sun <charriesun@gmail.com> Wed, 24 February 2010 09:15 UTC
In-Reply-To: <4B7F9E39.2030800@firstpr.com.au>
References: <4B7F9E39.2030800@firstpr.com.au>
Date: Wed, 24 Feb 2010 17:17:31 +0800
Message-ID: <4eb512451002240117y4fe3a056r6376981034c9ca5@mail.gmail.com>
From: Charrie Sun <charriesun@gmail.com>
To: Robin Whittle <rw@firstpr.com.au>
Content-Type: multipart/alternative; boundary="001636e1eee1d328d404805522c3"
Cc: RRG <rrg@irtf.org>
Subject: Re: [rrg] LMS alternative critique
List-Id: IRTF Routing Research Group <rrg.irtf.org>
Hi Robin:

Thank you for your critique. My response is inline.

2010/2/20 Robin Whittle <rw@firstpr.com.au>

> Here is an 865 word critique of the 2009-12-24 version of the LMS
> proposal (right click the Adobe Reader display of a page of this file,
> once saved locally):
>
> http://docs.google.com/fileview?id=0BwsJc7A4NTgeMzRlYWYzYjEtZTkyOS00ZjgwLWI5YjItYmUyNzJjNTIyZTJi&hl=en
>
> There is currently a critique in the RRG Report ID:
> http://tools.ietf.org/html/draft-irtf-rrg-recommendation-04#section-8.2
>
> I am not sure whether this applies to the earlier LMS proposal or to the
> significantly updated 2009-12-24 version.
>
> - Robin
>
>
> Layered Mapping System
> ----------------------
>
> LMS is a step towards designing a complete Core-Edge Separation routing
> scaling solution for both IPv4 and IPv6, somewhat along the lines of
> LISP-ALT.
>
> There are insufficient details in the proposal to evaluate how well the
> basic infrastructure of ITRs and ETRs would work, considering the
> unknown nature of mapping delays,

We ran a simulation of LMS using real data collected from a campus border router. It shows that, when ITRs are equipped with the two-stage cache mechanism (as described in our proposal), the number of request hops is small: 94% of requests need no hop (cache hits), 5.5% need two hops, and 0.5% need four hops. These hops are logical, since mapping servers communicate over tunnels, and the delay between two random nodes may not be unacceptable (one estimate puts it at 115 ms [1]). Redundancy among logical mapping servers may further reduce the delay between two mapping servers, since a mapping server can choose a nearby server to communicate with.

[1]: S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker, "A scalable content-addressable network," in Proc. of ACM SIGCOMM '01, San Diego, CA, USA, Aug. 2001, pp. 161-172.
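As a rough illustration of what the hop distribution above implies, the following sketch computes the mean number of logical hops per lookup. The per-hop delay passed to `mean_delay_ms` is an assumption made only for illustration: it reuses the 115 ms figure as if every logical hop cost that much, which the reply does not claim.

```python
# Hop distribution reported in the reply: logical hops -> share of requests.
HOP_DISTRIBUTION = {0: 0.94, 2: 0.055, 4: 0.005}


def mean_hops(distribution):
    """Expected number of logical request hops per mapping lookup."""
    total = sum(distribution.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(hops * share for hops, share in distribution.items())


def mean_delay_ms(distribution, per_hop_ms):
    """Expected lookup delay, ASSUMING every logical hop costs per_hop_ms."""
    return mean_hops(distribution) * per_hop_ms


if __name__ == "__main__":
    print(f"mean hops per lookup: {mean_hops(HOP_DISTRIBUTION):.2f}")
    # 115 ms per hop is a hypothetical per-hop cost, not a measured value.
    print(f"mean delay (ms): {mean_delay_ms(HOP_DISTRIBUTION, 115.0):.2f}")
```

Under these figures the average lookup costs well under one logical hop, because the 94% cache-hit share dominates.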
> reliability of the global mapping
> system,

The reliability of the global mapping system lies in its redundancy and in efficient communication between mapping servers. Testing this needs a real environment and an actual implementation; however, I do not see a logical fault in the layered mapping system. Perhaps we can draw lessons from DNS, which runs effectively.

> the problems of Path MTU Discovery (due to tunneling via
> encapsulation)

The MTU issue is inherent in the map-and-encap scheme. However, the alternatives have their own problems: address rewriting requires intermediate routers to change packet headers, which is insecure and raises issues such as checksum consistency, while combining core and edge addresses in one form only handles the problem by shrinking the namespace. You have proposed estimating the MTU along the tunnels, but I do not know whether that is effective.

> and difficulties with an ITR reliably probing
> reachability of the destination network via multiple ETRs.
>

An ITR receives multiple mappings in the response, each with a different priority. The ITR can select the most preferred ETR to forward packets to, and use the others as backups. I am not sure whether the following answers your question; I write it down for clarification. A mapping server is the authority for the mapping information of the edge addresses it is responsible for. It should (and could) learn the connectivity between ETRs and those edge addresses in a timely manner. When an ITR caches the locators of mapping servers it thinks may be useful (as suggested in the two-stage cache mechanism), the ITR can periodically query those mapping servers for the mapping information it is interested in, to get the current reachability information.

> Most of the proposal concerns a new global, distributed, mapping query
> system based on the ALT concepts of each node in the tree-like structure
> being a router, using tunnels to other such nodes, and all such nodes
> using BGP to develop best paths to neighbours.
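The ETR selection described in the reply above (forward to the most preferred ETR, keep the others as backups) could be sketched roughly as follows. The names `EtrMapping`, `order_etrs` and `select_etr` are hypothetical, the addresses are documentation addresses, and the convention that a lower priority value is more preferred is an assumption borrowed from LISP, not something the LMS reply states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EtrMapping:
    locator: str   # ETR address (documentation addresses in the example)
    priority: int  # ASSUMPTION: lower value means more preferred


def order_etrs(mappings, reachable):
    """Return the reachable ETRs, most preferred first."""
    candidates = [m for m in mappings if m.locator in reachable]
    return sorted(candidates, key=lambda m: m.priority)


def select_etr(mappings, reachable):
    """Pick the preferred reachable ETR, or None if none is reachable."""
    ordered = order_etrs(mappings, reachable)
    return ordered[0] if ordered else None


if __name__ == "__main__":
    response = [EtrMapping("198.51.100.1", 2), EtrMapping("192.0.2.1", 1)]
    # The reachable set would come from periodic queries to the mapping server.
    print(select_etr(response, {"192.0.2.1", "198.51.100.1"}))
    print(select_etr(response, {"198.51.100.1"}))  # falls back to the backup
```

How the reachable set is refreshed is left open here; the reply suggests the ITR periodically re-queries the mapping servers whose locators it has cached.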
>
> By a series of mathematical transformations the designers arrive at
> certain design choices for IPv4 and IPv6. For IPv4, if the entire
> address space was to be mapped (and at present there is no need to do so
> beyond 224.0.0.0) there would be 256 Layer 2 nodes. Therefore, each
> such node would be the authoritative query server for mapping for
> however much "edge" space was used in an entire /8 of 16 million IPv4
> addresses. This is a wildly unrealistic demand of any single physical
> server, considering that there are likely to be millions of separate
> "edge" "EID" (LISP) prefixes in each such /8.

We made a constraint study of the processing capability of mapping servers. Assume `Pa' is the fraction of mappings that are requested per second and `N' is the total number of edge blocks, so `Pa*N' is the total number of requests per second sent into the mapping system. A mapping server can forward `R' requests per second. In an L-layered system, a request may traverse `L+1' MNs in the worst case. Assuming requests are distributed evenly among `M' leaf mapping servers, we have the constraint:

    R * M > Pa * N * (L + 1).    (1)

`Pa' is estimated to be less than 0.001 [2]. Setting `N' to 2^32 for the IPv4 case and `L' to 2, the right side of (1) is at most on the order of 10^7. Note that a single router can forward 10^8 packets per second [2], so we see no problem with the leaf mapping servers handling requests from all ITRs. Our own simulations validate this (LMS: Section 3.2.3). Moreover, the redundancy mechanism for mapping servers and the two-stage cache mechanism can further relieve the burden on each mapping server. Caching the locators of leaf mapping servers can especially relieve the load on root mapping servers.

[2]: H. Luo, Y. Qin, H. Zhang. A DHT-based Identifier-to-locator Mapping Approach for a Scalable Internet. IEEE Transactions on Parallel and Distributed Systems, Vol. 20, No. 10, 2009.
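The constraint R * M > Pa * N * (L + 1) can be checked numerically with a short sketch. The function names are hypothetical; the inputs are the values given in the reply (Pa taken at its stated upper bound of 0.001, N = 2^32, L = 2, R = 10^8), and the second call uses a much lower, purely hypothetical per-server rate to show how the required number of leaf servers scales.

```python
import math


def total_load(pa, n, layers):
    """Right side of the constraint: Pa * N * (L + 1), in requests per second."""
    return pa * n * (layers + 1)


def min_leaf_servers(pa, n, layers, r):
    """Smallest integer M satisfying R * M > Pa * N * (L + 1)."""
    return math.floor(total_load(pa, n, layers) / r) + 1


if __name__ == "__main__":
    pa, n, layers = 0.001, 2**32, 2   # values from the reply (Pa is an upper bound)
    print(f"load: {total_load(pa, n, layers):.3e} requests/s")
    print("M at R = 1e8:", min_leaf_servers(pa, n, layers, 1e8))
    # 1e6 requests/s is a hypothetical rate for a slower server.
    print("M at R = 1e6:", min_leaf_servers(pa, n, layers, 1e6))
```

With Pa at its upper bound the load is roughly 1.3 * 10^7 requests per second, so the constraint holds with room to spare at R = 10^8; it says nothing about how unevenly queries may be spread across leaf servers, which is the concern raised in the critique.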
> A single address in a
> single prefix of such "edge" space could be extremely intensively used,
> with tens or hundreds of thousands of hosts connecting to it in any 1
> hour period - and in many instances, each such connection resulting in a
> mapping query.
>
> If such an arrangement were feasible, there's no obvious reason why BGP
> and tunnels would be used at all, since each ITR could easily be
> configured with, or automatically discover, the IP addresses of all
> these nodes.
>
>

In the IPv4 case, caching the locator information of all mapping servers in an ITR _is_ reasonable; that is why we think FIRMS [3] could work well for IPv4. However, if IPv6 is used for edge addresses, there are many more mapping servers, and storing all their locators would become infeasible.

[3]: M. Menth, M. Hartmann, M. Hofling. FIRMS: a Future InteRnet Mapping System. EuroView2008, 2008.

> A more realistic arrangement might be to have the "edge" space broken
> into a larger number of sections, such as 2^22 (4 million) divisions of
> 2^10 IPv4 addresses each. If (and even this is questionable, though it
> would frequently be true) each such division could be managed by a
> single node (a single authoritative query server), then it would be
> perfectly feasible for each ITR to automatically discover the IP
> addresses of all 2^22 potential query servers. In practice, in some
> cases, no such query server would be required, since none of those 1024
> IPv4 addresses were "edge" space. In other cases, due to low enough
> actual query rates, a single server could handle queries for multiple
> sets of 1024 IPv4 ranges of "edge" space.
>
> A simple ground-up engineering evaluation would thus produce a much more
> practical solution than the highly contrived top-down mathematical model
> - which considered only storage requirements for mapping data in each
> query server, and not the volume of queries.
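The partitioning in the quoted text (2^22 divisions of 2^10 IPv4 addresses each) amounts to indexing query servers by the top 22 bits of the address. A minimal sketch, with hypothetical function names and a documentation address as input:

```python
import ipaddress

DIVISION_BITS = 10                # 2^10 addresses per division, as quoted
PREFIX_LEN = 32 - DIVISION_BITS   # each division is therefore a /22


def division_index(addr):
    """Index (0 .. 2^22 - 1) of the division that contains addr."""
    return int(ipaddress.IPv4Address(addr)) >> DIVISION_BITS


def division_prefix(addr):
    """The /22 prefix that a single query server would be authoritative for."""
    return ipaddress.ip_network(f"{addr}/{PREFIX_LEN}", strict=False)


if __name__ == "__main__":
    print(division_index("192.0.2.1"), division_prefix("192.0.2.1"))
```

How an ITR would learn which server answers for a given index is not specified in the quoted text and is not modelled here.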
>
>

First, we did not provide a complete analysis of the processing capability of mapping servers. Second, what we did in the proposal is only a constraint study, to show that the layered structure scales in providing an efficient mapping service while keeping the storage and processing load acceptable. The arrangement is one specific example; the actual number of layers and the prefix widths may well vary in practice, for example across different parts of the world.

> The IPv6 arrangement of two layers (a third layer is the single top
> node) seems even more unrealistic. Although the IPv6 address space is
> likely to remain sparsely used for a long time, due to its inherent
> vastness, the LMS plan calls for up to 2^24 Layer 2 nodes, each with up
> to 2^24 Layer 3 nodes underneath. This proposal seems wildly
> unrealistic as stated - since each such node would need to accommodate
> up to 2^24 + 1 tunnels and BGP sessions with its neighbours.
>

First, by a calculation similar to the one for the IPv4 case, the processing constraint can also be met in the IPv6 case. Second, one important virtue of LMS is that it can be constructed incrementally: a mapping server need not be deployed if none of the edge addresses it is responsible for are in use. This makes sense especially in the IPv6 case. As IPv6 addresses become more widely used, we do not think it is impractical for a node to accommodate 2^24 tunnels and sessions with its neighbors.

> Furthermore, the top node (Layer 1) has to cope with most query packets,
> since there is no suggestion that Layer 2 nodes would be fully or even
> partially meshed.
>
>

Using the two-stage cache mechanism, our experiment confirms that the requests sent to the root node are sharply reduced (nearly zero from our router, simulated as an ITR). This is because ITRs query the responsible bottom-layer nodes directly when they have cached their locator information (the hit rate of this cache is high).
Moreover, the success of DNS may also support the feasibility of the tree structure.

> Even with some unspecified caching times, the prodigious query rates
> considered in section 3.2.2 cannot be suitably handled by either of the
> IPv4 or IPv6 structures in the proposal.
>
>

The cache timeout is 5 minutes and the cache size limit is 30,000 entries. I am sorry that this information was missing. As stated, when we extrapolate the query rate from one ITR to the whole world (according to the proportion of our campus address space to the whole IPv4 space), we see that the processing capability constraint can be satisfied.

> While there is reference to redundant hardware for each node, there is
> no discussion of how to implement this. This is one of the problems
> which so far has bedevilled LISP-ALT: - how to add nodes to this highly
> aggregated network structure so that there is no single point of
> failure. For instance, how in the IPv6 system could there be two
> physical nodes, each performing the role of a given Level 2 node, in
> topologically diverse locations - without adding great complexity and
> greater numbers of tunnels and BGP sessions?
>
>

I do not think adding physical nodes to provide redundancy would complicate the system much. DNS uses mirrors and provides redundancy effectively. However, practical issues should be found and solved through actual implementation. Thus a thorough, large-scale experiment (like the LISP interworking) is much needed.

> The suggestion (section 5) that core (DFZ) routers not maintain a full
> FIB, but rather hold a packet which does not match any FIB prefix, pay
> for a mapping lookup, await the result, update the FIB (a frequently
> expensive operation) and then forward the packet - is also wildly
> unrealistic.
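The two-stage cache mechanism mentioned in the replies above can be sketched as two lookups at the ITR: first for a cached mapping, then for a cached leaf mapping server locator, falling back to the root only when both miss. This is a minimal sketch, not the LMS implementation. The class names, the keying by edge block string, and the eviction of the oldest inserted entry are assumptions; applying the 5-minute timeout and 30,000-entry limit to both stages is also an assumption, since the reply does not say which stage they refer to.

```python
import time
from collections import OrderedDict


class TtlCache:
    """Bounded cache whose entries expire ttl_s seconds after insertion."""

    def __init__(self, ttl_s=300.0, max_entries=30_000, clock=time.monotonic):
        self._ttl = ttl_s
        self._max = max_entries
        self._clock = clock
        self._items = OrderedDict()  # key -> (expiry time, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expiry, value = entry
        if self._clock() >= expiry:   # expired: drop the entry and report a miss
            del self._items[key]
            return None
        return value

    def put(self, key, value):
        self._items.pop(key, None)
        self._items[key] = (self._clock() + self._ttl, value)
        while len(self._items) > self._max:
            self._items.popitem(last=False)  # evict the oldest insertion


class TwoStageCache:
    """Stage 1: edge block -> ETR locators. Stage 2: edge block -> leaf server."""

    def __init__(self, clock=time.monotonic):
        self.mappings = TtlCache(clock=clock)
        self.leaf_servers = TtlCache(clock=clock)

    def resolve(self, block):
        locators = self.mappings.get(block)
        if locators is not None:
            return ("hit", locators)          # no request leaves the ITR
        server = self.leaf_servers.get(block)
        if server is not None:
            return ("query_leaf", server)     # ask the leaf server directly
        return ("query_root", None)           # fall back to the tree
```

A caller would populate `leaf_servers` from the replies it receives, so that later misses in `mappings` for the same block bypass the root.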
>
>

If the mapping system already exists, or well-provisioned routers provide mapping services, why is it unrealistic for routers that cannot afford to hold the global routing table to discard specific routes they are not interested in and query for them when needed?

> It is important to research alternative approaches when existing methods
> are perceived as facing serious problems, as is the case with LISP-ALT.
> In this case, the proposed solution is not likely to be any improvement
> on any ALT arrangement which is likely to arise by a more hand-crafted
> design methodology.
>
> The LMS proposal, as it currently stands, is far too incomplete to be
> considered suitable for further development ahead of some other
> proposals. It represents the efforts of a creative team to improve on
> LISP-ALT, and does not necessarily mean that all such attempts at
> improvement would lead to such impractical choices.
>
>
>

Waiting for your reply. Thank you.

Best wishes,
Letong