Re: [rrg] LMS alternative critique

Charrie Sun <charriesun@gmail.com> Wed, 24 February 2010 09:15 UTC

MIME-Version: 1.0
In-Reply-To: <4B7F9E39.2030800@firstpr.com.au>
References: <4B7F9E39.2030800@firstpr.com.au>
Date: Wed, 24 Feb 2010 17:17:31 +0800
Message-ID: <4eb512451002240117y4fe3a056r6376981034c9ca5@mail.gmail.com>
From: Charrie Sun <charriesun@gmail.com>
To: Robin Whittle <rw@firstpr.com.au>
Content-Type: multipart/alternative; boundary="001636e1eee1d328d404805522c3"
Cc: RRG <rrg@irtf.org>
Subject: Re: [rrg] LMS alternative critique
Precedence: list

Hi Robin:
    Thank you for your critique. My response is inline.
2010/2/20 Robin Whittle <rw@firstpr.com.au>

> Here is an 865 word critique of the 2009-12-24 version of the LMS
> proposal (right click the Adobe Reader display of a page of this file,
> once saved locally):
>
>
> http://docs.google.com/fileview?id=0BwsJc7A4NTgeMzRlYWYzYjEtZTkyOS00ZjgwLWI5YjItYmUyNzJjNTIyZTJi&hl=en
>
> There is currently a critique in the RRG Report ID:
>  http://tools.ietf.org/html/draft-irtf-rrg-recommendation-04#section-8.2
>
> I am not sure whether this applies to the earlier LMS proposal or to the
> significantly updated 2009-12-24 version.
>
>  - Robin
>
>
>
>
>
> Layered Mapping System
> ----------------------
>
> LMS is a step towards designing a complete Core-Edge Separation routing
> scaling solution for both IPv4 and IPv6, somewhat along the lines of
> LISP-ALT.
>
> There are insufficient details in the proposal to evaluate how well the
> basic infrastructure of ITRs and ETRs would work, considering the
> unknown nature of mapping delays,

We simulated LMS using real data collected from a campus border router and
found that, with the two-stage cache mechanism at ITRs (as described in our
proposal), the number of request hops is very small: 94% of requests need no
hop (cache hits), 5.5% need two hops and 0.5% need four hops.
These hops are logical, since mapping servers communicate over tunnels, but
the delay between two random nodes is not necessarily unacceptable (some
estimate it at 115 ms [1]). Redundancy of logical mapping servers may help
reduce the delay between two mapping servers, since a mapping server can
choose a nearby peer to communicate with.

[1]: S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker. A
Scalable Content-Addressable Network. Proc. ACM SIGCOMM '01, San Diego, CA,
USA, Aug. 2001, pp. 161-172.
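To put the hop distribution above in perspective, here is a minimal sketch that computes the mean number of logical hops per request and, under the loud assumption that every logical hop costs one inter-node delay of the 115 ms estimated in [1], the mean added delay per request. The per-hop cost model is hypothetical; real hops may be shorter if mapping servers pick nearby peers.

```python
# Fraction of requests needing each number of logical hops, as reported above.
HOP_DISTRIBUTION = {0: 0.94, 2: 0.055, 4: 0.005}


def mean_hops(dist):
    """Expected number of logical hops per mapping request."""
    return sum(hops * frac for hops, frac in dist.items())


def mean_added_delay_ms(dist, per_hop_ms):
    """Expected added delay, assuming (hypothetically) each hop costs per_hop_ms."""
    return mean_hops(dist) * per_hop_ms


print(f"{mean_hops(HOP_DISTRIBUTION):.2f} hops")                   # 0.13 hops
print(f"{mean_added_delay_ms(HOP_DISTRIBUTION, 115):.2f} ms")      # 14.95 ms
```

The average hides the tail: the 6% of requests that miss the cache still pay two or four full hops.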


> reliability of the global mapping
> system,

The reliability of the global mapping system lies in its redundancy and in
efficient communication between mapping servers. Testing it requires a real
environment and an actual implementation; however, I do not see any logical
flaw in the layered mapping system. Perhaps we can draw lessons from DNS,
which runs effectively.


> the problems of Path MTU Discovery (due to tunneling via
> encapsulation)


The MTU issue is inherent in any map-and-encap scheme. However, address
rewriting requires intermediate routers to change packet headers, which is
insecure and raises many issues, such as keeping checksums consistent; and
encoding both core and edge addresses in a single address format only handles
the problem by shrinking the size of the namespace. You have proposed
estimating the MTU along the tunnels, but I do not know whether that is
effective.
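A minimal sketch of why encapsulation creates the MTU problem: the outer headers eat into the path MTU, so an ITR can only tunnel inner packets up to the path MTU minus the overhead. LMS does not fix a tunnel format, so the overhead table below is an assumption: plain IP-in-IP, or LISP-style encapsulation (outer IP header, 8-byte UDP header, 8-byte LISP header). The tunnel path MTU is taken as known, which is exactly what an MTU-estimation mechanism would have to supply.

```python
# Per-packet encapsulation overhead in bytes. These entries are assumptions
# for illustration; LMS itself does not specify the tunnel format.
ENCAP_OVERHEAD = {
    "ip_in_ipv4": 20,              # outer IPv4 header only
    "lisp_over_ipv4": 20 + 8 + 8,  # outer IPv4 + UDP + 8-byte LISP header
    "lisp_over_ipv6": 40 + 8 + 8,  # outer IPv6 + UDP + 8-byte LISP header
}


def max_inner_packet(tunnel_path_mtu: int, encap: str) -> int:
    """Largest inner packet that fits in one outer packet on the ITR->ETR path."""
    return tunnel_path_mtu - ENCAP_OVERHEAD[encap]


def must_signal_too_big(inner_len: int, tunnel_path_mtu: int, encap: str) -> bool:
    """True if an inner packet with DF set cannot be tunnelled unfragmented, so
    the ITR would have to drop it and send an ICMP Packet Too Big /
    Fragmentation Needed message back to the sending host."""
    return inner_len > max_inner_packet(tunnel_path_mtu, encap)


print(max_inner_packet(1500, "lisp_over_ipv4"))           # 1464
print(must_signal_too_big(1500, 1500, "lisp_over_ipv4"))  # True
```

A full-size 1500-byte packet from an Ethernet-attached host therefore cannot cross a 1500-byte path once it is encapsulated, unless the ITR fragments or the sender learns the smaller MTU.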


> and difficulties with an ITR reliably probing
> reachability of the destination network via multiple ETRs.
>
An ITR receives multiple mappings in the response, each with a different
priority. The ITR can select the most preferred ETR to forward packets to and
use the others as backups.
I am not sure whether the following answers your question, but I add it for
clarification: a mapping server is authoritative for the mapping information
of the edge addresses it is responsible for. It should (and can) know the
connectivity of its ETRs and those edge addresses in a timely manner. When an
ITR caches the locators of mapping servers that it thinks may be useful (as
suggested in the two-stage cache mechanism), it can periodically query those
mapping servers for the mappings it is interested in, to obtain current
reachability information.
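The ITR behaviour described above can be sketched as follows. This is a minimal illustration, not the LMS specification: it assumes, as in LISP, that a lower priority value means a more preferred ETR, and `refresh_reachability` is a hypothetical hook standing in for the periodic query to the authoritative mapping server.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class EtrLocator:
    address: str
    priority: int          # assumption: lower value = more preferred (LISP convention)
    reachable: bool = True


def select_etr(locators: List[EtrLocator]) -> Optional[EtrLocator]:
    """Return the most preferred reachable ETR, or None if all are unreachable."""
    candidates = [loc for loc in locators if loc.reachable]
    if not candidates:
        return None
    return min(candidates, key=lambda loc: loc.priority)


def refresh_reachability(locators: List[EtrLocator], status: Dict[str, bool]) -> None:
    """Apply reachability reported by the mapping server (hypothetical periodic query)."""
    for loc in locators:
        loc.reachable = status.get(loc.address, loc.reachable)


etrs = [EtrLocator("192.0.2.1", priority=1), EtrLocator("198.51.100.1", priority=2)]
print(select_etr(etrs).address)                   # 192.0.2.1
refresh_reachability(etrs, {"192.0.2.1": False})
print(select_etr(etrs).address)                   # 198.51.100.1 (backup takes over)
```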


> Most of the proposal concerns a new global, distributed, mapping query
> system based on the ALT concepts of each node in the tree-like structure
> being a router, using tunnels to other such nodes, and all such nodes
> using BGP to develop best paths to neighbours.
>
> By a series of mathematical transformations the designers arrive at
> certain design choices for IPv4 and IPv6.  For IPv4, if the entire
> address space was to be mapped (and at present there is no need to do so
> beyond 224.0.0.0) there would be 256 Layer 2 nodes.  Therefore, each
> such node would be the authoritative query server for mapping for
> however much "edge" space was used in an entire /8 of 16 million IPv4
> addresses.  This is a wildly unrealistic demand of any single physical
> server, considering that there are likely to be millions of separate
> "edge" "EID" (LISP) prefixes in each such /8.

We made a constraint study of the processing capability of mapping
servers. Assume `Pa' is the fraction of mappings requested per second and
`N' is the total number of edge blocks, so `Pa*N' is the total number of
requests per second sent into the mapping system. A mapping server can
forward `R' requests per second. In an L-layered system, a request may
traverse `L+1' mapping nodes (MNs) in the worst case. Assuming requests are
distributed evenly among `M' leaf mapping servers, we have the constraint:
    R * M > Pa * N * (L + 1).  (1)
`Pa' is estimated to be less than 0.001 [2]. Setting `N' to 2^32 in the IPv4
case and `L' to 2, the right-hand side of (1) is about 1.3 * 10^7, i.e.
O(10^7). Note that a single router can forward 10^8 packets per second [2],
so we see no problem in the leaf mapping servers handling requests from all
ITRs. Our own simulations validate this (LMS: Section 3.2.3). Moreover, the
redundancy of mapping servers and the two-stage cache mechanism can further
relieve the burden on each mapping server. In particular, caching the
locators of leaf mapping servers relieves the load on the root mapping
servers.

[2]: H. Luo, Y. Qin, H. Zhang. A DHT-based Identifier-to-Locator Mapping
Approach for a Scalable Internet. IEEE Transactions on Parallel and
Distributed Systems, vol. 20, no. 10, 2009.
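Constraint (1) can be checked numerically. This is a minimal sketch using the figures quoted above (Pa = 0.001, N = 2^32, L = 2, R = 10^8 from [2]). The value M = 256 is a hypothetical choice for illustration only; the proposal's actual number of leaf servers may differ.

```python
def mapping_load(pa: float, n: int, layers: int) -> float:
    """Right-hand side of (1): worst-case request forwarding load per second."""
    return pa * n * (layers + 1)


def constraint_holds(r: float, m: int, pa: float, n: int, layers: int) -> bool:
    """Constraint (1): R * M > Pa * N * (L + 1)."""
    return r * m > mapping_load(pa, n, layers)


load = mapping_load(pa=0.001, n=2**32, layers=2)
print(f"{load:.3g}")                                              # 1.29e+07
print(constraint_holds(r=1e8, m=256, pa=0.001, n=2**32, layers=2))  # True
```

Note that R = 10^8 is a packet forwarding rate for a router; a server doing mapping lookups may sustain a lower rate, which is where the margin in (1) matters.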


>   A single address in a
> single prefix of such "edge" space could be extremely intensively used,
> with tens or hundreds of thousands of hosts connecting to it in any 1
> hour period - and in many instances, each such connection resulting in a
>  mapping query.
>
> If such an arrangement were feasible, there's no obvious reason why BGP
> and tunnels would be used at all, since each ITR could easily be
> configured with, or automatically discover, the IP addresses of all
> these nodes.
>
>
In the IPv4 case, caching the locators of all mapping servers in an ITR _is_
reasonable; that is why we think FIRMS [3] could work well in the IPv4 case.
However, if IPv6 is used for edge addresses, there are many more mapping
servers, and storing all their locators would become infeasible.

[3]: M. Menth, M. Hartmann, M. Hofling. FIRMS: a Future InteRnet Mapping
System. EuroView2008, 2008.
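A back-of-the-envelope sketch of the storage argument. It assumes one bare locator per server (4 bytes for IPv4, 16 bytes for IPv6) and ignores any per-entry overhead, so real tables would be larger. The server counts are the 256 IPv4 Layer 2 nodes, the 2^22 divisions from the critique, and every possible leaf in the 2^24 x 2^24 IPv6 layout.

```python
def locator_table_bytes(num_servers: int, locator_bytes: int) -> int:
    """Bytes needed to store one bare locator per mapping server."""
    return num_servers * locator_bytes


IPV4_L2 = locator_table_bytes(2**8, 4)             # 256 Layer 2 nodes
IPV4_FINE = locator_table_bytes(2**22, 4)          # 2^22 divisions of 1024 addresses
IPV6_LEAF = locator_table_bytes(2**24 * 2**24, 16) # every possible IPv6 leaf node

print(f"{IPV4_L2} bytes")                  # 1024 bytes
print(f"{IPV4_FINE / 2**20:.0f} MiB")      # 16 MiB
print(f"{IPV6_LEAF / 2**50:.0f} PiB")      # 4 PiB
```

The IPv4 tables fit easily in an ITR; a fully populated IPv6 table does not, which is the case for keeping a hierarchy there.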


> A more realistic arrangement might be to have the "edge" space broken
> into a larger number of sections, such as 2^22 (4 million) divisions of
> 2^10 IPv4 addresses each.  If (and even this is questionable, though it
> would frequently be true) each such division could be managed by a
> single node (a single authoritative query server), then it would be
> perfectly feasible for each ITR to automatically discover the IP
> addresses of all 2^22 potential query servers.  In practice, in some
> cases, no such query server would be required, since none of those 1024
> IPv4 addresses were "edge" space.  In other cases, due to low enough
> actual query rates, a single server could handle queries for multiple
> sets of 1024 IPv4 ranges of "edge" space.
>
> A simple ground-up engineering evaluation would thus produce a much more
> practical solution than the highly contrived top-down mathematical model
> - which considered only storage requirements for mapping data in each
> query server, and not the volume of queries.
>
Firstly, we did not provide a thorough analysis of the processing capability
of mapping servers in the proposal. Secondly, what we did in the proposal is
only a constraint study, to show that the layered structure is scalable in
providing an efficient mapping service while keeping the storage and
processing load acceptable. The arrangement is one specific example; the
actual number of layers and prefix widths may well vary in practice, for
different parts of the world.


> The IPv6 arrangement of two layers (a third layer is the single top
> node) seems even more unrealistic.  Although the IPv6 address space is
> likely to remain sparsely used for a long time, due to its inherent
> vastness, the LMS plan calls for up to 2^24 Layer 2 nodes, each with up
> to 2^24 Layer 3 nodes underneath.  This proposal seems wildly
> unrealistic as stated - since each such node would need to accommodate
> up to 2^24 + 1 tunnels and BGP sessions with its neighbours.
>
Firstly, a calculation similar to the one for the IPv4 case shows that the
processing constraint can be met in the IPv6 case. Secondly, one important
virtue of LMS is that it can be constructed incrementally: a mapping server
need not be deployed if none of the edge addresses it is responsible for are
in use. This matters especially in the IPv6 case. As IPv6 becomes more widely
deployed, we do not think it is impractical for a node to accommodate 2^24
tunnels and sessions with neighbors.
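The incremental-construction argument can be sketched: only the Layer 2 and Layer 3 nodes covering prefixes actually in use need to exist. This sketch makes hypothetical assumptions that the proposal text here does not confirm: the 2^24 x 2^24 layout is indexed by the first 24 bits and the next 24 bits of the address, and every edge prefix is a /48 or longer. The prefixes used are from the IPv6 documentation range.

```python
import ipaddress
from typing import Dict, Iterable, Set

# Hypothetical bit split: 24 bits pick a Layer 2 node, the next 24 a Layer 3 node.
L2_BITS = 24
L3_BITS = 24


def required_nodes(used_prefixes: Iterable[str]) -> Dict[int, Set[int]]:
    """Map each Layer 2 node id that must exist to the Layer 3 node ids under it."""
    needed: Dict[int, Set[int]] = {}
    for text in used_prefixes:
        net = ipaddress.IPv6Network(text)
        if net.prefixlen < L2_BITS + L3_BITS:
            raise ValueError(f"{net} spans several Layer 3 nodes")
        top48 = int(net.network_address) >> (128 - L2_BITS - L3_BITS)
        l2 = top48 >> L3_BITS                # first 24 bits
        l3 = top48 & ((1 << L3_BITS) - 1)    # next 24 bits
        needed.setdefault(l2, set()).add(l3)
    return needed


nodes = required_nodes(["2001:db8::/48", "2001:db8:1::/48"])
print(len(nodes), sum(len(v) for v in nodes.values()))  # 1 2
```

Two edge prefixes need one Layer 2 node and two Layer 3 nodes, out of 2^48 possible leaves.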


> Furthermore, the top node (Layer 1) has to cope with most query packets,
> since there is no suggestion that Layer 2 nodes would be fully or even
> partially meshed.
>
Using the two-stage cache mechanism, our experiment shows that the number of
requests sent to the root node is sharply reduced (to nearly zero from our
router, simulated as an ITR). This is because ITRs query the responsible
bottom-layer nodes directly once they have cached those nodes' locators (and
the hit rate of this cache is high). Moreover, the success of DNS also
suggests that a tree structure is feasible.
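A minimal sketch of the two-stage lookup as described above: stage 1 caches mappings, stage 2 caches the locators of the leaf mapping servers, and only a miss in both stages sends a query down from the root. The callables `query_root` and `query_leaf`, and the `block` key that identifies which leaf server covers a prefix, are hypothetical placeholders; the proposal's exact data structures may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TwoStageCache:
    query_root: Callable[[str], str]             # block -> leaf server locator (hypothetical)
    query_leaf: Callable[[str, str], List[str]]  # (server, prefix) -> ETR locators (hypothetical)
    mappings: Dict[str, List[str]] = field(default_factory=dict)  # stage 1
    leaf_servers: Dict[str, str] = field(default_factory=dict)    # stage 2
    root_queries: int = 0

    def resolve(self, prefix: str, block: str) -> List[str]:
        if prefix in self.mappings:              # stage 1 hit: no query at all
            return self.mappings[prefix]
        server = self.leaf_servers.get(block)
        if server is None:                       # stage 2 miss: go via the root
            self.root_queries += 1
            server = self.query_root(block)
            self.leaf_servers[block] = server
        etrs = self.query_leaf(server, prefix)   # ask the leaf server directly
        self.mappings[prefix] = etrs
        return etrs


cache = TwoStageCache(query_root=lambda b: "leaf-" + b,
                      query_leaf=lambda s, p: [s + ":" + p])
cache.resolve("10.1.0.0/16", "10.0.0.0/8")
cache.resolve("10.2.0.0/16", "10.0.0.0/8")  # stage 2 hit: root not consulted
print(cache.root_queries)                    # 1
```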


> Even with some unspecified caching times, the prodigious query rates
> considered in section 3.2.2 cannot be suitably handled by either of the
> IPv4 or IPv6 structures in the proposal.
>
The cache timeout is 5 minutes and the cache size limit is 30,000 entries;
I apologize for omitting this information. As stated, we extrapolated the
query rate observed at one ITR to the whole world (in proportion to the share
of the whole IPv4 space occupied by our campus address space), and found that
the processing capability constraint can be satisfied.
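The two cache parameters above can be expressed as a small mapping cache. This is a sketch: the 5-minute timeout and the 30,000-entry limit come from the text, but evicting the least recently used entry when full is an assumption, since the eviction policy is not stated. The clock is injectable so the timeout can be tested without waiting.

```python
import time
from collections import OrderedDict
from typing import Callable, Hashable, Optional

CACHE_TIMEOUT_S = 300       # 5-minute timeout, as stated above
CACHE_MAX_ENTRIES = 30_000  # size limit, as stated above


class MappingCache:
    def __init__(self, timeout: float = CACHE_TIMEOUT_S,
                 max_entries: int = CACHE_MAX_ENTRIES,
                 clock: Callable[[], float] = time.monotonic):
        self._timeout = timeout
        self._max = max_entries
        self._clock = clock
        self._entries: "OrderedDict[Hashable, tuple]" = OrderedDict()  # key -> (value, expiry)

    def get(self, key: Hashable) -> Optional[object]:
        item = self._entries.get(key)
        if item is None:
            return None
        value, expiry = item
        if self._clock() >= expiry:      # entry timed out
            del self._entries[key]
            return None
        self._entries.move_to_end(key)   # mark as most recently used
        return value

    def put(self, key: Hashable, value: object) -> None:
        self._entries[key] = (value, self._clock() + self._timeout)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)  # assumed policy: evict least recently used
```

For example, with `max_entries=2`, inserting `a` and `b`, reading `a`, then inserting `c` evicts `b`; once the clock passes 300 seconds after `a` was inserted, `a` expires too.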


> While there is reference to redundant hardware for each node, there is
> no discussion of how to implement this.  This is one of the problems
> which so far has bedevilled LISP-ALT: - how to add nodes to this highly
> aggregated network structure so that there is no single point of
> failure.  For instance, how in the IPv6 system could there be two
> physical nodes, each performing the role of a given Level 2 node, in
> topologically diverse locations - without adding great complexity and
> greater numbers of tunnels and BGP sessions?
>
I do not think adding physical nodes to provide redundancy would complicate
the system much. DNS uses mirrors and provides redundancy effectively.
However, the practical issues need to be found and solved through actual
implementation, so a thorough, large-scale experiment (such as the LISP
interworking) is much needed.


> The suggestion (section 5) that core (DFZ) routers not maintain a full
> FIB, but rather hold a packet which does not match any FIB prefix, pay
> for a mapping lookup, await the result, update the FIB (a frequently
> expensive operation) and then forward the packet - is also wildly
> unrealistic.
>
If the mapping system already exists, or if high-end routers provide mapping
services, why is it unrealistic for routers that cannot afford to hold the
global routing table to discard the specific routes they are not interested
in, and query for them when needed?
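The idea in the question above can be sketched as a partial FIB that installs a route on demand when no entry matches. This is only an illustration under stated assumptions: `lookup` is a hypothetical stand-in for a query to the mapping system, the longest-prefix match is a linear scan for brevity (real FIBs use tries or TCAM), and the sketch does not model the packet buffering or FIB update cost raised in the critique.

```python
import ipaddress
from typing import Callable, Dict, Tuple

Network = ipaddress.IPv4Network


class OnDemandFib:
    def __init__(self, lookup: Callable[[ipaddress.IPv4Address], Tuple[str, str]]):
        self._routes: Dict[Network, str] = {}  # prefix -> next hop
        self._lookup = lookup                  # hypothetical mapping-system query
        self.queries = 0

    def next_hop(self, dst: str) -> str:
        addr = ipaddress.IPv4Address(dst)
        best = None
        for net, nh in self._routes.items():   # longest-prefix match (linear scan)
            if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
                best = (net, nh)
        if best is not None:
            return best[1]
        self.queries += 1                      # FIB miss: ask the mapping system
        prefix, nh = self._lookup(addr)
        self._routes[ipaddress.IPv4Network(prefix)] = nh
        return nh


fib = OnDemandFib(lambda addr: ("192.0.2.0/24", "r1"))
print(fib.next_hop("192.0.2.5"), fib.next_hop("192.0.2.9"), fib.queries)  # r1 r1 1
```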


> It is important to research alternative approaches when existing methods
> are perceived as facing serious problems, as is the case with LISP-ALT.
>  In this case, the proposed solution is not likely to be any improvement
> on any ALT arrangement which is likely to arise by a more hand-crafted
> design methodology.
>
> The LMS proposal, as it currently stands, is far too incomplete to be
> considered suitable for further development ahead of some other
> proposals.  It represents the efforts of a creative team to improve on
> LISP-ALT, and does not necessarily mean that all such attempts at
> improvement would lead to such impractical choices.
>
>
>
I look forward to your reply. Thank you.

Best wishes,
Letong
