Re: [lisp] LISP Map Server I-D & updated draft-farinacci-lisp - 2 stages of caching mapping

Dino Farinacci <dino@cisco.com> Fri, 06 March 2009 05:46 UTC

Message-Id: <673472F6-3BEB-4DD0-A1F6-66AA9E90EE41@cisco.com>
From: Dino Farinacci <dino@cisco.com>
To: Robin Whittle <rw@firstpr.com.au>
In-Reply-To: <49AEBC00.9070306@firstpr.com.au>
Date: Thu, 05 Mar 2009 21:39:44 -0800
References: <49AEBC00.9070306@firstpr.com.au>
Cc: RRG <rrg@irtf.org>, lisp@ietf.org
Subject: Re: [lisp] LISP Map Server I-D & updated draft-farinacci-lisp - 2 stages of caching mapping

> Short version:   The Devil is in the detail, but the I-Ds don't yet
>                 have all the details.  Multiple levels of caching
>                 is good in some ways and troublesome in others.

Thanks for your comments Robin.

The main point of draft-fuller-lisp-ms-00.txt is to create an API of
sorts for LISP sites, so they can use a set of primitives regardless
of the mapping database system deployed.

By doing this, the cost of managing an xTR goes way down. No GRE  
tunnels, no BGP. Simply Map-Request, Map-Reply, and Map-Register  
primitives.

> LISP Map Server draft-fuller-lisp-ms-00
> Abstract
>
>  This draft describes the LISP Map-Server (LISP-MS), a computing
>  system which provides a simple LISP protocol interface as a "front
>  end" to the Endpoint-ID (EID) to Routing Locator (RLOC) mapping
>  database and associated virtual network of LISP protocol elements.
>
>  The purpose of the Map-Server is to simplify the implementation
>  and operation of LISP Ingress Tunnel Routers (ITRs) and Egress
>  Tunnel Routers (ETRs), the devices that implement the "edge" of the
>  LISP infrastructure and which connect directly to LISP-capable
>  Internet end sites.
>
> My understanding of and comments on this are:
>
> Instead of ITRs and ETRs needing to act as routers in the ALT
> network, they communicate via the ordinary Internet with Map Servers,
> which are routers on the ALT network.  This will greatly reduce the
> complexity and configuration difficulties of ITR and ETRs.

Yes, that is right. ITRs send encapsulated Map-Requests to
Map-Resolvers via the Map-Resolver's RLOC address. ETRs get encapsulated
Map-Requests from Map-Servers via the ETR's RLOC address only after
the ETR Map-Registers to the Map-Server.

> These Map Server devices are implicitly local to the ITRs and ETRs in
> a given network and are intended to be used only by those ITRs and
> ETRs.  They are always on RLOC (stable, globally reachable, non
> LISP-mapped) addresses.

I don't know what you mean by local. If you meant the Map-Server is
colocated with ITRs, that is not true. The Map-Server would typically
not be at the site but somewhere in the Internet infrastructure, most
likely at a service provider, an interconnect provider, an RIR, or a
third party.

> There are two functions which may be combined in the one device:
>
>   Map Resolver (MR)
>
>      Accepts a mapping query from an ITR and (usually) sends the
>      ITR a mapping reply.   (The exception is if the MR doesn't
>      have the information and sends the query verbatim to some
>      other device, which will answer the query directly to the
>      ITR.)

Right, we want to experiment with Map-Resolver caching but want to do
that as a second phase in the implementation. So for now the
Map-Resolver gets the Map-Request from the ITR and puts it on the
LISP-ALT network. If there is another mapping database service, it
could be used instead.

This way we can make the mapping database service modular and don't  
need the sites to participate in it directly.

>      MRs can be caching or non-caching.  More on that below.
>
>      ITRs are intended to be configured with a single address for
>      their local MR.  This would raise questions of robustness if
>      not for the next item:
>
>      Multiple MRs in a local network (such as an ISP network or
>      I guess any end-user network which has ITRs) can be configured
>      on the one anycast address.  This way, the ITR's request will
>      be forwarded to the nearest currently active MR.  All
>      communication is via single packets, not via TCP.  Presumably
>      the MRs will also have their own unique addresses so they can
>      be managed via TCP.

Right.

>      I think the MR is an important improvement to LISP-ALT, since

It's not an improvement to the LISP-ALT mapping database, but a
Map-Resolver can be a LISP-ALT router/system, a NERD system, or a
CONS/DHT system.

>      it enables an ITR to be a much more casual and unstable concept
>      than was the case when all ITRs needed to participate in the
>      ALT network as routers (AFAIK).  This means that ITRs can be
>      added easily, without having to configure anything.

True.

>      It also means (though this is my suggestion, not from the LISP
>      team) that an ITR function could easily be implemented in a
>      sending host, assuming it was not behind NAT.  I guess the
>      sending host would need to be on an RLOC address - which rules
>      out this idea for sending hosts in end-user networks.  Ivip's
>      ITR in sending host function (ITFH) requires the host to be
>      on a non-NAT address which can be an ordinary or a Scalable PI
>      address - RLOC or EID in LISP parlance.

True, however, it would increase the number of locators for a site.
That is, the EID-to-RLOC ratio would be 1-to-1, and the mapping
database would be orders of magnitude larger!

>  Map Server (MS)
>
>      Is a router on the ALT network and accepts secure messages from
>      one or more ETRs.  (Secret key pairs to secure these.)  ETRs
>      are (typically, or always?) the authoritative source of mapping
>      information in LISP.

Right. The ETRs are registering their EID-prefixes more so than the  
mapping. Just an FYI, if that wasn't clear. Map-Servers don't answer  
Map-Requests because they wouldn't be authoritative.

>      ETRs can be on any RLOC address and use ordinary packets to
>      communicate with the MS.

Yes, they send Map-Register messages from one of their local RLOCs.

>      My understanding is that the MS announces the appropriate
>      prefixes on the ALT network - one for every EID the ETR
>      tells it.

Right, but if the Map-Server is at an aggregation boundary, the  
specific EID-prefix won't be announced but the configured aggregate in  
the Map-Server would.
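
As a rough sketch of that announcement rule (the function name, the
"aggregate" parameter, and the example prefixes are made up for
illustration and are not from the drafts):

```python
import ipaddress

def prefix_to_announce(registered, aggregate=None):
    """Return the prefix a Map-Server would announce into the ALT.

    registered: EID-prefix learned from a Map-Register (string).
    aggregate:  optional configured aggregate (string); a hypothetical
                knob standing in for "at an aggregation boundary".
    """
    reg = ipaddress.ip_network(registered)
    if aggregate is not None:
        agg = ipaddress.ip_network(aggregate)
        # Suppress the specific only if the aggregate really covers it.
        if reg.version == agg.version and reg.subnet_of(agg):
            return agg
    return reg

# At an aggregation boundary the configured aggregate is announced.
print(prefix_to_announce("10.1.2.0/24", "10.1.0.0/16"))
# With no aggregate configured the registered prefix is announced.
print(prefix_to_announce("10.1.2.0/24"))
```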

>      Ignoring MSes for a moment, I have never understood how this
>      would work with two ETRs in two separate ISPs handling the same

Multiple ETRs reside at the same site, not in the SP network.

>      EID.  Both ETRs would be routers on the ALT network and would
>      announce the same prefix.  So where do packets go to?  I guess

Within their aggregation level, there are two paths for Map-Requests
to travel to the site. It's the upstream BGP routers that decide which
path to take. They would take the shortest path based on AS-path
hop-count. Recall that each LISP-ALT router is doing "eBGP".
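
A minimal sketch of that choice, modelling only the AS-path length
comparison mentioned above (a real BGP decision process has more steps;
the router names and private AS numbers are invented):

```python
def best_alt_path(paths):
    """Pick the path with the fewest AS hops; ties keep the first seen."""
    return min(paths, key=lambda p: len(p["as_path"]))

# Two ways for a Map-Request to reach the same registered EID-prefix.
paths = [
    {"next_hop": "alt-router-a", "as_path": [65010, 65020, 65030]},
    {"next_hop": "alt-router-b", "as_path": [65011, 65030]},
]

chosen = best_alt_path(paths)
print(chosen["next_hop"])
```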

>      to either.  But then the ETRs somehow need to coordinate
>      themselves, or be coordinated by something else, so they act
>      in a unified manner.  Then, as long as both were reachable and
>      working properly, it wouldn't matter which ETR got the query.

Right, but they don't need to coordinate. All they need is to be  
consistently configured to Map-Register the same EID-prefix.

>      The same problem seems to apply with MSes.  There would be two
>      ETRs in two separate ISPs and each would presumably (for
>      robustness in a multihoming situation and probably for security
>      reasons) have its own MS in its own ISP network.

No, not true.

>      So now we have two ETRs and two MSes which need to be
>      coordinated.  The two MSes both announce the one EID prefix
>      on the ALT network.  Yet they are supposed to still be
>      coordinated during outages.

The 2 Map-Servers will converge into a topology that will aggregate
the site's registered EID-prefix so we can have a smaller ALT core.
Smaller meaning a smaller number of EID-prefixes needing to be stored
in the core of the ALT network.

>      However this is resolved, I think it is a big improvement for
>      LISP to have MSes, since it reduces the cost, complexity,
>      management effort etc. for ETRs similarly to how MRs do the
>      same for ITRs.
>
> Both these functions can presumably be performed quite adequately by
> software devices, such as a COTS server with suitable software.
> There doesn't have to be any hardware router FIB etc. AFAIK.

Yep, that is true.

> This would enable hardware routers to assume ITR and ETR
> responsibilities without them also needing all the software and
> configuration, stable address etc. to be an ALT router.  Also, by
> decreasing the total number of ALT routers, this simplifies the ALT
> network.

Yes, we thought so too.

> I gather from this new I-D, and from what I read in:
>
>   http://www.lisp4.net/docs/lisp-ausnog02.ppt
>
> that the current test network and the intention for the future is not
> to send traffic packets on the ALT network.  This approach was
> initially an option, with the intention that the ALT network would
> forward the initial packet(s) to the correct ETR, which would then
> forward it to the destination network, while also recognising it as a
> map request and so would send a map reply message to the ITR.

Right, that is correct. The implementation supports sending both
Map-Requests and Data-Probes on the ALT network, but we default to
Map-Requests and might possibly deprecate Data-Probes.

> I recall from somewhere that the ITR typically sends out a few
> mapping requests, just in case one of them is dropped.  When the ITR

Well no, we rate-limit Map-Requests but they are triggered when a  
source at the site sends data. However, we can play with this to see  
what works well.
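
To make "triggered by data, but rate-limited" concrete, here is a small
sketch. The class name, the per-EID keying, and the one-second interval
are all assumptions, not values from the drafts or the implementation:

```python
import time

class MapRequestLimiter:
    """Allow at most one Map-Request per destination EID per interval."""

    def __init__(self, min_interval=1.0, clock=time.monotonic):
        self.min_interval = min_interval  # seconds; hypothetical value
        self.clock = clock                # injectable for testing
        self.last_sent = {}               # EID -> time of last Map-Request

    def should_send(self, eid):
        """Call on each data packet that misses the map-cache."""
        now = self.clock()
        last = self.last_sent.get(eid)
        if last is not None and now - last < self.min_interval:
            return False                  # suppressed by the rate limit
        self.last_sent[eid] = now
        return True

limiter = MapRequestLimiter()
first = limiter.should_send("10.1.2.3")   # first miss triggers a request
second = limiter.should_send("10.1.2.3")  # immediate repeat is suppressed
```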

> connects directly to the ALT network, these packets presumably
> usually traverse the entire global ALT network until they are
> delivered to one or more (probably just all to one) ETR which
> responds.  I guess the ETR sends multiple replies, but maybe not.
> The reply goes to the ITR via the ordinary Internet.

Map-Replies are rate-limited as well.

> Removing these potentially long and voluminous traffic packets from
> the ALT network seems like a good idea to me.  There may well be
> security benefits in doing so too.  Below, I assume the ALT network
> only carries mapping requests, and that the map replies go back from
> whatever answers them (an ETR connected to ALT network, or more
> likely a Map Server) via a direct ordinary Internet packet to the
> device which made the query (perhaps a directly connected ITR or more
> likely a Map Resolver).

Yes, this is true.

> A Caching Map Resolver?
>
> If the MR caches, then it has the potential to significantly reduce
> the traffic on the ALT network.  This is due to two or more ITRs in a
> given ISP network wanting the same mapping, and the second and
> subsequent ones getting it directly from the local caching MR.

Yes, this was Noel's idea with CONS. It is worth experimenting.

> This also has the potential to eliminate, for the second and
> subsequent ITRs which need this mapping, the major problem of
> "LISP-ALT's initial packet delays", so much debated on the RRG in
> recent months.

Well, I'm not so sure. If you point an ITR to an RLOC of a
Map-Resolver, you take the shortest path to it. But if you had a GRE
tunnel to the same box, the GRE tunnel destination would be the same
RLOC. So the path would be the same. But you couldn't run an anycast
Map-Resolver service because the eBGP connections that ran over the
GRE tunnels would reset. So I guess this is an improvement.

> There is nothing in draft-fuller-lisp-ms-00 to describe this caching
> behavior.
>
> The caching time of map replies is specified in units of one minute:
>
>  draft-farinacci-lisp-12:
>
>    Record TTL:  The time in minutes the recipient of the Map-Reply
>                 will store the mapping.

That detail will come in a later draft.

> Let's say at time T = 0 minutes, ITR-A sends a map request to MR-1,
> which has no mapping for the EID prefix which matches the EID address
> in the request message.  MR-1 sends its own map request message (with
> its own nonce) onto the ALT network which forwards it to either the

Well, that's not the way it works. The ITR sends an encapsulated
Map-Request to the Map-Resolver. The Map-Resolver strips the outer
header and then forwards the Map-Request on the ALT. The source address
is the ITR RLOC address and the destination address is the EID that
caused the map-cache fault on the ITR.
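
In sketch form, with invented class and field names rather than the
real packet formats, the Map-Resolver does not originate a new request
or a new nonce; it only removes the outer header:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MapRequest:
    src: str    # ITR RLOC
    dst: str    # EID that caused the map-cache miss on the ITR
    nonce: int  # chosen by the ITR

@dataclass(frozen=True)
class Encapsulated:
    outer_src: str     # ITR RLOC
    outer_dst: str     # Map-Resolver RLOC
    inner: MapRequest

def itr_encapsulate(itr_rloc, resolver_rloc, eid, nonce):
    """ITR side: wrap the Map-Request for delivery to the Map-Resolver."""
    return Encapsulated(itr_rloc, resolver_rloc,
                        MapRequest(itr_rloc, eid, nonce))

def resolver_forward(pkt):
    """Map-Resolver side: strip the outer header; the inner Map-Request
    goes onto the ALT unchanged, addressed to the EID."""
    return pkt.inner

# Example addresses are documentation/private ranges, purely illustrative.
pkt = itr_encapsulate("192.0.2.1", "198.51.100.53", "10.1.2.3", nonce=42)
on_alt = resolver_forward(pkt)
print(on_alt.src, "->", on_alt.dst)
```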

> single Map Server which advertises the matching EID prefix on the ALT
> network, or to one of the multiple such Map Servers, or perhaps to the
> directly ALT-connected ETR(s) which do the same.

Correct.

> That device sends the mapping reply back to MR-1 directly via the
> Internet.  The reply is secured by returning MR-1's nonce.

No, it would go to the ITR because in the Map-Request payload there is
an "ITR RLOC" field. This is quite important because if that
Map-Request was an IPv6 Map-Request with an IPv6 outer header, and
since the LISP-ALT network we have deployed is dual-stack, the IPv6
Map-Request is forwarded on the ALT, but the ETR may not (and probably
does not) have an IPv6 path back to the ITR. So if the "ITR RLOC" field
is encoded with an IPv4 RLOC, the ETR sends a Map-Reply back with an
IPv4 header.
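
A sketch of the ETR's choice, assuming a dict-shaped request and a
hypothetical table of the ETR's own RLOCs per IP version (neither is a
real packet format):

```python
import ipaddress

def build_map_reply_header(request, etr_rloc_by_version):
    """Address the Map-Reply to the "ITR RLOC" field, not to whatever
    address family the Map-Request arrived in.

    etr_rloc_by_version: {4: "...", 6: "..."} for the families the ETR
    can actually send on; hypothetical structure.
    """
    dst = ipaddress.ip_address(request["itr_rloc"])
    src = etr_rloc_by_version.get(dst.version)
    if src is None:
        return None  # no usable path back in that address family
    return {"version": dst.version, "src": src, "dst": str(dst)}

# IPv6 Map-Request carried over the dual-stack ALT, IPv4 ITR RLOC inside.
request = {"arrived_as_version": 6, "itr_rloc": "192.0.2.1"}
header = build_map_reply_header(request, {4: "203.0.113.7"})
print(header)
```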

In the entire LISP design we treat IPv4 and IPv6 equally and try to  
enhance IPv6 connectivity by using IPv4 outer headers or IPv4 RLOCs  
when encapsulating.

Today, two IPv6-only sites can open an IPv6 TCP connection to each  
other if they run LISP and use IPv4 locators.

> Let's say the mapping reply comes back with a 90 minute caching time.
>
> MR-1 sends to ITR-A a map reply, with ITR-A's request's nonce, with
> the fresh mapping information and a caching time of 90 minutes.  Now
> MR-1 can encapsulate packets to its choice of ETRs, based on the
> fresh mapping it has received and whatever it has determined about
> reachability of those ETRs, and of the ETRs' ability to get packets
> to the destination network.

No, no, no. The Map-Resolver does not encapsulate any packets.  
Remember the ALT has no data going over it.

If the Map-Resolver is caching Map-Replies and the ITR sends a
Map-Request with A=0, then the Map-Resolver can respond with a
Map-Reply. If the ITR sends a Map-Request with A=1, the Map-Resolver
must forward the Map-Request over the ALT so an authoritative Map-Reply
can be returned by the ETR.
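
Roughly, and only as a sketch: the function name is invented, the cache
is keyed by exact EID instead of doing a longest-prefix match, and
forwarding on an A=0 cache miss is my assumption since it is not
spelled out above.

```python
def resolve(cache, eid, a_bit, forward_on_alt):
    """Caching Map-Resolver decision for one Map-Request.

    cache:          dict mapping EID -> cached Map-Reply (simplified).
    a_bit:          1 if the ITR wants an authoritative Map-Reply.
    forward_on_alt: callable that puts the Map-Request on the ALT.
    Returns the cached Map-Reply, or None if the request was forwarded.
    """
    cached = cache.get(eid)
    if a_bit == 0 and cached is not None:
        return cached          # non-authoritative answer from the cache
    forward_on_alt(eid)        # A=1, or nothing cached: ask the ETR
    return None

sent = []
cache = {"10.1.2.3": "cached-map-reply"}
print(resolve(cache, "10.1.2.3", 0, sent.append))  # served from cache
print(resolve(cache, "10.1.2.3", 1, sent.append))  # forwarded instead
```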

> Later, at T = 85 minutes, ITR-B sends a mapping request to MR-1 for
> an address which matches this same EID prefix.  MR-1 can use its
> cached information and send a reply within a few milliseconds.  This
> means ITR-B's traffic will not be delayed by any significant amount.
>
> What caching time will be in that reply to ITR-B?  I assume it will
> be 5 minutes.  If it would be 90 minutes, ITR-B could be running for
> a long time to come on stale mapping information.

We haven't figured that out yet. We don't want to create an impression
that a cacher of a Map-Reply can use any TTL it wants. We want to make
it mandatory to respect the ETR's value.
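
For reference, the arithmetic behind Robin's 5-minute guess looks like
the following. This is one possible policy under the assumption that a
cacher hands out only the time left on the ETR's TTL; it is not
something the drafts specify:

```python
def remaining_ttl_minutes(original_ttl, cached_at, now):
    """Minutes left on a cached mapping, never below zero.

    original_ttl: Record TTL from the ETR's Map-Reply, in minutes.
    cached_at, now: times in minutes on the same clock.
    """
    return max(0, original_ttl - (now - cached_at))

# Mapping cached at T=0 with a 90-minute TTL, queried again at T=85.
print(remaining_ttl_minutes(90, cached_at=0, now=85))
```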

> Assuming ITR-A no longer needs this EID's mapping, but ITR-B keeps
> needing to tunnel packets addressed to this EID, then at T=90
> minutes, ITR-B will want mapping information again.
>
> Should ITR-B request the mapping again at T = 88 minutes, in
> readiness for probably needing it in 1 minute's time?

It could, but the reasons to time out the map-cache entry are to keep
the cache small and to be resilient, to some extent, to locator-set
changes at the ETR site.

> This would seem like a generally reasonable approach if it prompted
> MR-1 to get fresh mapping information, but why should MR-1 do this?
> Would MR-1 need to look at the original caching time and how much
> has expired to decide whether it should, by some algorithm, request
> fresh mapping?  But what if the mapping hadn't changed in the distant
> Map Server, but the ETR was going to change it two minutes later?

One of the problems I see with caching in the Map-Resolver is if the  
map-cache entry does have a locator-set change and the ETR asks all  
cachers to send Map-Requests (it does this by setting the SMR-bit for  
active flows), the Map-Resolvers cannot get updated because they are  
not seeing data.

However, I have a solution for this because it will be the ITR that
sends A=1 Map-Requests with an SMR-bit set. That can tell the
Map-Resolver to ask for the Map-Reply back to update its cache. I know
there are security issues with this but it's one way of doing it.

There are also details about how a Map-Resolver asks to get the
Map-Reply back. We want to do this in a stateless manner in the
Map-Resolver. So we might have to preserve the ITR RLOC's address in
the Map-Request but instruct the ETR where to send the Map-Reply. We
have some ideas and want to think about it before changing packet
formats.

> If ITR-B waited until T = 90 or a little later before requesting
> fresh mapping, then unless MR-1 had already got fresh mapping in the
> last minute or two, then there would presumably be a delay in the ITR
> being able to handle traffic for this EID, since it would take some
> time for MR-1's second mapping request to traverse the ALT network
> and generate a reply to MR-1.
>
> There are various scenarios, but I think there are potential
> difficulties with caching times running out in three locations now
> rather than one.

Yeah, we can't get too tricky about manipulating TTLs. DNS has been
fraught with problems due to TTL issues. If anyone has advice about
this, it would be nice to hear about it.

> Previously, it was simple (despite the scaling problems of lots of
> ITRs peppering an ETR for mapping, not to mention them all trying to
> decide reachability for this and other ETRs):
>
>  ITR       query ----------->    ETR
>            <----------- reply
>  cache
>
>
> Now we have:
>
>  ITR   query ----->  Map      query ----->  Map
>        <----- reply  Resolver <----- reply  Server <--Register- ETR
>
>  Cache               Cache                  Cached, in a  sense,
>                                             controlled by messages
>                                             from the ETR(s) whenever
>                                             they can reach the
>
>                                             Map Server and decide
>                                             to send a Map Register
>                                             message.

Right, but what if we used the first case and had the ETR schedule an
update to the Map-Resolver? Or what if the ITR updated the
Map-Resolver? Not sure yet. And not even sure how much RTT
Map-Resolver caching will save us.

> I think this raises more complex problems with:
>
>  1 - How to avoid cache times running out at ITRs
>      which are going to be tunneling packets addressed
>      to this EID after the cache time expires.
>
>      Such a situation will cause a traffic delay unless
>      the local Map Resolver has recently got fresh mapping.

But I think this issue has continually been exaggerated. The
Map-Request delay does not affect a lot of packets and will be
relatively rare, I imagine.

>  2 - How to minimise unnecessary map requests by Map
>      Resolvers trying to anticipate ITRs making such
>      requests, but actually requesting fresh mapping from
>      the distant Map Server when the ITR doesn't need it.

Right, Map-Resolver caching could be more trouble than it is worth.

> Without further complications, the Map Resolver can't know whether
> the one or more ITRs which requested the mapping for an EID are still
> handling traffic for that EID.  So it can't very well request fresh
> mapping towards the end of its expiry, just in case an ITR wants it.
> To do so would approximately double the volume of map requests
> traversing the ALT network, since it is reasonable to assume, with
> longish caching times, that the original caching time will generally
> suffice for the needs of the one or more ITRs served by the Map
> Resolver.  (This would not be true with a busy Map Resolver and
> popular EIDs many ITRs are sending packets to.)
>
> Without some elaboration of the request protocol, ITR-B at T = 85
> minutes can't ask the Map Resolver to get fresh mapping and send it a
> new reply - unless there is some algorithm in the Map Resolver such
> as: "If the cached mapping is 90% of the way to its expiry time, do
> not answer the new request from the cache, but send a fresh map
> request and then answer the query if and when the new reply arrives."
>
> To do so would effectively shorten all the caching times.

Well, if the mappings don't change, longer TTLs will help the
Map-Request load on the ALT. If there are frequent changes and you want
fast convergence to them, then you use more resources.
That is the tradeoff.

> At present, there is only one kind of map request message from an ITR
> to a Map Server - implicitly an urgent request.
>
> If there was a second kind:
>
>   "This ITR has mapping for this EID which will expire in some
>    time period (specified) soon, and requests the Map Resolver
>    to get fresh mapping from the Map Server now, and to send
>    a reply once this arrives."

There is no reason why the ITR cannot send a Map-Request directly to
the RLOC of the ETR. It does have a set of them it can try. And the
nonce will protect against ETR spoof attacks.
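
A sketch of that direct refresh, with invented names and a 64-bit nonce
width that is an assumption, not a value taken from the packet format.
Note that a nonce check like this only helps against senders who never
saw the Map-Request:

```python
import secrets

class DirectRefresher:
    """ITR-side bookkeeping for Map-Requests sent straight to an ETR RLOC."""

    def __init__(self):
        self.pending = {}  # nonce -> EID we asked about

    def make_request(self, eid, etr_rloc):
        nonce = secrets.randbits(64)  # width is an assumption
        self.pending[nonce] = eid
        return {"dst": etr_rloc, "eid": eid, "nonce": nonce}

    def accept_reply(self, reply):
        """Accept a Map-Reply only if it echoes an outstanding nonce
        for the same EID; each nonce is good for one reply."""
        eid = self.pending.get(reply["nonce"])
        if eid is None or eid != reply["eid"]:
            return False
        del self.pending[reply["nonce"]]
        return True

itr = DirectRefresher()
req = itr.make_request("10.1.2.3", "203.0.113.7")
ok = itr.accept_reply({"eid": "10.1.2.3", "nonce": req["nonce"]})
print(ok)
```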

> then I think these problems would be resolvable with less trouble and
> less need for choices based on limited information.
>
>  - Robin

Thanks again for your comments Robin,
Dino