Re: [rrg] [lisp] LISP Map Server I-D & updated draft-farinacci-lisp

Robin Whittle <rw@firstpr.com.au> Mon, 09 March 2009 08:41 UTC
Message-ID: <49B4D641.9090803@firstpr.com.au>
Date: Mon, 09 Mar 2009 19:41:37 +1100
From: Robin Whittle <rw@firstpr.com.au>
Organization: First Principles
User-Agent: Thunderbird 2.0.0.19 (Windows/20081209)
MIME-Version: 1.0
To: RRG <rrg@irtf.org>
References: <49AEBC00.9070306@firstpr.com.au> <673472F6-3BEB-4DD0-A1F6-66AA9E90EE41@cisco.com>
In-Reply-To: <673472F6-3BEB-4DD0-A1F6-66AA9E90EE41@cisco.com>
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
Cc: lisp@ietf.org
Subject: Re: [rrg] [lisp] LISP Map Server I-D & updated draft-farinacci-lisp - 2 stages of caching mapping
Precedence: list
Warning: Long message ahead!

               Suggested changes to the I-D to make it clearer that
               the Map-Server does not cache.

               Map-Servers needing to make GRE tunnels to a large
               number (hundreds, thousands) of level 1 ALT
               aggregation routers - and likewise those routers
               needing to make GRE tunnels to large numbers of
               Map-Servers.

               Two Map-Servers for an end-user network's two ETRs
               will generally link directly (via a GRE tunnel) to
               the same level 1 ALT aggregation router.  There,
               the router will (I guess) send packets for this
               network's EID prefix only to one of these, since they
               both have the same AS hop count of 1.  Which one
               will depend on the AS number of the ISP.

               How can an ALT router, such as this busy level 1
               aggregation router, detect the failure or
               unreachability of a Map-Server?  I guess it would
               find out when it tries to send a Map-Request message
               to the Map-Server via the GRE tunnel.  But is there
               a keepalive arrangement so the level 1 router could
               detect the disappearance of this Map-Server before
               it sends a Map-Request?

               More on potential problems with caching Map-Resolvers.

Hi Dino,

Thanks for your reply.  Before I respond to what you wrote, I want to
point out some ways I think the I-D:

  http://tools.ietf.org/html/draft-fuller-lisp-ms-00

could be improved, assuming that Map Servers never do any caching,
which is what I understand from your reply:

> The ETRs are registering their EID-prefixes more so than the
> mapping. Just an FYI, if that wasn't clear. Map-Servers don't
> answer Map-Requests because they wouldn't be authoritative.


    2 Introduction

    There are two types of operation for a LISP Map-Server: as a
    Map-Resolver, which accepts Map-Requests from an ITR and
    "resolves" the EID-to-RLOC mapping using the distributed mapping
*   database, and as a Map-Server, which learns authoritative
*   EID-to-RLOC mappings from an ETR and publish them in the
    database.  A single device may implement one or both types of
    operation.

*   Conceptually, LISP Map-Servers share some of the same basic
*   configuration and maintenance properties as Domain Name System
*   (DNS) [RFC1035] servers and caching resolvers.  With this in
*   mind, this specification borrows familiar terminology (resolver
*   and server) from the DNS specifications.

This text:

  Map-Server, which learns authoritative EID-to-RLOC mappings from an
  ETR and publish them in the database.

made me think from the start that Map-Servers are told by the ETR
what the current mapping is for the ETR's EIDs, and then, in an
abstract way "publish" them globally.  In the case of ALT, this meant,
to my way of thinking, responding to queries and answering them
based on the saved (cached) mapping information provided from
time to time by ETRs.

Then the next section, indicating that Map-Servers are analogous to
DNS servers, at least in some ways, reinforced my misunderstanding.

Again here, it is quite explicit that the Map-Servers store
("learns") the mapping:

   3.  Definition of Terms

      Map-Server:   a network infrastructure component which learns
         EID-to-RLOC mapping entries from an authoratative source
         (typically, an ETR, though static configuration or another
         out-of-band mechanism may be used).  A Map-Server publishes
         these mappings in the distributed mapping database.


However, I realise the above could be interpreted in a different way.

For instance, instead of the ETR telling the Map-Server the mapping
for some EID, it simply tells it that the ETR is an authoritative
query server for mapping requests concerning this EID.

"Publishing" then, at least in the case of ALT, means that the ALT
network part of the Map-Server "announces" the EID on the ALT
network.  Then, queries come to the Map-Server and it passes them on
to the ETR, as you described in your reply.

Actually, with ALT, the Map-Server doesn't simply "announce" the EID
over its existing BGP links with other ALT routers (including
potentially other Map-Servers).  As far as I understand ALT, in a
fully operational deployment with 10M EIDs etc., due to the "highly
aggregated" nature of the ALT network, it doesn't at all resemble a
hodge-podge of geographically close connections to other ALT routers.

I think the Map-Server would need to establish GRE tunnels to many
first level ALT routers, which could be all over the world.  I will
write some more about this in a separate message on the scaling of
the ALT network.

If I had read this definition more carefully, I would have figured
out that Map-Servers don't cache mapping information or answer
mapping queries:

   Map-Register message:   a LISP message sent by an ETR to a
      Map-Server to register its associated EID prefixes.  In
      addition to the set of EID prefixes to register, the message
      includes one or more RLOCs to be be used by the Map-Server when
      forwarding Map-Requests (re-formatted as Encapsulated
      Map-Requests) received through the database mapping system.


However . . .  when I looked at the Map-Register message, and when I
look at it now:

   http://tools.ietf.org/html/draft-farinacci-lisp-12#section-6.1.6

(BTW, "randomly selected UDP port number." should mention "source".)

the message is identical to the Map-Reply message, except that the
type field is set to 3 instead of 2.  The descriptive text is simply:

   The definition of each field of the Map-Register can be found in
   the Map-Reply section.

So I think pretty much anyone reading this would think that in the
Map-Register message, the ETR is telling the Map-Server the complete
mapping for an EID, including Locator Reach Bits, TTL (caching time),
potentially multiple RLOCs each with Priority and Weight etc.

In a recent message:

   http://www.ietf.org/mail-archive/web/rrg/current/msg04584.html

you wrote:

    The registration service is used so an ETRs at a site can tell
    the database mapping system that they are available to answer
    Map-Requests. They are not really registering mappings. They are
    registering the EID-prefixes they are authoriatively going to
    answer for. The list of RLOCs in a Map-Register message don't
    have to be all the RLOCs used to encapsulate data to the site.
    It's just a list of RLOCs willing to answer Map-Requests for the
    site.

It would be good to add a description like this to the I-D.

I would have thought that a Map-Register message from an ETR would
only register itself and would only register it in respect of that
one Map-Server it was sent to.

If you have two Map-Servers in two ISPs, and two ETRs in the end-user
network, then - assuming each ETR has a single RLOC address from one
ISP, it doesn't make sense to tell the Map-Server of one ISP that it
should preferentially forward Map-Requests to a different ETR than
the one on this ISP's RLOC address, for instance the ETR which is on
the other ISP's RLOC address.  Such forwarded messages would be sent
out of the first ISP, to the second ISP and then through the PE-CE
link to the end user network.


Paras 1 and 2 of the Basic Overview give a better impression of the
Map-Server forwarding Map-Request messages to the ETR.  However, I
think you could make it more explicit by stating something like this,
maybe after para 1:

   Map-Servers do not answer Map-Requests or store mapping
   information.  They receive Map-Requests from the distributed
   mapping database system and forward these to the ETR which
   registered itself as being an authoritative query server for
   the EID which matches the address in the Map-Request.  The ETR
   then sends a Map-Reply directly to the RLOC address contained in
   the Map-Request message.
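
A minimal Python sketch of the forwarding behaviour described in this
suggested text.  The class and method names (MapServer, register,
forward) and the RLOC addresses are hypothetical placeholders, not
taken from either I-D, and authentication of the Map-Register is left
out.  The point is only that the table holds "which ETR answers for
this EID prefix", never the mapping itself:

```python
import ipaddress


class MapServer:
    """Hypothetical non-caching Map-Server: it records which ETR RLOCs
    registered for which EID prefix, and nothing about the mapping."""

    def __init__(self):
        # EID prefix -> list of ETR RLOCs willing to answer Map-Requests
        self.registrations = {}

    def register(self, eid_prefix, etr_rlocs):
        """Handle a Map-Register (security checks omitted in this sketch)."""
        prefix = ipaddress.ip_network(eid_prefix)
        self.registrations[prefix] = list(etr_rlocs)

    def forward(self, eid, itr_rloc):
        """Return (etr_rloc, itr_rloc): the ETR the Map-Request is passed
        to, and the address that ETR should send its Map-Reply to.
        Returns None when no registered prefix covers the EID."""
        address = ipaddress.ip_address(eid)
        matches = [p for p in self.registrations if address in p]
        if not matches:
            return None
        best = max(matches, key=lambda p: p.prefixlen)  # longest match
        return self.registrations[best][0], itr_rloc


ms = MapServer()
ms.register("55.44.33.0/24", ["192.0.2.1"])        # made-up ETR RLOC
print(ms.forward("55.44.33.7", itr_rloc="198.51.100.9"))
```

The Map-Server never builds a reply here; it only chooses where to
send the request on to.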


I think the use of "publish" may not be ideal - maybe "announce"
would be better.

I think this text could be improved, because someone who wrongly
assumes that the Map-Server caches the full mapping of an EID and
answers queries could find their impression largely confirmed by:

    (5.2)

    An ETR which uses a Map-Server to publish its EID-to-RLOC
    mappings does not need to participate further in the mapping
    database protocol(s).

It would be easy to look at this and think, wrongly, that the ETR
tells the Map-Server the full mapping of its EIDs, and then doesn't
need to do anything more.  In fact, it does need to accept queries
and send responses, which are arguably both instances of
"participating in the mapping database protocols".

5.3 is quite specific about the Map-Server forwarding requests to the
appropriate ETR.  I must not have clearly understood this when I
first read it.


A question I didn't raise in my initial message is how the Map-Server
can know, quickly and reliably, that an ETR which has registered
itself is no longer available.

Firstly, how can the ETR tell the Map-Server that an EID it
previously registered is no longer one it handles?

Secondly, if the ETR dies, or becomes unreachable to the Map-Server,
how does the Map-Server detect this and what decision making
algorithms does it employ before withdrawing the advertisement of
this ETR's registered EIDs from the ALT network?


One thing which confused me on my second reading was:

    5.4. Map-Resolver Processing

       In response to an Encapsulated Map-Request, a Map-Resolver de-
       capsulates the message then checks its local database of
  ?    mapping entries (statically configured, cached, or learned
  ?    from associated ETRs).  If it finds a matching entry, it
       returns a non-authoratative LISP Map-Reply with the known
       mapping.

I understand that if it is a caching Map-Resolver, it may have the
requested mapping in its cache, but I don't understand the other two
mechanisms:

  Static configuration

     This would mean the Map-Resolver was somehow configured to
     return mapping replies for some EID - yet it is not an ETR
     and how could this make sense, since all mapping is supposed
     to come from an ETR?

  learned from associated ETRs

     I don't understand what this could be, since ETRs have no
     links to the Map-Resolver, and since you haven't defined
     links between a Map-Resolver and any nearby or integrated
     Map-Server.


You wrote:

> Thanks for your comments Robin.
> 
> The main point of draft-fuller-lisp-ms-00.txt is to create a API of
> sorts for LISP sites. So they can use a set of primitives regardless of
> the mapping database system deployed.

OK.


> By doing this, the cost of managing an xTR goes way down. No GRE
> tunnels, no BGP. Simply Map-Request, Map-Reply, and Map-Register
> primitives.

Yes - it greatly reduces the need for stability and configuration for
each ITR and ETR, since the ALT network is going to need to be
carefully managed.  Now only the Map-Resolvers and Map-Servers need
to be managed so carefully to be a part of the ALT network.

However, the original ALT network (without Map-Resolvers and
Map-Servers):

   ITRs
   ALT routers
   ETRs

was pretty simple in terms of the number of network elements.  The
ITRs and ETRs also had to be ALT routers, but since I think you can
cobble together an ALT router from existing functional blocks in a
suitably flexible software- or hardware-based router, this still
means you have just 3 types of device in your entire core-edge
separation system.  That is enviable simplicity!

Now you have five:

  ITRs
  Map Resolvers
  ALT routers
  Map Servers
  ETRs

types of device which is not so elegant.   Still, I think any well
designed core-edge separation scheme is going to have quite a few
functional elements.

Ivip has:

  ITRs  (all caching)
 (Optional caching query servers - QSCs)
  Full database query servers - QSDs
  ETRs
  Replicators for the fast push of mapping to the QSDs

Also, there is a single system of Launch servers which drive the
Replicator system, and the RUAS organisations which own and run the
Launch servers.  Reachability probing is external to Ivip, so there
would be other systems and perhaps standards for that too.


Your ITRs no longer need ALT router functions, but your Map-
Resolvers and Map-Servers do.  Still, I think it is good, since you
need a lot of ITRs and ETRs, and now they can be simpler and talk
happily to a smaller number of Map-Resolvers and Map-Servers without
needing to be on any stable address and without needing to be known
to the ALT network.

When I wrote the message you replied to, I wrongly assumed that
LISP's ETRs were at the ISPs.  Now I know that ETRs, and I guess most
ITRs, are at the end-user networks.  (Proxy Tunnel Routers - PTRs -
are not in end-user networks.  It is not clear who would run them,
but they would advertise prefixes containing lots of EIDs in the DFZ,
attracting packets addressed to those EID prefixes.  They would send
their mapping queries to nearby Map-Resolvers, I guess.)


>> My understanding of and comments on this are:
>>
>> Instead of ITRs and ETRs needing to act as routers in the ALT
>> network, they communicate via the ordinary Internet with Map Servers,
>> which are routers on the ALT network.  This will greatly reduce the
>> complexity and configuration difficulties of ITR and ETRs.
> 
> Yes, that is right. ITRs send encapsulated Map-Requests to Map-Resolvers
> via the Map-Resolver's RLOC address. 

OK.

> ETRs get encapsulated Map-Requests from Map-Servers via the ETR's
> RLOC address only after the ETR Map-Registers to the Map-Server.

Oh - I had misunderstood this.  Now I realise the Map-Server is a
connection system to forward mapping requests from the ALT network to
particular ETRs, which have previously securely registered themselves
with the Map-Server.  I understand the ETR sends the map reply
straight back to the Map-Resolver via the Internet.


>> These Map Server devices are implicitly local to the ITRs and ETRs in
>> a given network and are intended to be used only by those ITRs and
>> ETRs.  They are always on RLOC (stable, globally reachable, non
>> LISP-mapped) addresses.
> 
> Don't know what you mean by local. But if you meant the Map-Server is
> colocated with ITRs, that is not true. The Map-Server would typically
> not be at the site but in the Internet infrastructure somewhere. Most
> likely in an service provider, an interconnect provider, a RIR, or a
> third-party.

OK - I was assuming the ITRs and ETRs were at ISPs, which is where I
assumed the Map-Resolvers and Map-Servers are.

I am trying to picture this.  I assume a multihomed end-user network
with a bunch of ITRs and two upstream ISPs.  I assume the Map-
Resolvers and Map-Servers are at ISP-A and ISP-B.

Each ETR and ITR in the end-user network needs to be on an RLOC
address.  For simplicity I will assume there are two ETRs and two
ITRs:  one ETR and ITR for the single RLOC IP address
which each ISP gives to the network.

Now, each ETR has an RLOC address for receiving encapsulated packets,
from one ISP only.

Assuming two separate ETRs A and B, then ETR-A is going to be on
RLOC-IP-A and will register with ISP-A's Map-Server.  The same
applies to the other ETR-B, which is on RLOC-IP-B and registers
itself with ISP-B's Map-Server.  The Map-Replies from each ETR will
go out their respective links to their "own" ISP.  The reply can't
very well go out the other ISP's link because the source address
wouldn't match.  (Alternatively, I guess both ETRs could send from
both addresses: RLOC-IP-A and RLOC-IP-B, but when would an ETR
receive a Map-Request from one ISP and send the reply out on the
other ISP's link?  Maybe it could, for outgoing load sharing or
similar reasons.)

This sounds OK.  If ISP-B dies, the link to ISP-B dies or ETR-B dies,
then I guess the ITRs in the networks of sending hosts will figure
this out by some means and instead tunnel traffic packets to ETR-A on
RLOC-IP-A.

So in my understanding, each ETR is dedicated to the RLOC address and
Map Server of its "own" ISP.

ITRs would need to follow a similar pattern, at least in terms of
map requests.  Your I-D indicates that the ITR is configured with
a single address of the Map Resolver it queries.  This may be an
anycast address in the ISP's network, which sounds good to me.

I think each ITR is going to be similarly dedicated to its own ISP
and the RLOC-IP address that ISP provides:

  End-user         RLOC       Physical   ISP
  network          addresses  link

                                  ISP-A   ALT router ..\
                                           /            \
       ITR-A------ RLOC-IP-A ------- Map-Resolver-A....BR-A==>} DFZ
                                                              }
                                                              }
                                  ISP-B   ALT router ..\      }
                                           /            \     }
       ITR-B------ RLOC-IP-B ------- Map-Resolver-B....BR-B==>} DFZ


I guess this is OK.  You would need to ensure that within the
end-user network, outgoing traffic to EID addresses was internally
shared between the two ITRs in some way, as long as both links were
up, to spread the outgoing load.  Also, once one link dies, the
corresponding ITR has to stop accepting outgoing packets and these
need to go to the other ITR.
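
One way the sharing and fail-over just described could look, as a
small Python sketch.  Hashing on the destination EID is purely an
assumption made here for illustration; nothing in the I-Ds says how an
end-user network should spread traffic over its ITRs, and the function
name choose_itr is hypothetical:

```python
import zlib


def choose_itr(dest_eid, link_up):
    """Pick an ITR for a destination EID by hashing over the ITRs whose
    ISP link is currently up.  `link_up` maps ITR name -> bool."""
    live = sorted(name for name, up in link_up.items() if up)
    if not live:
        return None  # no usable ISP link at all
    # crc32 is used only as a cheap, deterministic hash
    return live[zlib.crc32(dest_eid.encode()) % len(live)]


both_up = {"ITR-A": True, "ITR-B": True}
b_only = {"ITR-A": False, "ITR-B": True}

print(choose_itr("55.44.33.7", both_up))   # one of the two, stable per EID
print(choose_itr("55.44.33.7", b_only))    # always ITR-B while link A is down
```

A given destination stays on the same ITR while both links are up, and
everything moves to the surviving ITR when one link fails.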

I guess in many implementations, there will only really be one CE
router, with links to both ISPs, and implementing the ETR-A and ETR-B
functions all within itself.  Likewise probably the ITR-A and ITR-B
functions.


>> There are two functions which may be combined in the one device:
>>
>>   Map Resolver (MR)
>>
>>      Accepts a mapping query from an ITR and (usually) sends the
>>      ITR a mapping reply.   (The exception is if the MR doesn't
>>      have the information and sends the query verbatim to some
>>      other device, which will answer the query directly to the
>>      ITR.)
> 
> Right, we want to experiment with Map-Resolver caching but want to do
> that as a second phase in the implementation. 

OK.

> So the Map-Resolver gets
> the Map-Request from the ITR which now puts it on the LISP-ALT network.
> If there is another mapping database service, it could be used.
> 
> This way we can make the mapping database service modular and don't need
> the sites to participate in it directly.

Yes.

>>      MRs can be caching or non-caching.  More on that below.
>>
>>      ITRs are intended to be configured with a single address for
>>      their local MR.  This would raise questions of robustness if
>>      not for the next item:
>>
>>      Multiple MRs in a local network (such as an ISP network or
>>      I guess any end-user network which has ITRs) can be configured
>>      on the one anycast address.  This way, the ITR's request will
>>      be forwarded to the nearest currently active MR.  All
>>      communication is via single packets, not via TCP.  Presumably
>>      the MRs will also have their own unique addresses so they can
>>      be managed via TCP.
> 
> Right.
> 
>>      I think the MR is an important improvement to LISP-ALT, since
> 
> It's not an improvement to the LISP-ALT mapping database, but a
> Map-Resolver can be a LISP-ALT router/system, a NERD system, or a
> CONS/DHT system.

OK.  It seems that NERD has been pretty much forgotten for a while
now, except by Eliot, but now you are mentioning it as a mapping
distribution system worth considering.

NERD, to date:

  http://tools.ietf.org/html/draft-lear-lisp-nerd-04

is defined by every ITR getting the full mapping database, via slow
push - actually, ITR initiated downloads of sections of the mapping
database and/or updates to the mapping of various sections of the EID
address space.

But by suggesting a combination of NERD and Map Resolvers, you are
now proposing a version of LISP very different from NERD and very
different from any other form of LISP.

You are proposing a local full-database query system, as used by APT
and Ivip.

APT and LISP-NERD-MR use slow push of the entire mapping database.

Ivip uses fast (a few seconds) push of the entire mapping database,
with instant cache updates from the full database query server to any
ITR which recently (within the caching time) was sent the mapping for
this micronet (EID) in a map reply.

I think CONS has been on the back burner for a while.  I never did
understand how it worked.  Likewise Distributed Hash Tables.


BTW, your reference in draft-farinacci-lisp-12 to the DHT I-D doesn't
work:  draft-mathy-lisp-dht-00  is not in the IETF system. Google
finds a copy at:

http://inl.info.ucl.ac.be/publications/lisp-dht-towards-dht-map-identifiers-locators

but this is from 2008-02 and as far as I know, it has not been in the
IETF system.


I think ALT looks like a better idea - but like CONS it is still a
global query server network and so I think it is going to be fragile
and slow compared to having a full database local query server.


>>      it enables an ITR to be a much more casual and unstable concept
>>      than was the case when all ITRs needed to participate in the
>>      ALT network as routers (AFAIK).  This means that ITRs can be
>>      added easily, without having to configure anything.
> 
> True.
> 
>>      It also means (though this is my suggestion, not from the LISP
>>      team) that an ITR function could easily be implemented in a
>>      sending host, assuming it was not behind NAT.  I guess the
>>      sending host would need to be on an RLOC address - which rules
>>      out this idea for sending hosts in end-user networks.  Ivip's
>>      ITR in sending host function (ITFH) requires the host to be
>>      on a non-NAT address which can be and ordinary or a Scalable PI
>>      address - RLOC or EID in LISP parlance.
> 
> True, however, it would increase the number of locators for a site. That
> is the EID to RLOC ratio would be 1-to-1. And the mapping database would
> be orders of magnitude larger!

Yes.  As currently defined, your ITRs must be on non-mapped (not EID)
ordinary BGP-routed "RLOC" addresses.  So you can't have ITRs in the
sending hosts of end-user networks, because the whole idea
of LISP or any other core-edge separation architecture is to have all
end-user hosts on EID space.  So your ITRs need to be special devices
either in the end-user network, or in the ISP, with a direct link to
their particular ISP link, on the one or perhaps more RLOC addresses
that ISP gives them.

Because you anticipate ITRs and ETRs communicating somewhat - such as
an ETR receiving a Solicit-Map-Request message which it needs to pass
on to the ITR, which will request fresh mapping - I guess this means
the ITRs need to be in the end-user network too.


Probably, if a big end-user site such as a university has 10 ITRs for
ISP-A, then ISP-A needs to give the network at least 10 separate RLOC
addresses, one for each ITR.


>>  Map Server (MS)
>>
>>      Is a router on the ALT network and accepts secure messages from
>>      one or more ETRs.  (Secret key pairs to secure these.)  ETRs
>>      are (typically, or always?) the authoritative source of mapping
>>      information in LISP.
> 
> Right. The ETRs are registering their EID-prefixes more so than the
> mapping. Just an FYI, if that wasn't clear. Map-Servers don't answer
> Map-Requests because they wouldn't be authoritative.
> 
>>      ETRs can be on any RLOC address and use ordinary packets to
>>      communicate with the MS.
> 
> Yes, they send Map-Register messages from one of their local RLOCs.
> 
>>      My understanding is that the MS announces the appropriate
>>      prefixes on the ALT network - one for every EID the ETR
>>      tells it.
> 
> Right, but if the Map-Server is at an aggregation boundary, the specific
> EID-prefix won't be announced but the configured aggregate in the
> Map-Server would.

OK - but I don't understand how the ALT-router part of a Map-Server
is going to be part of the highly aggregated ALT network.  To be
highly aggregated, you need a strict, upside-down tree-like splitting
of the address space over more and more routers as you get to lower
and lower levels.  At some level 1, you have ALT routers which handle
all the packets for some pretty small subset of the entire space.
They only connect upwards to the level 2 routers, each of which
aggregates the space of, for instance, 16 or 64 or whatever of the
level 1 routers.  I will write more on this in another thread.
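
To illustrate the arithmetic of such a strict tree, here is a short
Python sketch.  It assumes, hypothetically, that one level 1 router
covers a /20 and that every higher-level router aggregates a
power-of-two number of children; neither figure comes from the ALT
I-D.  The helper name parent_aggregate is likewise made up:

```python
import ipaddress


def parent_aggregate(prefix, fan_out):
    """Prefix one level up the tree when each router aggregates
    `fan_out` children.  fan_out must be a power of two (16, 64, ...)."""
    bits = fan_out.bit_length() - 1          # 16 -> 4 bits, 64 -> 6 bits
    return ipaddress.ip_network(prefix).supernet(prefixlen_diff=bits)


level1 = ipaddress.ip_network("55.44.32.0/20")   # assumed level 1 span
level2 = parent_aggregate(level1, 16)
level3 = parent_aggregate(level2, 16)

print(level1, "->", level2, "->", level3)
# 55.44.32.0/20 -> 55.44.0.0/16 -> 55.32.0.0/12
```

With a fan-out of 16, each step up the hierarchy shortens the prefix
by 4 bits; with 64 it would be 6 bits.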


>>      Ignoring MSes for a moment, I have never understood how this
>>      would work with two ETRs in two separate ISPs handling the same
>>      EID.
>
> Multiple ETRs reside at the same site not in the SP network.

OK.  I understand this now.


>>      Both ETRs would be routers on the ALT network and would
>>      announce the same prefix.  So where do packets go to?  I guess
>>      to either.  
> 
> Within their aggregation level, there are two paths for Map-Requests to
> travel to the site. It's the upstream BGP routers that decide which path
> to take. They would take shortest path based on AS-path hop-count.
> Recall that each LISP-ALT router is doing "eBGP".

OK.  ISP-A and ISP-B both have Map-Servers - MS-A and MS-B.

The end-user network has two ETRs: ETR-A and ETR-B.  ETR-A registers
with MS-A and ETR-B registers with MS-B - as they would need to,
according to the RLOC address each ETR gets from one ISP or the other.

Now, for simplicity, assuming the end-user network had an EID prefix
55.44.33.00/24, both ISPs' MSes need to advertise this same prefix on
the ALT network.

I am trying to imagine the ALT network topology.  By your
specification it is "highly aggregated".

Therefore, it does not replicate the pattern of Internet routers -
physically adjacent (geographically, but bridged with fibre links, as
well as being directly connected in data centres) routers having
links between themselves in a pretty random-looking arrangement, with
the connections bearing no relation to the addresses which the
routers advertise in the DFZ.

With the ALT network, the connections between routers can be of
arbitrary geographic length via GRE tunnels - including having a
neighbour anywhere in the world, involving a tunnel which physically
travels over a dozen ASes and twice as many routers.

However, due to the highly aggregated nature of the ALT network
(which is essential to ensuring the shortest number of ALT routers
between the ITR and the ETR, so Map-Requests get to the ETR ASAP) you
don't just have ALT routers setting up GRE tunnels to a handful of
other ALT routers in nearby ISPs.

Since end-user EIDs are portable and can be used anywhere in the
world, you can't assume any efficiency gains in the ALT network based
on assumptions that EIDs of a certain address range are all going to
be used in any one geographic area.

Since each ISP's Map-Servers (I guess they would have one, a few or a
dozen or so) are all handling a wide, essentially random, assortment
of EIDs, I think there would be little or no value in them having GRE
tunnels to neighbouring ISPs' Map-Servers.  If there was, this would
surely not be sufficient connectivity to give the shortest path for
Map-Request messages to reach the Map-Server.  Some might come from
nearby ISPs, but others would come only from whatever high level of
the ALT hierarchy was fully meshed.  So these messages would need
to come via an ALT router at level 1 of the hierarchy.  This means
the one Map-Server will need to make GRE tunnels to a large number of
these level 1 ALT routers, which would be distributed physically all
over the Net.

For your ALT network to be highly aggregated, somewhere there needs
to be one ALT router which handles, for instance, 55.44.16.00/12.
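
As a quick check of this example, Python's standard ipaddress module
can confirm that a /12 written this way covers the end-user network's
/24.  With the host bits cleared, the aggregate is 55.32.0.0/12.
(The trailing "00" octets are written as "0" below, because the module
rejects leading zeros.)

```python
import ipaddress

site = ipaddress.ip_network("55.44.33.0/24")

# strict=False masks off the host bits instead of raising an error
aggregate = ipaddress.ip_network("55.44.16.0/12", strict=False)

print(aggregate)                  # 55.32.0.0/12
print(site.subnet_of(aggregate))  # True
```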

As far as I know, the LISP team has not explained how the ALT network
can be both highly aggregated and robust against single points of
failure - for a realistic large-scale deployment handling 10 million
physically scattered EID prefixes, 100 million, a billion etc.

I am sure what you wrote is correct.  But if there was a single
aggregating router for 55.44.16.00/12 - which is how I understand the
network would be if it is to be highly aggregated - then that router
will have GRE tunnels directly from MS-A and MS-B.

The AS hop-count is going to be over the ALT network, ignoring
physical DFZ routers which carry the tunnel packets.

I would expect both MS-A and MS-B to have the same AS hop
count in the level 1 ALT router which handles 55.44.16.00/12, since
they are direct neighbours of this level 1 aggregation router.

Then, from what I recall about BGP, all the packets from that level 1
aggregation router would be sent to the MS-A ALT router or the MS-B
ALT router according to which ISP has the lowest AS number.

So in this example, one of the Map-Servers would get all the queries.

That isn't necessarily bad.  If MS-A is getting all the queries and
it dies, then pretty quickly the level 1 aggregation ALT router will
sense this and its BGP implementation will direct queries to the GRE
tunnel which leads to MS-B.

So in this failure example, I think the ability of the ALT network to
continue responding to mapping requests moment-to-moment depends on
the BGP implementation of this level 1 router.
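
The selection and fail-over reasoning above can be sketched in Python
as follows.  This is a toy model only: it implements the tie-break as
recalled in this message (shortest AS path, then lowest neighbour AS
number), whereas real BGP implementations work through a longer
sequence of tie-breakers.  The Route class, the tunnel_up flag and the
private AS numbers are all hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    next_hop: str           # Map-Server at the far end of the GRE tunnel
    as_path: tuple          # AS numbers on the ALT path (made-up values)
    tunnel_up: bool = True  # whether the router believes the tunnel works


def best_route(routes):
    """Pick the usable route with the shortest AS path; break ties on
    the lowest neighbour AS number (the tie-break assumed in the text)."""
    usable = [r for r in routes if r.tunnel_up]
    if not usable:
        return None
    return min(usable, key=lambda r: (len(r.as_path), r.as_path[0]))


ms_a = Route("MS-A", (64512,))
ms_b = Route("MS-B", (64513,))
print(best_route([ms_a, ms_b]).next_hop)        # MS-A: equal length, lower AS

ms_a_down = Route("MS-A", (64512,), tunnel_up=False)
print(best_route([ms_a_down, ms_b]).next_hop)   # MS-B after MS-A fails
```

How quickly tunnel_up changes from True to False is exactly the
detection question raised here; the sketch simply assumes the router
learns it somehow.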

I know little about GRE.  How does an ALT router at one end of a GRE
tunnel find out quickly if the ALT router at the other end is dead or
unreachable?

The only way I can imagine this is with regular keep-alive packets
going each way.

But then, you could have a thousand GRE tunnels per ALT router, such
as from the level 1 router to the Map-Servers of 500 end-user
networks whose EIDs match this router's aggregation range.  I guess
this could be rather traffic-intensive and CPU-intensive to maintain.
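
A back-of-envelope Python calculation for the keep-alive load.  The
10 second interval and 64 byte packet size are assumptions chosen only
to give concrete numbers; neither comes from the I-Ds or from any GRE
specification:

```python
def keepalive_load(tunnels, interval_s, packet_bytes):
    """Packets/s and bits/s in ONE direction for periodic keep-alives."""
    pps = tunnels / interval_s
    return pps, pps * packet_bytes * 8


pps, bps = keepalive_load(tunnels=1000, interval_s=10, packet_bytes=64)
print(f"{pps:.0f} packets/s, {bps / 1000:.1f} kbit/s each way")
# 100 packets/s, 51.2 kbit/s each way
```

The load scales linearly, so a 1 second interval for faster failure
detection would mean ten times these figures, in addition to whatever
per-tunnel state and BGP session handling the router has to maintain.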

Does the level 1 aggregation router need to do full BGP to each of
these Map-Servers?  If the Map-Servers are not cross-linked to other
ALT routers, but only receive packets and send them to ETRs, then I
guess each Map-Server can have a single-homed link to the ALT
network.  Then, the BGP activity per neighbour (each of 1000 GRE
tunnels) would be pretty minimal for the ALT router, and likewise
minimal for each Map-Server.

But then, how can the system be robust with a single level 1 ALT
aggregation router?


>>      But then the ETRs somehow need to coordinate
>>      themselves, or be coordinated by something else, so they act
>>      in a unified manner.  Then, as long as both were reachable and
>>      working properly, it wouldn't matter which ETR got the query.
> 
> Right, but they don't need to coordinate. All they need is to be
> consistently configured to Map-Register the same EID-prefix.

As I understand LISP, the ETRs for a given end-user network
definitely need to be coordinated in some way, since they need to
send out the same mapping replies.  Also, with locator reachability
bits (or the versioning alternative) they need to send out consistent
messages to ITRs in this regard too.

However, now I understand the ETRs are owned by and located in the
destination network, I see it is no problem to coordinate them.


>>      The same problem seems to apply with MSes.  There would be two
>>      ETRs in two separate ISPs and each would presumably (for
>>      robustness in a multihoming situation and probably for security
>>      reasons) have its own MS in its own ISP network.
> 
> No, not true.

OK - the Map-Servers simply pass on queries to the ETR which
registers with them for that EID range.  So they don't cache mapping
information or answer queries themselves.  Therefore they don't need
to be coordinated, except to the extent already provided for by the
ETRs securely registering themselves and each MS then announcing that
EID prefix on the ALT network.

MS-A and MS-B can do this fine without communicating with each
other or even knowing about each other.


>>      So now we have two ETRs and two MSes which need to be
>>      coordinated.  The two MSes both announce the one EID prefix
>>      on the ALT network.  Yet they are supposed to still be
>>      coordinated during outages.
> 
> The 2 Map-Servers will converge into a topology that will aggregate the
> site's Registered EID-prefix so we can have a smaller ALT core. Smaller
> meaning, a small number of EID-prefixes needing to be stored in the core
> of the ALT network.

This is easy in a test network, or with a few tens of thousands of
end-user networks.  However you haven't described how it would work
for the full-scale deployment with 100 million end-user networks.


>>      However this is resolved, I think it is a big improvement for
>>      LISP to have MSes, since it reduces the cost, complexity,
>>      management effort etc. for ETRs similarly to how MRs do the
>>      same for ITRs.
>>
>> Both these functions can presumably be performed quite adequately by
>> software devices, such as a COTS server with suitable software.
>> There doesn't have to be any hardware router FIB etc. AFAIK.
> 
> Yep, that is true.

OK.


>> This would enable hardware routers to assume ITR and ETR
>> responsibilities without them also needing all the software and
>> configuration, stable address etc. to be an ALT router.  Also, by
>> decreasing the total number of ALT routers, this simplifies the ALT
>> network.
> 
> Yes, we thought so too.

OK.

>> I gather from this new I-D, and from what I read in:
>>
>>   http://www.lisp4.net/docs/lisp-ausnog02.ppt
>>
>> that the current test network and the intention for the future is not
>> to send traffic packets on the ALT network.  This approach was
>> initially an option, with the intention that the ALT network would
>> forward the initial packet(s) to the correct ETR, which would then
>> forward it to the destination network, while also recognising it as a
>> map request and so would send a map reply message to the ITR.
> 
> Right that is correct. The implementation supports both sending
> Map-Requests and Data-Probes on the ALT network, but we default to
> Map-Requests and might possibly deprecate Data-Probes.

OK.


>> I recall from somewhere that the ITR typically sends out a few
>> mapping requests, just in case one of them is dropped.  
> 
> Well no, we rate-limit Map-Requests but they are triggered when a source
> at the site sends data. However, we can play with this to see what works
> well.

OK.

>> When the ITR
>> connects directly to the ALT network, these packets presumably
>> usually traverse the entire global ALT network until they are
>> delivered to one or more (probably just all to one) ETR which
>> responds.  I guess the ETR sends multiple replies, but maybe not.
>> The reply goes to the ITR via the ordinary Internet.
> 
> Map-Replies are rate-limited as well.

OK.

>> Removing these potentially long and voluminous traffic packets from
>> the ALT network seems like a good idea to me.  There may well be
>> security benefits in doing so too.  Below, I assume the ALT network
>> only carries mapping requests, and that the map replies go back from
>> whatever answers them (an ETR connected to ALT network, or more
>> likely a Map Server) via a direct ordinary Internet packet to the
>> device which made the query (perhaps a directly connected ITR or more
>> likely a Map Resolver).
> 
> Yes, this is true.

OK.


>> A Caching Map Resolver?
>>
>> If the MR caches, then it has the potential to significantly reduce
>> the traffic on the ALT network.  This is due to two or more ITRs in a
>> given ISP network wanting the same mapping, and the second and
>> subsequent ones getting it directly from the local caching MR.
> 
> Yes, this was Noel's idea with CONS. It is worth experimenting.

OK - and now with ALT.


>> This also has the potential to eliminate, for the second and
>> subsequent ITRs which need this mapping, the major problem of
>> "LISP-ALT's initial packet delays", so much debated on the RRG in
>> recent months.
> 
> Well, I'm not so sure. If you point an ITR to an RLOC of a Map-Resolver,
> you take the shortest path to it. But if you had a GRE tunnel to the
> same box, the GRE tunnel destination would be the same RLOC. So the path
> would be the same. But you couldn't run an anycast Map-Resolver service
> because the eBGP connections that ran over the GRE tunnels would reset.
> So I guess this is an improvement.

What you wrote seems to me to be about something different to my
intended meaning.

Suppose there are two sending hosts in some end-user network, or
even in any of the end-user networks whose ITRs are using a single
Map-Resolver at some ISP.  One host sends a packet which causes its
ITR to request mapping for EID prefix NNN.  After a second or two or
whatever, hopefully less (the "long path" problem), the Map Resolver
gets the mapping response and nearly instantly sends a mapping
response with the same information to the ITR.

Now, assuming the Map-Resolver caches this mapping, some
other ITR requests from it mapping for the same EID.  Now the Map
Resolver doesn't need to generate a Map-Request and wait for it to
traverse the ALT network.  It has the cached mapping and sends the
reply back to the second ITR within a few milliseconds.

This second ITR therefore has no significant delay in getting the
mapping.  Ideally - and I don't know whether your ITRs are meant to
do this - that second ITR would buffer the first traffic packet,
rather than drop it (as I understand your ITRs do at present) and
then tunnel it to the ETR within the few milliseconds it takes to get
the mapping reply from the Map Resolver.

So for the second ITR, there would be no significant delay in traffic
packets at all - and no dropped traffic packets in this instance at
least.
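
As a rough illustration of the two-ITR case above, here is a minimal
Python sketch of a caching Map-Resolver.  Everything in it is an
assumption for the sake of the example: the class name, the
alt_lookup() callback (standing in for a Map-Request sent over the
ALT network) and the exact-match cache keyed by EID prefix (a real
device would need longest-prefix matching).  None of this comes from
the LISP drafts.

```python
import time


class CachingMapResolver:
    """Hypothetical caching Map-Resolver; all names are illustrative."""

    def __init__(self, alt_lookup, clock=time.monotonic):
        # alt_lookup(eid_prefix) -> (rlocs, ttl_minutes).  It stands in
        # for the slow path: a Map-Request crossing the ALT network and
        # the Map-Reply coming back.
        self.alt_lookup = alt_lookup
        self.clock = clock
        self.cache = {}       # eid_prefix -> (rlocs, expiry in seconds)
        self.alt_queries = 0  # how many requests actually used the ALT

    def resolve(self, eid_prefix):
        now = self.clock()
        entry = self.cache.get(eid_prefix)
        if entry is not None and entry[1] > now:
            # Second and subsequent ITRs: answered locally, no ALT delay.
            return entry[0], "cache"
        # First ITR (or expired entry): take the long path over the ALT.
        rlocs, ttl_minutes = self.alt_lookup(eid_prefix)
        self.alt_queries += 1
        self.cache[eid_prefix] = (rlocs, now + ttl_minutes * 60)
        return rlocs, "alt"
```

With this arrangement only the first request for a prefix pays the
ALT round trip; later requests within the caching time are served
from the resolver's own memory.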

(Of course, if your LISP network Map-Resolvers all used NERD rather
than ALT or CONS - joining the local full-database query server
throng with APT and Ivip - all your ITRs would be configured to
buffer their initial traffic packets for 100ms or so, awaiting the
Map-Reply.  Unless something goes wrong, this would involve no
significant initial packet delays whatsoever: the few milliseconds it
takes to get the mapping from the nearby full-database query server =
Map-Resolver is not, I think, significant.)


>> There is nothing in draft-fuller-lisp-ms-00 to describe this caching
>> behavior.
>>
>> The caching time of map replies is specified in units of one minute:
>>
>>  draft-farinacci-lisp-12:
>>
>>    Record TTL:  The time in minutes the recipient of the Map-Reply
>>                 will store the mapping.
> 
> That detail will come in a later draft.

OK.


>> Let's say at time T = 0 minutes, ITR-A sends a map request to MR-1,
>> which has no mapping for the EID prefix which matches the EID address
>> in the request message.  MR-1 sends its own map request message (with
>> its own nonce) onto the ALT network which forwards it to either the
> 
> Well, that's not the way it works. The ITR sends an encapsulated
> Map-Request to the Map-Resolver. The Map-Resolver strips the outer
> header and then forwards the Map-Request on the ALT. The source address
> is the ITR RLOC address and the destination address is the EID that
> caused the map-cache fault on the ITR.

OK - this is a Map-Resolver without caching.  As long as you have no
caching, then the Map-Resolver doesn't contribute to the resolution
of the "long path" problem.

  LISP-ALT's long path problem yet again   2008-12-24
  http://www.ietf.org/mail-archive/web/rrg/current/msg04097.html


>> single Map Server which advertises the matching EID prefix on the ALT
>> network, or to one of the multiple such Map Servers, or perhaps to the
>> directly ALT-connected ETR(s) which do the same.
> 
> Correct.

OK.

>> That device sends the mapping reply back to MR-1 directly via the
>> Internet.  The reply is secured by returning MR-1's nonce.
> 
> No, it would go to the ITR because in the Map-Request payload there is
> an "ITR RLOC" field. This is quite important because if that Map-Request
> was an IPv6 Map-Request with an IPv6 outer header, and since the
> LISP-ALT network we have deployed is dual-stack, the IPv6 Map-Request is
> forwarded on the ALT, but the ETR may not (and probably not) have a IPv6
> path back to the ITR. So if the "ITR RLOC" field is encoded with an IPv4
> RLOC, the ETR sends a Map-Reply back with an IPv4 header.
> 
> In the entire LISP design we treat IPv4 and IPv6 equally and try to
> enhance IPv6 connectivity by using IPv4 outer headers or IPv4 RLOCs when
> encapsulating.
> 
> Today, two IPv6-only sites can open an IPv6 TCP connection to each other
> if they run LISP and use IPv4 locators.

My discussion assumes a caching role for the Map-Resolver.  Without
that, its contribution is mainly to make it easier to get a lot of
ITRs working without each one being a part of the ALT network.  This
is good, but it does nothing to reduce the "long path" delay problem.


>> Let's say the mapping reply comes back with a 90 minute caching time.
>>
>> MR-1 sends to ITR-A a map reply, with ITR-A's request's nonce, with
>> the fresh mapping information and a caching time of 90 minutes.  Now
>> MR-1 can encapsulate packets to its choice of ETRs, based on the
>> fresh mapping it has received and whatever it has determined about
>> reachability of those ETRs, and of the ETRs' ability to get packets
>> to the destination network.
> 
> No, no, no. The Map-Resolver does not encapsulate any packets. Remember
> the ALT has no data going over it.

OK - I meant to write "ITR-A can encapsulate packets to ..."


> If the Map-Resolver is caching Map-Replies and the ITR sends a
> Map-Request with A=0, then the Map-Resolver can respond with a
> Map-Reply. If the ITR sends a Map-Request with A=1, the Map-Resolver
> must forward the Map-Request over the ALT so an authoritative Map-Reply
> can be returned by the ETR.

OK.  This is the Authoritative bit in the Map-Request message:

http://tools.ietf.org/html/draft-farinacci-lisp-12#section-6.1.2

In my example I am assuming the ITR trusts the caching ability
of the Map Resolver and would prefer a quick reply (and no more
burden on the ALT network, the Map Server or the ETR) to waiting
longer for an authoritative reply from the ETR.  So the ITR would set
the A bit to zero.

The authoritative reply would be fresh with the full length of
caching time, but I figure that in most instances, whatever remained
of the caching time in the Map Resolver's cached mapping would be
sufficient for the ITR.
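
The A-bit behavior Dino describes could be sketched as below.  This
is only a sketch: the function name, the return strings and the
shape of the cache entry are hypothetical, and times are simply
minutes on whatever clock the Map Resolver uses.

```python
def handle_map_request(a_bit, cached_entry, now_minutes):
    """Decide how a caching Map-Resolver treats one Map-Request.

    a_bit        -- 1 if the ITR insists on an authoritative reply
    cached_entry -- None, or a dict with an "expires_at" time in minutes
    Returns "reply-from-cache" or "forward-on-alt" (illustrative labels).
    """
    if a_bit == 1:
        # The ITR wants the ETR's own answer: always use the ALT.
        return "forward-on-alt"
    if cached_entry is not None and cached_entry["expires_at"] > now_minutes:
        # A=0 and the cached mapping is still valid: answer at once.
        return "reply-from-cache"
    # A=0 but nothing usable is cached: fall back to the ALT.
    return "forward-on-alt"
```

An ITR which sets A=0 gets the quick answer whenever one is
available, and loses nothing when it is not.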


>> Later, at T = 85 minutes, ITR-B sends a mapping request to MR-1 for
>> an address which matches this same EID prefix.  MR-1 can use its
>> cached information and send a reply within a few milliseconds.  This
>> means ITR-B's traffic will not be delayed by any significant amount.
>>
>> What caching time will be in that reply to ITR-B?  I assume it will
>> be 5 minutes.  If it would be 90 minutes, ITR-B could be running for
>> a long time to come on stale mapping information.
> 
> We haven't figured that out yet. We don't want to create an impression
> that a cacher of a Map-Reply can use any TTL it wants. We want to make
> it mandatory to respect the ETR's value.

OK.
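
For the T = 85 example, one possible policy (my assumption - as noted
above, the TTL handling has not been decided) is for the Map Resolver
to hand out only what remains of the ETR's original caching time, so
that no ITR holds the mapping longer than the ETR intended.  The
function name below is hypothetical:

```python
def remaining_ttl_minutes(original_ttl, cached_at, now):
    """Minutes of the ETR's caching time still left at time `now`.

    All three arguments are in minutes.  The result never goes below
    zero, so an expired entry yields a TTL of 0.
    """
    return max(0, original_ttl - (now - cached_at))


# Mapping cached at T = 0 with a 90 minute TTL, ITR-B asks at T = 85:
ttl_for_itr_b = remaining_ttl_minutes(90, 0, 85)   # 5 minutes
```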


>> Assuming ITR-A no longer needs this EID's mapping, but ITR-B keeps
>> needing to tunnel packets addressed to this EID, then at T=90
>> minutes, ITR-B will want mapping information again.
>>
>> Should ITR-B request the mapping again at T = 88 minutes, in
>> readiness for probably needing it in 1 minute's time?
> 
> It could, but the reasons to time out the map-cache entry is to keep the
> cache small and to be resilient, to some extent for locator-set changes
> at the ETR site.

OK.


>> This would seem like a generally reasonable approach if it prompted
>> MR-1 to get fresh mapping information, but why should MR-1 do this?
>> Would MR-1 need to look at the original caching time and how much
>> has expired to decide whether it should, by some algorithm, request
>> fresh mapping?  But what if the mapping hadn't changed in the distant
>> Map Server, but the ETR was going to change it two minutes later?
> 
> One of the problems I see with caching in the Map-Resolver is if the
> map-cache entry does have a locator-set change and the ETR asks all
> cachers to send Map-Requests (it does this by setting the SMR-bit for
> active flows), the Map-Resolvers cannot get updated because they are not
> seeing data.
> 
> However, I have a solution for this because, it will be the ITR that
> sends A=1 Map-Requests with an SMR-bit set. That can tell the
> Map-Resolver to ask for the Map-Reply back to update its cache. I know
> there are security issues with this but it's one way of doing it.

OK.

I have just written a message about the Versioning approach working
in all circumstances, but Solicit-Map-Request (SMR) not working when
the sending host is not on an EID address:

  LISP Versioning vs. Solicit-Map-Request
  http://www.ietf.org/mail-archive/web/rrg/current/msg04585.html

quoting this part of your reply.


> There are also details how a Map-Resolver asks to get the Map-Reply
> back. We want to do this in a stateless manner in the Map-Resolver. So
> we might have to preserve the ITR RLOC's address in the Map-Request but
> instruct the ETR where to send the Map-Reply. We have some ideas and
> want to think about it before changing packet formats.

By "stateless" do you mean you don't want the Map-Resolver to have to
remember which ITR to reply to when it gets some Map-Reply back?
Then, I guess, if the Map-Reply came back from the ETR to the
Map-Resolver with the ITR's address embedded, the Map-Resolver could
figure out from that Map-Reply, without any state, to send the
mapping on to that ITR.
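
My reading of "stateless" could be sketched like this.  The message
layout is entirely hypothetical (in particular the itr_rloc field
echoed back by the ETR is my assumption, not a field defined in any
draft), and send() stands in for whatever transmits a packet:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapReply:
    eid_prefix: str
    rlocs: tuple
    ttl_minutes: int
    nonce: int
    itr_rloc: str  # assumed: ITR address carried back in the reply


def relay_map_reply(reply, cache, send):
    """Update the resolver's cache, then pass the reply on to the ITR.

    No table of outstanding requests is consulted: the destination
    comes from the reply itself.
    """
    cache[reply.eid_prefix] = (reply.rlocs, reply.ttl_minutes)
    send(reply.itr_rloc, reply)
```

The cache is still state of a kind, but there is no per-query state
which must be created when the Map-Request goes out and cleaned up
when the Map-Reply arrives or is lost.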


>> If ITR-B waited until T = 90 or a little later before requesting
>> fresh mapping, then unless MR-1 had already got fresh mapping in the
>> last minute or two, then there would presumably be a delay in the ITR
>> being able to handle traffic for this EID, since it would take some
>> time for MR-1's second mapping request to traverse the ALT network
>> and generate a reply to MR-1.
>>
>> There are various scenarios, but I think there are potential
>> difficulties with caching times running out in three locations now
>> rather than one.
> 
> Yeah, we can't get too tricky about manipulating TTLs. DNS has been
> fraught with problems due to TTL issues. If anyone has advice about
> this, it would be nice to hear about it.

OK.


>> Previously, it was simple (despite the scaling problems of lots of
>> ITRs peppering an ETR for mapping, not to mention them all trying to
>> decide reachability for this and other ETRs):
>>
>>  ITR       query ----------->    ETR
>>            <----------- reply
>>  cache
>>
>>
>> Now we have:
>>
>>  ITR   query ----->  Map      query ----->  Map
>>        <----- reply  Resolver <----- reply  Server <--Register- ETR
>>
>>  Cache               Cache                  Cached, in a sense,
>>                                             controlled by messages
>>                                             from the ETR(s) whenever
>>                                             they can reach the
>>                                             Map Server and decide
>>                                             to send a Map Register
>>                                             message.
> 
> Right, but what if we used the first case and have the ETR schedule an
> update to the Map-Resolver? Or what if the ITR updated the Map-Resolver?
> Not sure yet. And not even sure how much RTT will buy us with
> Map-Resolver caching.

OK.


>> I think this raises more complex problems with:
>>
>>  1 - How to avoid cache times running out at ITRs
>>      which are going to be tunneling packets addressed
>>      to this EID after the cache time expires.
>>
>>      Such a situation will cause a traffic delay unless
>>      the local Map Resolver has recently got fresh mapping.
> 
> But I think this issue has continually been exaggerated. The
> Map-Request delay is not for a lot of packets and will be relatively
> rare I imagine.

I and others still think it is a serious problem.  If people had the
choice between two routers, one of which occasionally delayed new
sessions by a second or two and one that didn't, then there would
need to be compelling reasons to buy the first one.   End-users who
can only get PA space, without multihoming now, might be attracted to
LISP-ALT space, despite the delays.  However any larger end-user
network would definitely not want to switch their BGP-managed PI
prefixes over to LISP-ALT management if it resulted in this inferior
behavior.

It is not just a TCP session being delayed by a fraction of a second,
or a second or two (or several seconds if the lone request and
response packet is lost in the global ALT network).

The delays will sometimes affect initial attempts to reach a DNS
server, since end-user networks will be running their own DNS servers
and they will be on EID spaces, not necessarily the same EID prefix
as the host whose address is being looked up.

Likewise, the delay could be two delays, not counting DNS.  Firstly a
delay in getting a packet from host X to host Y (on an EID address).
Then, if X is also on an EID address, Y's ITR could have a delay in
getting the mapping for X.

The delays are not necessarily simply bounded by the response time of
the ALT network.  If the ITR buffers the packet and waits, and if the
response arrives before the sending host tries again, this would be
optimal.  However, if the ITR drops the packet, figuring the response
will come too late to be worth sending the original packet, then the
delay time is more likely to be a function of how long it takes the
sending host to time out and try again.  Sending a delayed packet
around the time of the second packet would probably cause confusion
and unwanted responses, an argument for ITRs dropping all packets
they have no mapping for.

But with a caching Map-Resolver, maybe the ITR can get the mapping in
a few milliseconds - so it should buffer the packet and send it when
the mapping arrives.
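
A minimal sketch of such buffering follows.  It is not a description
of any existing ITR: the class and method names are hypothetical, and
the 100 ms limit is simply the figure I mentioned above for a local
query server, used here as an assumed default.

```python
class PendingPackets:
    """Hold packets for EIDs whose mapping has been requested."""

    def __init__(self, max_wait_ms=100):
        self.max_wait_ms = max_wait_ms
        self.pending = {}  # eid -> list of (arrival_ms, packet)

    def hold(self, eid, packet, now_ms):
        # Called when a packet arrives and the ITR has no mapping yet.
        self.pending.setdefault(eid, []).append((now_ms, packet))

    def mapping_arrived(self, eid, now_ms):
        """Return the held packets which are still worth tunneling.

        Packets held longer than max_wait_ms are dropped, on the
        reasoning that the sending host has probably retried by then.
        """
        held = self.pending.pop(eid, [])
        return [p for (t, p) in held if now_ms - t <= self.max_wait_ms]
```

If the mapping comes from a caching Map-Resolver within a few
milliseconds the packet is sent; if it takes seconds via the ALT
network the stale packet is quietly discarded.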

It cannot be assured the sending host will retry.  It might just try
the communication to a different host, in a different EID, and so be
subject to the same potential delay problems which could easily
outlast the host's time-out value.

If something like LISP-ALT was widely implemented, sending hosts
might be tempted to generate a flurry of closely spaced packets, in
order that one or more of them would be sent as soon as the mapping
arrives in the ITR.  Since the sending host can't anticipate whether
the ITR has the mapping, could get it in a few milliseconds, or might
have to wait for it for a fraction of a second or more (or forever,
if the request or reply is lost) then I think this scenario might
encourage undesirable host behavior trying to milk the fastest
performance out of the uncertain ITR mapping situation.


>>  2 - How to minimise unnecessary map requests by Map
>>      Resolvers trying to anticipate ITRs making such
>>      requests, but actually requesting fresh mapping from
>>      the distant Map Server when the ITR doesn't need it.
> 
> Right, Map-Resolver caching could be more trouble than it is worth.

I wasn't suggesting this was necessarily the case.  If you could
contemplate converting LISP to a local full database query system
with NERD-like full push to Map Resolvers, I think the much lesser
change from ALT of having caching in the map resolvers (local caching
query servers) is worth contemplating.


>> Without further complications, the Map Resolver can't know whether
>> the one or more ITRs which requested the mapping for an EID are still
>> handling traffic for that EID.  So it can't very well request fresh
>> mapping towards the end of its expiry, just in case an ITR wants it.
>> To do so would approximately double the volume of map requests
>> traversing the ALT network, since it is reasonable to assume, with
>> longish caching times, that the original caching time will generally
>> suffice for the needs of the one or more ITRs served by the Map
>> Resolver.  (This would not be true with a busy Map Resolver and
>> popular EIDs many ITRs are sending packets to.)
>>
>> Without some elaboration of the request protocol, ITR-B at T = 85
>> minutes can't ask the Map Resolver to get fresh mapping and send it a
>> new reply - unless there is some algorithm in the Map Resolver such
>> as: "If the cached mapping is 90% of the way to its expiry time, do
>> not answer the new request from the cache, but send a fresh map
>> request and then answer the query if and when the new reply arrives."
>>
>> To do so would effectively shorten all the caching times.
> 
> Well, if the mappings don't change, longer TTLs will help the
> Map-Request load on the ALT. If there are frequent changes and you want
> fast convergence to them, then you use more resources.
> That is the tradeoff.

OK.
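
The "90% of the way to its expiry time" rule quoted above could be
sketched as follows.  The function name and the refresh_fraction
parameter are hypothetical; times are in minutes.

```python
def answer_from_cache(cached_at, ttl_minutes, now, refresh_fraction=0.9):
    """True if the Map Resolver should answer from its cache.

    Once the cached mapping is refresh_fraction of the way to expiry,
    return False: the resolver should send a fresh Map-Request and
    answer the ITR only when the new reply arrives.
    """
    age = now - cached_at
    return age < refresh_fraction * ttl_minutes
```

With a 90 minute caching time, a request at T = 80 is answered from
the cache but ITR-B's request at T = 85 triggers a fresh lookup.  The
usable lifetime of the cached mapping becomes about 81 minutes, which
is the sense in which this rule shortens all the caching times.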


>> At present, there is only one kind of map request message from an ITR
>> to a Map Server - implicitly an urgent request.
>>
>> If there was a second kind:
>>
>>   "This ITR has mapping for this EID which will expire in some
>>    time period (specified) soon, and requests the Map Resolver
>>    to get fresh mapping from the Map Server now, and to send
>>    a reply once this arrives."
> 
> There is no reason why the ITR cannot send a Map-Request directly to the
> RLOC of the ETR. It does have a set of them he can try. And the nonce
> will protect against ETR spoof attacks.

OK - but that wouldn't help update the cache of a caching Map-Resolver.

>> then I think these problems would be resolvable with less trouble and
>> less need for choices based on limited information.
>>
>>  - Robin
> 
> Thanks again for your comments Robin,
> Dino

Thanks for responding - in detail!

  - Robin