Re: [RAM] LISP NERD/CONS, eFIT-APT & Ivip compared

Thanks Dan and Michael for your appreciative message and for
correcting my misunderstandings.  You wrote, in part:

> Robin Whittle wrote:
>  > LISP-APT  (APT is only intended for eFIT.)
> 
> The wording is too strict. While APT was intended for eFIT, its main
> design concepts are generalizable towards any similar routing
> architecture. With a few modifications, we expect that APT can be made
> to work with LISP. In your post, you choose not to "consider how LISP
> might work with APT", which is fine. We are just pointing out that APT
> is not limited to ONLY working with eFIT.

I have updated my comparison:

  http://www.firstpr.com.au/ip/ivip/comp/

to reflect this and other matters you raise.

If there is a document which describes how APT applies to LISP,
then I or someone else might want to do a comparison between that
and LISP-NERD, LISP-CONS, eFIT-APT and Ivip.  Alternatively,
perhaps someone could write about the differences between LISP 3.x
in general and eFIT: what they have in common, what one or the
other lacks and/or what one or the other adds.

BTW, in my last message I indicated that after the confusion I had
with LISP-CONS, I removed all my changes in green from the
comparison.  However I have retained some changes to reflect what
Dino wrote about an ITR encapsulating packets sent by hosts from
non-upgraded networks via advertising the LISP-mapped prefix in
BGP, from a single provider network.

>  > the same concept in eFIT-APT: "updated mapping entry" - which are
>  > real-time, local, push messages directed to the caching ITRs (and
>  > for Ivip, the caching Query Servers) which they query.
> 
> This is inaccurate. Default mappers never push unsolicited mapping
> entry updates to their tunnel routers. A default mapper will only send
> mapping entry information to a TR when the TR requests this information
> (by forwarding a data packet to the default mapper). Thus local mapping
> entry updates are strictly pull messages. Mapping entry TTLs within a TR
> cache ensure that tunnel routers pull updated mapping entry info as
> frequently as desired.
> 
> Perhaps our paper was unclear on this point. We will try to rectify this
> in future drafts.

OK - this was my mistake interpreting:

   At this time, if the mapping information changed in any way
   since ITR1's prior request, M1 can respond with an updated
   mapping entry.

Looked at in isolation, this seemed to me to indicate that the
Default Mapper could choose ("can") whether or not to send an
updated mapping entry.  I interpreted this as it choosing to send
something or not.  In fact, as I understand it now, the Default
Mapper always responds to a query with the latest mapping data,
which may be different from what it sent last time - but it is not
really a choice to send or not send something.

A more careful reading in the context makes it harder to
misinterpret this sentence - it is all to do with the ITR sending
a request for mapping (in the form of a user packet which it
currently doesn't have mapping data for).

Perhaps you could change this to be:

   At this time, M1 will respond with the mapping information,
   which may be the same as specified in the previously sent
   response or which would be different if M1 had received an
   update message which affected this mapping since it sent the
   previous mapping information to the ITR.
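
To make the pull-only behaviour concrete, here is a minimal sketch
of it as I now understand it.  The class and method names are my
own, not from the APT draft, and the query is simplified to a
function call: in APT the request is really a data packet forwarded
to the Default Mapper.  The point is only that a mapping change
reaches the ITR when its cached entry expires, not before.

```python
import time


class DefaultMapper:
    """Holds the mapping table and only answers when asked (no push)."""

    def __init__(self, table, ttl_seconds):
        self.table = dict(table)        # prefix -> ETR address
        self.ttl_seconds = ttl_seconds  # TTL the ITR is told to use

    def update(self, prefix, etr):
        # A mapping change is recorded here but NOT pushed to any ITR.
        self.table[prefix] = etr

    def query(self, prefix):
        # Always answers with the latest mapping, changed or not.
        return self.table[prefix], self.ttl_seconds


class ITR:
    """Caching ITR: pulls mapping on a miss or after TTL expiry."""

    def __init__(self, mapper, clock=time.monotonic):
        self.mapper = mapper
        self.clock = clock
        self.cache = {}                 # prefix -> (ETR address, expiry)

    def lookup(self, prefix):
        entry = self.cache.get(prefix)
        if entry is not None and self.clock() < entry[1]:
            return entry[0]             # may be stale until the TTL expires
        etr, ttl = self.mapper.query(prefix)
        self.cache[prefix] = (etr, self.clock() + ttl)
        return etr


# Demonstration with a fake clock so the timing is explicit.
now = [0.0]
mapper = DefaultMapper({"prefix-1": "ETR-A"}, ttl_seconds=30)
itr = ITR(mapper, clock=lambda: now[0])

first = itr.lookup("prefix-1")          # miss: pulled from the mapper
mapper.update("prefix-1", "ETR-B")      # mapping changes, nothing is pushed
now[0] = 10.0
during_ttl = itr.lookup("prefix-1")     # still the old, cached mapping
now[0] = 30.0
after_ttl = itr.lookup("prefix-1")      # TTL expired: fresh pull
print(first, during_ttl, after_ttl)
```

In this sketch the ITR keeps using ETR-A for up to the full TTL
after the mapping has changed to ETR-B.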

This substantially changes my understanding of the responsiveness
of eFIT-APT. I had thought that eFIT-APT had something like Ivip's
"Notify" function.

Now I understand that of these proposals, only Ivip attempts to
push "Notify" messages to the caching ITRs which need them - or at
least probably need them, on the basis that they will probably
have to encapsulate more packets which need this mapping before
the caching time for the old, and now incorrect, mapping expires.

This means that all three other proposals - LISP-NERD, LISP-CONS
and eFIT-APT - rely on short caching times to achieve rapid
multihoming service restoration.  As a result, the enquiry and
response traffic (LISP-CONS and eFIT-APT) or the polling traffic
for database update files (LISP-NERD) rises dramatically as the
caching time is shortened in an attempt to speed up service
restoration.
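
A rough way to see the trade-off, as a sketch under a simplifying
assumption of my own: every actively used cache entry is refreshed
once each time its caching time expires.  The refresh rate is then
the number of active entries divided by the caching time, while the
worst-case delay before an ITR sees new mapping is roughly the
caching time itself.  The figure of 10,000 active entries is purely
illustrative, not a measurement.

```python
def refresh_queries_per_second(active_entries, ttl_seconds):
    """Steady-state refresh rate if each active entry is re-queried
    once per TTL (a simplifying assumption, not a measured figure)."""
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    return active_entries / ttl_seconds


ACTIVE_ENTRIES = 10_000  # illustrative only

for ttl in (600, 60, 10, 1):
    rate = refresh_queries_per_second(ACTIVE_ENTRIES, ttl)
    # Worst-case staleness after a mapping change is about one TTL.
    print(f"TTL {ttl:>4} s: about {rate:,.0f} queries/s, "
          f"stale for up to about {ttl} s")
```

Halving the caching time doubles the query traffic under this
model, so restoration times of a few seconds cost orders of
magnitude more queries than caching times of several minutes.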

>  >   Each transit network is expected to run at least one "Default
>  >   Mapper", which may be a router (at least it can encapsulate
>  >   packets, even if it only has one interface), and which has a
>  >   real-time updated copy of the entire eFIT-APT mapping
>  >   database. (Except in the future for IPv6 - section 7.4.)
> 
> Actually, it's likely to be the case with IPv6 that every default mapper
> still stores the full table. We meant section 7.4 as a contingency plan,
> just in case the global mapping table actually approaches its
> theoretical maximum size under IPv6. Note that 10^18 is eight orders of
> magnitude greater than the human population of the earth, so we aren't
> likely to actually use all of those prefixes any time soon.

OK - I have amended this to:

  (Except perhaps in the future for IPv6 when the
   mapping database could theoretically grow to immense
   proportions - section 7.4.)

>  >   Default Mappers also have the equivalent of an Ivip "Notify":
>  >   "M1 can respond with an updated mapping entry". There's
>  >   no mention of the ITR having to acknowledge this - and even
>  >   if it did, if anycast is used, it would be difficult to ensure
>  >   that the ack would go to the correct Default Mapper. Both
>  >   Ivip and APT should have a way of ensuring the updated mapping
>  >   information really has been received by the ITR (APT) or QSC,
>  >   ITRC or ITFH (Ivip) it was sent to.
> 
> We're not sure what you're referring to here. We don't have any push
> messages from the default mapper, and, as you mentioned before, the
> worst thing that can happen if an ICMP mapping response packet is lost
> is that the next packet goes through the default mapper as well.

This is based on the first misunderstanding noted above.  I have
revised the comparison by putting a strikethrough through this
paragraph.

My last sentence only refers to my mistaken idea of the Default
Mapper pushing updated mapping to the ITR and needing to be sure
it got it.  This problem does not apply to a response to a query,
because as you write, if the response is lost, there will simply
be another query - and the Default Mapper always encapsulates the
user packets which are the vehicle for the query.

>  >   Devolving the multihoming link selection problem, and the TE
>  >   functions, to the Default Mapper is a good idea, I think, but
>  >   involves more ICMP packets to ITRs with "updated mapping
>  >   entry" information. As noted above, I don't see any provision
>  >   in APT for this to be delivered in a reliable manner. 

Reliability could have various aspects.  Now that I know there is
no push "Notification" (cache invalidation), there is no need to
ensure the mapping responses really are received - so that is one
aspect of "reliable".  The other aspect of reliability is security
- preventing the ITR from taking notice of spoofed packets, which
is what I refer to here:

>  >   There
>  >   also needs to be some crypto or protection against an attacker
>  >   spoofing packets with the Default Mapper's address, to ensure
>  >   the ITR's cache is not corrupted by someone who wants to steer
>  >   packets to their own address.
> 
> We think perhaps you misunderstood our use of TTLs. The TTL is in
> the ITR, not the default mapper. The default mapper only tells the ITR
> what value to use. When a TTL expires, the ITR invalidates its
> corresponding cache entry completely. As you mentioned, the worst thing
> that happens in the case of a lost ICMP mapping reply packet is that the
> next packet gets sent to the default mapper as well.
> 
> In regards to their security, we discuss this in section 7.2 of our
> draft. ICMP mapping packets never need to travel between ASes, so we
> insist that they always be dropped at the border routers within transit
> space. This means that they can only be spoofed in any given AS by a
> device within that AS. I suppose we could sign them, but we didn't
> really see a need.

OK.  I will add a note about this to the comparison, quoting some
of your message and referring to this message in the RAM archives.

I think this places a considerable burden on the border routers of
the AS, since they need to do deep packet inspection on every ICMP
packet which comes in.

>  > eFIT-APT: Not documented, but APT mentions an ICMP Destination
>  > Unreachable message being generated if the encapsulated packet
>  > does not reach an ETR, so I guess this means UDP encapsulation
>  > with outer SA = ITR (or Default Mapper) address, similar to that
>  > described in LISP-01's description of LISP 1 / 1.5.
> 
> Not sure if this is correct. It sounds like you believe that Destination
> Unreachable ICMP packets are generated whenever encapsulated packets are
> lost. Is this your thought? Is this why you also believe that we expect
> UDP encapsulation?
> 
> Destination Unreachable ICMP packets in APT are used in the same way
> that they are used today. The messages are only generated when the
> destination router cannot be reached due to a malfunction of some subset
> of routers.
> 
> Again, we are not sure if there is a misunderstanding here.

I was trying to infer from other aspects of your draft what sort
of encapsulation eFIT-APT would use.  I can find no mention in:

  http://tools.ietf.org/html/draft-jen-apt-00

of either UDP or IP-in-IP.  Perhaps you use the same encapsulation
as LISP:

  . . . the ingress tunnel router, or "ITR", as defined in [LISP]

The reference is to LISP-00 which used IP-in-IP, but now
LISP-01/02 uses UDP.

What sort of encapsulation do you use?

Is the SA (Source Address) of the outer header that of the ITR or
of the sending host?

LISP uses the ITR's address but Ivip uses the sending host's
address, to help with filtering (to prevent ETRs being used to
circumvent local address spoofing).  I think it also helps with
Path MTU auto-discovery, by removing the need for the ITR to
handle ICMP packets and to try to figure out which sending host's
encapsulated packets resulted in the ICMP packet being sent.
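
The difference in outer source address can be sketched as follows.
This is only an illustration of the choice described above: the
Packet type, the encapsulate() function and the addresses (taken
from the documentation ranges) are my own, and no real header
layout, UDP or IP-in-IP, is modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str                 # source address of this header
    dst: str                 # destination address of this header
    payload: object = None   # inner packet, or user data


def encapsulate(inner, itr_addr, etr_addr, scheme):
    """Wrap `inner` in an outer header addressed to the ETR.

    scheme "lisp": outer source address is the ITR's address.
    scheme "ivip": outer source address is the sending host's address.
    """
    if scheme == "lisp":
        outer_src = itr_addr
    elif scheme == "ivip":
        outer_src = inner.src
    else:
        raise ValueError(f"unknown scheme: {scheme!r}")
    return Packet(src=outer_src, dst=etr_addr, payload=inner)


inner = Packet(src="192.0.2.10", dst="198.51.100.20", payload=b"data")
lisp_outer = encapsulate(inner, "203.0.113.1", "203.0.113.2", "lisp")
ivip_outer = encapsulate(inner, "203.0.113.1", "203.0.113.2", "ivip")
print(lisp_outer.src, ivip_outer.src)
```

With the sending host's address in the outer header, an ICMP
message triggered by the encapsulated packet is addressed to the
sending host rather than to the ITR, which is why the ITR does not
need to work out which host it concerns.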

>  > If an end user is with provider X, using some of their PA
>  > addresses, could they go to another provider Y for connectivity,
>  > or X and Y or Y and Z for multihoming, and still have to rely on X
>  > to do the mapping the way they want?
>  >
>  > What if an AS-end-user with their own PI space is currently
>  > connecting with provider X. If they want to move to provider Y,
>  > does this mean that X's Default Mapper no longer is
>  > authoritative, and that Y's is? I am confused about exactly how
>  > eFIT-APT works.
> 
> It's the customer that is ultimately authoritative for their own
> mapping, even though their transit-space addresses are
> provider-assigned. We expect that ALL of a customer's providers will
> announce the same mapping information on their behalf. All of these
> mappings should be completely identical (and contain all of the
> customer's ETR addresses at all of their providers). One thing we have
> discussed (which is not in the draft) to ensure that they are indeed
> identical is to allow customers to cryptographically sign their
> mappings. In combination with the misconfiguration detection scheme we
> discuss in section 7.1, this should prevent providers from being able to
> affect the global mapping table with a mismatched mapping.

OK - I have a rough idea of what you are proposing.  I have quoted
this in the comparison page.  I understand from this that the
end-user needs to have their own server, or secure messaging
system, to control the mapping data for their prefix which
multiple Default Mappers in multiple provider networks advertise
via new BGP messages.

Is this a transition arrangement or a permanent one?  For example,
if an end-user leaves ISP-X for ISP-Y and ISP-Z, could all three
be advertising the same mapping information for a while, until the
end-user no longer needs ISP-X?

I tried to imagine my own approach to incrementally introducing
eFIT-APT, whilst preserving full reachability from hosts in
non-upgraded networks.  (Any approach which does not achieve this
will be non-incrementally deployable and I think will never get
off the ground.)

I would really appreciate you writing more on incremental
adoption, because the best I can imagine for eFIT-APT results in
my critique:

    With eFIT-APT, there would be no benefit for early
    adopters (no portability without grave loss of reachability -
    and TE only for packets from upgraded networks) and no
    benefit for the whole network (removal of prefixes from the
    BGP routing table) until virtually all networks had upgraded.

>  > With LISP and Ivip, ETR and ITR functions may often be in the same
>  > device. ETR and ITRs always being in the one device is enforced
>  > by the current definition of eFIT-APT - but I don't see why the
>  > system needs to be so inflexible.
> 
> You're right, there isn't any good reason.

OK - removing this would give more flexibility and give your
proposal more in common with LISP or Ivip.

>  > Having the ITR and ETR functions fixed at
>  > these routers - Provider Edge routers I think - seems likely to
>  > produce a bottleneck and burden these routers with even greater
>  > workloads. This may also preclude doing ETR or ITR functions in
>  > servers, which may be more cost effective than routers with all
>  > the new functionality.
> 
> We aren't sure we understand -- are you trying to say that the TRs'
> functionality is too complex for routers but requires too much speed for
> servers? We believe that this functionality should be well within the
> capabilities of routers, that was part of the reason we tried to keep
> TRs as simple as possible.

I think that the complex communications which LISP and eFIT-APT
ITRs engage in - handling ICMP messages etc. - make them hard to
implement on an ordinary router, because these are bound to
involve the router's main CPU (as far as I know) and that is busy
enough as it is.

For Ivip, I can imagine a plain server with suitable software
being a full database ITRD, doing up to half a gigabit of ITR
encapsulation, because there are no decisions to make about which
address to tunnel to, no ICMP messages to take notice of etc.

I imagine a motherboard with two gigabit Ethernet interfaces.  It
appears from:

  http://docs.rodecker.nl/10-GE_Routing_on_Linux.pdf
  http://www.rodecker.nl/docs/10-GE_Routing_on_Linux.pdf

(which I can't reach right now, but search Google for
"10-GE_Routing_on_Linux" and read its cache) that a high-end
server can pump 700,000 packets a second, which is 4 gigabits per
second in both directions at once for maximum packet size.

So my 0.5 gigabit estimate is a very rough and modest guess.
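
The arithmetic behind these figures can be checked with a short
sketch.  I am assuming that "maximum packet size" means 1500-byte
packets and that the 700,000 packets a second is the total across
both directions; neither assumption is stated in the figures above.

```python
def throughput_gbps(packets_per_second, packet_bytes):
    """Throughput in gigabits per second for fixed-size packets."""
    return packets_per_second * packet_bytes * 8 / 1e9


PACKETS_PER_SECOND = 700_000   # figure quoted above
PACKET_BYTES = 1500            # assumed maximum packet size

total = throughput_gbps(PACKETS_PER_SECOND, PACKET_BYTES)
per_direction = total / 2      # assuming an even split of the traffic
headroom = per_direction / 0.5 # relative to the 0.5 gigabit estimate

print(f"total: {total:.1f} Gbit/s")
print(f"per direction: {per_direction:.1f} Gbit/s")
print(f"ratio to the 0.5 Gbit/s estimate: {headroom:.1f}x")
```

Under these assumptions the total is about 8.4 gigabits per second,
or about 4.2 in each direction, so the 0.5 gigabit estimate leaves
a wide margin for the extra work of encapsulation.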

That server-based ITR could be a stub, with a single Ethernet link
to some other router - so it is not technically a router in the
usual sense because it only has one interface and it has no
hardware FIB.

I also envisage caching ITRs in servers and in sending hosts.

With the current definition of eFIT-APT, it would not be practical
to use servers at all for ITR functions, because you locate the
ITR functions exactly in the CE routers, which have to be real
routers with hardware FIBs, multiple interfaces etc.

A server can't do all the things a CE router must do, but it could
do an Ivip-style ITR function.  A server could probably also be
used to do the more complex ITR functions of LISP or APT, but not
if these functions had to be in a conventional router.

>  > eFIT-APT: The placement of TRs is specifically limited to the
>  > border routers between providers and their non-provider "customer"
>  > networks. ETRs have complex communication functions, including
>  > detecting failure of the link to the end-user's host or router.
>  > They send messages to their local Default Mapper and to the ITR
>  > function of the TR which encapsulated the packet (which can
>  > include the Default Mapper of the sender's network).
> 
> We aren't hardware experts, but it seems to us that these aren't
> particularly complex functions for routers. Routers already detect BGP
> connection failures. Border link failure detection could also be
> implemented as a persistent TCP connection. (Note that it is only the
> customer-edge-to-provider-edge link that needs to be monitored.) Routers
> also already send ICMP packets in response to certain incoming packets
> (Destination unreachable and so on).

I can't say how much of APT's or LISP's ITR functions can be
implemented in the hardware FIB of existing routers or to what
degree the traffic load would be too burdensome for the central
CPUs of existing routers.

My aim with Ivip is to keep the ITR and ETR functions simple, with
no communication, no reception of ICMP messages etc. other than
the communication required for an ITRD to get its full database
feed or for an ITRC or ITFH to query a QSD/QSC query server, and
receive "Notify" messages pushed from the QSD/QSC to it when
mapping changes for some range of addresses for which the ITR is
currently caching mapping.

>  > With eFIT-APT, there would be no
>  > benefit for early adopters (no portability without grave loss of
>  > reachability - and TE only for packets from upgraded networks) and
>  > no benefit for the whole network (removal of prefixes from the BGP
>  > routing table) until virtually all networks had upgraded.
> 
> Good point. We will include an incremental deployment plan in detail in a
> future draft. Hopefully, we can address many of your concerns in this area.

That will be good.

 - Robin
