Re: [rrg] Fwd: I-D Action:draft-iannone-lisp-mapping-versioning-00.txt

Robin Whittle <rw@firstpr.com.au> Wed, 04 March 2009 18:23 UTC

Message-ID: <49AEC1DC.3020001@firstpr.com.au>
Date: Thu, 05 Mar 2009 05:01:00 +1100
From: Robin Whittle <rw@firstpr.com.au>
Organization: First Principles
User-Agent: Thunderbird 2.0.0.19 (Windows/20081209)
MIME-Version: 1.0
To: rrg@irtf.org
References: <20090303181503.470533A67F2@core3.amsl.com> <C50348EC-F61D-4BAE-B32E-B11D389B84CD@net.t-labs.tu-berlin.de>
In-Reply-To: <C50348EC-F61D-4BAE-B32E-B11D389B84CD@net.t-labs.tu-berlin.de>
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
Cc: Damien Saucez <damien.saucez@uclouvain.be>, lisp@ietf.org
Subject: Re: [rrg] Fwd: I-D Action:draft-iannone-lisp-mapping-versioning-00.txt
Precedence: list

Hi Luigi,

I read your new I-D:

  http://tools.ietf.org/html/draft-iannone-lisp-mapping-versioning-00

and have the following comments and questions.


I understand this is a proposed change to the LISP protocol.  I guess
it is independent of the work of the main LISP-ALT team, who have
updated their main I-D and added a new one:

  http://tools.ietf.org/html/draft-fuller-lisp-ms-00
  http://tools.ietf.org/html/draft-farinacci-lisp-12

in recent days, in ways which seem unrelated to your suggestion.

I guess that your suggested replacement of LISP's Locator
Reachability Bits system with the new Versioning system is motivated
by problems you encountered, or anticipated, as part of your OpenLISP
project.

I don't clearly understand all the ways the Locator Reachability Bits
are intended to be used.  There is a complex description within six
pages of material, in sections 6.2 to 6.5.2 of:

  http://tools.ietf.org/html/draft-farinacci-lisp-12


However, I can well imagine that there would be problems with Locator
Reachability Bits.  Your section 9.2 includes a critique of these
lists of bits having holes in them as RLOCs are added to or removed
from the list.  I have not been able to understand to what extent the
various bits sent by various ETRs are coordinated with the bits sent
by other ETRs.

You also mention security problems (10.2):

  ... with explicit reachability bits where an attacker can set
  all RLOCs to down with one single packet, with disruptive
  consequences on the traffic.

I don't understand how these bits can be used securely, but I assume
there is a genuine problem motivating your proposal to replace them
with a versioning system.

I think I understand the Destination Mapping Version part of your
proposal:  All mapping distributed by the ALT network contains an
additional 15 bit Destination Mapping Version number, which is
incremented for every new version of the mapping information.  (This,
and the similar 15 bits for Source Mapping Version, are intended to
replace the Reachability Bits in the LISP traffic packet header.)

This Destination Mapping Version number is cached at the ITR, and in
every traffic packet the ITR encapsulates, the (modified) LISP header
includes this 15 bit number.

When the encapsulated packet arrives at the correct ETR, this ETR is
the authoritative source of mapping for the destination EID - or at
least is directly controlled by, or in communication with, whatever
it is which is the authoritative controller of that mapping.
Therefore, the ETR knows the latest mapping version number for this EID.

If an ITR-X is sending packets with an older version of the mapping,
then the ETR can quickly determine this and send a Map Update
Notification Message to ITR-X.  This should prompt ITR-X to look up
the mapping for this EID in the usual, secure, manner - and so
hopefully get the latest mapping with the latest version number.
There is nothing secure about such a message - so any attacker could
generate one and send it to any ITR.

(The ITR might be fussy about the source address of such messages,
but an attacker could easily find out the address of an authoritative
ETR for this EID, and send a spoofed message with that source
address.  Also, if the ITR was to be fussy about the source address
of the message, that would need to be based on RLOCs in older mapping
information, and this would make it difficult to adapt to new ETRs
which are in fact authoritative.)

In order to respond to genuine Map Update Notification messages, the
ITR needs to respond to spoofed ones too - since it can't tell the
difference.
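
A minimal sketch of the exchange described above, under stated
assumptions: the class and function names are hypothetical, the
notification is modelled as carrying only the EID prefix, and the
version test is plain equality on the 15 bit value (the draft's own
rules for ordering and wrap-around are not reproduced here).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

VERSION_BITS = 15                        # width quoted in this message
VERSION_MASK = (1 << VERSION_BITS) - 1


@dataclass(frozen=True)
class MapUpdateNotification:
    """Hypothetical stand-in for the Map Update Notification Message."""
    eid_prefix: str


def etr_check(eid_prefix: str, packet_version: int,
              authoritative_version: int) -> Optional[MapUpdateNotification]:
    """ETR side: notify the sender if its version differs from ours."""
    if (packet_version & VERSION_MASK) == (authoritative_version & VERSION_MASK):
        return None                      # ITR is up to date, nothing to send
    return MapUpdateNotification(eid_prefix)


class ItrMapCache:
    """Toy ITR cache: EID prefix -> cached Destination Mapping Version."""

    def __init__(self, lookup: Callable[[str], int]) -> None:
        self.lookup = lookup             # the usual mapping lookup (assumed)
        self.versions: Dict[str, int] = {}

    def on_notification(self, note: MapUpdateNotification) -> None:
        # The ITR cannot tell a genuine notification from a spoofed one,
        # so it treats every one as a hint to re-fetch the mapping.
        self.versions[note.eid_prefix] = self.lookup(note.eid_prefix)
```

The notification itself carries nothing the ITR trusts: the fresh
version comes only from the lookup, which is why a spoofed message
costs a query but cannot by itself change the mapping.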

When it gets such a message, which I will assume is genuine, the ITR
requests fresh mapping for the EID specified in that message.  With
the ordinary ALT network, this involves the query packet travelling
a possibly, or typically, global-length path over the ALT network (or
longer, due to the LISP-ALT long-path problem), and then a rapid,
though still potentially global-distance, response from the ETR (or
whatever responds to the request).

So I think your proposal is in some ways at least an attempt to
improve the responsiveness of the LISP system to changes in whatever
is happening with an EID's ETRs, in ways which are more efficient
than simply shortening the caching time of the map replies.

LISP mapping caching time is specified in units of minutes.

If an ETR answered all queries with replies containing a 1 minute
caching time, then as long as ITRs respected this (which can
generally be assumed, though some operators might decide this short
time was unreasonable) and as long as ITRs requested fresh mapping a
few seconds before the cache time expired (since I think it may take
a second or two or more to get it over the global ALT network) then
the ITR would always have mapping data which was no more than 1
minute behind whatever was the latest.

However, this is generally unscalable, since it would burden the ITRs
and the ETRs with a great deal of processing and traffic.  Worse
still, it would burden the global ALT network with a great deal more
query traffic.  While the ITR and the ETR operators are directly
involved in the communications and so are getting paid according to
the usage, there must be operators of ALT routers which these query
packets traverse who have a difficult time getting paid by the ITR
and ETR operators according to the burden of query traffic these
operators place on the ALT network routers between them.

So I guess you are assuming something like the caching time being
"long" such as 120 minutes, and the Versioning system being used only
occasionally to cause ITRs to get fresher mapping than would
otherwise be the case.

If your Versioning system repeatedly involved new mapping every 10
minutes, for instance, then I think it would be simpler and more
efficient to simply set the caching time in the replies to 10 minutes
and let the standard ITR caching mechanism do its work.
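
The trade-off between caching time and query load can be put in rough
numbers.  This is only a back-of-envelope sketch: the function name
and the assumption that every ITR with live traffic refreshes exactly
once per caching interval are mine, not from either I-D.

```python
def refresh_queries_per_second(active_itrs: int, cache_minutes: float) -> float:
    """Steady-state map request rate for one EID prefix.

    Assumes each of `active_itrs` ITRs re-requests the mapping once per
    caching interval, which is the behaviour described above for ITRs
    that keep their cache fresh.
    """
    if cache_minutes <= 0:
        raise ValueError("cache_minutes must be positive")
    return active_itrs / (cache_minutes * 60.0)


# Compare the caching times mentioned in this message for a
# hypothetical prefix with 6000 active ITRs.
for minutes in (1, 10, 120):
    rate = refresh_queries_per_second(6000, minutes)
    print(f"{minutes:>3} minute cache: {rate:.3f} queries/second")
```

Under this model the load scales inversely with the caching time, so
a 1 minute cache generates 120 times the query traffic of a 120
minute cache for the same set of ITRs.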

Can you comment on my understanding that Versioning is best for
generating fast responses in all the world's ITRs to changes in
mapping which do not happen all that frequently?

(I think there are typos in the last paragraph in section 3.)

I do not clearly understand the following:

     5. Dealing with Version Numbers

OK   The main idea of using Mapping Version Numbers is that whenever
     there is a change in the mapping (e.g., adding/removing RLOCs, a
     change in the weights due to new TE policies, or a change in the
     priorities) or

??   an ISP realizes that one or more of its own RLOCs are not
     reachable anymore from a local perspective (e.g., through IGP,
     due to route flap, or policy changes) the ISP updates the
     mapping with a new mapping version number.

I understand the need for a new version number if there is a change
in the existence or addresses of ETRs, or in the priority of the ETRs
(to control the ITRs' choice of ETR when two or more are reachable)
or in the weighting (to tell the ITR how to spread traffic between
multiple ETRs).

I don't understand the second section, where an ISP does something
because one of its ETRs has become unusable for traffic - for
instance it may be reachable by one or all ITRs, but its link to the
destination network is broken, or questionable.

Assuming the whole purpose of this is to support multihoming, and
assuming the one destination network has an ETR-A in ISP-A and an
ETR-B in ISP-B, then I don't know how one ISP can act unilaterally to
change the mapping in some way.

Part of my difficulty is that I have never been able to understand in
practical terms how the two ISPs are supposed to securely coordinate
their two ETRs, or whatever it is which controls their ETRs, so the
two act in a unified manner regarding responding to mapping requests
- and now including version numbers.

These ISPs may not know each other, like each other or trust each
other.   There could be 10,000 ISPs, with hundreds of millions of
multihomed customers, and a very wide spread of customers choosing
ISPs.  Any one ISP might have 10,000 customers, and those customers'
other ISPs could number in the hundreds or thousands.  So how is any
one ISP supposed to securely coordinate its ETRs with the ETRs of
hundreds or thousands of other ISPs?

This is a major critique of LISP in general - not of your Versioning
system.  I haven't raised it before, but I have never been able to
understand how this could scale well.

Those customers will be chopping and changing ISPs frequently and I
can't see how the ETRs are supposed to reliably and securely
communicate when they most need to - when there are outages in the
DFZ and/or inside each ISP's network.

(Ivip doesn't have any such problems: the ETRs simply decapsulate
packets and handle a PMTUD protocol with ITRs.  For modified header
forwarding, the ETRs have no such protocol or communication with the
ITRs at all.  At no time do ETRs communicate with each other, either
within or between ISPs.)

Returning to the section I couldn't understand:

??   an ISP realizes that one or more of its own RLOCs are not
     reachable anymore from a local perspective (e.g., through IGP,
     due to route flap, or policy changes) the ISP updates the
     mapping with a new mapping version number.

Does this mean the ISP would generate a different mapping,
temporarily removing the faulty ETR, so that the other ISP's ETR
would be the only one which an ITR could choose to tunnel packets to?
If so, how is this to be coordinated with the other ISP?  Surely the
other ISP's ETR would need to change its mapping and give out the new
version number too - but there is an outage going on, affecting the
ETR in some way so it could be difficult to coordinate the two.

Alternatively, do you mean that the ISP doesn't change its mapping,
but only changes the version number?  I doubt this, but then why
would you describe it separately from simply adding and/or removing
an RLOC?


The last two dot points in 5.1 contain frequent references to an ETR
dropping traffic packets tunnelled to it by an ITR, or at least
apparently by an ITR.  (An ETR cannot know the IP addresses of all ITRs,
or whether a specific IP address belongs to an ITR.  Even if it could,
attackers could send encapsulated packets with a spoofed outer source
address.)

You suggest that if packets arrive with an old (or too high, not
recently used) Destination Mapping Version number, despite the ETR
trying repeatedly to prompt the ITR to update its mapping, the
ETR could drop the packets.

Here are the scenarios I can imagine:

  1 - The packets are sent by an attacker.  It is good to drop
      them, but there is no way an ETR can be sure they came
      from an attacker - so this is not a reasonable option.

  2 - The packets are genuine traffic packets.  The ETR is the
      correct ETR for packets with this destination EID address.

      Even though the version number might be wrong, the ITR is
      nonetheless tunneling the packets to the right place, so
      why drop them?

  3 - The packets are genuine traffic packets.  The ETR is not
      the correct ETR.  It should try to prompt the ITR into
      updating its routing, but the question of dropping the
      packets does not arise, because the packets are already
      in a black hole due to being tunnelled to the wrong ETR.

      ETRs are not and cannot be in the business of trying to
      send the packets to some other ETR.

So I can't imagine what sense there is in an ETR deciding whether or
not to drop traffic packets.

Even if there was a reason for dropping them, there seems little
point in dropping packets just because they arrive with a too-high
version number.  An attacker can easily send them with the right
version number, so there is no reason to think that these packets
arise from an attacker.
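
The argument can be made concrete with a toy enumeration.  It assumes,
as argued above, that an attacker can send any version number it
likes, and that a genuine ITR matches when its cache is fresh and
mismatches when it is stale.  The names are hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Packet:
    from_attacker: bool    # ground truth, never visible to the ETR
    version_matches: bool  # the only thing the ETR can check in the header


def ambiguous_observations() -> set:
    """Header observations that both an attacker and a genuine ITR can cause."""
    # Enumerate every combination of sender type and version outcome.
    packets = [Packet(a, v) for a, v in product([True, False], repeat=2)]
    attacker = {p.version_matches for p in packets if p.from_attacker}
    genuine = {p.version_matches for p in packets if not p.from_attacker}
    # Anything in both sets cannot be used to tell the two senders apart.
    return attacker & genuine
```

In this model both possible observations (version matches, version
does not match) are ambiguous, so a drop rule keyed on the version
number alone cannot separate attack packets from genuine traffic.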

Attackers can be assumed to have access to the mapping system in some
way - it is a global, public, system and there is no way of fencing
it off entirely from someone who really wants to query it.  Also, the
mapping system arguably needs to be open to public scrutiny, if only
for debugging purposes and for helping to understand where traffic is
actually going.

Can you confirm my understanding of the Source Mapping Version number
system?  A device AAA, acting as an ITR, sending Source Mapping
Version numbers in the LISP header of encapsulated traffic packets to
a device BBB acting as an ETR is doing so in the expectation that BBB
will soon be, or is already, acting as an ITR and needs to tunnel
packets which are addressed to the EID of the original sending host
aaa which sent packets to AAA's ITR function.  Since that sending
host aaa is presumably in some end-user network using ETRs in the ISP
which runs AAA, then the AAA device presumably has an up-to-date
Destination Mapping Version number for the EID which aaa's address
matches.
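
If that reading is right, the check BBB makes on each decapsulated
packet might look like the sketch below.  The names and the three
outcomes are hypothetical, and the comparison is simple equality
rather than whatever ordering rule the draft specifies.

```python
from enum import Enum
from typing import Dict


class ReturnPathState(Enum):
    NO_MAPPING = "no cached mapping for the sender's EID"
    UP_TO_DATE = "cached version equals the Source Mapping Version"
    OUT_OF_DATE = "cached version differs, so re-query before tunnelling back"


def check_source_version(map_cache: Dict[str, int],
                         source_eid_prefix: str,
                         source_version: int) -> ReturnPathState:
    """BBB, acting as an ETR, inspects the Source Mapping Version from AAA.

    `map_cache` is BBB's ITR-side cache: EID prefix -> mapping version.
    """
    cached = map_cache.get(source_eid_prefix)
    if cached is None:
        # BBB has not yet needed to send anything back towards host aaa.
        return ReturnPathState.NO_MAPPING
    if cached == source_version:
        return ReturnPathState.UP_TO_DATE
    return ReturnPathState.OUT_OF_DATE
```

The point of the sketch is that BBB learns about stale return-path
mapping from traffic it is already receiving, without waiting for its
own packets to reach one of aaa's ETRs first.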


 - Robin
