Re: [nvo3] Reachability and Locator Path Liveness

<david.black@emc.com> Tue, 10 April 2012 14:51 UTC

From: david.black@emc.com
To: jmh@joelhalpern.com
Date: Tue, 10 Apr 2012 10:51:21 -0400
Cc: nvo3@ietf.org
Subject: Re: [nvo3] Reachability and Locator Path Liveness

Hi Joel,

> In concluding that there is no locator liveness problem, is your
> assumption that there is always only one locator that can be used?

No, at least not yet ... although it is an attractive simplification *if*
the related problems can be solved with a single locator ...

> The reason I ask is that there are two basic, related, causes that drive
> the locator liveness issue.
> First, if an end-point can be reached with two locators, it is important
> to know that one of them is no longer working, so that you switch to
> using the other.

... starting with exactly that availability problem ;-).

I would not assume a single locator until the options for availability are
explored, although there are commonly-used data center approaches that
effectively fail over a single locator (the MAC address moves at L2) -
I hope VRRP is a familiar example.
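
To make the single-locator idea a bit more concrete, here's a rough sketch
(purely illustrative - the names, addresses, and table layout below are made
up, not a proposed nvo3 mechanism): if each mapping entry points at one
locator, and that locator is a VRRP virtual IP shared by a redundant pair of
NVEs, an NVE failover needs no mapping update at all.

    # Illustrative sketch only (hypothetical names/addresses): one locator per
    # mapping entry, where the locator is a VRRP virtual IP shared by a
    # redundant NVE pair, so NVE failover is invisible to the mapping system.
    from typing import Dict, Tuple

    # (VNI, inner MAC address) -> single locator (outer IP of the serving NVE)
    mapping: Dict[Tuple[int, str], str] = {
        (5001, "52:54:00:12:34:56"): "10.1.1.10",  # VRRP VIP of NVE pair A
        (5001, "52:54:00:ab:cd:ef"): "10.1.2.10",  # VRRP VIP of NVE pair B
    }

    def locator_for(vni: int, inner_mac: str) -> str:
        """Look up the single locator for a tenant MAC; which physical NVE
        currently owns the VRRP VIP is invisible to the sender."""
        return mapping[(vni, inner_mac)]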

> Second, if an end-point can change locators, one of the usual
> techniques for handling things is to have the original locator send new
> information.  If one relies on this, then if the original locator is
> dead one needs that information as a proxy to prompt refetching the
> information.

That's typically not done for virtual machines (VMs).  The technique in common
use today for a VM location change is to send a gratuitous ARP or RARP
from the new location to trigger data plane learning, and to accept that a
few packets sent to the old location may be dropped in the interim.  The
draft charter envisions work on an explicit attach/detach protocol that
offers opportunities for improvement, but not having to retain state at
the old location is a "feature" for VMs.
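
For anyone who hasn't watched this happen, here's roughly what that
gratuitous ARP looks like on the wire (a sketch using Scapy with made-up
addresses; real hypervisors emit this themselves as part of the move):

    # Sketch only (Scapy, hypothetical addresses): the gratuitous ARP sent from
    # a VM's new location so L2 switches re-learn the VM's MAC on the new port
    # and neighbors refresh their ARP caches.
    from scapy.all import ARP, Ether, sendp

    VM_MAC = "52:54:00:12:34:56"   # hypothetical VM MAC address
    VM_IP = "10.1.1.25"            # hypothetical VM IP address

    # Gratuitous ARP: broadcast request in which sender IP == target IP.
    garp = Ether(src=VM_MAC, dst="ff:ff:ff:ff:ff:ff") / ARP(
        op=1, hwsrc=VM_MAC, psrc=VM_IP,
        hwdst="00:00:00:00:00:00", pdst=VM_IP,
    )
    sendp(garp, iface="eth0", verbose=False)   # interface name is hypothetical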

> With regard to site exit, the issues seem more complex than your initial
> description painted.  It may be possible to solve.  But it is not free
> if one is using the locator of the site exit, instead of underlying
> routing to the ultimate destination.

As an initial swipe at this, I'd expect "routing to the ultimate destination"
to select the site exit based on the inner IP header, with the encapsulation
effectively functioning as a glorified L2 link to get to it via the site exit's
locator.  If the initial nvo3 overlay deployments focus on the access layer,
then the core L3 routing on the "other side" of the overlay should get this
right, but I recognize that this paragraph is still somewhat of a high-level
handwave.  If the overlay reaches all the way to the site exits, then there's
no L3 forwarding of the end system's encapsulated L2 traffic between the end
system and the site exit (i.e., all of the L3 forwarding acts on the outer IP
header, not the inner one in the encapsulated L2 frame).  In this case, the
end system winds up selecting the site exit; there are environments in which
that may not be a good design ... or, in other words ..."Doctor, it hurts when
I do <this>." ... "Don't do <that>!" :-) :-).
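
To put the "glorified L2 link" view in code form (again a made-up sketch
with hypothetical names and addresses, not an nvo3 design): the end system
resolves its default gateway as usual, the gateway's MAC maps to the locator
of the NVE in front of the site exit, and everything else is ordinary
underlay routing on the outer header.

    # Made-up sketch of the "glorified L2 link" view above: the inner frame's
    # destination MAC (peer or default gateway) selects the outer locator; all
    # L3 forwarding in the underlay then acts on the outer IP header only.
    from ipaddress import ip_address, ip_network

    LOCAL_SUBNET = ip_network("192.0.2.0/24")   # hypothetical tenant subnet
    GATEWAY_MAC = "00:00:5e:00:01:01"           # hypothetical default-gateway MAC

    # inner MAC -> locator (outer IP address of the NVE serving that MAC)
    mac_to_locator = {
        "52:54:00:12:34:56": "10.1.1.10",       # another end system's NVE
        GATEWAY_MAC: "10.1.9.1",                # NVE in front of the site exit
    }

    def outer_destination(inner_dst_ip: str, inner_dst_mac: str) -> str:
        """Pick the outer (locator) destination for an encapsulated frame."""
        if ip_address(inner_dst_ip) in LOCAL_SUBNET:
            return mac_to_locator[inner_dst_mac]     # intra-subnet: peer's NVE
        return mac_to_locator[GATEWAY_MAC]           # off-site: via the site exit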

Thanks,
--David


> -----Original Message-----
> From: Joel M. Halpern [mailto:jmh@joelhalpern.com]
> Sent: Tuesday, April 10, 2012 10:15 AM
> To: Black, David
> Cc: dmm@1-4-5.net; nvo3@ietf.org
> Subject: Re: [nvo3] Reachability and Locator Path Liveness
> 
> In concluding that there is no locator liveness problem, is your
> assumption that there is always only one locator that can be used?
> 
> The reason I ask is that there are two basic, related, causes that drive
> the locator liveness issue.
> First, if an end-point can be reached with two locators, it is important
> to know that one of them is no longer working, so that you switch to
> using the other.
> Second, if an end-point can change locators, one of the usual
> techniques for handling things is to have the original locator send new
> information.  If one relies on this, then if the original locator is
> dead one needs that information as a proxy to prompt refetching the
> information.
> 
> If neither of those applies, then maybe there is no locator liveness
> problem for VMs.
> 
> With regard to site exit, the issues seem more complex than your initial
> description painted.  It may be possible to solve.  But it is not free
> if one is using the locator of the site exit, instead of underlying
> routing to the ultimate destination.
> 
> Yours,
> Joel
> 
> On 4/10/2012 9:41 AM, david.black@emc.com wrote:
> > Dave,
> >
> > In reverse order ...
> >
> >>> Specifically, for initial nvo3 work I'd suggest assuming
> >>> that the underlying network handles reachability of the NVE at the other side of
> >>> the overlay (other end of the tunnel) that does the decapsulation.  In terms
> >>> of that draft, within a single data center (and hence for the scope of initial
> >>> nvo3 work), I'd suggest that the underlying network be responsible for handling
> >>> the Locator Path Liveness problem.
> >>
> >> Not sure what you mean here by underlying network. In the LISP case,
> >> does the underlying network handle this problem? In any event can you
> >> be a bit more explicit in what you mean here?
> >
> > The underlying network is what connects the NVEs (encap/decap locations) in
> > the data center, and it runs an IGP (e.g., OSPF with ECMP) for its routing.
> > That IGP is going to be configured with relatively conventional IP address
> > blocks (e.g., /24s for IPv4 if that's what the data center is already using).
> >
> > Backing up ...
> >
> >>> Looking at draft-meyer-loc-id-implications-01:
> >>>
> >>>         http://tools.ietf.org/html/draft-meyer-loc-id-implications-01
> >>>
> >>> I would suggest that for initial nvo3 work, reachability between all NVEs in a
> >>> single overlay instance should be assumed, as there will be an IGP routing protocol
> >>> (e.g., OSPF with ECMP) running on the underlying data center network which will
> >>> handle link failures.
> >>
> >> That may be a reasonable assumption, but the fact that an IGP is
> >> running doesn't ease the "locator liveness" problem unless the routing
> >> system is carrying /32s (or /128s) and the corresponding /32 (/128)
> >> isn't injected into the IGP unless the decapsulator is up (that in and
> >> of itself might not be sufficient as we've learned from our
> >> experiences with anycast DNS overlays). In any event what we would be
> >> doing in this case is using the routing system to signal a live path
> >> to the decapsulator. Of course, carrying such long prefixes has its
> >> own set of problems.
> >
> > I strongly disagree with that statement.
> >
> > First, what may be an important difference is that when the NVE is not
> > in the end system (e.g., NVE in hypervisor softswitch or top-of-rack switch),
> > the locator (outer IP destination address) points at the NVE (tunnel endpoint,
> > decapsulation location), not the end system.  The end system is beyond the NVE,
> > so the NVE decaps the L2 frame and forwards based on that frame's L2 destination
> > (MAC) address (so the NVE serves multiple end systems).
> >
> > To take a specific IPv4 example, it should be fine to have a bunch of NVEs
> > in 10.1.1.0/24 and expect OSPF or the like to deal with that /24 and its kin
> > as opposed to a /32 per NVE.  This example also applies to NVEs in end systems
> > - the IGP can still work with /24s and the like and does not need /32s (in this
> > case, each /24 points to an L2 domain with potentially many end system NVEs).
> >
> >>> That suggests that a starting point for whether different tunnel encapsulation types
> >>> should be supported in a single data center could be "if they don't have an NVE
> >>> node in common, they can be made to work" and optimizations can be considered later.
> >>
> >> Agree with this latter statement.
> >
> > Thank you.
> >
> > Thanks,
> > --David
> >
> >
> >> -----Original Message-----
> >> From: David Meyer [mailto:dmm@1-4-5.net]
> >> Sent: Tuesday, April 10, 2012 7:55 AM
> >> To: Black, David
> >> Cc: nvo3@ietf.org
> >> Subject: Re: [nvo3] Requirements + some non-requirement suggestions
> >>
> >> On Mon, Apr 9, 2012 at 1:51 PM, <david.black@emc.com> wrote:
> >>>> Dave McDyson asked about three classes of connectivity:
> >>>>
> >>>> 1) a single data center
> >>>> 2) a set of data centers under control of one administrative domain
> >>>> 3) multiple sets of data centers under control of multiple
> >>>>        administrative domains
> >>>>
> >>>> Which of these do we *need* to address in NVO3?
> >>>
> >>> I agree with a number of other people that we have to start with 1), and then I'd
> >>> suggest addressing as much of 2) as "makes sense" without significantly affecting
> >>> the design and applicability of the result to data centers.  For example, a single
> >>> instance of an overlay that spans data centers "makes sense", at least to me, or
> >>> as Thomas Narten described it:
> >>>
> >>>> two DCs are under the same administrative domain, but are
> >>>> interconnected by some (existing) VPN technology, of which we don't
> >>>> care what the details are. The overlay just tunnels end-to-end and
> >>>> couldn't care less about the existence of a VPN interconnecting parts
> >>>> of the network.
> >>>
> >>> Looking at draft-meyer-loc-id-implications-01:
> >>>
> >>>         http://tools.ietf.org/html/draft-meyer-loc-id-implications-01
> >>>
> >>> I would suggest that for initial nvo3 work, reachability between all NVEs in a
> >>> single overlay instance should be assumed, as there will be an IGP routing protocol
> >>> (e.g., OSPF with ECMP) running on the underlying data center network which will
> >>> handle link failures.
> >>
> >> That may be a reasonable assumption, but the fact that an IGP is
> >> running doesn't ease the "locator liveness" problem unless the routing
> >> system is carrying /32s (or /128s) and the corresponding /32 (/128)
> >> isn't injected into the IGP unless the decapsulator is up (that in and
> >> of itself might not be sufficient as we've learned from our
> >> experiences with anycast DNS overlays). In any event what we would be
> >> doing in this case is using the routing system to signal a live path
> >> to the decapsulator. Of course, carrying such long prefixes has its
> >> own set of problems.
> >>
> >>> Specifically, for initial nvo3 work I'd suggest assuming
> >>> that the underlying network handles reachability of the NVE at the other side of
> >>> the overlay (other end of the tunnel) that does the decapsulation.  In terms
> >>> of that draft, within a single data center (and hence for the scope of initial
> >>> nvo3 work), I'd suggest that the underlying network be responsible for handling
> >>> the Locator Path Liveness problem.
> >>
> >> Not sure what you mean here by underlying network. In the LISP case,
> >> does the underlying network handle this problem? In any event can you
> >> be a bit more explicit in what you mean here?
> >>>
> >>> This suggestion also applies to the Multi-Exit problem, although on a related
> >>> note, I think it's a good idea to make sure that nvo3 doesn't turn any crucial
> >>> NVE into a single point of failure. Techniques like VRRP address this in the
> >>> absence of nvo3, so this could be mostly a matter of paying attention to ensure
> >>> that they're applicable to NVE failure.  Regardless of whether things work out
> >>> that way, I'd suggest that availability concerns be in scope.
> >>>
> >>> Turning to the topic of IGP metrics and "tromboning", I'd suggest that having
> >>> nvo3 add a full IGP routing protocol (or even an IGP metrics infrastructure for
> >>> one that has to be administered) beyond what's already running in the underlying
> >>> network is not a good idea.  It seems like a large portion of the "tromboning"
> >>> concerns could be resolved by techniques that distribute the default gateway in
> >>> a virtual network so that moving a VM (virtual machine) automatically sends
> >>> traffic to the locally-applicable instance of the default gateway for the VM's
> >>> new location, based on the same L2 address. There are multiple examples of this
> >>> sort of approach - OTV and draft-raggarwa-data-center-mobility-02 are among them.
> >>>
> >>> One more - Peter Ashwood-Smith writes:
> >>>
> >>>> Is it a requirement to support different tunnel encapsulation types in the same DC?
> >>>>
> >>>> It would seem that a very large DC could well end up with several different kinds of
> >>>> tunnel encapsulations that would need to somehow be bridged if they terminate VMs in
> >>>> the same subnet.
> >>>
> >>> I'd suggest that the latter scenario be out of scope and that crossing virtual
> >>> networks initially involve routing in preference to bridging, so that an NVE
> >>> receiving an unencapsulated packet can determine the overlay and encapsulation by
> >>> knowing which virtual network the packet belongs to.  An implication is that I'd
> >>> suggest figuring out how to optimize the following structure into a single
> >>> network node later (or at least as a cleanly separable work effort):
> >>>
> >>> ... (Overlay1)---- NVE1 ---- (VLANs) ---- NVE2 ---- (Overlay2) ...
> >>>
> >>> In the above, NVE1 and NVE2 are separate nodes, and the parenthesized terms are
> >>> the means of virtual network separation.
> >>>
> >>> That suggests that a starting point for whether different tunnel encapsulation types
> >>> should be supported in a single data center could be "if they don't have an NVE
> >>> node in common, they can be made to work" and optimizations can be considered later.
> >>
> >> Agree with this latter statement.
> >>
> >> Dave
> >