[nvo3] Reachability and Locator Path Liveness

<david.black@emc.com> Tue, 10 April 2012 13:41 UTC

From: david.black@emc.com
To: dmm@1-4-5.net
Date: Tue, 10 Apr 2012 09:41:33 -0400
Message-ID: <8D3D17ACE214DC429325B2B98F3AE712034170@MX15A.corp.emc.com>
References: <8D3D17ACE214DC429325B2B98F3AE712034117@MX15A.corp.emc.com> <CAHiKxWiVQX=H23gFL7Z4bqKAadWqNBWnYuLz=DeGD7JVZODpYQ@mail.gmail.com>
In-Reply-To: <CAHiKxWiVQX=H23gFL7Z4bqKAadWqNBWnYuLz=DeGD7JVZODpYQ@mail.gmail.com>
Cc: nvo3@ietf.org
Subject: [nvo3] Reachability and Locator Path Liveness

Dave,

In reverse order ...

> > Specifically, for initial nvo3 work I'd suggest assuming
> > that the underlying network handles reachability of the NVE at the other side of
> > the overlay (other end of the tunnel) that does the decapsulation.  In terms
> > of that draft, within a single data center (and hence for the scope of initial
> > nvo3 work), I'd suggest that the underlying network be responsible for handling
> > the Locator Path Liveness problem.
> 
> Not sure what you mean here by underlying network. In the LISP case,
> does the underlying network handle this problem? In any event can you
> be a bit more explicit in what you mean here?

The underlying network is what connects the NVEs (encap/decap locations) in
the data center, and it runs an IGP (e.g., OSPF with ECMP) for its routing. 
That IGP is going to be configured with relatively conventional IP address
blocks (e.g., /24s for IPv4 if that's what the data center is already using).
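
To make that concrete, here's a rough Python sketch of why NVE-to-NVE
reachability can be left to the underlay (the topology and switch names below
are made up for illustration, not taken from any draft): with ECMP in the IGP,
a single link failure still leaves a working path between the leaves that host
the NVEs.

from collections import deque

# Hypothetical two-leaf / two-spine underlay; the IGP (e.g., OSPF with ECMP)
# computes paths over these links.
LINKS = {
    ("leaf1", "spine1"), ("leaf1", "spine2"),
    ("leaf2", "spine1"), ("leaf2", "spine2"),
}

def neighbors(node, links):
    return {b for a, b in links if a == node} | {a for a, b in links if b == node}

def reachable(src, dst, links):
    """Unit-cost reachability check, standing in for the IGP's SPF computation."""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in neighbors(node, links):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# With both spines up, NVEs behind leaf1 can reach NVEs behind leaf2.
assert reachable("leaf1", "leaf2", LINKS)

# Fail one leaf-spine link: the other equal-cost path still provides
# reachability, so the overlay never has to notice.
assert reachable("leaf1", "leaf2", LINKS - {("leaf1", "spine1")})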

Backing up ...

> > Looking at draft-meyer-loc-id-implications-01:
> >
> >        http://tools.ietf.org/html/draft-meyer-loc-id-implications-01
> >
> > I would suggest that for initial nvo3 work, reachability between all NVEs in a
> > single overlay instance should be assumed, as there will be an IGP routing protocol
> > (e.g., OSPF with ECMP) running on the underlying data center network which will
> > handle link failures.
> 
> That may be a reasonable assumption, but the fact that an IGP is
> running doesn't ease the "locator liveness" problem unless the routing
> system is carrying /32s (or /128s) and the corresponding /32 (/128)
> isn't injected into the IGP unless the decapsulator is up (that in and
> of itself might not be sufficient as we've learned from our
> experiences with anycast DNS overlays). In any event what we would be
> doing in this case is using the routing system to signal a live path
> to the decapsulator. Of course, carrying such long prefixes has its
> own set of problems.

I strongly disagree with that statement.

First, what may be an important difference is that when the NVE is not
in the end system (e.g., NVE in a hypervisor softswitch or top-of-rack switch),
the locator (outer IP destination address) points at the NVE (tunnel endpoint,
decapsulation location), not at the end system.  The end system is beyond the NVE,
so the NVE decapsulates the L2 frame and forwards based on that frame's L2
destination (MAC) address (so one NVE serves multiple end systems).
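
In rough Python pseudocode (purely illustrative - the packet fields, VNI value,
and port names below are made up, not any particular encapsulation format), that
decapsulation step looks something like this: the outer IP destination selects
the NVE, and the inner frame's destination MAC selects which of the NVE's end
systems gets the frame.

from dataclasses import dataclass

@dataclass
class OverlayPacket:
    outer_dst_ip: str   # locator: points at the NVE, not the end system
    vni: int            # virtual network identifier
    inner_dst_mac: str  # identifies the end system behind the NVE
    payload: bytes

class NVE:
    """One decapsulating NVE serving many end systems (illustrative only)."""
    def __init__(self, locator_ip):
        self.locator_ip = locator_ip
        self.l2_table = {}  # (vni, mac) -> local port, learned/provisioned out of band

    def attach(self, vni, mac, port):
        self.l2_table[(vni, mac)] = port

    def decap_and_forward(self, pkt):
        if pkt.outer_dst_ip != self.locator_ip:
            return None                    # not our locator; drop
        port = self.l2_table.get((pkt.vni, pkt.inner_dst_mac))
        return (port, pkt.payload)         # hand the inner frame to that L2 port

nve = NVE("10.1.1.7")
nve.attach(vni=5000, mac="00:aa:bb:cc:dd:01", port="vm-port-1")
nve.attach(vni=5000, mac="00:aa:bb:cc:dd:02", port="vm-port-2")
print(nve.decap_and_forward(OverlayPacket("10.1.1.7", 5000, "00:aa:bb:cc:dd:02", b"hi")))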

To take a specific IPv4 example, it should be fine to have a set of NVEs
in 10.1.1.0/24 and expect OSPF or the like to deal with that /24 and its kin,
as opposed to a /32 per NVE.  This example also applies to NVEs in end systems
- the IGP can still work with /24s and the like and does not need /32s (in this
case, each /24 covers an L2 domain with potentially many end-system NVEs).
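
A small sketch of the addressing point, using Python's ipaddress module (the
prefixes and locators are made-up examples): every NVE locator already falls
under a /24 that the IGP carries, so no per-NVE /32 is needed for reachability.

import ipaddress

# Prefixes the data center IGP already carries (illustrative values).
igp_routes = [ipaddress.ip_network(p) for p in ("10.1.1.0/24", "10.1.2.0/24")]

# NVE locators (outer IP destination addresses) scattered across those blocks.
nve_locators = ["10.1.1.7", "10.1.1.42", "10.1.2.9"]

def covering_route(addr, routes):
    """Longest-prefix match; here the existing /24s are enough, no /32s required."""
    matches = [r for r in routes if ipaddress.ip_address(addr) in r]
    return max(matches, key=lambda r: r.prefixlen) if matches else None

for loc in nve_locators:
    print(loc, "->", covering_route(loc, igp_routes))
# Each locator is covered by an existing /24, so NVE reachability rides on
# routes the underlay advertises anyway.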

> > That suggests that a starting point for whether different tunnel encapsulation types
> > should be supported in a single data center could be "if they don't have an NVE
> > node in common, they can be made to work" and optimizations can be considered later.
> 
> Agree with this latter statement.

Thank you.

Thanks,
--David


> -----Original Message-----
> From: David Meyer [mailto:dmm@1-4-5.net]
> Sent: Tuesday, April 10, 2012 7:55 AM
> To: Black, David
> Cc: nvo3@ietf.org
> Subject: Re: [nvo3] Requirements + some non-requirement suggestions
> 
> On Mon, Apr 9, 2012 at 1:51 PM,  <david.black@emc.com> wrote:
> >> Dave McDysan asked about three classes of connectivity:
> >>
> >> 1) a single data center
> >> 2) a set of data centers under control of one administrative domain
> >> 3) multiple sets of data centers under control of multiple
> >>       administrative domains
> >>
> >> Which of these do we *need* to address in NVO3?
> >
> > I agree with a number of other people that we have to start with 1), and then I'd
> > suggest addressing as much of 2) as "makes sense" without significantly affecting
> > the design and applicability of the result to data centers.  For example, a single
> > instance of an overlay that spans data centers "makes sense", at least to me, or
> > as Thomas Narten described it:
> >
> >> two DCs are under the same administrative domain, but are
> >> interconnected by some (existing) VPN technology, of which we don't
> >> care what the details are. The overlay just tunnels end-to-end and
> >> couldn't care less about the existence of a VPN interconnecting parts
> >> of the network.
> >
> > Looking at draft-meyer-loc-id-implications-01:
> >
> >        http://tools.ietf.org/html/draft-meyer-loc-id-implications-01
> >
> > I would suggest that for initial nvo3 work, reachability between all NVEs in a
> > single overlay instance should be assumed, as there will be an IGP routing protocol
> > (e.g., OSPF with ECMP) running on the underlying data center network which will
> > handle link failures.
> 
> That may be a reasonable assumption, but the fact that an IGP is
> running doesn't ease the "locator liveness" problem unless the routing
> system is carrying /32s (or /128s) and the corresponding /32 (/128)
> isn't injected into the IGP unless the decapsulator is up (that in and
> of itself might not be sufficient as we've learned from our
> experiences with anycast DNS overlays). In any event what we would be
> doing in this case is using the routing system to signal a live path
> to the decapsulator. Of course, carrying such long prefixes has its
> own set of problems.
> 
> > Specifically, for initial nvo3 work I'd suggest assuming
> > that the underlying network handles reachability of the NVE at the other side of
> > the overlay (other end of the tunnel) that does the decapsulation.  In terms
> > of that draft, within a single data center (and hence for the scope of initial
> > nvo3 work), I'd suggest that the underlying network be responsible for handling
> > the Locator Path Liveness problem.
> 
> Not sure what you mean here by underlying network. In the LISP case,
> does the underlying network handle this problem? In any event can you
> be a bit more explicit in what you mean here?
> >
> > This suggestion also applies to the Multi-Exit problem, although on a related
> > note, I think it's a good idea to make sure that nvo3 doesn't turn any crucial
> > NVE into a single point of failure. Techniques like VRRP address this in the
> > absence of nvo3, so this could be mostly a matter of paying attention to ensure
> > that they're applicable to NVE failure.  Regardless of whether things work out
> > that way, I'd suggest that availability concerns be in scope.
> >
> > Turning to the topic of IGP metrics and "tromboning", I'd suggest that having
> > nvo3 add a full IGP routing protocol (or even an IGP metrics infrastructure for
> > one that has to be administered) beyond what's already running in the underlying
> > network is not a good idea.  It seems like a large portion of the "tromboning"
> > concerns could be resolved by techniques that distribute the default gateway in
> > a virtual network so that moving a VM (virtual machine) automatically sends
> > traffic to the locally-applicable instance of the default gateway for the VM's
> > new location, based on the same L2 address. There are multiple examples of this
> > sort of approach - OTV and draft-raggarwa-data-center-mobility-02 are among them.
> >
> > One more - Peter Ashwood-Smith writes:
> >
> >> Is it a requirement to support different tunnel encapsulation types in the same DC?
> >>
> >> It would seem that a very large DC could well end up with several different kinds of
> >> tunnel encapsulations that would need to somehow be bridged if they terminate VMs in
> >> the same subnet.
> >
> > I'd suggest that the latter scenario be out of scope and that crossing virtual
> > networks initially involve routing in preference to bridging, so that an NVE
> > receiving an unencapsulated packet can determine the overlay and encapsulation by
> > knowing which virtual network the packet belongs to.  An implication is that I'd
> > suggest figuring out how to optimize the following structure into a single
> > network node later (or at least as a cleanly separable work effort):
> >
> > ... (Overlay1)---- NVE1 ---- (VLANs) ---- NVE2 ---- (Overlay2) ...
> >
> > In the above, NVE1 and NVE2 are separate nodes, and the parenthesized terms are
> > the means of virtual network separation.
> >
> > That suggests that a starting point for whether different tunnel encapsulation types
> > should be supported in a single data center could be "if they don't have an NVE
> > node in common, they can be made to work" and optimizations can be considered later.
> 
> Agree with this latter statement.
> 
> Dave