Re: [nvo3] Reachability and Locator Path Liveness

<david.black@emc.com> Tue, 10 April 2012 22:29 UTC

From: david.black@emc.com
To: dmm@1-4-5.net
Date: Tue, 10 Apr 2012 18:29:10 -0400
Thread-Topic: [nvo3] Reachability and Locator Path Liveness
Thread-Index: Ac0XNdTsG2RuX7GfS5aqxaKmRME9EAAMtDDQ
Message-ID: <8D3D17ACE214DC429325B2B98F3AE7120534E9@MX15A.corp.emc.com>
References: <8D3D17ACE214DC429325B2B98F3AE712034117@MX15A.corp.emc.com> <CAHiKxWiVQX=H23gFL7Z4bqKAadWqNBWnYuLz=DeGD7JVZODpYQ@mail.gmail.com> <8D3D17ACE214DC429325B2B98F3AE712034170@MX15A.corp.emc.com> <CAHiKxWhTOnDKhCz1GQ4p3igWJ8rySK7zAAw=EkqonwErhiE6kQ@mail.gmail.com>
In-Reply-To: <CAHiKxWhTOnDKhCz1GQ4p3igWJ8rySK7zAAw=EkqonwErhiE6kQ@mail.gmail.com>
Accept-Language: en-US
Content-Language: en-US
Cc: nvo3@ietf.org
Subject: Re: [nvo3] Reachability and Locator Path Liveness
X-BeenThere: nvo3@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: "L2 \"Network Virtualization Over l3\" overlay discussion list \(nvo3\)" <nvo3.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/nvo3>, <mailto:nvo3-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/nvo3>
List-Post: <mailto:nvo3@ietf.org>
List-Help: <mailto:nvo3-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/nvo3>, <mailto:nvo3-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 10 Apr 2012 22:29:44 -0000

> > To take a specific IPv4 example, it should be fine to have a bunch of NVEs
> > in 10.1.1.0/24 and expect OSPF or the like to deal with that /24 and its kin
> > as opposed to a /32 per NVE.  This example also applies to NVEs in end systems
> > - the IGP can still work with /24s and the like and does not need /32s (in this
> > case, each /24 points to an L2 domain with potentially many end system NVEs).
> 
> Just because there is a covering prefix doesn't mean that any of the longer
> prefixes that are covered are reachable. It is just this loss of information
> (in this case due to aggregation) that causes the problem.

That assumption can be made in a data center for the NVEs (which don't move),
based on the IGP in the underlying network (e.g., every NVE in 10.1.1.0/24 is
reachable via the 10.1.1.0/24 route).  The initial endpoint identifiers for
nvo3 are L2 MAC addresses, which are inherently non-aggregatable, and I agree
that aggregation assumptions for IP-address endpoint identifiers are
potentially problematic.
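
To make the aggregation point concrete, here is a minimal, purely illustrative
longest-prefix-match sketch in Python (the prefix and host values are invented):
a lookup for a failed NVE still matches the covering /24, so the aggregate by
itself says nothing about whether that particular NVE is alive.

    # Illustrative only: an underlay RIB that carries just the aggregate.
    import ipaddress

    rib = [ipaddress.ip_network("10.1.1.0/24")]   # covering prefix, no per-NVE /32s
    dead_nve = ipaddress.ip_address("10.1.1.7")   # hypothetical NVE that has failed

    matches = [p for p in rib if dead_nve in p]
    best = max(matches, key=lambda p: p.prefixlen, default=None)
    print(best)   # 10.1.1.0/24 -- the lookup "succeeds" even though the NVE is down

Within a single data center the IGP keeps the /24 itself accurate, which is why
the assumption above is reasonable for NVEs that don't move.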

Thanks,
--David


> -----Original Message-----
> From: nvo3-bounces@ietf.org [mailto:nvo3-bounces@ietf.org] On Behalf Of David
> Meyer
> Sent: Tuesday, April 10, 2012 12:20 PM
> To: Black, David
> Cc: nvo3@ietf.org
> Subject: Re: [nvo3] Reachability and Locator Path Liveness
> 
> David,
> 
> 
> > Backing up ...
> >
> >> > Looking at draft-meyer-loc-id-implications-01:
> >> >
> >> >        http://tools.ietf.org/html/draft-meyer-loc-id-implications-01
> >> >
> >> > I would suggest that for initial nvo3 work, reachability between all NVEs in a
> >> > single overlay instance should be assumed, as there will be an IGP routing protocol
> >> > (e.g., OSPF with ECMP) running on the underlying data center network which will
> >> > handle link failures.
> >>
> >> That may be a reasonable assumption, but the fact that an IGP is
> >> running doesn't ease the "locator liveness" problem unless the routing
> >> system is carrying /32s (or /128s) and the corresponding /32 (/128)
> >> isn't injected into the IGP unless the decapsulator is up (that in and
> >> of itself might not be sufficient as we've learned from our
> >> experiences with anycast DNS overlays). In any event what we would be
> >> doing in this case is using the routing system to signal a live path
> >> to the decapsulator. Of course, carrying such long prefixes has its
> >> own set of problems.
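
(A rough sketch of the conditional-injection idea in the paragraph above, added
here for illustration only: the health check and the advertise/withdraw hooks
below are placeholders for what a routing daemon on the NVE would actually do,
not a real routing API, and the addresses are invented.)

    # Sketch only: tie a locator /32's presence in the IGP to decapsulator liveness.
    import socket

    HEALTH_ADDR = ("127.0.0.1", 8080)   # hypothetical health port of the local decapsulator
    NVE_LOOPBACK = "10.1.1.7/32"        # hypothetical locator the NVE would originate

    def decapsulator_alive(addr, timeout=1.0):
        """Crude liveness probe: can we open a TCP connection to the health port?"""
        try:
            with socket.create_connection(addr, timeout=timeout):
                return True
        except OSError:
            return False

    def advertise(prefix):   # placeholder for "originate the /32 into the IGP"
        print("advertise", prefix)

    def withdraw(prefix):    # placeholder for "withdraw the /32 from the IGP"
        print("withdraw", prefix)

    if decapsulator_alive(HEALTH_ADDR):
        advertise(NVE_LOOPBACK)
    else:
        withdraw(NVE_LOOPBACK)

As the quoted text notes, even this may not be sufficient (the anycast DNS
experience), and carrying a /32 per decapsulator has its own costs.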
> >
> > I strongly disagree with that statement.
> >
> > First, what may be an important difference is that when the NVE is not in
> > the end system (e.g., NVE in hypervisor softswitch or top-of-rack switch),
> > the locator (outer IP destination address) points at the NVE (tunnel endpoint,
> > decapsulation location), not the end system.  The end system is beyond the
> > NVE, so the NVE decaps the L2 frame and forwards based on that frame's L2
> > destination (MAC) address (so the NVE serves multiple end systems).
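
(Restating that paragraph as a toy model, everything here being illustrative
rather than an actual NVE implementation: the outer IP selects the NVE, and
after decapsulation the NVE forwards on the inner L2 destination MAC, so one
locator fronts many end systems.)

    from dataclasses import dataclass

    @dataclass
    class EncapFrame:
        outer_dst_ip: str    # locator: points at the NVE, not the end system
        inner_dst_mac: str   # identifies the end system behind the NVE
        payload: bytes

    class ToyNVE:
        def __init__(self, locator):
            self.locator = locator
            self.mac_table = {}              # inner MAC -> local port

        def learn(self, mac, port):
            self.mac_table[mac] = port

        def receive(self, frame):
            if frame.outer_dst_ip != self.locator:
                return None                  # not addressed to this tunnel endpoint
            # Decapsulate, then forward on the inner L2 destination address.
            return self.mac_table.get(frame.inner_dst_mac)

    nve = ToyNVE("10.1.1.7")
    nve.learn("00:00:5e:00:53:01", "vport-3")
    print(nve.receive(EncapFrame("10.1.1.7", "00:00:5e:00:53:01", b"...")))  # -> vport-3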
> 
> Well, the endpoint I was talking about is the decapsulator (e.g., a LISP ETR).
> So maybe you can get away with just injecting /32s for those (if one decided
> to do it this way).
> >
> > To take a specific IPv4 example, it should be fine to have a bunch of NVEs
> > in 10.1.1.0/24 and expect OSPF or the like to deal with that /24 and its kin
> > as opposed to a /32 per NVE.  This example also applies to NVEs in end systems
> > - the IGP can still work with /24s and the like and does not need /32s (in this
> > case, each /24 points to an L2 domain with potentially many end system NVEs).
> 
> Just because there is a covering prefix doesn't mean that any of the longer
> prefixes that are covered are reachable. It is just this loss of information
> (in this case due to aggregation) that causes the problem.
> 
> Dave
> 
> >
> >> > That suggests that a starting point for whether different tunnel encapsulation types
> >> > should be supported in a single data center could be "if they don't have an NVE
> >> > node in common, they can be made to work" and optimizations can be considered later.
> >>
> >> Agree with this latter statement.
> >
> > Thank you.
> >
> > Thanks,
> > --David
> >
> >
> >> -----Original Message-----
> >> From: David Meyer [mailto:dmm@1-4-5.net]
> >> Sent: Tuesday, April 10, 2012 7:55 AM
> >> To: Black, David
> >> Cc: nvo3@ietf.org
> >> Subject: Re: [nvo3] Requirements + some non-requirement suggestions
> >>
> >> On Mon, Apr 9, 2012 at 1:51 PM,  <david.black@emc.com> wrote:
> >> >> Dave McDysan asked about three classes of connectivity:
> >> >>
> >> >> 1) a single data center
> >> >> 2) a set of data centers under control of one administrative domain
> >> >> 3) multiple sets of data centers under control of multiple
> >> >>       administrative domains
> >> >>
> >> >> Which of these do we *need* to address in NVO3?
> >> >
> >> > I agree with a number of other people that we have to start with 1), and then I'd
> >> > suggest addressing as much of 2) as "makes sense" without significantly affecting
> >> > the design and applicability of the result to data centers.  For example, a single
> >> > instance of an overlay that spans data centers "makes sense", at least to me, or
> >> > as Thomas Narten described it:
> >> >
> >> >> two DCs are under the same administrative domain, but are
> >> >> interconnected by some (existing) VPN technology, of which we don't
> >> >> care what the details are. The overlay just tunnels end-to-end and
> >> >> couldn't care less about the existence of a VPN interconnecting parts
> >> >> of the network.
> >> >
> >> > Looking at draft-meyer-loc-id-implications-01:
> >> >
> >> >        http://tools.ietf.org/html/draft-meyer-loc-id-implications-01
> >> >
> >> > I would suggest that for initial nvo3 work, reachability between all NVEs in a
> >> > single overlay instance should be assumed, as there will be an IGP routing protocol
> >> > (e.g., OSPF with ECMP) running on the underlying data center network which will
> >> > handle link failures.
> >>
> >> That may be a reasonable assumption, but the fact that an IGP is
> >> running doesn't ease the "locator liveness" problem unless the routing
> >> system is carrying /32s (or /128s) and the corresponding /32 (/128)
> >> isn't injected into the IGP unless the decapsulator is up (that in and
> >> of itself might not be sufficient as we've learned from our
> >> experiences with anycast DNS overlays). In any event what we would be
> >> doing in this case is using the routing system to signal a live path
> >> to the decapsulator. Of course, carrying such long prefixes has its
> >> own set of problems.
> >>
> >> > Specifically, for initial nvo3 work I'd suggest assuming
> >> > that the underlying network handles reachability of the NVE at the other side of
> >> > the overlay (other end of the tunnel) that does the decapsulation.  In terms
> >> > of that draft, within a single data center (and hence for the scope of initial
> >> > nvo3 work), I'd suggest that the underlying network be responsible for handling
> >> > the Locator Path Liveness problem.
> >>
> >> Not sure what you mean here by underlying network. In the LISP case,
> >> does the underlying network handle this problem? In any event can you
> >> be a bit more explicit in what you mean here?
> >> >
> >> > This suggestion also applies to the Multi-Exit problem, although on a related
> >> > note, I think it's a good idea to make sure that nvo3 doesn't turn any crucial
> >> > NVE into a single point of failure. Techniques like VRRP address this in the
> >> > absence of nvo3, so this could be mostly a matter of paying attention to ensure
> >> > that they're applicable to NVE failure.  Regardless of whether things work out
> >> > that way, I'd suggest that availability concerns be in scope.
> >> >
> >> > Turning to the topic of IGP metrics and "tromboning", I'd suggest that having
> >> > nvo3 add a full IGP routing protocol (or even an IGP metrics infrastructure for
> >> > one that has to be administered) beyond what's already running in the underlying
> >> > network is not a good idea.  It seems like a large portion of the "tromboning"
> >> > concerns could be resolved by techniques that distribute the default gateway in
> >> > a virtual network so that moving a VM (virtual machine) automatically sends
> >> > traffic to the locally-applicable instance of the default gateway for the VM's
> >> > new location, based on the same L2 address. There are multiple examples of this
> >> > sort of approach - OTV and draft-raggarwa-data-center-mobility-02 are among them.
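
(A tiny sketch of the distributed-default-gateway idea above, added for
illustration with invented addresses: every NVE answers locally for the same
gateway IP and MAC, so a VM that moves keeps its ARP entry and its first-hop
traffic stays local instead of tromboning back to the original site.)

    GATEWAY_IP = "192.0.2.1"
    GATEWAY_MAC = "00:00:5e:00:53:ff"    # the same virtual MAC at every NVE

    class LocalGateway:
        """One instance per NVE/site; all instances present identical IP and MAC."""
        def __init__(self, site):
            self.site = site

        def answer_arp(self, target_ip):
            return GATEWAY_MAC if target_ip == GATEWAY_IP else None

    print(LocalGateway("site-A").answer_arp(GATEWAY_IP))   # resolved locally at site A
    print(LocalGateway("site-B").answer_arp(GATEWAY_IP))   # same answer after the VM moves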
> >> >
> >> > One more - Peter Ashwood-Smith writes:
> >> >
> >> >> Is it a requirement to support different tunnel encapsulation types in the same DC?
> >> >>
> >> >> It would seem that a very large DC could well end up with several different kinds of
> >> >> tunnel encapsulations that would need to somehow be bridged if they terminate VMs in
> >> >> the same subnet.
> >> >
> >> > I'd suggest that the latter scenario be out of scope and that crossing virtual
> >> > networks initially involve routing in preference to bridging, so that an NVE
> >> > receiving an unencapsulated packet can determine the overlay and encapsulation by
> >> > knowing which virtual network the packet belongs to.  An implication is that I'd
> >> > suggest figuring out how to optimize the following structure into a single
> >> > network node later (or at least as a cleanly separable work effort):
> >> >
> >> > ... (Overlay1)---- NVE1 ---- (VLANs) ---- NVE2 ---- (Overlay2) ...
> >> >
> >> > In the above, NVE1 and NVE2 are separate nodes, and the parenthesized terms are
> >> > the means of virtual network separation.
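
(Sketching the classification step implied above, with invented table contents:
the virtual network an unencapsulated frame arrives on, a VLAN in this
structure, is enough for the NVE to pick both the overlay instance and the
encapsulation to use.)

    VN_MAP = {
        # VLAN id -> (overlay instance, encapsulation type); values are illustrative
        100: ("Overlay1", "encap-type-A"),
        200: ("Overlay2", "encap-type-B"),
    }

    def classify(vlan_id):
        try:
            return VN_MAP[vlan_id]
        except KeyError:
            raise ValueError("unknown virtual network: VLAN %d" % vlan_id)

    print(classify(100))   # ('Overlay1', 'encap-type-A')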
> >> >
> >> > That suggests that a starting point for whether different tunnel encapsulation types
> >> > should be supported in a single data center could be "if they don't have an NVE
> >> > node in common, they can be made to work" and optimizations can be considered later.
> >>
> >> Agree with this latter statement.
> >>
> >> Dave
> >
> _______________________________________________
> nvo3 mailing list
> nvo3@ietf.org
> https://www.ietf.org/mailman/listinfo/nvo3