Re: [nvo3] Multi-subnet VNs [was Re: FW: New Version Notification for draft-yong-nvo3-frwk-dpreq-addition-00.txt]

Yakov Rekhter <yakov@juniper.net> Fri, 21 December 2012 16:26 UTC

Message-ID: <201212211624.qBLGOJ322463@magenta.juniper.net>
To: "NAPIERALA, MARIA H" <mn1921@att.com>
Date: Fri, 21 Dec 2012 08:24:19 -0800
From: Yakov Rekhter <yakov@juniper.net>
Cc: Thomas Narten <narten@us.ibm.com>, Kireeti Kompella <kireeti.kompella@gmail.com>, Aldrin Isaac <aldrin.isaac@gmail.com>, "nvo3@ietf.org" <nvo3@ietf.org>
Subject: Re: [nvo3] Multi-subnet VNs [was Re: FW: New Version Notification for draft-yong-nvo3-frwk-dpreq-addition-00.txt]

Maria,

> EVPN complexity lies in the interaction with bridging. For instance if one
> connects two EVPN access circuits with a physical wire (or bridges two VMs
> over a tunnel) you get a multihomed bridged site. Only one of the access
> ports can be active or otherwise loops will form.
> 
> But let's step back and look at the problem we are trying to solve. If
> majority (if not all) of traffic is IP and if majority of it is routed,
> wouldn't it be better to develop a networking solution that is optimized
> for this majority of traffic (and not the vice versa)?
> 
> The question is what problem does EVPN solve? In the context of DC, EVPN can
> only address packets bridged in the same VLAN. If most packets are routed
> then EVPN, even if all the complexity problems are addressed, doesn't achieve
> anything for the traffic that is routed. 

The claim in your last paragraph above is factually incorrect. In the
context of DC, EVPN is *not* limited to "packets bridged in the same
VLAN"; it can *also* be used to provide (optimal) routing among VMs in
*different* VLANs (different IP subnets). For more details, please read
section 6.1 of draft-raggarwa-data-center-mobility-04.txt (please note
that what is described in 6.1 applies both in the presence and in the
absence of VM mobility).

> I believe it is the wrong tradeoff to design a solution around EVPN 
> (i.e., around bridging).

You are certainly entitled to have your own beliefs...

Yakov.

> From: nvo3-bounces@ietf.org [mailto:nvo3-bounces@ietf.org] On Behalf Of Aldrin Isaac
> Sent: Wednesday, December 19, 2012 2:43 PM
> To: Kireeti Kompella
> Cc: Thomas Narten; nvo3@ietf.org
> Subject: [nvo3] Multi-subnet VNs [was Re: FW: New Version Notification for draft-yong-nvo3-frwk-dpreq-addition-00.txt]
> 
> Hi Kireeti,
> 
> In E-VPN, ARP is only flooded when the MAC-IP binding is unknown in BGP.
> Once it is known, the local PE responds locally to the ARP request.  This
> scales quite well so it's not the best reason to lean one way or other.
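
A minimal sketch of the ARP handling described in the paragraph above,
assuming a plain dict stands in for the MAC-IP bindings a PE has learned
via BGP. The names `Bindings` and `handle_arp_request` and the sample
addresses are hypothetical, chosen only for illustration; they are not
taken from any E-VPN implementation or draft.

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for the MAC-IP bindings learned via BGP:
# maps an IP address to the MAC address advertised with it.
Bindings = Dict[str, str]


def handle_arp_request(bindings: Bindings, target_ip: str) -> Tuple[str, Optional[str]]:
    """Decide what the local PE does with an ARP request for target_ip."""
    mac = bindings.get(target_ip)
    if mac is not None:
        # Binding is known: the local PE answers the request itself.
        return ("reply", mac)
    # Binding is unknown: the request still has to be flooded.
    return ("flood", None)


# Illustrative data only.
bindings = {"10.0.1.5": "00:11:22:33:44:55"}
known = handle_arp_request(bindings, "10.0.1.5")
unknown = handle_arp_request(bindings, "10.0.1.9")
```

The point of the sketch is only that flooding is confined to the
"binding unknown" branch; everything else is answered locally.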
> 
> An alternative for edge routing using EVPN is for an NVE to localize the
> VNs to which edge routing is desired and stand up a local IP forwarder
> across these VN using the IP info in the EVPN routes.  If the DMAC on a
> packet is not present in the EVI and if the payload is IP then pass to the
> IP forwarder....
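
The lookup order in the paragraph above could be sketched roughly as
follows. This is an illustration under stated assumptions, not the
behaviour of any particular NVE: `classify` and `evi_macs` are
hypothetical names, and the final fallback for non-IP frames with an
unknown DMAC is my assumption, since the quoted text trails off before
saying what happens in that case.

```python
from typing import Set

IPV4_ETHERTYPE = 0x0800
IPV6_ETHERTYPE = 0x86DD


def classify(evi_macs: Set[str], dmac: str, ethertype: int) -> str:
    """Pick a forwarding path for one frame arriving at the NVE."""
    if dmac in evi_macs:
        # DMAC is present in the EVI: ordinary L2 forwarding.
        return "bridge"
    if ethertype in (IPV4_ETHERTYPE, IPV6_ETHERTYPE):
        # DMAC not in the EVI and the payload is IP: hand the packet
        # to the local IP forwarder.
        return "ip-forwarder"
    # Assumption: anything else is treated as unknown unicast.
    return "unknown-unicast"


# Illustrative data only.
evi_macs = {"00:11:22:33:44:55"}
```

For example, `classify(evi_macs, "00:aa:bb:cc:dd:ee", IPV4_ETHERTYPE)`
takes the IP forwarder path because the DMAC is not in the EVI.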
> 
> In regards to optimizing multicast, with EVPN this can be done using VN
> dedicated to multicast distribution by using the VLAN-based MVR model.  It
> works well and used today.
> 
> Another problem that is addressed in EVPN is that segments can be
> multihomed using LAG.  With IP-only solutions, physical end station would
> need to multihome by advertising loopback IP over multiple physical IP
> interfaces.
> 
> We can have our TORs and use them too!! :)
> 
> Best regards -- aldrin
> 
> On Wednesday, December 19, 2012, Kireeti Kompella wrote:
> Hi Aldrin,
> On Tue, Dec 18, 2012 at 8:29 PM, Aldrin Isaac <aldrin.isaac@gmail.com> wrote:
> Kireeti,
> 
> I'm not clear what difference it makes whether a packet is unicast
> forwarded using MAC address or IP address within a subnet
> 
> Two important differences:
> a) you don't have to know the MAC address if you forward on IP.  I.e., you
> don't have to propagate the ARP to the destination (flood), get the reply,
> bind IP to MAC (ARP table), and maintain ARP binding (timeout, validate,
> etc.).  The first is a real problem; the rest are annoyances that become
> problems at scale.
> 
> (Note that the ARMD WG was created to address this issue, and you know
> where that ended.)
> 
> (Note further that this may be hard to do in general, but in the case of
> an orchestrated data center, you have the information about where a given
> IP lives, and you have a control plane (ORACLE) to inform all relevant
> NVEs.  And of course, an overlay to shield the infrastructure from poking
> its nose into your forwarding behavior -- i.e., the infra doesn't care
> whether you route or switch TS traffic.)
> 
> b) In the quite common case where all traffic from a TS is IP, you don't
> have to maintain two tables and two forwarding paradigms at the NVE (one
> for IPs and one for MACs).  This is common enough to warrant optimization.
> 
> A third difference is that if you have only unicast traffic, you don't
> have to maintain a multicast tree (for flooding).  For some, this is a nice
> bonus, but I know you have a multicast packet or two in your network :-)
> 
> as long as
> it gets to the intended destination along the most optimal path,
> particularly when the price to pay is non-standard behavior
> (intra-subnet ARP manglers ;}, etc).  I understand the argument about
> the sub-optimal routing from a third site, but when the primary sites
> end up aggregating prefixes for scaling reasons that argument falls
> off the table.  One way or other the piper gets paid.
> 
> One way, the piper gets paid a fair bit more than the other!
> 
> In terms of the real world issue of getting there from here --
> personally I haven't seen any vendor working towards a standards-based
> solution that will allow intra-subnet routing for subnets over
> HW/TOR-based PE, let alone intra-subnet routing for subnets that span
> across both hypervisor-based PE and TOR-based PE.  This makes me leery
> of solutions that can only take us half way there, particularly during
> the transition phase.  So if we're talking about network
> virtualization based purely on hypervisors, "route IP, bridge non-IP"
> may be realistic if you're willing to accept the caveats, but does not
> seem to be otherwise.
> 
> Good point.  Clearly, this is not a local decision: "route IP, bridge
> non-IP" means that intra-subnet routes are propagated the same way as
> inter-subnet routes, and thus every NVE, h/w or s/w, must be on the same
> page.
> 
> To make this concrete using BGP VPNs, "route IP, bridge non-IP" means all
> routes, intra- and inter-subnet, are propagated as IP VPN routes, and E-VPN
> routes contain MACs without IPs.  "Bridge intra-subnet IP and non-IP, route
> inter-subnet" means inter-subnet routes are propagated as IP VPN routes,
> and intra-VPN routes as E-VPN MAC+IP routes.
> 
> We can have a chat off-list on h/w vendors working towards this.
> Hopefully, others will weigh the above arguments, and support this.
> Deployers (like you) have a say in this too :-)
> 
> Btw, I understand how multicast may be less than efficient when
> building both inter and intra subnet trees for the same IP mcast group
> that end up overlapping links (maybe even more than twice) -- but I'd
> like to hear your take on any other *insolvable* issues with regard to
> multicast.
> 
> Isn't that enough?  :-)  I am not a multicast expert, but I can try to dig
> up IRB multicast horror stories.
> 
> Cheers,
> Kireeti.
> 
> Best regards -- aldrin
> 
> 
> 
> On Tue, Dec 18, 2012 at 6:06 PM, Kireeti Kompella
> <kireeti.kompella@gmail.com> wrote:
> > Hi Thomas,
> >
> > On Dec 18, 2012, at 09:03 , Thomas Narten <narten@us.ibm.com> wrote:
> >
> >> Kireeti Kompella <kireeti.kompella@gmail.com> writes:
> >>
> >>> The solution is simple: route if IP, bridge if not.  Yes, one could
> >>> do IRB, but why?  IRB brings in complications, especially for
> >>> multicast.  I'm sure someone suggested this already, so put me down
> >>> as supporting this view.
> >>
> >> I'm not sure I understand the difference.
> >>
> >> From an *NVE* perspective, when it receives a packet (which will have
> >> an L2 header), it can look at the Ethertype, and if its IP, it can
> >> route it. Otherwise, it can provide normal L2 service. So, in this
> >> sense, "route if IP, bridge if not" is straightforward. And more to
> >> the point, I assume that if the packet gets L2 service, the entire VN
> >> is treated as a *single* broadcast domain. All nodes can reach all
> >> other nodes. Right?
> >
> > Right.
> >
> >> Just so I understand, how is this different than IRB?  What does IRB
> >> imply that the above does not?
> >
> > IRB follows the principle of "bridge when you can, route otherwise".  So,
> > an IP packet with dest IP in the same subnet actually gets bridged; the
> > originator (e.g., the VM) is responsible for ARPing the IP address,
> > slapping the right dest MAC on the packet and sending that to the NVE
> > which simply forwards based on dest MAC address *without* decrementing
> > the TTL.
> >
> > If the dest IP is in another subnet, the packet is sent to the gateway
> > (which for IRB would be the same NVE), which this time does an IP address
> > lookup, decrements TTL and routes the packet.
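
To make the contrast between the two rules concrete, here is a rough
sketch of each decision as described in this thread. The function names
`irb` and `route_if_ip`, the parameter `sender_subnet`, and the sample
addresses are all hypothetical; a real NVE makes these decisions in its
forwarding tables, not by inspecting the sender's subnet like this.

```python
import ipaddress

IP_ETHERTYPES = {0x0800, 0x86DD}  # IPv4 and IPv6


def irb(dst_ip: str, sender_subnet: str, ethertype: int) -> str:
    """'Bridge when you can, route otherwise'."""
    if ethertype not in IP_ETHERTYPES:
        return "bridge"
    if ipaddress.ip_address(dst_ip) in ipaddress.ip_network(sender_subnet):
        # Same subnet: forwarded on dest MAC, TTL left untouched.
        return "bridge"
    # Different subnet: IP lookup at the gateway, TTL decremented.
    return "route"


def route_if_ip(ethertype: int) -> str:
    """'Route if IP, bridge if not': only the Ethertype matters."""
    return "route" if ethertype in IP_ETHERTYPES else "bridge"


# Illustrative values only.
same_subnet = irb("10.0.1.9", "10.0.1.0/24", 0x0800)
other_subnet = irb("10.0.2.9", "10.0.1.0/24", 0x0800)
```

The two rules differ only for intra-subnet IP traffic: `irb` bridges
it, while `route_if_ip` routes it regardless of the destination subnet.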
> >
> > For multicast, there are even more differences.
> >
> >> But this is different than what (I believe) Lucy is arguing for. In
> >> the case of a multi-subnet VN, you have one VN, but it contains
> >> different subnets. Each subnet is intended to be one broadcast domain
> >> (i.e., equivalent of a VLAN), so that when sending LL multicast and
> >> the like on a specific subnet, such packets are *not* delivered to all
> >> nodes in the VN, but only those that are part of subnet.
> >
> > If one were to configure multiple subnets on a VLAN, I wonder if LL
> > traffic goes to all members of the VLAN, or just those in the same subnet
> > as the sender.  I suspect the former (but don't know).
> >
> >> This is a more complex type of service to provide. And I'm not sure we
> >> need this type of service to be provided by one VN.
> >
> > Agree.
> >
> >> A (seemingly
> >> simpler) alternative would be to put each subnet in its own VN and
> >> allow inter-subnet traffic to be handed as inter-VN traffic. So long
> >> as that case is optimized (i.e., the ingress NVE can tunnel directly
> >> to the egress NVE without adding triangular routing), this would seem
> >> to be a cleaner way to implement this.
> >
> > Can be done.  However, we're on Lucy's topic; mine was "route if IP,
> > bridge otherwise"; the goal was to rationalize the need for Layer 2
> > forwarding for non-IP traffic, and inter- and intra-subnet routing.
> >
> > Kireeti.
> >
> >> Thomas
> >>
> >> _______________________________________________
> >> nvo3 mailing list
> >> nvo3@ietf.org
> >> https://www.ietf.org/mailman/listinfo/nvo3
> >
> 
> 
> 
> --
> Kireeti
> 