Re: [nvo3] Multi-subnet VNs [was Re: FW: New Version Notification for draft-yong-nvo3-frwk-dpreq-addition-00.txt]
Yakov Rekhter <yakov@juniper.net> Fri, 21 December 2012 16:26 UTC
Message-ID: <201212211624.qBLGOJ322463@magenta.juniper.net>
To: "NAPIERALA, MARIA H" <mn1921@att.com>
In-Reply-To: <1D70D757A2C9D54D83B4CBD7625FA80E010E66A5@MISOUT7MSGUSR9I.ITServices.sbc.com>
References: <2691CE0099834E4A9C5044EEC662BB9D44861C40@dfweml505-mbx> <E6E66922099CFB4391FAA7A7D3238F9F16C97A4CD0@FRMRSSXCHMBSC3.dc-m.alcatel-lucent.com> <2691CE0099834E4A9C5044EEC662BB9D44862C52@dfweml505-mbx> <1FB05356C8766F4E9C16732EDC663C0901128825@xmb-aln-x09.cisco.com> <2691CE0099834E4A9C5044EEC662BB9D44862E02@dfweml505-mbx> <E6E66922099CFB4391FAA7A7D3238F9F16C986E568@FRMRSSXCHMBSC3.dc-m.alcatel-lucent.com> <2691CE0099834E4A9C5044EEC662BB9D44863016@dfweml505-mbx> <201212141758.qBEHwFDU012866@cichlid.raleigh.ibm.com> <2691CE0099834E4A9C5044EEC662BB9D44863172@dfweml505-mbx> <201212141850.qBEIoxnk013459@cichlid.raleigh.ibm.com> <2691CE0099834E4A9C5044EEC662BB9D448631E7@dfweml505-mbx> <201212141957.qBEJvqQo014045@cichlid.raleigh.ibm.com> <2691CE0099834E4A9C5044EEC662BB9D44863238@dfweml505-mbx> <CAOA2mbx7BmxwVE9dZB6uNctAT6dZGKsAm-BiET2ceVoGo=p+iQ@mail.gmail.com> <CAOA2mbx0buy2aVgCm_X8H2_fDW8uFU_79mYQvkq6NO0xrW9PLw@mail.gmail.com> <1D70D757A2C9D54D83B4CBD7625FA80E010E4D4C@MISOUT7MSGUSR9I.ITServices.sbc.com> <1D70D757A2C9D54D83B4CBD7625FA80E010E66A5@MISOUT7MSGUSR9I.ITServices.sbc.com>
X-MH-In-Reply-To: "NAPIERALA, MARIA H" <mn1921@att.com> message dated "Thu, 20 Dec 2012 21:36:50 +0000."
Date: Fri, 21 Dec 2012 08:24:19 -0800
From: Yakov Rekhter <yakov@juniper.net>
Cc: Thomas Narten <narten@us.ibm.com>, Kireeti Kompella <kireeti.kompella@gmail.com>, Aldrin Isaac <aldrin.isaac@gmail.com>, "nvo3@ietf.org" <nvo3@ietf.org>
Subject: Re: [nvo3] Multi-subnet VNs [was Re: FW: New Version Notification for draft-yong-nvo3-frwk-dpreq-addition-00.txt]
Maria,

> EVPN complexity lies in the interaction with bridging. For instance if one
> connects two EVPN access circuits with a physical wire (or bridges two VMs
> over a tunnel) you get a multihomed bridged site. Only one of the access
> ports can be active or otherwise loops will form.
>
> But let's step back and look at the problem we are trying to solve. If
> majority (if not all) of traffic is IP and if majority of it is routed,
> wouldn't it be better to develop a networking solution that is optimized
> for this majority of traffic (and not the vice versa)?
>
> The question is what problem does EVPN solve? In the context of DC, EVPN can
> only address packets bridged in the same VLAN. If most packets are routed
> then EVPN, even if all the complexity problems are addressed, doesn't achieve
> anything for the traffic that is routed.

The claim you made in the last paragraph above is factually incorrect -
in the context of DC, EVPN can address *not* "only packets bridged in the
same VLAN", but *also* can be used to provide (optimal) routing among VMs
in *different* VLANs (different IP subnets). For more details please read
section 6.1 of draft-raggarwa-data-center-mobility-04.txt (please note that
what is described in 6.1 is applicable both in the presence and in the
absence of VM mobility).

> I believe it is the wrong tradeoff to design a solution around EVPN
> (i.e., around bridging).

You are certainly entitled to have your own beliefs...

Yakov.

> From: nvo3-bounces@ietf.org [mailto:nvo3-bounces@ietf.org] On Behalf Of Aldrin Isaac
> Sent: Wednesday, December 19, 2012 2:43 PM
> To: Kireeti Kompella
> Cc: Thomas Narten; nvo3@ietf.org
> Subject: [nvo3] Multi-subnet VNs [was Re: FW: New Version Notification for draft-yong-nvo3-frwk-dpreq-addition-00.txt]
>
> Hi Kireeti,
>
> In E-VPN, ARP is only flooded when the MAC-IP binding is unknown in BGP. Once
> it is known, the local PE responds locally to the ARP request.
> This scales quite well so it's not the best reason to lean one way or other.
>
> An alternative for edge routing using EVPN is for an NVE to localize the VNs
> to which edge routing is desired and stand up a local IP forwarder across
> these VN using the IP info in the EVPN routes. If the DMAC on a packet is
> not present in the EVI and if the payload is IP then pass to the IP
> forwarder....
>
> In regards to optimizing multicast, with EVPN this can be done using VN
> dedicated to multicast distribution by using the VLAN-based MVR model. It
> works well and used today.
>
> Another problem that is addressed in EVPN is that segments can be multihomed
> using LAG. With IP-only solutions, physical end station would need to
> multihome by advertising loopback IP over multiple physical IP interfaces.
>
> We can have our TORs and use them too!! :)
>
> Best regards -- aldrin
>
> On Wednesday, December 19, 2012, Kireeti Kompella wrote:
> Hi Aldrin,
> On Tue, Dec 18, 2012 at 8:29 PM, Aldrin Isaac <aldrin.isaac@gmail.com> wrote:
> Kireeti,
>
> I'm not clear what difference it makes whether a packet is unicast
> forwarded using MAC address or IP address within a subnet
>
> Two important differences:
> a) you don't have to know the MAC address if you forward on IP. I.e., you
> don't have to propagate the ARP to the destination (flood), get the reply,
> bind IP to MAC (ARP table), and maintain ARP binding (timeout, validate,
> etc.). The first is a real problem; the rest are annoyances that become
> problems at scale.
>
> (Note that the ARMD WG was created to address this issue, and you know where
> that ended.)
>
> (Note further that this may be hard to do in general, but in the case of an
> orchestrated data center, you have the information about where a given IP
> lives, and you have a control plane (ORACLE) to inform all relevant NVEs.
> And of course, an overlay to shield the infrastructure from poking its nose
> into your forwarding behavior -- i.e., the infra doesn't care whether you
> route or switch TS traffic.)
>
> b) In the quite common case where all traffic from a TS is IP, you don't
> have to maintain two tables and two forwarding paradigms at the NVE (one for
> IPs and one for MACs). This is common enough to warrant optimization.
>
> A third difference is that if you have only unicast traffic, you don't have
> to maintain a multicast tree (for flooding). For some, this is a nice bonus,
> but I know you have a multicast packet or two in your network :-)
>
> as long as
> it gets to the intended destination along the most optimal path,
> particularly when the price to pay is non-standard behavior
> (intra-subnet ARP manglers ;}, etc). I understand the argument about
> the sub-optimal routing from a third site, but when the primary sites
> end up aggregating prefixes for scaling reasons that argument falls
> off the table. One way or other the piper gets paid.
>
> One way, the piper gets paid a fair bit more than the other!
>
> In terms of the real world issue of getting there from here --
> personally I haven't seen any vendor working towards a standards-based
> solution that will allow intra-subnet routing for subnets over
> HW/TOR-based PE, let alone intra-subnet routing for subnets that span
> across both hypervisor-based PE and TOR-based PE. This makes me leery
> of solutions that can only take us half way there, particularly during
> the transition phase. So if we're talking about network
> virtualization based purely on hypervisors, "route IP, bridge non-IP"
> may be realistic if you're willing to accept the caveats, but does not
> seem to be otherwise.
>
> Good point.
> Clearly, this is not a local decision: "route IP, bridge non-IP" means that
> intra-subnet routes are propagated the same way as inter-subnet routes, and
> thus every NVE, h/w or s/w, must be on the same page.
>
> To make this concrete using BGP VPNs, "route IP, bridge non-IP" means all
> routes, intra- and inter-subnet, are propagated as IP VPN routes, and E-VPN
> routes contain MACs without IPs. "Bridge intra-subnet IP and non-IP, route
> inter-subnet" means inter-subnet routes are propagated as IP VPN routes, and
> intra-VPN routes as E-VPN MAC+IP routes.
>
> We can have a chat off-list on h/w vendors working towards this. Hopefully,
> others will weigh the above arguments, and support this. Deployers (like
> you) have a say in this too :-)
>
> Btw, I understand how multicast may be less than efficient when
> building both inter and intra subnet trees for the same IP mcast group
> that end up overlapping links (maybe even more than twice) -- but I'd
> like to hear your take on any other *insolvable* issues with regard to
> multicast.
>
> Isn't that enough? :-) I am not a multicast expert, but I can try to dig
> up IRB multicast horror stories.
>
> Cheers,
> Kireeti.
>
> Best regards -- aldrin
>
> On Tue, Dec 18, 2012 at 6:06 PM, Kireeti Kompella
> <kireeti.kompella@gmail.com> wrote:
> > Hi Thomas,
> >
> > On Dec 18, 2012, at 09:03 , Thomas Narten <narten@us.ibm.com> wrote:
> >
> >> Kireeti Kompella <kireeti.kompella@gmail.com> writes:
> >>
> >>> The solution is simple: route if IP, bridge if not. Yes, one could
> >>> do IRB, but why? IRB brings in complications, especially for
> >>> multicast. I'm sure someone suggested this already, so put me down
> >>> as supporting this view.
> >>
> >> I'm not sure I understand the difference.
> >>
> >> From an *NVE* perspective, when it receives a packet (which will have
> >> an L2 header), it can look at the Ethertype, and if it's IP, it can
> >> route it. Otherwise, it can provide normal L2 service. So, in this
> >> sense, "route if IP, bridge if not" is straightforward. And more to
> >> the point, I assume that if the packet gets L2 service, the entire VN
> >> is treated as a *single* broadcast domain. All nodes can reach all
> >> other nodes. Right?
> >
> > Right.
> >
> >> Just so I understand, how is this different than IRB? What does IRB
> >> imply that the above does not?
> >
> > IRB follows the principle of "bridge when you can, route otherwise". So,
> > an IP packet with dest IP in the same subnet actually gets bridged; the
> > originator (e.g., the VM) is responsible for ARPing the IP address,
> > slapping the right dest MAC on the packet and sending that to the NVE
> > which simply forwards based on dest MAC address *without* decrementing
> > the TTL.
> >
> > If the dest IP is in another subnet, the packet is sent to the gateway
> > (which for IRB would be the same NVE), which this time does an IP address
> > lookup, decrements TTL and routes the packet.
> >
> > For multicast, there are even more differences.
> >
> >> But this is different than what (I believe) Lucy is arguing for. In
> >> the case of a multi-subnet VN, you have one VN, but it contains
> >> different subnets. Each subnet is intended to be one broadcast domain
> >> (i.e., equivalent of a VLAN), so that when sending LL multicast and
> >> the like on a specific subnet, such packets are *not* delivered to all
> >> nodes in the VN, but only those that are part of subnet.
> >
> > If one were to configure multiple subnets on a VLAN, I wonder if LL
> > traffic goes to all members of the VLAN, or just those in the same subnet
> > as the sender. I suspect the former (but don't know).
> >
> >> This is a more complex type of service to provide.
> >> And I'm not sure we need this type of service to be provided by one VN.
> >
> > Agree.
> >
> >> A (seemingly simpler) alternative would be to put each subnet in its
> >> own VN and allow inter-subnet traffic to be handled as inter-VN
> >> traffic. So long as that case is optimized (i.e., the ingress NVE can
> >> tunnel directly to the egress NVE without adding triangular routing),
> >> this would seem to be a cleaner way to implement this.
> >
> > Can be done. However, we're on Lucy's topic; mine was "route if IP,
> > bridge otherwise"; the goal was to rationalize the need for Layer 2
> > forwarding for non-IP traffic, and inter- and intra-subnet routing.
> >
> > Kireeti.
> >
> >> Thomas
> >>
> >> _______________________________________________
> >> nvo3 mailing list
> >> nvo3@ietf.org
> >> https://www.ietf.org/mailman/listinfo/nvo3
>
> --
> Kireeti
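Aldrin's description of E-VPN behavior in this thread can be modeled in a few steps. ARP is flooded only while the MAC-IP binding is unknown in BGP, and once the binding is known the local PE answers the request itself. An IP packet whose DMAC is not present in the EVI is handed to a local IP forwarder built from the IP information in the E-VPN routes. The minimal Python sketch below follows that description. It is a toy model under loudly stated assumptions: the class name ToyEvpnPE, its table names, and the "bridged"/"routed"/"unknown-unicast" labels are hypothetical and do not come from any E-VPN specification or implementation. BGP itself is not modeled, and a single MAC+IP advertisement simply fills all three tables.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ToyEvpnPE:
    """Hypothetical PE whose state is filled from E-VPN MAC+IP advertisements.

    evi_macs:     EVI name -> {MAC -> egress PE}  (bridging state per EVI)
    arp_bindings: IP -> MAC                       (used to answer ARP locally)
    ip_routes:    IP -> egress PE, across the VNs localized for edge routing
    """
    evi_macs: Dict[str, Dict[str, str]] = field(default_factory=dict)
    arp_bindings: Dict[str, str] = field(default_factory=dict)
    ip_routes: Dict[str, str] = field(default_factory=dict)
    flooded_arps: List[str] = field(default_factory=list)

    def learn_mac_ip_route(self, evi: str, mac: str, ip: str, egress: str) -> None:
        # Assumption of this sketch: one MAC+IP advertisement populates all tables.
        self.evi_macs.setdefault(evi, {})[mac] = egress
        self.arp_bindings[ip] = mac
        self.ip_routes[ip] = egress

    def handle_arp_request(self, target_ip: str) -> Optional[str]:
        """Return the MAC for a local proxy reply, or None after flooding."""
        mac = self.arp_bindings.get(target_ip)
        if mac is None:
            # Binding not yet known: the ARP request has to be flooded.
            self.flooded_arps.append(target_ip)
        return mac

    def forward(self, evi: str, dst_mac: str,
                dst_ip: Optional[str]) -> Tuple[str, Optional[str]]:
        """Bridge on a known DMAC; hand IP packets with an unknown DMAC to the IP forwarder."""
        egress = self.evi_macs.get(evi, {}).get(dst_mac)
        if egress is not None:
            return "bridged", egress
        if dst_ip is not None:
            # Local IP forwarder built from the IP info carried in the E-VPN routes.
            return "routed", self.ip_routes.get(dst_ip)
        # Non-IP frame with an unknown DMAC: ordinary unknown-unicast handling.
        return "unknown-unicast", None


if __name__ == "__main__":
    pe = ToyEvpnPE()
    pe.learn_mac_ip_route("evi-red", "02:aa:00:00:00:01", "10.1.1.10", "pe-2")
    pe.learn_mac_ip_route("evi-blue", "02:bb:00:00:00:01", "10.2.2.20", "pe-3")
    print(pe.handle_arp_request("10.1.1.10"))  # answered locally, no flooding
    print(pe.handle_arp_request("10.1.1.99"))  # unknown binding, flooded
    # Packet in evi-red sent to a (hypothetical) gateway MAC, destined to evi-blue.
    print(pe.forward("evi-red", "02:00:00:00:00:fe", "10.2.2.20"))
```

The point of the sketch is Aldrin's scaling argument. Once bindings are learned, ARP requests stay local, so flooding is limited to the first lookup of each unknown IP.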
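The difference Kireeti draws between "route if IP, bridge if not" and IRB ("bridge when you can, route otherwise") is easiest to see in the TTL. The former routes every IP packet, intra-subnet ones included. IRB bridges an intra-subnet IP packet that the sender has already ARPed and addressed to the destination MAC, and routes only packets sent to the gateway MAC. The Python sketch below is a hypothetical toy NVE, not any vendor's or working group's forwarding logic. ToyNVE, the table contents, and the gateway MAC are illustrative assumptions, and only IPv4 is considered.

```python
from dataclasses import dataclass, replace
from ipaddress import IPv4Address
from typing import Dict, Optional, Tuple

ETHERTYPE_IPV4 = 0x0800  # EtherType value for IPv4


@dataclass(frozen=True)
class Frame:
    """Simplified frame as seen by the NVE (hypothetical model)."""
    dst_mac: str
    ethertype: int
    dst_ip: Optional[IPv4Address] = None
    ttl: Optional[int] = None


@dataclass
class ToyNVE:
    """Hypothetical NVE with one MAC table and one IP host-route table.

    gateway_mac: MAC the NVE owns when it acts as the IRB gateway (assumed).
    mac_table:   destination MAC -> egress NVE (bridging state).
    ip_table:    destination IP -> egress NVE (host routes, e.g. pushed by an
                 orchestrator that knows where each IP lives).
    """
    gateway_mac: str
    mac_table: Dict[str, str]
    ip_table: Dict[IPv4Address, str]

    def _route(self, frame: Frame) -> Tuple[str, Frame]:
        # Routing: IP lookup plus TTL decrement.
        egress = self.ip_table[frame.dst_ip]
        return egress, replace(frame, ttl=frame.ttl - 1)

    def _bridge(self, frame: Frame) -> Tuple[str, Frame]:
        # Bridging: MAC lookup, TTL left untouched.
        return self.mac_table[frame.dst_mac], frame

    def route_if_ip(self, frame: Frame) -> Tuple[str, Frame]:
        """'Route if IP, bridge if not': the EtherType alone decides."""
        if frame.ethertype == ETHERTYPE_IPV4:
            return self._route(frame)
        return self._bridge(frame)

    def irb(self, frame: Frame) -> Tuple[str, Frame]:
        """IRB, 'bridge when you can, route otherwise'.

        The sender has already ARPed same-subnet destinations and put their
        MAC on the frame, so only IP frames sent to the gateway MAC (i.e.
        destined to another subnet) are routed.
        """
        if frame.ethertype == ETHERTYPE_IPV4 and frame.dst_mac == self.gateway_mac:
            return self._route(frame)
        return self._bridge(frame)


if __name__ == "__main__":
    nve = ToyNVE(
        gateway_mac="02:00:00:00:00:fe",
        mac_table={"02:00:00:00:00:02": "nve-b"},
        ip_table={IPv4Address("10.1.1.2"): "nve-b", IPv4Address("10.2.2.5"): "nve-c"},
    )
    # Same-subnet IP packet, already addressed to the destination's own MAC.
    intra = Frame("02:00:00:00:00:02", ETHERTYPE_IPV4, IPv4Address("10.1.1.2"), ttl=64)
    print("route-if-IP:", nve.route_if_ip(intra)[1].ttl)  # routed, TTL decremented
    print("IRB:", nve.irb(intra)[1].ttl)                  # bridged, TTL unchanged
```

Running it reports a TTL of 63 on the route-if-IP path and 64 on the IRB path for the same intra-subnet packet. The sketch also makes the trade-off visible. route_if_ip needs a host route for every destination, which is the orchestrator-supplied state Kireeti mentions. IRB instead relies on ARP state held by the sender.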