Re: [nvo3] Multi-subnet VNs [was Re: FW: New Version Notification for draft-yong-nvo3-frwk-dpreq-addition-00.txt]

"NAPIERALA, MARIA H" <mn1921@att.com> Sat, 22 December 2012 01:42 UTC

From: "NAPIERALA, MARIA H" <mn1921@att.com>
To: Yakov Rekhter <yakov@juniper.net>
Date: Sat, 22 Dec 2012 01:41:24 +0000
Message-ID: <1D70D757A2C9D54D83B4CBD7625FA80E010E7346@MISOUT7MSGUSR9I.ITServices.sbc.com>
References: <2691CE0099834E4A9C5044EEC662BB9D44861C40@dfweml505-mbx> <E6E66922099CFB4391FAA7A7D3238F9F16C97A4CD0@FRMRSSXCHMBSC3.dc-m.alcatel-lucent.com> <2691CE0099834E4A9C5044EEC662BB9D44862C52@dfweml505-mbx> <1FB05356C8766F4E9C16732EDC663C0901128825@xmb-aln-x09.cisco.com> <2691CE0099834E4A9C5044EEC662BB9D44862E02@dfweml505-mbx> <E6E66922099CFB4391FAA7A7D3238F9F16C986E568@FRMRSSXCHMBSC3.dc-m.alcatel-lucent.com> <2691CE0099834E4A9C5044EEC662BB9D44863016@dfweml505-mbx> <201212141758.qBEHwFDU012866@cichlid.raleigh.ibm.com> <2691CE0099834E4A9C5044EEC662BB9D44863172@dfweml505-mbx> <201212141850.qBEIoxnk013459@cichlid.raleigh.ibm.com> <2691CE0099834E4A9C5044EEC662BB9D448631E7@dfweml505-mbx> <201212141957.qBEJvqQo014045@cichlid.raleigh.ibm.com> <2691CE0099834E4A9C5044EEC662BB9D44863238@dfweml505-mbx> <CAOA2mbx7BmxwVE9dZB6uNctAT6dZGKsAm-BiET2ceVoGo=p+iQ@mail.gmail.com> <CAOA2mbx0buy2aVgCm_X8H2_fDW8uFU_79mYQvkq6NO0xrW9PLw@mail.gmail.com> <1D70D757A2C9D54D83B4CBD7625FA80E010E4D4C@MISOUT7MSGUSR9I.ITServices.sbc.com> <1D70D757A2C9D54D83B4CBD7625FA80E010E66A5@MISOUT7MSGUSR9I.ITServices.sbc.com> <201212211624.qBLGOJ322463@magenta.juniper.net>
In-Reply-To: <201212211624.qBLGOJ322463@magenta.juniper.net>
Cc: Thomas Narten <narten@us.ibm.com>, Kireeti Kompella <kireeti.kompella@gmail.com>, Aldrin Isaac <aldrin.isaac@gmail.com>, "nvo3@ietf.org" <nvo3@ietf.org>
Subject: Re: [nvo3] Multi-subnet VNs [was Re: FW: New Version Notification for draft-yong-nvo3-frwk-dpreq-addition-00.txt]

Yakov,

> 
> Maria,
> 
> > EVPN complexity lies in the interaction with bridging. For instance, if one
> > connects two EVPN access circuits with a physical wire (or bridges two VMs
> > over a tunnel), you get a multihomed bridged site. Only one of the access
> > ports can be active; otherwise loops will form.
> >
> > But let's step back and look at the problem we are trying to solve. If the
> > majority (if not all) of traffic is IP, and if the majority of it is routed,
> > wouldn't it be better to develop a networking solution that is optimized for
> > this majority of traffic (and not vice versa)?
> >
> > The question is: what problem does EVPN solve? In the context of DC, EVPN can
> > only address packets bridged in the same VLAN. If most packets are routed,
> > then EVPN, even if all the complexity problems are addressed, doesn't achieve
> > anything for the traffic that is routed.
> 
> The claim you made in the last paragraph above is factually incorrect
> - in the context of DC, EVPN can address *not* "only packets bridged
> in the same VLAN", but *also* can be used to provide (optimal) routing
> among VMs in *different* VLANs (different IP subnets). For more details
> please read section 6.1 of draft-raggarwa-data-center-mobility-04.txt
> (please note that what is described in 6.1 is applicable both in the
> presence and in the absence of VM mobility).

This re-invents L3VPN using L2VPN route advertisements, and introduces the possibility of extending the reach of a broadcast domain beyond a single subnet. The question is: why introduce a new paradigm for routing?

When traffic crosses between CUGs, it is IP routed (naturally and optimally addressed by L3VPNs). Within a CUG, the traffic is handled the same way whether one implements EVPN or L3VPN (no visible difference). The only remaining question is how to address non-IP traffic (in DCs that carry or care about non-IP traffic). A solution should use EVPN only where it is necessary to handle non-IP traffic. As Kireeti described it: "route if IP, bridge otherwise".

In addition, the reference above (section 6.1) does not address one of the main requirements in a service provider's cloud computing environment, namely seamless integration with MPLS/BGP Layer 3 VPNs in the WAN and in the wireless access networks (as pointed out by Robert).

> 
> > I believe it is the wrong tradeoff to design a solution around EVPN
> > (i.e., around bridging).
> 
> You are certainly entitled to have your own beliefs...

It is not an isolated opinion or only a "personal" belief.

> 
> Yakov.
> 
> > From: nvo3-bounces@ietf.org [mailto:nvo3-bounces@ietf.org] On Behalf Of Aldrin Isaac
> > Sent: Wednesday, December 19, 2012 2:43 PM
> > To: Kireeti Kompella
> > Cc: Thomas Narten; nvo3@ietf.org
> > Subject: [nvo3] Multi-subnet VNs [was Re: FW: New Version Notification for draft-yong-nvo3-frwk-dpreq-addition-00.txt]
> >
> > Hi Kireeti,
> >
> > In E-VPN, ARP is only flooded when the MAC-IP binding is unknown in BGP.
> > Once it is known, the local PE responds locally to the ARP request. This
> > scales quite well, so it's not the best reason to lean one way or the other.
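Aldrin's point about ARP can be sketched as follows. This is a minimal illustration under stated assumptions, not the E-VPN procedure itself: `ArpProxyPE`, `learn` and `handle_arp_request` are hypothetical names, and the IP-to-MAC bindings are assumed to have already arrived as BGP MAC+IP advertisements.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ArpProxyPE:
    """Hypothetical PE that answers ARP locally from bindings learned via BGP."""
    # IP -> MAC bindings; assumed to come from E-VPN MAC+IP advertisements.
    bindings: Dict[str, str] = field(default_factory=dict)
    # Number of ARP requests that had to be flooded because no binding was known.
    flooded: int = 0

    def learn(self, ip: str, mac: str) -> None:
        """Record a binding, e.g. when a MAC+IP route is received in BGP."""
        self.bindings[ip] = mac

    def handle_arp_request(self, target_ip: str) -> Optional[str]:
        """Return the MAC to reply with locally, or None if the request is flooded."""
        mac = self.bindings.get(target_ip)
        if mac is None:
            # Binding unknown in BGP: flood the request as ordinary bridging would.
            self.flooded += 1
            return None
        # Binding known: the local PE answers and nothing is flooded.
        return mac


pe = ArpProxyPE()
print(pe.handle_arp_request("192.0.2.10"))  # None: binding unknown, request flooded
pe.learn("192.0.2.10", "00:00:5e:00:53:0a")
print(pe.handle_arp_request("192.0.2.10"))  # 00:00:5e:00:53:0a: answered locally
print(pe.flooded)                           # 1
```

Once a binding is known, repeated requests for the same address never leave the PE, which is why the flooding cost stays bounded.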
> >
> > An alternative for edge routing using EVPN is for an NVE to localize the VNs
> > to which edge routing is desired and stand up a local IP forwarder across
> > these VNs using the IP info in the EVPN routes. If the DMAC on a packet is
> > not present in the EVI and the payload is IP, then pass it to the IP
> > forwarder...
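A rough sketch of the forwarding decision Aldrin outlines, assuming a hypothetical `edge_forward` function with two exact-match tables: MACs known in the EVI, and host IPs taken from the IP info in EVPN routes. Treating a non-IP frame with an unknown DMAC as flooded, and an IP destination with no host entry as dropped, are assumptions of this sketch, not part of the proposal.

```python
from typing import Dict, Optional, Tuple

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_IPV6 = 0x86DD


def edge_forward(
    evi_macs: Dict[str, str],  # MAC -> remote NVE, from EVPN MAC routes
    ip_hosts: Dict[str, str],  # IP -> remote NVE, from the IP info in EVPN MAC+IP routes
    dmac: str,
    ethertype: int,
    dst_ip: Optional[str] = None,
) -> Tuple[str, Optional[str]]:
    """Return (action, next_hop) for one frame (hypothetical helper)."""
    if dmac in evi_macs:
        # DMAC is present in the EVI: plain bridging.
        return ("bridge", evi_macs[dmac])
    if ethertype in (ETHERTYPE_IPV4, ETHERTYPE_IPV6) and dst_ip is not None:
        # DMAC not in the EVI but the payload is IP: hand it to the local IP forwarder.
        next_hop = ip_hosts.get(dst_ip)
        # Assumption: no host entry means the packet is dropped.
        return ("route", next_hop) if next_hop else ("drop", None)
    # Assumption: a non-IP frame with an unknown DMAC is flooded, as in ordinary bridging.
    return ("flood", None)


macs = {"00:00:5e:00:53:01": "nve-a"}
hosts = {"192.0.2.20": "nve-b"}
print(edge_forward(macs, hosts, "00:00:5e:00:53:01", ETHERTYPE_IPV4, "192.0.2.20"))
print(edge_forward(macs, hosts, "00:00:5e:00:53:ff", ETHERTYPE_IPV4, "192.0.2.20"))
```

The first call is bridged to `nve-a` because its DMAC is known; the second misses in the EVI and is routed to `nve-b` on its IP destination.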
> >
> > With regard to optimizing multicast, with EVPN this can be done using a VN
> > dedicated to multicast distribution, by using the VLAN-based MVR model. It
> > works well and is used today.
> >
> > Another problem addressed in EVPN is that segments can be multihomed using
> > LAG. With IP-only solutions, a physical end station would need to multihome
> > by advertising a loopback IP over multiple physical IP interfaces.
> >
> > We can have our TORs and use them too!! :)
> >
> > Best regards -- aldrin
> >
> > On Wednesday, December 19, 2012, Kireeti Kompella wrote:
> > Hi Aldrin,
> > On Tue, Dec 18, 2012 at 8:29 PM, Aldrin Isaac <aldrin.isaac@gmail.com> wrote:
> > Kireeti,
> >
> > I'm not clear what difference it makes whether a packet is unicast
> > forwarded using MAC address or IP address within a subnet
> >
> > Two important differences:
> > a) You don't have to know the MAC address if you forward on IP. I.e., you
> > don't have to propagate the ARP to the destination (flood), get the reply,
> > bind IP to MAC (ARP table), and maintain the ARP binding (timeout, validate,
> > etc.). The first is a real problem; the rest are annoyances that become
> > problems at scale.
> >
> > (Note that the ARMD WG was created to address this issue, and you know where
> > that ended.)
> >
> > (Note further that this may be hard to do in general, but in the case of an
> > orchestrated data center, you have the information about where a given IP
> > lives, and you have a control plane (ORACLE) to inform all relevant NVEs.
> > And of course, an overlay to shield the infrastructure from poking its nose
> > into your forwarding behavior -- i.e., the infra doesn't care whether you
> > route or switch TS traffic.)
> >
> > b) In the quite common case where all traffic from a TS is IP, you don't have
> > to maintain two tables and two forwarding paradigms at the NVE (one for IPs
> > and one for MACs). This is common enough to warrant optimization.
> >
> > A third difference is that if you have only unicast traffic, you don't have
> > to maintain a multicast tree (for flooding). For some, this is a nice bonus,
> > but I know you have a multicast packet or two in your network :-)
> >
> > as long as
> > it gets to the intended destination along the most optimal path,
> > particularly when the price to pay is non-standard behavior
> > (intra-subnet ARP manglers ;}, etc).  I understand the argument about
> > the sub-optimal routing from a third site, but when the primary sites
> > end up aggregating prefixes for scaling reasons that argument falls
> > off the table.  One way or other the piper gets paid.
> >
> > One way, the piper gets paid a fair bit more than the other!
> >
> > In terms of the real-world issue of getting there from here --
> > personally I haven't seen any vendor working towards a standards-based
> > solution that will allow intra-subnet routing for subnets over
> > HW/TOR-based PE, let alone intra-subnet routing for subnets that span
> > across both hypervisor-based PE and TOR-based PE. This makes me leery
> > of solutions that can only take us halfway there, particularly during
> > the transition phase. So if we're talking about network
> > virtualization based purely on hypervisors, "route IP, bridge non-IP"
> > may be realistic if you're willing to accept the caveats, but does not
> > seem to be otherwise.
> >
> > Good point. Clearly, this is not a local decision: "route IP, bridge non-IP"
> > means that intra-subnet routes are propagated the same way as inter-subnet
> > routes, and thus every NVE, h/w or s/w, must be on the same page.
> >
> > To make this concrete using BGP VPNs, "route IP, bridge non-IP" means all
> > routes, intra- and inter-subnet, are propagated as IP VPN routes, and E-VPN
> > routes contain MACs without IPs. "Bridge intra-subnet IP and non-IP, route
> > inter-subnet" means inter-subnet routes are propagated as IP VPN routes, and
> > intra-subnet routes as E-VPN MAC+IP routes.
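Kireeti's two models can be tabulated in a few lines. The sketch below only illustrates his description: the function `route_type` and its string labels are hypothetical, and the treatment of non-IP traffic between subnets in the second model is an assumption.

```python
from typing import Optional

ROUTE_IP_BRIDGE_NON_IP = "route IP, bridge non-IP"
BRIDGE_INTRA_ROUTE_INTER = "bridge intra-subnet IP and non-IP, route inter-subnet"


def route_type(model: str, traffic_is_ip: bool, same_subnet: bool) -> Optional[str]:
    """Which BGP VPN route type carries reachability for this traffic (hypothetical)."""
    if model == ROUTE_IP_BRIDGE_NON_IP:
        # All IP reachability, intra- and inter-subnet, is in IP VPN routes;
        # E-VPN routes carry MACs without IPs and serve only non-IP traffic.
        # The subnet does not matter: the whole VN is one L2 domain for non-IP.
        return "IP VPN" if traffic_is_ip else "E-VPN MAC"
    if model == BRIDGE_INTRA_ROUTE_INTER:
        if same_subnet:
            # Intra-subnet traffic, IP or not, is bridged using E-VPN MAC+IP routes
            # (non-IP traffic uses only the MAC part).
            return "E-VPN MAC+IP"
        # Inter-subnet IP traffic is routed using IP VPN routes.
        # Assumption: non-IP traffic does not cross subnets in this model.
        return "IP VPN" if traffic_is_ip else None
    raise ValueError(f"unknown model: {model}")


for model in (ROUTE_IP_BRIDGE_NON_IP, BRIDGE_INTRA_ROUTE_INTER):
    for same in (True, False):
        print(model, "| same subnet:", same, "| IP traffic uses:", route_type(model, True, same))
```

The printout makes Kireeti's point visible: only in the first model do intra-subnet IP routes travel the same way as inter-subnet ones, which is why every NVE has to agree on the model.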
> >
> > We can have a chat off-list on h/w vendors working towards this. Hopefully,
> > others will weigh the above arguments, and support this. Deployers (like
> > you) have a say in this too :-)
> >
> > Btw, I understand how multicast may be less than efficient when
> > building both inter- and intra-subnet trees for the same IP mcast group
> > that end up overlapping links (maybe even more than twice) -- but I'd
> > like to hear your take on any other *insolvable* issues with regard to
> > multicast.
> >
> > Isn't that enough? :-) I am not a multicast expert, but I can try to dig
> > up IRB multicast horror stories.
> >
> > Cheers,
> > Kireeti.
> >
> > Best regards -- aldrin
> >
> >
> >
> > On Tue, Dec 18, 2012 at 6:06 PM, Kireeti Kompella
> > <kireeti.kompella@gmail.com> wrote:
> > > Hi Thomas,
> > >
> > > On Dec 18, 2012, at 09:03, Thomas Narten <narten@us.ibm.com> wrote:
> > >
> > >> Kireeti Kompella <kireeti.kompella@gmail.com> writes:
> > >>
> > >>> The solution is simple: route if IP, bridge if not. Yes, one could
> > >>> do IRB, but why? IRB brings in complications, especially for
> > >>> multicast. I'm sure someone suggested this already, so put me down
> > >>> as supporting this view.
> > >>
> > >> I'm not sure I understand the difference.
> > >>
> > >> From an *NVE* perspective, when it receives a packet (which will have
> > >> an L2 header), it can look at the Ethertype, and if it's IP, it can
> > >> route it. Otherwise, it can provide normal L2 service. So, in this
> > >> sense, "route if IP, bridge if not" is straightforward. And more to
> > >> the point, I assume that if the packet gets L2 service, the entire VN
> > >> is treated as a *single* broadcast domain. All nodes can reach all
> > >> other nodes. Right?
> > >
> > > Right.
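The per-packet decision Thomas describes, and Kireeti confirms, amounts to a dispatch on the Ethertype. A minimal sketch, assuming untagged Ethernet II frames (no 802.1Q tag) and a hypothetical `route_or_bridge` helper that only classifies the frame rather than forwarding it:

```python
import ipaddress
import struct
from typing import Tuple

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_IPV6 = 0x86DD
ETH_HEADER_LEN = 14  # untagged Ethernet II: dst MAC, src MAC, Ethertype


def route_or_bridge(frame: bytes) -> Tuple[str, str]:
    """Return ("route", dst_ip) for IP payloads, ("bridge", dst_mac) otherwise."""
    if len(frame) < ETH_HEADER_LEN:
        raise ValueError("frame shorter than an Ethernet header")
    dst_mac = frame[0:6].hex(":")
    (ethertype,) = struct.unpack("!H", frame[12:14])
    payload = frame[ETH_HEADER_LEN:]
    if ethertype == ETHERTYPE_IPV4 and len(payload) >= 20:
        # The IPv4 destination address occupies bytes 16-19 of the IPv4 header.
        return ("route", str(ipaddress.IPv4Address(payload[16:20])))
    if ethertype == ETHERTYPE_IPV6 and len(payload) >= 40:
        # The IPv6 destination address occupies bytes 24-39 of the IPv6 header.
        return ("route", str(ipaddress.IPv6Address(payload[24:40])))
    # Anything else (non-IP protocols, or a truncated IP header) gets plain L2 service.
    return ("bridge", dst_mac)


dst = bytes.fromhex("00005e005301")
src = bytes.fromhex("00005e005302")
ipv4_header = bytes([0x45, 0, 0, 20, 0, 0, 0, 0, 64, 17, 0, 0,
                     192, 0, 2, 1,        # source 192.0.2.1
                     198, 51, 100, 7])    # destination 198.51.100.7
ip_frame = dst + src + struct.pack("!H", ETHERTYPE_IPV4) + ipv4_header
other_frame = dst + src + struct.pack("!H", 0x88B5) + b"\x00" * 46
print(route_or_bridge(ip_frame))     # ('route', '198.51.100.7')
print(route_or_bridge(other_frame))  # ('bridge', '00:00:5e:00:53:01')
```

Note that the destination MAC is never consulted for IP frames, which is the property Kireeti relies on when he says forwarding on IP removes the need to know the MAC.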
> > >
> > >> Just so I understand, how is this different than IRB? What does IRB
> > >> imply that the above does not?
> > >
> > > IRB follows the principle of "bridge when you can, route otherwise". So,
> > > an IP packet with dest IP in the same subnet actually gets bridged; the
> > > originator (e.g., the VM) is responsible for ARPing the IP address,
> > > slapping the right dest MAC on the packet and sending that to the NVE,
> > > which simply forwards based on dest MAC address *without* decrementing
> > > the TTL.
> > >
> > > If the dest IP is in another subnet, the packet is sent to the gateway
> > > (which for IRB would be the same NVE), which this time does an IP address
> > > lookup, decrements TTL and routes the packet.
> > >
> > > For multicast, there are even more differences.
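The IRB behavior Kireeti describes splits the work between the originator and the NVE. A minimal sketch, assuming exact-match lookup tables and hypothetical helper names (`sender_dst_mac`, `nve_irb_forward`); the L2 header rewrite that a real routed hop performs, and the ICMP Time Exceeded message on TTL expiry, are not modeled:

```python
import ipaddress
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Packet:
    dst_mac: str
    dst_ip: str
    ttl: int


def sender_dst_mac(src_subnet: str, dst_ip: str,
                   arp_cache: Dict[str, str], gateway_mac: str) -> str:
    """The originator (e.g. a VM) picks the destination MAC, as under IRB."""
    if ipaddress.ip_address(dst_ip) in ipaddress.ip_network(src_subnet):
        # Same subnet: use the MAC obtained by ARPing the destination
        # (assumed already resolved into arp_cache).
        return arp_cache[dst_ip]
    # Other subnet: send to the gateway, which under IRB is the local NVE.
    return gateway_mac


def nve_irb_forward(pkt: Packet, gateway_mac: str, mac_table: Dict[str, str],
                    ip_table: Dict[str, str]) -> Tuple[str, Optional[str], Optional[Packet]]:
    """Return (action, next_hop, packet_as_forwarded)."""
    if pkt.dst_mac != gateway_mac:
        # Bridged: forwarded on the dest MAC, TTL untouched.
        return ("bridge", mac_table.get(pkt.dst_mac), pkt)
    if pkt.ttl <= 1:
        # The routed hop would expire the packet.
        return ("drop", None, None)
    # Routed: IP address lookup (exact host match here, for brevity) and TTL decrement.
    return ("route", ip_table.get(pkt.dst_ip), replace(pkt, ttl=pkt.ttl - 1))


GATEWAY_MAC = "02:00:00:00:00:01"  # hypothetical gateway (NVE) MAC
arp_cache = {"10.1.0.20": "02:00:00:00:00:20"}
mac_table = {"02:00:00:00:00:20": "nve-b"}
ip_table = {"10.2.0.30": "nve-c"}

same = Packet(sender_dst_mac("10.1.0.0/24", "10.1.0.20", arp_cache, GATEWAY_MAC), "10.1.0.20", 64)
other = Packet(sender_dst_mac("10.1.0.0/24", "10.2.0.30", arp_cache, GATEWAY_MAC), "10.2.0.30", 64)
print(nve_irb_forward(same, GATEWAY_MAC, mac_table, ip_table))   # bridged, TTL stays 64
print(nve_irb_forward(other, GATEWAY_MAC, mac_table, ip_table))  # routed, TTL becomes 63
```

The contrast with "route if IP, bridge otherwise" is that here the same-subnet case depends on the sender's ARP cache and never touches the TTL, whereas routing every IP packet would decrement it regardless of subnet.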
> > >
> > >> But this is different than what (I believe) Lucy is arguing for. In
> > >> the case of a multi-subnet VN, you have one VN, but it contains
> > >> different subnets. Each subnet is intended to be one broadcast domain
> > >> (i.e., the equivalent of a VLAN), so that when sending LL multicast and
> > >> the like on a specific subnet, such packets are *not* delivered to all
> > >> nodes in the VN, but only to those that are part of the subnet.
> > >
> > > If one were to configure multiple subnets on a VLAN, I wonder if LL
> > > traffic goes to all members of the VLAN, or just those in the same subnet
> > > as the sender. I suspect the former (but don't know).
> > >
> > >> This is a more complex type of service to provide. And I'm not sure we
> > >> need this type of service to be provided by one VN.
> > >
> > > Agree.
> > >
> > >> A (seemingly simpler) alternative would be to put each subnet in its
> > >> own VN and allow inter-subnet traffic to be handled as inter-VN
> > >> traffic. So long as that case is optimized (i.e., the ingress NVE can
> > >> tunnel directly to the egress NVE without adding triangular routing),
> > >> this would seem to be a cleaner way to implement this.
> > >
> > > Can be done. However, we're on Lucy's topic; mine was "route if IP,
> > > bridge otherwise"; the goal was to rationalize the need for Layer 2
> > > forwarding for non-IP traffic, and inter- and intra-subnet routing.
> > >
> > > Kireeti.
> > >
> > >> Thomas
> > >>
> > >> _______________________________________________
> > >> nvo3 mailing list
> > >> nvo3@ietf.org
> > >> https://www.ietf.org/mailman/listinfo/nvo3
> >
> >
> >
> > --
> > Kireeti
> >
> > --_000_1D70D757A2C9D54D83B4CBD7625FA80E010E66A5MISOUT7MSGUSR9I_
> > Content-Type: text/html; charset="us-ascii"
> > Content-Transfer-Encoding: quoted-printable
> >
> > <html xmlns:v=3D"urn:schemas-microsoft-com:vml"
> xmlns:o=3D"urn:schemas-micr=
> > osoft-com:office:office" xmlns:w=3D"urn:schemas-microsoft-
> com:office:word" =
> > xmlns:m=3D"http://schemas.microsoft.com/office/2004/12/omml"
> xmlns=3D"http:=
> > //www.w3.org/TR/REC-html40">
> > <head>
> > <meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3Dus-
> ascii"=
> > >
> > <meta name=3D"Generator" content=3D"Microsoft Word 12 (filtered
> medium)">
> > <style><!--
> > /* Font Definitions */
> > @font-face
> > 	{font-family:Calibri;
> > 	panose-1:2 15 5 2 2 2 4 3 2 4;}
> > @font-face
> > 	{font-family:Tahoma;
> > 	panose-1:2 11 6 4 3 5 4 4 2 4;}
> > @font-face
> > 	{font-family:Consolas;
> > 	panose-1:2 11 6 9 2 2 4 3 2 4;}
> > /* Style Definitions */
> > p.MsoNormal, li.MsoNormal, div.MsoNormal
> > 	{margin:0in;
> > 	margin-bottom:.0001pt;
> > 	font-size:12.0pt;
> > 	font-family:"Times New Roman","serif";}
> > a:link, span.MsoHyperlink
> > 	{mso-style-priority:99;
> > 	color:blue;
> > 	text-decoration:underline;}
> > a:visited, span.MsoHyperlinkFollowed
> > 	{mso-style-priority:99;
> > 	color:purple;
> > 	text-decoration:underline;}
> > p.MsoPlainText, li.MsoPlainText, div.MsoPlainText
> > 	{mso-style-priority:99;
> > 	mso-style-link:"Plain Text Char";
> > 	margin:0in;
> > 	margin-bottom:.0001pt;
> > 	font-size:10.5pt;
> > 	font-family:Consolas;}
> > span.EmailStyle17
> > 	{mso-style-type:personal-reply;
> > 	font-family:"Calibri","sans-serif";
> > 	color:#1F497D;}
> > span.PlainTextChar
> > 	{mso-style-name:"Plain Text Char";
> > 	mso-style-priority:99;
> > 	mso-style-link:"Plain Text";
> > 	font-family:Consolas;}
> > .MsoChpDefault
> > 	{mso-style-type:export-only;}
> > @page WordSection1
> > 	{size:8.5in 11.0in;
> > 	margin:1.0in 1.0in 1.0in 1.0in;}
> > div.WordSection1
> > 	{page:WordSection1;}
> > --></style><!--[if gte mso 9]><xml>
> > <o:shapedefaults v:ext=3D"edit" spidmax=3D"1026" />
> > </xml><![endif]--><!--[if gte mso 9]><xml>
> > <o:shapelayout v:ext=3D"edit">
> > <o:idmap v:ext=3D"edit" data=3D"1" />
> > </o:shapelayout></xml><![endif]-->
> > </head>
> > <body lang=3D"EN-US" link=3D"blue" vlink=3D"purple">
> > <div class=3D"WordSection1">
> > <p class=3D"MsoNormal"><span style=3D"font-size:11.0pt;font-
> family:Consolas=
> > ">EVPN complexity lies in the interaction with bridging. For instance
> if on=
> > e connects two EVPN access circuits with a physical wire (or bridges
> two VM=
> > s over a tunnel) you get a multihomed
> >  bridged site. Only one of the access ports can be active or
> otherwise loop=
> > s will form.<o:p></o:p></span></p>
> > <p class=3D"MsoNormal"><span style=3D"font-size:11.0pt;font-
> family:Consolas=
> > "><o:p>&nbsp;</o:p></span></p>
> > <p class=3D"MsoNormal"><span style=3D"font-size:11.0pt;font-
> family:Consolas=
> > ">But let&#8217;s step back and look at the problem we are trying to
> solve.=
> >  If majority (if not all) of traffic is IP and if majority of it is
> routed,=
> >  wouldn&#8217;t it be better to develop a networking
> >  solution that is optimized for this majority of traffic (and not the
> vice =
> > versa)?<o:p></o:p></span></p>
> > <p class=3D"MsoPlainText"><span style=3D"font-size:11.0pt">The
> question is =
> > what problem does EVPN solve? In the context of DC, EVPN can only
> address p=
> > ackets bridged in the same VLAN. If most packets are routed then
> EVPN, even=
> >  if all the complexity problems are
> >  addressed, doesn't achieve anything for the traffic that is routed.
> I beli=
> > eve it is the wrong tradeoff to design a solution around EVPN (i.e.,
> around=
> >  bridging).<o:p></o:p></span></p>
> > <p class=3D"MsoPlainText"><span style=3D"font-
> size:11.0pt"><o:p>&nbsp;</o:p=
> > ></span></p>
> > <p class=3D"MsoPlainText"><span style=3D"font-
> size:11.0pt">Maria<o:p></o:p>=
> > </span></p>
> > <p class=3D"MsoNormal"><span style=3D"font-size:11.0pt;font-
> family:&quot;Ca=
> > libri&quot;,&quot;sans-
> serif&quot;;color:#1F497D"><o:p>&nbsp;</o:p></span><=
> > /p>
> > <p class=3D"MsoNormal"><span style=3D"font-size:11.0pt;font-
> family:&quot;Ca=
> > libri&quot;,&quot;sans-
> serif&quot;;color:#1F497D"><o:p>&nbsp;</o:p></span><=
> > /p>
> > <div style=3D"border:none;border-left:solid blue 1.5pt;padding:0in
> 0in 0in =
> > 4.0pt">
> > <div>
> > <div style=3D"border:none;border-top:solid #B5C4DF
> 1.0pt;padding:3.0pt 0in =
> > 0in 0in">
> > <p class=3D"MsoNormal"><b><span style=3D"font-size:10.0pt;font-
> family:&quot=
> > ;Tahoma&quot;,&quot;sans-serif&quot;">From:</span></b><span
> style=3D"font-s=
> > ize:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;">
> nvo3-bou=
> > nces@ietf.org [mailto:nvo3-bounces@ietf.org]
> > <b>On Behalf Of </b>Aldrin Isaac<br>
> > <b>Sent:</b> Wednesday, December 19, 2012 2:43 PM<br>
> > <b>To:</b> Kireeti Kompella<br>
> > <b>Cc:</b> Thomas Narten; nvo3@ietf.org<br>
> > <b>Subject:</b> [nvo3] Multi-subnet VNs [was Re: FW: New Version
> Notificati=
> > on for draft-yong-nvo3-frwk-dpreq-addition-
> 00.txt]<o:p></o:p></span></p>
> > </div>
> > </div>
> > <p class=3D"MsoNormal"><o:p>&nbsp;</o:p></p>
> > <p class=3D"MsoNormal">Hi Kireeti,<br>
> > <br>
> > In E-VPN, ARP is only flooded when the MAC-IP binding is unknown in
> BGP. &n=
> > bsp;Once it is known, the local PE responds locally to the ARP
> request. &nb=
> > sp;&nbsp;This scales quite well so it's not the best reason to lean
> one way=
> >  or other.&nbsp;<o:p></o:p></p>
> > <div>
> > <p class=3D"MsoNormal"><o:p>&nbsp;</o:p></p>
> > </div>
> > <div>
> > <p class=3D"MsoNormal">An alternative for edge routing using
> EVPN&nbsp;is f=
> > or an NVE to localize the VNs to which edge routing is desired and
> stand up=
> >  a local IP&nbsp;forwarder across these VN using the IP info in the
> EVPN ro=
> > utes. &nbsp;If the DMAC on a packet&nbsp;is not present
> >  in the EVI and&nbsp;if the payload is IP then&nbsp;pass to the
> IP&nbsp;for=
> > warder....<o:p></o:p></p>
> > </div>
> > <div>
> > <p class=3D"MsoNormal"><o:p>&nbsp;</o:p></p>
> > </div>
> > <div>
> > <p class=3D"MsoNormal">In regards to
> optimizing&nbsp;multicast,&nbsp;with&n=
> > bsp;EVPN&nbsp;this can be done&nbsp;using&nbsp;VN&nbsp;dedicated to
> multica=
> > st distribution by&nbsp;using the&nbsp;VLAN-based&nbsp;MVR model.
> &nbsp;It =
> > works well and used today. &nbsp;<o:p></o:p></p>
> > <div>
> > <p class=3D"MsoNormal"><o:p>&nbsp;</o:p></p>
> > </div>
> > <div>
> > <p class=3D"MsoNormal">Another problem that is addressed in&nbsp;EVPN
> is th=
> > at segments&nbsp;can be multihomed using LAG. &nbsp;With IP-only
> solutions,=
> >  physical&nbsp;end station&nbsp;would need to multihome
> by&nbsp;advertising=
> >  loopback IP over multiple physical&nbsp;IP
> interfaces.&nbsp;<o:p></o:p></p=
> > >
> > </div>
> > <div>
> > <p class=3D"MsoNormal"><o:p>&nbsp;</o:p></p>
> > </div>
> > <div>
> > <p class=3D"MsoNormal">We can have our TORs and use them too!!
> :)<o:p></o:p=
> > ></p>
> > </div>
> > <div>
> > <p class=3D"MsoNormal"><o:p>&nbsp;</o:p></p>
> > </div>
> > <div>
> > <p class=3D"MsoNormal">Best regards -- aldrin<o:p></o:p></p>
> > </div>
> > <div>
> > <p class=3D"MsoNormal"><br>
> > On Wednesday, December 19, 2012, Kireeti Kompella
> wrote:<o:p></o:p></p>
> > <div>
> > <div>
> > <p class=3D"MsoNormal" style=3D"margin-bottom:12.0pt">Hi
> Aldrin,<o:p></o:p>=
> > </p>
> > <div>
> > <p class=3D"MsoNormal">On Tue, Dec 18, 2012 at 8:29 PM, Aldrin Isaac
> &lt;<a=
> >
> href=3D"mailto:aldrin.isaac@gmail.com">aldrin.isaac@gmail.com</a>&gt;
> wrot=
> > e:<o:p></o:p></p>
> > <p class=3D"MsoNormal">Kireeti,<br>
> > <br>
> > I'm not clear what difference it makes whether a packet is
> unicast<br>
> > forwarded using MAC address or IP address within a subnet
> <o:p></o:p></p>
> > <div>
> > <p class=3D"MsoNormal"><o:p>&nbsp;</o:p></p>
> > </div>
> > <div>
> > <p class=3D"MsoNormal">Two important differences:<o:p></o:p></p>
> > </div>
> > <div>
> > <p class=3D"MsoNormal">a) you don't have to know the MAC address if
> you for=
> > ward on IP. &nbsp;I.e., you don't have to propagate the ARP to the
> destinat=
> > ion (flood), get the reply, bind IP to MAC (ARP table), and maintain
> ARP bi=
> > nding (timeout, validate, etc.). &nbsp;The first
> >  is a real problem; the rest are annoyances that become problems at
> scale.<=
> > o:p></o:p></p>
> > </div>
> > </div>
> > </div>
> > </div>
> > <blockquote style=3D"border:none;border-left:solid #CCCCCC
> 1.0pt;padding:0i=
> > n 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in">
> > <div>
> > <div>
> > <div>
> > <div>
> > <p class=3D"MsoNormal"><o:p>&nbsp;</o:p></p>
> > </div>
> > <div>
> > <p class=3D"MsoNormal">(Note that the ARMD WG was created to address
> this i=
> > ssue, and you know where that ended.)<o:p></o:p></p>
> > </div>
> > <div>
> > <p class=3D"MsoNormal"><o:p>&nbsp;</o:p></p>
> > </div>
> > <div>
> > <p class=3D"MsoNormal">(Note further that this may be hard to do in
> general=
> > , but in the case of an orchestrated data center, you have the
> information =
> > about where a given IP lives, and you have a control plane (ORACLE)
> to info=
> > rm all relevant NVEs. &nbsp;And of course,
> >  an overlay to shield the infrastructure from poking its nose into
> your for=
> > warding behavior -- i.e., the infra doesn't care whether you route or
> switc=
> > h TS traffic.)<o:p></o:p></p>
> > </div>
> > <div>
> > <p class=3D"MsoNormal"><o:p>&nbsp;</o:p></p>
> > </div>
> > <div>
> > <p class=3D"MsoNormal">b) In the quite common case where all traffic
> from a=
> >  TS is IP, you don't have to maintain two tables and two forwarding
> paradig=
> > ms at the NVE (one for IPs and one for MACs). &nbsp;This is common
> enough t=
> > o warrant optimization.<o:p></o:p></p>
> > </div>
> > <div>
> > <p class=3D"MsoNormal"><o:p>&nbsp;</o:p></p>
> > </div>
> > <div>
> > <p class=3D"MsoNormal">A third difference is that&nbsp;if you have
> only uni=
> > cast traffic,&nbsp;you don't have to maintain a multicast tree (for
> floodin=
> > g). &nbsp;For some, this is a nice bonus, but I know you have a
> multicast p=
> > acket or two in your network :-)<o:p></o:p></p>
> > </div>
> > <div>
> > <p class=3D"MsoNormal"><o:p>&nbsp;</o:p></p>
> > </div>
> > <blockquote style=3D"border:none;border-left:solid #CCCCCC
> 1.0pt;padding:0i=
> > n 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in">
> > <p class=3D"MsoNormal">as long as<br>
> > it gets to the intended destination along the most optimal path,<br>
> > particularly when the price to pay is non-standard behavior<br>
> > (intra-subnet ARP manglers ;}, etc). &nbsp;I understand the argument
> about<=
> > br>
> > the sub-optimal routing from a third site, but when the primary
> sites<br>
> > end up aggregating prefixes for scaling reasons that argument
> falls<br>
> > off the table. &nbsp;One way or other the piper gets
> paid.<o:p></o:p></p>
> > </blockquote>
> > <div>
> > <p class=3D"MsoNormal"><o:p>&nbsp;</o:p></p>
> > </div>
> > <div>
> > <p class=3D"MsoNormal">One way, the piper gets paid a fair bit more
> than th=
> > e other!<o:p></o:p></p>
> > </div>
> > <div>
> > <p class=3D"MsoNormal"><o:p>&nbsp;</o:p></p>
> > </div>
> > <blockquote style=3D"border:none;border-left:solid #CCCCCC
> 1.0pt;padding:0i=
> > n 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in">
> > <p class=3D"MsoNormal">In terms of the real world issue of getting
> there fr=
> > om here --<br>
> > personally I haven't seen any vendor working towards a standards-
> based<br>
> > solution that will allow intra-subnet routing for subnets over<br>
> > HW/TOR-based PE, let alone intra-subnet routing for subnets that
> span<br>
> > across both hypervisor-based PE and TOR-based PE. &nbsp;This makes me
> leery=
> > <br>
> > of solutions that can only take us half way there, particularly
> during<br>
> > the transition phase. &nbsp;So if we're talking about network<br>
> > virtualization based purely on hypervisors, &quot;route IP, bridge
> non-IP&q=
> > uot;<br>
> > may be realistic if you're willing to accept the caveats, but does
> not<br>
> > seem to be otherwise.<o:p></o:p></p>
> > </blockquote>
> > <div>
> > <p class=3D"MsoNormal"><o:p>&nbsp;</o:p></p>
> > </div>
> > <div>
> > <p class=3D"MsoNormal">Good point. &nbsp;Clearly, this is not a local
> decis=
> > ion: &quot;route IP, bridge non-IP&quot; means that intra-subnet
> routes are=
> >  propagated the same way as inter-subnet routes, and thus every NVE,
> h/w or=
> >  s/w, must be on the same page.<o:p></o:p></p>
> > </div>
> > <div>
> > <p class=3D"MsoNormal"><o:p>&nbsp;</o:p></p>
> > </div>
> > <div>
> > <p class=3D"MsoNormal">To make this concrete using BGP VPNs,
> &quot;route IP=
> > , bridge non-IP&quot; means all routes, intra- and inter-subnet, are
> propag=
> > ated as IP VPN routes, and E-VPN routes contain MACs without IPs.
> &nbsp;&qu=
> > ot;Bridge intra-subnet IP and non-IP, route inter-subnet&quot;
> >  means inter-subnet routes are propagated as IP VPN routes, and
> intra-VPN r=
> > outes as E-VPN MAC&#43;IP routes.<o:p></o:p></p>
> > </div>
> > <div>
> > <p class=3D"MsoNormal"><o:p>&nbsp;</o:p></p>
> > </div>
> > <div>
> > <p class=3D"MsoNormal">We can have a chat off-list on h/w vendors working towards this. &nbsp;Hopefully, others will weigh the above arguments, and support this. &nbsp;Deployers (like you) have a say in this too :-)<o:p></o:p></p>
> > </div>
> > <div>
> > <p class=3D"MsoNormal"><o:p>&nbsp;</o:p></p>
> > </div>
> > <blockquote style=3D"border:none;border-left:solid #CCCCCC 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in">
> > <p class=3D"MsoNormal">Btw, I understand how multicast may be less than efficient when<br>
> > building both inter- and intra-subnet trees for the same IP mcast group<br>
> > that end up overlapping links (maybe even more than twice) -- but I'd<br>
> > like to hear your take on any other *insolvable* issues with regard to<br>
> > multicast.<o:p></o:p></p>
> > </blockquote>
> > <div>
> > <p class=3D"MsoNormal"><o:p>&nbsp;</o:p></p>
> > </div>
> > <div>
> > <p class=3D"MsoNormal">Isn't that enough? &nbsp;:-) &nbsp;I am not a multicast expert, but I can try to dig up IRB multicast horror stories.<o:p></o:p></p>
> > </div>
> > <div>
> > <p class=3D"MsoNormal"><o:p>&nbsp;</o:p></p>
> > </div>
> > <div>
> > <p class=3D"MsoNormal">Cheers,<o:p></o:p></p>
> > </div>
> > <div>
> > <p class=3D"MsoNormal">Kireeti.<o:p></o:p></p>
> > </div>
> > <div>
> > <p class=3D"MsoNormal"><o:p>&nbsp;</o:p></p>
> > </div>
> > <blockquote style=3D"border:none;border-left:solid #CCCCCC 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in">
> > <p class=3D"MsoNormal">Best regards -- aldrin<o:p></o:p></p>
> > <div>
> > <p class=3D"MsoNormal"><br>
> > <br>
> > <br>
> > On Tue, Dec 18, 2012 at 6:06 PM, Kireeti Kompella<br>
> > &lt;<a href=3D"mailto:kireeti.kompella@gmail.com">kireeti.kompella@gmail.com</a>&gt; wrote:<o:p></o:p></p>
> > </div>
> > <div>
> > <div>
> > <p class=3D"MsoNormal">&gt; Hi Thomas,<br>
> > &gt;<br>
> > &gt; On Dec 18, 2012, at 09:03 , Thomas Narten &lt;<a href=3D"mailto:narten@us.ibm.com">narten@us.ibm.com</a>&gt; wrote:<br>
> > &gt;<br>
> > &gt;&gt; Kireeti Kompella &lt;<a href=3D"mailto:kireeti.kompella@gmail.com">kireeti.kompella@gmail.com</a>&gt; writes:<br>
> > &gt;&gt;<br>
> > &gt;&gt;&gt; The solution is simple: route if IP, bridge if not. &nbsp;Yes, one could<br>
> > &gt;&gt;&gt; do IRB, but why? &nbsp;IRB brings in complications, especially for<br>
> > &gt;&gt;&gt; multicast. &nbsp;I'm sure someone suggested this already, so put me down<br>
> > &gt;&gt;&gt; as supporting this view.<br>
> > &gt;&gt;<br>
> > &gt;&gt; I'm not sure I understand the difference.<br>
> > &gt;&gt;<br>
> > &gt;&gt; From an *NVE* perspective, when it receives a packet (which will have<br>
> > &gt;&gt; an L2 header), it can look at the Ethertype, and if it's IP, it can<br>
> > &gt;&gt; route it. Otherwise, it can provide normal L2 service. So, in this<br>
> > &gt;&gt; sense, &quot;route if IP, bridge if not&quot; is straightforward. And more to<br>
> > &gt;&gt; the point, I assume that if the packet gets L2 service, the entire VN<br>
> > &gt;&gt; is treated as a *single* broadcast domain. All nodes can reach all<br>
> > &gt;&gt; other nodes. Right?<br>
> > &gt;<br>
> > &gt; Right.<br>
> > &gt;<br>
> > &gt;&gt; Just so I understand, how is this different than IRB? &nbsp;What does IRB<br>
> > &gt;&gt; imply that the above does not?<br>
> > &gt;<br>
> > &gt; IRB follows the principle of &quot;bridge when you can, route otherwise&quot;. &nbsp;So, an IP packet with dest IP in the same subnet actually gets bridged; the originator (e.g., the VM) is responsible for ARPing the IP address, slapping the right dest MAC on the packet and sending that to the NVE, which simply forwards based on dest MAC address *without* decrementing the TTL.<br>
> > &gt;<br>
> > &gt; If the dest IP is in another subnet, the packet is sent to the gateway (which for IRB would be the same NVE), which this time does an IP address lookup, decrements TTL and routes the packet.<br>
> > &gt;<br>
> > &gt; For multicast, there are even more differences.<br>
> > &gt;<br>
> > &gt;&gt; But this is different than what (I believe) Lucy is arguing for. In<br>
> > &gt;&gt; the case of a multi-subnet VN, you have one VN, but it contains<br>
> > &gt;&gt; different subnets. Each subnet is intended to be one broadcast domain<br>
> > &gt;&gt; (i.e., the equivalent of a VLAN), so that when sending LL multicast and<br>
> > &gt;&gt; the like on a specific subnet, such packets are *not* delivered to all<br>
> > &gt;&gt; nodes in the VN, but only to those that are part of the subnet.<br>
> > &gt;<br>
> > &gt; If one were to configure multiple subnets on a VLAN, I wonder if LL traffic goes to all members of the VLAN, or just those in the same subnet as the sender. &nbsp;I suspect the former (but don't know).<br>
> > &gt;<br>
> > &gt;&gt; This is a more complex type of service to provide. And I'm not sure we<br>
> > &gt;&gt; need this type of service to be provided by one VN.<br>
> > &gt;<br>
> > &gt; Agree.<br>
> > &gt;<br>
> > &gt;&gt; A (seemingly simpler) alternative would be to put each subnet in its own VN and<br>
> > &gt;&gt; allow inter-subnet traffic to be handled as inter-VN traffic. So long<br>
> > &gt;&gt; as that case is optimized (i.e., the ingress NVE can tunnel directly<br>
> > &gt;&gt; to the egress NVE without adding triangular routing), this would seem<br>
> > &gt;&gt; to be a cleaner way to implement this.<br>
> > &gt;<br>
> > &gt; Can be done. &nbsp;However, we're on Lucy's topic; mine was
> &quot;rout=
> > e if IP, bridge otherwise&quot;; the goal was to rationalize the need
> for L=
> > ayer 2 forwarding for non-IP traffic, and inter- and intra-subnet
> routing.<=
> > br>
> > &gt;<br>
> > &gt; Kireeti.<br>
> > &gt;<br>
> > &gt;&gt; Thomas<br>
> > &gt;&gt;<br>
> > &gt;&gt; _______________________________________________<br>
> > &gt;&gt; nvo3 mailing list<br>
> > &gt;&gt; <a href=3D"mailto:nvo3@ietf.org">nvo3@ietf.org</a><br>
> > &gt;&gt; <a href=3D"https://www.ietf.org/mailman/listinfo/nvo3" target=3D"_blank">https://www.ietf.org/mailman/listinfo/nvo3</a><br>
> > <o:p></o:p></p>
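The difference Thomas asks about in the exchange quoted above can be sketched as two per-packet decisions at an NVE. This is a minimal Python sketch under assumptions: frames are reduced to an Ethertype, a destination MAC and a TTL, the actual IP and MAC table lookups are left out, and `Frame`, `route_if_ip`, `irb` and the gateway MAC are hypothetical names.

```python
from dataclasses import dataclass, replace

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_IPV6 = 0x86DD
ETHERTYPE_ARP = 0x0806


@dataclass(frozen=True)
class Frame:
    """Simplified Ethernet frame as seen by an NVE (hypothetical model)."""
    ethertype: int
    dst_mac: str
    ttl: int = 0  # only meaningful for IP payloads


def route_if_ip(frame):
    """'Route if IP, bridge if not': decide on the Ethertype alone."""
    if frame.ethertype in (ETHERTYPE_IPV4, ETHERTYPE_IPV6):
        # IP lookup (not modelled); the TTL is decremented even when the
        # destination is in the sender's own subnet.
        return "route", replace(frame, ttl=frame.ttl - 1)
    # Non-IP: plain L2 service, the whole VN is one broadcast domain.
    return "bridge", frame


def irb(frame, gateway_mac):
    """IRB, 'bridge when you can, route otherwise'.

    The sender has already ARPed: a same-subnet destination carries the
    target's own MAC, an off-subnet one carries the gateway MAC, which
    for IRB belongs to this same NVE.
    """
    is_ip = frame.ethertype in (ETHERTYPE_IPV4, ETHERTYPE_IPV6)
    if is_ip and frame.dst_mac == gateway_mac:
        # Addressed to the gateway: IP lookup (not modelled), TTL decrement.
        return "route", replace(frame, ttl=frame.ttl - 1)
    # Otherwise forward on the destination MAC, TTL untouched.
    return "bridge", frame


GATEWAY_MAC = "00:00:5e:00:01:01"  # hypothetical gateway MAC
same_subnet = Frame(ETHERTYPE_IPV4, "00:00:5e:00:53:02", ttl=64)
off_subnet = Frame(ETHERTYPE_IPV4, GATEWAY_MAC, ttl=64)

print(route_if_ip(same_subnet))
print(irb(same_subnet, GATEWAY_MAC))
print(irb(off_subnet, GATEWAY_MAC))
print(route_if_ip(Frame(ETHERTYPE_ARP, "ff:ff:ff:ff:ff:ff")))
```

The first call returns `"route"` with the TTL decremented to 63, the second returns `"bridge"` with the TTL still 64, the third returns `"route"` with TTL 63, and the ARP frame is bridged. That is the contrast Kireeti describes: under "route if IP, bridge if not" a same-subnet IP packet is routed, while IRB bridges it on the destination MAC without touching the TTL.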
> > </div>
> > </div>
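Thomas's distinction between a VN that is a single broadcast domain and a multi-subnet VN whose subnets are separate broadcast domains can be sketched as the set of members that receive a link-local multicast or broadcast. This is a minimal Python sketch, assuming a hypothetical membership table keyed by VM name and a hypothetical `ll_receivers` helper; a real NVE would derive the same answer from its own mapping state.

```python
from ipaddress import ip_interface

# Hypothetical VN: member name -> interface address (IP plus subnet).
VN_MEMBERS = {
    "vm1": "10.1.1.5/24",
    "vm2": "10.1.1.6/24",
    "vm3": "10.1.2.5/24",
}


def ll_receivers(sender, members, per_subnet):
    """Members that receive a link-local multicast/broadcast from sender.

    per_subnet=False: the entire VN is a single broadcast domain.
    per_subnet=True: each subnet is its own broadcast domain (the
    multi-subnet VN case), so delivery stops at the sender's subnet.
    """
    src_net = ip_interface(members[sender]).network
    return sorted(
        name
        for name, addr in members.items()
        if name != sender
        and (not per_subnet or ip_interface(addr).network == src_net)
    )


print(ll_receivers("vm1", VN_MEMBERS, per_subnet=False))  # ['vm2', 'vm3']
print(ll_receivers("vm1", VN_MEMBERS, per_subnet=True))   # ['vm2']
```

With `per_subnet=False` the sketch also matches Kireeti's guess about several subnets configured on one VLAN, which he flags as unverified.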
> > </blockquote>
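Thomas's alternative, one VN per subnet with inter-subnet traffic handled as inter-VN traffic, depends on the ingress NVE tunnelling directly to the egress NVE. A minimal Python sketch of that choice, assuming hypothetical subnet-to-VN and host-location tables, a hypothetical gateway NVE, and a hypothetical `next_tunnel_endpoint` helper; how the NVE learns these mappings is not modelled.

```python
from ipaddress import ip_address, ip_network

# Hypothetical state at an ingress NVE: one VN per subnet ...
VN_OF_SUBNET = {
    ip_network("10.1.1.0/24"): "vn-a",
    ip_network("10.1.2.0/24"): "vn-b",
}
# ... and, per VN, the NVE each destination host sits behind.
HOST_LOCATION = {
    ("vn-a", ip_address("10.1.1.5")): "nve-1",
    ("vn-b", ip_address("10.1.2.5")): "nve-2",
}
GATEWAY_NVE = "nve-gw"  # hypothetical inter-VN gateway


def next_tunnel_endpoint(dst, optimized):
    """Tunnel endpoint the ingress NVE picks for inter-VN traffic to dst."""
    dst = ip_address(dst)
    dst_vn = next(vn for net, vn in VN_OF_SUBNET.items() if dst in net)
    if optimized:
        # Tunnel straight to the egress NVE: no triangular routing.
        return HOST_LOCATION[(dst_vn, dst)]
    # Unoptimized: send to a gateway, which then tunnels to the egress NVE.
    return GATEWAY_NVE


print(next_tunnel_endpoint("10.1.2.5", optimized=True))   # nve-2
print(next_tunnel_endpoint("10.1.2.5", optimized=False))  # nve-gw
```

The unoptimized branch is the triangular path the text warns about: the packet reaches the gateway first and only then `nve-2`.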
> > </div>
> > <p class=3D"MsoNormal"><br>
> > <br clear=3D"all">
> > <o:p></o:p></p>
> > <div>
> > <p class=3D"MsoNormal"><o:p>&nbsp;</o:p></p>
> > </div>
> > <p class=3D"MsoNormal">-- <br>
> > Kireeti<o:p></o:p></p>
> > </div>
> > </div>
> > </blockquote>
> > </div>
> > </div>
> > </div>
> > </div>
> > </body>
> > </html>
> >
> > --_000_1D70D757A2C9D54D83B4CBD7625FA80E010E66A5MISOUT7MSGUSR9I_--
> >
> > --===============3861516006203949397==--