Re: [armd] how does "draft-sridharan-virtualization-nvgre-00" advertise its external facing hosts' IP addresses to external world?

Narasimhan Venkataramaiah <narave@microsoft.com> Mon, 26 September 2011 02:18 UTC

From: Narasimhan Venkataramaiah <narave@microsoft.com>
To: Benson Schliesser <bschlies@cisco.com>, Linda Dunbar <linda.dunbar@huawei.com>, Vishwas Manral <vishwas.ietf@gmail.com>, Murari Sridharan <muraris@microsoft.com>
Thread-Topic: [armd] how does "draft-sridharan-virtualization-nvgre-00" advertise its external facing hosts' IP addresses to external world?
Date: Mon, 26 Sep 2011 02:21:18 +0000
Cc: "armd@ietf.org" <armd@ietf.org>
Subject: Re: [armd] how does "draft-sridharan-virtualization-nvgre-00" advertise its external facing hosts' IP addresses to external world?

Would carrying IP in the overlay inherently localize the address resolution functions (ARP, ND)?
[Simha] Not necessarily. ND can still work over an IP-based overlay. ARP can also be made to work by mapping the ARP to some IP packet that performs resolution in the overlay - somewhat like Teredo, which maps ND packets to UDP bubble packets. But I can't see a reason to do so.

Conversely, does carrying L2 frames in the overlay suggest the extension or enlargement of ARP/ND domains? Or would you propose an ARP/ND proxy in the MAC overlay?
[Simha] No. The virtual networks separate the configuration done on the host from the configuration done in the network to create a certain network topology. The information required to create this separation can be used to make ARP/ND more efficient. One option is to proxy ARP/ND. The other option is to unicast the ARP/ND packet to a tunnel endpoint from where the ARP/ND target is reached without flooding. This is not super useful for ARP; however, it is useful for ND, say, if SEND is used.
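
A minimal Python sketch of the two options above, assuming a hypothetical mapping table keyed on (TNI, customer IP) that the control plane populates out of band; the names (Mapping, handle_arp_request) and the addresses are illustrative, not taken from the draft:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Mapping:
        customer_mac: str   # inner (tenant-facing) MAC of the ARP/ND target
        provider_ip: str    # NVGRE endpoint currently hosting the target

    # (TNI, customer IP) -> location of the target; assumed to be populated
    # out of band by the virtualization control plane.
    mapping_table = {
        (0x5001, "10.0.0.7"): Mapping("00:15:5d:aa:bb:07", "192.0.2.21"),
    }

    def handle_arp_request(tni, target_ip, proxy=True):
        entry = mapping_table.get((tni, target_ip))
        if entry is None:
            return None  # unknown target: drop, or fall back to whatever flooding policy applies
        if proxy:
            # Option 1: proxy -- answer locally on behalf of the target.
            return ("arp-reply", target_ip, entry.customer_mac)
        # Option 2: unicast the request inside the overlay to the one endpoint
        # behind which the target sits, so no flooding is needed.
        return ("unicast-to-endpoint", entry.provider_ip)

    print(handle_arp_request(0x5001, "10.0.0.7", proxy=True))
    print(handle_arp_request(0x5001, "10.0.0.7", proxy=False))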

It seems to me that scale of the address resolution function will be affected by the way it's distributed (or not) throughout the overlay network. Likewise, this may have impact on the forwarding efficiency of traffic (i.e. when there are multiple inbound/outbound paths attached to the overlay).
[Simha] For ARP/ND the forwarding efficiency can be made independent of the number of paths in the overlay by using the information present at the tunnel endpoints. In other words, ARP/ND packets will be as efficient to forward as any unicast packet. But I agree with your point about general broadcast: realizing broadcast domains using IP multicast might just move the forwarding-efficiency issue from L2 to L3.

Simha

From: Benson Schliesser [mailto:bschlies@cisco.com]
Sent: Saturday, September 24, 2011 11:25 AM
To: Narasimhan Venkataramaiah; Linda Dunbar; Vishwas Manral; Murari Sridharan
Cc: armd@ietf.org
Subject: Re: [armd] how does "draft-sridharan-virtualization-nvgre-00"advertise its external facing hosts' IP addresses to external world?

Thanks, Simha.

Leaving aside your comments about support for non-IP payloads, etc, and focusing strictly on address resolution: Would carrying IP in the overlay inherently localize the address resolution functions (ARP, ND)? Conversely, does carrying L2 frames in the overlay suggest the extension or enlargement of ARP/ND domains? Or would you propose an ARP/ND proxy in the MAC overlay?

It seems to me that scale of the address resolution function will be affected by the way it's distributed (or not) throughout the overlay network. Likewise, this may have impact on the forwarding efficiency of traffic (i.e. when there are multiple inbound/outbound paths attached to the overlay).

Cheers,
-Benson


On 9/24/11 12:36 PM, "Narasimhan Venkataramaiah" <narave@microsoft.com> wrote:
Posting back some offline emails

From: Narasimhan Venkataramaiah
Sent: Thursday, September 22, 2011 9:49 PM
To: 'Vishwas Manral'
Cc: Murari Sridharan; Linda Dunbar; david.black@emc.com; Benson Schliesser
Subject: RE: [armd] how does "draft-sridharan-virtualization-nvgre-00" advertise its external facing hosts' IP addresses to external world?

In some cases it may be more efficient to just send IP, but given that MAC in GRE is a superset, it's simpler to always do MAC in GRE from an engineering point of view: the various hardware devices that parse the packets to provide added functionality in the path only have to deal with one type of packet. We would also need MAC in GRE to stretch L2 to non-NVGRE subnets or to carry non-IP payloads.
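
A rough sketch of the MAC-in-GRE framing being discussed, assuming the standard GRE fields (K bit set, protocol type 0x6558 for Transparent Ethernet Bridging) and omitting the outer Ethernet/IP headers; how the 24-bit TNI is laid out inside the 32-bit Key field is defined by the draft, not by this sketch:

    import struct

    GRE_FLAGS_KEY_PRESENT = 0x2000   # K bit set in the GRE flags/version word
    ETH_P_TEB = 0x6558               # Transparent Ethernet Bridging: payload is an Ethernet frame

    def mac_in_gre(gre_key, inner_ethernet_frame):
        """Wrap an inner Ethernet frame behind a GRE header that carries a Key.

        gre_key is the full 32-bit Key field; the draft defines how the 24-bit
        TNI (and any remaining bits) are placed inside it.
        """
        gre_header = struct.pack("!HHI", GRE_FLAGS_KEY_PRESENT, ETH_P_TEB, gre_key & 0xFFFFFFFF)
        return gre_header + inner_ethernet_frame  # prepend outer IP/Ethernet when transmitting

Because the inner payload is always a complete Ethernet frame, devices in the path parse one packet format whether the tenant traffic is IP or not.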

Simha

From: Vishwas Manral [mailto:vishwas.ietf@gmail.com]
Sent: Thursday, September 22, 2011 9:43 PM
To: Narasimhan Venkataramaiah
Cc: Murari Sridharan; Linda Dunbar; david.black@emc.com; Benson Schliesser
Subject: Re: [armd] how does "draft-sridharan-virtualization-nvgre-00" advertise its external facing hosts' IP addresses to external world?

I agree PMIP may not directly work here. In such cases, however, the Layer-2 header is still of no use.

-Vishwas
On Thu, Sep 22, 2011 at 9:07 PM, Narasimhan Venkataramaiah <narave@microsoft.com> wrote:
Right - with NVGRE, L2 mobility happens automatically; to achieve it without NVGRE in a multi-tenant environment you would need some VLAN reconfiguration. NVGRE achieves it with Tenant IDs.
PMIP is not multi-tenancy aware, in the sense that the proxy won't be able to distinguish between multiple tenants using the same IP address space in the virtual network.
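
To make the multi-tenancy point concrete, a hypothetical sketch of why a lookup keyed only on the customer IP (as a plain PMIP proxy would see it) breaks down when tenants reuse address space, while a lookup keyed on (TNI, customer IP) does not; the table and addresses are made up for illustration:

    # (TNI, customer IP) -> provider address of the NVGRE endpoint hosting the VM
    locations = {
        (0x100001, "10.1.1.5"): "192.0.2.11",   # tenant A
        (0x100002, "10.1.1.5"): "192.0.2.42",   # tenant B reuses the same customer IP
    }

    def locate(tni, customer_ip):
        return locations.get((tni, customer_ip))

    # Keyed on (TNI, IP) the two tenants resolve to different endpoints;
    # keyed on the IP alone the two entries would collide.
    assert locate(0x100001, "10.1.1.5") != locate(0x100002, "10.1.1.5")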
Simha
From: Vishwas Manral [mailto:vishwas.ietf@gmail.com]
Sent: Thursday, September 22, 2011 8:51 PM
To: Narasimhan Venkataramaiah
Cc: Murari Sridharan; Linda Dunbar; david.black@emc.com; Benson Schliesser

Subject: Re: [armd] how does "draft-sridharan-virtualization-nvgre-00" advertise its external facing hosts' IP addresses to external world?

Hi Simha,
In that respect there is little difference in functionality, though L3-in-L3 can reuse PMIP and other mobility work already done.
It also raises the question of why you even need L2 mobility in this case.
Moving off the list as Benson feels the replies are not substantial.
Thanks,
Vishwas
On Thu, Sep 22, 2011 at 8:34 PM, Narasimhan Venkataramaiah <narave@microsoft.com> wrote:
Say you have a virtual subnet that spans two physical subnets. A VM in a virtual subnet moving to another host within the same physical subnet is L2 mobility, and a VM moving to a host in a different physical subnet is L3 mobility. These two cases are the same for NVGRE in terms of encapsulation, as the resulting packets are the same. What does IP in GRE achieve that MAC in GRE does not?
Simha



From: armd-bounces@ietf.org [mailto:armd-bounces@ietf.org] On Behalf Of Linda Dunbar
Sent: Saturday, September 24, 2011 6:36 AM
To: Vishwas Manral; Murari Sridharan
Cc: armd@ietf.org
Subject: Re: [armd] how does "draft-sridharan-virtualization-nvgre-00" advertise its external facing hosts' IP addresses to external world?

Protocols have been developed for IP mobility between a Home Gateway and a Remote Gateway. The question is whether we want similar protocols among ToRs or among vSwitches. Handset mobility is random, but VM migration is planned.

Linda


From: armd-bounces@ietf.org [mailto:armd-bounces@ietf.org] On Behalf Of Vishwas Manral
Sent: Thursday, September 22, 2011 9:38 PM
To: Murari Sridharan
Cc: armd@ietf.org
Subject: Re: [armd] how does "draft-sridharan-virtualization-nvgre-00" advertise its external facing hosts' IP addresses to external world?


Murari, think IP mobility. :)



On Thu, Sep 22, 2011 at 5:00 PM, Murari Sridharan <muraris@microsoft.com> wrote:

Do you have a scenario in mind?
________________________________

From: Vishwas Manral
Sent: 9/22/2011 4:55 PM


To: Murari Sridharan
Cc: Narasimhan Venkataramaiah; Linda Dunbar; david.black@emc.com; armd@ietf.org
Subject: Re: [armd] how does "draft-sridharan-virtualization-nvgre-00" advertise its external facing hosts' IP addresses to external world?

Hi Murari,



Yes that is what I mean.



Thanks,

Vishwas

On Thu, Sep 22, 2011 at 4:50 PM, Murari Sridharan <muraris@microsoft.com> wrote:

You mean not an Ethernet frame but some IP payload?

From: Vishwas Manral [mailto:vishwas.ietf@gmail.com]
Sent: Thursday, September 22, 2011 4:49 PM
To: Murari Sridharan
Cc: Narasimhan Venkataramaiah; Linda Dunbar; david.black@emc.com; armd@ietf.org


Subject: Re: [armd] how does "draft-sridharan-virtualization-nvgre-00" advertise its external facing hosts' IP addresses to external world?



Murari,



What I am saying is the inner header should be allowed to be L3.



From the diagram you have, that does not seem to be the case. Am I missing it totally?



Thanks,

Vishwas

On Thu, Sep 22, 2011 at 4:43 PM, Murari Sridharan <muraris@microsoft.com> wrote:

Vishwas, thanks for the feedback; we will definitely consider adding that. I am not sure what you mean by doing L3 instead of L2. We allow any arbitrary virtual topology, including L3.

Thanks

From: Vishwas Manral [mailto:vishwas.ietf@gmail.com]
Sent: Thursday, September 22, 2011 4:19 PM


To: Narasimhan Venkataramaiah
Cc: Linda Dunbar; Murari Sridharan; david.black@emc.com; armd@ietf.org
Subject: Re: [armd] how does "draft-sridharan-virtualization-nvgre-00" advertise its external facing hosts' IP addresses to external world?



Hi Simha,



I see this as the only difference between VXLAN and the NVGRE solution (besides, of course, that the TNI needs to be parsed in the intermediate device for hashing, and that fewer bytes are used).



I would think you should add it to your draft immediately. With tunneling you consolidate the addresses visible to the core, and by providing a hash mechanism you provide some level of randomness.



The other thing you should look at is L3 (IPv4/IPv6) over NVGRE instead of L2 alone. I guess the same comment would apply to the VXLAN proposal too.



Thanks,

Vishwas

On Thu, Sep 22, 2011 at 4:11 PM, Narasimhan Venkataramaiah <narave@microsoft.com> wrote:

The draft mentions exactly this as one use of the reserved 8 bits in Section 4. An NVGRE endpoint could use the 8 bits to further distribute flows belonging to a particular TNI, and the switches can use all 32 bits to get entropy. One step further would be for the switches to get full entropy from the inner Ethernet frame. I take it that your comment would be to make this explicit in the draft. Right?

   One such example could be to use the upper 8 bits of the Key field to add flow based entropy and tag all the packets from a flow with an entropy label.
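
A hypothetical sketch of that idea: the endpoint derives a flow-stable 8-bit label from the inner 5-tuple, and a switch hashing over the full 32-bit Key then gets per-flow ECMP entropy even when the outer IP pair repeats. The bit placement (entropy in the upper 8 bits, TNI in the lower 24) simply follows the draft text quoted above, and all names and addresses here are illustrative:

    import zlib

    def flow_entropy(inner_five_tuple):
        """Flow-stable 8-bit entropy label derived from the inner 5-tuple."""
        return zlib.crc32(repr(inner_five_tuple).encode()) & 0xFF

    def gre_key(tni, entropy):
        # Assumed layout per the quoted draft-00 text: entropy label in the
        # upper 8 bits, 24-bit TNI in the lower 24 bits.
        return ((entropy & 0xFF) << 24) | (tni & 0xFFFFFF)

    def ecmp_path(outer_src, outer_dst, key, n_paths):
        # A switch hashing over the outer IP pair plus the full 32-bit Key sees
        # different hash inputs for different flows of the same tenant.
        return zlib.crc32(f"{outer_src}|{outer_dst}|{key:08x}".encode()) % n_paths

    flow_a = ("10.1.1.5", "10.1.1.9", 6, 49152, 443)
    flow_b = ("10.1.1.5", "10.1.1.9", 6, 49153, 443)
    key_a = gre_key(0x100001, flow_entropy(flow_a))
    key_b = gre_key(0x100001, flow_entropy(flow_b))
    print(ecmp_path("192.0.2.11", "192.0.2.42", key_a, 8),
          ecmp_path("192.0.2.11", "192.0.2.42", key_b, 8))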

Simha

From: Vishwas Manral [mailto:vishwas.ietf@gmail.com]
Sent: Thursday, September 22, 2011 4:04 PM
To: Narasimhan Venkataramaiah
Cc: Linda Dunbar; Murari Sridharan; david.black@emc.com; armd@ietf.org
Subject: Re: [armd] how does "draft-sridharan-virtualization-nvgre-00" advertise its external facing hosts' IP addresses to external world?



Hi Simha,



The main (Standards Track) change in your draft is the addition of TNI.



A question I have: a TNI identifies a particular tenant, so all flows to/from a tenant will be hashed to the same path (even with the changes in switches to include the TNI in hashing).



Why do you not use the last 8 bits, which you have kept as reserved, to provide the randomization for hashing flows between the same source/destination pair onto different paths?



Thanks,

Vishwas

On Sun, Sep 18, 2011 at 11:01 AM, Narasimhan Venkataramaiah <narave@microsoft.com> wrote:
The easiest approach from a configuration point of view would be to route everything back through the enterprise - not necessarily optimal from the enterprise's point of view. Are you referring to a scenario where the VM's subnet is split between the cloud and the enterprise? Otherwise I don't see the implication for virtualization, as it's no different from getting the traffic routed to the enterprise in the first place.

Simha

________________________________________
From: armd-bounces@ietf.org [armd-bounces@ietf.org] on behalf of Linda Dunbar [linda.dunbar@huawei.com]
Sent: Sunday, September 18, 2011 7:06 AM
To: Murari Sridharan; david.black@emc.com; armd@ietf.org
Subject: [armd] how does "draft-sridharan-virtualization-nvgre-00" advertise its external facing hosts' IP addresses to external world?


Hi Murari,

Thank you very much for sharing the presentation.

One question:

For a host within an enterprise site that needs to communicate with external peers, the host either uses a public IP address that is visible to the external peers or uses a private IP address that is translated to a public address at the enterprise site's gateway.

When this host is moved to the cloud data center, will the cloud data center advertise this host's address to external peers? Or will all external peers go through the enterprise's gateway to reach this host, which no longer resides in the enterprise site?

Thanks, Linda

> -----Original Message-----
> From: armd-bounces@ietf.org [mailto:armd-bounces@ietf.org] On Behalf Of
> Murari Sridharan
> Sent: Saturday, September 17, 2011 3:02 PM
> To: david.black@emc.com; armd@ietf.org
> Subject: Re: [armd] soliciting typical network designs for ARMD
>
> FYI, here is a talk that I gave last week in relation to the nvgre
> draft below.
> http://channel9.msdn.com/Events/BUILD/BUILD2011/SAC-442T
>
> Thanks
> Murari
>
> -----Original Message-----
> From: armd-bounces@ietf.org [mailto:armd-bounces@ietf.org] On Behalf Of
> david.black@emc.com
> Sent: Friday, September 16, 2011 6:14 AM
> To: armd@ietf.org
> Subject: Re: [armd] soliciting typical network designs for ARMD
>
> And two more drafts on this topic:
>
> http://www.ietf.org/id/draft-mahalingam-dutt-dcops-vxlan-00.txt
> http://www.ietf.org/id/draft-sridharan-virtualization-nvgre-00.txt
>
> The edge switches could be the software switches in hypervisors.
>
> Thanks,
> --David
>
>
> > -----Original Message-----
> > From: armd-bounces@ietf.org [mailto:armd-bounces@ietf.org] On Behalf
> > Of Warren Kumari
> > Sent: Wednesday, August 31, 2011 3:16 PM
> > To: Vishwas Manral
> > Cc: armd@ietf.org
> > Subject: Re: [armd] soliciting typical network designs for ARMD
> >
> >
> > On Aug 11, 2011, at 11:40 PM, Vishwas Manral wrote:
> >
> > > Hi Linda/ Anoop,
> > >
> > > Here is the example of the design I was talking about, as defined
> by google.
> >
> > Just a clarification -- s/as defined by google/as described by
> someone
> > who happens to work for google/
> >
> > W
> >
> > > http://www.ietf.org/id/draft-wkumari-dcops-l3-vmmobility-00.txt
> > >
> > > Thanks,
> > > Vishwas
> > > On Tue, Aug 9, 2011 at 2:50 PM, Anoop Ghanwani
> <anoop@alumni.duke.edu> wrote:
> > >
> > > >>>>
> > > (though I think if there was a standard way to map Multicast MAC to
> > > Multicast IP, they could
> > probably use such a standard mechanism).
> > > >>>>
> > >
> > > They can do that, but then this imposes requirements on the
> > > equipment to be able to do multicast forwarding, and even if does,
> > > because of pruning requirements the number of groups would be very
> > > large.  The average data center switch probably won't handle that
> > > many groups.
> > >
> > > On Tue, Aug 9, 2011 at 2:41 PM, Vishwas Manral
> <vishwas.ietf@gmail.com> wrote:
> > > Hi Anoop,
> > >
> > > From what I know they do not use Multicast GRE (I hear the extra 4
> > > bytes in the GRE header is a
> > proprietary extension).
> > >
> > > I think a directory based mechanism is what is used (though I think
> > > if there was a standard way to
> > map Multicast MAC to Multicast IP, they could probably use such a
> standard mechanism).
> > >
> > > Thanks,
> > > Vishwas
> > > On Tue, Aug 9, 2011 at 2:03 PM, Anoop Ghanwani
> <anoop@alumni.duke.edu> wrote:
> > > Hi Vishwas,
> > >
> > > How do they get multicast through the network in that case?
> > > Are they planning to use multicast GRE, or just use directory based
> > > lookups and not worry about multicast applications for now?
> > >
> > > Anoop
> > >
> > > On Tue, Aug 9, 2011 at 1:27 PM, Vishwas Manral
> <vishwas.ietf@gmail.com> wrote:
> > > Hi Linda,
> > >
> > > The data packets can be tunnelled at the ToR over say a GRE packet
> > > and the core is a Layer-3 core
> > (except for the downstream ports). So we could have encapsulation/
> > decapsulation of L2 over GRE at the ToR.
> > >
> > > The very same thing can be done at the hypervisor layer too, in
> > > which case the entire DC network
> > would look like a Layer-3 flat network including the ToR to server
> > link and the hypervisor would do the tunneling.
> > >
> > > I am not sure if you got the points above or not. I know cloud OS
> > > companies that provide the service
> > and have big announced customers.
> > >
> > > Thanks,
> > > Vishwas
> > > On Tue, Aug 9, 2011 at 11:51 AM, Linda Dunbar <dunbar.ll@gmail.com>
> wrote:
> > > Vishwas,
> > >
> > > In my mind the bullet 1) in the list refers to ToR switches
> > > downstream ports (facing servers)
> > running Layer 2 and ToR uplinks ports run IP Layer 3.
> > >
> > > Have you seen data center networks with ToR switches downstream
> > > ports (i.e. facing servers) enabling
> > IP routing, even though the physical links are Ethernet?
> > > If yes, we should definitely include it in the ARMD draft.
> > >
> > > Thanks,
> > > Linda
> > > On Tue, Aug 9, 2011 at 12:58 PM, Vishwas Manral
> <vishwas.ietf@gmail.com> wrote:
> > > Hi Linda,
> > > I am unsure what you mean by this, but:
> > >   * layer 3 all the way to TOR (Top of Rack switches), We can also
> > > have a hierarchical network, with the core totally Layer-3 (and
> > > having separate
> > routing), from the hosts still in a large Layer-3 subnet. Another
> > aspect could be to have a totally
> > Layer-3 network.
> > >
> > > The difference between them is the link between the servers and the
> ToR.
> > >
> > > Thanks,
> > > Vishwas
> > > On Tue, Aug 9, 2011 at 10:22 AM, Linda Dunbar <dunbar.ll@gmail.com>
> wrote:
> > > During the 81st IETF ARMD WG discussion, it was suggested that it
> is
> > > necessary to document typical
> > data center network designs so that address resolution scaling issues
> > can be properly described. Many data center operators have expressed
> that they can't openly reveal their detailed network designs.
> > Therefore, we only want to document anonymous designs without too
> much
> > detail. During the journey of establishing ARMD, we have come across
> the following typical data center network designs:
> > >   * layer 3 all the way to TOR (Top of Rack switches),
> > >   * large layer 2 with hundreds (or thousands) of ToRs being
> > > interconnected by Layer 2. This
> > design will have thousands of hosts under the L2/L3 boundary router(s)
> > >   * CLOS design  with thousands of switches. This design will have
> > > thousands of hosts under the
> > L2/L3 boundary router(s)
> > > We have heard that each of the designs above has its own problems.
> > > ARMD problem statements might
> > need to document DC problems under each typical design.
> > > Please send feedback to us (either to the armd email list  or to
> the
> > > ARMD chair Benson & Linda) to
> > indicate if we have missed any typical Data Center network designs.
> > >
> > > Your contribution can greatly accelerate the progress of ARMD WG.
> > >
> > > Thank you very much.
> > >
> > > Linda & Benson
> > >





_______________________________________________
armd mailing list
armd@ietf.org
https://www.ietf.org/mailman/listinfo/armd