Re: [dcrg-interest] [armd] IP over IP solution for data center interconnect

"Ashish Dalela (adalela)" <> Tue, 27 December 2011 06:13 UTC

Return-Path: <>
Received: from localhost (localhost []) by (Postfix) with ESMTP id D025321F8558 for <>; Mon, 26 Dec 2011 22:13:51 -0800 (PST)
X-Virus-Scanned: amavisd-new at
X-Spam-Flag: NO
X-Spam-Score: -0.706
X-Spam-Status: No, score=-0.706 tagged_above=-999 required=5 tests=[AWL=-0.896, BAYES_00=-2.599, CN_BODY_35=0.339, MIME_CHARSET_FARAWAY=2.45]
Received: from ([]) by localhost ( []) (amavisd-new, port 10024) with ESMTP id tilFDt4nAwPF for <>; Mon, 26 Dec 2011 22:13:50 -0800 (PST)
Received: from ( []) by (Postfix) with ESMTP id C642E21F8551 for <>; Mon, 26 Dec 2011 22:13:49 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;;; l=8875; q=dns/txt; s=iport; t=1324966430; x=1326176030; h=mime-version:content-transfer-encoding:subject:date: message-id:in-reply-to:references:from:to:cc; bh=pdZnmblKKp1YQsgdHjh3e3LsWjBGI4Fip2e02XAN30U=; b=AryXc0TC2HWsB1F1Cp1gRQPNVCv0HwA/10IZbGK86FAJGVyjghWh6ep6 4kkmHCD1x43cBzatllXfGjwxqlj+dmtj3SGTBVBW2t+56UUxGBpmoyFY2 wd+WjE0y+rBZssHQTBJNv3+8P19KvF6Tx2x+q/tbr4P1/FAW+tqFDqto0 w=;
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-AV: E=Sophos;i="4.71,414,1320624000"; d="scan'208";a="2369141"
Received: from (HELO ([]) by with ESMTP; 27 Dec 2011 06:13:47 +0000
Received: from ( []) by (8.14.3/8.14.3) with ESMTP id pBR6DDEB012522; Tue, 27 Dec 2011 06:13:47 GMT
Received: from ([]) by with Microsoft SMTPSVC(6.0.3790.4675); Tue, 27 Dec 2011 11:43:15 +0530
X-MimeOLE: Produced By Microsoft Exchange V6.5
Content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain; charset="gb2312"
Content-Transfer-Encoding: quoted-printable
Date: Tue, 27 Dec 2011 11:43:15 +0530
Message-ID: <>
In-Reply-To: <>
Thread-Topic: [armd] IP over IP solution for data center interconnect
Thread-Index: AQHMwT2CzwDYBIw29UGeVkQ77HSrWZXvI1zQ
References: <> <>
From: "Ashish Dalela (adalela)" <>
To: Xuxiaohu <>, "Eggert, Lars" <>,,
X-OriginalArrivalTime: 27 Dec 2011 06:13:15.0888 (UTC) FILETIME=[9EE80F00:01CCC45E]
X-Mailman-Approved-At: Tue, 27 Dec 2011 00:16:39 -0800
Subject: Re: [dcrg-interest] [armd] IP over IP solution for data center interconnect
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: <>
List-Unsubscribe: <>, <>
List-Archive: <>
List-Post: <>
List-Help: <>
List-Subscribe: <>, <>
X-List-Received-Date: Tue, 27 Dec 2011 06:13:51 -0000

For all the proponents of L3, here is a thought.

The biggest concern in the cloud is data protection. That is make or break. Data can't move when the VM moves; it's just too expensive and time consuming. Further, to reduce disk wastage you have to put data on network storage. That also makes data backup easy, and is cheap in terms of the total disk space you need. There is a lot going on here in terms of data de-duplication.

Now given network storage, you have two options - NAS and SAN.

In the cloud, each VM administrator has root privileges, and IP and MAC addresses can be duplicated (if you are root, you can just configure any IP and MAC on a virtual interface of your choice). So you have to protect the network storage from multiple tenants who can have the same IP address and MAC address. NAS assumes your UNIX user id is globally unique and your permissions are unique - not true anymore. So you have IP, MAC and ID duplication.

That means that if you have to protect data, you have to somehow know which host is allowed to see which disks. Doing this end-to-end over TCP/IP is problematic, because the NAS controller doesn't know about the tenant id, or whether the MAC and IP are being duplicated across customers. To make the storage aware of segmentation, you have to carry some segment identifier (GRE, MPLS, VLAN, etc.) into the storage box. That isn't the case today.
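To make the collision concrete, here is a minimal Python sketch (the tenant names, addresses and volume names are all invented for illustration): a storage ACL keyed only on (IP, MAC) silently collapses two tenants into one entry, while adding a network-assigned tenant/segment id keeps them distinct.

```python
# Hypothetical example: two tenants, each with root inside their own VM,
# configure the exact same IP and MAC on a virtual interface.
tenant_a_vm = {"ip": "10.0.0.5", "mac": "00:11:22:33:44:55"}
tenant_b_vm = {"ip": "10.0.0.5", "mac": "00:11:22:33:44:55"}

# A NAS box that keys its access control on (ip, mac) sees one client:
acl = {}
acl[(tenant_a_vm["ip"], tenant_a_vm["mac"])] = "tenant-A-volume"
acl[(tenant_b_vm["ip"], tenant_b_vm["mac"])] = "tenant-B-volume"
assert len(acl) == 1  # B's entry overwrote A's -- A's data is now exposed to B

# Keying on a segment id that the network (not the host) assigns
# restores uniqueness, but the storage box must learn that id somehow:
acl2 = {}
acl2[("seg-A", tenant_a_vm["ip"], tenant_a_vm["mac"])] = "tenant-A-volume"
acl2[("seg-B", tenant_b_vm["ip"], tenant_b_vm["mac"])] = "tenant-B-volume"
assert len(acl2) == 2  # no collision once the segment id is in the key
```

The second half is exactly the "carry some segment into the storage box" step: the id cannot come from anything the tenant controls.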

Network-level isolation happens in SAN, but not in NAS. NAS just runs end-to-end over TCP. In SAN you can say which host will see which disk. FC security is high because the MAC and FC-ID are assigned by the network and are not in the control of the host. The network knows which port is which host/MAC. So, if you know which port belongs to which customer, you know which MAC can see which disk. Binding a port to a customer is not hard. But mapping all of that to data security and disk visibility is not trivial; e.g., you would have to map a GRE tunnel to a set of disk permissions. Who needs that?
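The port-based model above can be sketched as follows (a hypothetical toy, assuming invented port, customer and LUN names, not any real switch API): disk visibility is derived purely from the ingress port, which the network controls, so nothing the host claims about its own MAC or FC-ID matters.

```python
# Hypothetical SAN-zoning-style sketch: the operator binds ports to
# customers, and customers to the disks (LUNs) they may see.
port_to_customer = {"fc1/1": "cust-A", "fc1/2": "cust-B"}
customer_to_luns = {"cust-A": {"lun-10"}, "cust-B": {"lun-20"}}

def visible_luns(ingress_port: str) -> set:
    """Disk visibility derived only from the network-controlled port."""
    customer = port_to_customer.get(ingress_port)
    return customer_to_luns.get(customer, set())

# Whatever addresses the host spoofs, visibility follows the port:
assert visible_luns("fc1/1") == {"lun-10"}
assert visible_luns("fc1/2") == {"lun-20"}
assert visible_luns("fc9/9") == set()  # unknown port sees nothing
```

The point of the sketch is the trust anchor: the key (`ingress_port`) is assigned by the network, not configurable by a root user inside a VM.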

SAN does not use IP; it uses Fibre Channel. With convergence, we are moving to FC over Ethernet (FCoE). FCIP and iSCSI have not worked very well. Security for the data comes from FC, and the lowering of cost comes from the use of FCoE. That needs native Ethernet (no IP).

When you have site-to-site VM mobility, the data does not move across sites. E.g., an application server will connect to a load-balanced set of database servers within a VLAN. The application and database servers can be in different sites, and they connect over a VLAN. Alternatively, the application and database servers can be in the same location and the DB server accesses the SAN over the Internet. Even if data is replicated across many sites, load-balancing across application or DB servers uses VLANs. In HPC as well, there is a lot of broadcast and discovery; whether or not it uses IP, the fact is that it depends on broadcast. The HPC case externalizes the memory on a host to other hosts through RDMA. Everything to do with Big Data will rely on this, and besides disk externalization we will also see an increase in memory externalization. I just went to an IEEE HPC cloud conference :-)

L3 assumes that all data (disk and memory) is inside the host. That assumption changed a decade ago; cloud just increases the distribution. Just as compute is optimized by muxing VMs onto a physical machine, storage is optimized by consolidating data separately on storage devices, and memory by hosting it across different hosts. The network has to deal with all of these traffic types and with security/isolation (imagine someone changing memory over the network). Storage and memory traffic uses L2. To discover each other, hosts use L2. These L2 domains need to span across sites as well as within a site.

These are not corner cases. This is central to what cloud means in the long run.

Thanks, Ashish

-----Original Message-----
From: [] On Behalf Of Xuxiaohu
Sent: Friday, December 23, 2011 12:09 PM
To: Eggert, Lars;;
Subject: Re: [armd] IP over IP solution for data center interconnect

> -----Original Message-----
> From: Xuxiaohu
> Sent: December 22, 2011 16:28
> To: 'Eggert, Lars';; ''
> Subject: IP over IP solution for data center interconnect
> Hi all,
> There have been a lot of L2 over L3 solutions or proposals for data center
> networks and data center interconnect so far. However, it seems that recently
> there is increasing interest in L3 over L3 (e.g., IP over IP) solutions for data
> center networks and data center interconnect; please see the following text
> quoted from the IETF82 L2VPN minutes
> ( It's a general belief
> that L3 is more scalable than L2. Especially in the data center
> interconnection case, L3 over L3 solutions can bring DC operators more
> benefits than L2 over L3 solutions, such as path optimization,
> active-active data centers, MAC table reduction on DC switches and broadcast
> flood suppression, etc.

By the way, the ARP table scaling issue on DC gateways, which the ARMD WG deems the only ARP problem in data center networks worth solving, could also be addressed by the IP over IP solution.

Best regards,
> Although Layer 2 connectivity is still required for some high-availability clusters
> that use non-IP and link-local multicast for communication, more and more
> cluster vendors either are already able to support cluster services at Layer 3 or
> will support them in the near future. In addition, given that the GSLB mechanism
> (e.g., DNS-based GSLB) works very well in most cases, is geo-cluster service still a
> common requirement for data center interconnect? If not, could we spend some
> time exploring the possibility of using L3 over L3 solutions for the most common data
> center interconnection scenarios, where geo-clustering, especially non-IP-based
> geo-clustering, is not needed?
> Best regards,
> Xiaohu
> ****************************************************************
> **********************************************************
> Kireeti: I keep seeing ISID, PBB, VLANs.  We have to stop conceding to layer 2
> and start moving to layer 3 (applause) and lots of problems will go away.
> Florin: We should put VPN on the same page to modify the priorities. You need
> to expand so can look at L3 over L3 as well as L2 over L3. For item number 2 we
> need to look at L3 transport as well as Ethernet and MPLS.  And for number 3
> we need to work on this as there's a lot of expertise in L2, L3 and other WGs.
> Marc: The problem statement needs to include more of the L3 issues around
> sending L2 over L3, as the L2 traffic already contains L3.  Issue is just framing.
> Eric Nordmark: "Yes, Yes and Yes".  We need to focus from the DC perspective
> and not from the VPN view.  Don't just think about Ethernet over IP, but also IP
> over IP.  Hypervisor may get decoupled over time.
> ****************************************************************
> **********************************************************
> > -----Original Message-----
> > From: [] On Behalf Of Eggert, Lars
> > Sent: December 16, 2011 16:24
> > To:
> > Subject: [dc] IRTF datacenter research group discussion list
> >
> > Hi,
> >
> > I wanted to make you all aware of the IRTF's dcrg-interest mailing list, which
> > was set up following a face-to-face meeting at SIGCOMM this year to discuss a
> > possible IRTF research group on datacenter networking:
> >
> >
> > There has not been much activity towards the formation of an IRTF RG since
> > SIGCOMM, but I am hopeful that the high energy level demonstrated in the
> > various Taipei meetings on the topic will inject some energy here - several of
> > the topics that may be too early for the IETF to standardize around would fit an
> > IRTF RG very well. I'm certainly supportive of this.
> >
> > Lars
> > IRTF Chair
> >
> >
> > --
> > Mobile number during December:    +358 46 5215582
> > Mobile number starting January:  +49 151 12055791

armd mailing list