RE: [Ipoverib] Please read - proposed WG termination

Bernard King-Smith <wombat2@us.ibm.com> Thu, 01 September 2005 00:11 UTC

In-Reply-To: <506C3D7B14CDD411A52C00025558DED60893AD3C@mtlex01.yok.mtl.com>
Subject: RE: [Ipoverib] Please read - proposed WG termination
To: Dror Goldenberg <gdror@mellanox.co.il>
X-Mailer: Lotus Notes Release 6.0.2CF1 June 9, 2003
Message-ID: <OFEAD798BC.5BD12489-ON8525706E.00447E83-8525706F.0000FE88@us.ibm.com>
From: Bernard King-Smith <wombat2@us.ibm.com>
Date: Wed, 31 Aug 2005 20:10:51 -0400
Cc: margaret@thingmagic.com, kashyapv@us.ibm.com, "H.K. Jerry Chu" <Jerry.Chu@eng.sun.com>, ipoverib@ietf.org, Bill_Strahm@McAfee.com, ipoverib-bounces@ietf.org




Having IPoIB-CM is a very important feature for making IB a viable
interconnect in clustered systems. Without IPoIB-CM, HPC clusters and
commercial clusters using IB to the SAN need two networks for good total
cluster performance: IB for non-IP traffic and a high-performance IP
network such as GigE. This means that IB is not as cost effective as a GigE
(or 10 GigE) network, which handles both types of traffic reasonably well.

In the HPC world most clusters use the cluster fabric (and IB is the
future direction) for both MPI and IP traffic. The IP traffic is usually
for parallel file systems and system management and control. This high
bandwidth IP network is required in most production HPC clusters. With the
current IPoIB using only UD, the performance is dismal. Our simulations
using the small packet MTU of IB show that the parallel file systems
(GPFS, PVFS, Lustre, etc.) can only get 25% of a 4X IB link today, and at
12X it will be about 10%. The problem is that the IP drivers are single
threaded per adapter. Also, the CPU utilization of TCP/IP at the MTU of the
IB link is very high because of the per-packet stack processing. Going to
IPoIB-CM means we can cut down the number of TCP/IP stack traversals from
32 to 1 for a 60K IP packet. This means that you have roughly 30 times as
much data transmitted per device driver call. This will enable IP to show
similar bandwidth with multiple sockets as other protocols that can use RC
or fragment within the device driver.
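
As a rough illustration of the traversal arithmetic above, here is a minimal
Python sketch. It assumes a 2048-byte IB MTU for UD mode and a hypothetical
64 KB connected-mode MTU, and it ignores IP and IPoIB header overhead. The
function and constant names are made up for this example. Under those
assumptions, 60 KB works out to 30 packets and 64 KB to 32, which may be
where the two figures above come from.

```python
# Sketch only: counts how many MTU-sized packets (and hence per-packet
# TCP/IP stack traversals) are needed to move one large IP payload.
# Header overhead is ignored; both MTU values below are assumptions.

UD_MTU_BYTES = 2048        # assumed IB MTU available to IPoIB over UD
CM_MTU_BYTES = 64 * 1024   # hypothetical connected-mode MTU


def stack_traversals(payload_bytes: int, mtu_bytes: int) -> int:
    """Return the number of MTU-sized packets needed for the payload."""
    if payload_bytes <= 0 or mtu_bytes <= 0:
        raise ValueError("payload and MTU must be positive")
    # Integer ceiling division, avoiding floating point.
    return -(-payload_bytes // mtu_bytes)


if __name__ == "__main__":
    for size_kb in (60, 64):
        payload = size_kb * 1024
        ud = stack_traversals(payload, UD_MTU_BYTES)
        cm = stack_traversals(payload, CM_MTU_BYTES)
        print(f"{size_kb}K payload: UD={ud} traversals, CM={cm} traversal(s)")
```

The ratio of the UD count to the CM count is the factor by which the data
moved per device driver call grows, before any header overhead is counted.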

For commercial clusters, if IB is used for storage, then fast IP performance
lets you save a network by using the IB network for both. Why use
IB and another network for the commercial cluster when the other network
supports similar bandwidth for storage and IP?

Implementing IPoIB-CM makes IB viable in the HPC cluster and some
commercial clusters. Otherwise I don't think it competes economically with
other network technologies.

Regards.

Bernie King-Smith
IBM Corporation
Server Group
Cluster System Performance
wombat2@us.ibm.com    (845)433-8483
Tie. 293-8483 or wombat2 on NOTES

"We are not responsible for the world we are born into, only for the world
we leave when we die.
So we have to accept what has gone before us and work to change the only
thing we can,
-- The Future." William Shatner


                                                                           
Dror Goldenberg <gdror@mellanox.co.il>
Sent by: ipoverib-bounces@ietf.org
08/30/2005 09:32 AM

To: kashyapv@us.ltcfwd.linux.ibm.com, "H.K. Jerry Chu" <Jerry.Chu@eng.sun.com>
cc: margaret@thingmagic.com, ipoverib@ietf.org, Bill_Strahm@McAfee.com
Subject: RE: [Ipoverib] Please read - proposed WG termination

> From: Vivek Kashyap [mailto:kashyapv@us.ibm.com]
> Sent: Tuesday, August 30, 2005 8:39 AM
>
> On Mon, 29 Aug 2005, H.K. Jerry Chu wrote:
>


<snip>


> > 1. IPoIB connected mode draft-ietf-ipoib-connected-mode-00.txt
> > updated recently
>
> Well, in recent days there has been a discussion going on
> based on Dror's input. I also made some updates after some
> discussion on OpenIB (not on IETF though).  This draft itself
> became a working group draft this february after some lively
> discussion just before that.  It appears to me that we
> should be possible to finalise this draft soon enough.
>
> 20th sept. might be long enough to know one way or the other...
>
> vivek
>


We would like to see IPoIB-CM being finalized in IETF. We see
great value in having a standard for connected mode which effectively
increases the MTU. We are willing to contribute to the standardization
effort. We're also looking at the implementation of IPoIB-CM in Linux.


-Dror

_______________________________________________
IPoverIB mailing list
IPoverIB@ietf.org
https://www1.ietf.org/mailman/listinfo/ipoverib




