RE: [Ipoverib] Please read - proposed WG termination
Michael Krause <krause@cup.hp.com> Thu, 01 September 2005 17:48 UTC
Message-Id: <6.2.0.14.2.20050901102429.028498c8@esmail.cup.hp.com>
X-Mailer: QUALCOMM Windows Eudora Version 6.2.0.14
Date: Thu, 01 Sep 2005 10:38:50 -0700
To: Bernard King-Smith <wombat2@us.ibm.com>, Dror Goldenberg <gdror@mellanox.co.il>
From: Michael Krause <krause@cup.hp.com>
Subject: RE: [Ipoverib] Please read - proposed WG termination
In-Reply-To: <OFEAD798BC.5BD12489-ON8525706E.00447E83-8525706F.0000FE88@us.ibm.com>
References: <506C3D7B14CDD411A52C00025558DED60893AD3C@mtlex01.yok.mtl.com> <OFEAD798BC.5BD12489-ON8525706E.00447E83-8525706F.0000FE88@us.ibm.com>
Mime-Version: 1.0
Cc: "H.K. Jerry Chu" <Jerry.Chu@eng.sun.com>, Bill_Strahm@McAfee.com, margaret@thingmagic.com, kashyapv@us.ibm.com, ipoverib-bounces@ietf.org, ipoverib@ietf.org
At 05:10 PM 8/31/2005, Bernard King-Smith wrote:
>Having IPoIB-CM is a very important feature to make IB a viable
>interconnect in clustered systems. Without IPoIB-CM for HPC clusters, and
>commercial clusters using IB to SAN, you need two networks for good total
>cluster performance, one for IB ( non-IP traffic ) and an IP performance
>network like GigE. This means that IB is not cost effective as the GigE (
>or 10 GigE) network which handles both types of traffic reasonably.
>
>In the HPC world most clusters use the cluster fabric ( and IB is the
>future direction ) for both MPI and IP traffic. The IP traffic is usually
>for parallel file systems and system management and control. This high
>bandwidth IP network is required in most production HPC clusters. With the
>current IPoIB only using UD, the performance is dismal. Our simulations
>using the small packet MTU of IB say that the parallel file systems (
>GPFS, PVFS, Lustre etc ) can only get 25% of a 4X IB link today and at 12X
>it will be about 10%. The problem is that the IP drivers are single
>threaded per adapter. Also the CPU utilization of TCP/IP at a MTU of the IB
>link is very high because of the per packet stack processing. Going to
>IPoIB-CM means we can cut down the number of TCP/IP stack traversals from
>32 to 1 for a 60K IP packet. This means that you have 30 times as much data
>transmitted per device driver call. This will enable IP to show similar
>bandwidth with multiple sockets as other protocols that can use RC or
>fragment within the device driver.

These performance problems are primarily implementation-specific and
have little to do with IB technology itself. In addition, nearly all IB
solutions use a 2KB MTU, not the smallest MTU, to transfer data - no
different than Ethernet. As I and others have raised over the years,
enabling IP over IB to perform well is a local HCA issue, not a
standards issue.
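
As a rough check on the packet-count figures quoted above, the number of stack traversals is simply the payload size divided by the usable MTU, rounded up. The sketch below is only illustrative: the function name, the `header_bytes` parameter, and the 64 KiB connected-mode MTU are assumptions and do not come from the thread. With a 2 KiB MTU, a 60 KiB payload needs 30 packets and a 64 KiB payload needs 32, which brackets the "32 to 1" and "30 times" figures in the quoted text.

```python
def stack_traversals(payload_bytes: int, mtu_bytes: int, header_bytes: int = 0) -> int:
    """Packets (one stack traversal each) needed to carry a payload.

    header_bytes is a hypothetical per-packet overhead; the default of 0
    ignores IP/TCP headers to keep the arithmetic simple.
    """
    per_packet = mtu_bytes - header_bytes
    if per_packet <= 0:
        raise ValueError("MTU must be larger than the per-packet header")
    # Integer ceiling division: round up without going through floats.
    return -(-payload_bytes // per_packet)


KIB = 1024

# 2 KiB MTU (the size Mike says nearly all IB solutions use).
small_mtu = stack_traversals(60 * KIB, 2 * KIB)
# Assumed 64 KiB MTU standing in for connected mode.
large_mtu = stack_traversals(60 * KIB, 64 * KIB)

print(small_mtu, large_mtu)
```

The same ratio applies whether the splitting is avoided by a larger MTU (connected mode) or hidden below the stack by large send off-load in the HCA, which is the point of contention in this exchange.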

Addition of checksum off-load support to the HCA is rather trivial and
does not require standardization (this is what is done for Ethernet
today and is non-standard). Addition of large send off-load support is
a local HCA issue, not a standards issue, and effectively provides the
same benefit as connected mode. The use of multiple QPs to spread work
across CPUs for both send / receive, a la the multi-queue support I've
worked with various Ethernet IHVs to get in place, is again a local HCA
issue (it does not have to be visible as part of the layer 2 address
resolution). One can construct a very nice performing IP over IB
solution, but there hasn't been much public progress in implementing
these de facto capabilities found in Ethernet solutions on IB. Getting
these into an HCA implementation is a heck of a lot easier and faster
than developing a standard and getting all of the OS changes made (the
HCA implementation issues can all be handled underneath the IP stack
just like with Ethernet, so no real OS impacts).

>For commercial clusters, if IB is used for storage, then you save a network
>by having fast IP performance and can use the IB network for both. Why use
>IB and another network for the commercial cluster, when the other network
>supports similar bandwidth for storage and IP.

There will always be Ethernet in any cluster so the fabric is there.
The question is whether it is just for low-bandwidth / management
services or for applications. For storage, we need to separate the
discussion into whether it is block or file. For block, IB gateways to
Fibre Channel, etc. can be and are being used today quite nicely.
Performance is reasonable and the ecosystem costs, target availability,
customer "pain", etc. are much lower than attempting to move to native
IB storage. The same applies to file-based storage, where IB gateways
to Ethernet, which then attaches to file servers, work quite nicely. In
fact, the original vision of IB was that of an I/O fabric to create
modular server solutions.
The addition of IPC came later in the process when it was found to be
relatively low cost to define. So, IB is successful in the HPC world
and slowly entering some commercial solutions. To state that its future
relies on getting an IP over IB RC solution is perhaps blowing it a bit
out of proportion. The easier path for all is to simply use the
techniques I and others have advocated for years now and solve the
problems within the HCA implementation. That costs much less and will
deliver a good performance solution.

BTW, RNIC / Ethernet solutions implement these techniques today. With
the arrival of 10 GbE and the lower prices of RNIC and 10 GbE switch
ports, lower latency switches (competitive enough with IB for
commercial and many HPC clusters), etc., the success of IB must lie
elsewhere and not on an IETF spec. This was noted at the recent IEEE
Hot Interconnects conference as well, so it isn't just my opinion.

Mike

>Implementing IPoIB-CM makes IB viable in the HPC cluster and some
>commercial clusters. Otherwise I don't think it competes economically with
>other network technologies.
>
>Regards.
>
>Bernie King-Smith
>IBM Corporation
>Server Group
>Cluster System Performance
>wombat2@us.ibm.com (845)433-8483
>Tie. 293-8483 or wombat2 on NOTES
>
>"We are not responsible for the world we are born into, only for the world
>we leave when we die.
>So we have to accept what has gone before us and work to change the only
>thing we can,
>-- The Future." William Shatner
>
>
>Dror Goldenberg <gdror@mellanox.co.il>
>Sent by: ipoverib-bounces@ietf.org
>08/30/2005 09:32 AM
>To: kashyapv@us.ltcfwd.linux.ibm.com, "H.K. Jerry Chu" <Jerry.Chu@eng.sun.com>
>cc: margaret@thingmagic.com, ipoverib@ietf.org, Bill_Strahm@McAfee.com
>Subject: RE: [Ipoverib] Please read - proposed WG termination
>
>
> > From: Vivek Kashyap [mailto:kashyapv@us.ibm.com]
> > Sent: Tuesday, August 30, 2005 8:39 AM
> >
> > On Mon, 29 Aug 2005, H.K. Jerry Chu wrote:
> >
> > ><snip>
> > >
> > > 1. IPoIB connected mode draft-ietf-ipoib-connected-mode-00.txt
> > > updated recently
> >
> > Well, in recent days there has been a discussion going on
> > based on Dror's input. I also made some updates after some
> > discussion on OpenIB (not on IETF though). This draft itself
> > became a working group draft this February after some lively
> > discussion just before that. It appears to me that it should
> > be possible to finalise this draft soon enough.
> >
> > 20th sept. might be long enough to know one way or the other...
> >
> > vivek
> >
>
>We would like to see IPoIB-CM being finalized in IETF. We see
>great value in having a standard for connected mode which effectively
>increases the MTU. We are willing to contribute to the standardization
>effort. We're also looking at the implementation of IPoIB-CM in Linux.
>
>
>-Dror
>_______________________________________________
>IPoverIB mailing list
>IPoverIB@ietf.org
>https://www1.ietf.org/mailman/listinfo/ipoverib
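
The multi-queue approach mentioned in the reply above (spreading send / receive work across CPUs by using several QPs) comes down to mapping each flow to one queue in a stable way, so packets of one connection stay in order while different connections land on different CPUs. The following is a minimal sketch of that idea only. The function name, the 4-tuple key, and the use of CRC32 are assumptions for illustration; Ethernet receive-side scaling typically uses a Toeplitz hash instead, and nothing here reflects a particular HCA's interface.

```python
import zlib
from collections import Counter


def pick_queue(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
               n_queues: int) -> int:
    """Map a flow 4-tuple to a queue index in [0, n_queues).

    CRC32 is a stand-in hash chosen because it is deterministic and in the
    standard library; real adapters use their own hash functions.
    """
    if n_queues <= 0:
        raise ValueError("n_queues must be positive")
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % n_queues


# Hypothetical flows: many sockets between the same pair of hosts.
flows = [("10.0.0.1", "10.0.0.2", 40000 + i, 2049) for i in range(64)]
n_queues = 4  # e.g. one QP per CPU, an assumed configuration

assignment = Counter(pick_queue(*flow, n_queues) for flow in flows)
print(dict(assignment))
```

Because the mapping depends only on the flow identifiers, it can live entirely below the IP stack, which is consistent with the argument that this does not need to be visible in layer 2 address resolution.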