RE: [Ipoverib] Please read - proposed WG termination

"H.K. Jerry Chu" <Jerry.Chu@eng.sun.com> Thu, 01 September 2005 18:19 UTC

Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1EAteX-0000Ec-Nc; Thu, 01 Sep 2005 14:19:09 -0400
Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1EAteT-0000ER-KX; Thu, 01 Sep 2005 14:19:08 -0400
Received: from ietf-mx.ietf.org (ietf-mx [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id OAA18442; Thu, 1 Sep 2005 14:19:02 -0400 (EDT)
Received: from brmea-mail-4.sun.com ([192.18.98.36]) by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1EAtgR-0007K7-HH; Thu, 01 Sep 2005 14:21:10 -0400
Received: from jurassic.eng.sun.com ([129.146.17.57]) by brmea-mail-4.sun.com (8.12.10/8.12.9) with ESMTP id j81IITTW007786; Thu, 1 Sep 2005 12:18:29 -0600 (MDT)
Received: from sweethome (punchin-hkchu.SFBay.Sun.COM [192.9.61.13]) by jurassic.eng.sun.com (8.13.5.Beta0+Sun/8.13.5.Beta0) with SMTP id j81IIPJV251312; Thu, 1 Sep 2005 11:18:28 -0700 (PDT)
Message-Id: <200509011818.j81IIPJV251312@jurassic.eng.sun.com>
Date: Thu, 01 Sep 2005 11:19:23 -0700
From: "H.K. Jerry Chu" <Jerry.Chu@eng.sun.com>
Subject: RE: [Ipoverib] Please read - proposed WG termination
To: wombat2@us.ibm.com, gdror@mellanox.co.il, krause@cup.hp.com
MIME-Version: 1.0
Content-Type: TEXT/plain; charset="us-ascii"
Content-MD5: PZixTRKD39S6gG5FvHC76g==
X-Mailer: dtmail 1.3.0 @(#)CDE Version 1.6 SunOS 5.10 sun4u sparc
Cc: margaret@thingmagic.com, kashyapv@us.ibm.com, Bill_Strahm@McAfee.com, ipoverib-bounces@ietf.org, ipoverib@ietf.org
X-BeenThere: ipoverib@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: "H.K. Jerry Chu" <Jerry.Chu@eng.sun.com>
List-Id: IP over InfiniBand WG Discussion List <ipoverib.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/ipoverib>, <mailto:ipoverib-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:ipoverib@ietf.org>
List-Help: <mailto:ipoverib-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/ipoverib>, <mailto:ipoverib-request@ietf.org?subject=subscribe>
Sender: ipoverib-bounces@ietf.org
Errors-To: ipoverib-bounces@ietf.org

[co-chair hat off]

...
<snip>

>These performance problems are primarily implementation-specific and have 
>little to do with IB technology itself.  In addition, nearly all IB 
>solutions use a 2KB not the smallest MTU to transfer data - no different 
>than Ethernet.

Ethernet is adopting jumbo frames to get more firepower. Where is IB's
equivalent of jumbo frames?
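To make the per-packet cost behind the jumbo-frame question concrete, here is a rough sketch that counts how many TCP segments are needed to move a fixed payload at different link MTUs. The MTU values are assumptions for illustration: 2044 bytes is a 2KB IB MTU minus the 4-byte IPoIB encapsulation header, 9000 bytes is a common Ethernet jumbo frame size, and 65520 bytes is only an illustrative connected-mode value. The 20-byte IPv4 and TCP headers (no options) are also assumptions.

```python
import math

IP_HEADER = 20   # IPv4 header without options (assumption)
TCP_HEADER = 20  # TCP header without options (assumption)


def segments_needed(payload_bytes: int, ip_mtu: int) -> int:
    """Number of TCP segments needed to carry payload_bytes at a given IP MTU."""
    mss = ip_mtu - IP_HEADER - TCP_HEADER  # payload that fits in one segment
    return math.ceil(payload_bytes / mss)


# Illustrative MTUs (assumptions, see the lead-in)
MTUS = {
    "IPoIB UD (2KB IB MTU)": 2048 - 4,
    "Ethernet jumbo frame": 9000,
    "IPoIB CM (illustrative)": 65520,
}

if __name__ == "__main__":
    payload = 1 << 20  # a 1 MiB transfer
    for name, mtu in MTUS.items():
        print(f"{name:26s} MTU={mtu:5d} segments={segments_needed(payload, mtu)}")
```

Under these assumptions a 1 MiB transfer takes 524 segments at 2044 bytes, 118 at 9000 bytes, and 17 at 65520 bytes; per-packet work (interrupts, header processing) scales accordingly.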

>As I and others have raised over the years, enabling IP over IB 
>to perform well is a local HCA issue, not a standards 
>issue.  Adding checksum off-load support to the HCA is rather trivial 
>and does not require standardization (this is what is done for Ethernet 
>today, and it is non-standard).  Adding large send off-load support is a 
>local HCA issue, not a standards issue, and effectively provides the same 
>benefit as connected mode.

Yes, LSO (or TSO, as some call it) is relatively easy. But LRO (large receive
offload) is a heck of a lot more difficult. IB connected transports already
have all the silicon to do it. Why not just use it?
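A toy sketch of the asymmetry described above: on send, segmentation only has to slice one buffer into MSS-sized pieces, while on receive, coalescing has to match flows and confirm that sequence numbers are contiguous before merging. The `Segment` type and both functions are hypothetical simplifications; real LRO also checks TCP flags, options, and checksums, and flushes on timers.

```python
from dataclasses import dataclass
from typing import List, Tuple

Flow = Tuple[str, int, str, int]  # (src_ip, src_port, dst_ip, dst_port)


@dataclass
class Segment:
    flow: Flow
    seq: int
    data: bytes


def lso_segment(flow: Flow, seq: int, data: bytes, mss: int) -> List[Segment]:
    """Large send offload: slice one big send into MSS-sized segments."""
    return [Segment(flow, seq + off, data[off:off + mss])
            for off in range(0, len(data), mss)]


def lro_coalesce(segments: List[Segment]) -> List[Segment]:
    """Large receive offload: merge consecutive, in-order segments of one flow.

    A segment from a different flow, or one whose sequence number does not
    continue the current aggregate, starts a new aggregate.
    """
    merged: List[Segment] = []
    for seg in segments:
        last = merged[-1] if merged else None
        if (last is not None and last.flow == seg.flow
                and last.seq + len(last.data) == seg.seq):
            last.data += seg.data  # extend the aggregate (copies, input untouched)
        else:
            merged.append(Segment(seg.flow, seg.seq, seg.data))
    return merged
```

This sketch only compares against the most recent aggregate, so interleaved flows defeat it; hardware LRO tracks several flows at once, which is part of why it is harder than LSO.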

>The use of multiple QPs to spread work across 
>CPUs for both send / receive, a la the multi-queue support I've worked with 
>various Ethernet IHVs to get in place, is again a local HCA issue (it does not 
>have to be visible as part of the layer 2 address resolution).  One can 
>construct a very nice performing IP over IB solution, but there hasn't been 
>much public progress in implementing these de facto capabilities found in 
>Ethernet solutions on IB.  Getting these into an HCA implementation is a 
>heck of a lot easier and faster than developing a standard and 
>getting all of the OS changes made (the HCA implementation issues can all 
>be handled underneath the IP stack just like with Ethernet, so there are no real OS impacts).
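The multi-queue idea can be sketched as a purely local flow hash: each flow's addresses and ports select one of N queues (for example QPs bound to different CPUs), so packets of one flow stay in order while different flows spread out. `select_queue` is a hypothetical helper, and CRC32 is a stand-in; receive-side scaling hardware typically uses a Toeplitz hash instead.

```python
import zlib


def select_queue(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                 num_queues: int) -> int:
    """Map a TCP/UDP flow to one of num_queues send/receive queues.

    Every packet of the same flow hashes to the same queue, preserving
    per-flow ordering; different flows are spread across queues/CPUs.
    The choice is made locally and never appears in address resolution.
    """
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % num_queues
```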

I don't understand why a large MTU is an issue for the OS (requiring contiguous
physical addresses). Isn't all decent hardware capable of scatter/gather these days?
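A toy model of the scatter/gather point: a frame larger than one page can be described as a list of (buffer, length) pieces taken from non-contiguous pages, and the DMA engine gathers them on send. The page list, the 4096-byte page size, and both helpers are assumptions for illustration, not a verbs API (in libibverbs, a work request carries an array of `struct ibv_sge` entries with an address, length, and local key).

```python
from typing import List, Tuple

PAGE_SIZE = 4096  # assumed page size


def build_sg_list(pages: List[bytearray], length: int) -> List[Tuple[int, int]]:
    """Describe a `length`-byte frame spread over separate, non-contiguous pages.

    Returns (page_index, byte_count) entries, one per page touched.
    """
    if length > len(pages) * PAGE_SIZE:
        raise ValueError("not enough pages for frame")
    sgl: List[Tuple[int, int]] = []
    remaining = length
    for i in range(len(pages)):
        if remaining == 0:
            break
        n = min(PAGE_SIZE, remaining)
        sgl.append((i, n))
        remaining -= n
    return sgl


def gather(pages: List[bytearray], sgl: List[Tuple[int, int]]) -> bytes:
    """What a DMA engine does on send: concatenate the listed pieces."""
    return b"".join(bytes(pages[i][:n]) for i, n in sgl)
```

A 9000-byte frame over three separate pages becomes three entries of 4096, 4096, and 808 bytes; no physically contiguous 9000-byte buffer is needed.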

What's hairier for the OS stack is the per-destination MTU, and the different
MTU for multicast than for unicast, inherent in IPoIB CM.
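A minimal sketch of the per-destination MTU problem, assuming multicast always goes over the datagram (UD) transport while unicast peers that negotiated connected mode may use a larger MTU. The table of connected peers, the helper name, and both MTU constants are hypothetical; the point is only that the IP layer can no longer use a single MTU per interface.

```python
import ipaddress
from typing import Dict

UD_MTU = 2044   # 2KB IB MTU minus the 4-byte IPoIB header (assumption)
CM_MTU = 65520  # illustrative connected-mode MTU (assumption)


def effective_mtu(dst_ip: str, cm_peers: Dict[str, int]) -> int:
    """Pick the MTU the IP layer must use for one destination.

    Multicast is capped at UD_MTU even if a table entry exists; unicast peers
    with a connected-mode entry use their negotiated MTU; all others fall
    back to UD_MTU.
    """
    if ipaddress.ip_address(dst_ip).is_multicast:
        return UD_MTU
    return cm_peers.get(dst_ip, UD_MTU)
```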

Jerry

>
>
>>For commercial clusters, if IB is used for storage, then you save a network
>>by having fast IP performance and can use the IB network for both. Why use
>>IB and another network for the commercial cluster, when the other network
>>supports similar bandwidth for storage and IP?
>
>There will always be Ethernet in any cluster, so the fabric is there.  The 
>question is whether it is just for low-bandwidth / management services or 
>for applications.  For storage, we need to separate the discussion into 
>block and file.  For block, IB gateways to Fibre Channel, etc. 
>can be and are being used today quite nicely.  Performance is reasonable, and 
>the ecosystem costs, target availability, customer "pain", etc. are much 
>lower than when attempting to move to native IB storage.  The same applies to 
>file-based storage, where IB gateways to Ethernet (which then attaches to file 
>servers) work quite nicely.  In fact, the original vision of IB was that of 
>an I/O fabric for creating modular server solutions.  IPC was added 
>later in the process, when it was found to be relatively low cost to 
>define.  So, IB is successful in the HPC world and slowly entering some 
>commercial solutions.  To state that its future relies on getting an IP 
>over IB RC solution is perhaps blowing things a bit out of proportion.   The 
>easier path for all is simply to use the techniques I and others have 
>advocated for years now and solve the problems within the HCA 
>implementation.  That costs much less and will deliver a well-performing 
>solution.
>
>BTW, RNIC / Ethernet solutions implement these techniques today.  With the 
>arrival of 10 GbE, the lower prices of RNICs and 10 GbE switch ports, 
>lower-latency switches (competitive enough with IB for commercial and many 
>HPC clusters), etc., the success of IB must lie elsewhere and not on an IETF 
>spec.   This was noted at the recent IEEE Hot Interconnects conference as 
>well, so it isn't just my opinion.
>
>Mike
>
>>Implementing IPoIB-CM makes IB viable in the HPC cluster and some
>>commercial clusters. Otherwise I don't think it competes economically with
>>other network technologies.
>>
>>Regards.
>>
>>Bernie King-Smith
>>IBM Corporation
>>Server Group
>>Cluster System Performance
>>wombat2@us.ibm.com    (845)433-8483
>>Tie. 293-8483 or wombat2 on NOTES
>>
>>"We are not responsible for the world we are born into, only for the world
>>we leave when we die.
>>So we have to accept what has gone before us and work to change the only
>>thing we can,
>>-- The Future." William Shatner
>>
>>
>>
>>From: Dror Goldenberg <gdror@mellanox.co.il>
>>Sent by: ipoverib-bounces@ietf.org
>>Date: 08/30/2005 09:32 AM
>>To: kashyapv@us.ltcfwd.linux.ibm.com, "H.K. Jerry Chu" <Jerry.Chu@eng.sun.com>
>>cc: margaret@thingmagic.com, ipoverib@ietf.org, Bill_Strahm@McAfee.com
>>Subject: RE: [Ipoverib] Please read - proposed WG termination
>>
>> > From: Vivek Kashyap [mailto:kashyapv@us.ibm.com]
>> > Sent: Tuesday, August 30, 2005 8:39 AM
>> >
>> > On Mon, 29 Aug 2005, H.K. Jerry Chu wrote:
>> >
>>
>>
>><snip>
>>
>>
>> > > 1. IPoIB connected mode draft-ietf-ipoib-connected-mode-00.txt
>> > > updated recently
>> >
>> > Well, in recent days there has been a discussion going on
>> > based on Dror's input. I also made some updates after some
>> > discussion on OpenIB (not on the IETF list, though). This draft
>> > itself became a working group draft this February, after some
>> > lively discussion just before that. It appears to me that it
>> > should be possible to finalise this draft soon enough.
>> >
>> > 20th Sept. might be long enough to know one way or the other...
>> >
>> > vivek
>> >
>>
>>
>>We would like to see IPoIB-CM being finalized in IETF. We see
>>great value in having a standard for connected mode which effectively
>>increases the MTU. We are willing to contribute to the standardization
>>effort. We're also looking at the implementation of IPoIB-CM in Linux.
>>
>>
>>-Dror
>>_______________________________________________
>>IPoverIB mailing list
>>IPoverIB@ietf.org
>>https://www1.ietf.org/mailman/listinfo/ipoverib