RE: [Ips] Recent comments about FCoE and iSCSI
Julian Satran <Julian_Satran@il.ibm.com> Fri, 27 April 2007 01:16 UTC
In-Reply-To: <179300.47262.qm@web36705.mail.mud.yahoo.com>
To: Zack Best <zbest28@yahoo.com>
Subject: RE: [Ips] Recent comments about FCoE and iSCSI
From: Julian Satran <Julian_Satran@il.ibm.com>
Message-ID: <OF163C49D3.8556B9C6-ON852572CA.00061041-852572CA.0006FD27@il.ibm.com>
Date: Thu, 26 Apr 2007 21:16:21 -0400
Cc: ips@ietf.org
Excellent comments. My take (if not obvious from the previous text) is that data centers will be very large, and that compute power (as evidenced by the multicore trend) and advances in stack implementation are bound to substantially improve the performance of protocol stacks (see Intel's work and ours) and of layer 3 switching. It is also important to point out that Ethernet has substantial latencies if only bridging is used, and replacement technologies (such as RBridges or others) may take some time to appear.

Julo

Zack Best <zbest28@yahoo.com>
25/04/07 16:37

To: ips@ietf.org
Subject: RE: [Ips] Recent comments about FCoE and iSCSI

The real debate here is between two types of networks. The first is reliable at the link level and does not drop packets under congestion. The second runs a reliable transport protocol (i.e. TCP) over an unreliable link level network.

I agree with the scaling argument. For sufficiently large networks, reliable link level doesn't work well because network component failures or chronically congested links are not handled well. For sufficiently small networks, reliable link level has some significant advantages in simplicity, low hardware cost, performance, and worst case latency. My personal view is that the vast majority of enterprise storage networks fall in the "sufficiently small" category. This view has to some extent been vindicated by the continuing success of Fibre Channel in this space and the inability of iSCSI to displace FC in any significant way for enterprise storage. Of course, this may or may not change in the future.

Whether FC is simpler than iSCSI depends largely on your definition of simplicity. If one defines simplicity/complexity as the number of gates or lines of code needed to reduce the protocol to hardware or firmware, then my experience is that iSCSI is 2X to 3X the complexity of FC. This has implications in cost and reliability.

Particularly problematic with iSCSI is the unpredictability of its performance. Performance is great with no packet drop. However, even a small amount of congestion can cause a sudden large drop in performance. This can be difficult to predict, as a network that is almost but not quite congested can run great, yet a small incremental change of any sort can cause the performance to become suddenly unacceptable. For FC, or other protocols using link level flow control, the reduction in performance is much more graceful and incremental when the level of congestion is small and intermittent.

A second major problem with iSCSI is the unbounded nature of worst case latency. When a storage network fails, it is desirable to detect the failure in a fraction of a second and transition to a backup network. TCP, when implemented to the standards, can take many seconds or minutes to determine that a network has failed and close the connection. RFC 2988, for instance, requires that the minimum retransmission timeout be one second. This means a single dropped packet may add one second to the latency of outstanding commands. This is a huge amount of time on a 10G link. No doubt this could be mitigated by drastically reducing the timeouts within TCP, but the market seems to be surprisingly resistant to tampering with accepted standards here.
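To make the arithmetic concrete, here is a minimal sketch of the RFC 2988 timer in Python (the 100 ms clock granularity and the 200 microsecond data-center round trip are my assumed numbers; the one-second floor and the doubling on timeout are the RFC's own rules):

# Minimal sketch of the RFC 2988 retransmission timer (RTO).
ALPHA, BETA = 1 / 8, 1 / 4     # smoothing gains (RFC 2988, section 2.3)
K = 4                          # variance multiplier (section 2.2)
G = 0.1                        # assumed clock granularity, in seconds
MIN_RTO, INITIAL_RTO = 1.0, 3.0

class RtoTimer:
    def __init__(self):
        self.srtt = None               # smoothed round-trip time
        self.rttvar = None             # round-trip time variance
        self.rto = INITIAL_RTO         # before any sample (section 2.1)

    def on_rtt_sample(self, r):
        if self.srtt is None:          # first measurement (section 2.2)
            self.srtt, self.rttvar = r, r / 2
        else:                          # later measurements (section 2.3)
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - r)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * r
        # Section 2.4: round up to 1 second if the computed value is smaller.
        self.rto = max(MIN_RTO, self.srtt + max(G, K * self.rttvar))

    def on_timeout(self):
        self.rto *= 2                  # exponential backoff (section 5.5)

t = RtoTimer()
for _ in range(20):
    t.on_rtt_sample(200e-6)            # 200 us measured round trip
print(f"RTO on a 200 us link: {t.rto:.3f} s")    # 1.000 s: the floor wins
for losses in (1, 2, 3):
    t.on_timeout()
    print(f"after {losses} consecutive losses: next retry in {t.rto:.0f} s")

Even after the estimator has locked onto a sub-millisecond round trip, the floor keeps the RTO at a full second, and each further loss doubles it; at 10 Gb/s a one-second stall corresponds to roughly 1.25 GB of forgone transfer.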
Overall, the FC and FCP protocols have a lot in common with the Intel x86 instruction set architecture. They are overly complex, and rather poorly designed by modern standards. But they are good enough, and there is a huge amount of value add that has been built on top of them, and therefore little incentive to change. FCoE is an interesting idea because it preserves 90% of the existing value add of FC, unifies the physical link with Ethernet, and uses the reliable link method of packet delivery.

There are two significant possibilities for iSCSI to displace FC (or FCoE) in enterprise storage networks. The first is if the networks start to scale to a large enough size that FC can't be made sufficiently reliable, and the second is if CPU compute cycles become sufficiently cheap that the iSCSI protocol can be run in host software with no negative performance impact. Barring either of these, it seems that iSCSI will have an uphill battle, and FCoE may have a place.

-----Original Message-----
From: Julian Satran [mailto:Julian_Satran@il.ibm.com]
Sent: Tuesday, April 24, 2007 3:10 PM
To: ips@ietf.org
Subject: [Ips] Recent comments about FCoE and iSCSI

Dear All,

The trade press is lately full of comments about the latest and greatest reincarnation of Fibre Channel over Ethernet. It made me try to summarize all the long and hot debates that preceded the advent of iSCSI. Although FCoE proponents make it look like no debate preceded iSCSI, that was not so - FCoE was considered even then and was dropped as a dumb idea. Here is a summary (as far as I can remember) of the main arguments. They are not bad arguments even in retrospect, and technically FCoE doesn't look better than it did then. Feel free to use this material in any form. I expect this group to seriously expand my arguments and make them public - in personal or collective form. And do not forget - it is a technical dispute - although we all must have some doubts about the way it is pursued.

Regards,
Julo

---------------------------------------------------------------------

What a piece of nostalgia :-) Around 1997, when a team at IBM Research (Haifa and Almaden) started looking at connecting storage to servers using the "regular network" (the ubiquitous LAN), we considered many alternatives (another team even had a look at ATM - still a computer network candidate at the time). I won't take you through all of our rationale (and we went over some of it again at the end of 1999 with a team from Cisco, before we convened the first IETF BOF in 2000 at Adelaide that resulted in iSCSI and all the rest), but the reasons we chose to drop Fibre Channel over raw Ethernet were multiple:

Fibre Channel Protocol (SCSI over the Fibre Channel link) is "mildly" effective because:
- it implements endpoints in a dedicated engine (offload)
- it has no transport layer (recovery is done at the application layer under the assumption that the error rate will be very low)
- the network is limited in physical span and logical span (number of switches)
- flow control/congestion control is achieved with a mechanism adequate for a limited span network (credits); the packet loss rate is almost nil, and that allows FCP to avoid using a transport (end-to-end) layer
- FCP switches are simple (addresses are local and the memory requirements can be limited through the credit mechanism)

However:
- FCP endpoints are inherently costlier than simple NICs -> the cost argument (initiators are more expensive)
- The credit mechanism is highly unstable for large networks (check switch vendors' planning docs for the network diameter limits; a back-of-the-envelope sketch follows below) -> the scaling argument
- The assumption of low losses due to errors might radically change when moving from 1 to 10 Gb/s -> the scaling argument
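Here is a back-of-the-envelope sketch of that span limit (the ~5 us/km fiber propagation delay and the 2112-byte maximum Fibre Channel payload are standard round numbers I am assuming, not figures from this thread): to keep a link busy, the receiver must grant enough buffer-to-buffer credits to cover a full round trip, so per-port frame buffering grows linearly with both distance and rate.

# Rough sketch: buffer-to-buffer credits needed to keep a link at full rate.
PROP_US_PER_KM = 5.0       # assumed one-way propagation delay in fiber
FRAME_BYTES = 2112         # maximum Fibre Channel frame payload

def credits_needed(rate_gbps, span_km):
    """Credits required to cover one full round trip at line rate."""
    rtt_s = 2 * span_km * PROP_US_PER_KM * 1e-6
    bytes_in_flight = (rate_gbps * 1e9 / 8) * rtt_s
    return max(1, round(bytes_in_flight / FRAME_BYTES))

for km in (1, 10, 100):
    c = credits_needed(10, km)
    print(f"{km:>4} km at 10 Gb/s: {c:>4} credits "
          f"(~{c * FRAME_BYTES / 1e6:.2f} MB of frame buffers per port)")

At metro distances every port needs on the order of a megabyte of dedicated frame buffers per link, and a delayed credit return on any hop stalls the sender immediately; both effects bound the practical network diameter.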
- Ethernet has no credit mechanism, and any mechanism with a similar effect increases the endpoint cost. Building a transport layer in the protocol stack has always been the preferred choice of the networking community -> the community argument
- The "performance penalty" of a complete protocol stack has always been overstated (and overrated). Advances in protocol stack implementation and finer tuning of the congestion control mechanisms make conventional TCP/IP perform well even at 10 Gb/s and over. Moreover, the multicore processors that are becoming dominant on the computing scene have enough compute cycles available to make any "offloading" possible as a mere code restructuring exercise (see the stack reports from Intel, IBM etc.)
- Building on a complete stack makes available a wealth of operational and management mechanisms built over the years by the networking community (routing, provisioning, security, service location etc.) -> the community argument
- Higher level storage access over an IP network is widely available, and having both block and file served over the same connection with the same support and management structure is compelling -> the community argument
- Highly efficient networks are easy to build over IP with optimal (shortest path) routing, while Layer 2 networks use bridging and are limited by the logical tree structure that bridges must follow. The effort to combine routers and bridges (RBridges) promises to change that, but it will take some time to finalize (and we don't know exactly how it will operate). Until then the scale of Layer 2 networks is going to be seriously limited -> the scaling argument (a toy illustration follows below)
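A toy illustration of that tree constraint (the six-switch ring is my own construction, not a topology from the thread): bridges may forward only along a single spanning tree, so some switch pairs communicate over a much longer path than shortest-path routing would use.

# Toy topology: six switches in a ring, so every pair has two paths.
# Bridged Ethernet must block one link to stay loop-free; Layer 3
# routing may use every link. Compare hop counts between switches 2 and 3.
from collections import deque

def hops(adj, src, dst):
    """Breadth-first hop count from src to dst over an adjacency dict."""
    dist, q = {src: 0}, deque([src])
    while q:
        u = q.popleft()
        if u == dst:
            return dist[u]
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return None

ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}     # full topology
# Spanning tree: the bridges block the 2-3 link to break the loop.
tree = {i: [n for n in ring[i] if {i, n} != {2, 3}] for i in range(6)}

print(f"routed (shortest path): {hops(ring, 2, 3)} hop")     # 1
print(f"bridged (spanning tree): {hops(tree, 2, 3)} hops")   # 5

Layer 3 routing uses the direct link (1 hop); the bridged path must detour around the entire ring (5 hops), and the blocked link's capacity is simply wasted - both penalties grow with the size of the Layer 2 domain.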
As a side argument: a performance comparison made in 1998 showed SCSI over TCP (a predecessor of the later iSCSI) to perform better than FCP at 1 Gb/s for block sizes typical of OLTP (4-8 KB). That is what convinced us to take the path that led to iSCSI - and we used plain vanilla x86 servers with plain vanilla NICs and Linux (with similar measurements conducted on Windows).

The networking and storage community acknowledged those arguments and developed iSCSI and the companion protocols for service discovery, boot etc. The community also acknowledged the need to support existing infrastructure and extend it in a reasonable fashion, and developed two protocols: iFCP (to support hosts with FCP drivers and IP connections, connecting to storage by a simple conversion from FCP to TCP packets) and FCIP (to extend the reach of FCP through IP by connecting FCP islands through TCP links). Both have been implemented and their foundation is solid.

The current attempt at developing a "new-age" FCP over an Ethernet link goes against most of the arguments that have given us iSCSI etc. It ignores networking layering practice, builds an application protocol directly above a link and thus limits scaling, mandates elements at the link layer and application layer that make applications more expensive, and leaves aside the whole "ecosystem" that accompanies TCP/IP (and not Ethernet).

In a related effort (and at one point also when developing iSCSI) we also considered moving away from SCSI (like some non-standardized software, popular in some circles, did - e.g., NBP) but decided against it. SCSI is a mature and well understood access architecture for block storage and is implemented by many device vendors. Moving away from it would not have been justified at the time.
- Re: FW: [Ips] Recent comments about FCoE and iSCSI Eddy Quicksall
- [Ips] Recent comments about FCoE and iSCSI Julian Satran
- Re: FW: [Ips] Recent comments about FCoE and iSCSI Julian Satran
- Re: FW: [Ips] Recent comments about FCoE and iSCSI Eddy Quicksall
- Re: FW: [Ips] Recent comments about FCoE and iSCSI Julian Satran
- Re: FW: [Ips] Recent comments about FCoE and iSCSI Eddy Quicksall
- Re: FW: [Ips] Recent comments about FCoE and iSCSI Julian Satran
- Re: FW: [Ips] Recent comments about FCoE and iSCSI Eddy Quicksall
- RE: FW: [Ips] Recent comments about FCoE and iSCSI John Hufferd
- RE: FW: [Ips] Recent comments about FCoE and iSCSI Sandars, Ken
- RE: FW: [Ips] Recent comments about FCoE and iSCSI John Hufferd
- RE: FW: [Ips] Recent comments about FCoE and iSCSI Robert Snively
- Re: FW: [Ips] Recent comments about FCoE and iSCSI Silvano Gai
- Re: FW: [Ips] Recent comments about FCoE and iSCSI Julian Satran
- RE: FW: [Ips] Recent comments about FCoE and iSCSI Silvano Gai
- RE: FW: [Ips] Recent comments about FCoE and iSCSI brown_David1
- RE: FW: [Ips] Recent comments about FCoE and iSCSI Julian Satran
- RE: FW: [Ips] Recent comments about FCoE and iSCSI Silvano Gai
- RE: FW: [Ips] Recent comments about FCoE and iSCSI Frank D'Agostino (fdagosti)
- Re: FW: [Ips] Recent comments about FCoE and iSCSI Eddy Quicksall
- RE: FW: [Ips] Recent comments about FCoE and iSCSI John Hufferd
- RE: FW: [Ips] Recent comments about FCoE and iSCSI John Hufferd
- RE: FW: [Ips] Recent comments about FCoE and iSCSI Michael Krause
- RE: FW: [Ips] Recent comments about FCoE and iSCSI Silvano Gai
- Re: FW: [Ips] Recent comments about FCoE and iSCSI Eddy Quicksall
- RE: [Ips] Recent comments about FCoE and iSCSI Julian Satran
- RE: [Ips] Recent comments about FCoE and iSCSI Silvano Gai
- RE: [Ips] Recent comments about FCoE and iSCSI Larry Boucher
- RE: [Ips] Recent comments about FCoE and iSCSI Julian Satran
- RE: [Ips] Recent comments about FCoE and iSCSI Michael Krause
- RE: [Ips] Recent comments about FCoE and iSCSI Michael Krause
- RE: [Ips] Recent comments about FCoE and iSCSI Julian Satran
- RE: [Ips] Recent comments about FCoE and iSCSI Nicholas A. Bellinger