RE: [Ips] Recent comments about FCoE and iSCSI

Michael Krause <> Mon, 30 April 2007 19:21 UTC

Date: Mon, 30 Apr 2007 08:38:58 -0700
To: Julian Satran <>, Zack Best <>
From: Michael Krause <>
Subject: RE: [Ips] Recent comments about FCoE and iSCSI

Slight variation:

As many-core technology becomes commonplace and compute power density is 
combined with the rise of blades as the preferred mechanical packaging of 
volume servers, the size of the data center is irrelevant.  What matters is the 
ability to co-locate or create locality associations that will enable 
efficient, secure access and communication between all of these resources.

A second-order issue is how long, or to what degree, legacy components 
must be accommodated.  In the case of FCoE, the argument appears to be 
that if you replace all of your Ethernet equipment, then you can 
continue to support your traditional SAN deployment.  This is, in part, 
predicated upon the need for DCE being paramount within the data center: 
given that it is a fait accompli, simply ride that train and add a few gateway 
devices to bring your legacy SAN along for the ride.  The counterargument 
would be to replace the slow but dependable diesel generator train with a 
modern bullet train and cast off the legacy, under the operating assumption 
that the overlap between old and new will not be significant enough to 
warrant providing much more than a bridging solution for the glacial 
customer.  In either case, the functional changes in the fabric will take 
N years to occur, making the "how long" question very relevant to which 
technologies should be invested in and allowed to gather momentum.  Both 
approaches have benefits to various company players, as either leads to a 
replacement of equipment.  One might argue, though, that the weight or cost 
of making all of the components - new, legacy, etc. - interoperate will be 
significant no matter what, so nothing is a done deal or a slam-dunk 
no-brainer to pursue.


At 06:16 PM 4/26/2007, Julian Satran wrote:

>Excellent comments. My take (if not obvious from the previous text) is 
>that data centers will be very large, and that compute power (as evidenced 
>by multicore) and advances in stack implementation are bound to improve 
>substantially the performance of the protocol stacks (see Intel and our 
>work) and layer 3 switching.
>It is also important to point out that Ethernet has substantial latencies 
>if only bridging is used, and replacement technologies (such as Rbridges 
>or others) may take some time to appear.
>Zack Best <>
>25/04/07 16:37
>RE: [Ips] Recent comments about FCoE and iSCSI
>The real debate here is between two types of networks.
>The first is reliable at the link level and does not
>drop packets under congestion.  The second is running
>a reliable transport protocol (i.e. TCP) over an
>unreliable link level network.
>I agree with the scaling argument.  For sufficiently
>large networks, reliable link level doesn't work well
>because network component failure, or chronically
>congested links are not handled well.  For
>sufficiently small networks, reliable link level has
>some significant advantages in simplicity, low
>hardware cost, performance, and worst case latency.
>My personal view is that the vast majority of
>enterprise storage networks fall in the "sufficiently
>small" category.  This view has to some extent been
>vindicated by the continuing success of Fibre Channel
>in this space and the inability of iSCSI to displace
>FC in any significant way for enterprise storage.  Of
>course, this may or may not change in the future.
>Whether FC is simpler than iSCSI depends largely on
>your definition of simplicity.  If one defines
>simplicity/complexity as the number of gates or lines
>of code to reduce the protocol to hardware or
>firmware, then my experience is that iSCSI is 2X to 3X
>the complexity of FC.  This has implications in cost
>and reliability.
>Particularly problematic with iSCSI is the
>unpredictability of the performance.  Performance is
>great with no packet drop.  However, even a small
>amount of congestion can cause a sudden large drop in
>performance.  This can be difficult to predict, as a
>network that is almost but not quite congested can run
>great, but a small incremental change of any sort can
>cause the performance to become suddenly unacceptable.
>For FC, or any other protocol using link level flow
>control, the reduction in performance is much more
>graceful and incremental when the level of congestion
>is small and intermittent.
>A second major problem with iSCSI is the unbounded
>nature of worst case latency.  When a storage network
>fails, it is desirable to detect the failure in a
>fraction of a second and transition to a backup
>network.  TCP, when implemented to the standards, can
>take many seconds or minutes to determine that a
>network has failed and close the connection.  RFC
>2988, for instance, requires that the minimum
>retransmission timeout be one second.  This means a single
>dropped packet may add one second to the latency of
>outstanding commands.  This is a huge amount of time
>on a 10G link.  No doubt this could be mitigated by
>drastically reducing the timeouts within TCP, but the
>market seems to be surprisingly resistant to tampering
>with accepted standards here.
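
To put rough numbers on the one-second figure quoted above, here is a minimal sketch. It follows the retransmission timeout computation as I understand it from RFC 2988 (smoothed RTT, RTT variance, K = 4, result rounded up to a one-second floor) and then counts how many bytes a 10 Gb/s link could have carried during one such timeout. The function names, the 100-microsecond RTT samples, and the 1 ms clock granularity are illustrative assumptions, not measurements.

```python
ALPHA = 1 / 8   # gain for the smoothed RTT (SRTT)
BETA = 1 / 4    # gain for the RTT variance (RTTVAR)
K = 4           # variance multiplier
MIN_RTO = 1.0   # seconds; the one-second floor discussed above


def rto_after_samples(rtt_samples, clock_granularity=0.001, min_rto=MIN_RTO):
    """Return the retransmission timeout (seconds) after a series of RTT samples."""
    srtt = None
    rttvar = None
    for r in rtt_samples:
        if srtt is None:
            # First measurement initialises both estimators.
            srtt = r
            rttvar = r / 2
        else:
            # RTTVAR is updated before SRTT, using the old SRTT.
            rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - r)
            srtt = (1 - ALPHA) * srtt + ALPHA * r
    if srtt is None:
        raise ValueError("need at least one RTT sample")
    rto = srtt + max(clock_granularity, K * rttvar)
    return max(rto, min_rto)


def bytes_during(seconds, link_bits_per_second):
    """Bytes a link of the given rate could carry in the given time."""
    return seconds * link_bits_per_second / 8


# Hypothetical data-center path: 20 samples of 100 microseconds each.
samples = [100e-6] * 20
rto = rto_after_samples(samples)                    # floored to one second
unfloored = rto_after_samples(samples, min_rto=0.0) # what the estimator alone gives
stalled = bytes_during(rto, 10e9)                   # bytes not sent while waiting
print(rto, unfloored, stalled)
```

Under these assumptions the estimator by itself would settle near a millisecond, so the floor, not the measured RTT, dominates the stall; one timeout corresponds to roughly 1.25 GB of unused link capacity.
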
>Overall, the FC and FCP protocols have a lot in common
>with the Intel x86 instruction set architecture.  They
>are overly complex, and rather poorly designed by
>modern standards.  But they are good enough, and there
>is a huge amount of value add that has been built on
>top of them, and therefore little incentive to change.
>FCoE is an interesting idea because it preserves 90%
>of the existing value add of FC, unifies the physical
>link with Ethernet, and uses the reliable link method
>of packet delivery.
>There are two significant possibilities for iSCSI to
>displace FC (or FCoE) in enterprise storage networks.
>First is if the networks start to scale to large
>enough size that FC can't be made sufficiently
>reliable, and second if CPU compute cycles become
>sufficiently cheap that the iSCSI protocol can be run
>in host software with no negative performance impact.
>Barring either of these, it seems that iSCSI will have
>an uphill battle, and FCoE may have a place.
>-----Original Message-----
>From: Julian Satran []
>Sent: Tuesday, April 24, 2007 3:10 PM
>Subject: [Ips] Recent comments about FCoE and iSCSI
>Dear All,
>The trade press is lately full of comments about the
>latest and greatest reincarnation of Fibre Channel
>over Ethernet.
>It made me try to summarize all the long and hot
>debates that preceded the advent of iSCSI.
>Although FCoE proponents make it look like no debate
>preceded iSCSI, that was not so - FCoE was considered
>even then and was dropped as a dumb idea.
>Here is a summary (as far as I can remember) of the
>main arguments. They are not bad arguments even in
>retrospect, and technically FCoE doesn't look better
>than it did then.
>Feel free to use this material in any form. I expect
>this group to seriously expand my arguments and make
>them public - in personal or collective form.
>And do not forget - it is a technical dispute -
>although we all must have some doubts about the way it
>is pursued.
>What a piece of nostalgia :-)
>Around 1997, when a team at IBM Research (Haifa and
>Almaden) started looking at connecting storage to
>servers using the "regular network" (the ubiquitous
>LAN), we considered many alternatives (another team
>even had a look at ATM - still a computer network
>candidate at the time). I won't take you through all of
>our rationale (and we went over some of it again at
>the end of 1999 with a team from Cisco before we
>convened the first IETF BOF in 2000 at Adelaide that
>resulted in iSCSI and all the rest), but we had
>multiple reasons for choosing to drop Fibre Channel
>over raw Ethernet:
>Fibre Channel Protocol (SCSI over Fibre Channel Link)
>is "mildly" effective because:
>- it implements endpoints in a dedicated engine
>- it has no transport layer (recovery is done at the
>  application layer under the assumption that the error
>  rate will be very low)
>- the network is limited in physical span and logical
>  span (number of switches)
>- flow-control/congestion control is achieved with a
>  mechanism adequate for a limited span network
>  (credits). The packet loss rate is almost nil and that
>  allows FCP to avoid using a transport (end-to-end)
>  layer
>- FCP switches are simple (addresses are local and
>  the memory requirements can be limited through the
>  credit mechanism)
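
For readers less familiar with the credit mechanism mentioned in the list above, here is a minimal sketch of credit-based link flow control: the sender holds one credit per free receive buffer, spends a credit per frame, and gets it back when the receiver frees the buffer. The class and function names and the tick-based timing are illustrative assumptions of mine, not the Fibre Channel specification.

```python
from collections import deque


class CreditedLink:
    """One direction of a link with per-buffer credits (illustrative model)."""

    def __init__(self, receiver_buffers):
        self.capacity = receiver_buffers
        self.credits = receiver_buffers   # sender's view of free receive buffers
        self.rx_queue = deque()

    def try_send(self, frame):
        """Send only if a credit is available; otherwise the sender must wait."""
        if self.credits == 0:
            return False
        self.credits -= 1
        self.rx_queue.append(frame)
        # Credits guarantee the receiver is never overrun, so nothing is dropped.
        assert len(self.rx_queue) <= self.capacity
        return True

    def receiver_drain(self):
        """Receiver frees one buffer and returns a credit to the sender."""
        if not self.rx_queue:
            return None
        frame = self.rx_queue.popleft()
        self.credits += 1
        return frame


def run(frames, receiver_buffers, drain_every):
    """Sender offers one frame per tick; receiver drains one every drain_every ticks."""
    link = CreditedLink(receiver_buffers)
    pending = deque(range(frames))
    delivered, stalls, peak, tick = [], 0, 0, 0
    while pending or link.rx_queue:
        tick += 1
        if pending:
            if link.try_send(pending[0]):
                pending.popleft()
            else:
                stalls += 1               # back-pressure: wait instead of dropping
        peak = max(peak, len(link.rx_queue))
        if tick % drain_every == 0:
            frame = link.receiver_drain()
            if frame is not None:
                delivered.append(frame)
    return delivered, stalls, peak


delivered, stalls, peak = run(frames=20, receiver_buffers=4, drain_every=3)
print(len(delivered), stalls, peak)
```

In this toy model a slow receiver makes the sender stall rather than lose frames, and the receive queue never grows beyond the number of credits, which is the sense in which switch memory can be bounded.
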
>- However, FCP endpoints are inherently costlier than
>  simple NICs – the cost argument (initiators are more
>- The credit mechanism is highly unstable for large
>  networks (check switch vendors' planning docs for the
>  network diameter limits) – the scaling argument
>- The assumption of low losses due to errors might
>  radically change when moving from 1 to 10 Gb/s – the
>  scaling argument
>- Ethernet has no credit mechanism and any mechanism
>  with a similar effect increases the end point cost.
>- Building a transport layer in the protocol stack has
>  always been the preferred choice of the networking
>  community – the community argument
>- The "performance penalty" of a complete protocol stack
>  has always been overstated (and overrated). Advances
>  in protocol stack implementation and finer tuning of
>  the congestion control mechanisms make conventional
>  TCP/IP perform well even at 10 Gb/s and over.
>  Moreover, the multicore processors that are becoming
>  dominant on the computing scene have enough compute
>  cycles available to make any "offloading" possible as
>  a mere code restructuring exercise (see the stack
>  reports from Intel, IBM etc.)
>- Building on a complete stack makes available a wealth
>  of operational and management mechanisms built over
>  the years by the networking community (routing,
>  provisioning, security, service location etc.) – the
>  community argument
>- Higher level storage access over an IP network is
>  widely available, and having both block and file served
>  over the same connection with the same support and
>  management structure is compelling – the community
>  argument
>- Highly efficient networks are easy to build over IP
>  with optimal (shortest path) routing, while Layer 2
>  networks use bridging and are limited by the logical
>  tree structure that bridges must follow. The effort to
>  combine routers and bridges (rbridges) promises to
>  change that, but it will take some time to finalize
>  (and we don't know exactly how it will operate).
>  Until then the scale of Layer 2 networks is going to
>  be seriously limited – the scaling argument
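
The tree-structure point in the last item above can be illustrated with a small sketch. It compares hop counts on a ring of six switches when every link is usable (shortest-path routing) and when forwarding is confined to a loop-free tree. The tree is built by a plain breadth-first search from switch 0, which only approximates what a real spanning tree protocol would select; the topology and function names are illustrative assumptions.

```python
from collections import deque


def ring(n):
    """Adjacency lists for n switches connected in a ring."""
    adj = {i: [] for i in range(n)}
    for i in range(n):
        j = (i + 1) % n
        adj[i].append(j)
        adj[j].append(i)
    return adj


def bfs_parents(adj, root):
    """Breadth-first search; returns each node's parent on a shortest path to root."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent


def hops(adj, src, dst):
    """Hop count of a shortest path from src to dst (graph assumed connected)."""
    parent = bfs_parents(adj, src)
    count, node = 0, dst
    while parent[node] is not None:
        node = parent[node]
        count += 1
    return count


def spanning_tree(adj, root):
    """Keep only the links of a BFS tree rooted at root; other links are 'blocked'."""
    parent = bfs_parents(adj, root)
    tree = {u: [] for u in adj}
    for child, par in parent.items():
        if par is not None:
            tree[child].append(par)
            tree[par].append(child)
    return tree


full = ring(6)
tree = spanning_tree(full, root=0)

print(hops(full, 3, 4))  # 1: switches 3 and 4 are direct neighbours
print(hops(tree, 3, 4))  # 5: the 3-4 link is blocked, so traffic goes via the root
```

In this toy topology two physically adjacent switches end up five hops apart once the loop is broken, which is the kind of detour shortest-path routing avoids.
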
>As a side argument – a performance comparison made in
>1998 showed SCSI over TCP (a predecessor of the later
>iSCSI) to perform better than FCP at 1 Gb/s for block
>sizes typical for OLTP (4-8 KB). That was what
>convinced us to take the path that led to iSCSI – and
>we used plain-vanilla x86 servers with plain-vanilla
>NICs and Linux (with similar measurements conducted on
>The networking and storage community acknowledged
>those arguments and developed iSCSI and the companion
>protocols for service discovery, boot etc.
>The community also acknowledged the need to support
>existing infrastructure and extend it in a reasonable
>fashion, and developed two protocols: iFCP (to support
>hosts with FCP drivers and IP connections to connect
>to storage by a simple conversion from FCP to TCP
>packets) and FCIP, to extend the reach of FCP through IP
>(it connects FCP islands through TCP links). Both have
>been implemented and their foundation is solid.
>The current attempt at developing a "new-age" FCP over
>an Ethernet link goes against most of the
>arguments that have given us iSCSI etc.
>It ignores the networking layering practice, builds an
>application protocol directly above a link and thus
>limits scaling, mandates elements at the link layer
>and application layer that make applications more
>expensive, and leaves aside the whole "ecosystem" that
>accompanies TCP/IP (and not Ethernet).
>In some related effort (and at one point also when
>developing iSCSI) we also considered moving away from
>SCSI (as some non-standardized software popular in some
>circles did – e.g., NBP) but decided against it.
>SCSI is a mature and well understood access
>architecture for block storage and is implemented by
>many device vendors. Moving away from it would not
>have been justified at the time.
Ips mailing list