RE: [Ips] Recent comments about FCoE and iSCSI

Julian Satran <Julian_Satran@il.ibm.com> Fri, 27 April 2007 01:16 UTC

In-Reply-To: <179300.47262.qm@web36705.mail.mud.yahoo.com>
To: Zack Best <zbest28@yahoo.com>
Subject: RE: [Ips] Recent comments about FCoE and iSCSI
From: Julian Satran <Julian_Satran@il.ibm.com>
Message-ID: <OF163C49D3.8556B9C6-ON852572CA.00061041-852572CA.0006FD27@il.ibm.com>
Date: Thu, 26 Apr 2007 21:16:21 -0400
Cc: ips@ietf.org

Excellent comments. My take (if not obvious from the previous text) is 
that data centers will be very large, and that compute power (as 
evidenced by multicore processors) and advances in stack implementation 
are bound to improve substantially the performance of protocol stacks 
(see Intel's and our work) and of layer 3 switching.
It is also important to point out that Ethernet has substantial latencies 
if only bridging is used, and replacement technologies (such as Rbridges 
or others) may take some time to appear.

Julo



Zack Best <zbest28@yahoo.com> 
25/04/07 16:37

To
ips@ietf.org
cc

Subject
RE: [Ips] Recent comments about FCoE and iSCSI

The real debate here is between two types of networks.
The first is reliable at the link level and does not
drop packets under congestion.  The second runs a
reliable transport protocol (i.e., TCP) over an
unreliable link-level network.

I agree with the scaling argument.  For sufficiently
large networks, a reliable link level doesn't work well
because network component failures or chronically
congested links are not handled well.  For
sufficiently small networks, a reliable link level has
some significant advantages in simplicity, low
hardware cost, performance, and worst-case latency.

My personal view is that the vast majority of
enterprise storage networks fall in the "sufficiently
small" category.  This view has to some extent been
vindicated by the continuing success of Fibre Channel
in this space and the inability of iSCSI to displace
FC in any significant way for enterprise storage.  Of
course, this may or may not change in the future.

Whether FC is simpler than iSCSI depends largely on
your definition of simplicity.  If one defines
simplicity/complexity as the number of gates or lines
of code to reduce the protocol to hardware or
firmware, then my experience is that iSCSI is 2X to 3X
the complexity of FC.  This has implications in cost
and reliability.

Particularly problematic with iSCSI is the
unpredictability of its performance.  Performance is
great with no packet drops.  However, even a small
amount of congestion can cause a sudden, large drop in
performance.  This can be difficult to predict, as a
network that is almost but not quite congested can run
great, but a small incremental change of any sort can
cause the performance to become suddenly unacceptable.
For FC, or any other protocol using link-level flow
control, the reduction in performance is much more
graceful and incremental when the level of congestion
is small and intermittent.
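
To put a rough number on that cliff (this is just the
well-known Mathis et al. steady-state estimate,
throughput <= (MSS/RTT) * C/sqrt(p) with C ~ 1.22, a
model rather than a measurement, and the 100 us RTT is
my assumption), a quick back-of-the-envelope in Python:

    # Mathis et al. steady-state TCP throughput estimate (illustrative only).
    # Shows why a small loss rate turns into a large throughput drop on a
    # fast, low-latency SAN link.
    import math

    MSS = 1460 * 8      # bits per segment (standard Ethernet payload)
    RTT = 100e-6        # assumed 100 microsecond data-center round trip
    C = 1.22

    def mathis_bps(loss_rate):
        return (MSS / RTT) * C / math.sqrt(loss_rate)

    for p in (1e-7, 1e-5, 1e-3):
        print(f"loss {p:.0e}: ceiling ~{mathis_bps(p)/1e9:.1f} Gbit/s")

With those assumptions, a loss rate of 1e-3 already caps
a single connection near 4.5 Gbit/s, well below a 10G
line rate, while at 1e-7 the model puts the ceiling far
above it.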

A second major problem with iSCSI is the unbounded
nature of worst case latency.  When a storage network
fails, it is desirable to detect the failure in a
fraction of a second and transition to a backup
network.  TCP, when implemented to the standards, can
take many seconds or minutes to determine that a
network has failed and close the connection.  RFC
2988, for instance, requires that the minimum
retransmission timeout be one second.  This means a single
dropped packet may add one second to the latency of
outstanding commands.  This is a huge amount of time
on a 10G link.  No doubt this could be mitigated by
drastically reducing the timeouts within TCP, but the
market seems to be surprisingly resistant to tampering
with accepted standards here.
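
For reference, the timer arithmetic in question (RFC
2988, section 2) keeps the smoothed estimators SRTT and
RTTVAR and sets RTO = SRTT + max(G, 4*RTTVAR), rounded
up to a one-second minimum (and doubled on each
retransmission).  A small sketch of that arithmetic
(illustrative only, not a TCP implementation; the 1 ms
clock granularity and 200 us RTT are my assumptions):

    # Sketch of the RFC 2988 retransmission-timer arithmetic (section 2).
    ALPHA, BETA, K, G = 1/8, 1/4, 4, 0.001   # G = assumed 1 ms clock granularity
    MIN_RTO = 1.0                            # "SHOULD be rounded up to 1 second"

    def update_rto(srtt, rttvar, r):
        """Fold one RTT measurement r (seconds) into SRTT/RTTVAR, return the RTO."""
        if srtt is None:                     # first measurement
            srtt, rttvar = r, r / 2
        else:                                # subsequent measurements
            rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - r)
            srtt = (1 - ALPHA) * srtt + ALPHA * r
        rto = max(MIN_RTO, srtt + max(G, K * rttvar))
        return srtt, rttvar, rto

    # Even with steady ~200 us LAN round trips the timer never drops below
    # one second, so a single lost segment stalls its command(s) for >= 1 s.
    srtt = rttvar = None
    for sample in [200e-6] * 20:
        srtt, rttvar, rto = update_rto(srtt, rttvar, sample)
    print(f"SRTT={srtt*1e6:.0f} us, RTO={rto:.1f} s")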

Overall, the FC and FCP protocols have a lot in common
with the Intel x86 instruction set architecture.  They
are overly complex, and rather poorly designed by
modern standards.  But they are good enough, and there
is a huge amount of value add that has been built on
top of them, and therefore little incentive to change.
FCoE is an interesting idea because it preserves 90%
of the existing value add of FC, unifies the physical
link with Ethernet, and uses the reliable link method
of packet delivery.

There are two significant possibilities for iSCSI to
displace FC (or FCoE) in enterprise storage networks. 
The first is if the networks start to scale to a large
enough size that FC can't be made sufficiently
reliable, and second if CPU compute cycles become
sufficiently cheap that the iSCSI protocol can be run
in host software with no negative performance impact. 
Barring either of these, it seems that iSCSI will have
an uphill battle, and FCoE may have a place.

 -----Original Message-----
From: Julian Satran [mailto:Julian_Satran@il.ibm.com]
Sent: Tuesday, April 24, 2007 3:10 PM
To: ips@ietf.org
Subject: [Ips] Recent comments about FCoE and iSCSI



Dear All, 

The trade press is lately full of comments about the
latest and greatest reincarnation of Fibre Channel
over Ethernet. 
It made me try to summarize all the long and hot
debates that preceded the advent of iSCSI. 
Although FCoE proponents make it look like no debate
preceded iSCSI, that was not so - FCoE was considered
even then and was dropped as a dumb idea. 

Here is a summary (as far as I can remember) of the
main arguments. They are not bad arguments even in
retrospect, and technically FCoE doesn't look any
better than it did then. 

Feel free to use this material in any form. I expect
this group to seriously expand my arguments and make
them public - in personal or collective form. 

And do not forget - it is a technical dispute -
although we all must have some doubts about the way it
is pursued. 

Regards, 
Julo 

---------------------------------------------------------------------


What a piece of nostalgia :-) 

Around 1997, when a team at IBM Research (Haifa and
Almaden) started looking at connecting storage to
servers using the "regular network" (the ubiquitous
LAN), we considered many alternatives (another team
even had a look at ATM - still a computer network
candidate at the time). I won't walk you through all of
our rationale (we went over some of it again at the
end of 1999 with a team from Cisco, before we
convened the first IETF BOF in 2000 at Adelaide that
resulted in iSCSI and all the rest), but the reasons
we chose to drop Fibre Channel over raw Ethernet were
multiple: 

Fibre Channel Protocol (SCSI over the Fibre Channel
link) is "mildly" effective because: 

- it implements endpoints in a dedicated engine (offload);
- it has no transport layer (recovery is done at the
  application layer under the assumption that the error
  rate will be very low);
- the network is limited in physical span and logical
  span (number of switches);
- flow control/congestion control is achieved with a
  mechanism adequate for a limited-span network
  (credits); the packet loss rate is almost nil, and that
  allows FCP to avoid using a transport (end-to-end)
  layer (a toy sketch of the credit mechanism follows
  this list);
- FCP switches are simple (addresses are local and the
  memory requirements can be limited through the credit
  mechanism).

However:

- FCP endpoints are inherently costlier than simple NICs
  (initiators are more expensive) - the cost argument.
- The credit mechanism is highly unstable for large
  networks (check switch vendors' planning docs for the
  network diameter limits) - the scaling argument.
- The assumption of low losses due to errors might
  radically change when moving from 1 to 10 Gb/s - the
  scaling argument.
- Ethernet has no credit mechanism, and any mechanism
  with a similar effect increases the endpoint cost.
  Building a transport layer in the protocol stack has
  always been the preferred choice of the networking
  community - the community argument.
- The "performance penalty" of a complete protocol stack
  has always been overstated (and overrated). Advances
  in protocol stack implementation and finer tuning of
  the congestion control mechanisms make conventional
  TCP/IP perform well even at 10 Gb/s and over.
  Moreover, the multicore processors that are becoming
  dominant on the computing scene have enough compute
  cycles available to make any "offloading" possible as
  a mere code-restructuring exercise (see the stack
  reports from Intel, IBM etc.).
- Building on a complete stack makes available a wealth
  of operational and management mechanisms built over
  the years by the networking community (routing,
  provisioning, security, service location etc.) - the
  community argument.
- Higher-level storage access over an IP network is
  widely available, and having both block and file
  served over the same connection with the same support
  and management structure is compelling - the community
  argument.
- Highly efficient networks are easy to build over IP
  with optimal (shortest-path) routing, while Layer 2
  networks use bridging and are limited by the logical
  tree structure that bridges must follow. The effort to
  combine routers and bridges (rbridges) promises to
  change that, but it will take some time to finalize
  (and we don't know exactly how it will operate). Until
  then the scale of Layer 2 networks is going to be
  seriously limited - the scaling argument.
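
As a toy illustration of the credit point above (just a
sketch, not any particular implementation): an FC port
may transmit a frame only while it holds buffer-to-buffer
credits, and the receiver returns one credit (R_RDY) per
buffer it frees, so a slow receiver throttles the sender
instead of forcing drops, and receive memory is bounded
by the advertised credit count.  Roughly, in Python:

    # Toy model of FC-style buffer-to-buffer credit flow control (illustrative).
    # The sender may transmit only while it holds credits; the receiver returns
    # one credit per freed buffer, so frames wait at the sender instead of
    # being dropped, and receive buffering is bounded by BB_Credit.
    from collections import deque

    class CreditLink:
        def __init__(self, bb_credit):
            self.credits = bb_credit        # advertised receive buffers
            self.rx_buffers = deque()

        def try_send(self, frame):
            if self.credits == 0:           # no credit: sender waits, never drops
                return False
            self.credits -= 1
            self.rx_buffers.append(frame)
            return True

        def receiver_drains_one(self):
            if self.rx_buffers:
                self.rx_buffers.popleft()
                self.credits += 1           # R_RDY returned to the sender

    link = CreditLink(bb_credit=4)
    sent = sum(link.try_send(n) for n in range(10))   # fast sender, idle receiver
    print(f"sent {sent} of 10 frames before stalling; dropped 0")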


As a side argument: a performance comparison made in
1998 showed SCSI over TCP (a predecessor of the later
iSCSI) to perform better than FCP at 1 Gb/s for block
sizes typical of OLTP (4-8 KB). That was what
convinced us to take the path that led to iSCSI - and
we used plain-vanilla x86 servers with plain-vanilla
NICs and Linux (with similar measurements conducted on
Windows). 
The networking and storage community acknowledged
those arguments and developed iSCSI and the companion
protocols for service discovery, boot etc. 

The community also acknowledged the need to support
existing infrastructure and extend it in a reasonable
fashion, and developed two protocols: iFCP (to support
hosts with FCP drivers and IP connections, connecting
to storage by a simple conversion from FCP to TCP
packets) and FCIP (to extend the reach of FCP through
IP by connecting FCP islands through TCP links). Both
have been implemented and their foundation is solid. 

The current attempt at developing a "new-age" FCP over
an Ethernet link goes against most of the arguments
that gave us iSCSI etc. 

It ignores networking layering practice, builds an
application protocol directly above a link layer and
thus limits scaling, mandates elements at the link
layer and the application layer that make applications
more expensive, and leaves aside the whole "ecosystem"
that accompanies TCP/IP (but not Ethernet). 

In a related effort (and at one point also while
developing iSCSI) we also considered moving away from
SCSI (like some non-standardized software, popular in
some circles, did - e.g., NBP), but decided against it.
SCSI is a mature and well-understood access
architecture for block storage and is implemented by
many device vendors. Moving away from it would not
have been justified at the time. 





_______________________________________________
Ips mailing list
Ips@ietf.org
https://www1.ietf.org/mailman/listinfo/ips
