[Ips] RE: iSCSI over SCTP (RDMA or not)

"Caitlin Bestler" <caitlinb@siliquent.com> Mon, 14 March 2005 17:15 UTC

Defining message oriented ULPs, even those that use
multiple connections per session, to be transport
neutral between TCP and SCTP is actually fairly
easy, and is not really different for RDMA or
non-RDMA applications.

And if anyone happens to be curious, the steps
necessary to be neutral between TCP and SCTP
also happen to allow the same applications to
be ported to other message oriented reliable
transport protocols (whether they support
RDMA or not).

The key is differentiating between 'connection'
setup and message exchange.

The message exchange part is simple. If the ULP
sends and receives messages, it can do so over
SCTP or TCP. It just needs to define how messages
are delineated over TCP, and then not bother with
that delineation over SCTP.
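
As a minimal sketch of that split, assuming a hypothetical 4-byte
length prefix as the TCP delineation (iSCSI defines its own PDU
framing; the prefix and the function names here are illustrative
only):

```python
import struct
from typing import List

# Assumption: a 4-byte big-endian length prefix stands in for
# whatever delineation the ULP actually specifies over TCP.
LEN = struct.Struct("!I")


def frame_for_stream(message: bytes) -> bytes:
    """Byte-stream transport (TCP): the ULP must mark message boundaries."""
    return LEN.pack(len(message)) + message


def deframe_stream(buffer: bytearray) -> List[bytes]:
    """Pull every complete message out of a receive buffer.

    Consumed bytes are removed from `buffer`; a trailing partial
    message is left in place until more bytes arrive.
    """
    messages = []
    while len(buffer) >= LEN.size:
        (length,) = LEN.unpack_from(buffer, 0)
        end = LEN.size + length
        if len(buffer) < end:
            break  # incomplete message, wait for more data
        messages.append(bytes(buffer[LEN.size:end]))
        del buffer[:end]
    return messages


def frame_for_message_transport(message: bytes) -> bytes:
    """Message transport (SCTP): the user message is the boundary."""
    return message
```

The ULP code above these functions only ever sees whole messages,
which is the property that makes it transport neutral.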

I don't claim to be a language-lawyer on iSCSI,
but my understanding is that an iSCSI PDU conforms
to this requirement. It is a message that can be
carried in a specified TCP delineation, or in 
an SCTP user message. TCP specifics, like periodic
marker insertion, simply would not apply to the
SCTP mapping.

RDMA support is simply a matter of supporting
RDMA Writes and RDMA Reads in the messages,
instead of just Sending/Receiving 'untagged'
messages.


That leaves the 'connection' setup.

The SCTP adaptation for RDMAP/RDDP assumes that
a pair of unidirectional SCTP streams together
form the equivalent of a TCP connection, and
that the SCTP association is created when there
is at least one "connection" required between
the same two endpoints. Multiple associations
can be created when the connections should not
be shared. Different applications definitely
qualify. Whether or not to create multiple
SCTP associations for multiple iSCSI sessions
is an interesting question, but probably
something that can be left to the discretion
of individual implementers.
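
A rough sketch of that bookkeeping follows. It assumes, purely for
illustration, that a stream pair is identified by one stream number
used in both directions and that associations are shared per
(peer, application); the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Association:
    """One SCTP association carrying several 'connections'."""
    peer: str
    app: str
    next_stream: int = 0
    connections: Dict[str, int] = field(default_factory=dict)

    def open_connection(self, name: str) -> int:
        # Assumption: one stream number names the pair of
        # unidirectional streams that stands in for a TCP connection.
        stream_id = self.next_stream
        self.next_stream += 1
        self.connections[name] = stream_id
        return stream_id


class AssociationTable:
    """Creates an association on first use, then reuses it."""

    def __init__(self) -> None:
        self._table: Dict[Tuple[str, str], Association] = {}

    def connect(self, peer: str, app: str, name: str) -> Tuple[Association, int]:
        # Different applications never share an association.
        key = (peer, app)
        assoc = self._table.get(key)
        if assoc is None:
            assoc = Association(peer, app)
            self._table[key] = assoc
        return assoc, assoc.open_connection(name)
```

An implementer who prefers one association per iSCSI session would
simply add the session identifier to the lookup key.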

The next step is that startup-negotiations 
have to be converted to a message basis. The
practice encouraged in the RDMA applicability
draft is to perform startup-negotiations in
"private data" exchanges during 'connection
startup'. For RDMA, 'private data' exchanges
are supported by both MPA/TCP and the SCTP
mapping. With the SCTP mapping, the private
data exchanges are conducted on a per-stream-pair
basis to establish the 'connected' state on 
that stream-pair (and associate the SCTP stream
with an RDMA endpoint).

When a single 'private data' exchange is not
adequate (which is probable for iSCSI) then 
the first exchange should be used to enable
messaging mode over the 'connection'. The
'connection' then completes the required
negotiations to fully enable the 'connection'.
For RDMA connections the key requirement is 
that the first exchange, using 'private data',
is adequate to properly configure the RDMA
endpoint. Essentially, this requires selecting
the proper QP that is associated with the 
correct Protection Domain and any Shared
Receive Queue. That implies that the first
private-data exchange must identify when
this 'connection' is part of an existing
session, and if it is to establish a new
session it should identify the user (even
if this identification is not yet validated).

There is an important caveat on mixing 
RDMA and non-RDMA traffic over SCTP. In order
to facilitate offload of the SCTP stack to
the RNIC, the RDMA/SCTP specification explicitly
requires that each SCTP association either be
in the RDMA mode or not.

So an iSCSI application that wants to establish
an iSCSI/iSER/SCTP session would attempt to 
establish an RDMA enabled SCTP association.
If the RDMA adaptation was rejected, it would
ask to establish a plain SCTP association
instead with the intent of running iSCSI/SCTP.
Or it could fall back to iSCSI/iSER/TCP or
iSCSI/TCP, at its discretion.
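
That fallback can be sketched as an ordered list of attempts. The
connect callables below are hypothetical stand-ins, not a real
socket or RNIC API, and the order after the first attempt is the
implementer's choice:

```python
from typing import Callable, Iterable, Optional, Tuple

# Each attempt is (label, connect); connect() returns a connection
# object on success or None if the peer rejected that mode.
Attempt = Tuple[str, Callable[[], Optional[object]]]


def establish_session(attempts: Iterable[Attempt]) -> Tuple[str, object]:
    """Try each transport mode in order and return the first that works."""
    for label, connect in attempts:
        connection = connect()
        if connection is not None:
            return label, connection
    raise ConnectionError("peer accepted none of the offered transports")


# Hypothetical stubs: the peer rejects the RDMA adaptation but
# accepts a plain SCTP association.
def rdma_sctp() -> Optional[object]:
    return None


def plain_sctp() -> Optional[object]:
    return {"transport": "sctp", "rdma": False}


label, connection = establish_session([
    ("iSCSI/iSER/SCTP", rdma_sctp),
    ("iSCSI/SCTP", plain_sctp),
])
```

Because an association is either in RDMA mode or not, each entry
in the list corresponds to a separate association attempt rather
than a renegotiation on an existing one.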

In summary, an application can make itself
transport neutral between TCP and SCTP by
working on a message-oriented basis, and 
ensuring that the methods to delineate
messages in TCP are clearly labeled as
such, and that session establishment
exchanges also be conducted as a series
of message exchanges where the first
exchange uses private-data if RDMA support
is desired (and that certain critical 
information be supplied in the *first*
exchange).

This approach should work for RDMA and
non-RDMA, and have the benefit of consolidating
multiple connections into a single association.



Caitlin Bestler
caitlinb@siliquent.com

