Re: [storm] iSER update request from Bob Russell

"Mallikarjun Chadalapaka" <cbm@chadalapaka.com> Tue, 03 November 2009 02:01 UTC

From: Mallikarjun Chadalapaka <cbm@chadalapaka.com>
To: Black_David@emc.com, Michael@huaweisymantec.com, storm@ietf.org
Date: Mon, 02 Nov 2009 18:01:38 -0800

As I recall, the ZBVA issue was the other way around.  When we defined it,
ZBVA was assumed to be a native feature of all iWARP implementations.  The
reasoning was as follows:

 

- SCSI, Fibre Channel and iSCSI traditionally transfer only the
  buffer/block *offset* into the I/O address space for that I/O, and that
  model had worked well in high-performance offload implementations.  There
  was no reason for iWARP to support a different model requiring the base
  address exchange.

- The starting VA of an STag can be managed in the local state of an STag
  for a data source or data sink buffer, rather than transferring it across
  the wire to the other side and getting it back.

- There's value in keeping the iSER header small, as its size dictates the
  size of each buffer in the anonymous buffer pool attached to an RQ.  A
  smaller size means that for a given size of a shared RQ buffer pool in a
  PD/session, a target implementation can support a larger iSCSI "queue
  depth" (larger MaxCmdSN).
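
To put rough numbers on the last point, here is a back-of-the-envelope
sketch in Python.  The pool size and the base header size are made-up values
for illustration; the only figures taken from real specs are the 48-byte
iSCSI Basic Header Segment and the "two extra 64-bit fields" mentioned later
in this thread.  Real receive buffers usually also reserve room for
immediate data, which would shrink the relative difference.

```python
# Hypothetical sizing sketch: how many fixed-size receive buffers fit in a
# shared RQ buffer pool for two different iSER header sizes.

ISCSI_BHS_BYTES = 48        # iSCSI Basic Header Segment
EXTRA_VA_BYTES = 2 * 8      # two extra 64-bit fields in the expanded header


def rq_buffers(pool_bytes: int, iser_header_bytes: int,
               other_bytes: int = ISCSI_BHS_BYTES) -> int:
    """Return how many buffers of (header + other_bytes) fit in the pool."""
    per_buffer = iser_header_bytes + other_bytes
    return pool_bytes // per_buffer


POOL_BYTES = 64 * 1024      # assumed pool size, not from any spec
BASE_HEADER_BYTES = 12      # assumed size of a header without VA fields

small_header_depth = rq_buffers(POOL_BYTES, BASE_HEADER_BYTES)
large_header_depth = rq_buffers(POOL_BYTES, BASE_HEADER_BYTES + EXTRA_VA_BYTES)
print(small_header_depth, large_header_depth)
```

With these assumed numbers the smaller header yields more buffers from the
same pool, which is the sense in which it permits a larger queue depth.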

 

So, given what I remember, I am actually surprised by the assertion that no
iWARP implementations support ZBVA.  Perhaps iWARP vendors can chime in
here?

 

Mallikarjun

 

From: storm-bounces@ietf.org [mailto:storm-bounces@ietf.org] On Behalf Of
Black_David@emc.com
Sent: Friday, October 16, 2009 5:38 PM
To: Michael@huaweisymantec.com; storm@ietf.org
Subject: Re: [storm] iSER update request from Bob Russell

 

There's a lot of material in here.  With WG chair hat off, my
opinions are:

- If all the implementations start the connection immediately
    in RDMA mode, then the RFC should be revised to reflect that
    "running code".  I do hope the initial MPA Request and Reply
    frames are being used.
- If all the implementations are using the Expanded iSER header,
    then the RFC ought to be revised to reflect "running code".
- ZBVA was originally left out of the RFC because it was thought
    to be IB-specific.  If that concept also applies to iWARP,
    it belongs in the new iSER RFC ("running code" again).

Thanks,
--David

 

 

  _____  

From: storm-bounces@ietf.org [mailto:storm-bounces@ietf.org] On Behalf Of
Mike Ko
Sent: Thursday, August 20, 2009 6:25 PM
To: STORM
Subject: [storm] iSER update request from Bob Russell

On March 22, 2009, Bob Russell posted the following at the IPS mailing list.
I have embedded my responses below prefixed by <mk>.

 

Mike

 

-----  from Bob Russell  -----

 

There are 2 issues I would like to suggest for discussion at the BOF
meeting later this week.  Both have to do with the iSER spec, RFC 5046.

 

1. At the present time, as far as I know, no existing hardware,
   neither Infiniband nor iWARP, is capable of opening a connection
   in "normal" TCP mode and then transitioning it into zero-copy mode.
   Unfortunately, the iSER spec requires that.
   Can't we just replace that part of the iSER spec?
   Otherwise, all hardware and all implementations are non-standard.

 

<mk> When I wrote the Supplement to Infiniband Architecture Specification
Annex A12 (Support for iSCSI Extensions for RDMA), there was no mention of
transitioning the connection from TCP mode to RDMA mode.  I can update
RFC 5046 to remove this requirement if that is the consensus of the
group.

 

2. The OFED stack is used to access both Infiniband and iWARP hardware.
   This software requires 2 extra 64-bit fields for addressing
   on both Infiniband and iWARP hardware, but these fields
   are not allowed for in the current iSER Header Format.
   Can't we just add those extra fields to the iSER spec?
   If someday some other implementation doesn't need those
   fields, they can be just set to 0 (which is what is implied by
   the current iSER standard anyway).  Again, by not doing this,
   all implementations are non-standard.

 

<mk> I assume you are suggesting that we should take the "Expanded iSER
Header for Supporting Virtual Address" as defined in table 4 of Annex A12
and update RFC 5046 accordingly.  Again, I am fine with this if that is the
consensus of the group.
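
For readers who want to see what "two extra 64-bit fields" means in bytes,
here is a minimal Python sketch.  The field order, the 12-byte base layout
and the flags value are assumptions made for illustration; table 4 of
Annex A12 is the authority on the actual format.

```python
import struct

# Assumed layouts, for illustration only (big-endian, no padding):
#   base:     flags byte, 3 reserved bytes, Write STag, Read STag
#   expanded: the same, with a 64-bit VA following each STag
BASE_FMT = ">B3xII"
EXPANDED_FMT = ">B3xIQIQ"


def pack_expanded(flags: int, write_stag: int, write_va: int,
                  read_stag: int, read_va: int) -> bytes:
    """Pack a hypothetical expanded header carrying both VA fields."""
    return struct.pack(EXPANDED_FMT, flags, write_stag, write_va,
                       read_stag, read_va)


# Hypothetical values: a write buffer advertised with a real virtual
# address, and no read buffer (fields left at 0).
hdr = pack_expanded(0x18, 0x1234, 0x7F0000001000, 0, 0)

# The two VA fields account for the whole size difference.
extra = struct.calcsize(EXPANDED_FMT) - struct.calcsize(BASE_FMT)
print(len(hdr), extra)
```

Setting both VA fields to 0 in this layout corresponds to the "just set to
0" case Bob describes for implementations that do not need them.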

 

In other words, I'm suggesting that we consider replacing the relevant
parts of the current iSER specs with the current OFED specs on these
2 issues.

 

Thanks for your consideration,
Bob Russell

 

Note: The following (old) posting by Mike Ko states that the
extra header fields are needed only by IB, not by IETF
(i.e., iWARP), because IB uses nZBVA, whereas iWARP uses ZBVA.

 

But are there any IETF/iWARP implementations out there that actually
use ZBVA with iWARP RNICs?  (I don't mean software simulations of
the iWARP protocol.)  We have built an iSER implementation that
uses the OFED stack to access both IB and iWARP hardware, and for
both of them we need to use the extra iSER header fields (nZBVA).
Perhaps this is an issue with the design of the OFED stack, which
was built primarily to access IB hardware and therefore reflects
the needs of the IB hardware.  But we found that the only way to
access iWARP hardware via the OFED stack was to use the expanded
(nZBVA) iSER header (and to use a meaningful value in the extra field,
NOT to just set it to zero).

 

In any case, rather than have 2 different versions of the iSER header,
it would be better to have just one, regardless of the underlying
technology involved (after all, isn't that what a standard is for??).
This is especially relevant when using the OFED stack, because,
as we have demonstrated, software built on top of the OFED stack can
(AND SHOULD!) be able to run with EITHER IB or iWARP hardware,
with NO change to that software.  Having 2 different iSER headers
does NOT make that possible!

 

 

 

    2008/4/15 Mike Ko <mako at almaden.ibm.com>:

 

    VA is a concept introduced in an Infiniband annex to support iSER. It
    appears in the expanded iSER header for Infiniband use only to support
    the non-Zero Based Virtual Address (non-ZBVA) used in Infiniband vs the
    ZBVA used in IETF.

 

Mike - could you please put me in contact with someone who has actually
implemented iSER on top of IETF/iWARP hardware NICs using ZBVA?

 

<mk> Perhaps vendors who have built iSER stacks can comment on this.


    "The DataDescriptorOut describes the I/O buffer starting with the
    immediate unsolicited data (if any), followed by the non-immediate
    unsolicited data (if any) and solicited data." If non-ZBVA mode is used,
    then VA points to the beginning of this buffer. So in your example, the
    VA field in the expanded iSER header will be zero. Note that for IETF,
    ZBVA is assumed and there is no provision to specify a different VA in
    the iSER header.
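
The difference between the two addressing modes can be sketched in a few
lines of Python.  This is an illustration of the general idea only: the
class and function names are hypothetical, and it is not taken from any
RNIC, HCA or OFED interface.

```python
from dataclasses import dataclass


@dataclass
class TaggedBuffer:
    """Hypothetical local state kept for an advertised buffer."""
    stag: int
    base_va: int    # starting virtual address of the buffer
    length: int


def wire_offset(buf: TaggedBuffer, buffer_offset: int, zbva: bool) -> int:
    """Value the remote peer would place in an RDMA tagged offset."""
    if not 0 <= buffer_offset < buf.length:
        raise ValueError("offset outside the tagged buffer")
    # ZBVA: offsets count from zero, so the base VA never leaves the host.
    # non-ZBVA: the peer must know the base VA to form the address.
    return buffer_offset if zbva else buf.base_va + buffer_offset


def local_address(buf: TaggedBuffer, wire_value: int, zbva: bool) -> int:
    """Address the buffer owner resolves the incoming value to."""
    return buf.base_va + wire_value if zbva else wire_value


buf = TaggedBuffer(stag=0x1234, base_va=0x7F0000001000, length=8192)
zbva_addr = local_address(buf, wire_offset(buf, 512, True), True)
nzbva_addr = local_address(buf, wire_offset(buf, 512, False), False)
print(hex(zbva_addr), hex(nzbva_addr))
```

Both modes land on the same local address; what differs is whether the base
VA has to be carried to the peer, which is what the extra header fields are
for.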

 

Mike - I believe this VA field in the expanded iSER header is almost
NEVER zero -- it is always an actual virtual address.

 

<mk> So if the OFED iSER stack is based on Annex A12, then it already has the
means to select ZBVA or non-ZBVA as specified in the "iSER CM REQ Message
Private Data Format" of table 2.  So rather than having to change the OFED
implementation and update the Infiniband Annex, I suggest that we leave the
ZBVA/non-ZBVA option alone even though ZBVA is never used, as you said.


    Tagged offset (TO) refers to the offset within a tagged buffer in RDMA
    Write and RDMA Read Request Messages. When sending non-immediate
    unsolicited data, Send Message types are used and the TO field is not
    present. Instead, the buffer offset is appropriately represented by the
    Buffer Offset field in the SCSI Data-Out PDU. Note that Tagged Offset is
    not the same as write VA and it does not appear in the iSER header.

 

    Mike