Re: [rddp] [Ips] Storage Maintenance (storm) BOF reminder & requests

Bernard Metzler <BMT@zurich.ibm.com> Tue, 24 March 2009 15:26 UTC

In-Reply-To: <Pine.LNX.4.64.0903221139330.17377@postal.iol.unh.edu>
To: "Robert D. Russell" <rdr@iol.unh.edu>
From: Bernard Metzler <BMT@zurich.ibm.com>
Message-ID: <OF4342F229.A9B5BAAC-ONC1257582.005CFA83-88257583.0054D2A0@ch.ibm.com>
Date: Tue, 24 Mar 2009 07:26:29 -0800
Cc: imss@ietf.org, ips@ietf.org, rddp-bounces@ietf.org, Black_David@emc.com, rddp@ietf.org
Subject: Re: [rddp] [Ips] Storage Maintenance (storm) BOF reminder & requests

Hi Robert,

An RNIC would be happy to transition an established socket
into RDMA mode if the user wants it to. As far as I know, there is no
problem, at least in transitioning from TOE mode to RDMA mode. It is a
question of environmental support. To allow it,
the environment (aka OFED) would have to integrate late
socket translation, which would let RNIC drivers officially perform the
TOE-to-RDMA mode transition or take over a socket
(for which the OS would have to expose a well-defined TCP
context reallocation scheme).
Maybe it is not a good idea to change an end-to-end protocol
because a host environment has functional limitations,
which may stem from historical transport (IB) limitations or
OS considerations.
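
For reference, the header difference at issue in the quoted message below can be sketched roughly as follows. This is only an illustration under stated assumptions, not text from either spec: the 12-byte layout is taken to follow the RFC 5046 header diagram, the 28-byte layout assumes each STag is followed by one of the "2 extra 64-bit fields" described below, and all function names are hypothetical.

```python
import struct

# First header byte, as assumed from the RFC 5046 diagram: a 4-bit opcode
# (0001b = iSCSI control-type PDU), then the WSV and RSV flag bits.
ISCSI_CTRL = 0x10
WSV = 0x08  # Write STag Valid
RSV = 0x04  # Read STag Valid


def _flags(write_stag, read_stag):
    """Build the flag byte from whichever STags are supplied."""
    flags = ISCSI_CTRL
    if write_stag is not None:
        flags |= WSV
    if read_stag is not None:
        flags |= RSV
    return flags


def pack_base_header(write_stag=None, read_stag=None):
    """12 bytes: flags, 3 reserved bytes, Write STag, Read STag (ZBVA assumed)."""
    return struct.pack(">B3xII", _flags(write_stag, read_stag),
                       write_stag or 0, read_stag or 0)


def pack_expanded_header(write_stag=None, write_va=0, read_stag=None, read_va=0):
    """28 bytes: as above, but each STag is followed by a 64-bit VA (assumed layout)."""
    return struct.pack(">B3xIQIQ", _flags(write_stag, read_stag),
                       write_stag or 0, write_va, read_stag or 0, read_va)


def remote_offset(byte_index, base_va=None):
    """Offset the peer uses for byte_index: zero-based unless a base VA is given."""
    return byte_index if base_va is None else base_va + byte_index
```

With ZBVA the peer addresses byte 512 of the buffer as offset 512. With a base VA of 0x1000 the same byte sits at 0x1200, so in that mode the VA has to be carried in the header.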

many thanks,
bernard.

rddp-bounces@ietf.org wrote on 03/22/2009 04:41:55 PM:

> David:
> 
> There are 2 issues I would like to suggest for discussion at the BOF
> meeting later this week.  Both have to do with the iSER spec, RFC 5046.
> 
> 1. At the present time, as far as I know, no existing hardware,
>     neither Infiniband nor iWARP, is capable of opening a connection
>     in "normal" TCP mode and then transitioning it into zero-copy mode.
>     Unfortunately, the iSER spec requires that.
>     Can't we just replace that part of the iSER spec?
>     Otherwise, all hardware and all implementations are non-standard.
> 
> 2. The OFED stack is used to access both Infiniband and iWARP hardware.
>     This software requires 2 extra 64-bit fields for addressing
>     on both Infiniband and iWARP hardware, but these fields
>     are not allowed for in the current iSER Header Format.
>     Can't we just add those extra fields to the iSER spec?
>     If someday some other implementation doesn't need those
>     fields, they can be just set to 0 (which is what is implied by
>     the current iSER standard anyway).  Again, by not doing this,
>     all implementations are non-standard.
> 
> In other words, I'm suggesting that we consider replacing the relevant
> parts of the current iSER specs with the current OFED specs on these
> 2 issues.
> 
> Thanks for your consideration,
> Bob Russell
> 
> Note: The following (old) posting by Mike Ko states that the
> extra header fields are needed only by IB, not by IETF
> (i.e., iWARP), because IB uses nZBVA, whereas iWARP uses ZBVA.
> 
> But are there any IETF/iWARP implementations out there that actually
> use ZBVA with iWARP RNICs?  (I don't mean software simulations of
> the iWARP protocol.)  We have built an iSER implementation that
> uses the OFED stack to access both IB and iWARP hardware, and for
> both of them we need to use the extra iSER header fields (nZBVA).
> Perhaps this is an issue with the design of the OFED stack, which
> was built primarily to access IB hardware and therefore reflects
> the needs of the IB hardware.  But we found that the only way to
> access iWARP hardware via the OFED stack was to use the expanded
> (nZBVA) iSER header (and to use a meaningful value in the extra field,
> NOT to just set it to zero).
> 
> In any case, rather than have 2 different versions of the iSER header,
> it would be better to have just one, regardless of the underlying
> technology involved (after all, isn't that what a standard is for??).
> This is especially relevant when using the OFED stack, because,
> as we have demonstrated, software built on top of the OFED stack can
> (AND SHOULD!) be able to run with EITHER IB or iWARP hardware,
> with NO change to that software.  Having 2 different iSER headers
> does NOT make that possible!
> 
> 
> > 2008/4/15 Mike Ko <mako at almaden.ibm.com>:
> >
> > VA is a concept introduced in an Infiniband annex to support iSER. It
> > appears in the expanded iSER header for Infiniband use only to support the
> > non-Zero Based Virtual Address (non-ZBVA) used in Infiniband vs the ZBVA
> > used in IETF.
> 
> Mike - could you please put me in contact with someone who has actually
> implemented iSER on top of IETF/iWARP hardware NICs using ZBVA?
> 
> >
> > "The DataDescriptorOut describes the I/O buffer starting with the immediate
> > unsolicited data (if any), followed by the non-immediate unsolicited data
> > (if any) and solicited data." If non-ZBVA mode is used, then VA points to
> > the beginning of this buffer. So in your example, the VA field in the
> > expanded iSER header will be zero. Note that for IETF, ZBVA is assumed and
> > there is no provision to specify a different VA in the iSER header.
> 
> Mike - I believe this VA field in the expanded iSER header is almost
> NEVER zero -- it is always an actual virtual address.
> 
> >
> > Tagged offset (TO) refers to the offset within a tagged buffer in RDMA Write
> > and RDMA Read Request Messages. When sending non-immediate unsolicited
> > data, Send Message types are used and the TO field is not present. Instead,
> > the buffer offset is appropriately represented by the Buffer Offset field in
> > the SCSI Data-Out PDU. Note that Tagged Offset is not the same as write VA
> > and it does not appear in the iSER header.
> >
> > Mike
> 
> On Wed, 11 Mar 2009, Black_David@emc.com wrote:
> 
> > This is a reminder that the Storage Maintenance BOF will
> > be held in about 2 weeks at the IETF meetings in San Francisco.
> > Please plan to attend if you're interested:
> >
> > THURSDAY, March 26, 2009
> > Continental 1&2     TSV     storm      Storage Maintenance BOF
> >
> > The BOF description is at:
> > http://www.ietf.org/mail-archive/web/ips/current/msg02669.html
> >
> > The initial agenda is here:
> > http://www.ietf.org/mail-archive/web/ips/current/msg02670.html
> >
> > I'm going to go upload that initial agenda as the BOF agenda,
> > and it can be bashed at the meeting.
> >
> > The primary purpose of this BOF is to answer two questions:
> > (1) What storage maintenance work (IP Storage, Remote Direct
> >    Data Placement) should be done?
> > (2) Should an IETF Working Group be formed to undertake that
> >    work?
> >
> > Everyone gets to weigh in on these decisions, even those who
> > can't attend the BOF meeting.  Anyone who thinks that there is
> > work that should be done, and who cannot come to the BOF meeting
> > should say so on the IPS or RDDP mailing lists (and it'd be a
> > good idea for those who can come to do this).  As part of the
> > email, please indicate how you're interested in helping (author
> > or co-author of specific drafts, promise to review and comment
> > on specific drafts).
> >
> > Here's a summary of the initial draft list of work items:
> > - iSCSI: Combine RFCs into one document, removing unused features.
> > - iSCSI: Interoperability report on what has been implemented and
> >    interoperates in support of Draft Standard status for iSCSI.
> > - iSCSI: Add backwards-compatible features to support SAM-4.
> > - iFCP: The Address Translation mode of iFCP needs to be deprecated.
> > - RDDP MPA: Small startup update for MPI application support.
> > - iSER: A few minor updates based on InfiniBand experience.
> >
> > Additional work (e.g., updated/improved iSNS for iSCSI, MIB changes,
> > updated ipsec security profile [i.e., IKEv2-based]) is possible if
> > there's interest.
> >
> > There are (at least) four possible outcomes:
> > (A) None of this work needs to be done.
> > (B) There are some small work items that make sense.  Individual
> >    drafts with a draft shepherd (i.e., David Black) will
> >    suffice.
> > (C) A working group is needed to undertake more complex work
> >    items and reach consensus on design issues.  The WG can
> >    be "virtual" and operate mostly via the mailing list
> >    until/unless controversial/contentious issues arise.
> > (D) There is a lot of complex work that is needed, and a WG
> >    that will plan to meet at every IETF meeting should be
> >    formed.
> >
> > Please note that the IETF "rough consensus" process requires a
> > working group in practice to be effective.  This makes outcome
> > (C) look attractive to me, as:
> > - I'm coming under increasing pressure to limit travel, and
> >    the next two IETF meetings after San Francisco are not
> >    in the US.
> > - I'd rather have the "rough consensus" process available and
> >    not need it than need it and not have it available.
> >
> > Setting an example for how to express interest ...
> >
> > ---------------
> > I think that the iSCSI single RFC and interoperability report are
> > good ideas, but I want to see a bunch of people expressing interest
> > in these, as significant effort is involved.  It might make sense
> > to do the single iSCSI RFC but put off the interoperability report
> > (the resulting RFC would remain at Proposed Standard rather than
> > going to Draft Standard), as I'm not hearing about major iSCSI
> > interoperability issues.
> >
> > I think the latter four items (SAM-4 for iSCSI, deprecate iFCP
> > address translation, MPI fix to MPA and iSER fixes) should all
> > be done.
> >
> > I plan to author the iFCP address translation deprecation draft,
> > and review all other drafts.
> >
> > I think that a virtual WG should be formed that plans to do its
> > work primarily via the mailing list.  I believe the SAM-4 work
> > by itself is complex enough to need a working group - I would
> > expect design issues to turn up at least there and in determining
> > whether to remove certain iSCSI features, but I'm cautiously
> > optimistic that the mailing list is sufficient to work these
> > issues out (and concerned that travel restrictions are likely to
> > force use of the mailing list).
> >
> > -----------------
> >
> > Ok, who wants to go next?
> >
> > Thanks,
> > --David
> > ----------------------------------------------------
> > David L. Black, Distinguished Engineer
> > EMC Corporation, 176 South St., Hopkinton, MA  01748
> > +1 (508) 293-7953             FAX: +1 (508) 293-7786
> > black_david@emc.com        Mobile: +1 (978) 394-7754
> > ----------------------------------------------------