Re: [storm] iSER draft

Michael Ko <Michael@huaweisymantec.com> Wed, 29 June 2011 21:26 UTC

Return-Path: <Michael@huaweisymantec.com>
X-Original-To: storm@ietfa.amsl.com
Delivered-To: storm@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 7DBA69E8063 for <storm@ietfa.amsl.com>; Wed, 29 Jun 2011 14:26:21 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.598
X-Spam-Level:
X-Spam-Status: No, score=-2.598 tagged_above=-999 required=5 tests=[BAYES_00=-2.599, HTML_MESSAGE=0.001]
Received: from mail.ietf.org ([64.170.98.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id oDUdhF12x97k for <storm@ietfa.amsl.com>; Wed, 29 Jun 2011 14:26:19 -0700 (PDT)
Received: from mta2.huaweisymantec.com (mta2.huaweisymantec.com [218.17.155.15]) by ietfa.amsl.com (Postfix) with ESMTP id 1E1379E805D for <storm@ietf.org>; Wed, 29 Jun 2011 14:26:19 -0700 (PDT)
MIME-version: 1.0
Content-type: multipart/alternative; boundary="Boundary_(ID_HxQcbemlksbqBL6Hji2lNg)"
Received: from hstml01-in.huaweisymantec.com ([172.26.3.42]) by hstga02-in.huaweisymantec.com (Sun Java(tm) System Messaging Server 6.3-8.03 (built Apr 24 2009; 32bit)) with ESMTP id <0LNK00E9PM7S0Z70@hstga02-in.huaweisymantec.com> for storm@ietf.org; Thu, 30 Jun 2011 05:26:16 +0800 (CST)
Received: from m90003900a ([69.199.248.19]) by hstml01-in.huaweisymantec.com (Sun Java(tm) System Messaging Server 6.3-8.03 (built Apr 24 2009; 32bit)) with ESMTPA id <0LNK008W6M8T9D00@hstml01-in.huaweisymantec.com> for storm@ietf.org; Thu, 30 Jun 2011 05:26:57 +0800 (CST)
Message-id: <971779FD6B314A40B1A3658E521C0D4A@china.huawei.com>
From: Michael Ko <Michael@huaweisymantec.com>
To: Tom Talpey <ttalpey@microsoft.com>, Alexander Nezhinsky <nezhinsky@gmail.com>, storm@ietf.org
References: <BANLkTi=GVHUBdsZiwsRXxETd_rU=F8f85w@mail.gmail.com> <F83812DF4B59B9499C1BC978336D91745EDE207F@TK5EX14MBXC118.redmond.corp.microsoft.com>
Date: Wed, 29 Jun 2011 14:26:12 -0700
X-Priority: 3
X-MSMail-priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.5931
X-MIMEOLE: Produced By Microsoft MimeOLE V6.00.2900.6109
Subject: Re: [storm] iSER draft
X-BeenThere: storm@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Storage Maintenance WG <storm.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/storm>, <mailto:storm-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/storm>
List-Post: <mailto:storm@ietf.org>
List-Help: <mailto:storm-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/storm>, <mailto:storm-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Jun 2011 21:26:21 -0000

Tom,

As originally written, the iSER spec always assumes ZBVA (zero-based virtual address).  But when iSER is used in conjunction with transports other than iWARP, there is a need to include the virtual address for the tagged buffer.  Such is the case for Infiniband.  Accordingly, the Virtual Address field has been added to the iSCSI Control Type PDUs (sec. 9.2).  Alex pointed out that the tagged buffer should now be identified by the 4-tuple (STag, VA, TO, and length) instead.  This should be updated everywhere where the term "tagged buffer" is mentioned in the spec.  For example, in sec. 1.1 under Advertisement, the sentence "A typical method would be for the iSER Layer to embed the Tagged Buffer's STag, TO, and buffer length in a Send Message destined for the remote iSER Layer" should be modified to "A typical method would be for the iSER Layer to embed the Tagged Buffer's STag, VA, TO, and buffer length in a Send Message destined for the remote iSER Layer."  BTW, this sentence also answers your question as to whether VA is passed on the wire.

The Virtual Address is literally the virtual address of the tagged buffer.  The tagged offset is the displacement relative to the beginning of the buffer starting at the virtual address.

We could rephrase the definition of Virtual Address in sec. 1.1 as follows:

"Virtual Address (VA) - This is the virtual address of the Tagged Buffer on a Node.  When set to 0, it indicates that a virtual address is not needed to locate the Tagged Buffer."

Mike

----- Original Message ----- 
  From: Tom Talpey 
  To: Alexander Nezhinsky ; storm@ietf.org 
  Sent: Wednesday, June 29, 2011 8:30 AM
  Subject: Re: [storm] iSER draft


  Working Group Co-chair hat off.

   

  Alexander, thanks for contributing these updates. I have some general comments on two items.

   

  > * Add a definition of VA to section 1.1. For example:

  >"Virtual Address (VA) - an identifier supplied by RCaP together with STag 
  >when registering a memory buffer. In some modes it is not used,
  >and VA=0 is implied. These modes are known as ZBVA (Zero Based VA).
  >In such cases VA should be set to zero when buffer advertisements
  >are communicated."

   

  I think it may be problematic to introduce the new term “Virtual Address” without disambiguating it from the existing term “Tagged Offset”. The term VA does not appear in RFC5046, and in the update, was introduced in only one message type (control PDU sections 6.9 and 9.2). Also, your definition implies that it is provided locally to an unspecified API. If it is not passed on the wire, then it seems superfluous to the protocol.

   

  Either way, the definition needs to avoid the statements “In some modes” and “should be set to zero”. These are protocol behavior and such discussion is be avoided in the glossary. I would suggest deleting all but the first sentence, after resolving the use of VA of course.

   

  >* Definition of Tagged Buffer:

  >"Tagged Buffer - A buffer that is explicitly Advertised to the iSER Layer
  >at the remote node  through the exchange of an STag, Tagged Offset, and length."

  >
  >should read:

  >"...the exchange of an STag, VA, Tagged Offset, and length."

   

  Now I’m really confused. Is the Tagged Buffer represented as a 4-tuple in the new protocol? If so, where will this be described?

   

   

   

  From: storm-bounces@ietf.org [mailto:storm-bounces@ietf.org] On Behalf Of Alexander Nezhinsky
  Sent: Wednesday, June 29, 2011 7:59 AM
  To: storm@ietf.org; david.black@emc.com; michael@huaweisymantec.com
  Subject: [storm] iSER draft

   

  Hi,

  I would like to post some comments and suggestions regarding the latest iSER draft: 
  http://www.watersprings.org/pub/id/draft-ietf-storm-iser-01.txt

  Unfortunately no comments have been posted since it expired. I'm acting both as a 
  contributor and a practical implementer.
  Earlier this year i submitted a new stable implementation of iSER, which was accepted
  by the open source STGT project. As a part of their official release it already made its way 
  to a few Linux distributions.

  There is a gap between the existing implementation and the original RFC, which is mostly
  (but not exclusively) related to the possibility of implementation over Infiniband.
  This gap was almost completely covered by the recent draft-ietf-storm-iser-01.
  My comments below strive to eliminate the remaining discrepancies.
  The general idea is to either reflect some implemented changes in the spec,
  or to pave a comprehensive way for the implementation to become spec compliant.

  Alexander Nezhinsky

  ---

  1. Account for VA in buffer advertisement

  Add VA to the list of attributes which identify advertized buffers and change related definitions and statements.

  * Add a definition of VA to section 1.1. For example:

  "Virtual Address (VA) - an identifier supplied by RCaP together with STag 
  when registering a memory buffer. In some modes it is not used,
  and VA=0 is implied. These modes are known as ZBVA (Zero Based VA).
  In such cases VA should be set to zero when buffer advertisements
  are communicated."

   

  * Definition of Remote Mapping:

  "Remote Mapping - A task state record maintained by the iSER Layer
   that associates the Initiator Task Tag to the Advertised STag(s)..."


  should read:

  "...that associates the Initiator Task Tag to the Advertised STag(s)
  and Virtual Address(es)..."


  * Definition of Tagged Buffer:

  "Tagged Buffer - A buffer that is explicitly Advertised to the iSER Layer
  at the remote node  through the exchange of an STag, Tagged Offset, and length."


  should read:

  "...the exchange of an STag, VA, Tagged Offset, and length."


  2. iser message format

  The headers are now defined for using with iWARP. Actually, they contain an iWARP
  prefix followed by a generic part. 

  Define a generic iser header as mandatory, while stating that the underlying transport
  protocols may add their own header sections before the iser header
  and provide iWARP as the example of such prefix.

  Read and write buffers are identified by the combination of STag and VA.
  Both fields should be required for any transport. This requirement does not preclude
  using ZBVA where possible. When VA is not used (as in iWARP) it is set to 0.

  Below is the actually used format:

  struct iser_hdr {
    uint8_t   flags;
    uint8_t   rsvd[3];
    uint32_t  write_stag; /* write rkey in IB */
    uint64_t  write_va;
    uint32_t  read_stag;  /* read rkey in IB */
    uint64_t  read_va;
  }

  3. Use RCaP as a general term, iWARP and IB as particular cases

  The language of the current draft is iWARP-dominated in many places. We need to
  make the text to refer to RCaP as a common denominator, stressing the differences
  between protocols where necessary.

  * section 2 :
   

  "The iWARP protocol stack provides direct data placement
  functionality that is usable in practice, and in addition, there is
  also interest in using iSCSI with other RDMA protocol stacks that
  support direct data placement, such as the one provided by
  InfiniBand.  The generic term RDMA-Capable Protocol (RCaP) is used
  to refer to the RDMA functionality provided by such protocol stacks."


  should read:

  "The generic term RDMA-Capable Protocol (RCaP) is used to refer to
  the RDMA functionality provided by the underlying protocol stack.
  Two practical examples of protocol stacks providing direct data placement
  functionality are iWARP and InfiniBand."


  * section 5.1.1 :

   "Among the connection resources allocated at the initiator is the
  Inbound RDMA Read Queue Depth (IRD) ...
  ... Initially, the iSER-IRD value at the initiator SHOULD
  be set to the IRD value at the initiator and MUST NOT be more than
  the IRD value."


  should be prefixed with:

  "For iWARP based implementations, ..."


  4. Mark iWARP-specific sections as such

  Some protocol-specific statements, like in above, are scattered through the
  text, but there are entire sections which are iWARP-specific (RDMAP features
  having no paralle in IB, etc.) We should explicitly state this. 
  The same goes for IB, of course, although right now there are no such clauses.

  In particular, sections 8.2. and 8.3. with sub-sections 8.3.x
  should be either grouped as the subsections of
   "8.2. iWARP considerations in Flow Control"
  or retain their numbers and have "(iWARP specific)" added to their titles

  5. Fix a typo in section numbering

   

  Appendix A should be section 15.

  6. iSER Hello-HelloReply

  Hello-HelloReply should be made optional and transport dependent.

  Right now the IB based target's implementation terminates the connection if it receives Hello.
  IB-based implementation does not need any Hello exchange, so the implemented behavior may be
  turned into standard. iWARP does need Hello-HelloReply and its format is provided.

  We can define a new key called iSERHelloRequired with the default being "No", so
  that the current implementation becomes compliant.

  Need to update some statements regarding IserHello, like in 5.1.1 and 5.1.2
  alleviating some MUSTs with "if relevant for transport"-style of comments.

  7. Default values for MaxOutstandingUnexpectedPDUs and MaxAHSLength

  The defaults for MaxOutstandingUnexpectedPDUs and MaxAHSLength
  should change from 0 (unlimited) to some small reasonable values
  to protect a side not sending these keys from potentially harmful assumption 
  by its peer about unlimited resources available, so that buffer overflow (MaxAHSLength)
  or sending more buffers than actually posted (MaxOutstandingUnexpectedPDUs) may result.

  proposed default values:
  MaxOutstandingUnexpectedPDUs = 4
  MaxAHSLength = 256 (accounts for 1 Var len CDB + 1 bidir read-length)

  8. Ignoring MaxRecvDataSegmentLength key

  It should be clarified that MaxRecvDataSegmentLength may be sent but MUST be ignored,
  because {Initiator|Target}RecvDataSegmentLength have their own defaults. 

  If {Initiator|Target}RecvDataSegmentLength are sent, the declared values would override 
  MaxRecvDataSegmentLength anyway, and if they are not sent, the default values
  must be used to avoid confusing scenarios.





------------------------------------------------------------------------------


  _______________________________________________
  storm mailing list
  storm@ietf.org
  https://www.ietf.org/mailman/listinfo/storm