Re: [storm] iSER draft

Tom Talpey <ttalpey@microsoft.com> Wed, 29 June 2011 15:30 UTC

Return-Path: <ttalpey@microsoft.com>
X-Original-To: storm@ietfa.amsl.com
Delivered-To: storm@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 14EDC11E807F for <storm@ietfa.amsl.com>; Wed, 29 Jun 2011 08:30:11 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -110.598
X-Spam-Level:
X-Spam-Status: No, score=-110.598 tagged_above=-999 required=5 tests=[BAYES_00=-2.599, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_HI=-8, USER_IN_WHITELIST=-100]
Received: from mail.ietf.org ([64.170.98.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id AoL6Oav+2eeD for <storm@ietfa.amsl.com>; Wed, 29 Jun 2011 08:30:08 -0700 (PDT)
Received: from smtp.microsoft.com (smtp.microsoft.com [131.107.115.214]) by ietfa.amsl.com (Postfix) with ESMTP id 8310E11E8076 for <storm@ietf.org>; Wed, 29 Jun 2011 08:30:08 -0700 (PDT)
Received: from TK5EX14MLTC102.redmond.corp.microsoft.com (157.54.79.180) by TK5-EXGWY-E803.partners.extranet.microsoft.com (10.251.56.169) with Microsoft SMTP Server (TLS) id 8.2.176.0; Wed, 29 Jun 2011 08:30:07 -0700
Received: from TK5EX14MBXC118.redmond.corp.microsoft.com ([169.254.9.75]) by TK5EX14MLTC102.redmond.corp.microsoft.com ([157.54.79.180]) with mapi id 14.01.0289.008; Wed, 29 Jun 2011 08:30:07 -0700
From: Tom Talpey <ttalpey@microsoft.com>
To: Alexander Nezhinsky <nezhinsky@gmail.com>, "storm@ietf.org" <storm@ietf.org>
Thread-Topic: [storm] iSER draft
Thread-Index: AQHMNlPpov6tD7xTq0WTfzkPhGmP2JTUcDZA
Date: Wed, 29 Jun 2011 15:30:06 +0000
Message-ID: <F83812DF4B59B9499C1BC978336D91745EDE207F@TK5EX14MBXC118.redmond.corp.microsoft.com>
References: <BANLkTi=GVHUBdsZiwsRXxETd_rU=F8f85w@mail.gmail.com>
In-Reply-To: <BANLkTi=GVHUBdsZiwsRXxETd_rU=F8f85w@mail.gmail.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-originating-ip: [157.54.51.34]
Content-Type: multipart/alternative; boundary="_000_F83812DF4B59B9499C1BC978336D91745EDE207FTK5EX14MBXC118r_"
MIME-Version: 1.0
Subject: Re: [storm] iSER draft
X-BeenThere: storm@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Storage Maintenance WG <storm.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/storm>, <mailto:storm-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/storm>
List-Post: <mailto:storm@ietf.org>
List-Help: <mailto:storm-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/storm>, <mailto:storm-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Jun 2011 15:30:11 -0000

Working Group Co-chair hat off.

Alexander, thanks for contributing these updates. I have some general comments on two items.

> * Add a definition of VA to section 1.1. For example:
>"Virtual Address (VA) - an identifier supplied by RCaP together with STag
>when registering a memory buffer. In some modes it is not used,
>and VA=0 is implied. These modes are known as ZBVA (Zero Based VA).
>In such cases VA should be set to zero when buffer advertisements
>are communicated."

I think it may be problematic to introduce the new term “Virtual Address” without disambiguating it from the existing term “Tagged Offset”. The term VA does not appear in RFC5046, and in the update, was introduced in only one message type (control PDU sections 6.9 and 9.2). Also, your definition implies that it is provided locally to an unspecified API. If it is not passed on the wire, then it seems superfluous to the protocol.

Either way, the definition needs to avoid the statements “In some modes” and “should be set to zero”. These are protocol behavior and such discussion is be avoided in the glossary. I would suggest deleting all but the first sentence, after resolving the use of VA of course.

>* Definition of Tagged Buffer:
>"Tagged Buffer - A buffer that is explicitly Advertised to the iSER Layer
>at the remote node  through the exchange of an STag, Tagged Offset, and length."
>
>should read:
>"...the exchange of an STag, VA, Tagged Offset, and length."

Now I’m really confused. Is the Tagged Buffer represented as a 4-tuple in the new protocol? If so, where will this be described?



From: storm-bounces@ietf.org [mailto:storm-bounces@ietf.org] On Behalf Of Alexander Nezhinsky
Sent: Wednesday, June 29, 2011 7:59 AM
To: storm@ietf.org; david.black@emc.com; michael@huaweisymantec.com
Subject: [storm] iSER draft

Hi,

I would like to post some comments and suggestions regarding the latest iSER draft:
http://www.watersprings.org/pub/id/draft-ietf-storm-iser-01.txt

Unfortunately no comments have been posted since it expired. I'm acting both as a
contributor and a practical implementer.
Earlier this year i submitted a new stable implementation of iSER, which was accepted
by the open source STGT project. As a part of their official release it already made its way
to a few Linux distributions.

There is a gap between the existing implementation and the original RFC, which is mostly
(but not exclusively) related to the possibility of implementation over Infiniband.
This gap was almost completely covered by the recent draft-ietf-storm-iser-01.
My comments below strive to eliminate the remaining discrepancies.
The general idea is to either reflect some implemented changes in the spec,
or to pave a comprehensive way for the implementation to become spec compliant.

Alexander Nezhinsky

---

1. Account for VA in buffer advertisement

Add VA to the list of attributes which identify advertized buffers and change related definitions and statements.

* Add a definition of VA to section 1.1. For example:
"Virtual Address (VA) - an identifier supplied by RCaP together with STag
when registering a memory buffer. In some modes it is not used,
and VA=0 is implied. These modes are known as ZBVA (Zero Based VA).
In such cases VA should be set to zero when buffer advertisements
are communicated."

* Definition of Remote Mapping:
"Remote Mapping - A task state record maintained by the iSER Layer
 that associates the Initiator Task Tag to the Advertised STag(s)..."

should read:
"...that associates the Initiator Task Tag to the Advertised STag(s)
and Virtual Address(es)..."

* Definition of Tagged Buffer:
"Tagged Buffer - A buffer that is explicitly Advertised to the iSER Layer
at the remote node  through the exchange of an STag, Tagged Offset, and length."

should read:
"...the exchange of an STag, VA, Tagged Offset, and length."

2. iser message format

The headers are now defined for using with iWARP. Actually, they contain an iWARP
prefix followed by a generic part.

Define a generic iser header as mandatory, while stating that the underlying transport
protocols may add their own header sections before the iser header
and provide iWARP as the example of such prefix.

Read and write buffers are identified by the combination of STag and VA.
Both fields should be required for any transport. This requirement does not preclude
using ZBVA where possible. When VA is not used (as in iWARP) it is set to 0.

Below is the actually used format:

struct iser_hdr {
  uint8_t   flags;
  uint8_t   rsvd[3];
  uint32_t  write_stag; /* write rkey in IB */
  uint64_t  write_va;
  uint32_t  read_stag;  /* read rkey in IB */
  uint64_t  read_va;
}

3. Use RCaP as a general term, iWARP and IB as particular cases

The language of the current draft is iWARP-dominated in many places. We need to
make the text to refer to RCaP as a common denominator, stressing the differences
between protocols where necessary.

* section 2 :

"The iWARP protocol stack provides direct data placement
functionality that is usable in practice, and in addition, there is
also interest in using iSCSI with other RDMA protocol stacks that
support direct data placement, such as the one provided by
InfiniBand.  The generic term RDMA-Capable Protocol (RCaP) is used
to refer to the RDMA functionality provided by such protocol stacks."

should read:
"The generic term RDMA-Capable Protocol (RCaP) is used to refer to
the RDMA functionality provided by the underlying protocol stack.
Two practical examples of protocol stacks providing direct data placement
functionality are iWARP and InfiniBand."

* section 5.1.1 :
 "Among the connection resources allocated at the initiator is the
Inbound RDMA Read Queue Depth (IRD) ...
... Initially, the iSER-IRD value at the initiator SHOULD
be set to the IRD value at the initiator and MUST NOT be more than
the IRD value."

should be prefixed with:
"For iWARP based implementations, ..."

4. Mark iWARP-specific sections as such

Some protocol-specific statements, like in above, are scattered through the
text, but there are entire sections which are iWARP-specific (RDMAP features
having no paralle in IB, etc.) We should explicitly state this.
The same goes for IB, of course, although right now there are no such clauses.

In particular, sections 8.2. and 8.3. with sub-sections 8.3.x
should be either grouped as the subsections of
 "8.2. iWARP considerations in Flow Control"
or retain their numbers and have "(iWARP specific)" added to their titles

5. Fix a typo in section numbering

Appendix A should be section 15.

6. iSER Hello-HelloReply

Hello-HelloReply should be made optional and transport dependent.

Right now the IB based target's implementation terminates the connection if it receives Hello.
IB-based implementation does not need any Hello exchange, so the implemented behavior may be
turned into standard. iWARP does need Hello-HelloReply and its format is provided.

We can define a new key called iSERHelloRequired with the default being "No", so
that the current implementation becomes compliant.

Need to update some statements regarding IserHello, like in 5.1.1 and 5.1.2
alleviating some MUSTs with "if relevant for transport"-style of comments.

7. Default values for MaxOutstandingUnexpectedPDUs and MaxAHSLength

The defaults for MaxOutstandingUnexpectedPDUs and MaxAHSLength
should change from 0 (unlimited) to some small reasonable values
to protect a side not sending these keys from potentially harmful assumption
by its peer about unlimited resources available, so that buffer overflow (MaxAHSLength)
or sending more buffers than actually posted (MaxOutstandingUnexpectedPDUs) may result.

proposed default values:
MaxOutstandingUnexpectedPDUs = 4
MaxAHSLength = 256 (accounts for 1 Var len CDB + 1 bidir read-length)

8. Ignoring MaxRecvDataSegmentLength key

It should be clarified that MaxRecvDataSegmentLength may be sent but MUST be ignored,
because {Initiator|Target}RecvDataSegmentLength have their own defaults.

If {Initiator|Target}RecvDataSegmentLength are sent, the declared values would override
MaxRecvDataSegmentLength anyway, and if they are not sent, the default values
must be used to avoid confusing scenarios.