[storm] iSER draft

Alexander Nezhinsky <nezhinsky@gmail.com> Wed, 29 June 2011 11:58 UTC

Return-Path: <nezhinsky@gmail.com>
X-Original-To: storm@ietfa.amsl.com
Delivered-To: storm@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 60C4B21F86E0 for <storm@ietfa.amsl.com>; Wed, 29 Jun 2011 04:58:45 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -3.598
X-Spam-Level:
X-Spam-Status: No, score=-3.598 tagged_above=-999 required=5 tests=[BAYES_00=-2.599, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_LOW=-1]
Received: from mail.ietf.org ([64.170.98.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id e-QthgmyclCY for <storm@ietfa.amsl.com>; Wed, 29 Jun 2011 04:58:43 -0700 (PDT)
Received: from mail-gx0-f172.google.com (mail-gx0-f172.google.com [209.85.161.172]) by ietfa.amsl.com (Postfix) with ESMTP id 6774F21F86CE for <storm@ietf.org>; Wed, 29 Jun 2011 04:58:43 -0700 (PDT)
Received: by gxk19 with SMTP id 19so558817gxk.31 for <storm@ietf.org>; Wed, 29 Jun 2011 04:58:43 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:date:message-id:subject:from:to:content-type; bh=B2jQtCSre3/4HQQ1mdORE9Ge9ZeHmCWjYq0DaO4auOo=; b=NvFqme57SljSX0YgHGqZ1ymG0nZK3n+mP7OK+r8D70IA4n4/AbzUq5FT35TfreMVyb 6lLnHwMa3hscjw8f3vBuqY9X0qdzRFFBRhzWDQBH/MVPUtfIQnN41HoHxXqoGFgG0T5z QsAIBU4OHJ1P5fLlNiuaFk2iLaVjbEKNyHYN4=
MIME-Version: 1.0
Received: by 10.236.116.101 with SMTP id f65mr802005yhh.32.1309348722487; Wed, 29 Jun 2011 04:58:42 -0700 (PDT)
Received: by 10.236.203.196 with HTTP; Wed, 29 Jun 2011 04:58:42 -0700 (PDT)
Date: Wed, 29 Jun 2011 14:58:42 +0300
Message-ID: <BANLkTi=GVHUBdsZiwsRXxETd_rU=F8f85w@mail.gmail.com>
From: Alexander Nezhinsky <nezhinsky@gmail.com>
To: storm@ietf.org, david.black@emc.com, michael@huaweisymantec.com
Content-Type: multipart/alternative; boundary=20cf303ddb567db33a04a6d88190
Subject: [storm] iSER draft
X-BeenThere: storm@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Storage Maintenance WG <storm.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/storm>, <mailto:storm-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/storm>
List-Post: <mailto:storm@ietf.org>
List-Help: <mailto:storm-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/storm>, <mailto:storm-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Jun 2011 11:58:45 -0000

Hi,

I would like to post some comments and suggestions regarding the latest iSER
draft:
http://www.watersprings.org/pub/id/draft-ietf-storm-iser-01.txt

Unfortunately no comments have been posted since it expired. I'm acting both
as a
contributor and a practical implementer.
Earlier this year i submitted a new stable implementation of iSER, which was
accepted
by the open source STGT project. As a part of their official release it
already made its way
to a few Linux distributions.

There is a gap between the existing implementation and the original RFC,
which is mostly
(but not exclusively) related to the possibility of implementation over
Infiniband.
This gap was almost completely covered by the recent
draft-ietf-storm-iser-01.
My comments below strive to eliminate the remaining discrepancies.
The general idea is to either reflect some implemented changes in the spec,
or to pave a comprehensive way for the implementation to become spec
compliant.

Alexander Nezhinsky

---

1.* Account for VA in buffer advertisement*

Add VA to the list of attributes which identify advertized buffers and
change related definitions and statements.

* Add a definition of VA to section 1.1. For example:
"Virtual Address (VA) - an identifier supplied by RCaP together with STag
when registering a memory buffer. In some modes it is not used,
and VA=0 is implied. These modes are known as ZBVA (Zero Based VA).
In such cases VA should be set to zero when buffer advertisements
are communicated."

* Definition of Remote Mapping:
"Remote Mapping - A task state record maintained by the iSER Layer
 that associates the Initiator Task Tag to the Advertised STag(s)..."

should read:

"...that associates the Initiator Task Tag to the Advertised STag(s)
and Virtual Address(es)..."

* Definition of Tagged Buffer:

"Tagged Buffer - A buffer that is explicitly Advertised to the iSER Layer
at the remote node  through the exchange of an STag, Tagged Offset, and
length."

should read:

"...the exchange of an STag, VA, Tagged Offset, and length."

2.* iser message format*

The headers are now defined for using with iWARP. Actually, they contain an
iWARP
prefix followed by a generic part.

Define a generic iser header as mandatory, while stating that the underlying
transport
protocols may add their own header sections before the iser header
and provide iWARP as the example of such prefix.

Read and write buffers are identified by the combination of STag and VA.
Both fields should be required for any transport. This requirement does not
preclude
using ZBVA where possible. When VA is not used (as in iWARP) it is set to 0.

Below is the actually used format:

struct iser_hdr {
  uint8_t   flags;
  uint8_t   rsvd[3];
  uint32_t  write_stag; /* write rkey in IB */
  uint64_t  write_va;
  uint32_t  read_stag;  /* read rkey in IB */
  uint64_t  read_va;
}

3.* Use RCaP as a general term, iWARP and IB as particular cases*

The language of the current draft is iWARP-dominated in many places. We need
to
make the text to refer to RCaP as a common denominator, stressing the
differences
between protocols where necessary.

* section 2 :

"The iWARP protocol stack provides direct data placement
functionality that is usable in practice, and in addition, there is
also interest in using iSCSI with other RDMA protocol stacks that
support direct data placement, such as the one provided by
InfiniBand.  The generic term RDMA-Capable Protocol (RCaP) is used
to refer to the RDMA functionality provided by such protocol stacks."

should read:

"The generic term RDMA-Capable Protocol (RCaP) is used to refer to
the RDMA functionality provided by the underlying protocol stack.
Two practical examples of protocol stacks providing direct data placement
functionality are iWARP and InfiniBand."

* section 5.1.1 :

 "Among the connection resources allocated at the initiator is the
Inbound RDMA Read Queue Depth (IRD) ...
... Initially, the iSER-IRD value at the initiator SHOULD
be set to the IRD value at the initiator and MUST NOT be more than
the IRD value."

should be prefixed with:

"For iWARP based implementations, ..."

4. *Mark iWARP-specific sections as such*

Some protocol-specific statements, like in above, are scattered through the
text, but there are entire sections which are iWARP-specific (RDMAP features
having no paralle in IB, etc.) We should explicitly state this.
The same goes for IB, of course, although right now there are no such
clauses.

In particular, sections 8.2. and 8.3. with sub-sections 8.3.x
should be either grouped as the subsections of
 "8.2. iWARP considerations in Flow Control"
or retain their numbers and have "(iWARP specific)" added to their titles

5. *Fix a typo in section numbering*

Appendix A should be section 15.

6.* iSER Hello-HelloReply*

Hello-HelloReply should be made optional and transport dependent.

Right now the IB based target's implementation terminates the connection if
it receives Hello.
IB-based implementation does not need any Hello exchange, so the implemented
behavior may be
turned into standard. iWARP does need Hello-HelloReply and its format is
provided.

We can define a new key called iSERHelloRequired with the default being
"No", so
that the current implementation becomes compliant.

Need to update some statements regarding IserHello, like in 5.1.1 and 5.1.2
alleviating some MUSTs with "if relevant for transport"-style of comments.

7.* Default values for MaxOutstandingUnexpectedPDUs and MaxAHSLength*

The defaults for MaxOutstandingUnexpectedPDUs and MaxAHSLength
should change from 0 (unlimited) to some small reasonable values
to protect a side not sending these keys from potentially harmful assumption

by its peer about unlimited resources available, so that buffer overflow
(MaxAHSLength)
or sending more buffers than actually posted (MaxOutstandingUnexpectedPDUs)
may result.

proposed default values:
MaxOutstandingUnexpectedPDUs = 4
MaxAHSLength = 256 (accounts for 1 Var len CDB + 1 bidir read-length)

8. *Ignoring MaxRecvDataSegmentLength key*

It should be clarified that MaxRecvDataSegmentLength may be sent but MUST be
ignored,
because {Initiator|Target}RecvDataSegmentLength have their own defaults.

If {Initiator|Target}RecvDataSegmentLength are sent, the declared values
would override
MaxRecvDataSegmentLength anyway, and if they are not sent, the default
values
must be used to avoid confusing scenarios.