Re: [storm] Review comments on draft-ietf-storm-rdmap-ext

"Black, David" <david.black@emc.com> Fri, 06 September 2013 23:31 UTC

Return-Path: <david.black@emc.com>
X-Original-To: storm@ietfa.amsl.com
Delivered-To: storm@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 0D9E711E80E3 for <storm@ietfa.amsl.com>; Fri, 6 Sep 2013 16:31:22 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -102.599
X-Spam-Level:
X-Spam-Status: No, score=-102.599 tagged_above=-999 required=5 tests=[BAYES_00=-2.599, USER_IN_WHITELIST=-100]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id ps-P9tzKXXfp for <storm@ietfa.amsl.com>; Fri, 6 Sep 2013 16:31:18 -0700 (PDT)
Received: from mailuogwdur.emc.com (mailuogwdur.emc.com [128.221.224.79]) by ietfa.amsl.com (Postfix) with ESMTP id CDE1121F9FB5 for <storm@ietf.org>; Fri, 6 Sep 2013 16:31:17 -0700 (PDT)
Received: from maildlpprd54.lss.emc.com (maildlpprd54.lss.emc.com [10.106.48.158]) by mailuogwprd54.lss.emc.com (Sentrion-MTA-4.3.0/Sentrion-MTA-4.3.0) with ESMTP id r86NVBZD031708 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 6 Sep 2013 19:31:12 -0400
X-DKIM: OpenDKIM Filter v2.4.3 mailuogwprd54.lss.emc.com r86NVBZD031708
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=emc.com; s=jan2013; t=1378510272; bh=HJk/sWtA5hlWkupWc6pRtXJISGA=; h=From:To:Date:Subject:Message-ID:References:In-Reply-To: Content-Type:Content-Transfer-Encoding:MIME-Version; b=c8VMQWrDKpyQuXsNfNhVwdLSVSRpK9o2LgdWaMkAWJMzgl92XxRPZZz0/sawvcknJ vPRgbi2gIcgil1c9VOzSpXKLXGpkByl2+j2yWf75rT0gQKecqk+r2VL03ENPAJ6Y+d FuHKO69giI6yMDy2JMbRJFVQmZeXP4t6elnU8dv8=
X-DKIM: OpenDKIM Filter v2.4.3 mailuogwprd54.lss.emc.com r86NVBZD031708
Received: from mailusrhubprd04.lss.emc.com (mailusrhubprd04.lss.emc.com [10.253.24.22]) by maildlpprd54.lss.emc.com (RSA Interceptor); Fri, 6 Sep 2013 19:31:06 -0400
Received: from mxhub28.corp.emc.com (mxhub28.corp.emc.com [10.254.110.184]) by mailusrhubprd04.lss.emc.com (Sentrion-MTA-4.3.0/Sentrion-MTA-4.3.0) with ESMTP id r86NV54Z029874 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=FAIL); Fri, 6 Sep 2013 19:31:06 -0400
Received: from mx15a.corp.emc.com ([169.254.1.46]) by mxhub28.corp.emc.com ([10.254.110.184]) with mapi; Fri, 6 Sep 2013 19:31:06 -0400
From: "Black, David" <david.black@emc.com>
To: "Sharp, Robert O" <robert.o.sharp@intel.com>, Tom Talpey <ttalpey@microsoft.com>, "storm@ietf.org" <storm@ietf.org>
Date: Fri, 6 Sep 2013 19:31:04 -0400
Thread-Topic: [storm] Review comments on draft-ietf-storm-rdmap-ext
Thread-Index: Ac5865W+3+PDQi76TQWRLH3sDqnoewtj12JQADcqFOA=
Message-ID: <8D3D17ACE214DC429325B2B98F3AE712025D762712@MX15A.corp.emc.com>
References: <614F550557B82C44AC27C492ADA391AA148BF8B1@TK5EX14MBXC285.redmond.corp.microsoft.com> <2ABFA3E36CBB794685BFBA191CC1964952B1D77D@FMSMSX105.amr.corp.intel.com>
In-Reply-To: <2ABFA3E36CBB794685BFBA191CC1964952B1D77D@FMSMSX105.amr.corp.intel.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
acceptlanguage: en-US
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-Sentrion-Hostname: mailusrhubprd04.lss.emc.com
X-EMM-GWVC: 1
X-RSA-Classifications: DLM_1, public
X-EMM-McAfeeVC: 1
Subject: Re: [storm] Review comments on draft-ietf-storm-rdmap-ext
X-BeenThere: storm@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Storage Maintenance WG <storm.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/storm>, <mailto:storm-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/storm>
List-Post: <mailto:storm@ietf.org>
List-Help: <mailto:storm-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/storm>, <mailto:storm-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 06 Sep 2013 23:31:22 -0000

One more small item:

> > + Section 5.2.1 Atomic operation values
> >
> > >   ---------+-----------+----------+----------+---------+---------
> > >   0011b    |           |
> > >   to       | Reserved  |            Not Specified
> > >   1111b    |           |
> > >   ---------+-----------+-----------------------------------------
> >
> > ... and
> >
> > >   4. At the Responder, when an invalid Atomic Operation Request
> > >      Message is delivered to the Remote Peer's RDMAP layer, an error
> > >      is surfaced.
> >
> > Are the "Reserved" fields invalid?
> >
>
> [Authors] Yes, the reserved values are invalid.  Clarifying text has been
> added.

Unbeknownst to you, "Reserved" turns out to be a reserved word for IANA that
has a meaning different from usage in other standards bodies (we hit this in
working through IANA's comments on the iscsi-sam draft).  This is not a problem
in the new IANA Considerations text, but we probably ought to also expunge
use of that word here.

I suggest deleting the above table row in Figure 5, and making the following
change to item 4 below the table:

OLD
   4. At the Responder, when an invalid Atomic Operation Request
      Message with a reserved Atomic Operation Code is delivered to the
      Remote Peer's RDMAP layer, an error MUST be surfaced.
NEW
   4. At the Responder, an error MUST be surfaced in response to delivery
      to the Remote Peer's RDMAP layer of an Atomic Operation Request
      Message with an Atomic Operation Code that the RNIC does not support.

Thanks,
--David

> -----Original Message-----
> From: storm-bounces@ietf.org [mailto:storm-bounces@ietf.org] On Behalf Of
> Sharp, Robert O
> Sent: Thursday, September 05, 2013 5:23 PM
> To: Tom Talpey; storm@ietf.org
> Subject: Re: [storm] Review comments on draft-ietf-storm-rdmap-ext
>
> Hi Tom,
>
> Thanks for the comments!  The author's responses are embedded below...
>
> Thanks,
> Bob
>
> > -----Original Message-----
> > From: storm-bounces@ietf.org [mailto:storm-bounces@ietf.org] On Behalf
> > Of Tom Talpey
> > Sent: Wednesday, July 10, 2013 11:10 AM
> > To: storm@ietf.org
> > Subject: [storm] Review comments on draft-ietf-storm-rdmap-ext
> >
> > I've read and reviewed the latest RDMA Extensions draft (draft-ietf-storm-
> > rdmap-ext-04) and have the following comments.
> >
> > I have two high-level concerns. These extensions to the RDMAP protocol are
> > made without any versioning or capability detection between peers at the
> > RDMAP level. There is a statement deep in section 5 that "discovery" of
> > Atomics needs to be performed by the upper layer or application, and no
> > similar statement is made in section 6 regarding Immediate Data.
> >
> > This seems problematic, for two reasons. First, it imposes an
> interoperability
> > burden on upper layers to perform this negotiation. That's perhaps
> > acceptable, but more discussion as to this requirement needs to be made,
> > and in particular some discussion of why it is *not* being performed by the
> > RDMAP protocol in this extension.
>
> [Authors] Significant text has been added to describe application level
> discovery used today and how this new capability fits in to the current
> discovery.
>
> >
> > Second, there is no discussion of why the chosen set of operations is
> > necessary, or sufficient. The Introduction states that (unspecified)
> > applications are able to use Atomics and Immediate Data on "other"
> > (unspecified) RDMA transports, but there is no discussion of whether the
> > operations are sufficient. And, when comparing the Atomics in this
> > document to, for example, Infiniband, there is an additional operation
> > ("Swap") which is not part of the IB specification. What other operations,
> or
> > semantics, are possible? Are those left for future work?
> >
> > In short, I would like to see a succinct Problem Statement, and a discussion
> of
> > how the extension is to interoperate with existing implementations.
> > Discussion of how the RDMAP protocol may be extended in the future would
> > be good to have, too.
>
> [Authors] The problem statement has been improved. References to middleware
> and applications have  been added.
>
> >
> >
> > Specific issues on sections follow.
> >
> > + Section 1, paragraph 1
> >
> > >   The RDMA Protocol [RFC5040] provides capabilities for zero copy and
> > >   kernel bypass data communications.  This document specifies the
> >
> > There is much more to RDMAP than zero copy, I think. And the concept of
> > "kernel bypass" is mentioned just once in RFC5040, and in the abstract; it
> > seems irrelevant here and should be omitted. Paragraph 3 goes on to say:
>
> [Authors] Agreed. This paragraph has been reworked.
>
> >
> > >   Other RDMA transport protocols define the functionality added by
> > >   these extensions leading to differences in RDMA applications and/or
> >
> > which makes no specific reference for either the protocols or the
> > applications. In other words, what is the detailed problem statement that
> the
> > extensions are proposed to solve?
> >
> > It would be preferable to see specific reasons motivating these extensions,
> > to justify them individually.
> >
>
> [Authors] Motivations have been added.
>
> >
> > + Editorial in section 1, is "following" the right word? Since the Immediate
> > Data is carried by a new message, it seems more like a tinygram-style
> > message in itself.
> >
> > >   o  Immediate Data messages allow the ULP at the sender to provide a
> > >      small amount of data following an RDMA Message.
> >
>
>
> [Authors] Description of the intended usage with RDMA Write and also
> standalone usage has been reworked.
>
> >
> > + Section 3 "Atomic Operation"
> >
> > >   Atomic Operation - is an operation that results in an execution of a
> > >   64-bit operation at a specific address on a remote node. The
> >
> > The "64-bit" qualifier is not necessary for the glossary and can be removed
> > here. Also, isn't it a specific "registered" address?
>
> [Authors] Switched to ULP Buffer address terminology to be consistent with RFC
> 5041/5040.
>
> >
> > >   consumer can use Atomic Operations to read, modify and write at the
> > >   destination address while at the same time guarantee that no other
> > >   read or write operation will occur across any other RDMAP Streams on
> > >   an RNIC at the Data Sink.
> >
> > Generalize the last sentence to "The consumer can use Atomic Operations to
> > perform operations on the contents of the destination address with certain
> > atomicity guarantees." I have other comments on the "across any other
> > RDMAP Streams" later.
>
> [Authors] Definition has been reworked.
>
> >
> > Even if you disagree with the prior sentence, the term "Data Sink" should be
> > deleted. Atomics source and sink data at the same peer. In fact, the
> glossary
> > goes on to define "Requester" and "Responder" just a few lines later.
> >
> >
> > + Section 4 first paragraph
> >
> > >   The control information of RDMA Messages is included in DDP protocol
> > >   [RFC5041] defined header fields, with the following new formats:
> >
> > This would appear to propose an extension to RFC5041, as written. This
> > should be stated, or made more clear. Any requirements made of other
> > protocols are critical to capture.
> >
>
> [Authors] Text has been changes to reflect that this is only extending RFC
> 5040.
>
> >
> > + Section 4.1, first paragraph
> >
> > >   The RDMA Messages defined by this specification use all 8 bits of
> > >   the RDMAP Control Field. The first octet reserved for ULP use in the
> > >   DDP Protocol MUST be used by the RDMAP to carry the RDMAP Control
> >
> > This statement appears to have been imported from RFC5040, and therefore
> > is not part of the extension. This should be reworded to make it clear what
> is
> > being extended and changed.
> >
>
> [Authors] Extension of RFC 5040 has been clarified.
>
> >
> > + Section 4.1, second paragraph
> >
> > >    0                   1                   2                   3
> > >    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
> > >                                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> > >                                   |T|L| Resrv | DV| RV|Rsv| Opcode|
> > >   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> > >   |                     Invalidate STag                           |
> > >   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> > >
> > >               Figure 1 DDP Control and RDMAP Control Fields
> >
> > There are no differences in these fields from RFC5040. What is being
> > extended in them? (Just the Opcodes, right?)
> >
>
> [Authors] Correct.
>
> >
> > + Section 4.1, table
> >
> > > -------+-----------+-------------------------------------------------
> > > 1011b  | Atomic    | 0     | N/A  | 3     | N/A       | Yes
> > >        | Response  |       |      |       |           |
> > > -------+-----------+-------------------------------------------------
> >
> > The newly-defined use of RDMAP queue #3 is quite surprising. What's the
> > purpose of the assignment? What semantics are being enforced? The
> > document is silent on them, and as noted in later comments, possibly
> > inconsistent.
> >
>
> [Authors]  Good catch.  QN=3 was added late and was not properly covered.
> Additional text has been added in the revised draft.
>
> >
> > + Section 5 third paragraph
> >
> > >   Atomic operations as specified in this document execute a 64-bit
> > >   operation at a specified destination address on a remote node. The
> >
> > It's much more than "specified", it's a _registered_ address, represented by
> > an STag and thereby under the control of the RFC5042 security model. It is
> > critical to state this, and make the reference.
>
> [Authors] Agreed.  The term ULP Buffer address should complete the connection
> back to RFC 5042 indirectly through RFC 5040.
>
> >
> > Also, the address has other possible restrictions, with respect to
> alignment.
> > The protocol should state whether these are required by the protocol itself,
> > or if they are subject to local constraints imposed by the registration. If
> they
> > are protocol-imposed, then there will be implications for the "offset"
> fields
> > with regard to byte granularity, and how and where they will be checked.
> > Either way, discussion here is required.
>
> [Authors] We added a note that the errors covered by the wire protocol over
> the alignment issue and also revised other text regarding the 64-bit alignment
> topic.  Clearly (as you are pointing out) the wire protocol cannot dictate if
> a given implementation can handle unaligned accesses atomically.  We are
> documenting the alignment issue and allowing it to be reported properly.
>
> >
> > >   operations atomically read, modify and write back the contents of
> >
> > Are read, modify and write the only actions possible in an atomic? It's not
> > clear how the proposed "swap" operations apply here. The R/M/W meaning
> > should be made clear.
>
> [Authors] We believe that this text is sufficient.  We consider swap to be a
> modification.
>
> >
> > >   the destination address and guarantee that Atomic Operations on this
> > >   address by other RDMAP Streams on the same RNIC do not occur
> > between
> > >   the read and the write.
> >
> > This seems to be significantly less of an atomicity guarantee than expected.
> > Normally, atomics protect all concurrent modifications to the target
> location,
> > including those from local access via the CPU at the responder, and other
> > RNICs. Is the scope intentionally narrowed to a single RNIC here? Whatever
> > the guarantee, some statement of RNIC requirements to achieve this is
> > needed, because the proposed protocol itself does not address it.
>
> [Authors] Yes, the scope was intentionally narrowed.  The largest scope the
> authors believe that we can address in a wire protocol document is within a
> single RNIC.
>
> >
> > >   document MAY be implemented. The discovery of whether the Atomic
> > >   Operations are implemented or not is outside the scope of this
> > >   specification and it should be handled by the ULPs or applications.
> >
> > This is the first mention in the document of this requirement, and it needs
> > quite a bit more discussion on exactly how an upper layer is to proceed. Why
> > is the "should" not normative here, and is it a MUST? See earlier comment.
> >
>
> [Authors] The discovery topic has been reworked and should clear up this
> issue.
>
>
> >
> > + Section 5.2 first paragraph
> >
> > >   The Atomic Operation Request and Response are RDMA Messages.  An
> > >   Atomic Operation makes use of the DDP Untagged Buffer Model. Atomic
> > >   Operations use the same Queue Number as RDMA Read Requests
> > (QN=1).
> >
> > Earlier, atomic *requests* are specified to use QN=1, but atomic
> > *responses* used QN=3. Which is correct? Note, section 5.2.2 bullet #3 also
> > makes this statement.
> >
>
> [Authors] Good catch.  This has been corrected.
>
> >
> > + Section 5.2.1 Atomic operation values
> >
> > > ---------+-----------+----------+----------+---------+---------
> > >   0011b    |           |
> > >   to       | Reserved  |            Not Specified
> > >   1111b    |           |
> > >   ---------+-----------+-----------------------------------------
> >
> > ... and
> >
> > >   4. At the Responder, when an invalid Atomic Operation Request
> > >      Message is delivered to the Remote Peer's RDMAP layer, an error
> > >      is surfaced.
> >
> > Are the "Reserved" fields invalid?
> >
>
> [Authors] Yes, the reserved values are invalid.  Clarifying text has been
> added.
>
> >
> > + Section 5.2.2 points #5 and #7
> >
> > >   5. The RDMAP layer MUST Deliver the Atomic Operation Response
> > >      Message to the ULP.
> >
> > >   7. The Responder RDMAP layer MUST pass Atomic Operation Response
> > >      Messages to the DDP layer, in the order that the Atomic Operation
> > >      Request Messages were received by the RDMAP layer, at the
> > >      Responder.
> >
> > These steps appear out of order. I suggest switching them, so the sending of
> > the response precedes its processing by the receiver.
> >
>
> [Authors] Reordered.
>
> >
> > + Section 5.3. Atomicity Guarantees
> >
> > >   Atomicity of the Read-Modify-Write (RMW) on the Responder's node by
> > >   the Atomic Operation MUST be assured in the presence of concurrent
> > >   atomic accesses by other RDMAP Streams on the same RNIC.
> >
> > Similar comment as earlier - are these the only types of concurrent access
> > guarantees?
> >
>
> [Authors] Yes, as described above.
>
> >
> > + Section 5.4 Atomic completion rules
> >
> > >   1. For an Atomic operation, the contents of the Tagged Buffer at the
> > >      Responder MAY be indeterminate until the Atomic Operation
> > >      Response Message has been Delivered at the Requester.
> >
> > I don't understand this statement. The protocol defines no means for the
> > Requester to acknowledge the Delivery of the Atomic Response, so how can
> > this in any way influence the behavior at the Responder?
>
> [Authors] Good comment.  This has been reworked.
>
> >
> > >   4. Atomic Operation Response Message processing at the Responder
> > >      MUST be started only after the Atomic Operation Request Message
> > >      has been Delivered by the DDP layer (thus, all previous RDMA
> > >      Messages have been properly submitted for ordered Placement).
> >
> > Again, I don't understand. The DDP protocol uses Queue Numbers to enforce
> > this ordering, and since the Atomic Request uses QN=1, the requirement for
> > "all previous RDMA Messages" is not sufficiently clear. Is this making a
> > requirement that previous RDMA Writes to the Atomic location be Placed
> > (there's no mention of this in section 7)? Or, is the Atomic required to
> > perform some sort of memory barrier against all other access?
> >
>
> [Authors] The requirement for starting an Atomic operation at the responder is
> exactly like RDMA Reads.  There is a sort of barrier to starting the Atomic
> Operation with relationship to placing data that is in front of it destined
> for host memory.  Additional text has been added.
>
> >
> > + Section 6. Immediate Data
> >
> > >   The Immediate Data operation is used in conjunction with an RDMA
> > >   Operation to improve ULP processing efficiency by allowing 8 bytes
> > >   of immediate data to be delivered with the completion of the
> > >   previous operation after the previous operation has been delivered
> > >   at the Remote Peer.
> >
> > I definitely do not understand this. The text appears to state that any
> > immediate data is added to some *prior* completion. How can an operation,
> > which has already been delivered, be modified by a later one?
> >
>
> [Authors] Good catch.  Text has been clarified.
>
> >
> > + Section 6.3 Immediate data handling
> >
> > >   .  The Remote Peer MUST check that Immediate Data and Immediate Data
> > >      with SE Messages include exactly 8 bytes of data from the DDP
> > >      layer.
> >
> > How does the receiver make this test? There is no length field in the
> > Immediate Data message, it's a fixed 8-byte buffer.
> >
>
> [Authors] DDP always delivers the length with the data.  The receiver will
> just check the DDP length.
>
> >
> > + Section 7 Ordering table
> >
> > >----------+------------+-------------+-------------+-------------------
> > >Atomic    | Atomic     | No Placement| No Placement| Second Atomic
> > >          |            | Guarantee   | Guarantee   | Response
> > >          |            | between two | between two | Message will
> > >          |            | Atomic      | Atomic      | not be
> > >          |            | Requests    | Responses   | generated
> > >          |            |             |             | until first
> > >          |            |             |             | Atomic Response
> > >          |            |             |             | has been
> > >          |            |             |             | generated
> > >----------+------------+-------------+-------------+-------------------
> >
> > This appears to be incorrect. Earlier, there are specific requirements that
> the
> > RMW operation be performed prior to replying to a previous Atomic, and in
> > fact all Atomic requests are serialized on a single QN (1). I'm not sure
> what
> > application semantic would be satisfied by this rather weak guarantee as
> > stated.
> >
>
> [Authors]  Good point.  Guarantee has been strengthened to include not
> starting the next message until the previous response message has been
> generated.
>
> >
> > + Section 11.1
> >
> > [RFC5042] (RDMA security model) is missing. Needed for discussion of
> > Atomics steering tag protection.
> >
> > [RFC5226] is not a normative reference. Move to 11.2, if desired, but more
> > likely, just delete.
> >
>
> [Authors] Addressed in revised draft.
>
> >
> > + Section 11.2
> >
> > Would like to see references to Infiniband specifications here, as relevant
> for
> > Atomics and other behavior.
> >
>
> [Authors] Added.  Thanks!
>
> > _______________________________________________
> > storm mailing list
> > storm@ietf.org
> > https://www.ietf.org/mailman/listinfo/storm
> _______________________________________________
> storm mailing list
> storm@ietf.org
> https://www.ietf.org/mailman/listinfo/storm