Re: [storm] Review comments on draft-ietf-storm-rdmap-ext

"Sharp, Robert O" <robert.o.sharp@intel.com> Thu, 05 September 2013 21:22 UTC

Return-Path: <robert.o.sharp@intel.com>
X-Original-To: storm@ietfa.amsl.com
Delivered-To: storm@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 106E921E80CC for <storm@ietfa.amsl.com>; Thu, 5 Sep 2013 14:22:45 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -10.599
X-Spam-Level:
X-Spam-Status: No, score=-10.599 tagged_above=-999 required=5 tests=[BAYES_00=-2.599, RCVD_IN_DNSWL_HI=-8]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id jcglOHv4FdwB for <storm@ietfa.amsl.com>; Thu, 5 Sep 2013 14:22:39 -0700 (PDT)
Received: from mga03.intel.com (mga03.intel.com [143.182.124.21]) by ietfa.amsl.com (Postfix) with ESMTP id 6B2BF21E80DA for <storm@ietf.org>; Thu, 5 Sep 2013 14:22:38 -0700 (PDT)
Received: from azsmga002.ch.intel.com ([10.2.17.35]) by azsmga101.ch.intel.com with ESMTP; 05 Sep 2013 14:22:38 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="4.90,849,1371106800"; d="scan'208";a="291361558"
Received: from fmsmsx107.amr.corp.intel.com ([10.19.9.54]) by AZSMGA002.ch.intel.com with ESMTP; 05 Sep 2013 14:22:37 -0700
Received: from fmsmsx153.amr.corp.intel.com (10.19.17.7) by FMSMSX107.amr.corp.intel.com (10.19.9.54) with Microsoft SMTP Server (TLS) id 14.3.123.3; Thu, 5 Sep 2013 14:22:36 -0700
Received: from fmsmsx105.amr.corp.intel.com ([169.254.5.91]) by FMSMSX153.amr.corp.intel.com ([169.254.9.191]) with mapi id 14.03.0123.003; Thu, 5 Sep 2013 14:22:36 -0700
From: "Sharp, Robert O" <robert.o.sharp@intel.com>
To: Tom Talpey <ttalpey@microsoft.com>, "storm@ietf.org" <storm@ietf.org>
Thread-Topic: [storm] Review comments on draft-ietf-storm-rdmap-ext
Thread-Index: Ac5865W+3+PDQi76TQWRLH3sDqnoewtj12JQ
Date: Thu, 5 Sep 2013 21:22:35 +0000
Message-ID: <2ABFA3E36CBB794685BFBA191CC1964952B1D77D@FMSMSX105.amr.corp.intel.com>
References: <614F550557B82C44AC27C492ADA391AA148BF8B1@TK5EX14MBXC285.redmond.corp.microsoft.com>
In-Reply-To: <614F550557B82C44AC27C492ADA391AA148BF8B1@TK5EX14MBXC285.redmond.corp.microsoft.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-originating-ip: [10.1.200.106]
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [storm] Review comments on draft-ietf-storm-rdmap-ext
X-BeenThere: storm@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Storage Maintenance WG <storm.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/storm>, <mailto:storm-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/storm>
List-Post: <mailto:storm@ietf.org>
List-Help: <mailto:storm-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/storm>, <mailto:storm-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 05 Sep 2013 21:22:45 -0000

Hi Tom,

Thanks for the comments!  The author's responses are embedded below...

Thanks,
Bob

> -----Original Message-----
> From: storm-bounces@ietf.org [mailto:storm-bounces@ietf.org] On Behalf
> Of Tom Talpey
> Sent: Wednesday, July 10, 2013 11:10 AM
> To: storm@ietf.org
> Subject: [storm] Review comments on draft-ietf-storm-rdmap-ext
> 
> I've read and reviewed the latest RDMA Extensions draft (draft-ietf-storm-
> rdmap-ext-04) and have the following comments.
> 
> I have two high-level concerns. These extensions to the RDMAP protocol are
> made without any versioning or capability detection between peers at the
> RDMAP level. There is a statement deep in section 5 that "discovery" of
> Atomics needs to be performed by the upper layer or application, and no
> similar statement is made in section 6 regarding Immediate Data.
> 
> This seems problematic, for two reasons. First, it imposes an interoperability
> burden on upper layers to perform this negotiation. That's perhaps
> acceptable, but more discussion as to this requirement needs to be made,
> and in particular some discussion of why it is *not* being performed by the
> RDMAP protocol in this extension.

[Authors] Significant text has been added to describe application level discovery used today and how this new capability fits in to the current discovery.

> 
> Second, there is no discussion of why the chosen set of operations is
> necessary, or sufficient. The Introduction states that (unspecified)
> applications are able to use Atomics and Immediate Data on "other"
> (unspecified) RDMA transports, but there is no discussion of whether the
> operations are sufficient. And, when comparing the Atomics in this
> document to, for example, Infiniband, there is an additional operation
> ("Swap") which is not part of the IB specification. What other operations, or
> semantics, are possible? Are those left for future work?
> 
> In short, I would like to see a succinct Problem Statement, and a discussion of
> how the extension is to interoperate with existing implementations.
> Discussion of how the RDMAP protocol may be extended in the future would
> be good to have, too.

[Authors] The problem statement has been improved. References to middleware and applications have  been added.

> 
> 
> Specific issues on sections follow.
> 
> + Section 1, paragraph 1
> 
> >   The RDMA Protocol [RFC5040] provides capabilities for zero copy and
> >   kernel bypass data communications.  This document specifies the
> 
> There is much more to RDMAP than zero copy, I think. And the concept of
> "kernel bypass" is mentioned just once in RFC5040, and in the abstract; it
> seems irrelevant here and should be omitted. Paragraph 3 goes on to say:

[Authors] Agreed. This paragraph has been reworked.

> 
> >   Other RDMA transport protocols define the functionality added by
> >   these extensions leading to differences in RDMA applications and/or
> 
> which makes no specific reference for either the protocols or the
> applications. In other words, what is the detailed problem statement that the
> extensions are proposed to solve?
> 
> It would be preferable to see specific reasons motivating these extensions,
> to justify them individually.
> 

[Authors] Motivations have been added. 

> 
> + Editorial in section 1, is "following" the right word? Since the Immediate
> Data is carried by a new message, it seems more like a tinygram-style
> message in itself.
> 
> >   o  Immediate Data messages allow the ULP at the sender to provide a
> >      small amount of data following an RDMA Message.
> 


[Authors] Description of the intended usage with RDMA Write and also standalone usage has been reworked. 

> 
> + Section 3 "Atomic Operation"
> 
> >   Atomic Operation - is an operation that results in an execution of a
> >   64-bit operation at a specific address on a remote node. The
> 
> The "64-bit" qualifier is not necessary for the glossary and can be removed
> here. Also, isn't it a specific "registered" address?

[Authors] Switched to ULP Buffer address terminology to be consistent with RFC 5041/5040.

> 
> >   consumer can use Atomic Operations to read, modify and write at the
> >   destination address while at the same time guarantee that no other
> >   read or write operation will occur across any other RDMAP Streams on
> >   an RNIC at the Data Sink.
> 
> Generalize the last sentence to "The consumer can use Atomic Operations to
> perform operations on the contents of the destination address with certain
> atomicity guarantees." I have other comments on the "across any other
> RDMAP Streams" later.

[Authors] Definition has been reworked.

> 
> Even if you disagree with the prior sentence, the term "Data Sink" should be
> deleted. Atomics source and sink data at the same peer. In fact, the glossary
> goes on to define "Requester" and "Responder" just a few lines later.
> 
> 
> + Section 4 first paragraph
> 
> >   The control information of RDMA Messages is included in DDP protocol
> >   [RFC5041] defined header fields, with the following new formats:
> 
> This would appear to propose an extension to RFC5041, as written. This
> should be stated, or made more clear. Any requirements made of other
> protocols are critical to capture.
> 

[Authors] Text has been changes to reflect that this is only extending RFC 5040. 

> 
> + Section 4.1, first paragraph
> 
> >   The RDMA Messages defined by this specification use all 8 bits of
> >   the RDMAP Control Field. The first octet reserved for ULP use in the
> >   DDP Protocol MUST be used by the RDMAP to carry the RDMAP Control
> 
> This statement appears to have been imported from RFC5040, and therefore
> is not part of the extension. This should be reworded to make it clear what is
> being extended and changed.
> 

[Authors] Extension of RFC 5040 has been clarified. 

> 
> + Section 4.1, second paragraph
> 
> >    0                   1                   2                   3
> >    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
> >                                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >                                   |T|L| Resrv | DV| RV|Rsv| Opcode|
> >   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >   |                     Invalidate STag                           |
> >   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >
> >               Figure 1 DDP Control and RDMAP Control Fields
> 
> There are no differences in these fields from RFC5040. What is being
> extended in them? (Just the Opcodes, right?)
> 

[Authors] Correct.

> 
> + Section 4.1, table
> 
> > -------+-----------+-------------------------------------------------
> > 1011b  | Atomic    | 0     | N/A  | 3     | N/A       | Yes
> >        | Response  |       |      |       |           |
> > -------+-----------+-------------------------------------------------
> 
> The newly-defined use of RDMAP queue #3 is quite surprising. What's the
> purpose of the assignment? What semantics are being enforced? The
> document is silent on them, and as noted in later comments, possibly
> inconsistent.
> 

[Authors]  Good catch.  QN=3 was added late and was not properly covered.  Additional text has been added in the revised draft. 

> 
> + Section 5 third paragraph
> 
> >   Atomic operations as specified in this document execute a 64-bit
> >   operation at a specified destination address on a remote node. The
> 
> It's much more than "specified", it's a _registered_ address, represented by
> an STag and thereby under the control of the RFC5042 security model. It is
> critical to state this, and make the reference.

[Authors] Agreed.  The term ULP Buffer address should complete the connection back to RFC 5042 indirectly through RFC 5040.

> 
> Also, the address has other possible restrictions, with respect to alignment.
> The protocol should state whether these are required by the protocol itself,
> or if they are subject to local constraints imposed by the registration. If they
> are protocol-imposed, then there will be implications for the "offset" fields
> with regard to byte granularity, and how and where they will be checked.
> Either way, discussion here is required.

[Authors] We added a note that the errors covered by the wire protocol over the alignment issue and also revised other text regarding the 64-bit alignment topic.  Clearly (as you are pointing out) the wire protocol cannot dictate if a given implementation can handle unaligned accesses atomically.  We are documenting the alignment issue and allowing it to be reported properly.

> 
> >   operations atomically read, modify and write back the contents of
> 
> Are read, modify and write the only actions possible in an atomic? It's not
> clear how the proposed "swap" operations apply here. The R/M/W meaning
> should be made clear.

[Authors] We believe that this text is sufficient.  We consider swap to be a modification.

> 
> >   the destination address and guarantee that Atomic Operations on this
> >   address by other RDMAP Streams on the same RNIC do not occur
> between
> >   the read and the write.
> 
> This seems to be significantly less of an atomicity guarantee than expected.
> Normally, atomics protect all concurrent modifications to the target location,
> including those from local access via the CPU at the responder, and other
> RNICs. Is the scope intentionally narrowed to a single RNIC here? Whatever
> the guarantee, some statement of RNIC requirements to achieve this is
> needed, because the proposed protocol itself does not address it.

[Authors] Yes, the scope was intentionally narrowed.  The largest scope the authors believe that we can address in a wire protocol document is within a single RNIC.

> 
> >   document MAY be implemented. The discovery of whether the Atomic
> >   Operations are implemented or not is outside the scope of this
> >   specification and it should be handled by the ULPs or applications.
> 
> This is the first mention in the document of this requirement, and it needs
> quite a bit more discussion on exactly how an upper layer is to proceed. Why
> is the "should" not normative here, and is it a MUST? See earlier comment.
>

[Authors] The discovery topic has been reworked and should clear up this issue.

 
> 
> + Section 5.2 first paragraph
> 
> >   The Atomic Operation Request and Response are RDMA Messages.  An
> >   Atomic Operation makes use of the DDP Untagged Buffer Model. Atomic
> >   Operations use the same Queue Number as RDMA Read Requests
> (QN=1).
> 
> Earlier, atomic *requests* are specified to use QN=1, but atomic
> *responses* used QN=3. Which is correct? Note, section 5.2.2 bullet #3 also
> makes this statement.
> 

[Authors] Good catch.  This has been corrected.

> 
> + Section 5.2.1 Atomic operation values
> 
> > ---------+-----------+----------+----------+---------+---------
> >   0011b    |           |
> >   to       | Reserved  |            Not Specified
> >   1111b    |           |
> >   ---------+-----------+-----------------------------------------
> 
> ... and
> 
> >   4. At the Responder, when an invalid Atomic Operation Request
> >      Message is delivered to the Remote Peer's RDMAP layer, an error
> >      is surfaced.
> 
> Are the "Reserved" fields invalid?
> 

[Authors] Yes, the reserved values are invalid.  Clarifying text has been added.

> 
> + Section 5.2.2 points #5 and #7
> 
> >   5. The RDMAP layer MUST Deliver the Atomic Operation Response
> >      Message to the ULP.
> 
> >   7. The Responder RDMAP layer MUST pass Atomic Operation Response
> >      Messages to the DDP layer, in the order that the Atomic Operation
> >      Request Messages were received by the RDMAP layer, at the
> >      Responder.
> 
> These steps appear out of order. I suggest switching them, so the sending of
> the response precedes its processing by the receiver.
> 

[Authors] Reordered.

> 
> + Section 5.3. Atomicity Guarantees
> 
> >   Atomicity of the Read-Modify-Write (RMW) on the Responder's node by
> >   the Atomic Operation MUST be assured in the presence of concurrent
> >   atomic accesses by other RDMAP Streams on the same RNIC.
> 
> Similar comment as earlier - are these the only types of concurrent access
> guarantees?
> 

[Authors] Yes, as described above. 

> 
> + Section 5.4 Atomic completion rules
> 
> >   1. For an Atomic operation, the contents of the Tagged Buffer at the
> >      Responder MAY be indeterminate until the Atomic Operation
> >      Response Message has been Delivered at the Requester.
> 
> I don't understand this statement. The protocol defines no means for the
> Requester to acknowledge the Delivery of the Atomic Response, so how can
> this in any way influence the behavior at the Responder?

[Authors] Good comment.  This has been reworked.

> 
> >   4. Atomic Operation Response Message processing at the Responder
> >      MUST be started only after the Atomic Operation Request Message
> >      has been Delivered by the DDP layer (thus, all previous RDMA
> >      Messages have been properly submitted for ordered Placement).
> 
> Again, I don't understand. The DDP protocol uses Queue Numbers to enforce
> this ordering, and since the Atomic Request uses QN=1, the requirement for
> "all previous RDMA Messages" is not sufficiently clear. Is this making a
> requirement that previous RDMA Writes to the Atomic location be Placed
> (there's no mention of this in section 7)? Or, is the Atomic required to
> perform some sort of memory barrier against all other access?
> 

[Authors] The requirement for starting an Atomic operation at the responder is exactly like RDMA Reads.  There is a sort of barrier to starting the Atomic Operation with relationship to placing data that is in front of it destined for host memory.  Additional text has been added.

> 
> + Section 6. Immediate Data
> 
> >   The Immediate Data operation is used in conjunction with an RDMA
> >   Operation to improve ULP processing efficiency by allowing 8 bytes
> >   of immediate data to be delivered with the completion of the
> >   previous operation after the previous operation has been delivered
> >   at the Remote Peer.
> 
> I definitely do not understand this. The text appears to state that any
> immediate data is added to some *prior* completion. How can an operation,
> which has already been delivered, be modified by a later one?
> 

[Authors] Good catch.  Text has been clarified. 

> 
> + Section 6.3 Immediate data handling
> 
> >   .  The Remote Peer MUST check that Immediate Data and Immediate Data
> >      with SE Messages include exactly 8 bytes of data from the DDP
> >      layer.
> 
> How does the receiver make this test? There is no length field in the
> Immediate Data message, it's a fixed 8-byte buffer.
> 

[Authors] DDP always delivers the length with the data.  The receiver will just check the DDP length. 

> 
> + Section 7 Ordering table
> 
> >----------+------------+-------------+-------------+-------------------
> >Atomic    | Atomic     | No Placement| No Placement| Second Atomic
> >          |            | Guarantee   | Guarantee   | Response
> >          |            | between two | between two | Message will
> >          |            | Atomic      | Atomic      | not be
> >          |            | Requests    | Responses   | generated
> >          |            |             |             | until first
> >          |            |             |             | Atomic Response
> >          |            |             |             | has been
> >          |            |             |             | generated
> >----------+------------+-------------+-------------+-------------------
> 
> This appears to be incorrect. Earlier, there are specific requirements that the
> RMW operation be performed prior to replying to a previous Atomic, and in
> fact all Atomic requests are serialized on a single QN (1). I'm not sure what
> application semantic would be satisfied by this rather weak guarantee as
> stated.
> 

[Authors]  Good point.  Guarantee has been strengthened to include not starting the next message until the previous response message has been generated. 

> 
> + Section 11.1
> 
> [RFC5042] (RDMA security model) is missing. Needed for discussion of
> Atomics steering tag protection.
> 
> [RFC5226] is not a normative reference. Move to 11.2, if desired, but more
> likely, just delete.
> 

[Authors] Addressed in revised draft. 

> 
> + Section 11.2
> 
> Would like to see references to Infiniband specifications here, as relevant for
> Atomics and other behavior.
> 

[Authors] Added.  Thanks!

> _______________________________________________
> storm mailing list
> storm@ietf.org
> https://www.ietf.org/mailman/listinfo/storm