Re: [storm] Review comments on draft-ietf-storm-rdmap-ext
"Sharp, Robert O" <robert.o.sharp@intel.com> Thu, 05 September 2013 21:22 UTC
Return-Path: <robert.o.sharp@intel.com>
X-Original-To: storm@ietfa.amsl.com
Delivered-To: storm@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 106E921E80CC for <storm@ietfa.amsl.com>; Thu, 5 Sep 2013 14:22:45 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -10.599
X-Spam-Level:
X-Spam-Status: No, score=-10.599 tagged_above=-999 required=5 tests=[BAYES_00=-2.599, RCVD_IN_DNSWL_HI=-8]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id jcglOHv4FdwB for <storm@ietfa.amsl.com>; Thu, 5 Sep 2013 14:22:39 -0700 (PDT)
Received: from mga03.intel.com (mga03.intel.com [143.182.124.21]) by ietfa.amsl.com (Postfix) with ESMTP id 6B2BF21E80DA for <storm@ietf.org>; Thu, 5 Sep 2013 14:22:38 -0700 (PDT)
Received: from azsmga002.ch.intel.com ([10.2.17.35]) by azsmga101.ch.intel.com with ESMTP; 05 Sep 2013 14:22:38 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="4.90,849,1371106800"; d="scan'208";a="291361558"
Received: from fmsmsx107.amr.corp.intel.com ([10.19.9.54]) by AZSMGA002.ch.intel.com with ESMTP; 05 Sep 2013 14:22:37 -0700
Received: from fmsmsx153.amr.corp.intel.com (10.19.17.7) by FMSMSX107.amr.corp.intel.com (10.19.9.54) with Microsoft SMTP Server (TLS) id 14.3.123.3; Thu, 5 Sep 2013 14:22:36 -0700
Received: from fmsmsx105.amr.corp.intel.com ([169.254.5.91]) by FMSMSX153.amr.corp.intel.com ([169.254.9.191]) with mapi id 14.03.0123.003; Thu, 5 Sep 2013 14:22:36 -0700
From: "Sharp, Robert O" <robert.o.sharp@intel.com>
To: Tom Talpey <ttalpey@microsoft.com>, "storm@ietf.org" <storm@ietf.org>
Thread-Topic: [storm] Review comments on draft-ietf-storm-rdmap-ext
Thread-Index: Ac5865W+3+PDQi76TQWRLH3sDqnoewtj12JQ
Date: Thu, 05 Sep 2013 21:22:35 +0000
Message-ID: <2ABFA3E36CBB794685BFBA191CC1964952B1D77D@FMSMSX105.amr.corp.intel.com>
References: <614F550557B82C44AC27C492ADA391AA148BF8B1@TK5EX14MBXC285.redmond.corp.microsoft.com>
In-Reply-To: <614F550557B82C44AC27C492ADA391AA148BF8B1@TK5EX14MBXC285.redmond.corp.microsoft.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-originating-ip: [10.1.200.106]
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [storm] Review comments on draft-ietf-storm-rdmap-ext
X-BeenThere: storm@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Storage Maintenance WG <storm.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/storm>, <mailto:storm-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/storm>
List-Post: <mailto:storm@ietf.org>
List-Help: <mailto:storm-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/storm>, <mailto:storm-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 05 Sep 2013 21:22:45 -0000
Hi Tom, Thanks for the comments! The author's responses are embedded below... Thanks, Bob > -----Original Message----- > From: storm-bounces@ietf.org [mailto:storm-bounces@ietf.org] On Behalf > Of Tom Talpey > Sent: Wednesday, July 10, 2013 11:10 AM > To: storm@ietf.org > Subject: [storm] Review comments on draft-ietf-storm-rdmap-ext > > I've read and reviewed the latest RDMA Extensions draft (draft-ietf-storm- > rdmap-ext-04) and have the following comments. > > I have two high-level concerns. These extensions to the RDMAP protocol are > made without any versioning or capability detection between peers at the > RDMAP level. There is a statement deep in section 5 that "discovery" of > Atomics needs to be performed by the upper layer or application, and no > similar statement is made in section 6 regarding Immediate Data. > > This seems problematic, for two reasons. First, it imposes an interoperability > burden on upper layers to perform this negotiation. That's perhaps > acceptable, but more discussion as to this requirement needs to be made, > and in particular some discussion of why it is *not* being performed by the > RDMAP protocol in this extension. [Authors] Significant text has been added to describe application level discovery used today and how this new capability fits in to the current discovery. > > Second, there is no discussion of why the chosen set of operations is > necessary, or sufficient. The Introduction states that (unspecified) > applications are able to use Atomics and Immediate Data on "other" > (unspecified) RDMA transports, but there is no discussion of whether the > operations are sufficient. And, when comparing the Atomics in this > document to, for example, Infiniband, there is an additional operation > ("Swap") which is not part of the IB specification. What other operations, or > semantics, are possible? Are those left for future work? > > In short, I would like to see a succinct Problem Statement, and a discussion of > how the extension is to interoperate with existing implementations. > Discussion of how the RDMAP protocol may be extended in the future would > be good to have, too. [Authors] The problem statement has been improved. References to middleware and applications have been added. > > > Specific issues on sections follow. > > + Section 1, paragraph 1 > > > The RDMA Protocol [RFC5040] provides capabilities for zero copy and > > kernel bypass data communications. This document specifies the > > There is much more to RDMAP than zero copy, I think. And the concept of > "kernel bypass" is mentioned just once in RFC5040, and in the abstract; it > seems irrelevant here and should be omitted. Paragraph 3 goes on to say: [Authors] Agreed. This paragraph has been reworked. > > > Other RDMA transport protocols define the functionality added by > > these extensions leading to differences in RDMA applications and/or > > which makes no specific reference for either the protocols or the > applications. In other words, what is the detailed problem statement that the > extensions are proposed to solve? > > It would be preferable to see specific reasons motivating these extensions, > to justify them individually. > [Authors] Motivations have been added. > > + Editorial in section 1, is "following" the right word? Since the Immediate > Data is carried by a new message, it seems more like a tinygram-style > message in itself. > > > o Immediate Data messages allow the ULP at the sender to provide a > > small amount of data following an RDMA Message. > [Authors] Description of the intended usage with RDMA Write and also standalone usage has been reworked. > > + Section 3 "Atomic Operation" > > > Atomic Operation - is an operation that results in an execution of a > > 64-bit operation at a specific address on a remote node. The > > The "64-bit" qualifier is not necessary for the glossary and can be removed > here. Also, isn't it a specific "registered" address? [Authors] Switched to ULP Buffer address terminology to be consistent with RFC 5041/5040. > > > consumer can use Atomic Operations to read, modify and write at the > > destination address while at the same time guarantee that no other > > read or write operation will occur across any other RDMAP Streams on > > an RNIC at the Data Sink. > > Generalize the last sentence to "The consumer can use Atomic Operations to > perform operations on the contents of the destination address with certain > atomicity guarantees." I have other comments on the "across any other > RDMAP Streams" later. [Authors] Definition has been reworked. > > Even if you disagree with the prior sentence, the term "Data Sink" should be > deleted. Atomics source and sink data at the same peer. In fact, the glossary > goes on to define "Requester" and "Responder" just a few lines later. > > > + Section 4 first paragraph > > > The control information of RDMA Messages is included in DDP protocol > > [RFC5041] defined header fields, with the following new formats: > > This would appear to propose an extension to RFC5041, as written. This > should be stated, or made more clear. Any requirements made of other > protocols are critical to capture. > [Authors] Text has been changes to reflect that this is only extending RFC 5040. > > + Section 4.1, first paragraph > > > The RDMA Messages defined by this specification use all 8 bits of > > the RDMAP Control Field. The first octet reserved for ULP use in the > > DDP Protocol MUST be used by the RDMAP to carry the RDMAP Control > > This statement appears to have been imported from RFC5040, and therefore > is not part of the extension. This should be reworded to make it clear what is > being extended and changed. > [Authors] Extension of RFC 5040 has been clarified. > > + Section 4.1, second paragraph > > > 0 1 2 3 > > 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 > > +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > > |T|L| Resrv | DV| RV|Rsv| Opcode| > > +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > > | Invalidate STag | > > +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > > > > Figure 1 DDP Control and RDMAP Control Fields > > There are no differences in these fields from RFC5040. What is being > extended in them? (Just the Opcodes, right?) > [Authors] Correct. > > + Section 4.1, table > > > -------+-----------+------------------------------------------------- > > 1011b | Atomic | 0 | N/A | 3 | N/A | Yes > > | Response | | | | | > > -------+-----------+------------------------------------------------- > > The newly-defined use of RDMAP queue #3 is quite surprising. What's the > purpose of the assignment? What semantics are being enforced? The > document is silent on them, and as noted in later comments, possibly > inconsistent. > [Authors] Good catch. QN=3 was added late and was not properly covered. Additional text has been added in the revised draft. > > + Section 5 third paragraph > > > Atomic operations as specified in this document execute a 64-bit > > operation at a specified destination address on a remote node. The > > It's much more than "specified", it's a _registered_ address, represented by > an STag and thereby under the control of the RFC5042 security model. It is > critical to state this, and make the reference. [Authors] Agreed. The term ULP Buffer address should complete the connection back to RFC 5042 indirectly through RFC 5040. > > Also, the address has other possible restrictions, with respect to alignment. > The protocol should state whether these are required by the protocol itself, > or if they are subject to local constraints imposed by the registration. If they > are protocol-imposed, then there will be implications for the "offset" fields > with regard to byte granularity, and how and where they will be checked. > Either way, discussion here is required. [Authors] We added a note that the errors covered by the wire protocol over the alignment issue and also revised other text regarding the 64-bit alignment topic. Clearly (as you are pointing out) the wire protocol cannot dictate if a given implementation can handle unaligned accesses atomically. We are documenting the alignment issue and allowing it to be reported properly. > > > operations atomically read, modify and write back the contents of > > Are read, modify and write the only actions possible in an atomic? It's not > clear how the proposed "swap" operations apply here. The R/M/W meaning > should be made clear. [Authors] We believe that this text is sufficient. We consider swap to be a modification. > > > the destination address and guarantee that Atomic Operations on this > > address by other RDMAP Streams on the same RNIC do not occur > between > > the read and the write. > > This seems to be significantly less of an atomicity guarantee than expected. > Normally, atomics protect all concurrent modifications to the target location, > including those from local access via the CPU at the responder, and other > RNICs. Is the scope intentionally narrowed to a single RNIC here? Whatever > the guarantee, some statement of RNIC requirements to achieve this is > needed, because the proposed protocol itself does not address it. [Authors] Yes, the scope was intentionally narrowed. The largest scope the authors believe that we can address in a wire protocol document is within a single RNIC. > > > document MAY be implemented. The discovery of whether the Atomic > > Operations are implemented or not is outside the scope of this > > specification and it should be handled by the ULPs or applications. > > This is the first mention in the document of this requirement, and it needs > quite a bit more discussion on exactly how an upper layer is to proceed. Why > is the "should" not normative here, and is it a MUST? See earlier comment. > [Authors] The discovery topic has been reworked and should clear up this issue. > > + Section 5.2 first paragraph > > > The Atomic Operation Request and Response are RDMA Messages. An > > Atomic Operation makes use of the DDP Untagged Buffer Model. Atomic > > Operations use the same Queue Number as RDMA Read Requests > (QN=1). > > Earlier, atomic *requests* are specified to use QN=1, but atomic > *responses* used QN=3. Which is correct? Note, section 5.2.2 bullet #3 also > makes this statement. > [Authors] Good catch. This has been corrected. > > + Section 5.2.1 Atomic operation values > > > ---------+-----------+----------+----------+---------+--------- > > 0011b | | > > to | Reserved | Not Specified > > 1111b | | > > ---------+-----------+----------------------------------------- > > ... and > > > 4. At the Responder, when an invalid Atomic Operation Request > > Message is delivered to the Remote Peer's RDMAP layer, an error > > is surfaced. > > Are the "Reserved" fields invalid? > [Authors] Yes, the reserved values are invalid. Clarifying text has been added. > > + Section 5.2.2 points #5 and #7 > > > 5. The RDMAP layer MUST Deliver the Atomic Operation Response > > Message to the ULP. > > > 7. The Responder RDMAP layer MUST pass Atomic Operation Response > > Messages to the DDP layer, in the order that the Atomic Operation > > Request Messages were received by the RDMAP layer, at the > > Responder. > > These steps appear out of order. I suggest switching them, so the sending of > the response precedes its processing by the receiver. > [Authors] Reordered. > > + Section 5.3. Atomicity Guarantees > > > Atomicity of the Read-Modify-Write (RMW) on the Responder's node by > > the Atomic Operation MUST be assured in the presence of concurrent > > atomic accesses by other RDMAP Streams on the same RNIC. > > Similar comment as earlier - are these the only types of concurrent access > guarantees? > [Authors] Yes, as described above. > > + Section 5.4 Atomic completion rules > > > 1. For an Atomic operation, the contents of the Tagged Buffer at the > > Responder MAY be indeterminate until the Atomic Operation > > Response Message has been Delivered at the Requester. > > I don't understand this statement. The protocol defines no means for the > Requester to acknowledge the Delivery of the Atomic Response, so how can > this in any way influence the behavior at the Responder? [Authors] Good comment. This has been reworked. > > > 4. Atomic Operation Response Message processing at the Responder > > MUST be started only after the Atomic Operation Request Message > > has been Delivered by the DDP layer (thus, all previous RDMA > > Messages have been properly submitted for ordered Placement). > > Again, I don't understand. The DDP protocol uses Queue Numbers to enforce > this ordering, and since the Atomic Request uses QN=1, the requirement for > "all previous RDMA Messages" is not sufficiently clear. Is this making a > requirement that previous RDMA Writes to the Atomic location be Placed > (there's no mention of this in section 7)? Or, is the Atomic required to > perform some sort of memory barrier against all other access? > [Authors] The requirement for starting an Atomic operation at the responder is exactly like RDMA Reads. There is a sort of barrier to starting the Atomic Operation with relationship to placing data that is in front of it destined for host memory. Additional text has been added. > > + Section 6. Immediate Data > > > The Immediate Data operation is used in conjunction with an RDMA > > Operation to improve ULP processing efficiency by allowing 8 bytes > > of immediate data to be delivered with the completion of the > > previous operation after the previous operation has been delivered > > at the Remote Peer. > > I definitely do not understand this. The text appears to state that any > immediate data is added to some *prior* completion. How can an operation, > which has already been delivered, be modified by a later one? > [Authors] Good catch. Text has been clarified. > > + Section 6.3 Immediate data handling > > > . The Remote Peer MUST check that Immediate Data and Immediate Data > > with SE Messages include exactly 8 bytes of data from the DDP > > layer. > > How does the receiver make this test? There is no length field in the > Immediate Data message, it's a fixed 8-byte buffer. > [Authors] DDP always delivers the length with the data. The receiver will just check the DDP length. > > + Section 7 Ordering table > > >----------+------------+-------------+-------------+------------------- > >Atomic | Atomic | No Placement| No Placement| Second Atomic > > | | Guarantee | Guarantee | Response > > | | between two | between two | Message will > > | | Atomic | Atomic | not be > > | | Requests | Responses | generated > > | | | | until first > > | | | | Atomic Response > > | | | | has been > > | | | | generated > >----------+------------+-------------+-------------+------------------- > > This appears to be incorrect. Earlier, there are specific requirements that the > RMW operation be performed prior to replying to a previous Atomic, and in > fact all Atomic requests are serialized on a single QN (1). I'm not sure what > application semantic would be satisfied by this rather weak guarantee as > stated. > [Authors] Good point. Guarantee has been strengthened to include not starting the next message until the previous response message has been generated. > > + Section 11.1 > > [RFC5042] (RDMA security model) is missing. Needed for discussion of > Atomics steering tag protection. > > [RFC5226] is not a normative reference. Move to 11.2, if desired, but more > likely, just delete. > [Authors] Addressed in revised draft. > > + Section 11.2 > > Would like to see references to Infiniband specifications here, as relevant for > Atomics and other behavior. > [Authors] Added. Thanks! > _______________________________________________ > storm mailing list > storm@ietf.org > https://www.ietf.org/mailman/listinfo/storm
- [storm] Review comments on draft-ietf-storm-rdmap… Tom Talpey
- Re: [storm] Review comments on draft-ietf-storm-r… Sharp, Robert O
- Re: [storm] Review comments on draft-ietf-storm-r… Black, David
- Re: [storm] Review comments on draft-ietf-storm-r… Sharp, Robert O