Re: [nfsv4] Fwd: New Version Notification for draft-dnoveck-nfsv4-rpcrdma-rtissues-01.txt

karen deitke <karen.deitke@oracle.com> Tue, 20 September 2016 23:27 UTC

To: David Noveck <davenoveck@gmail.com>
References: <147292013637.2343.7092433187165824743.idtracker@ietfa.amsl.com> <CADaq8jeBaLLKkoSVy8kaBA9k4_6a7PLtEDMyx4zjhDX6U6q6Ow@mail.gmail.com> <234e3071-2b0e-e5a1-f5d5-91919e9388b1@oracle.com> <CADaq8jeP=FJKZAh4GEsogccuCKsoH5=-h7=ymKO1FkRqc=944Q@mail.gmail.com> <763c7255-9cf3-eece-f7e7-8454a23126a5@oracle.com> <CADaq8jf7DHRptJKMVGacH03-uwGBuyg5pxaGs5V6kHe7oZyYGA@mail.gmail.com>
From: karen deitke <karen.deitke@oracle.com>
Organization: Oracle Corporation
Message-ID: <d40729a0-55b4-13d9-7176-7e3ac85b36e8@oracle.com>
Date: Tue, 20 Sep 2016 17:27:03 -0600
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0
MIME-Version: 1.0
In-Reply-To: <CADaq8jf7DHRptJKMVGacH03-uwGBuyg5pxaGs5V6kHe7oZyYGA@mail.gmail.com>
Content-Type: multipart/alternative; boundary="------------606F2DA40E93D174DF31896E"
X-Source-IP: userv0021.oracle.com [156.151.31.71]
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/8X4FsxkVVtWzJNLM5bvEMytvN9A>
Cc: "nfsv4@ietf.org" <nfsv4@ietf.org>
Subject: Re: [nfsv4] Fwd: New Version Notification for draft-dnoveck-nfsv4-rpcrdma-rtissues-01.txt


On 9/17/2016 4:15 AM, David Noveck wrote:
> > where with the RDMA_WRITE we do not need to wait? (i.e.
> > sections 2.2 and 2.3)
>
> Actually, that is why there is not an internode round trip which 
> contributes to latency.
>
> I do distinguish in this document between round trips which do and do 
> not contribute to latency.  For example, I mention certain acks which 
> are part of round trips that nobody has to wait for.
>
> I believe that there is no ack for an RDMA write but I've never 
> actually looked on the wire to see that.  I imagine that there might 
> be certain rare cases in which some sort of separate ack were sent.  
> For example, if you did multiple RDMA writes in sequence without a SEND 
> or there were a long delay without the response being sent, 
> a separate ACK of the RDMA WRITE might be sent.
>
> However, in the common case, in which a SEND immediately follows the 
> RDMA WRITE, there is no need for a separate ACK and I believe the ACK 
> of the SEND suffices to assure the responder RNIC that both 
> previous operations have completed successfully.
I'm not seeing that this is how it works on Solaris, unless I'm 
misunderstanding something.  We can potentially issue multiple 
RDMA_WRITEs for one NFS READ, but we do wait for the last RDMA_WRITE to 
complete, then go on to RDMA SEND the reply.  We need to wait for the 
write to confirm it completed successfully before sending the reply.  If 
the write fails for some reason, we can't send success in the reply.
Karen
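
The ordering described above can be sketched as a small simulation. This
is only an illustration, not Solaris code: FakeConnection, its methods,
and the "success"/"failure" status strings are hypothetical stand-ins
and do not correspond to any real verbs API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeConnection:
    """Hypothetical stand-in for an RDMA connection; records what was posted."""
    fail_on_write: Optional[int] = None   # index of a write that should fail
    log: List[str] = field(default_factory=list)
    all_writes_ok: bool = True

    def post_rdma_write(self, index: int, chunk: bytes) -> None:
        # Posting is asynchronous in this model: the outcome is only
        # visible once the completion has been waited for.
        self.log.append(f"RDMA_WRITE[{index}]")
        if index == self.fail_on_write:
            self.all_writes_ok = False

    def wait_for_last_write(self) -> bool:
        # Models blocking on the completion of the last posted write.
        self.log.append("WAIT")
        return self.all_writes_ok

    def send_reply(self, status: str) -> None:
        self.log.append(f"SEND {status}")


def reply_to_nfs_read(conn: FakeConnection, chunks: List[bytes]) -> str:
    """Post every RDMA write, wait for the last one, then send the reply."""
    for index, chunk in enumerate(chunks):
        conn.post_rdma_write(index, chunk)
    # The reply status is not known until the writes have completed.
    status = "success" if conn.wait_for_last_write() else "failure"
    conn.send_reply(status)
    return status
```

In this model the SEND is never posted before the wait returns, so a
failed write can only ever produce a "failure" reply.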
>
> On Thu, Sep 15, 2016 at 2:58 PM, karen deitke <karen.deitke@oracle.com>
> wrote:
>
>     Thanks,
>
>
>     That clears things up.  Next question, are you saying that an
>     RDMA_WRITE is NOT an internode round trip and an RDMA_READ is an
>     internode round trip, because we have to wait for the data from
>     the RDMA_READ before proceeding, where with the RDMA_WRITE we do
>     not need to wait? (i.e. sections 2.2 and 2.3)
>
>     Karen
>
>
>     On 9/12/2016 2:24 PM, David Noveck wrote:
>>     > I'm confused by this summarization.
>>
>>     :-(.  Let's see what I can do to make this clearer.
>>
>>     > In the text above you indicate 3 different places where "internode
>>     round trip" is involved, yet in the summary you only mention 2.
>>
>>     The point I was trying to make was that, although there were
>>     three round trips, only two contribute to the request latency.
>>     In some cases, there is a round trip because an ack is sent, but
>>     it does not add to latency because neither the client nor the
>>     server is waiting for it.
>>
>>     > What is the definition of an "internode round trip?"
>>
>>     Any situation in which a message is sent in one direction and,
>>     after that, another message is sent in the opposite direction.
>>
>>     > Also it's unclear to me what you mean by "in the context of a connected
>>     operation".
>>
>>     Maybe I should have said, "Because this is a reliable connected
>>     operation in which messages are acked."
>>
>>     > Also you mention that there are two-responder-side interrupt latencies,
>>     are you referring to the notification of the RDMA_READ
>>     > and the send completion queue for sending the response?
>>
>>     I'm referring to the notification that the request has been
>>     received and the notification that the RDMA_READ has completed.
>>
>>     > Does this interrupt latency come into play in the latency of the
>>     operation?
>>
>>     I think the two I mentioned do.
>>
>>     > Once the client side gets the response it can continue, even if the
>>     server thread is still waiting for notification of a successful
>>     send, correct?
>>
>>     Yes.
>>
>>     > Also are you missing the interrupt latency of the send on the client? In
>>     addition to the interrupt latency of receiving the reply?
>>
>>     I don't think that contributes to latency.  The request
>>     processing can continue once the request is received on the
>>     server, even if the client
>>     has not received notification of the completion of the send.
>>
>>     On Mon, Sep 12, 2016 at 3:52 PM, karen deitke
>>     <karen.deitke@oracle.com> wrote:
>>
>>         Hi Dave,
>>
>>         I'm struggling to follow the text below:
>>
>>            o  First, the memory to be accessed remotely is registered.
>>               This is a local operation.
>>
>>            o  Once the registration has been done, the initial send of the
>>               request can proceed.  Since this is in the context of
>>               connected operation, there is an internode round trip
>>               involved.  However, the next step can proceed after the
>>               initial transmission is received by the responder.  As a
>>               result, only the responder-bound side of the transmission
>>               contributes to overall operation latency.
>>
>>            o  The responder, after being notified of the receipt of the
>>               request, uses RDMA READ to fetch the bulk data.  This
>>               involves an internode round-trip latency.  After the fetch
>>               of the data, the responder needs to be notified of the
>>               completion of the explicit RDMA operation.
>>
>>            o  The responder (after performing the requested operation)
>>               sends the response.  Again, as this is in the context of
>>               connected operation, there is an internode round trip
>>               involved.  However, the next step can proceed after the
>>               initial transmission is received by the requester.
>>
>>            o  The memory registered before the request was issued needs to
>>               be deregistered, before the request is considered complete
>>               and the sending process restarted.  When remote
>>               invalidation is not available, the requester, after being
>>               notified of the receipt of the response, performs a local
>>               operation to deregister the memory in question.
>>               Alternatively, the responder will use Send With Invalidate
>>               and the responder's RNIC will effect the deregistration
>>               before notifying the requester of the response which has
>>               been received.
>>
>>            To summarize, if we exclude the actual server execution of the
>>            request, the latency consists of two internode round-trip
>>            latencies plus two-responder-side interrupt latencies plus one
>>            requester-side interrupt latency plus any necessary
>>            registration/de-registration overhead.  This is in contrast to
>>            a request not using explicit RDMA operations in which there is
>>            a single inter-node round-trip latency and one interrupt
>>            latency on the requester and the responder.
>>
>>         I'm confused by this summarization.  In the text above you
>>         indicate 3 different places where "internode round trip" is
>>         involved, yet in the summary you only mention 2.  What is the
>>         definition of an "internode round trip?" Also it's unclear to
>>         me what you mean by "in the context of a connected operation".
>>
>>         Also you mention that there are two-responder-side interrupt
>>         latencies, are you referring to the notification of the
>>         RDMA_READ and the send completion queue for sending the
>>         response?  Does this interrupt latency come into play in the
>>         latency of the operation? Once the client side gets the
>>         response it can continue, even if the server thread is still
>>         waiting for notification of a successful send, correct?
>>
>>         Also are you missing the interrupt latency of the send on the
>>         client? In addition to the interrupt latency of receiving the
>>         reply?
>>
>>         Karen
>>
>>
>>
>>
>
>
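
The latency accounting in the draft summary quoted above can be written
out as a small model. This is a sketch only: the Costs class, the
function names, and the sample numbers are made up for illustration and
are not measurements or part of the draft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Costs:
    """Hypothetical per-component costs, in arbitrary time units."""
    round_trip: int            # one internode round-trip latency
    responder_interrupt: int   # one interrupt latency on the responder
    requester_interrupt: int   # one interrupt latency on the requester
    registration: int          # registration plus deregistration overhead


def latency_with_rdma_read(c: Costs) -> int:
    # Per the quoted summary: two round trips, two responder-side
    # interrupts, one requester-side interrupt, plus registration
    # overhead.  Server execution time is excluded.
    return (2 * c.round_trip
            + 2 * c.responder_interrupt
            + c.requester_interrupt
            + c.registration)


def latency_without_explicit_rdma(c: Costs) -> int:
    # Per the quoted summary: a single round trip and one interrupt
    # latency on each of the requester and the responder.
    return c.round_trip + c.responder_interrupt + c.requester_interrupt


# Made-up sample values, chosen only to exercise the arithmetic.
sample = Costs(round_trip=10, responder_interrupt=2,
               requester_interrupt=2, registration=3)
print(latency_with_rdma_read(sample))         # 2*10 + 2*2 + 2 + 3 = 29
print(latency_without_explicit_rdma(sample))  # 10 + 2 + 2 = 14
```

Round trips that nobody waits for, such as the ack of the final SEND,
are deliberately left out of both functions, matching the distinction
drawn in the thread.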