Re: [nfsv4] Fwd: New Version Notification for draft-dnoveck-nfsv4-rpcrdma-rtissues-01.txt

David Noveck <davenoveck@gmail.com> Wed, 21 September 2016 01:27 UTC

From: David Noveck <davenoveck@gmail.com>
Date: Tue, 20 Sep 2016 21:27:50 -0400
Message-ID: <CADaq8jcojcqZOqB-rnM888YxfOJ1h=fW4KeRqFGjyVBbwVEQ0w@mail.gmail.com>
To: karen deitke <karen.deitke@oracle.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/G-hKobARwUHu4myR20mQi6LX1S8>
Cc: "nfsv4@ietf.org" <nfsv4@ietf.org>
Subject: Re: [nfsv4] Fwd: New Version Notification for draft-dnoveck-nfsv4-rpcrdma-rtissues-01.txt

> I'm not seeing that this is how it works on Solaris unless I'm
> misunderstanding something.

Let's just go forward based on your presentation of what Solaris does.

I think that if you are misunderstanding something, it is how RDMA WRITE
works on the wire.  I think you are assuming it works the way that I
thought it worked when I wrote rtissues-00.

Based on some comments I received, I switched to a different approach in
rtissues-01, which I now believe to be correct.  I want to make sure my
understanding is correct.  Of course, rtissues-00 and rtissues-01 can both
be wrong, but it is impossible for them both to be correct.

>We can potentially issue multiple RDMA_WRITES for one NFS
> READ, but we do wait for the last RDMA_WRITE to complete,

The question is, if you get the completion indication, when do you get it?  I
(as of rtissues-01) believe that it means the send of the RDMA data has
finished; it doesn't mean the data has actually been stored at the
destination.  For that to happen, there would have to be an additional
internode round trip, which rtissues-01 says doesn't exist.  rtissues-00
supposed that it did exist, but my current understanding is that that is
wrong.
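
To be concrete about which event I mean: the "completion indication" here is
the work completion that the responder's software reaps from its completion
queue for the (signaled) RDMA WRITE.  A rough libibverbs sketch, not taken
from any actual implementation, assuming "cq" is the send CQ of an
established RC queue pair:

#include <infiniband/verbs.h>
#include <stdio.h>

/* Reap the work completion for a signaled RDMA WRITE.  This is purely a
 * local, responder-side event; the requester's software never sees any
 * notification for the WRITE itself.  What a successful status implies
 * about placement at the destination is exactly the rtissues-00 vs.
 * rtissues-01 question discussed above. */
static void reap_rdma_write_completion(struct ibv_cq *cq)
{
    struct ibv_wc wc;
    int n;

    do {
        n = ibv_poll_cq(cq, 1, &wc);    /* busy-poll for one completion */
    } while (n == 0);

    if (n < 0) {
        fprintf(stderr, "ibv_poll_cq failed\n");
        return;
    }
    if (wc.status != IBV_WC_SUCCESS)
        fprintf(stderr, "RDMA WRITE completed in error: %s\n",
                ibv_wc_status_str(wc.status));
}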

> then go on to RDMA SEND the reply.

OK.

> We need to wait for the write to confirm it completed successfully
> before sending the reply.

You don't need to, from a sequencing point of view.  Given ordered delivery,
the reply cannot get there first.
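
Seen from the requester's side, that ordering means that by the time its
receive completion for the reply SEND shows up, the data from the preceding
RDMA WRITE is already in the sink buffer.  A sketch of that view (again
libibverbs, hypothetical names, assuming a receive buffer for the reply was
posted in advance and "recv_cq" is the receive CQ of the RC queue pair):

#include <infiniband/verbs.h>

/* Wait for the responder's reply SEND.  Returns 0 on success. */
static int wait_for_reply(struct ibv_cq *recv_cq)
{
    struct ibv_wc wc;
    int n;

    do {
        n = ibv_poll_cq(recv_cq, 1, &wc);
    } while (n == 0);

    if (n < 0 || wc.status != IBV_WC_SUCCESS)
        return -1;

    /* Because the responder issued its RDMA WRITE before the SEND on the
     * same reliable connection, the written payload is placed in the
     * requester's sink buffer before this receive completion is delivered.
     * The requester can use the READ data right away; it never waits on,
     * or even sees, the WRITE as a separate event. */
    return 0;
}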

> If the write fails for some reason we can't send success in the
> reply.

My understanding is that if the RDMA_WRITE fails, the send will not succeed,
and that you could queue the request to do the RDMA_WRITE and the send at
the same time.  To me, that seems counterintuitive, but I think that is the
way RC works.
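
In other words, something like the following, which is a hypothetical
libibverbs sketch (not Solaris code).  It assumes an RC queue pair "qp"
created with sq_sig_all = 0, already-registered buffers behind the two sge
entries, and the sink address and rkey that the requester supplied:

#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

/* Post an RDMA WRITE of the READ payload and the reply SEND in one call,
 * with only the SEND signaled. */
static int post_write_then_send(struct ibv_qp *qp,
                                struct ibv_sge *payload_sge, /* READ data */
                                uint64_t sink_addr, uint32_t sink_rkey,
                                struct ibv_sge *reply_sge)   /* RPC reply */
{
    struct ibv_send_wr write_wr, send_wr, *bad_wr = NULL;

    memset(&write_wr, 0, sizeof(write_wr));
    write_wr.wr_id               = 1;
    write_wr.opcode              = IBV_WR_RDMA_WRITE;
    write_wr.sg_list             = payload_sge;
    write_wr.num_sge             = 1;
    write_wr.wr.rdma.remote_addr = sink_addr;
    write_wr.wr.rdma.rkey        = sink_rkey;
    write_wr.next                = &send_wr;  /* chain the SEND behind it */

    memset(&send_wr, 0, sizeof(send_wr));
    send_wr.wr_id      = 2;
    send_wr.opcode     = IBV_WR_SEND;
    send_wr.sg_list    = reply_sge;
    send_wr.num_sge    = 1;
    send_wr.send_flags = IBV_SEND_SIGNALED;   /* completion only for the SEND */

    /* On a reliable connection the two requests execute in order, and an
     * error on the WRITE puts the QP into the error state, so the SEND
     * cannot succeed after a failed WRITE.  Checking the single SEND
     * completion therefore covers both operations; there is no need to
     * wait for the WRITE before posting the SEND. */
    return ibv_post_send(qp, &write_wr, &bad_wr);
}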

On Tue, Sep 20, 2016 at 7:27 PM, karen deitke <karen.deitke@oracle.com>
wrote:

>
>
> On 9/17/2016 4:15 AM, David Noveck wrote:
>
> > where with the RDMA_WRITE we do not need to wait? (i.e.
> > sections 2.2 and 2.3)
>
> Actually, that is why there is not an internode round trip which
> contributes to latency.
>
> I do distinguish in this document between round-trips which do and do not
> contribute to latency.  For example, I mention certain acks which are part
> of round trips that nobody has to wait for.
>
> I believe that there is no ack for an RDMA write, but I've never actually
> looked on the wire to see that.  I imagine that there might be certain rare
> cases in which some sort of separate ack is sent.  For example, if you
> did multiple RDMA writes in sequence without a SEND, or there were a long
> delay without the response being sent, a separate ACK of the RDMA WRITE
> might be sent.
>
> However, in the common case, in which a SEND immediately follows the RDMA
> WRITE, there is no need for a separate ACK, and I believe the ACK of the
> SEND suffices to assure the responder's RNIC that both
> previous operations have completed successfully.
>
> I'm not seeing that this is how it works on Solaris unless I'm
> misunderstanding something.  We can potentially issue multiple RDMA_WRITES
> for one NFS READ, but we do wait for the last RDMA_WRITE to complete, then
> go on to RDMA SEND the reply.  We need to wait for the write to confirm it
> completed successfully before sending the reply.  If the write fails for
> some reason we can't send success in the reply.
> Karen
>
>
> On Thu, Sep 15, 2016 at 2:58 PM, karen deitke <karen.deitke@oracle.com>
> wrote:
>
>> Thanks,
>>
>> That clears things up.  Next question, are you saying that an RDMA_WRITE
>> is NOT an internode round trip and an RDMA_READ is an internode round trip,
>> because we have to wait for the data from the RDMA_READ before proceeding,
>> where with the RDMA_WRITE we do not need to wait? (i.e. sections 2.2 and
>> 2.3)
>>
>> Karen
>>
>>
>> On 9/12/2016 2:24 PM, David Noveck wrote:
>>
>> > I'm confused by this summarization.
>>
>> :-(.  Let's see what I can do to make this clearer.
>>
>> > In the text above you indicate 3 different places where "internode
>> > round trip" is involved, yet in the summary you only mention 2.
>>
>> The point I was trying to make was that, although there were three
>> round-trips, only two contribute to the request latency.  In some
>> cases, there is a round trip because an ack is sent, but it does not add
>> to the latency because neither the client nor the server is waiting for it.
>>
>> > What is the definition of an "internode round trip?"
>>
>> Any situation in which a message is sent in one direction and, after
>> that, another message is sent in the opposite direction.
>>
>> > Also it's unclear to me what you mean by "in the context of a connected
>> > operation".
>>
>> Maybe I should have said, "because this is reliable connected operation, in
>> which messages are acked."
>>
>> > Also you mention that there are two-responder-side interrupt latencies,
>> are you referring to the notification of the RDMA_READ
>> > and the send completion queue for sending the response?
>>
>> I'm referring to the notification that the request has been received
>> and the notification that the RDMA_READ has completed.
>>
>> > Does this interrupt latency come into play in the latency of the
>> > operation?
>>
>> I think the two I mentioned do.
>>
>> > Once the client side gets the response it can continue, even if the
>> server thread is still waiting for notification of a successful send
>> correct?
>>
>> Yes.
>>
>> > Also are you missing the interrupt latency of the send on the client?
>> > In addition to the interrupt latency of receiving the reply?
>>
>> I don't think that contributes to latency.  The request processing can
>> continue once the request is received on the server, even if the client
>> has not received notification of the completion of the send.
>>
>> On Mon, Sep 12, 2016 at 3:52 PM, karen deitke <karen.deitke@oracle.com>
>> wrote:
>>
>>> Hi Dave,
>>>
>>> I'm struggling following this below:
>>>
>>>    o  First, the memory to be accessed remotely is registered.  This is
>>>       a local operation.
>>>
>>>    o  Once the registration has been done, the initial send of the
>>>       request can proceed.  Since this is in the context of connected
>>>       operation, there is an internode round trip involved.  However,
>>>       the next step can proceed after the initial transmission is
>>>       received by the responder.  As a result, only the responder-bound
>>>       side of the transmission contributes to overall operation latency.
>>>
>>>    o  The responder, after being notified of the receipt of the request,
>>>       uses RDMA READ to fetch the bulk data.  This involves an internode
>>>       round-trip latency.  After the fetch of the data, the responder
>>>       needs to be notified of the completion of the explicit RDMA
>>>       operation
>>>
>>>    o  The responder (after performing the requested operation) sends the
>>>       response.  Again, as this is in the context of connected
>>>       operation, there is an internode round trip involved.  However,
>>>       the next step can proceed after the initial transmission is
>>>       received by the requester.
>>>
>>>    o  The memory registered before the request was issued needs to be
>>>       deregistered, before the request is considered complete and the
>>>       sending process restarted.  When remote invalidation is not
>>>       available, the requester, after being notified of the receipt of
>>>       the response, performs a local operation to deregister the memory
>>>       in question.  Alternatively, the responder will use Send With
>>>       Invalidate and the responder's RNIC will effect the deregistration
>>>       before notifying the requester of the response which has been
>>>       received.
>>>
>>>    To summarize, if we exclude the actual server execution of the
>>>    request, the latency consists of two internode round-trip latencies
>>>    plus two-responder-side interrupt latencies plus one requester-side
>>>    interrupt latency plus any necessary registration/de-registration
>>>    overhead.  This is in contrast to a request not using explicit RDMA
>>>    operations in which there is a single inter-node round-trip latency
>>>    and one interrupt latency on the requester and the responder.
>>>
>>> I'm confused by this summarization.  In the text above you indicate 3
>>> different places where "internode round trip" is involved, yet in the
>>> summary you only mention 2.  What is the definition of an "internode round
>>> trip?"  Also it's unclear to me what you mean by "in the context of a
>>> connected operation".
>>>
>>> Also you mention that there are two-responder-side interrupt latencies,
>>> are you referring to the notification of the RDMA_READ and the send
>>> completion queue for sending the response?  Does this interrupt latency
>>> come into play in the latency of the operation? Once the client side gets
>>> the response it can continue, even if the server thread is still waiting
>>> for notification of a successful send correct?
>>>
>>> Also are you missing the interrupt latency of the send on the client? In
>>> addition to the interrupt latency of receiving the reply?
>>>
>>> Karen
>>>
>>>
>>>
>>
>>
>>
>
>