[nfsv4] Re: Draft is updated following your comments, Thanks // Re: draft-mzhang-nfsv4-sequence-id-calibration-01

"yangjing (U)" <yangjing8@huawei.com> Fri, 28 April 2023 01:20 UTC

From: "yangjing (U)" <yangjing8@huawei.com>
To: Rick Macklem <rick.macklem@gmail.com>
CC: "nfsv4@ietf.org" <nfsv4@ietf.org>
Date: Fri, 28 Apr 2023 01:20:46 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/uk4SQU8c3oXf-R_b5EU7JkEabqI>
Subject: [nfsv4] Re: Draft is updated following your comments, Thanks // Re: draft-mzhang-nfsv4-sequence-id-calibration-01

Hi Rick,

I think we are talking about two issues:

1. When a client gets an NFS4ERR_SEQ_MISORDERED response, should the client resend the request?
I agree with you that a misordered sequenceid implies that the client is trying to break the "exactly once" semantics, but is being denied by the server with NFS4ERR_SEQ_MISORDERED.
Next, the client MUST NOT resend this request with the original session/slot and a new sequenceid, with the original session and a new slot, or with a new session. If the client does so, it will break EOS.
Your worry is that, if the misordered slot is recovered, the request (originally denied) will be resent to the server (with the original session/slot and a new sequenceid), right?
In fact, the client and the user application can do this even if the slot is marked as broken; for example, they can resend the request with the original session and a new slot.

2. When a client gets an NFS4ERR_SEQ_MISORDERED response, how should it deal with the session/slot? You prefer one of the following two methods; I have a different opinion.
Method 1: Do not destroy the session, but mark the slot as broken. Even setting aside the impact on concurrency, we have to maintain per-slot status.
If so, the current data structures have to be changed. Is this acceptable?
Method 2: Do not destroy the session; obtain the correct sequenceid for this slot by repeated attempts with SEQUENCE-only requests. In this case the number of interactions between client and server is unknown.
One request and one reply is only the best case. With the extension operations for obtaining the correct sequenceid, however, one request and one reply is the only case.
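
To make the data-structure cost of Method 1 concrete, here is a rough C sketch of a client-side slot table with a per-slot "broken" flag; the structure and field names are hypothetical and not taken from any existing client implementation.

    /*
     * Rough sketch of Method 1 (mark the slot broken).  All names are
     * illustrative; a real client's slot table looks different.
     */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    struct client_slot {
        uint32_t seqid;     /* next sa_sequenceid to send on this slot */
        bool     in_use;    /* an RPC is currently outstanding here */
        bool     broken;    /* got NFS4ERR_SEQ_MISORDERED; stop using it */
    };

    struct client_session {
        struct client_slot *slots;
        size_t              nslots;
        size_t              nbroken;
    };

    /* Called when NFS4ERR_SEQ_MISORDERED is seen for this slot. */
    static void slot_mark_broken(struct client_session *sp, size_t idx)
    {
        if (!sp->slots[idx].broken) {
            sp->slots[idx].broken = true;
            sp->nbroken++;
        }
    }

    /* Pick a usable slot; returns nslots if none is available, which is
     * how concurrency shrinks as broken slots accumulate. */
    static size_t slot_pick(const struct client_session *sp)
    {
        for (size_t i = 0; i < sp->nslots; i++)
            if (!sp->slots[i].in_use && !sp->slots[i].broken)
                return i;
        return sp->nslots;
    }

The sketch only shows that every slot now carries extra state that the I/O path, and any future mechanism for notifying the server about slot status, would have to maintain.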

> -----Original Message-----
> From: Rick Macklem [mailto:rick.macklem@gmail.com]
> Sent: April 21, 2023 10:28 PM
> To: yangjing (U) <yangjing8@huawei.com>
> Cc: nfsv4@ietf.org
> Subject: Re: [nfsv4] Draft is updated following your comments, Thanks // Re:
> draft-mzhang-nfsv4-sequence-id-calibration-01
> 
> On Thu, Apr 20, 2023 at 8:08 PM yangjing (U) <yangjing8@huawei.com>
> wrote:
> >
> > Hi Rick, Thanks for your comments and sorry for the late response ^_^
> >
> >
> > On Mon, Apr 10, 2023 at 2:20 PM Rick Macklem <rick.macklem@gmail.com>
> wrote:
> > >
> > > On Mon, Apr 10, 2023 at 1:58 PM Rick Macklem
> <rick.macklem@gmail.com> wrote:
> > > >
> > > > On Mon, Apr 10, 2023 at 1:15 PM Rick Macklem
> <rick.macklem@gmail.com> wrote:
> > > > >
> > > > > On Sun, Apr 9, 2023 at 11:39 PM yangjing (U)
> > > > > <yangjing8=40huawei.com@dmarc.ietf.org> wrote:
> > > > > >
> > > > > > The draft is updated as
> > > > > > https://www.ietf.org/archive/id/draft-mzhang-nfsv4-sequence-id
> > > > > > -c
> > > > > > alibration-02.txt
> > > > > Here are a few generic comments about the draft (I know nothing
> > > > > about detailed formatting, etc).
> > > > >
> > > > > 1 - It is my understanding that new operations can only be added
> > > > > to minor version 2 and not
> > > > >      minor version 1.
> >
> > I agree with you, if new operations can only be added to minor version 2.
> >
> > > > > 2 - I largely agree with what Tom Talpey said in his comments. I
> > > > > think you need to figure out
> > > > >      how/when the sequenceid is getting messed up.
> > > > >      Note that it is my understanding that the main purpose of
> > > > > sessions is to achieve
> > > > >      "exactly once RPC semantics" (the fact that the reply cache
> > > > > is bounded in size is a nice
> > > > >      added feature, but not the principal purpose).
> > > > >      As such, a sequenceid being misordered implies that
> > > > > "exactly once" semantics could
> > > > >      already have been broken by a software bug in the client.
> > > > >
> > > > >      You note that the misordered sequenceid could be caused by
> > > > > "network partitioning".
> > > > >      In a correct implementation that should not be the case.
> > > > > After a network partition heals,
> > > > >      the client should re-send all outstanding RPCs with the
> > > > > same Session/slot/sequenceid
> > > > >      as was used in the original RPC request message. These
> > > > > should succeed (no misordered
> > > > >      error reply) using cached replies, as required.
> > > > >      --> If a client's recovery from network partitioning is not
> > > > > doing this, then it needs to be
> > > > >            fixed.
> > > > >      --> As above, if a misordered sequenceid error reply is due
> > > > > to a client bug, then that
> > > > >            must be fixed, since "exactly once" semantics may already be
> broken.
> > > > > The common client bug that results in a misordered sequenceid
> > > > > error reply happens when an in-progress RPC gets interrupted
> > > > > before the RPC reply is processed.
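
As a rough illustration of the recovery rule described above (resend with the original session/slot/sequenceid so the server can answer duplicates from its reply cache), the retransmit record might look like this; the structure and helper names are hypothetical:

    /*
     * Rough sketch: after a network partition heals, each outstanding
     * request is retransmitted with the session/slot/sequenceid it was
     * originally sent with.  Names are illustrative only.
     */
    #include <stddef.h>
    #include <stdint.h>

    struct pending_rpc {
        uint8_t  sessionid[16];   /* NFS4_SESSIONID_SIZE */
        uint32_t slotid;          /* sa_slotid of the first transmission */
        uint32_t seqid;           /* sa_sequenceid of the first transmission */
        /* ... encoded compound, credentials, etc. ... */
    };

    /* Hypothetical transport hook; a real client would re-queue the
     * already-encoded compound here. */
    static void transport_send(const struct pending_rpc *rpc)
    {
        (void)rpc;
    }

    static void resend_after_partition(struct pending_rpc *pending, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            /* Do NOT bump seqid or switch slot/session: the server either
             * executes the request once or replies from its cache, so
             * exactly-once semantics are preserved. */
            transport_send(&pending[i]);
        }
    }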
> >
> > You are right. Thanks for your analysis of when a misordered sequenceid
> > happens. But this draft does not care how and when it happens, the same as
> > the protocol specification.
> > We focus on how to deal with this error. If this error never
> > happened, we would not need to define it, right? We did find that a misordered
> > sequenceid happens and I/O performance is affected because the session is
> > re-established, but we are not sure why and how it happens, because no
> > packets were being captured at that moment.
> >
> > > > > This is not a case
> > > > > where your suggested extension should be used, since the
> > > > > in-progress RPC may still be "in-progress". Although I am not
> > > > > sure there is a 100% correct way for a client to handle the need
> > > > > to interrupt an in-progress RPC, marking the session slot as
> > > > > broken and no longer using it is the best approach I am aware
> > > > > of. (I suspect you might be referring to the Linux client and I
> > > > > cannot comment on how it handles interruption via signals or
> > > > > timeouts for session slots, since I am not a Linux client
> > > > > implementor.)
> >
> > Yes, when we found I/O performance was affected, we checked the Linux client code,
> > found that the client will drop the session, and then checked the NFS protocol
> > specification.
> > I don't think marking the session slot as broken is the best way. What comes next?
> > First: leaving the slot broken. On one hand, with more and more slots
> > broken, the concurrency of this session will decline. On the other hand, is a new
> > mechanism needed to maintain and notify slot status between client and
> > server?
> > Second: recovering the broken slot. That is what this draft is trying to do, I
> > think.
> But you cannot resynchronize the broken slot safely unless the RPC that
> caused the sequenceid to become non-synchronized is an idempotent one
> (one that can be done more than once on the server safely).
> This is difficult (maybe impossible) to determine when a reply to a subsequent
> RPC using the same session/slot is NFS4ERR_SEQ_MISORDERED.
> 
> To do so for all cases where a RPC reply of NFS4ERR_SEQ_MISORDERED is
> received could result in a non-idempotent RPC being executed multiple times
> on the server. This is a serious breakage and essentially defeats the main
> purpose for using sessions.
> 
> You are correct that, once a slot is marked bad, performance degradation is
> possible. However, NFS4ERR_SEQ_MISORDERED failures should be rare (some
> might say that this error should never occur for correctly implemented clients)
> and, as such, having several bad slots on a session should also be a rare
> occurrence.
> --> After some percentage of the slots are marked bad, the client will
>      need to do a CreateSession and replace the session that has the bad slots
>      with a new one.
>      What percentage is a design tradeoff. It sounds like Linux chose
>     "> 0%" (or the first bad slot, if you prefer). For FreeBSD, I chose "100%"
>     (or all slots bad, if you prefer).
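
That tradeoff reduces to a single predicate; the sketch below is illustrative only and does not reflect the actual Linux or FreeBSD code.

    /* Rough sketch of the "when to replace the session" threshold.
     * A Linux-like policy corresponds to a very small bad_percent
     * (first bad slot), a FreeBSD-like policy to bad_percent == 100. */
    #include <stdbool.h>
    #include <stddef.h>

    struct slot_stats {
        size_t nslots;    /* slots in the session's fore channel */
        size_t nbroken;   /* slots marked bad after NFS4ERR_SEQ_MISORDERED */
    };

    /* True once the fraction of bad slots reaches bad_percent (1..100),
     * i.e. it is time to CreateSession and retire this session. */
    static bool need_new_session(const struct slot_stats *st, unsigned bad_percent)
    {
        if (st->nslots == 0)
            return false;
        return st->nbroken * 100 >= (size_t)bad_percent * st->nslots;
    }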
> >
> > > > >
> > > > > It sounds like migration might be an exception to the above, but
> > > > > it should be addressed specifically (as Tom Talpey notes).
> >
> > For migration scenario, we will analyze it later
> >
> > > > >
> > > > > 3 - If your extension were used by a client, how would the
> > > > > client know whether or not
> > > > >      there are still RPC(s) in-progress that have been assigned
> > > > > that session/slot/sequenceid?
> > > > An additional comment:
> > > > - If the client knows that the outstanding RPC request on the
> > > >   session/slot/sequenceid that got a misordered error reply
> > > >   is one where "exactly once" semantics is not required,
> > > >   then I can see that the client implementor might be able to
> > > >    safely resynchronize the sequenceid.
> > >Oops again. It would be the outstanding RPC request for which no reply has
> been received that causes the sequenceid to be misordered and that is the
> RPC that cannot require "exactly once" semantics.
> >
> > >rick
> > > >   To do this, I do not think your extensions are needed.
> > > >    If the client issues an RPC that consists of only the
> > > >    Sequence operation repeatedly, with the sequenceid incremented
> > > >    by one each attempt after waiting for a reply to the previous
> > > >    attempt, it should get NFS_OK within a few attempts.
> >
> > The misordered sequenceid is not known. If the cached sequenceid is N, any
> > value other than N and N+1 will be identified as misordered. The range of the
> > sequenceid is (1, 2^32 - 1).
> > So, I don't think attempting repeatedly is a better choice.
> Well, here is the basic algorithm the client would use to maintain a slot's
> sequenceid. (Note there should never be more than one RPC on the fly for a
> given session/slot.)
> - When an RPC reply is received from the server with NFS_OK for the
>   Sequence operation, advance the sequenceid for the session/slot by 1.
> As such, I do not see how a sequenceid can ever be out by more than 1 unless
> there is a coding bug. For the case of a coding bug, fix the bug.
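
That bookkeeping reduces to something like the following rough C sketch (all names are illustrative, and only one RPC is assumed to be outstanding per slot):

    /* Rough sketch: the slot's sequenceid advances by one only when the
     * SEQUENCE operation in the reply reports NFS_OK, so it can never
     * drift by more than one without a coding bug.  Illustrative only. */
    #include <stdbool.h>
    #include <stdint.h>

    #define NFS_OK                  0
    #define NFS4ERR_SEQ_MISORDERED  10063   /* RFC 5661 error code */

    struct slot {
        uint32_t seqid;   /* sa_sequenceid to use for the next request */
        bool     busy;    /* true while an RPC is outstanding */
    };

    /* Returns false if the slot is now considered out of sync. */
    static bool slot_process_reply(struct slot *sl, int sequence_status)
    {
        sl->busy = false;
        if (sequence_status == NFS_OK) {
            sl->seqid++;              /* the only place seqid ever moves */
            return true;
        }
        if (sequence_status == NFS4ERR_SEQ_MISORDERED)
            return false;             /* caller decides: mark bad, probe, ... */
        return true;                  /* other errors do not advance seqid */
    }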
> 
> For example:
> - Let's assume that the client is configured (not a good situation, but
>   possibly unavoidable) so that a timeout or POSIX signal results in
>   the client not waiting "forever" for an RPC reply.
> - Let's also assume that the client becomes network partitioned from the server.
> 
> When this is the case, it is possible that a few (even many) RPCs will be sent to
> the server with the same session/slot/sequenceid, one after another, each
> timing out before an RPC reply is received. But note that they all will have the
> same sequenceid.
> --> When this happens, it is possible that one of these RPC requests
>      will make it to the server, be processed, and advance the sequenceid
>      by one.
>      --> All subsequent RPCs with the same session/slot/sequenceid
>          processed at the server will get a NFS4ERR_SEQ_MISORDERED reply.
> 
> Once the network partition is healed, the client will see a
> NFS4ERR_SEQ_MISORDERED reply (plus possibly other delayed replies for the
> same session/slot/sequenceid), but the sequenceid will only be out by 1
> compared with what the server's sequenceid is for the same session/slot.
> --> This implies that the first attempt of a RPC that consists of only
>      the Sequence operation with sa_sequenceid set to N + 1
>      (where N is what the client is currently using for sequenceid for
>       the session/slot) will succeed.
>      --> Whether a subsequent attempt with sequenceid set to N + 2
>            is justified is debatable, but certainly doing more than that does
>            not seem reasonable to me.
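
A rough sketch of that bounded probe, assuming a hypothetical send_sequence_only() helper that transmits a compound containing only SEQUENCE on the given slot and returns the SEQUENCE status:

    /* Rough sketch of the bounded re-synchronization probe: try a
     * SEQUENCE-only compound at N + 1, optionally N + 2, and give up
     * (fall back to CreateSession) otherwise.  send_sequence_only() is
     * hypothetical, not a function in any real client. */
    #include <stdbool.h>
    #include <stdint.h>

    #define NFS_OK 0

    /* Hypothetical: send SEQUENCE-only on (slotid, seqid) and return the
     * SEQUENCE op status from the reply. */
    int send_sequence_only(uint32_t slotid, uint32_t seqid);

    /* *seqidp is N, the sequenceid the client is currently using for the
     * slot.  Returns true and updates *seqidp if the slot was resynchronized. */
    static bool resync_slot(uint32_t slotid, uint32_t *seqidp)
    {
        for (uint32_t attempt = 1; attempt <= 2; attempt++) {
            uint32_t candidate = *seqidp + attempt;   /* N + 1, then N + 2 */
            if (send_sequence_only(slotid, candidate) == NFS_OK) {
                *seqidp = candidate + 1;   /* slot usable again */
                return true;
            }
        }
        return false;   /* badly broken: replace the session instead */
    }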
> 
> If, due to some serious breakage, the above does not work, I would say a
> CreateSession to replace the now badly broken session is in order.
> 
> Note again "this re-synchronization can only be safely done if the first RPC for
> which no reply was received is an idempotent one" and that is going to be
> difficult to determine when the NFS4ERR_SEQ_MISORDERED is received for a
> subsequent RPC that uses the same session/slot.
> --> If the client notes, when it "gives up on waiting for an RPC reply", that the
>       slot is bad and it sees that the sa_cachethis argument in the Sequence
>       operation for the outstanding RPC request was set false (indicating it is an
>       idempotent RPC), then I believe it could safely do a re-synchronization of
>       the slot at that point in time.
>       --> I chose not to do that and just mark the slot bad, but I can see an
>            argument for doing so, since the RPC is known to be idempotent.
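
That decision might be expressed roughly as follows, where sa_cachethis refers to the SEQUENCE argument of the abandoned request; everything else is illustrative:

    /* Rough sketch of the decision described above: when the client
     * gives up waiting for a reply, the slot may be re-synchronized
     * safely only if the abandoned request was sent with
     * sa_cachethis == false, i.e. the client itself treated it as
     * idempotent.  Illustrative only. */
    #include <stdbool.h>

    enum slot_disposition {
        SLOT_MARK_BAD,      /* conservative: stop using the slot */
        SLOT_TRY_RESYNC     /* safe to probe with SEQUENCE-only compounds */
    };

    static enum slot_disposition
    slot_on_abandoned_rpc(bool sa_cachethis_of_abandoned_rpc)
    {
        if (!sa_cachethis_of_abandoned_rpc)
            return SLOT_TRY_RESYNC;   /* idempotent: a replay is harmless */
        return SLOT_MARK_BAD;         /* may be non-idempotent: never replay */
    }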
> 
> rick
> 
> >
> > > >    Then the correct sequenceid to be used for the session/slot is known.
> > > Oops, "known" is probably too strong a word here. Since the cause of
> > > the misordered sequenceid is not known, I think there is a very low
> > > probability that an RPC using the session/slot is still in flight
> > > and will change the sequenceid when processed on the server, causing
> > > another misordered sequenceid error reply.
> > >
> >
> > I agree with you that there may still be RPC(s) in progress that have
> > been assigned that session/slot/sequenceid when the misordered reply reaches
> > the client. I will update the draft to specify that the client should wait for
> > replies to all in-flight requests and then query the correct sequenceid.
> >
> >
> > > rick
> > >
> > > >
> > > > rick
> > > >
> > > > >
> > > > > rick
> > > > >
> > > > > > _______________________________________________
> > > > > > nfsv4 mailing list
> > > > > > nfsv4@ietf.org
> > > > > > https://www.ietf.org/mailman/listinfo/nfsv4
> >
> >
> > Thank you Rick
> >
> > Best Regards
> > Jing
> >
> >
> >