Re: [nfsv4] Draft is updated following your comments, Thanks//Reply: draft-mzhang-nfsv4-sequence-id-calibration-01

On Fri, Apr 21, 2023, 10:29 AM Rick Macklem <rick.macklem@gmail.com> wrote:

> On Thu, Apr 20, 2023 at 8:08 PM yangjing (U) <yangjing8@huawei.com> wrote:
> >
> > Hi Rick, Thanks for your comments and sorry for the late response ^_^
> >
> >
> > On Mon, Apr 10, 2023 at 2:20 PM Rick Macklem <rick.macklem@gmail.com> wrote:
> > >
> > > On Mon, Apr 10, 2023 at 1:58 PM Rick Macklem <rick.macklem@gmail.com> wrote:
> > > >
> > > > On Mon, Apr 10, 2023 at 1:15 PM Rick Macklem <rick.macklem@gmail.com> wrote:
> > > > >
> > > > > On Sun, Apr 9, 2023 at 11:39 PM yangjing (U)
> > > > > <yangjing8=40huawei.com@dmarc.ietf.org> wrote:
> > > > > >
> > > > > > The draft is updated as
> > > > > > https://www.ietf.org/archive/id/draft-mzhang-nfsv4-sequence-id-calibration-02.txt
> > > > > Here are a few generic comments about the draft (I know nothing
> > > > > about detailed formatting, etc).
> > > > >
> > > > > 1 - It is my understanding that new operations can only be added
> > > > > to minor version 2 and not
> > > > >      minor version 1.
> >
> > I agree with you, if new operations can only be added to minor version 2.
> >
> > > > > 2 - I largely agree with what Tom Talpey said in his comments. I
> > > > > think you need to figure out
> > > > >      how/when the sequenceid is getting messed up.
> > > > >      Note that it is my understanding that the main purpose of
> > > > > sessions is to achieve
> > > > >      "exactly once RPC semantics" (the fact that the reply cache
> > > > > is bounded in size is a nice
> > > > >      added feature, but not the principal purpose).
> > > > >      As such, a sequenceid being misordered implies that "exactly
> > > > > once" semantics could
> > > > >      already have been broken by a software bug in the client.
> > > > >
> > > > >      You note that the misordered sequenceid could be caused by
> > > > > "network partitioning".
> > > > >      In a correct implementation that should not be the case.
> > > > > After a network partition heals,
> > > > >      the client should re-send all outstanding RPCs with the same
> > > > > Session/slot/sequenceid
> > > > >      as was used in the original RPC request message. These should
> > > > > succeed (no misordered
> > > > >      error reply) using cached replies, as required.
> > > > >      --> If a client's recovery from network partitioning is not
> > > > > doing this, then it needs to be
> > > > >            fixed.
> > > > >      --> As above, if a misordered sequenceid error reply is due
> > > > > to a client bug, then that
> > > > >            must be fixed, since "exactly once" semantics may
> already be broken.
> > > > > The common client bug that results in a misordered sequenceid
> > > > > error reply happens when an in-progress RPC gets interrupted
> > > > > before the RPC reply is processed.
> >
> > You are right. Thanks for your analysis of when a misordered sequenceid
> > happens. But this draft does not care how and when it happens, the same as
> > the protocol specification.
> > We focus on how to deal with this error. If this error never happened,
> > we would not define it, right? We did find that a misordered sequenceid
> > happens and that IO performance is affected because the session is
> > reestablished, but we are not sure why and how it happens, because no
> > packets were captured at that moment.
> >
> > > > > This is not a case
> > > > > where your suggested extension should be used, since the
> > > > > in-progress RPC may still be "in-progress". Although I am not sure
> > > > > there is a 100% correct way for a client to handle the need to
> > > > > interrupt an in-progress RPC, marking the session slot as broken
> > > > > and no longer using it is the best approach I am aware of. (I
> > > > > suspect you might be referring to the Linux client and I cannot
> > > > > comment on how it handles interruption via signals or timeouts for
> > > > > session slots, since I am not a Linux client implementor.)
> >
> > Yes, when we found that IO performance was affected, we checked the Linux
> > client code and found that the client drops the session; then we checked
> > the NFS protocol specification....
> > I don’t think marking the session slot as broken is the best way. What
> > is next?
> > First: Leaving the slot broken. On one hand, with more and more slots
> > broken, the concurrency of this session will decline. On the other hand,
> > is a new mechanism needed to maintain and notify slot status between
> > client and server?
> > Second: Recover the broken slot. That is what this draft is trying to
> > do, I think.
> But you cannot resynchronize the broken slot safely unless the RPC that
> caused the sequenceid to become non-synchronized is an idempotent one
> (one that can be done more than once on the server safely).
> This is difficult (maybe impossible) to determine when a reply to a
> subsequent RPC using the same session/slot is NFS4ERR_SEQ_MISORDERED.
>
> To do so for all cases where an RPC reply of NFS4ERR_SEQ_MISORDERED
> is received could result in a non-idempotent RPC being executed multiple
> times on the server. This is a serious breakage and essentially defeats the
> main purpose for using sessions.
>
> You are correct that, once a slot is marked bad, performance degradation
> is possible. However, NFS4ERR_SEQ_MISORDERED failures should be
> rare (some might say that this error should never occur for correctly
> implemented clients) and, as such, having several bad slots on a session
> should also be a rare occurrence.
> --> After some percentage of the slots are marked bad, the client will
>      need to do a CreateSession and replace the session with the bad slots
>      with a new one.
>      What percentage is a design tradeoff. It sounds like Linux chose
>      "> 0%" (or first bad slot, if you prefer). For FreeBSD, I chose "100%"
>      (or all slots bad, if you prefer).
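
The threshold tradeoff described above can be sketched in a few lines of
Python. This is only an illustration: the function name
should_replace_session and the threshold parameter are hypothetical, and
the code is not taken from the Linux or FreeBSD client.

```python
def should_replace_session(bad_slots: int, total_slots: int,
                           threshold: float) -> bool:
    """Return True when the client should replace the session.

    threshold is a hypothetical knob: the fraction of bad slots the
    client tolerates. 0.0 models "replace on the first bad slot";
    1.0 models "replace only when every slot is bad".
    """
    if total_slots <= 0:
        raise ValueError("a session needs at least one slot")
    if bad_slots <= 0:
        return False
    if threshold >= 1.0:
        return bad_slots >= total_slots
    return bad_slots / total_slots > threshold
```

With 8 slots, a threshold of 0.0 triggers a CreateSession on the first bad
slot, while a threshold of 1.0 waits until all 8 are bad.
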
> >
> > > > >
> > > > > It sounds like migration might be an exception to the above, but
> > > > > it should be addressed specifically (as Tom Talpey notes).
> >
> > For the migration scenario, we will analyze it later.
> >
> > > > >
> > > > > 3 - If your extension were used by a client, how would the client
> > > > > know whether or not
> > > > >      there are still RPC(s) in-progress that have been assigned
> > > > > that session/slot/sequenceid?
> > > > An additional comment:
> > > > - If the client knows that the outstanding RPC request on the
> > > >   session/slot/sequenceid that got a misordered error reply
> > > >   is one where "exactly once" semantics is not required,
> > > >   then I can see that the client implementor might be able to
> > > >    safely resynchronize the sequenceid.
> > > Oops again. It would be the outstanding RPC request for which no reply
> > > has been received that causes the sequenceid to be misordered and that is
> > > the RPC that cannot require "exactly once" semantics.
> >
> > >rick
> > > >   To do this, I do not think your extensions are needed.
> > > >    If the client issues an RPC that consists of only the
> > > >    Sequence operation repeatedly, with the sequenceid incremented
> > > >    by one each attempt after waiting for a reply to the previous
> > > >    attempt, it should get NFS_OK within a few attempts.
> >
> > The misordered sequenceid is not known. If the cached sequenceid is N,
> > any value other than N or N+1 will be identified as misordered. The range
> > of sequenceid is (1, 2^32 - 1).
> > So, I don’t think attempting repeatedly is a better choice.
> Well, here is the basic algorithm the client would use to maintain a slot's
> sequenceid. (Note there should never be more than one RPC in flight
> for a given session/slot.)
> - When an RPC reply is received from the server with NFS_OK for the
>   Sequence operation, advance the sequenceid for the session/slot by 1.
> As such, I do not see how a sequenceid can ever be out by more than 1
> unless there is a coding bug. For the case of a coding bug, fix the bug.
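
A minimal Python sketch of the client-side rule quoted above, assuming one
outstanding RPC per slot. The ClientSlot class and its method names are
hypothetical; the starting value of 1 and the 32-bit wraparound are
assumptions about how a real client would number requests.

```python
from dataclasses import dataclass

NFS_OK = 0  # NFSv4 status code for success


@dataclass
class ClientSlot:
    """Client-side view of one session slot (hypothetical structure)."""
    sequenceid: int = 1      # value to send in the next Sequence operation
    in_flight: bool = False  # at most one RPC may be outstanding per slot

    def start_rpc(self) -> int:
        """Claim the slot and return the sequenceid to put in the request."""
        if self.in_flight:
            raise RuntimeError("slot already has an RPC in flight")
        self.in_flight = True
        return self.sequenceid

    def finish_rpc(self, sequence_status: int) -> None:
        """Release the slot; advance only if Sequence returned NFS_OK."""
        self.in_flight = False
        if sequence_status == NFS_OK:
            # Assumed unsigned 32-bit arithmetic for the counter.
            self.sequenceid = (self.sequenceid + 1) % 2**32
```

Because the counter moves only in finish_rpc and only by one, a client
that follows this rule can lag the server by at most one step.
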
>
> For example:
> - Let's assume that the client is configured (not a good situation, but
>   possibly unavoidable) so that a timeout or POSIX signal results in
>   the client not waiting "forever" for an RPC reply.
> - Let's also assume that the client becomes network partitioned from the
>   server.
>
> When this is the case, it is possible that a few (even many) RPCs will
> be sent to the server with the same session/slot/sequenceid, one after
> another, each timing out before an RPC reply is received. But note that
> they all will have the same sequenceid.
> --> When this happens, it is possible that one of these RPC requests
>      will make it to the server, be processed, and advance the sequenceid
>      by one.
>      --> All subsequent RPCs with the same session/slot/sequenceid
>          processed at the server will get a NFS4ERR_SEQ_MISORDERED reply.
>
> Once the network partition is healed, the client will see a
> NFS4ERR_SEQ_MISORDERED reply (plus possibly other delayed
> replies for the same session/slot/sequenceid), but the sequenceid will only
> be out by 1 compared with what the server's sequenceid is for the same
> session/slot.
> --> This implies that the first attempt of a RPC that consists of only
>      the Sequence operation with sa_sequenceid set to N + 1
>      (where N is what the client is currently using for sequenceid for
>       the session/slot) will succeed.
>      --> Whether a subsequent attempt with sequenceid set to N + 2
>            is justified is debatable, but certainly doing more than that
>            does not seem reasonable to me.
>
> If, due to some serious breakage, the above does not work, I would
> say a CreateSession to replace the now badly broken session is in
> order.
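
The probing approach described above (try N + 1, perhaps N + 2, then give
up and replace the session) might look roughly like this in Python. The
names resync_slot, send_sequence and FakeServerSlot are hypothetical.
send_sequence stands in for "send an RPC containing only the Sequence
operation and wait for its reply", and the toy server deliberately ignores
the replay case where the request repeats the cached sequenceid.

```python
from typing import Callable, Optional

NFS_OK = "NFS_OK"
NFS4ERR_SEQ_MISORDERED = "NFS4ERR_SEQ_MISORDERED"


def resync_slot(client_seq: int,
                send_sequence: Callable[[int], str],
                max_attempts: int = 2) -> Optional[int]:
    """Probe with Sequence-only RPCs at N+1, then N+2 (N = client_seq).

    Returns the sequenceid the server accepted, or None when the caller
    should give up and replace the session with CreateSession.
    """
    for step in range(1, max_attempts + 1):
        candidate = (client_seq + step) % 2**32
        # send_sequence is assumed to block until the reply arrives.
        if send_sequence(candidate) == NFS_OK:
            return candidate
    return None


class FakeServerSlot:
    """Toy server-side slot, used only to exercise resync_slot."""

    def __init__(self, cached_seq: int) -> None:
        self.cached_seq = cached_seq

    def sequence(self, seq: int) -> str:
        if seq == (self.cached_seq + 1) % 2**32:
            self.cached_seq = seq
            return NFS_OK
        return NFS4ERR_SEQ_MISORDERED
```

For example, resync_slot(5, FakeServerSlot(5).sequence) returns 6 on the
first attempt; the client would then use 7 for its next request on that
slot. A return value of None is the signal to do a CreateSession.
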
>
> Note again "this re-synchronization can only be safely done if the
> first RPC for which no reply was received is an idempotent one" and that is
> going to be difficult to determine when the NFS4ERR_SEQ_MISORDERED is
> received for a subsequent RPC that uses the same session/slot.
> --> If the client notes when it "gives up on waiting for an RPC reply"
>       that the slot is bad and it sees that the sa_cachethis argument in the
>       Sequence operation for the outstanding RPC request was set false
>       (indicating it is an idempotent RPC) then I believe it could safely do
>       a re-synchronization of the slot at that point in time.
>       --> I chose to not do that and just mark the slot bad, but I can
>            see an argument for doing so, since the RPC is known to be
>            idempotent.
>
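
The decision described above, made at the moment the client gives up on a
reply, can be sketched as follows. SlotAction, on_give_up and the
allow_resync flag are hypothetical names; treating sa_cachethis == false
as "idempotent" follows the reasoning in the quoted text rather than any
particular client implementation.

```python
from enum import Enum


class SlotAction(Enum):
    RESYNC = "resync"      # probe for the server's sequenceid
    MARK_BAD = "mark_bad"  # stop using the slot


def on_give_up(sa_cachethis: bool, allow_resync: bool = True) -> SlotAction:
    """Decide what to do with a slot when the client stops waiting.

    sa_cachethis is the flag that was sent in the Sequence operation of
    the abandoned RPC. allow_resync=False models the conservative choice
    of always marking the slot bad.
    """
    if allow_resync and not sa_cachethis:
        return SlotAction.RESYNC
    return SlotAction.MARK_BAD
```
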
> rick
>
> >
> > > >    Then the correct sequenceid to be used for the session/slot is
> > > >    known.
> > > Oops, "known" is probably too strong a word here. Since the cause of
> > > the misordered sequenceid is not known, I think there is a very low
> > > probability that an RPC using the session/slot is still in flight and
> > > will change the sequenceid when processed on the server, causing
> > > another misordered sequenceid error reply.
> > >
> >
> > I agree with you that "there may still be RPC(s) in progress that have
> > been assigned that session/slot/sequenceid" when the misordered reply
> > reaches the client.
> > I will update the draft to specify that the client should wait for replies
> > to all in-flight requests, then query the correct sequenceid.
>

I am not sure which draft you are proposing to update. However, RFCs 5661
and 8881 say that the client "MUST" wait for all in-flight requests. The big
problem is that existing clients do not obey this specification and it is most
unlikely that they will ever do so.

This issue is discussed in Appendix C.2 of draft-ietf-nfsv4-rfc5661bis.
The working group should discuss this issue at a forthcoming interim
meeting since this issue needs to be addressed before WGLC for this
document.

>
> >
> > > rick
> > >
> > > >
> > > > rick
> > > >
> > > > >
> > > > > rick
> > > > >
> > > > > > _______________________________________________
> > > > > > nfsv4 mailing list
> > > > > > nfsv4@ietf.org
> > > > > > https://www.ietf.org/mailman/listinfo/nfsv4
> >
> >
> > Thank you Rick
> >
> > Best Regards
> > Jing
> >
> >
> >
>