Re: [nfsv4] Draft is updated following your comments, Thanks//Reply: draft-mzhang-nfsv4-sequence-id-calibration-01

Rick Macklem <rick.macklem@gmail.com> Tue, 02 May 2023 23:58 UTC

References: <83389b42bb0f49509c757cdabd5c6b3f@huawei.com> <CAM5tNy7F9Yf8rJQqy4rOwtcK_FSa+SuRsG=9KenB=mG9Wqv6MA@mail.gmail.com> <CAM5tNy56P0XxrDU==hjocvtBuY9DO2JrOn8PThyfBxRorbW74g@mail.gmail.com> <CAM5tNy6=3JRZTZZH5NW=QHocTd1DOS9HSav+PXnVC=N0_xduxA@mail.gmail.com> <CAM5tNy6x75sprE2Zqkxu5QE3wBFysuOUKj056Uyj9fqHPbkLbg@mail.gmail.com> <30a502ce7a90469196140b69b91f2867@huawei.com> <CAM5tNy7Rnv_K=JPGRkabCiMdHn=_tmC=RgpaUcZwJj+QNyjqiQ@mail.gmail.com> <d1fac950f0a842e6a31ab721de9a8bf3@huawei.com>
In-Reply-To: <d1fac950f0a842e6a31ab721de9a8bf3@huawei.com>
From: Rick Macklem <rick.macklem@gmail.com>
Date: Tue, 02 May 2023 16:57:53 -0700
Message-ID: <CAM5tNy64mRk-O4mHCDsmw1dKiGHm7tj5pxRk7L2YHOB1oaT7fw@mail.gmail.com>
To: "yangjing (U)" <yangjing8@huawei.com>
Cc: "nfsv4@ietf.org" <nfsv4@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/fJG5CzzivRC6vZv4GAiFOSuPyYk>
Subject: Re: [nfsv4] Draft is updated following your comments, Thanks//Reply: draft-mzhang-nfsv4-sequence-id-calibration-01

On Thu, Apr 27, 2023 at 6:20 PM yangjing (U) <yangjing8@huawei.com> wrote:
>
> Hi Rick,
>
> I think we are talking about two issues,
>
> 1. Whether a client should resend a request after it gets an NFS4ERR_SEQ_MISORDERED response.
> I agree with you that a misordered sequenceid implies that the client is trying to break "exactly once semantics", which the server denies with NFS4ERR_SEQ_MISORDERED.
> Next, the client MUST NOT resend this request with the original session/slot and a new sequenceid, with the original session and a new slot, or with a new session. If the client does so, it will break EOS.
> You are worried that if the misordered slot is recovered, the request (originally denied) will be resent to the server (with the original session/slot and a new sequenceid), right?
> In fact, the client and user application can do this even if the slot is marked as broken; for example, they can resend the request with the original session and a new slot.
>
OK, let's assume the client (for any reason) chooses not to wait for an RPC reply
and that this RPC had a Sequence operation with sa_cachethis == true.
- client sends RPC X using session A, slot I and seq# K
- client does not wait for the reply to RPC X
Now, I believe two cases are possible:
i) - server did not process RPC X
ii) - server processed RPC X and cached RPC X's reply
If the client sends RPC Y using session A, slot I and seq# K, then
it might work OK (case i) or it might get RPC X's reply (case ii).
To avoid getting RPC X's reply, it can send RPC Y using session
A, slot I and seq# K+1.
--> Now, it will either succeed (case ii) or fail with NFS4ERR_SEQ_MISORDERED
      (case i).
      --> As far as I can think of, this is the only case where NFS4ERR_SEQ_MISORDERED
            can occur (except maybe migration, which I am not familiar with).
            (The others are coding bugs, plain and simple.)
For the (i) case, session A, slot I can now be used with seq#K and neither RPC X
nor RPC Y have been performed at the server, so RPC Y can be done with seq#K.
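The two cases can be checked against a simplified model of the server's per-slot rule from RFC 8881: a request carrying the same seqid as the last executed request is treated as a retry and gets the cached reply, the next seqid is executed as a new request, and anything else gets NFS4ERR_SEQ_MISORDERED. This is a hypothetical Python sketch, not any real server's code; `ServerSlot`, its `sequence()` method and the string replies are illustrative, and it ignores details such as NFS4ERR_SEQ_FALSE_RETRY detection.

```python
from dataclasses import dataclass
from typing import Optional

NFS4ERR_SEQ_MISORDERED = "NFS4ERR_SEQ_MISORDERED"
NFS4ERR_RETRY_UNCACHED_REP = "NFS4ERR_RETRY_UNCACHED_REP"
SEQID_MOD = 2**32  # sequenceids are 32-bit unsigned and wrap to 0


@dataclass
class ServerSlot:
    """Simplified server-side state for one session slot (illustrative model)."""
    last_seqid: int                     # seqid of the last request executed on the slot
    cached_reply: Optional[str] = None  # reply kept for retransmissions, if any

    def sequence(self, seqid: int, rpc: str, cachethis: bool = True) -> str:
        if seqid == self.last_seqid:
            # Same seqid as the last executed request: treat it as a retry and
            # replay the cached reply instead of executing `rpc`.
            if self.cached_reply is None:
                return NFS4ERR_RETRY_UNCACHED_REP
            return self.cached_reply
        if seqid == (self.last_seqid + 1) % SEQID_MOD:
            # Next seqid: a new request, so execute it and advance the slot.
            self.last_seqid = seqid
            reply = f"reply to {rpc}"
            self.cached_reply = reply if cachethis else None
            return reply
        # Neither a retry nor the next request.
        return NFS4ERR_SEQ_MISORDERED
```

With K = 5 and a slot whose last executed seqid is 4: in case (i), `sequence(6, "Y")` returns NFS4ERR_SEQ_MISORDERED and a following `sequence(5, "Y")` executes Y. In case (ii), after `sequence(5, "X")`, sending Y with seqid 5 returns X's cached reply, while `sequence(6, "Y")` executes Y.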

I suggested that RPC Y could be done as an RPC with only the Sequence operation,
but that is not strictly necessary.

Note that a problem with implementing the above is that it assumes that the server
is not broken. For the FreeBSD client, I chose to mark the session/slot (session A,
slot I for the above example) bad and no longer use it. The client then uses some
slot other than I for RPC Y, to avoid problems caused by broken server implementations.
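A minimal Python sketch of the "mark the slot bad and use another slot" approach, assuming a hypothetical client-side slot table. `ClientSlot`, `ClientSession`, `pick_slot()` and `mark_bad()` are illustrative names, not the FreeBSD client's actual code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClientSlot:
    seqid: int = 1      # next seqid to send on this slot (illustrative start value)
    busy: bool = False  # at most one RPC may be in flight on a slot
    bad: bool = False   # set once the slot's seqid can no longer be trusted


@dataclass
class ClientSession:
    slots: List[ClientSlot] = field(default_factory=list)

    def pick_slot(self) -> Optional[int]:
        """Return the index of a free, non-bad slot, or None if there is none."""
        for index, slot in enumerate(self.slots):
            if not slot.busy and not slot.bad:
                return index
        return None

    def mark_bad(self, index: int) -> None:
        """Retire a slot instead of guessing which seqid the server expects."""
        self.slots[index].bad = True

    def usable_slots(self) -> int:
        """Count the slots that have not been retired."""
        return sum(1 for slot in self.slots if not slot.bad)
```

In the example above, slot I would be passed to `mark_bad()` and RPC Y would be sent on whatever slot `pick_slot()` returns next. The cost is one fewer usable slot for the rest of the session's lifetime.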

Do others think this analysis sounds reasonable? rick

> 2. When a client gets an NFS4ERR_SEQ_MISORDERED response, you prefer the following two methods for dealing with the session/slot. I have a different opinion.
> Method 1: Don't destroy the session, but mark the slot as broken. Setting aside the impact on concurrency, we would have to maintain slot status.
> If so, current data structures would have to be changed. Is this acceptable?
> Method 2: Don't destroy the session; get the correct sequenceid of this slot by attempts with only the SEQUENCE operation. In this way, the number of interactions between client and server is unknown.
> One request and one reply is the best case. However, if the operations are extended to return the correct sequenceid (as the draft proposes), one request and one reply is the only case.
>
> > -----Original Message-----
> > From: Rick Macklem [mailto:rick.macklem@gmail.com]
> > Sent: April 21, 2023 10:28 PM
> > To: yangjing (U) <yangjing8@huawei.com>
> > Cc: nfsv4@ietf.org
> > Subject: Re: [nfsv4] Draft is updated following your comments, Thanks//Reply: draft-mzhang-nfsv4-sequence-id-calibration-01
> >
> > On Thu, Apr 20, 2023 at 8:08 PM yangjing (U) <yangjing8@huawei.com> wrote:
> > >
> > > Hi Rick, Thanks for your comments and sorry for the late response ^_^
> > >
> > >
> > > On Mon, Apr 10, 2023 at 2:20 PM Rick Macklem <rick.macklem@gmail.com> wrote:
> > > >
> > > > On Mon, Apr 10, 2023 at 1:58 PM Rick Macklem <rick.macklem@gmail.com> wrote:
> > > > >
> > > > > On Mon, Apr 10, 2023 at 1:15 PM Rick Macklem <rick.macklem@gmail.com> wrote:
> > > > > >
> > > > > > On Sun, Apr 9, 2023 at 11:39 PM yangjing (U) <yangjing8=40huawei.com@dmarc.ietf.org> wrote:
> > > > > > >
> > > > > > > The draft has been updated:
> > > > > > > https://www.ietf.org/archive/id/draft-mzhang-nfsv4-sequence-id-calibration-02.txt
> > > > > > Here are a few generic comments about the draft (I know nothing about detailed formatting, etc.).
> > > > > >
> > > > > > 1 - It is my understanding that new operations can only be added to minor version 2 and not
> > > > > >      minor version 1.
> > >
> > > I agree with you that new operations can only be added to minor version 2.
> > >
> > > > > > 2 - I largely agree with what Tom Talpey said in his comments. I think you need to figure out
> > > > > >      how/when the sequenceid is getting messed up.
> > > > > >      Note that it is my understanding that the main purpose of sessions is to achieve
> > > > > >      "exactly once RPC semantics" (the fact that the reply cache is bounded in size is a nice
> > > > > >      added feature, but not the principal purpose).
> > > > > >      As such, a sequenceid being misordered implies that "exactly once" semantics could
> > > > > >      already have been broken by a software bug in the client.
> > > > > >
> > > > > >      You note that the misordered sequenceid could be caused by "network partitioning".
> > > > > >      In a correct implementation that should not be the case. After a network partition heals,
> > > > > >      the client should re-send all outstanding RPCs with the same session/slot/sequenceid
> > > > > >      as was used in the original RPC request message. These should succeed (no misordered
> > > > > >      error reply) using cached replies, as required.
> > > > > >      --> If a client's recovery from network partitioning is not doing this, then it needs to be
> > > > > >            fixed.
> > > > > >      --> As above, if a misordered sequenceid error reply is due to a client bug, then that
> > > > > >            must be fixed, since "exactly once" semantics may already be broken.
> > > > > > The common client bug that results in a misordered sequenceid error reply happens when an
> > > > > > in-progress RPC gets interrupted before the RPC reply is processed.
> > >
> > > You are right. Thanks for your analysis of when a misordered sequenceid happens. But this draft does not care how and when it happens, the same as the protocol specification.
> > > We focus on how to deal with this error. If this error never happened, it would not have been defined, right? We did find that misordered sequenceids happen and that IO performance is affected because the session is re-established, but we are not sure why and how it happens, because no packets were captured at that moment.
> > >
> > > > > > This is not a case
> > > > > > where your suggested extension should be used, since the
> > > > > > in-progress RPC may still be "in-progress". Although I am not
> > > > > > sure there is a 100% correct way for a client to handle the need
> > > > > > to interrupt an in-progress RPC, marking the session slot as
> > > > > > broken and no longer using it is the best approach I am aware
> > > > > > of. (I suspect you might be referring to the Linux client and I
> > > > > > cannot comment on how it handles interruption via signals or
> > > > > > timeouts for session slots, since I am not a Linux client
> > > > > > implementor.)
> > >
> > > Yes. When we found that IO performance was affected, we checked the Linux client code and found that the client drops the session; then we checked the NFS protocol specification.
> > > I don't think marking the session slot as broken is the best way. What comes next?
> > > First: leave the slot broken. On one hand, as more and more slots become broken, the concurrency of this session will decline. On the other hand, is a new mechanism needed to maintain slot status and notify it between client and server?
> > > Second: recover the broken slot. That is what this draft is trying to do, I think.
> > But you cannot resynchronize the broken slot safely unless the RPC that
> > caused the sequenceid to become non-synchronized is an idempotent one
> > (one that can be done more than once on the server safely).
> > This is difficult (maybe impossible) to determine when a reply to a subsequent
> > RPC using the same session/slot is NFS4ERR_SEQ_MISORDERED.
> >
> > To do so for all cases where an RPC reply of NFS4ERR_SEQ_MISORDERED is
> > received could result in a non-idempotent RPC being executed multiple times
> > on the server. This is a serious breakage and essentially defeats the main
> > purpose for using sessions.
> >
> > You are correct that, once a slot is marked bad, performance degradation is
> > possible. However, NFS4ERR_SEQ_MISORDERED failures should be rare (some
> > might say that this error should never occur for correctly implemented clients)
> > and, as such, having several bad slots on a session should also be a rare
> > occurrence.
> > --> After some percentage of the slots are marked bad, the client will
> >      need to do a CreateSession and replace the session that has the bad slots
> >      with a new one.
> >      What percentage to use is a design tradeoff. It sounds like Linux chose
> >      "> 0%" (or the first bad slot, if you prefer). For FreeBSD, I chose "100%"
> >      (or all slots bad, if you prefer).
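The tradeoff can be expressed as a single threshold. This hypothetical Python helper (its name and the percentage parameter are assumptions, not taken from either client) treats `threshold_pct = 0` as the "> 0%" policy and `threshold_pct = 100` as the "all slots bad" policy; integer arithmetic avoids floating-point edge cases.

```python
def should_replace_session(num_bad: int, num_slots: int, threshold_pct: int) -> bool:
    """Return True when the session should be replaced via CreateSession.

    threshold_pct = 0 replaces on the first bad slot ("> 0%");
    threshold_pct = 100 replaces only when every slot is bad ("100%").
    """
    if num_slots <= 0 or not 0 <= num_bad <= num_slots:
        raise ValueError("need num_slots > 0 and 0 <= num_bad <= num_slots")
    if num_bad == num_slots:
        return True  # no usable slot is left, so a new session is unavoidable
    # Integer form of: num_bad / num_slots > threshold_pct / 100
    return num_bad * 100 > threshold_pct * num_slots
```

For example, with 8 slots and `threshold_pct = 25`, the session is kept with 2 bad slots and replaced once a third slot goes bad.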
> > >
> > > > > >
> > > > > > It sounds like migration might be an exception to the above, but
> > > > > > it should be addressed specifically (as Tom Talpey notes).
> > >
> > > We will analyze the migration scenario later.
> > >
> > > > > >
> > > > > > 3 - If your extension were used by a client, how would the client know whether or not
> > > > > >      there are still RPC(s) in progress that have been assigned that session/slot/sequenceid?
> > > > > An additional comment:
> > > > > - If the client knows that the outstanding RPC request on the
> > > > >   session/slot/sequenceid that got a misordered error reply
> > > > >   is one where "exactly once" semantics is not required,
> > > > >   then I can see that the client implementor might be able to
> > > > >    safely resynchronize the sequenceid.
> > > > Oops again. It would be the outstanding RPC request for which no reply has been received
> > > > that causes the sequenceid to be misordered, and that is the RPC that cannot require
> > > > "exactly once" semantics.
> > >
> > > >rick
> > > > >   To do this, I do not think your extensions are needed.
> > > > >    If the client issues an RPC that consists of only the
> > > > >    Sequence operation repeatedly, with the sequenceid incremented
> > > > >    by one each attempt after waiting for a reply to the previous
> > > > >    attempt, it should get NFS_OK within a few attempts.
> > >
> > > The misordered sequenceid is not known. If the cached sequenceid is N, any value other than N and N+1 will be identified as misordered. The range of sequenceid is (1, 2^32 - 1).
> > > So, I don't think attempting repeatedly is a better choice.
> > Well, here is the basic algorithm the client would use to maintain a slot's
> > sequenceid. (Note there should never be more than one RPC on the fly for a
> > given session/slot.)
> > - When an RPC reply is received from the server with NFS_OK for the
> >   Sequence operation, advance the sequenceid for the session/slot by 1.
> > As such, I do not see how a sequenceid can ever be out by more than 1 unless
> > there is a coding bug. For the case of a coding bug, fix the bug.
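The client-side rule above fits in a few lines. This is a sketch assuming one hypothetical `SlotSeq` object per slot; it also enforces "at most one RPC on the fly per slot" and wraps the 32-bit sequenceid to 0 after 0xFFFFFFFF. The status strings stand in for real NFSv4 status codes.

```python
SEQID_MOD = 2**32  # sequenceids are 32-bit unsigned and wrap to 0


class SlotSeq:
    """Hypothetical client bookkeeping for one session slot."""

    def __init__(self, seqid: int) -> None:
        self.seqid = seqid      # seqid to put in the next SEQUENCE on this slot
        self.in_flight = False  # never more than one RPC on the fly per slot

    def start_rpc(self) -> int:
        """Claim the slot for one RPC and return the seqid to send."""
        if self.in_flight:
            raise RuntimeError("slot already has an RPC in flight")
        self.in_flight = True
        return self.seqid

    def on_reply(self, sequence_status: str) -> None:
        """Release the slot; advance the seqid only if SEQUENCE returned NFS4_OK."""
        self.in_flight = False
        if sequence_status == "NFS4_OK":
            self.seqid = (self.seqid + 1) % SEQID_MOD
```

Because the seqid only moves when an NFS4_OK reply arrives, and only by one, the client's value can lag the server's by at most one (a request the server executed but whose reply was lost), which is the basis of the argument that follows.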
> >
> > For example:
> > - Let's assume that the client is configured (not a good situation, but
> >   possibly unavoidable) so that a timeout or POSIX signal results in
> >   the client not waiting "forever" for an RPC reply.
> > - Let's also assume that the client becomes network partitioned from the server.
> >
> > When this is the case, it is possible that a few (even many) RPCs will be sent to
> > the server with the same session/slot/sequenceid, one after another, each
> > timing out before an RPC reply is received. But note that they all will have the
> > same sequenceid.
> > --> When this happens, it is possible that one of these RPC requests
> >      will make it to the server, be processed, and advance the sequenceid
> >      by one.
> >      --> All subsequent RPCs with the same session/slot/sequenceid
> >          processed at the server will get a NFS4ERR_SEQ_MISORDERED reply.
> >
> > Once the network partition is healed, the client will see a
> > NFS4ERR_SEQ_MISORDERED reply (plus possibly other delayed replies for the
> > same session/slot/sequenceid), but the sequenceid will only be out by 1
> > compared with what the server's sequenceid is for the same session/slot.
> > --> This implies that the first attempt of a RPC that consists of only
> >      the Sequence operation with sa_sequenceid set to N + 1
> >      (where N is what the client is currently using for sequenceid for
> >       the session/slot) will succeed.
> >      --> Whether a subsequent attempt with sequenceid set to N + 2
> >            is justified is debatable, but certainly doing more than that does
> >            not seem reasonable to me.
> >
> > If, due to some serious breakage, the above does not work, I would say a
> > CreateSession to replace the now badly broken session is in order.
> >
> > Note again: "this re-synchronization can only be safely done if the first RPC for
> > which no reply was received is an idempotent one", and that is going to be
> > difficult to determine when the NFS4ERR_SEQ_MISORDERED is received for a
> > subsequent RPC that uses the same session/slot.
> > --> If the client notes, when it "gives up on waiting for an RPC reply", that the
> >       slot is bad, and it sees that the sa_cachethis argument in the Sequence operation
> >       for the outstanding RPC request was set false (indicating it is an idempotent RPC),
> >       then I believe it could safely do a re-synchronization of the slot at that point in time.
> >       --> I chose to not do that and just mark the slot bad, but I can see an argument
> >            for doing so, since the RPC is known to be idempotent.
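The resynchronization procedure described above (probe N + 1, at most N + 2, then fall back to CreateSession, and only when the abandoned RPC had sa_cachethis false) could look roughly like this Python sketch. `send_sequence_only` is a hypothetical hook that sends an RPC containing only the SEQUENCE operation and returns that operation's status; the returned action names are also illustrative. It assumes no other RPC using the slot is still in flight.

```python
from typing import Callable, Optional, Tuple

NFS4_OK = "NFS4_OK"
SEQID_MOD = 2**32  # sequenceids are 32-bit unsigned and wrap to 0


def resync_slot(
    send_sequence_only: Callable[[int], str],
    n: int,
    abandoned_rpc_cachethis: bool,
    max_probes: int = 2,
) -> Tuple[str, Optional[int]]:
    """Decide how to recover a slot after NFS4ERR_SEQ_MISORDERED.

    n is the sequenceid the client is currently using for the slot, and
    abandoned_rpc_cachethis is sa_cachethis of the RPC whose reply never came.
    """
    if abandoned_rpc_cachethis:
        # The abandoned RPC may be non-idempotent: resynchronizing could let it
        # run twice on the server, so retire the slot instead.
        return ("mark_slot_bad", None)
    # Probe N + 1 first; a second probe at N + 2 is already debatable.
    for delta in range(1, max_probes + 1):
        seqid = (n + delta) % SEQID_MOD
        if send_sequence_only(seqid) == NFS4_OK:
            # The server accepted seqid; the next RPC on the slot uses seqid + 1.
            return ("resynced", seqid)
    # Still misordered: serious breakage, so replace the session.
    return ("create_session", None)
```

With a server that is only one ahead (the expected case), the first probe at N + 1 succeeds and the next RPC on the slot uses N + 2; capping `max_probes` keeps a badly broken slot from turning into an open-ended search.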
> >
> > rick
> >
> > >
> > > > >    Then the correct sequenceid to be used for the session/slot is known.
> > > > Oops, "known" is probably too strong a word here. Since the cause of
> > > > the misordered sequenceid is not known, I think there is a very low
> > > > probability that an RPC using the session/slot is still in flight
> > > > and will change the sequenceid when processed on the server, causing
> > > > another misordered sequenceid error reply.
> > > >
> > >
> > > I agree with you that there may still be RPC(s) in progress that have been assigned that session/slot/sequenceid when the client gets a misordered reply. I will update the draft to specify that the client should wait for replies to all in-flight requests, then query the correct sequenceid.
> > >
> > >
> > > > rick
> > > >
> > > > >
> > > > > rick
> > > > >
> > > > > >
> > > > > > rick
> > > > > >
> > >
> > >
> > > Thank you Rick
> > >
> > > Best Regards
> > > Jing
> > >
> > >
> > >