Re: [nfsv4] Draft is updated following your comments, Thanks//答复: draft-mzhang-nfsv4-sequence-id-calibration-01
Rick Macklem <rick.macklem@gmail.com> Tue, 02 May 2023 23:58 UTC
MIME-Version: 1.0
References: <83389b42bb0f49509c757cdabd5c6b3f@huawei.com> <CAM5tNy7F9Yf8rJQqy4rOwtcK_FSa+SuRsG=9KenB=mG9Wqv6MA@mail.gmail.com> <CAM5tNy56P0XxrDU==hjocvtBuY9DO2JrOn8PThyfBxRorbW74g@mail.gmail.com> <CAM5tNy6=3JRZTZZH5NW=QHocTd1DOS9HSav+PXnVC=N0_xduxA@mail.gmail.com> <CAM5tNy6x75sprE2Zqkxu5QE3wBFysuOUKj056Uyj9fqHPbkLbg@mail.gmail.com> <30a502ce7a90469196140b69b91f2867@huawei.com> <CAM5tNy7Rnv_K=JPGRkabCiMdHn=_tmC=RgpaUcZwJj+QNyjqiQ@mail.gmail.com> <d1fac950f0a842e6a31ab721de9a8bf3@huawei.com>
In-Reply-To: <d1fac950f0a842e6a31ab721de9a8bf3@huawei.com>
From: Rick Macklem <rick.macklem@gmail.com>
Date: Tue, 02 May 2023 16:57:53 -0700
Message-ID: <CAM5tNy64mRk-O4mHCDsmw1dKiGHm7tj5pxRk7L2YHOB1oaT7fw@mail.gmail.com>
To: "yangjing (U)" <yangjing8@huawei.com>
Cc: "nfsv4@ietf.org" <nfsv4@ietf.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/fJG5CzzivRC6vZv4GAiFOSuPyYk>
Subject: Re: [nfsv4] Draft is updated following your comments, Thanks//答复: draft-mzhang-nfsv4-sequence-id-calibration-01
On Thu, Apr 27, 2023 at 6:20 PM yangjing (U) <yangjing8@huawei.com> wrote:
>
> Hi Rick,
>
> I think we are talking about two issues:
>
> 1. When a client gets an NFS4ERR_SEQ_MISORDERED response, whether the
> client should resend the request.
> I agree with you that a misordered sequenceid implies that the client is
> trying to break "exactly once semantics" (EOS), but is denied by the
> server with NFS4ERR_SEQ_MISORDERED.
> Next, the client MUST NOT resend this request with the original
> session/slot and a new sequenceid, with the original session and a new
> slot, or with a new session. If the client does so, it will break EOS.
> You are worried that, if the misordered slot is recovered, the
> (originally denied) request will be resent to the server with the
> original session/slot and a new sequenceid, right?
> In fact, the client and the user application can do this even if the slot
> is marked as broken; for example, it can resend the request with the
> original session and a new slot.
>
Ok, let's assume the client (for any reason) chooses not to wait for an
RPC reply and that this RPC had a Sequence operation with
sa_cachethis == true.
- client sends RPC X using session A, slot I and seq# K
- client does not wait for the reply to RPC X
Now, I believe two cases are possible:
i) - server did not process RPC X
ii) - server processed RPC X and cached RPC X's reply

If the client sends RPC Y using session A, slot I and seq# K, then it
might work ok (case i) or it might get RPC X's reply (case ii).
To avoid getting RPC X's reply, it can send RPC Y using session A, slot I
and seq# K+1.
--> Now, it will either succeed (case ii) or fail with
    NFS4ERR_SEQ_MISORDERED (case i).
--> This is the only case where NFS4ERR_SEQ_MISORDERED can occur (except
    maybe migration, which I am not familiar with) as far as I can think
    of. (The others are coding bugs, plain and simple.)
For case (i), session A, slot I can now be used with seq# K, and neither
RPC X nor RPC Y has been performed at the server, so RPC Y can be done
with seq# K.
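
The case analysis above can be checked against a small model of one server-side slot. This is only a sketch under simplifying assumptions: the names (ServerSlot, handle, slot_after_rpc_x) are hypothetical, replies are plain strings, sequenceid wraparound is ignored, and a request that reuses the last sequenceid always gets the cached reply back.

```python
from dataclasses import dataclass

NFS4ERR_SEQ_MISORDERED = "NFS4ERR_SEQ_MISORDERED"


@dataclass
class ServerSlot:
    """Simplified model of one slot in a server's session reply cache."""
    last_seqid: int          # sequenceid of the last request executed on this slot
    cached_reply: str = ""   # reply cached for last_seqid

    def handle(self, seqid: int, rpc_name: str) -> str:
        if seqid == self.last_seqid:
            # Same sequenceid as the last executed request: replay the cache.
            return self.cached_reply
        if seqid == self.last_seqid + 1:
            # Next sequenceid in order: execute, cache the reply, advance.
            self.last_seqid = seqid
            self.cached_reply = f"reply to {rpc_name}"
            return self.cached_reply
        # Anything else is out of order for this slot.
        return NFS4ERR_SEQ_MISORDERED


def slot_after_rpc_x(k: int, processed: bool) -> ServerSlot:
    """Slot state after the client sent RPC X with seq# k and gave up waiting."""
    slot = ServerSlot(last_seqid=k - 1, cached_reply="reply to earlier RPC")
    if processed:            # case (ii): the server executed RPC X and cached its reply
        slot.handle(k, "RPC X")
    return slot              # case (i) leaves the slot untouched
```

With K = 10, sending RPC Y with seq# K returns Y's own reply in case (i) and X's cached reply in case (ii). Sending RPC Y with seq# K+1 succeeds in case (ii) and returns NFS4ERR_SEQ_MISORDERED in case (i), after which the slot still accepts seq# K.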
I suggested that RPC Y could be done as an RPC with only the Sequence
operation, but that is not strictly necessary.

Note that a problem with implementing the above is that it assumes that
the server is not broken.
For the FreeBSD client I chose to mark the session/slot (session A,
slot I for the above example) bad and no longer use it. Then the client
uses some slot other than I for RPC Y to avoid problems caused by broken
server implementations.

Do others think this analysis sounds reasonable?

rick

> 2. When a client gets an NFS4ERR_SEQ_MISORDERED response, you prefer the
> following two methods for dealing with the session/slot. I have a
> different opinion.
> Method 1: Don't destroy the session, but mark the slot as broken. Apart
> from the impact on concurrency, we have to maintain slot status.
> If so, current data structures have to be changed. Is this acceptable?
> Method 2: Don't destroy the session; get the correct sequenceid of this
> slot by attempts with only the SEQUENCE operation. In this way, the
> number of interactions between client and server is not known in advance.
> One request and one reply is the best case. However, with the expansion
> of operations to get the correct sequenceid, one request and one reply
> is the only case.
>
>
> -----Original Message-----
> > From: Rick Macklem [mailto:rick.macklem@gmail.com]
> > Sent: April 21, 2023 10:28 PM
> > To: yangjing (U) <yangjing8@huawei.com>
> > Cc: nfsv4@ietf.org
> > Subject: Re: [nfsv4] Draft is updated following your comments, Thanks//答复:
> > draft-mzhang-nfsv4-sequence-id-calibration-01
> >
> > On Thu, Apr 20, 2023 at 8:08 PM yangjing (U) <yangjing8@huawei.com> wrote:
> > >
> > > Hi Rick, thanks for your comments, and sorry for the late response ^_^
> > >
> > >
> > > On Mon, Apr 10, 2023 at 2:20 PM Rick Macklem <rick.macklem@gmail.com> wrote:
> > > >
> > > > On Mon, Apr 10, 2023 at 1:58 PM Rick Macklem <rick.macklem@gmail.com> wrote:
> > > > >
> > > > > On Mon, Apr 10, 2023 at 1:15 PM Rick Macklem <rick.macklem@gmail.com> wrote:
> > > > > >
> > > > > > On Sun, Apr 9, 2023 at 11:39 PM yangjing (U)
> > > > > > <yangjing8=40huawei.com@dmarc.ietf.org> wrote:
> > > > > > >
> > > > > > > The draft is updated as
> > > > > > > https://www.ietf.org/archive/id/draft-mzhang-nfsv4-sequence-id-calibration-02.txt
> > > > > > Here are a few generic comments about the draft (I know nothing
> > > > > > about detailed formatting, etc).
> > > > > >
> > > > > > 1 - It is my understanding that new operations can only be added
> > > > > >     to minor version 2 and not minor version 1.
> > >
> > > I agree with you, if new operations can only be added to minor version 2.
> > >
> > > > > > 2 - I largely agree with what Tom Talpey said in his comments. I
> > > > > >     think you need to figure out how/when the sequenceid is
> > > > > >     getting messed up.
> > > > > >     Note that it is my understanding that the main purpose of
> > > > > >     sessions is to achieve "exactly once RPC semantics" (the fact
> > > > > >     that the reply cache is bounded in size is a nice
> > > > > >     added feature, but not the principal purpose).
> > > > > >     As such, a sequenceid being misordered implies that
> > > > > >     "exactly once" semantics could already have been broken by a
> > > > > >     software bug in the client.
> > > > > >
> > > > > >     You note that the misordered sequenceid could be caused by
> > > > > >     "network partitioning".
> > > > > >     In a correct implementation that should not be the case.
> > > > > >     After a network partition heals, the client should re-send
> > > > > >     all outstanding RPCs with the same Session/slot/sequenceid
> > > > > >     as was used in the original RPC request message. These
> > > > > >     should succeed (no misordered error reply) using cached
> > > > > >     replies, as required.
> > > > > >     --> If a client's recovery from network partitioning is not
> > > > > >         doing this, then it needs to be fixed.
> > > > > >     --> As above, if a misordered sequenceid error reply is due
> > > > > >         to a client bug, then that must be fixed, since "exactly
> > > > > >         once" semantics may already be broken.
> > > > > > The common client bug that results in a misordered sequenceid
> > > > > > error reply happens when an in-progress RPC gets interrupted
> > > > > > before the RPC reply is processed.
> > >
> > > You are right. Thanks for your analysis of when a misordered sequenceid
> > > happens. But this draft does not care how and when it happens, the same
> > > as the protocol specification.
> > > We focus on how to deal with this error. If this error never
> > > happened, we would not define it, right? We did find that a misordered
> > > sequenceid happens and that IO performance is affected because the
> > > session is reestablished, but we are not sure why and how it happens,
> > > because no packets were being captured at that moment.
> > >
> > > > > > This is not a case where your suggested extension should be
> > > > > > used, since the in-progress RPC may still be "in-progress".
> > > > > > Although I am not sure there is a 100% correct way for a client
> > > > > > to handle the need to interrupt an in-progress RPC, marking the
> > > > > > session slot as broken and no longer using it is the best
> > > > > > approach I am aware of. (I suspect you might be referring to the
> > > > > > Linux client and I cannot comment on how it handles interruption
> > > > > > via signals or timeouts for session slots, since I am not a
> > > > > > Linux client implementor.)
> > >
> > > Yes, when we found that IO performance was affected, we checked the
> > > Linux client code and found that the client drops the session; then we
> > > checked the NFS protocol specification....
> > > I don't think marking the session slot as broken is the best way. What
> > > is next?
> > > First: Leave the slot broken. On one hand, with more and more slots
> > > broken, the concurrency of this session will decline. On the other
> > > hand, is a new mechanism needed to maintain and notify slot status
> > > between client and server?
> > > Second: Recover the broken slot. That is what this draft is trying to
> > > do, I think.
> > But you cannot resynchronize the broken slot safely unless the RPC that
> > caused the sequenceid to become non-synchronized is an idempotent one
> > (one that can be done more than once on the server safely).
> > This is difficult (maybe impossible) to determine when a reply to a
> > subsequent RPC using the same session/slot is NFS4ERR_SEQ_MISORDERED.
> >
> > To do so for all cases where an RPC reply of NFS4ERR_SEQ_MISORDERED is
> > received could result in a non-idempotent RPC being executed multiple
> > times on the server. This is a serious breakage and essentially defeats
> > the main purpose for using sessions.
> >
> > You are correct that, once a slot is marked bad, performance
> > degradation is possible.
> > However NFS4ERR_SEQ_MISORDERED failures should be rare (some might say
> > that this error should never occur for correctly implemented clients)
> > and, as such, having several bad slots on a session should also be a
> > rare occurrence.
> > --> After some percentage of the slots are marked bad, the client will
> >     need to do a CreateSession and replace the session with the bad
> >     slots with a new one.
> >     What percentage is a design tradeoff. It sounds like Linux chose
> >     "> 0%" (or first bad slot, if you prefer). For FreeBSD, I chose
> >     "100%" (or all slots bad, if you prefer).
> >
> > >
> > > > > > It sounds like migration might be an exception to the above, but
> > > > > > it should be addressed specifically (as Tom Talpey notes).
> > >
> > > For the migration scenario, we will analyze it later.
> > >
> > > > > >
> > > > > > 3 - If your extension were used by a client, how would the
> > > > > >     client know whether or not there are still RPC(s)
> > > > > >     in-progress that have been assigned that
> > > > > >     session/slot/sequenceid?
> > > > > An additional comment:
> > > > > - If the client knows that the outstanding RPC request on the
> > > > >   session/slot/sequenceid that got a misordered error reply
> > > > >   is one where "exactly once" semantics is not required,
> > > > >   then I can see that the client implementor might be able to
> > > > >   safely resynchronize the sequenceid.
> > > > Oops again. It would be the outstanding RPC request for which no
> > > > reply has been received that causes the sequenceid to be misordered,
> > > > and that is the RPC that cannot require "exactly once" semantics.
> > > >
> > > > rick
> > > > >   To do this, I do not think your extensions are needed.
> > > > >   If the client issues an RPC that consists of only the
> > > > >   Sequence operation repeatedly, with the sequenceid incremented
> > > > >   by one each attempt after waiting for a reply to the previous
> > > > >   attempt, it should get NFS_OK within a few attempts.
> > >
> > > The misordered sequenceid is not known. If the cached sequenceid is N,
> > > any value other than N and N+1 will be identified as misordered. The
> > > range of sequenceid is (1, 2^32 - 1).
> > > So, I don't think attempting repeatedly is a better choice.
> > Well, here is the basic algorithm the client would use to maintain a
> > slot's sequenceid. (Note there should never be more than one RPC on the
> > fly for a given session/slot.)
> > - When an RPC reply is received from the server with NFS_OK for the
> >   Sequence operation, advance the sequenceid for the session/slot by 1.
> > As such, I do not see how a sequenceid can ever be out by more than 1
> > unless there is a coding bug. For the case of a coding bug, fix the bug.
> >
> > For example:
> > - Let's assume that the client is configured (not a good situation, but
> >   possibly unavoidable) so that a timeout or POSIX signal results in
> >   the client not waiting "forever" for an RPC reply.
> > - Let's also assume that the client becomes network partitioned from the
> >   server.
> >
> > When this is the case, it is possible that a few (even many) RPCs will
> > be sent to the server with the same session/slot/sequenceid, one after
> > another, each timing out before an RPC reply is received. But note that
> > they all will have the same sequenceid.
> > --> When this happens, it is possible that one of these RPC requests
> >     will make it to the server, be processed, and advance the sequenceid
> >     by one.
> > --> All subsequent RPCs with the same session/slot/sequenceid
> >     processed at the server will get an NFS4ERR_SEQ_MISORDERED reply.
> >
> > Once the network partition is healed, the client will see an
> > NFS4ERR_SEQ_MISORDERED reply (plus possibly other delayed replies for
> > the same session/slot/sequenceid), but the sequenceid will only be out
> > by 1 compared with what the server's sequenceid is for the same
> > session/slot.
> > --> This implies that the first attempt of an RPC that consists of only
> >     the Sequence operation with sa_sequenceid set to N + 1
> >     (where N is what the client is currently using for sequenceid for
> >     the session/slot) will succeed.
> > --> Whether a subsequent attempt with sequenceid set to N + 2
> >     is justified is debatable, but certainly doing more than that does
> >     not seem reasonable to me.
> >
> > If, due to some serious breakage, the above does not work, I would say a
> > CreateSession to replace the now badly broken session is in order.
> >
> > Note again "this re-synchronization can only be safely done if the first
> > RPC for which no reply was received is an idempotent one" and that is
> > going to be difficult to determine when the NFS4ERR_SEQ_MISORDERED is
> > received for a subsequent RPC that uses the same session/slot.
> > --> If the client notes when it "gives up on waiting for an RPC reply"
> >     that the slot is bad and it sees that the sa_cachethis argument in
> >     the Sequence operation for the outstanding RPC request was set false
> >     (indicating it is an idempotent RPC), then I believe it could safely
> >     do a re-synchronization of the slot at that point in time.
> > --> I chose to not do that and just mark the slot bad, but I can see an
> >     argument for doing so, since the RPC is known to be idempotent.
> >
> > rick
> >
> > >
> > > > >   Then the correct sequenceid to be used for the session/slot is
> > > > >   known.
> > > > Oops, "known" is probably too strong a word here. Since the cause of
> > > > the misordered sequenceid is not known, I think there is a very low
> > > > probability that an RPC using the session/slot is still in flight
> > > > and will change the sequenceid when processed on the server, causing
> > > > another misordered sequenceid error reply.
> > > >
> > >
> > > I agree with you on "There may be still RPC(s) in-progress that have
> > > been assigned that session/slot/sequenceid" when the misordered reply
> > > reaches the client. I will update the draft to specify that the client
> > > should wait for replies to all in-flight requests, then query the
> > > correct sequenceid.
> > >
> > > >
> > > > rick
> > > > >
> > > > > rick
> > > > > >
> > > > > > rick
> > > > > >
> > > > > > > _______________________________________________
> > > > > > > nfsv4 mailing list
> > > > > > > nfsv4@ietf.org
> > > > > > > https://www.ietf.org/mailman/listinfo/nfsv4
> > >
> > >
> > > Thank you Rick
> > >
> > > Best Regards
> > > Jing
> > >
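
The client-side re-synchronization discussed in the quoted text (probe with a Sequence-only RPC at N + 1, perhaps N + 2, and otherwise replace the session) can be sketched as follows. This is an illustration only, not code from any client: resync_slot, the send_sequence_only callback, and the returned action strings are hypothetical names, and the sa_cachethis check reflects the condition suggested in the thread for when re-synchronization is safe.

```python
from typing import Callable, Optional, Tuple

NFS_OK = "NFS_OK"
NFS4ERR_SEQ_MISORDERED = "NFS4ERR_SEQ_MISORDERED"


def resync_slot(
    send_sequence_only: Callable[[int], str],
    client_seqid: int,
    outstanding_cachethis: bool,
    max_probes: int = 2,
) -> Tuple[str, Optional[int]]:
    """Try to re-synchronize one slot after the client gave up on an RPC.

    send_sequence_only(seqid) is assumed to send an RPC holding only the
    Sequence operation on this session/slot and return its status.
    client_seqid is N, the sequenceid the client is currently using.
    """
    if outstanding_cachethis:
        # The abandoned RPC asked for its reply to be cached, so it is not
        # known to be idempotent: do not resynchronize, stop using the slot.
        return ("mark_slot_bad", None)
    for step in range(1, max_probes + 1):
        candidate = client_seqid + step      # N + 1, then N + 2
        if send_sequence_only(candidate) == NFS_OK:
            # The server accepted this value; the next request on the slot
            # would use candidate + 1.
            return ("resynced", candidate)
    # Out by more than a couple of steps suggests serious breakage:
    # replace the session with a new CreateSession.
    return ("create_session", None)
```

Keeping max_probes small follows the argument in the thread that a slot should only ever be out by one, so anything further is treated as a reason to replace the session rather than to keep probing.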