Re: [nfsv4] Draft is updated following your comments, Thanks//答复: draft-mzhang-nfsv4-sequence-id-calibration-01
Rick Macklem <rick.macklem@gmail.com> Fri, 21 April 2023 14:28 UTC
MIME-Version: 1.0
References: <83389b42bb0f49509c757cdabd5c6b3f@huawei.com> <CAM5tNy7F9Yf8rJQqy4rOwtcK_FSa+SuRsG=9KenB=mG9Wqv6MA@mail.gmail.com> <CAM5tNy56P0XxrDU==hjocvtBuY9DO2JrOn8PThyfBxRorbW74g@mail.gmail.com> <CAM5tNy6=3JRZTZZH5NW=QHocTd1DOS9HSav+PXnVC=N0_xduxA@mail.gmail.com> <CAM5tNy6x75sprE2Zqkxu5QE3wBFysuOUKj056Uyj9fqHPbkLbg@mail.gmail.com> <30a502ce7a90469196140b69b91f2867@huawei.com>
In-Reply-To: <30a502ce7a90469196140b69b91f2867@huawei.com>
From: Rick Macklem <rick.macklem@gmail.com>
Date: Fri, 21 Apr 2023 07:28:13 -0700
Message-ID: <CAM5tNy7Rnv_K=JPGRkabCiMdHn=_tmC=RgpaUcZwJj+QNyjqiQ@mail.gmail.com>
To: "yangjing (U)" <yangjing8@huawei.com>
Cc: "nfsv4@ietf.org" <nfsv4@ietf.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/RgC1aAKJCpBDYJrTiw6JrBVYogY>
Subject: Re: [nfsv4] Draft is updated following your comments, Thanks//答复: draft-mzhang-nfsv4-sequence-id-calibration-01
On Thu, Apr 20, 2023 at 8:08 PM yangjing (U) <yangjing8@huawei.com> wrote:
>
> Hi Rick, Thanks for your comments and sorry for the late response ^_^
>
> On Mon, Apr 10, 2023 at 2:20 PM Rick Macklem <rick.macklem@gmail.com> wrote:
> >
> > On Mon, Apr 10, 2023 at 1:58 PM Rick Macklem <rick.macklem@gmail.com> wrote:
> > >
> > > On Mon, Apr 10, 2023 at 1:15 PM Rick Macklem <rick.macklem@gmail.com> wrote:
> > > >
> > > > On Sun, Apr 9, 2023 at 11:39 PM yangjing (U)
> > > > <yangjing8=40huawei.com@dmarc.ietf.org> wrote:
> > > > >
> > > > > The draft is updated as
> > > > > https://www.ietf.org/archive/id/draft-mzhang-nfsv4-sequence-id-calibration-02.txt
> > > > Here are a few generic comments about the draft (I know nothing
> > > > about detailed formatting, etc).
> > > >
> > > > 1 - It is my understanding that new operations can only be added
> > > >     to minor version 2 and not minor version 1.
>
> I agree with you, If new operations can only be added to minor version 2.
>
> > > > 2 - I largely agree with what Tom Talpey said in his comments. I
> > > >     think you need to figure out how/when the sequenceid is getting
> > > >     messed up.
> > > >     Note that it is my understanding that the main purpose of
> > > >     sessions is to achieve "exactly once RPC semantics" (the fact
> > > >     that the reply cache is bounded in size is a nice added
> > > >     feature, but not the principal purpose).
> > > >     As such, a sequenceid being misordered implies that "exactly
> > > >     once" semantics could already have been broken by a software
> > > >     bug in the client.
> > > >
> > > >     You note that the misordered sequenceid could be caused by
> > > >     "network partitioning".
> > > >     In a correct implementation that should not be the case.
> > > >     After a network partition heals, the client should re-send all
> > > >     outstanding RPCs with the same Session/slot/sequenceid
> > > >     as was used in the original RPC request message.
> > > >     These should succeed (no misordered error reply) using cached
> > > >     replies, as required.
> > > >     --> If a client's recovery from network partitioning is not
> > > >         doing this, then it needs to be fixed.
> > > >     --> As above, if a misordered sequenceid error reply is due
> > > >         to a client bug, then that must be fixed, since "exactly
> > > >         once" semantics may already be broken.
> > > > The common client bug that results in a misordered sequenceid
> > > > error reply happens when an in-progress RPC gets interrupted
> > > > before the RPC reply is processed.
>
> You are right. Thanks for your analysis on when misordered sequenceid
> happens. But this draft does not care how and when it happens, the same
> as the protocol specification.
> We focus on how to deal with this error. If this error never happens, we
> will not define it, right? We did find that misordered sequenceid happens
> and IO performance is affected
> because session is reestablished, but we are not sure why and how it
> happens, because no packages are being captured at that moment.
>
> > > > This is not a case
> > > > where your suggested extension should be used, since the
> > > > in-progress RPC may still be "in-progress". Although I am not sure
> > > > there is a 100% correct way for a client to handle the need to
> > > > interrupt an in-progress RPC, marking the session slot as broken
> > > > and no longer using it is the best approach I am aware of. (I
> > > > suspect you might be referring to the Linux client and I cannot
> > > > comment on how it handles interruption via signals or timeouts for
> > > > session slots, since I am not a Linux client implementor.)
>
> Yes, when we find IO performance is affected, we check Linux client code,
> find that client will drop the session, then we check the NFS protocol
> specification....
> I don’t think marking the session slot as broken is the best way. What is next?
> First: Leaving the slot broken.
> On one hand, with more and more slots broken, concurrency
> of this session will decline, On another hand, Is new mechanism needed to
> maintain and notify slot status between client and server?
> Second: Recover the broken slot. That is what this draft is trying to do, I think.

But you cannot resynchronize the broken slot safely unless the RPC that
caused the sequenceid to become non-synchronized is an idempotent one
(one that can be done more than once on the server safely). This is
difficult (maybe impossible) to determine when a reply to a subsequent
RPC using the same session/slot is NFS4ERR_SEQ_MISORDERED. Resynchronizing
in all cases where an RPC reply of NFS4ERR_SEQ_MISORDERED is received
could result in a non-idempotent RPC being executed multiple times on
the server. This is a serious breakage and essentially defeats the main
purpose for using sessions.

You are correct that, once a slot is marked bad, performance degradation
is possible. However, NFS4ERR_SEQ_MISORDERED failures should be rare
(some might say that this error should never occur for correctly
implemented clients) and, as such, having several bad slots on a session
should also be a rare occurrence.
--> After some percentage of the slots are marked bad, the client will
    need to do a CreateSession and replace the session that has the bad
    slots with a new one. What percentage is a design tradeoff. It sounds
    like Linux chose "> 0%" (or first bad slot, if you prefer). For
    FreeBSD, I chose "100%" (or all slots bad, if you prefer).

> > > > It sounds like migration might be an exception to the above, but
> > > > it should be addressed specifically (as Tom Talpey notes).
>
> For migration scenario, we will analyze it later
>
> > > > 3 - If your extension were used by a client, how would the client
> > > >     know whether or not there are still RPC(s) in-progress that
> > > >     have been assigned that session/slot/sequenceid?
> > > An additional comment:
> > > - If the client knows that the outstanding RPC request on the
> > >   session/slot/sequenceid that got a misordered error reply
> > >   is one where "exactly once" semantics is not required,
> > >   then I can see that the client implementor might be able to
> > >   safely resynchronize the sequenceid.
> > Oops again. It would be the outstanding RPC request for which no
> > reply has been received that causes the sequenceid to be misordered
> > and that is the RPC that cannot require "exactly once" semantics.
> >
> > rick
> > >   To do this, I do not think your extensions are needed.
> > >   If the client issues an RPC that consists of only the
> > >   Sequence operation repeatedly, with the sequenceid incremented
> > >   by one each attempt after waiting for a reply to the previous
> > >   attempt, it should get NFS_OK within a few attempts.
>
> The misordered sequenceid is not known. If the cached sequenceid is N,
> any value beyond N+1 and N will be identified as misordered. The range
> of sequenceid is (1, 2^32 - 1).
> So, I don’t think attempting repeatedly is a better choice.

Well, here is the basic algorithm the client would use to maintain a
slot's sequenceid. (Note there should never be more than one RPC on the
fly for a given session/slot.)
- When an RPC reply is received from the server with NFS_OK for the
  Sequence operation, advance the sequenceid for the session/slot by 1.

As such, I do not see how a sequenceid can ever be out by more than 1
unless there is a coding bug. For the case of a coding bug, fix the bug.

For example:
- Let's assume that the client is configured (not a good situation, but
  possibly unavoidable) so that a timeout or POSIX signal results in the
  client not waiting "forever" for an RPC reply.
- Let's also assume that the client becomes network partitioned from the
  server.
When this is the case, it is possible that a few (even many) RPCs will
be sent to the server with the same session/slot/sequenceid, one after
another, each timing out before an RPC reply is received. But note that
they all will have the same sequenceid.
--> When this happens, it is possible that one of these RPC requests
    will make it to the server, be processed, and advance the sequenceid
    by one.
--> All subsequent RPCs with the same session/slot/sequenceid processed
    at the server will get a NFS4ERR_SEQ_MISORDERED reply.

Once the network partition is healed, the client will see a
NFS4ERR_SEQ_MISORDERED reply (plus possibly other delayed replies for
the same session/slot/sequenceid), but the sequenceid will only be out
by 1 compared with what the server's sequenceid is for the same
session/slot.
--> This implies that the first attempt of an RPC that consists of only
    the Sequence operation with sa_sequenceid set to N + 1 (where N is
    what the client is currently using for sequenceid for the
    session/slot) will succeed.
--> Whether a subsequent attempt with sequenceid set to N + 2 is
    justified is debatable, but certainly doing more than that does not
    seem reasonable to me.

If, due to some serious breakage, the above does not work, I would say a
CreateSession to replace the now badly broken session is in order.

Note again "this re-synchronization can only be safely done if the first
RPC for which no reply was received is an idempotent one" and that is
going to be difficult to determine when the NFS4ERR_SEQ_MISORDERED is
received for a subsequent RPC that uses the same session/slot.
--> If the client notes, when it "gives up on waiting for an RPC reply",
    that the slot is bad and it sees that the sa_cachethis argument in
    the Sequence operation for the outstanding RPC request was set false
    (indicating it is an idempotent RPC), then I believe it could safely
    do a re-synchronization of the slot at that point in time.
--> I chose to not do that and just mark the slot bad, but I can see an
    argument for doing so, since the RPC is known to be idempotent.

rick

> > > Then the correct sequenceid to be used for the session/slot is known.
> > Oops, "known" is probably too strong a word here. Since the cause of
> > the misordered sequenceid is not known, I think there is a very low
> > probability that an RPC using the session/slot is still in flight and
> > will change the sequenceid when processed on the server, causing
> > another misordered sequenceid error reply.
>
> I agree with you on "There may be still RPC(s) in-progress that have
> been assigned that session/slot/sequenceid" when misorder reply getting
> the client,
> I will update the draft to specify that client should wait for replies
> for all flying requests, then query correct sequenceid.
>
> > rick
> >
> > > rick
> > >
> > > > rick
> > > >
> > > > > _______________________________________________
> > > > > nfsv4 mailing list
> > > > > nfsv4@ietf.org
> > > > > https://www.ietf.org/mailman/listinfo/nfsv4
>
> Thank you Rick
>
> Best Regards
> Jing
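
The client-side slot handling described in this message can be sketched
roughly as below: advance the sequenceid by 1 on NFS_OK, mark the slot bad
when giving up on a reply, probe with N + 1 and then at most N + 2 only
when the unanswered RPC had sa_cachethis set false, and replace the session
once some fraction of its slots is bad. This is only an illustrative Python
model under stated assumptions, not code from the FreeBSD or Linux client.
The names Slot, on_reply, on_give_up, try_resync, needs_new_session and
make_fake_server are all hypothetical, the Status values are symbolic
stand-ins rather than wire values, and the toy server ignores replay
handling from the reply cache.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, List


class Status(Enum):
    # Symbolic stand-ins for the protocol status codes (not wire values).
    NFS_OK = auto()
    NFS4ERR_SEQ_MISORDERED = auto()


@dataclass
class Slot:
    sequenceid: int = 1           # N: value the client is currently using
    bad: bool = False             # set when the client gave up on a reply
    resync_allowed: bool = False  # True only if that RPC had sa_cachethis false


def on_reply(slot: Slot, status: Status) -> None:
    """Advance the slot's sequenceid by 1 only when Sequence returned NFS_OK."""
    if status is Status.NFS_OK:
        slot.sequenceid += 1


def on_give_up(slot: Slot, cachethis: bool) -> None:
    """Timeout or signal: mark the slot bad and record, at this point in
    time, whether the unanswered RPC was flagged idempotent."""
    slot.bad = True
    slot.resync_allowed = not cachethis


def try_resync(slot: Slot,
               send_sequence_only: Callable[[int], Status],
               max_probes: int = 2) -> bool:
    """Probe with a Sequence-only RPC using N + 1, then N + 2, and stop."""
    if not (slot.bad and slot.resync_allowed):
        return False
    n = slot.sequenceid
    for delta in range(1, max_probes + 1):
        if send_sequence_only(n + delta) is Status.NFS_OK:
            slot.sequenceid = n + delta + 1  # next request uses one past it
            slot.bad = False
            slot.resync_allowed = False
            return True
    return False  # slot stays bad; a new session may be in order


def needs_new_session(slots: List[Slot], bad_fraction: float) -> bool:
    """bad_fraction 0.0 models "first bad slot", 1.0 models "all slots bad"."""
    bad = sum(1 for s in slots if s.bad)
    return bad > 0 and bad / len(slots) >= bad_fraction


def make_fake_server(last_seen: int) -> Callable[[int], Status]:
    """Toy server for one slot: accepts only last_seen + 1 (assumption:
    replays and the reply cache are deliberately not modelled)."""
    state = {"last": last_seen}

    def sequence_only(seq: int) -> Status:
        if seq == state["last"] + 1:
            state["last"] = seq
            return Status.NFS_OK
        return Status.NFS4ERR_SEQ_MISORDERED

    return sequence_only
```

For instance, with the client at N = 5 and a toy server that already
processed request 5, on_give_up(slot, cachethis=False) followed by
try_resync(slot, make_fake_server(5)) succeeds on the first probe. With
cachethis=True the slot simply stays bad, which matches the choice of not
attempting any resynchronization for possibly non-idempotent RPCs.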