Re: [nfsv4] The nfs-ganesha server keeps responding with NFS4ERR_SEQ_MISORDERED, and the client's business is stuck and cannot be recovered.

On 5/24/23 10:27, David Noveck wrote:

> From: nfsv4 [mailto:nfsv4-bounces@ietf.org] On Behalf Of David Noveck
> Sent: Wednesday, May 24, 2023 10:05 AM
> To: 飞虎郑 <zhengfeihu1@outlook.com>
> Cc: NFSv4 <nfsv4@ietf.org>
> Subject: Re: [nfsv4] The nfs-ganesha server keeps responding with 
> NFS4ERR_SEQ_MISORDERED, and the client's business is stuck and cannot 
> be recovered.

Ganesha developer responding here...

> On Tue, May 23, 2023, 7:03 PM 飞虎 郑 <zhengfeihu1@outlook.com> wrote:
>
>>     Issue:
>
>>     The nfs-ganesha server keeps responding with
>>     NFS4ERR_SEQ_MISORDERED, and the client's business is stuck and
>>     cannot be recovered.
>
> Whether there is a problem with the protocol or the server depends on 
> how this unfortunate situation arose.  See below.
>
>>     Background:
>
>>     We use nfs-ganesha as the server and Linux kernel as the client.
>
>>     We conducted the following test:
>
>>     The NFS server is a three-node environment, and the client
>>     accesses the NFS server through a virtual IP. After the NFS client
>>     was mounted, files were read and written to the NFS server via
>>     vdbench.
>
>>     On the NFS server node, we simulated memory sub-health through a
>>     script (memory continues to increase, and the server node performs
>>     a hard reboot after it exceeds the set critical value), and the
>>     virtual IP would migrate to another node after the node reboot.
>>     After the node reboot, the client's business was always stuck at
>>     0, and accessing the mount point also caused it to hang.
>
> If I understand your description correctly, the client is not noticing 
> the server reboot and is not establishing a new clientid and session 
> id. Is that right?  If this doesn’t happen, why does the server accept 
> the old session used by the client to access the old (pre-reboot) 
> server instance?

I'm not sure what is going on here. Ganesha includes information in its 
clientid4, sessionid4, and stateid4 values that allows it to detect a 
clientid4 or stateid4 from another instance. By default that information 
is the startup time (a now() system call). Ganesha also provides a 
command line option to set the "epoch", or server instance, to make it 
easier for some clusters to assure unique server instances: the 
scripting that starts Ganesha can build the value from a node number 
and, usually, a monotonic startup counter, since the value must fit in 
32 bits.

That leaves two possibilities. One is that the command line option is 
being used incorrectly, making the server instances identical, in which 
case Ganesha cannot properly detect a clientid4 or stateid4 from a 
different server instance. The other is that multiple instances really 
are being started such that now() is the same on all of them. That seems 
a bit unlikely in a short-lived test setup (I wouldn't rely on differing 
time stamps in a long-term production environment, though).
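
To make the epoch idea concrete, here is a rough Python sketch. It is 
not Ganesha code: the bit layout (epoch in the upper 32 bits of a 64-bit 
clientid4, an 8-bit node number plus a 24-bit boot counter) and the names 
make_epoch, make_clientid, and is_stale are assumptions chosen only for 
illustration.

```python
import time

EPOCH_MASK = 0xFFFFFFFF  # the epoch must fit in 32 bits


def make_epoch(node_id=None, boot_count=None):
    """Return a 32-bit server-instance value.

    With no arguments, fall back to the startup time in seconds.
    Otherwise pack a hypothetical 8-bit node id and 24-bit boot counter.
    """
    if node_id is None or boot_count is None:
        return int(time.time()) & EPOCH_MASK
    return ((node_id & 0xFF) << 24) | (boot_count & 0xFFFFFF)


def make_clientid(epoch, counter):
    """Build a 64-bit clientid: epoch in the high half, counter in the low half."""
    return ((epoch & EPOCH_MASK) << 32) | (counter & 0xFFFFFFFF)


def is_stale(clientid, my_epoch):
    """True if the clientid was issued by a different server instance."""
    return (clientid >> 32) != (my_epoch & EPOCH_MASK)


# Two nodes with distinct epochs: node 2 recognizes node 1's clientid as stale.
node1 = make_epoch(node_id=1, boot_count=7)
node2 = make_epoch(node_id=2, boot_count=7)
cid = make_clientid(node1, 42)
print(is_stale(cid, node1), is_stale(cid, node2))
```

If the startup scripting hands every node the same epoch, is_stale() can 
never return True for an id issued by a peer, which is the failure mode 
described above.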

Notably, Ganesha DOES return NFS4ERR_BADSESSION if a lookup of 
sessionid4 does not find the session. For that lookup to succeed, a 
successful CREATE_SESSION must have occurred on that Ganesha instance. If 
the "epoch" values are being improperly set though, the lookup could 
find a session that happens to have the same sessionid4 value.
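
The difference between the two outcomes can be sketched as follows. This 
is a simplified Python model of the slot check described in RFC 5661 
Section 2.10.6.1, not Ganesha's implementation; the Session class, the 
op_sequence function, and the string session ids are hypothetical, and 
reply caching is reduced to a "replay" marker.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_BADSESSION = "NFS4ERR_BADSESSION"
NFS4ERR_BADSLOT = "NFS4ERR_BADSLOT"
NFS4ERR_SEQ_MISORDERED = "NFS4ERR_SEQ_MISORDERED"


class Session:
    def __init__(self, nslots):
        # Each slot starts at 0, so the first request on a slot carries seqid 1.
        self.slot_seqid = [0] * nslots


def op_sequence(sessions, sessionid, slotid, seqid):
    """Return (status, kind) for a SEQUENCE op; kind is 'new', 'replay', or None."""
    session = sessions.get(sessionid)
    if session is None:
        # No CREATE_SESSION produced this id on this instance.
        return NFS4ERR_BADSESSION, None
    if not 0 <= slotid < len(session.slot_seqid):
        return NFS4ERR_BADSLOT, None
    cached = session.slot_seqid[slotid]
    if seqid == (cached + 1) & 0xFFFFFFFF:
        session.slot_seqid[slotid] = seqid  # new request: advance the slot
        return NFS4_OK, "new"
    if seqid == cached:
        return NFS4_OK, "replay"  # retransmission: answer from the reply cache
    return NFS4ERR_SEQ_MISORDERED, None


# The post-failover instance only knows the sessions created on it.
sessions = {"sess-A": Session(nslots=2)}
print(op_sequence(sessions, "sess-B", 0, 57))  # unknown session id
print(op_sequence(sessions, "sess-A", 0, 57))  # colliding id, pre-reboot seqid
```

In this model a client that arrives with a pre-reboot sequence number 
gets NFS4ERR_BADSESSION when the id is unknown, but 
NFS4ERR_SEQ_MISORDERED on every retry when the id happens to collide with 
a session that exists on the new instance.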

>>     By capturing packets via Wireshark, we found that the client kept
>>     sending requests in the SEQUENCE, PUTFH, WRITE, GETATTR
>>     combination (or other types of requests) to the server, while the
>>     NFS server kept responding with NFS4ERR_SEQ_MISORDERED.
>
> That implies it is accepting the session as valid.  The question is 
> "Why?".

Certainly, see above...

>>     We have conducted this kind of memory sub-health test many times,
>>     but this problem only occurred in this particular instance.
>
>>     Client kernel version: 4.18.0-193.14.2.el8_2.x86_64
>>     (mockbuild@10-75-9-128).
>
> It seems to me you have a reboot recovery bug.  It would help to have 
> a packet trace from the server to which failover occurred.

Yes, this would help.

It is interesting that the issue seems to only happen with a particular 
client kernel version. One possibility is a client bug of some sort. I 
don't know; I don't follow the Linux client closely enough to say.

>>     Analysis:
>
>>     In order to verify whether the NFS server's reply of
>>     NFS4ERR_SEQ_MISORDERED would cause the client's business to drop
>>     to 0, I modified the code of ganesha: when the value of the
>>     sequence_id corresponding to the session_id and slot_id is set, I
>>     added 10 to the sequence_id cached on the server to simulate a
>>     request sequence mutation, which would only occur once (subsequent
>>     sequence_id values for other slot_ids.
>
> When an extension proposal for a new op to aid recovery was made, 
> the question was raised as to why such recovery would be needed since 
> the protocol described in rfc5661 seemed to be bulletproof in handling 
> sequence values and was important in providing exactly-once-semantics.
>
> Although there seemed to be an implication that such problems did in 
> fact happen, so far only the following have been cited:
>
>   * What appears to be a reboot recovery bug.
>
>   * A case where you force a mismatch by hacking the server.
>
> Is there a case in which this misordering happens without essentially 
> forcing it to happen?
>
> One possibility to be considered is that the protocol described by 
> rfc5661 is not, in fact, implemented.  Note that the fourth paragraph 
> of Section 2.10.6.2 of rfc5661 contains a "MUST" which is not adhered 
> to and never will be because of rpc requests terminated by control-C. 
> Does anyone have experience with sequence misordering due to this issue?
>
> In any case, rfc5661bis will have to address the issue of this 
> incorrect "MUST" somehow.  I would appreciate it if people read and 
> commented on the discussion in Appendix C.1.2 (followed up on in C.2.1 
> and C.2.2) of draft-ietf-nfsv4-rfc5661bis-00.  I would like to make 
> progress on these issues in -01.
>
>>     After the request sequence mutation, the client's business kept
>>     dropping to 0 and could not recover.
>
> That seems to me to be an expected consequence of forcing a request 
> sequence mutation. The client is not designed to be impervious to 
> server hacking that breaks the protocol. Even if it is bulletproof, it 
> has no protection against high-energy anti-tank shells.

Agreed. If server faults are being introduced, then don't blame the 
server. It might be reasonable to ask the client if it could be more 
robust in a situation like this.
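
As one illustration of what "more robust" could look like, here is a 
minimal Python sketch of a client-side slot table that stops using a slot 
once the server answers NFS4ERR_SEQ_MISORDERED on it. This is not how the 
Linux client behaves and nothing in the protocol requires it; the 
ClientSlotTable class and its methods are hypothetical, and in-flight 
tracking is left out.

```python
class ClientSlotTable:
    def __init__(self, nslots):
        self.next_seqid = [1] * nslots  # first request on each slot uses seqid 1
        self.retired = set()            # slots the client no longer trusts

    def pick_slot(self):
        """Return the lowest usable slot id, or None if none is left."""
        for slotid in range(len(self.next_seqid)):
            if slotid not in self.retired:
                return slotid
        return None  # every slot is retired: time to set up a new session

    def on_reply(self, slotid, status):
        """Update slot state from the status of the SEQUENCE op."""
        if status == "NFS4_OK":
            self.next_seqid[slotid] = (self.next_seqid[slotid] + 1) & 0xFFFFFFFF
        elif status == "NFS4ERR_SEQ_MISORDERED":
            # Client and server disagree about this slot; stop retrying on it.
            self.retired.add(slotid)


table = ClientSlotTable(nslots=2)
table.on_reply(0, "NFS4ERR_SEQ_MISORDERED")
print(table.pick_slot())  # slot 0 is skipped from now on
```

Retiring a slot trades some concurrency for forward progress. Once 
pick_slot() returns None, the client in this sketch would have to 
destroy the session and create a new one rather than keep retrying.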

>>     By dynamically enabling ganesha's logs and using wireshark to
>>     capture packets, it can be seen that ganesha replied
>>     NFS4ERR_SEQ_MISORDERED to the client. Afterwards, the client used
>>     a new slot_id to send a request with the main request of
>>     GETATTR(SEQUENCE, PUTFH, GETATTR), and the server processed it
>>     normally and replied to the client. In addition, the client kept
>>     sending combination requests with the main request of
>>     LOOKUP(SEQUENCE, PUTFH, LOOKUP, GETFH, GETATTR) using the old
>>     slot_id and corresponding request sequence number, and ganesha
>>     directly replied NFS4ERR_SEQ_MISORDERED after detecting the loss
>>     of the SEQUENCE request sequence number. From the results, it
>>     seems that the client kept using the request sequence number
>>     corresponding to the old slot_id to send some op requests to the
>>     server, and the server kept replying NFS4ERR_SEQ_MISORDERED,
>>     causing the client's business to drop to 0.
>
>
> The client could prevent this by avoiding use of the compromised slot 
> but it is not obliged to do so. Rfc5661bis might advise this 
> non-normatively.
>
>>     I noticed that if there is a request sequence disorder, there is
>>     no interface to query the server's request sequence number.
>
> The question is whether we should provide one.  I think Clear and 
> Convincing Evidence is required but so far we haven’t even seen 
> Probable Cause yet.
>
>>     If there is a request sequence mutation, the client cannot obtain
>>     the request sequence number. It seems meaningless for the client
>>     to keep querying.
>
> If it hurts when you do that, don't do that :-).
>
>>     In addition, I modified the ganesha code so that if the request
>>     sequence number stays out of order for a period of time, ganesha
>>     replies NFS4ERR_BADSESSION. With that change, or if the request
>>     sequence number returns to normal, the client's business can be
>>     restored and completed.
>
>>     If NFS-Ganesha responds with NFS4ERR_BADSESSION after the request
>>     sequence becomes disordered, does it modify the protocol
>
> It would.
>
>>     and I wonder if this is appropriate?
>
> It is a hack and violates the protocol but the ietf has no enforcement 
> powers.  It appears to me better to focus on returning bad session 
> when the protocol actually requires it.

Yes, that was my thought, but I suggested some conversation here in case 
we were missing something, or if this proposal actually made any sense.

As it stands, this appears to be an injection of a server error not 
induced by the actual production code combined with possibly a client 
bug. If this really is the case, I don't see anything the server should do.

>>     I have already provided feedback on this issue to the nfs-ganesha
>>     community, please visit:
>>     https://github.com/nfs-ganesha/nfs-ganesha/issues/941.