Re: [nfsv4] preliminary review of draft-cel-nfsv4-reminv-design

Chuck Lever <chuck.lever@oracle.com> Fri, 05 August 2016 16:46 UTC

Content-Type: text/plain; charset="us-ascii"
Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\))
From: Chuck Lever <chuck.lever@oracle.com>
In-Reply-To: <CADaq8jecc4dqgcgOZ7yH+n-0tMGZ7XmamM_us7fjAn4+H=C=pQ@mail.gmail.com>
Date: Fri, 05 Aug 2016 12:45:56 -0400
Content-Transfer-Encoding: quoted-printable
Message-Id: <3F97CE0F-AEE9-4882-ADEB-0B6C1A0BADDE@oracle.com>
References: <CADaq8jdor+Ju+F=ZBcV6zY3PWJerpM_sDtuTPy9EZTo6hymFPQ@mail.gmail.com> <7E1F0B4A-EE1D-4686-BC9B-358AF8FCF095@oracle.com> <CADaq8jecc4dqgcgOZ7yH+n-0tMGZ7XmamM_us7fjAn4+H=C=pQ@mail.gmail.com>
To: David Noveck <davenoveck@gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/uHsLhUQ2p-nWTjqdzd6SU8F-k2M>
Cc: "nfsv4@ietf.org" <nfsv4@ietf.org>
Subject: Re: [nfsv4] preliminary review of draft-cel-nfsv4-reminv-design
Precedence: list

> On Aug 4, 2016, at 4:49 PM, David Noveck <davenoveck@gmail.com> wrote:
> 
> > In this case, however, I think the benefits outweigh the costs
> > of altering the base XDR.
> 
> Whether this is the case depends on the proportion of implementations that are 
> adequately supported by a simpler approach that does not provide the extended
> functionality of allowing per-request selection of invalidation approach.
> 
> If 80% or 90% or 95% of implementations are OK with the simpler implementation, 
> the balance of costs and benefits is different than it would be if 30%, 50%, or 
> 60% need the more flexible approach.  I think we need to hear from the rest of the 
> working group about what they think is important here.

What I'd like to hear from others on the WG and other implementers
is: XDR churn aside, are there any other technical reasons that
the per-RPC approach is inferior to the "big switch" approach?

> > A client can't use RI at
> > all if it can't tolerate an arbitrary choice of which handle
> > in an RPC is invalidated remotely.
> 
> The question is how common such clients are.

Not exactly. There are two confounding factors that make the
real answer slippery:

- Future innovation
- Clients that can use "big switch" but prefer to use per-RPC

But OK, let's consider only current client implementations:

Today there are three well-known diaspora clients: Linux,
Solaris, and Oracle's dNFS. As far as I know, as they are
currently implemented, two of these would have to use a
per-connection switch to disable RI entirely.

In V1, in any RPC, a requester is free to provide one or more of:

Persistently registered rkeys:
1. the local DMA rkey
2. an rkey that is created for that connection, but that
    doesn't change from RPC to RPC (for a utility buffer, say)
2a. an rkey for a fixed memory region that is updated each
     time before it is advertised

Dynamically registered rkeys:
3. an rkey that is registered just for that RPC, not using FRWR
4. an rkey that is registered using FRWR just for that RPC

Any combination is allowed in one RPC, or during a stream of
RPCs on a connection, but only type 4 rkeys may be invalidated
remotely.
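
The classification above can be sketched as a small model. This is
only an illustration: the enum names, the (handle, kind) pairing, and
the helper functions are hypothetical and come from neither the draft
nor any implementation.

```python
from enum import Enum


class RkeyKind(Enum):
    # Values mirror the numbering used in the list above.
    LOCAL_DMA = "1"             # the local DMA rkey
    PER_CONNECTION = "2"        # created once per connection
    FIXED_REGION_UPDATED = "2a"  # fixed region, updated before each use
    PER_RPC_NON_FRWR = "3"      # registered per RPC, not with FRWR
    PER_RPC_FRWR = "4"          # registered per RPC with FRWR


def may_invalidate_remotely(kind):
    """Only type 4 rkeys may be invalidated remotely."""
    return kind is RkeyKind.PER_RPC_FRWR


def invalidatable(rkeys):
    """Return the handles in one RPC that are safe to invalidate remotely.

    rkeys is a list of (handle, RkeyKind) pairs; the pairing is an
    assumption made for this sketch.
    """
    return [handle for handle, kind in rkeys if may_invalidate_remotely(kind)]
```

For an RPC carrying one type 2 rkey and one type 4 rkey,
invalidatable() returns only the type 4 handle.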

Linux clients can use 1, 3, or 4, and do not mix these modes
in one RPC or in a stream of RPCs on one connection. Thus a
per-connection switch happens to be adequate for Linux to
support RI.

Solaris uses 2 or 3 or both in one RPC or in a stream of RPCs.
To use RI, Solaris would need to use 4 instead of 3, but even
then a per-connection switch would not allow Solaris to use RI.

Oracle dNFS uses FMR, and runs only in user space. I think
that means it can't use RI at all. But that's OK since it
uses RDMA only for large payloads, thus RI wouldn't save much.
A per-connection switch works here too, to disable RI.

But I don't believe it's a matter of percentages here. Of the
two current implementations that can use RI, Solaris, a premier
implementation of NFS/RDMA, cannot tolerate responder's choice,
and thus a per-connection switch is not adequate for it to use
Remote Invalidation at all.
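
A rough way to express the difference between the two approaches,
under two assumptions: with a per-connection switch the responder may
pick any handle in an RPC, and with the per-RPC approach the requester
can restrict the choice to a handle it names. The function names and
the string labels for rkey types are hypothetical.

```python
FRWR_PER_RPC = "4"  # the only rkey type that may be invalidated remotely


def big_switch_permits_ri(kinds_on_connection):
    """Per-connection switch: the responder chooses arbitrarily, so
    every rkey ever advertised on the connection must be type 4."""
    kinds = set(kinds_on_connection)
    return bool(kinds) and kinds == {FRWR_PER_RPC}


def per_rpc_permits_ri(kinds_in_rpc):
    """Per-RPC selection (assumed): the requester names the handle,
    so one type 4 rkey in the RPC is enough."""
    return FRWR_PER_RPC in set(kinds_in_rpc)


# Linux does not mix modes, so a connection using only type 4 qualifies.
linux_frwr_connection = ["4"]
# Solaris after swapping type 3 for type 4 still mixes in type 2.
solaris_with_frwr = ["2", "4"]
```

With these inputs, big_switch_permits_ri() is true for the Linux case
and false for the Solaris case, while per_rpc_permits_ri() is true for
both.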

> > In Linux, it's simply a matter of adding an "if vers == 2,
> > insert (or expect) two u32 fields here." 
> 
> In lots of places.

In a smart implementation, lots = 2. Forward and backchannel
call, and forward and backchannel reply.
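
A minimal sketch of the "if vers == 2" idea, written in Python rather
than kernel C for brevity. The four fixed fields (XID, version, credit
value, procedure) follow the existing RPC-over-RDMA header; the
placement of the two extra 32-bit fields right after them, and the
name "extra", are assumptions made only for illustration.

```python
import struct


def encode_header(xid, vers, credit, proc, extra=(0, 0)):
    """Encode the fixed header as big-endian 32-bit XDR words."""
    hdr = struct.pack(">IIII", xid, vers, credit, proc)
    if vers == 2:
        # Hypothetical: version 2 carries two additional u32 fields.
        hdr += struct.pack(">II", *extra)
    return hdr


def decode_header(buf):
    """Decode the header; return (fields, offset of what follows)."""
    xid, vers, credit, proc = struct.unpack_from(">IIII", buf, 0)
    offset = 16
    extra = None
    if vers == 2:
        # Expect the two additional u32 fields only for version 2.
        extra = struct.unpack_from(">II", buf, offset)
        offset += 8
    return (xid, vers, credit, proc, extra), offset
```

The version check appears once in the encoder and once in the decoder;
everything after the returned offset is parsed the same way for both
versions.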

--
Chuck Lever
