Re: [nfsv4] preliminary review of draft-cel-nfsv4-reminv-design

Chuck Lever <chuck.lever@oracle.com> Fri, 05 August 2016 16:46 UTC

From: Chuck Lever <chuck.lever@oracle.com>
In-Reply-To: <CADaq8jecc4dqgcgOZ7yH+n-0tMGZ7XmamM_us7fjAn4+H=C=pQ@mail.gmail.com>
Date: Fri, 05 Aug 2016 12:45:56 -0400
Message-Id: <3F97CE0F-AEE9-4882-ADEB-0B6C1A0BADDE@oracle.com>
References: <CADaq8jdor+Ju+F=ZBcV6zY3PWJerpM_sDtuTPy9EZTo6hymFPQ@mail.gmail.com> <7E1F0B4A-EE1D-4686-BC9B-358AF8FCF095@oracle.com> <CADaq8jecc4dqgcgOZ7yH+n-0tMGZ7XmamM_us7fjAn4+H=C=pQ@mail.gmail.com>
To: David Noveck <davenoveck@gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/uHsLhUQ2p-nWTjqdzd6SU8F-k2M>
Cc: "nfsv4@ietf.org" <nfsv4@ietf.org>
Subject: Re: [nfsv4] preliminary review of draft-cel-nfsv4-reminv-design

> On Aug 4, 2016, at 4:49 PM, David Noveck <davenoveck@gmail.com> wrote:
> 
> > In this case, however, I think the benefits outweigh the costs
> > of altering the base XDR.
> 
> Whether this is the case depends on the proportion of implementations that are 
> adequately supported by a simpler approach that does not provide the extended
> functionality of allowing per-request selection of invalidation approach.
> 
> If 80% or 90% or 95% of implementations are OK with the simpler implementation,
> the balance of costs and benefits is different than it would be if 30%, 50%, or
> 60% need the more flexible approach.  I think we need to hear from the rest of the
> working group about what they think is important here.

What I'd like to hear from others on the WG and other implementers
is: XDR churn aside, are there any other technical reasons that
the per-RPC approach is inferior to the "big switch" approach?


> > A client can't use RI at
> > all if it can't tolerate an arbitrary choice of which handle
> > in an RPC is invalidated remotely.
> 
> The question is how common such clients are.

Not exactly. There are two confounding factors that make the
real answer slippery:

- Future innovation
- Clients that can use "big switch" but prefer to use per-RPC

But OK, let's consider only current client implementations:

Today there are three well-known diaspora clients: Linux,
Solaris, and Oracle's dNFS. As far as I know, as they are
currently implemented, two of these would have to use a
per-connection switch to disable RI entirely.

In V1, in any RPC, a requester is free to provide one or more of:

Persistently registered rkeys:
1. the local DMA rkey
2. an rkey that is created for that connection, but that
    doesn't change from RPC to RPC (for a utility buffer, say)
2a. an rkey for a fixed memory region that is updated before
     each time it is advertised

Dynamically registered rkeys:
3. an rkey that is registered just for that RPC, without using FRWR
4. an rkey that is registered just for that RPC, using FRWR

Any combination is allowed in one RPC, or during a stream of
RPCs on a connection, but only type 4 rkeys may be invalidated
remotely.
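
To make the taxonomy above concrete, here is a minimal sketch in Python. The names (RkeyKind, may_invalidate_remotely, invalidatable) are made up for illustration and do not come from any implementation or from the draft; the only rule encoded is the one stated above, that only type 4 rkeys may be invalidated remotely.

```python
from enum import Enum

class RkeyKind(Enum):
    """Hypothetical labels for the rkey types listed above."""
    LOCAL_DMA = "1"              # persistently registered
    PER_CONNECTION = "2"         # persistently registered
    FIXED_REGION_UPDATED = "2a"  # persistently registered
    DYNAMIC_NON_FRWR = "3"       # registered just for one RPC, without FRWR
    DYNAMIC_FRWR = "4"           # registered just for one RPC, using FRWR

def may_invalidate_remotely(kind):
    # Only type 4 rkeys are eligible for Remote Invalidation.
    return kind is RkeyKind.DYNAMIC_FRWR

def invalidatable(rpc_rkeys):
    # Of the rkeys advertised in one RPC, keep the eligible ones.
    return [k for k in rpc_rkeys if may_invalidate_remotely(k)]
```

For example, an RPC that advertises a type 2 and a type 4 rkey has exactly one rkey that is safe to invalidate remotely.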

Linux clients can use 1, 3, or 4, and do not mix these modes
in one RPC or in a stream of RPCs on one connection. Thus a
per-connection switch happens to be adequate for Linux to
support RI.

Solaris uses 2 or 3 or both in one RPC or in a stream of RPCs.
To use RI, Solaris would need to use 4 instead of 3, but even
then a per-connection switch would not allow Solaris to use RI.

Oracle dNFS uses FMR, and runs only in user space. I think
that means it can't use RI at all. But that's OK since it
uses RDMA only for large payloads, thus RI wouldn't save much.
A per-connection switch works here too, to disable RI.

But I don't believe it's a matter of percentages here. Of the
two current implementations that can use RI, Solaris, a premier
implementation of NFS/RDMA, cannot tolerate responder's choice,
and thus a per-connection switch is not adequate for it to use
Remote Invalidation at all.
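
The difference between the two approaches can be sketched as follows. This is only my reading of the argument above, expressed in Python with hypothetical names: with a per-connection switch the responder may pick any rkey in an RPC, so the requester is safe only if every rkey it ever sends on that connection is type 4; with a per-RPC indication the requester names an eligible rkey itself, or none.

```python
FRWR = "4"  # the only rkey type that may be invalidated remotely

def big_switch_is_safe(rpcs):
    # Per-connection switch: the responder chooses arbitrarily, so
    # every rkey in every RPC on the connection must be type 4.
    return all(kind == FRWR for rpc in rpcs for kind in rpc)

def per_rpc_choice(rpc):
    # Per-RPC indication: the requester points at one type 4 rkey,
    # or returns None to ask for no Remote Invalidation in this RPC.
    for index, kind in enumerate(rpc):
        if kind == FRWR:
            return index
    return None

# Each inner list holds the rkey types advertised in one RPC.
linux_like = [["4"], ["4", "4"]]      # one mode per connection
solaris_like = [["2", "4"], ["4"]]    # assumes 4 is used in place of 3
```

Here big_switch_is_safe(linux_like) holds, big_switch_is_safe(solaris_like) does not, and per_rpc_choice(["2", "4"]) still finds a usable rkey.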


> > In Linux, it's simply a matter of adding an "if vers == 2,
> > insert (or expect) two u32 fields here." 
> 
> In lots of places.

In a smart implementation, lots = 2. Forward and backchannel
call, and forward and backchannel reply.
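
As a rough illustration of the "if vers == 2" check, here is a sketch in Python rather than kernel C. The four fixed words (XID, version, credits, procedure) are the V1 transport header; the position of the two extra u32 fields and the name "extra" are assumptions made for this example, not something the draft specifies here.

```python
import struct

def encode_header(xid, vers, credits, proc, extra=(0, 0)):
    # Fixed four-word header, big-endian as in XDR.
    words = [xid, vers, credits, proc]
    if vers == 2:
        # Assumed placement: two additional u32 fields follow the
        # fixed words when the version is 2.
        words.extend(extra)
    return struct.pack(">%dI" % len(words), *words)

def decode_header(buf):
    xid, vers, credits, proc = struct.unpack_from(">4I", buf, 0)
    offset = 16
    extra = None
    if vers == 2:
        # Expect the two additional u32 fields.
        extra = struct.unpack_from(">2I", buf, offset)
        offset += 8
    # offset is where the rest of the message would begin.
    return xid, vers, credits, proc, extra, offset
```

The same pair of helpers would serve both call and reply, on the forward channel and the backchannel, which is the point about keeping the number of touched places small.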


--
Chuck Lever