Re: [nfsv4] FW: I-D ACTION:draft-faibish-nfsv4-pnfs-access-permissions-check-03.txt

Tom Haynes <tom.haynes@oracle.com> Mon, 12 July 2010 20:21 UTC


david.black@emc.com wrote:
> This is the revised permissions check draft - it actually deals with any
> circumstance under which a client cannot access a pNFS data server and
> wants to report that inaccessibility.
>
> Thanks,
> --David
>
>   

1)  This:

   To the extent that an 
   MDS can determine whether storage devices are accessible to clients, 
   an MDS SHOULD NOT include a storage device in any pNFS layouts sent 
   to a client that cannot access that storage device. At a minimum, the 
   server SHOULD perform these storage device accessibility checks 
   before exporting a filesystem that supports pNFS and when the device 
   configuration for such an exported filesystem is changed (e.g., to 
   add a storage device).


implies to me that the MDS has to keep track of
LAYOUT4_RET_REC_FSID_NO_ACCESS and LAYOUT4_RET_REC_FILE_NO_ACCESS
layout return types per client. I.e., once it knows a client has
problems with a specific storage device, it should avoid using that
device again for that client.

Given that we need this mechanism for the client to report errors, how
then does the server know when it can start using these storage devices
again? Even if the MDS knows a positive change took place, it has to
rely on the client to do the checking.

Is this a difference between MUST and SHOULD? I.e., does having SHOULD
mean that the MDS can hand out the storage devices again to see whether
the client can now use them?
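
To make the bookkeeping concrete, here is a minimal sketch of one way an
MDS might track this, assuming the SHOULD does allow a retry. Nothing in
it comes from the draft: the class name, the retry_after knob, and the
client/device identifiers are all hypothetical.

```python
import time


class DeviceExclusions:
    """Hypothetical per-client record of storage devices a client has
    reported (via a NO_ACCESS layout return) that it cannot reach."""

    def __init__(self, retry_after=300.0, clock=time.monotonic):
        # retry_after is an assumed policy knob, not something the draft defines.
        self.retry_after = retry_after
        self.clock = clock
        self._excluded = {}  # (client_id, device_id) -> time of the report

    def report_no_access(self, client_id, device_id):
        """Record that this client could not use this storage device."""
        self._excluded[(client_id, device_id)] = self.clock()

    def usable(self, client_id, device_id):
        """True if the device may be placed in a layout for this client."""
        reported = self._excluded.get((client_id, device_id))
        if reported is None:
            return True
        if self.clock() - reported >= self.retry_after:
            # Give the device another chance; only the client can confirm it works.
            del self._excluded[(client_id, device_id)]
            return True
        return False

    def devices_for_layout(self, client_id, candidates):
        """Filter candidate devices down to those not excluded for this client."""
        return [d for d in candidates if self.usable(client_id, d)]
```

The timer is only a stand-in for "the MDS decides to try again"; the
point is that the exclusion is per client and that the retry ends with
the client doing the actual check.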

2) Is it NFS4ERR_PERM or NFS4ERR_ACCESS for access permission denial?

   o  NFS4ERR_PERM SHOULD be used for access permission denial; and 


From RFC 5661:

15.1.6.1. NFS4ERR_ACCESS (Error Code 13)

   Indicates permission denied.  The caller does not have the correct
   permission to perform the requested operation.  Contrast this with
   NFS4ERR_PERM (Section 15.1.6.2), which restricts itself to owner or
   privileged-user permission failures, and NFS4ERR_WRONG_CRED
   (Section 15.1.6.4), which deals with appropriate permission to delete
   or modify transient objects based on the credentials of the user that
   created them.

15.1.6.2. NFS4ERR_PERM (Error Code 1)

   Indicates requester is not the owner.  The operation was not allowed
   because the caller is neither a privileged user (root) nor the owner
   of the target of the operation.

Since this document talks about mount issues, I went back to RFC 1813
and the MOUNT protocol. MNT can return MNT3ERR_ACCES if
the client does not have access rights to the export.

I think NFS4ERR_ACCESS is more consistent with prior protocols
than NFS4ERR_PERM.

Going back to this draft, in section 3.3, I see:

   There are two NO_ACCESS layoutreturn_type4 values that indicate lack 
   of storage device access, LAYOUT4_RET_REC_FSID_NO_ACCESS and 
   LAYOUT4_RET_REC_FILE_NO_ACCESS. 

and 

   An NFS error (nfsstat4) is 
   included in the layoutreturn data structures for these two types to 
   distinguish access permission problems from device inaccessibility: 


I think "access" has been overloaded here and that, to keep things
clear, NFS4ERR_PERM was selected over NFS4ERR_ACCESS.

The only other reasons I can see for using NFS4ERR_PERM instead of
NFS4ERR_ACCESS are related to security:

a) if the user credentials were insufficient, i.e., kerberized access
to the storage device failed.

b) Section 13.12 of RFC 5661:

   If the metadata server would
   deny a READ or WRITE operation on a file due to its ACL, mode
   attribute, open access mode, open deny mode, mandatory byte-range
   lock state, or any other attributes and state, the data server MUST
   also deny the READ or WRITE operation.

Which seems to point out a need for error codes for:

a) No access granted (mount for files, block devices have other means)
b) Permission denied for the operation.
c) Permission denied because of security.

The difference between b) and c) is that b) is per fileid and c) is per
fsid. So an MDS could still use the storage device in the layout for
b), but should avoid using it for c).
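
As a rough sketch of that distinction (the names are made up for
illustration, and the scope given to case a) is my assumption, since
only b) and c) are spelled out above):

```python
from enum import Enum


class Denial(Enum):
    NO_ACCESS_GRANTED = "a"  # e.g., mount refused for a files layout
    OPERATION_DENIED = "b"   # permission denied for the operation
    SECURITY_DENIED = "c"    # permission denied because of security


# Scope of each failure. b) and c) follow the text; a) is an assumption
# (a refused mount affects the whole export, so it is treated as per fsid).
SCOPE = {
    Denial.NO_ACCESS_GRANTED: "fsid",
    Denial.OPERATION_DENIED: "fileid",
    Denial.SECURITY_DENIED: "fsid",
}


def mds_may_keep_device(denial):
    """True if the MDS could still put the storage device in layouts for
    this client: a per-fileid failure says nothing about other files."""
    return SCOPE[denial] == "fileid"
```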

3) I find:

   An NFS error (nfsstat4) is 
   included in the layoutreturn data structures for these two types to 
   distinguish access permission problems from device inaccessibility: 

   o  NFS4ERR_PERM SHOULD be used for access permission denial; and 

   o  NFS4ERR_NXIO SHOULD be used for inability to access a device. 

   Other NFS errors MAY be used when they are appropriate. All uses of 
   these two layout return types that report errors SHOULD be logged by 
   the client. 


to be under-specified. What are the other errors that a server can see
and how is it supposed to react to those errors?

I'd like to see language about which errors are MANDATORY to support
and which are OPTIONAL. I know I can read the above to see that there
are only two MANDATORY ones, but I can also read it to see that all are
MANDATORY.

I don't want clients shoehorning every error code back into these two.
And I do want clarification on what a server should do with the
OPTIONAL codes. I.e., is it free to reuse those storage devices the
next time that client asks for a layout?

4)

What if storage device A returns NFS4ERR_STALE to the client while
storage device B returns NFS4_OK for an operation on the same layout,
but for a different file with the same layout the client gets NFS4_OK
from both DSs? This isn't necessarily either a permission issue or a
device inaccessibility issue. (It could be either: the export was
changed or the filesystem indicated by the filehandle does not exist.
It could also just mean that the indicated file does not exist.)

Would this be where LAYOUT4_RET_REC_FILE_NO_ACCESS and NFS4ERR_NXIO are
appropriate?

Which leads to an even more interesting question: what constitutes an
NFS4ERR_NXIO error? As NFS4ERR_NXIO is defined by RFC 5661 to not be a
valid return code from any operation or callback, I would take it to
mean simply that the storage device is not responding to the client. If
the client got any error back from the storage device, then it could
not use NFS4ERR_NXIO.
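
Under that reading, the client-side choice of which nfsstat4 to put in
the layout return would look roughly like the sketch below. The
function name and the use of None for "no reply" are hypothetical; the
numeric values are the RFC 5661 error codes.

```python
# nfsstat4 values from RFC 5661
NFS4_OK = 0
NFS4ERR_NXIO = 6
NFS4ERR_STALE = 70


def status_to_report(ds_status):
    """Pick the nfsstat4 a client would report to the MDS.

    ds_status is the status the storage device returned, or None if the
    device never answered (a hypothetical convention for this sketch).
    Returns None when there is nothing to report.
    """
    if ds_status is None:
        # No reply at all: the only case where NFS4ERR_NXIO fits.
        return NFS4ERR_NXIO
    if ds_status == NFS4_OK:
        return None
    # The device did answer, so pass its error through unchanged
    # rather than shoehorning it into NFS4ERR_NXIO.
    return ds_status
```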

As the NFS4ERR_STALE could simply mean that the filehandle refers to a
file which doesn't exist, is it appropriate to inform the server to no
longer use that storage device in the layouts assigned to that client?

Where this is going is that now is perhaps a good time to add more
informative error codes from the storage device to the client. This
would in turn allow the client to send these back to the MDS. I.e.,
there is a world of difference between "this device (fsid) does not
exist" and "this file does not exist".