Re: [nfsv4] New version of sparsedraft(draft-hildebrand-nfsv4-read-sparse-01.txt)

"Roy, Dipankar" <> Sun, 03 October 2010 12:51 UTC

Date: Sun, 03 Oct 2010 18:18:58 +0530
From: "Roy, Dipankar"
To: "Erasani, Pranoop", "Matt W. Benjamin", "J. Bruce Fields"
Cc: Benny Halevy
Subject: Re: [nfsv4] New version of sparsedraft(draft-hildebrand-nfsv4-read-sparse-01.txt)

Hi Pranoop,

Thanks a lot for proposing this.

I think implementations of the NFS server-side copy RFC can definitely benefit from this.


-----Original Message-----
From: Erasani, Pranoop
Sent: Sun 10/3/2010 12:44 PM
To: Matt W. Benjamin; J. Bruce Fields
Cc: Benny Halevy;
Subject: Re: [nfsv4] New version of sparsedraft(draft-hildebrand-nfsv4-read-sparse-01.txt)
> Hi,
> Just relative to pNFS, my immediate reaction was that a DS might have
> the relevant file allocation information, and the MDS might not. 

A DS might have the relevant file allocation information for the data it holds itself,
but not for data on other DSs. AFAIK, pNFS does not mandate that each DS know
about other DSs' information. It's the MDS that has access to DS

Given the current proposal, as part of the READ response, the DS is supposed to
send the offset where the hole ends and the actual data begins. If the hole ends
on a different DS, how is the DS responding to the READ supposed to know this?
How does the data server compute that information? Are all DSs aware
of how data is organized on other DSs, or is it the duty of the MDS to own
that information? This is where I feel that the existing proposal puts onerous
requirements on the pNFS data servers.
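
To make the concern concrete, here is a minimal sketch. It assumes simple round-robin striping with a fixed stripe unit; the stripe unit, the number of data servers, and every function name are hypothetical and are not taken from the draft. It only shows that the offset where a hole ends can fall on a different DS than the one serving the READ, so answering requires a file-wide view of allocation.

```python
# Illustrative only: assumes round-robin striping with a fixed stripe unit.
# STRIPE_UNIT, NUM_DS, and both helpers are hypothetical names.

STRIPE_UNIT = 4096   # bytes per stripe unit (assumed)
NUM_DS = 3           # number of data servers (assumed)


def ds_for_offset(offset):
    """Return the index of the data server holding the byte at `offset`."""
    return (offset // STRIPE_UNIT) % NUM_DS


def next_data_offset(allocated, offset):
    """Return the first offset >= `offset` that lies in an allocated extent,
    or None if the rest of the file is a hole.

    `allocated` is a sorted list of (start, length) extents for the WHOLE
    file, i.e. the file-wide view that a single DS may not have.
    """
    for start, length in allocated:
        if offset < start + length:
            return max(offset, start)
    return None


# File-wide allocation: data in [0, 4096) and [20480, 24576), hole in between.
file_extents = [(0, 4096), (20480, 4096)]

read_offset = 4096                           # READ at the start of the hole
serving_ds = ds_for_offset(read_offset)      # DS that receives the READ
hole_end = next_data_offset(file_extents, read_offset)
owning_ds = ds_for_offset(hole_end)          # DS that holds the next data

print(serving_ds, hole_end, owning_ds)
```

In this example the DS serving the READ is not the DS that holds the next allocated block, so it cannot fill in the hole's end from its local allocation information alone.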

> With
> READZ (is that the current operation name?), this doesn't seem to
> present a problem. 

The draft seems to address the problem by simply stating that each DS needs
to know the other DSs' information:

   receives.  In addition, when a data server is returning a
   READ4reshole structure, it should still contain the offset and length
   of the next allocated block in the file, even if that block is not
   located on that particular data server.

However, that is not a requirement that pNFS puts on the data servers, is it? Are
we heading down a path of new requirements for pNFS data servers here?

If my suspicion is true, the spirit of this proposal would discourage pNFS servers
from implementing the sparse hints.

> It would, I guess, perhaps also not present a
> problem if clients could get a hole map from a DS, but I think that's
> not what the prior email seemed to be describing?

Well, I started from the observation that some vendors could consider the hole map
to be metadata, which implies that the MDS would be in a better position to
serve it than individual data servers (especially if the holes
span data servers).

To address the pNFS-specific concerns from my original e-mail, we need to answer:

1). Who owns the hole information?
2). Who sends the hole information?
3). How efficiently can they communicate hole information spanning the pNFS server set?
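
As one possible answer to the first two questions, the sketch below has the MDS own a file-wide hole map built from per-DS allocation reports. This is an assumption for illustration only, not something the draft specifies: the report format (extents expressed as file offsets), the DS names, and both functions are hypothetical.

```python
# Illustrative only: an MDS-side hole map built from hypothetical per-DS
# reports of allocated (start, length) extents, expressed as file offsets.

def merge_extents(per_ds_extents):
    """Merge extents reported by each DS into a sorted, non-overlapping
    list of allocated (start, length) file ranges."""
    flat = sorted(e for extents in per_ds_extents.values() for e in extents)
    merged = []
    for start, length in flat:
        end = start + length
        if merged and start <= merged[-1][1]:
            # Adjacent or overlapping: extend the previous range.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e - s) for s, e in merged]


def hole_map(allocated, file_size):
    """Return the holes (start, length) in [0, file_size)."""
    holes = []
    cursor = 0
    for start, length in allocated:
        if start > cursor:
            holes.append((cursor, start - cursor))
        cursor = max(cursor, start + length)
    if cursor < file_size:
        holes.append((cursor, file_size - cursor))
    return holes


reports = {
    "ds0": [(0, 4096)],
    "ds1": [(4096, 4096)],
    "ds2": [(20480, 4096)],
}
allocated = merge_extents(reports)
holes = hole_map(allocated, file_size=32768)
print(allocated, holes)
```

How often the DSs would have to refresh such reports, and what that costs, is exactly the efficiency question in 3).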

- Pranoop

> Thanks,
> Matt
> ----- "J. Bruce Fields" <> wrote:
> >
> > A few questions about a map:
> >
> > 	- What is its lifetime?  Will it be a recallable object like a
> > 	  layout, or does the client invalidate it normally whenever it
> > 	  would invalidate its data cache?
> > 	- Does requesting the block map break write delegations, or (on
> > 	  servers that support atime) update the atime?
> > 	- How does a request for a map interact with mandatory
> > 	  locks?
> >
> > --b.
> > _______________________________________________
> > nfsv4 mailing list
> >
> >
> --
> Matt Benjamin
> The Linux Box
> 206 South Fifth Ave. Suite 150
> Ann Arbor, MI  48104
> tel. 734-761-4689
> fax. 734-769-8938
> cel. 734-216-5309