Re: [nfsv4] New version of sparse draft(draft-hildebrand-nfsv4-read-sparse-01.txt)

"Matt W. Benjamin" <> Sun, 03 October 2010 15:19 UTC

Date: Sun, 03 Oct 2010 11:20:29 -0400
From: "Matt W. Benjamin" <>
To: Pranoop Erasani <>
Cc: "J. Bruce Fields" <>, Benny Halevy <>,
Subject: Re: [nfsv4] New version of sparse draft(draft-hildebrand-nfsv4-read-sparse-01.txt)
List-Id: NFSv4 Working Group <>

Hi Pranoop,

I had noted myself that the next-offset return requirement was potentially onerous, for the reason you state. However, the response from Dean, as explicated by Bruce, is that the DS was not meant to be obligated to return the next filled offset, merely to return the information it has. (Yes, you've raised an interesting point about this.)
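To make that reading concrete, here is a minimal Python sketch, assuming a hypothetical DS that tracks only the allocated byte ranges it stores itself, expressed as file offsets. When a READ lands in a hole, it reports the next allocated offset it knows about locally, or nothing if it holds no later data; it makes no claim about data stored on other DSs. The names `Extent` and `next_local_data_offset` are illustrative and do not come from the draft.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: (start, length) of allocated data stored on
# THIS data server, expressed as file offsets and sorted by start.
Extent = Tuple[int, int]


def next_local_data_offset(local_extents: List[Extent], offset: int) -> Optional[int]:
    """Return the first file offset >= `offset` that this DS knows to be allocated.

    Returns None when this DS holds no data at or beyond `offset`. That does
    NOT mean the rest of the file is a hole: another DS may hold later data.
    """
    for start, length in local_extents:
        if start + length <= offset:
            continue  # extent lies entirely before the requested offset
        # Either `offset` falls inside this extent, or this extent is the
        # first local data after the hole.
        return max(start, offset)
    return None
```

Under this assumption, a client that gets None (or an offset) back from one DS would still need the layout, or answers from the other DSs, before concluding anything about the rest of the file.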

It still seems to me, offhand, that a new requirement to propagate hole information to the MDS (synchronously?) would be just as problematic for a different subset of implementations as a requirement that the DSes be collectively aware of it. That is, isn't it a potential advantage of pNFS, as currently specified, that the MDS need not have global hole information either?
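For comparison, a rough sketch of what global hole information at the MDS would involve, assuming each DS reports its allocated extents (as sorted file offsets) and the MDS coalesces them into one file-wide map. Every allocation change on any DS would have to be reflected in this merged map, which is the propagation cost in question. `merge_extents` and `next_global_data_offset` are hypothetical names, not part of any draft or implementation.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Extent = Tuple[int, int]  # hypothetical: (start, length) in file offsets


def merge_extents(per_ds: Dict[str, List[Extent]]) -> List[Extent]:
    """Combine each DS's allocated extents into one sorted, coalesced file-wide map."""
    merged: List[Extent] = []
    # heapq.merge assumes each per-DS list is already sorted by start offset.
    for start, length in heapq.merge(*per_ds.values()):
        end = start + length
        if merged and start <= merged[-1][0] + merged[-1][1]:
            # Overlapping or adjacent to the previous extent: extend it.
            prev_start, prev_len = merged[-1]
            merged[-1] = (prev_start, max(prev_start + prev_len, end) - prev_start)
        else:
            merged.append((start, length))
    return merged


def next_global_data_offset(merged: List[Extent], offset: int) -> Optional[int]:
    """First allocated offset >= `offset` anywhere in the file, or None if only a hole remains."""
    for start, length in merged:
        if start + length > offset:
            return max(start, offset)
    return None
```

For example, with `{"ds0": [(0, 4096), (16384, 4096)], "ds1": [(4096, 4096), (40960, 4096)]}`, the merged map answers 40960 for a read at offset 20480, while ds0 on its own has no later data to report.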



----- "Pranoop Erasani" <> wrote:

> > Hi,
> > 
> > Just relative to pNFS, my immediate reaction was that a DS might have
> > the relevant file allocation information, and the MDS might not.
> A DS might have the relevant file allocation information for the data
> on itself, but not for data on other DSs. AFAIK, pNFS does not mandate
> that each DS know about other DSs' information. It's the MDS that has
> access to DS information.
> Given the current proposal, as part of the READ response, the DS is
> supposed to send the offset where the hole ends and the actual data
> begins. If the hole ends on a different DS, how is the DS responding
> to the READ supposed to know this? How does the data server compute
> that information? Are all DSs aware of how data is organized on other
> DSs, or is it the duty of the MDS to own that information? This is
> where I feel the existing proposal puts onerous requirements on the
> pNFS data servers.
> > With READZ (is that the current operation name?), this doesn't seem
> > to present a problem.
> The draft seems to address the problem by simply stating that each DS
> needs to know the other DSs' information.

This was clarified in subsequent email, as I have attempted to summarize above.

>    receives.  In addition, when a data server is returning a
>    READ4reshole structure, it should still contain the offset and length
>    of the next allocated block in the file, even if that block is not
>    located on that particular data server.
> However, that is not a requirement that pNFS places on the data
> servers, is it? Are we heading down a path of new requirements for
> pNFS data servers here? If my suspicion is correct, the spirit of this
> proposal would discourage pNFS servers from implementing the sparse
> hints.
> > It would, I guess, perhaps also not present a problem if clients
> > could get a hole map from a DS, but I think that's not what the prior
> > email seemed to be describing?
> Well... I started from the premise that some vendors could consider
> the hole map to be metadata, and thus implied that the MDS would be in
> a better position to serve it than the individual data servers
> (especially if the holes span data servers).
> To address the pNFS-specific concerns from my original e-mail, we need
> to answer:
> 1) Who owns the hole information?
> 2) Who sends the hole information?
> 3) How efficiently can they communicate hole information spanning the
>    pNFS server set?
Matt Benjamin

The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI  48104

tel. 734-761-4689
fax. 734-769-8938
cel. 734-216-5309