Re: [nfsv4] New version of sparse draft(draft-hildebrand-nfsv4-read-sparse-01.txt)

Thomas Haynes <thomas@netapp.com> Mon, 04 October 2010 02:53 UTC

From: Thomas Haynes <thomas@netapp.com>
Date: Sun, 03 Oct 2010 21:54:00 -0500
To: "Erasani, Pranoop" <Pranoop.Erasani@netapp.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>, Benny Halevy <bhalevy@panasas.com>, nfsv4@ietf.org
Subject: Re: [nfsv4] New version of sparse draft(draft-hildebrand-nfsv4-read-sparse-01.txt)

The harder case is for a non-clustered pNFS community, i.e., one
where the MDS and DSes need a control protocol to talk. Let's
assume that is the case here.

A hole is created when a WRITE happens at a DS that is not
contiguous with the previous (if any) WRITE.

If a DS is supposed to return the range until the next non-zero
data, then it will need to contact every other DS or perhaps
the MDS.

If the MDS is supposed to honor a GET_SPARSE_BLOCK_MAP
type of request from the client, then either each DS would have
to send hole entries to the MDS or the MDS would have to
probe each one.

What if instead of a DS returning information about other DSes,
it simply returns information about what it knows? I.e., my next
data is at address X.
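
As a rough sketch of that idea (not anything from the draft), a DS
that tracks only its own written extents could answer a READ with
either data or "hole, my next data is at X". The DataServer class,
its method names, and the tuple-shaped reply are all hypothetical,
and the offsets are simply whatever address space the DS itself uses:

```python
K = 1024
M = 1024 * K


class DataServer:
    """Hypothetical DS that knows only about its own written extents."""

    def __init__(self):
        # Sorted, non-overlapping (start, end) pairs; end is exclusive.
        self.extents = []

    def write(self, offset, length):
        # Record the write, merging it with overlapping or adjacent extents.
        start, end = offset, offset + length
        merged = []
        for s, e in self.extents:
            if e < start or s > end:
                merged.append((s, e))
            else:
                start, end = min(s, start), max(e, end)
        merged.append((start, end))
        merged.sort()
        self.extents = merged

    def read(self, offset, length):
        """Return ("data", nbytes) or ("hole", next_data_offset).

        next_data_offset is None when this DS has no data past offset.
        The DS says nothing about what any other DS holds.
        """
        for s, e in self.extents:
            if s <= offset < e:
                return ("data", min(length, e - offset))
            if s > offset:
                return ("hole", s)
        return ("hole", None)
```

The reply is purely local: a hole here means only that this DS has
nothing at that offset, so no DS-to-DS or DS-to-MDS traffic is needed
to produce it.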

Consider a file with a width of 32k. The first write one DS gets
is at 1M, followed by 5 other contiguous writes. Its next data is
at 2M, followed by 4 other contiguous writes. That DS can't assume
that any other DS has zeros in the first 1M. Worst case, all other
N-1 DSes do have data there.

In that scenario, N-1 DSes will return 32k of data on the first read
and 1 DS will return a hole that, for it alone, extends to 1M. The
client will then send only N-1 READs per stripe until it gets to the
1M mark.
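
To make the READ counts concrete, here is a minimal simulation of the
client side. It assumes simple round-robin striping with a 32k stripe
unit, hole replies expressed as file offsets, and that each DS has
contiguous data from its first data offset onward within the scanned
range. The function name count_reads and its parameters are made up
for illustration:

```python
K = 1024
M = 1024 * K
STRIPE_UNIT = 32 * K


def count_reads(num_ds, next_data, limit):
    """Count the READs sent to each DS while scanning [0, limit).

    next_data[i] is the file offset of the first data on DS i
    (0 means the DS has data from the start of the file).
    """
    reads = [0] * num_ds
    skip_until = [0] * num_ds  # hole hints the client has learned so far
    for offset in range(0, limit, STRIPE_UNIT):
        ds = (offset // STRIPE_UNIT) % num_ds
        if offset < skip_until[ds]:
            continue  # known hole on this DS; client fills zeros locally
        reads[ds] += 1
        if offset < next_data[ds]:
            # The DS replied "hole, my next data is at next_data[ds]".
            skip_until[ds] = next_data[ds]
    return reads


# Four DSes; DS 0 has nothing before 1M, the others have data throughout.
print(count_reads(4, [1 * M, 0, 0, 0], 1 * M))  # prints [1, 8, 8, 8]
```

With N = 4, the sparse DS sees a single READ below 1M while each of
the other three sees all 8 of its stripe units read, which matches
the N-1 READs per stripe described above.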