Re: [Storagesync] Storagesync Digest, Vol 5, Issue 1

Ted Lemon <mellon@fugue.com> Thu, 03 December 2015 19:14 UTC

From: Ted Lemon <mellon@fugue.com>
To: storagesync@ietf.org
In-Reply-To: <D51EEF1B-13C8-4BBB-91C0-1B473D17759C@cern.ch>
References: <mailman.108.1449000023.26068.storagesync@ietf.org> <1449004445.2745758.455126129.5028FD2B@webmail.messagingengine.com> <CAO_YprZhCmUxEf=aGCYL=+CLbjUoD1ifpDFsrS7N40Npo4wr+w@mail.gmail.com> <1449050174.3667910.455617161.12EEE3C5@webmail.messagingengine.com> <1449051540970-b577e6c2-393e54ef-bbe05be4@gmail.com> <1449052128.3674794.455635937.667C3E1F@webmail.messagingengine.com> <CAPpPfeAdrCZcsYZo7=W6N14K4F2LutXN8BFTetikzKZSr8+vVA@mail.gmail.com> <259424f4.2bca.1516717ef55.Coremail.fsong@bjtu.edu.cn> <56600F0A.9000200@tuxed.net> <CAPpPfeDPHGR+vn0=ji9frF2kr+J=YR76g0e7yOndKzz97bxdHQ@mail.gmail.com> <566014EA.2010705@tuxed.net> <CAO_Yprbc9LMc3TmpkKpmN9hUzAix13nfuSRS5Z8jPf6xu8xjNg@mail.gmail.com> <56601F18.8030409@tuxed.net> <CAO_YpraF1UrV49Po9PZx6ZoSbcLm5gRPEKXAdTT3VvPPPWEAfg@mail.gmail.com> <1449153485919-e58fed74-d7eab50a-01b3670c@fugue.com> <D51EEF1B-13C8-4BBB-91C0-1B473D17759C@cern.ch>
Date: Thu, 03 Dec 2015 19:14:07 +0000
Message-Id: <1449170047359-e297ed6e-b9fd94e9-570ca980@fugue.com>
MIME-Version: 1.0
Archived-At: <http://mailarchive.ietf.org/arch/msg/storagesync/wRCt9MOl8mYkcxt1sagT6DtS9KU>
Subject: Re: [Storagesync] Storagesync Digest, Vol 5, Issue 1
List-Id: Mechanisms to synchronize client file systems with Internet-based data storage services <storagesync.ietf.org>

Thursday, Dec 3, 2015 1:41 PM Jakub Moscicki wrote:
> Could you please explain why you think it does not scale? There are schemes in which you do not need to propfind the entire remote tree to discover changes, only a sequence of propfinds in a direct path leading to a changed entity. That’s not optimal for sure (several calls needed, depending on the depth of the change) but if it comes to WebDAV itself, apart from a bloated XML representation, I do not see much problem.

It doesn't scale because you've just described a lockstep process for locating changes. Instead of one transaction that fetches the set of changes since a known common version, you have multiple steps, and the number of steps increases at least logarithmically (I am guessing, since I don't know your algorithm) as the number of changes since the last sync or the size of the tree increases. If you are syncing continuously this may not be a big deal, but I don't think that's a reasonable assumption.
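
To make the step count concrete, here is a rough sketch of the kind of scheme I assume is meant: every directory carries a tag that changes whenever anything beneath it changes, and the client walks down only into directories whose tag differs. Nothing here talks to a real WebDAV server. The nested-dict tree, the `etag` helper and `find_changes` are all made up for illustration, each call stands in for one depth-1 PROPFIND, and deletions are ignored.

```python
import hashlib


def etag(node):
    """Hypothetical tag: a file's own tag, or a hash over a directory's children."""
    if isinstance(node, str):  # a file is represented by its content tag
        return node
    joined = "".join(f"{name}:{etag(child)};" for name, child in sorted(node.items()))
    return hashlib.sha1(joined.encode()).hexdigest()


def find_changes(cached, remote, path=""):
    """Walk only into changed directories; return (changed_file_paths, round_trips)."""
    round_trips = 1  # one simulated depth-1 PROPFIND for this directory
    changed = []
    for name, child in remote.items():
        old = cached.get(name)
        if old is not None and etag(old) == etag(child):
            continue  # tag unchanged, nothing below needs to be fetched
        child_path = f"{path}/{name}"
        if isinstance(child, dict):
            # The next request depends on this answer: that is the lockstep part.
            sub_changed, sub_trips = find_changes(
                old if isinstance(old, dict) else {}, child, child_path
            )
            changed += sub_changed
            round_trips += sub_trips
        else:
            changed.append(child_path)
    return changed, round_trips


cached = {"docs": {"a": "1", "sub": {"b": "1"}}, "pics": {"c": "1"}}
remote = {"docs": {"a": "1", "sub": {"b": "2"}}, "pics": {"c": "1"}}
changed, trips = find_changes(cached, remote)
```

In this toy tree a single changed file two directories down already costs three sequential requests, and each one has to finish before the next can be issued.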

This also has the problem that you have no separation of metadata and data, and no versioning, which means that you have to have a client-server model, and you can't have a distributed peer model.

> Okay, but then you need to remember the client state and pass it on to the server to calculate the diff. Any good ideas on how to do that?

Keep the metadata sorted (e.g. by modification time), and do a diff whenever a new version is generated. Versions that have never been shared with other peers should be trimmed once they are no longer the head version. When peer A wants to sync with peer B, it announces the version it has and the version it last remembers syncing with peer B. If peer B remembers that version, then the difference can be computed trivially, in O(n), and is minimal (n*2 at worst), where n is the number of changes. If peer B has lost its memory, then it has to send the complete inventory of its head version, which peer A can then compare with its own head to see what has changed.
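
A minimal sketch of that exchange, under some loud assumptions: versions are plain integers, metadata is a path-to-digest map, a digest of None marks a deletion, and only one direction of the sync is shown, with no conflict handling. The `Peer` class, its methods and `diff_inventories` are hypothetical names, not anything from an existing protocol.

```python
class Peer:
    def __init__(self):
        self.head = 0          # current version number
        self.inventory = {}    # path -> digest at the head version
        self.log = []          # (version, path, digest) in commit order
        self.oldest_known = 0  # deltas from versions older than this are lost

    def commit(self, changes):
        """Record a new head version; a digest of None deletes the path."""
        self.head += 1
        for path, digest in changes.items():
            self.log.append((self.head, path, digest))
            if digest is None:
                self.inventory.pop(path, None)
            else:
                self.inventory[path] = digest
        return self.head

    def forget_before(self, version):
        """Trim history: deltas can no longer start from before `version`."""
        self.log = [entry for entry in self.log if entry[0] > version]
        self.oldest_known = version

    def changes_since(self, version):
        """Answer a peer that last synced at `version`."""
        if self.oldest_known <= version <= self.head:
            delta = {}
            for v, path, digest in self.log:  # one pass over the log
                if v > version:
                    delta[path] = digest      # later entries win
            return "delta", delta
        return "full", dict(self.inventory)   # memory lost: send everything


def diff_inventories(mine, theirs):
    """What the receiver must change to match a full inventory."""
    changed = {p: d for p, d in theirs.items() if mine.get(p) != d}
    removed = {p: None for p in mine if p not in theirs}
    return {**changed, **removed}


b = Peer()
v1 = b.commit({"a.txt": "h1", "b.txt": "h2"})
v2 = b.commit({"a.txt": "h3", "b.txt": None, "c.txt": "h4"})
remembered = b.changes_since(v1)  # peer B still remembers v1
b.forget_before(v2)
forgotten = b.changes_since(v1)   # peer B has trimmed v1
```

In the first call the answer is just the changes after v1. In the second, peer B falls back to its full head inventory, and the receiver runs `diff_inventories` against its own head to work out the same set of changes.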


--
My PGP key: https://keys.whiteout.io/mellon@fugue.com