Re: [Storagesync] Storagesync Digest, Vol 5, Issue 1

Ted Lemon <mellon@fugue.com> Wed, 09 December 2015 14:43 UTC

From: Ted Lemon <mellon@fugue.com>
To: storagesync@ietf.org
In-Reply-To: <1449671733322-9f72a594-b1d5700c-d3631253@fugue.com>
References: <1449452139832-4f314827-a7ecd596-c5312339@fugue.com> <1449454580239-1fd59d90-52f0231b-370f2ef5@gmail.com,> <1449455245871-cb7e86e1-1a0160c5-aa6acce3@fugue.com> <2015120711170621874681@bjtu.edu.cn> <1449459616112-6043cb32-cd69a1f9-1399f1c0@fugue.com> <CAO_Yprbct8wFbS1WFnZZENSp-OruRUk2nRyBv4tNeKv9_CGuCg@mail.gmail.com> <1449511062426-94cdee34-064ef498-327458b6@fugue.com> <CAO_YprZjqs_OFC3RybVvJ4GHWb3spKMMkkFTZO=YDustp825iw@mail.gmail.com> <1449593642163-c107ebb4-0f6d1c5a-a3f1c5e0@fugue.com> <20151208185922.GA9531@localhost.localdomain> <1449609937865-6dbdad8f-eb44d945-cd684f34@fugue.com> <AE0CE9F1-3968-4229-925B-75AA37EDC327@unterwaditzer.net> <1449670262769-e440b1e3-b960232c-260b9165@fugue.com> <506D291C-4F0B-40F3-8848-97DAAF41CAAE@cern.ch> <1449671733322-9f72a594-b1d5700c-d3631253@fugue.com>
Date: Wed, 09 Dec 2015 14:43:09 +0000
Message-Id: <1449672190209-97dbcf5a-4802eeae-a6a33a55@fugue.com>
MIME-Version: 1.0
Archived-At: <http://mailarchive.ietf.org/arch/msg/storagesync/_bd4M3XkVO6AwiC_gV7__KZhY7U>
Subject: Re: [Storagesync] Storagesync Digest, Vol 5, Issue 1

BTW, it occurs to me that at CERN you deal a lot in very large data sets which are created once and never modified, because they are records of physical events.  So the idea of merging might not seem all that important, because you will never do it on these data sets.

However, one thing a good versioning system gives you, and which I think you should consider important, is protection against accidentally losing data. With really big data archives, one of the things you want to do for redundancy is keep multiple copies. If one copy gets corrupted, you want to be able to detect that, and if a file goes missing from one replica of a folder, you want to avoid losing the other copies of that file as a result. The most reliable way to do that is to keep versioning metadata. Multiple copies of the versioning metadata are also very useful for forensic analysis when something goes wrong.
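
To make that concrete, here is a minimal Python sketch of the idea. It is only an illustration under stated assumptions: the names VersionRecord, check_replica and plan_repair are hypothetical and do not come from any proposal on this list. It assumes each replica keeps, per file, a version number and a SHA-256 digest, and that a deliberate delete is recorded as a tombstone (a record with no digest).

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class VersionRecord:
    """Hypothetical per-file versioning metadata stored with each replica."""
    version: int            # increases with every recorded change
    sha256: Optional[str]   # content digest; None marks a deliberate delete


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def check_replica(record: VersionRecord, content: Optional[bytes]) -> str:
    """Classify one replica's copy of a file against its metadata."""
    if record.sha256 is None:
        # The metadata says the file was deleted on purpose.
        return "deleted" if content is None else "stale"
    if content is None:
        # The metadata says the file should exist, so this is a loss,
        # not a delete, and must not be propagated to other replicas.
        return "lost"
    return "ok" if digest(content) == record.sha256 else "corrupt"


Replica = Tuple[VersionRecord, Optional[bytes]]


def plan_repair(replicas: Dict[str, Replica]) -> Tuple[Dict[str, str], Optional[str], List[str]]:
    """Return each replica's state, a good copy to restore from, and the damaged replicas."""
    states = {name: check_replica(rec, data) for name, (rec, data) in replicas.items()}
    good = [name for name, state in states.items() if state == "ok"]
    damaged = sorted(name for name, state in states.items() if state in ("lost", "corrupt"))
    # Prefer the intact copy with the newest recorded version.
    source = max(good, key=lambda name: replicas[name][0].version) if good else None
    return states, source, damaged
```

The point of the sketch is the "lost" case: without the metadata, a sync tool cannot tell a file that vanished from one replica apart from a file the user deleted, and may delete the remaining good copies.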

Additionally, while the bulk of the actual _data_ that CERN stores is really big files, the little files matter just as much: they are the work researchers are doing, particularly collaboratively. Enabling efficient collaboration on articles in progress, effective sharing of code, and so on is very important, even though these files represent a tiny percentage of the total data stored.

