Re: [Storagesync] Storagesync Digest, Vol 5, Issue 1

"Marc Blanchet" <marc.blanchet@viagenie.ca> Wed, 09 December 2015 14:44 UTC

From: Marc Blanchet <marc.blanchet@viagenie.ca>
To: Ted Lemon <mellon@fugue.com>
Date: Wed, 09 Dec 2015 09:44:02 -0500
Message-ID: <3A341032-4229-485C-9107-275968F581D1@viagenie.ca>
In-Reply-To: <1449671733322-9f72a594-b1d5700c-d3631253@fugue.com>
References: <1449452139832-4f314827-a7ecd596-c5312339@fugue.com> <1449454580239-1fd59d90-52f0231b-370f2ef5@gmail.com,> <1449455245871-cb7e86e1-1a0160c5-aa6acce3@fugue.com> <2015120711170621874681@bjtu.edu.cn> <1449459616112-6043cb32-cd69a1f9-1399f1c0@fugue.com> <CAO_Yprbct8wFbS1WFnZZENSp-OruRUk2nRyBv4tNeKv9_CGuCg@mail.gmail.com> <1449511062426-94cdee34-064ef498-327458b6@fugue.com> <CAO_YprZjqs_OFC3RybVvJ4GHWb3spKMMkkFTZO=YDustp825iw@mail.gmail.com> <1449593642163-c107ebb4-0f6d1c5a-a3f1c5e0@fugue.com> <20151208185922.GA9531@localhost.localdomain> <1449609937865-6dbdad8f-eb44d945-cd684f34@fugue.com> <AE0CE9F1-3968-4229-925B-75AA37EDC327@unterwaditzer.net> <1449670262769-e440b1e3-b960232c-260b9165@fugue.com> <506D291C-4F0B-40F3-8848-97DAAF41CAAE@cern.ch> <1449671733322-9f72a594-b1d5700c-d3631253@fugue.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/storagesync/I1fN71lKevY2vcwH0E1ZVmy4hAA>
Cc: Jakub.Moscicki@cern.ch, storagesync@ietf.org
Subject: Re: [Storagesync] Storagesync Digest, Vol 5, Issue 1


On 9 Dec 2015, at 9:35, Ted Lemon wrote:

> Wednesday, Dec 9, 2015 9:21 AM Jakub Moscicki wrote:
>> Unless you want to dig into the type of the file and explore its
>> internal structure to merge, the best you can do is to make sure a
>> version is generated on the server for each conflicting file and that
>> these versions are easily accessible to interested users for manual
>> merge. For this, ETags are sufficient. From this point of view there
>> is no difference between conflicts generated because two clients
>> update the file at the same time (conflicting uploads overlapping in
>> time) and the case of an offline client that edits a locally outdated
>> version (a newer version having meanwhile become available on the
>> server) and uploads it when online again (conflicting uploads not
>> overlapping in time). In both cases it seems good enough to me if
>> they end up as versions on the server for manual user merge/revert.
>
> Most files are text files, which are easy to merge,

I don’t agree with that statement. A lot of files nowadays are text
files in a complex format (docx, odt, …). They are not easy to merge.
Not at all. (I don’t have statistics, but my guess is that 90% of
shared files are not plain text files.)

> so yes, I do want to be able to dig into the structure of the file.   
> Even when the file isn't "easy to merge," e.g. a sound file or a 
> drawing, knowing the order of the changes can be very helpful when 
> someone is doing a manual merge.   And you can build merge tools for 
> commonly-merged files; if conflict detection is a feature of the data 
> store, then there's an incentive to build such tools, whereas if 
> everybody is accustomed to dumb data stores that don't provide this 
> feature, there's no point in making merge easy.
>
> The absence of this feature is a serious problem in my workflows; I 
> don't know if your experience is similar.   I don't really see any 
> point in inventing "yet another" storage sync mechanism that doesn't 
> provide this feature.   E.g., do you really want a calendaring system 
> that automatically re-adds files that have been deleted on one client 
> when a second client connects that hasn't seen the deletion?   An 
> address book that backs out updates?   A mail system that undeletes 
> deleted messages, or deletes messages on the client that were lost on 
> the server due to a crash?

You are giving a good example of something vastly different.
Calendaring uses a very well structured and very limited dataset
(compared to free text) that was designed with sync considerations in
mind. The latest version of vCard was also designed with sync
considerations, and it took a lot of time to get it right (I can tell
you, I was the chair of vcarddav when we did this). And a vCard is a
pretty simple, very structured dataset.

So I think your point shows just the opposite. Merging complex,
free-form files that are not limited, structured datasets is very, very
difficult.

I guess your goal is great, but to me, that is for research. So if 
people want to do this work, I would split it into two working groups:
- one on the IRTF side about generic sync-merge algorithms
- one on the IETF side for the protocol components.

Marc.

>
> The use case of a small collection of large files used entirely 
> sequentially is easy to address, but (a) not very interesting and (b) 
> not actually a common use model.   It _seems_ common because we don't 
> provide anything better, and so people make do with it, but it doesn't 
> actually fit with what they are doing.   If it did, we wouldn't see 
> the proliferation of manually-labeled versions of the same file that 
> is so common in such data stores.
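For readers following the thread, the ETag-based handling that Jakub Moscicki describes in the quoted text can be sketched roughly as follows. This is only an illustration under stated assumptions, not a proposal from anyone on the list: the `Store` class, its `put` method, and the counter-based ETag tokens are hypothetical names invented for the example. An upload whose base ETag matches the current head becomes the new head; any other upload is kept as a separate version and flagged for manual merge, whether the clients raced or one of them was offline.

```python
import itertools


class Store:
    """Hypothetical in-memory store that keeps every upload as a version."""

    def __init__(self):
        self._ids = itertools.count(1)  # source of opaque ETag tokens (assumption)
        self.head = {}       # path -> ETag of the current version
        self.versions = {}   # path -> {etag: content}; nothing is discarded
        self.conflicts = {}  # path -> ETags awaiting manual merge/revert

    def put(self, path, content, base_etag=None):
        """Store an upload; return (new_etag, is_conflict).

        base_etag is the ETag the client last saw for this path,
        or None if the client believes it is creating the file.
        """
        new_etag = f"v{next(self._ids)}"
        self.versions.setdefault(path, {})[new_etag] = content
        if self.head.get(path) == base_etag:
            # Client edited the current version: advance the head.
            self.head[path] = new_etag
            return new_etag, False
        # Client edited a stale version: keep the upload, leave the head alone.
        self.conflicts.setdefault(path, []).append(new_etag)
        return new_etag, True


store = Store()
e1, c1 = store.put("notes.txt", b"draft")             # file created
e2, c2 = store.put("notes.txt", b"draft + A", e1)     # based on the head
e3, c3 = store.put("notes.txt", b"draft + B", e1)     # based on a stale version
```

The store never needs to know why the base ETag was stale, which is the point made in the quoted text: both kinds of conflict end up as retained versions. Ordering changes or merging content automatically, as Ted Lemon asks for, would need more than this sketch provides.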