Re: [Storagesync] Storagesync Digest, Vol 5, Issue 1

fsong@bjtu.edu.cn Thu, 03 December 2015 08:41 UTC

Date: Thu, 03 Dec 2015 16:44:08 +0800
From: fsong@bjtu.edu.cn
To: Michiel de Jong <mbdejong@mozilla.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/storagesync/ctgC1LMVfWMAuPVfMsqGHcrjUQw>
Cc: Linhui Sun <lh.sunlinh@gmail.com>, storagesync <storagesync@ietf.org>, Hugo González Labrador <ietf@hugo.labkode.com>, fkooman <fkooman@tuxed.net>
Subject: Re: [Storagesync] Storagesync Digest, Vol 5, Issue 1

Hi Michiel and Linhui,

I think it would be good to set a boundary for this draft and leave some work to the application layer.



-----Original Message-----
From: "Michiel de Jong" <mbdejong@mozilla.com>
Sent: Wednesday, December 2, 2015
To: "Linhui Sun" <lh.sunlinh@gmail.com>
Cc: storagesync <storagesync@ietf.org>, fkooman <fkooman@tuxed.net>, "Hugo González Labrador" <ietf@hugo.labkode.com>
Subject: Re: [Storagesync] Storagesync Digest, Vol 5, Issue 1


Ah sure, that's entirely appropriate. remoteStorage treats both the Content-Type header value and the actual body of a document as opaque strings of bytes. It doesn't "care" if you use it to store only data blocks that are chunks of something else.


For instance, you could have a folder on a user's storage that contains only inode-like JSON documents, which list the URLs of the binary documents that make up 1 KB blocks of the actual data. Nice for deduping, delta updates, and also renaming files without re-uploading their content.
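
A minimal sketch of that inode idea, in Python, may make it concrete. Everything here is an assumption for illustration: the `storage` dict stands in for PUT/GET against a remoteStorage server, and the `/blocks/` and `/inodes/` paths, the manifest fields, and the use of SHA-256 block names are hypothetical choices, not part of the remoteStorage spec.

```python
import hashlib
import json

BLOCK_SIZE = 1024  # 1 KB blocks, as in the example above


def store_file(storage: dict, name: str, data: bytes) -> None:
    """Split data into blocks, store each distinct block once, write a manifest."""
    block_urls = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        # Hypothetical naming scheme: name each block after its SHA-256 digest.
        url = "/blocks/" + hashlib.sha256(block).hexdigest()
        storage.setdefault(url, block)  # an identical block is not stored twice
        block_urls.append(url)
    manifest = {"size": len(data), "blocks": block_urls}
    storage["/inodes/" + name] = json.dumps(manifest).encode("utf-8")


def read_file(storage: dict, name: str) -> bytes:
    """Reassemble a file by following the block URLs listed in its manifest."""
    manifest = json.loads(storage["/inodes/" + name])
    return b"".join(storage[url] for url in manifest["blocks"])


def rename_file(storage: dict, old: str, new: str) -> None:
    """Rename by moving only the small manifest; the blocks stay where they are."""
    storage["/inodes/" + new] = storage.pop("/inodes/" + old)
```

With this layout a rename rewrites one small JSON document, and two files that share a block upload it only once.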



But yeah, the argument is that *how* to create and manage these chunks is then still left up to the application developer (or to another spec on top of the remoteStorage spec).



Cheers,

Michiel.



On Wed, Dec 2, 2015 at 3:29 PM, Linhui Sun <lh.sunlinh@gmail.com> wrote:

Hi


2015-12-02 22:05 GMT+08:00 Michiel de Jong <mbdejong@mozilla.com>:

Cool! I created https://github.com/remotestorage/spec/issues/137 about the need for a MOVE verb.


Application-level chunking is partially supported by HTTP itself through `Content-Range` headers (although it's not clear whether these are allowed on PUT requests as well as on GET requests, see https://github.com/remotestorage/spec/issues/131). The problem is that HTTP defines versioning at the document level: you cannot ask a server to produce or check an ETag for a specific byte range of a document, only for the entire document.
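
To illustrate what document-level versioning means in practice, here is a small Python sketch of an `If-Match` style check on PUT. It is only a sketch under stated assumptions: the `store` dict plays the role of the server, and `etag_of` and `conditional_put` are hypothetical helper names. The point is that the precondition compares one ETag for the whole body; there is no way to express "only bytes 100-200 must be unchanged".

```python
import hashlib
from typing import Optional


def etag_of(body: bytes) -> str:
    """One ETag for the entire document body (here: a truncated SHA-256)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_put(store: dict, path: str, body: bytes,
                    if_match: Optional[str] = None) -> int:
    """Return an HTTP-like status code for a PUT with an optional If-Match."""
    current = store.get(path)
    if if_match is not None:
        # The precondition covers the whole document, never a byte range.
        if current is None or etag_of(current) != if_match:
            return 412  # Precondition Failed
    store[path] = body
    return 200 if current is not None else 201
```

A client that fetched version 1, and then tries to overwrite after someone else already wrote version 2, gets 412 regardless of which bytes actually differ.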

Actually, what I mean is chunking before transmission (over HTTP); that way, the chunks are treated as individual documents from HTTP's standpoint. But I don't know whether this is appropriate for remoteStorage; it's just a comment.


Regards,
Linhui


A comparison document sounds good! See also http://www.servicedenuages.fr/en/generic-storage-ecosystem.



Cheers,

Michiel.




On Wed, Dec 2, 2015 at 2:32 PM, Linhui Sun <lh.sunlinh@gmail.com> wrote:

That works for me; a separate section for this might make sense.


Another question is whether we need to include application-layer chunking here (not just for browser sync), since it is essential if we want to extend other capabilities further.



Regards,
Linhui


2015-12-02 21:03 GMT+08:00 Hugo González Labrador <ietf@hugo.labkode.com>:

I propose to come up with a list of advantages and disadvantages of using WebDAV and compare them against a JSON/REST based approach, like remoteStorage.

 
Does that sound good?
 
 
On Wed, Dec 2, 2015, at 01:59 PM, Linhui Sun wrote:

 
 
2015-12-02 20:43 GMT+08:00 Hugo González Labrador <ietf@hugo.labkode.com>:



 
 
 
 
On Wed, Dec 2, 2015, at 01:30 PM, Michiel de Jong wrote:

Hi all!

Thanks for all your reactions to the remoteStorage Internet-Draft.

 
We get the question about WebDAV a lot; in the next version we should add a remark about it in the intro. The folder descriptions returned when you GET a URL that ends with a / are indeed a deviation from the XML returned by a WebDAV server, simply because nowadays JSON is easier for developers to use than XML, both client-side and server-side.

 
 
I totally agree here; this was going to happen sooner or later, and it is really appreciated.

 
 
The fact that we don't require servers to support WebDAV's custom verbs like PROPPATCH etc. is for three reasons:

1) it's a lot of work to implement this without using an existing WebDAV library

2) in practice, a lot of WebDAV servers get it wrong, or don't implement all of WebDAV. It's very annoying for client implementers to have to deal with servers that e.g. chose not to implement LOCK and UNLOCK.

3) we don't really need all these advanced features on top of standard HTTP; just supporting GET/PUT/DELETE for resources, and adding a simple folder description format, is enough for most applications.

 
Other than that, the remoteStorage Internet-Draft specifies a *lot* more than just how each HTTP verb should behave:

* requiring support for OAuth implicit-grant flow

* requiring ETag support and nested versioning (i.e. the folder's ETag changes if anything within that folder changes)

* requiring CORS headers

* requiring a WebFinger announcement for service discovery

It would be easy to add these four things on top of WebDAV, instead of putting them on top of our minimal GET/PUT/DELETE API definition. In fact, we could probably separate it into two Internet-Drafts: one for the 'RESTful folders' API which is our simplification of WebDAV, and a second one for OAuth/ETags/CORS/WebFinger on top of either WebDAV or 'RESTful folders' or whatever other HTTP-based API you want.
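
The nested versioning requirement from the list above (a folder's ETag changes if anything within that folder changes) can be sketched in Python as follows. This is one possible way to get that property, not how the Internet-Draft says ETags must be computed; the hashing scheme and the function names `document_etag` and `folder_etag` are assumptions for illustration.

```python
import hashlib


def document_etag(body: bytes) -> str:
    """ETag of a single document, derived from its content."""
    return hashlib.sha256(body).hexdigest()[:16]


def folder_etag(tree: dict) -> str:
    """ETag of a folder, derived from the names and ETags of its children.

    `tree` maps item names to either bytes (a document) or a nested dict
    (a subfolder), so a change anywhere below bubbles up to every ancestor.
    """
    digest = hashlib.sha256()
    for name in sorted(tree):
        child = tree[name]
        if isinstance(child, dict):
            child_tag = folder_etag(child)
        else:
            child_tag = document_etag(child)
        digest.update(name.encode("utf-8") + b"\0" + child_tag.encode("ascii") + b"\n")
    return digest.hexdigest()[:16]
```

Because each folder's tag depends only on its children's tags, a client can compare the root ETag first and descend only into folders whose ETag differs.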

 
 
There is one requirement that all synchronisers need to handle: moving resources. In this spec the alternative of a WebDAV MOVE is not specified. 

 
It is true that a MOVE could be replaced with a DELETE + UPLOAD, but that is not acceptable in most cases due to the inefficiency of such an operation (CPU, bandwidth consumption, ...).

 
Is there a plan to support such a basic feature?

 
From the current remoteStorage spec, the ownCloud sync protocol can be implemented. The missing bit is tracking those remote moves.

I agree with Hugo that MOVE is useful; if you just rename a file, it is perfect. But the question I have is whether we need to make two documents. Multiple choices are not good for standardization. In my view, WebDAV is something on which we need to make a very clear decision: either we consider it or we don't.

 
Regards,

Linhui

 
 
Cheers

 
On Wed, Dec 2, 2015 at 11:28 AM, Hugo González Labrador <ietf@hugo.labkode.com> wrote:



 
 
 
 
On Wed, Dec 2, 2015, at 11:18 AM, Linhui Sun wrote:

Hi

 
On Wed, Dec 2, 2015 at 17:56, Hugo González Labrador <ietf@hugo.labkode.com> wrote:

 
 
 
On Wed, Dec 2, 2015, at 08:20 AM, Linhui Sun wrote:

Hi

 
2015-12-02 5:14 GMT+08:00 Hugo González Labrador <ietf@hugo.labkode.com>:

Hi,

 
From my point of view, the remoteStorage project addresses a subset of the use cases of the WebDAV specification.

The main difference I can observe is that the specification is built on REST instead of XML-based communication.

I personally like REST much more than WebDAV because it is more developer-friendly and faster to develop with. Maybe the remoteStorage API becomes the next WebDAV :)

However, the remoteStorage API does not provide a way of synchronising data; this task is delegated to the developers. Is there a plan to provide such a feature based on the outcome of this draft?

I'm a little confused about why you say remoteStorage does not provide that. Is it because remoteStorage does not behave like a typical sync service (e.g. Dropbox), or do you mean something else?

Yes, because it does not offer synchronisation capabilities.

Got it. What I am wondering is whether we need to include those capabilities in a base specification, since it is hard to standardize capabilities like dedup or delta. Maybe a better way is to define those in a separate specification,

 
 
Thanks for giving these examples - so by 'synchronisation capabilities' you mean 1) deduplication and 2) delta updates? Anything else or is that an exhaustive list?

 
something like extensions. For a base document, we just need to define how to perform sync operations.

 
 
Yes, I agree that maybe an extension of this draft could handle the synchronisation part.

 
 
 
 
Our Internet-Draft is heavily focused on the world wide web, whose URLs are not content-addressable, and we can't change that. So that architecture is not very friendly to deduplication, compared to for instance BitTorrent. As you already said, developers can still dedupe on the server side when storing blobs to disk, and can also dedupe on the client side before creating the resources the client uploads.

As far as I know, delta updates are not supported by the ETag system: you cannot do a range request to find out if certain bytes within a document have changed. However, the folder system we define does encourage you, for instance when you develop a To Do List app, to put each task in its own document, and then query the folder to see which of them changed, instead of putting them all in one big document and retrieving the whole document each time.
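
The "query the folder to see which of them changed" step can be sketched in Python like this. It assumes the client keeps the previous folder listing as a plain `{item name: ETag}` mapping; that representation and the name `changed_items` are hypothetical, chosen only for illustration.

```python
def changed_items(previous: dict, current: dict) -> dict:
    """Compare two {item name: ETag} folder listings.

    Returns which items were added, removed, or modified since the
    previous listing, so that only those documents need to be fetched.
    """
    return {
        "added": sorted(n for n in current if n not in previous),
        "removed": sorted(n for n in previous if n not in current),
        "modified": sorted(
            n for n in current if n in previous and current[n] != previous[n]
        ),
    }
```

For a To Do List app with one document per task, editing a single task then costs one folder GET plus one document GET, instead of re-downloading every task.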

 
Cheers,

Michiel.

 
 
 
 
BTW, I want to introduce ClawIO (http://clawio.github.io), a research project to benchmark different synchronisation protocols against different data backends with special attention to provide a common sync API.

A common API for different sync protocols is being created based on the architecture specified in this draft (control and data servers) and the

I cannot find a distributed architecture in this draft. It seems that they handle metadata and content data together, just like a normal web service.

 
ClawIO is fully distributed. Every logical unit is a different server that can be scaled out. Data and metadata channels are independent of each other and reside on different servers.

That is widely employed in popular sync services, and it is also beneficial to privacy to some extent. But in the context of sync in the browser (which is what remoteStorage mainly considers), I don't know whether this is reasonable. Still, I think we should deploy a distributed architecture, although it will make things more complicated.

 
 
Of course, remoteStorage is targeted at browsers, so syncing does not make too much sense in this case.

With the rise of micro-service architectures based on Linux containers, the deployment of such highly complex systems should become easier and faster.

 
Best,

 
 
Hugo

 
 
Regards,

Linhui 

 
 
Regards,

Linhui

one from the CERN EOS project (management, disk and queue servers).

 
Phase I has implemented the ownCloud Sync Protocol and Phase II will implement the SeaFile Sync Protocol.

These protocols were chosen among others because they are really opposed to each other in terms of syncing (delta vs non-delta, state-based vs log/event/git-based sync ...), so finding a common approach is more challenging.

Providing a base specification/architecture to measure the feasibility of this draft is one of the objectives of the project.

I believe that the work being done here and in ClawIO complement each other, and I think mutual collaboration could be beneficial for both sides.

Also, if there is interest, the remoteStorage API can be added to ClawIO.

 
Best regards,

 
Hugo Gonzalez Labrador

 
On Tue, Dec 1, 2015, at 09:00 PM, storagesync-request@ietf.org wrote:


> Today's Topics:
>
>    1. New version of draft-dejong-remotestorage Internet-Draft
>       available (Michiel de Jong)
>    2. Re: New version of draft-dejong-remotestorage Internet-Draft
>       available (Gihan Dias)
>    3. Re: New version of draft-dejong-remotestorage Internet-Draft
>       available (Fei Song)
