Re: [Storagesync] Storagesync Digest, Vol 5, Issue 1

Linhui Sun <lh.sunlinh@gmail.com> Wed, 02 December 2015 15:37 UTC

Content-Type: multipart/alternative; boundary="----sinikael-?=_1-14490706262430.07383038755506277"
From: Linhui Sun <lh.sunlinh@gmail.com>
To: Michiel de Jong <mbdejong@mozilla.com>
In-Reply-To: <CAPpPfeBFfwD3NU0Y_zSFQdsT1o3HO1-Cd5RpokOzBVUa-3fC4Q@mail.gmail.com>
References: <mailman.108.1449000023.26068.storagesync@ietf.org> <1449004445.2745758.455126129.5028FD2B@webmail.messagingengine.com> <CAO_YprZhCmUxEf=aGCYL=+CLbjUoD1ifpDFsrS7N40Npo4wr+w@mail.gmail.com> <1449050174.3667910.455617161.12EEE3C5@webmail.messagingengine.com> <1449051540970-b577e6c2-393e54ef-bbe05be4@gmail.com> <1449052128.3674794.455635937.667C3E1F@webmail.messagingengine.com> <CAPpPfeAdrCZcsYZo7=W6N14K4F2LutXN8BFTetikzKZSr8+vVA@mail.gmail.com> <1449060218.3721231.455737161.5D657D6D@webmail.messagingengine.com> <CAO_YprYp+cdCPQ1pEUJLcCh0uQL_mu-Y=MJOA7Oh92TWrM_tWQ@mail.gmail.com> <1449061417.3729762.455755681.08D95D5B@webmail.messagingengine.com> <CAO_Ypra_PWf0Uxt2Rbp_k49hDdjq1zvTQs9qkeZqRo0v0E3+=g@mail.gmail.com> <CAPpPfeDeaTDUqNuQtiTwtWvf_3uUXNY6DTUOeRbOokf6En408A@mail.gmail.com> <CAO_Yprb0LzCmSU42BS=dnm66U+ACSbScmDDKxSGLYqDQ5uD2aA@mail.gmail.com> <CAPpPfeBFfwD3NU0Y_zSFQdsT1o3HO1-Cd5RpokOzBVUa-3fC4Q@mail.gmail.com>
Date: Wed, 02 Dec 2015 23:36:58 +0800
X-Cm-Message-Id: 1449070626120761df696ade6fa5ec1fae4c35dae591bdb9565f10221d86a248836388
X-Cm-Draft-Id: WyJhIiwzLCJkcmFmdF9pZCIsIjE0NDkwNzA2MTgwMDAiLCJjIiwiMTUxOTM5MTI5MTk2ODkzMDAyMCIsInYiLDFd
X-Mailer: CloudMagic
Message-Id: <1449070626419-021b1ac3-2a56d9db-c17be220@gmail.com>
MIME-Version: 1.0
Archived-At: <http://mailarchive.ietf.org/arch/msg/storagesync/sU8rJY_HHaEIwjCGIJpfuYwE8Yc>
Cc: storagesync <storagesync@ietf.org>, fkooman <fkooman@tuxed.net>, Hugo González Labrador <ietf@hugo.labkode.com>
Subject: Re: [Storagesync] Storagesync Digest, Vol 5, Issue 1

Sounds reasonable! How to perform this chunking and how to manage chunks should
be left for the higher level. Thanks for that : )
On Wed, Dec 2, 2015 at 23:05, Michiel de Jong <mbdejong@mozilla.com> wrote:
Ah sure, that's entirely appropriate, remoteStorage treats both the Content-Type
header value and the actual body of a document as opaque strings of bytes. It
doesn't "care" if you use it to store only data blocks that are chunks of
something else.

For instance, you could have a folder on a user's storage that contains only
inode-like JSON-documents, which list the URLs of binary documents that make up
1 KB blocks of the actual data. Nice for deduping, delta updates, and also
renaming files without reuploading their content.
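
A minimal sketch of the inode-like layout described above, under stated assumptions: the remoteStorage spec does not define any of this, so the function names, the `storage.example` URL, the JSON field names, and the choice to name each block by its SHA-256 hash are all hypothetical. A plain dict stands in for the user's storage.

```python
import hashlib
import json

BLOCK_SIZE = 1024  # 1 KB blocks, as in the example above


def build_manifest(data: bytes, base_url: str, store: dict) -> str:
    """Split data into blocks, store each distinct block once, and
    return an inode-like JSON document listing the block URLs in order."""
    block_urls = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        url = f"{base_url}/blocks/{digest}"  # hypothetical URL layout
        store.setdefault(url, block)  # identical blocks are stored only once
        block_urls.append(url)
    return json.dumps({"size": len(data), "blocks": block_urls})


def read_file(manifest: str, store: dict) -> bytes:
    """Reassemble the original bytes from an inode-like document."""
    doc = json.loads(manifest)
    return b"".join(store[url] for url in doc["blocks"])
```

With this layout, renaming a file only means storing the small JSON document under a new name; the block documents stay where they are.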

But yeah, the argument is that *how* to create and manage these chunks is then
still left up to the application developer (or to another spec on top of the
remoteStorage spec).


Cheers,
Michiel.

On Wed, Dec 2, 2015 at 3:29 PM, Linhui Sun <lh.sunlinh@gmail.com> wrote:
Hi
2015-12-02 22:05 GMT+08:00 Michiel de Jong <mbdejong@mozilla.com>:
Cool! I created https://github.com/remotestorage/spec/issues/137 about the need
for a MOVE verb.

Application-level chunking is partially supported by HTTP itself through
`Content-Range` headers (although it's not clear whether these are allowed on
PUT requests as well as on GET requests, see
https://github.com/remotestorage/spec/issues/131 ). The problem is that HTTP
defines versioning at the document level: you cannot ask a server to produce or
check an ETag for a specific byte range of a document, only for the entire
document.
Actually, what I mean is chunking before transmission (over HTTP), so that the
chunks are treated as individual documents from the standpoint of HTTP. But I
don't know whether this is appropriate for remoteStorage; it's just a comment.
Regards, Linhui
A comparison document sounds good! See also
http://www.servicedenuages.fr/en/generic-storage-ecosystem .


Cheers,
Michiel.


On Wed, Dec 2, 2015 at 2:32 PM, Linhui Sun <lh.sunlinh@gmail.com> wrote:
That's fine with me; a separate section for this might make sense.
Another question is whether we need to include application-layer chunking here
(not just for browser sync), since it is essential if we want to add further
capabilities later.

Regards, Linhui
2015-12-02 21:03 GMT+08:00 Hugo González Labrador <ietf@hugo.labkode.com>:
I propose to come up with a list of advantages and disadvantages of using WebDAV
and compare them against a JSON/REST based approach, like remoteStorage.
Does that sound good?
On Wed, Dec 2, 2015, at 01:59 PM, Linhui Sun wrote:
2015-12-02 20:43 GMT+08:00 Hugo González Labrador <ietf@hugo.labkode.com>:

On Wed, Dec 2, 2015, at 01:30 PM, Michiel de Jong wrote:
Hi all!
Thanks for all your reactions to the remoteStorage Internet-Draft.
We get the question about WebDAV a lot; in the next version we should add a
remark about it in the intro. The folder descriptions returned when you GET a
URL that ends with a / are indeed a deviation from the XML returned by WebDAV
servers, and this is just because nowadays JSON is easier to use than XML for
developers, both client-side and server-side.
I totally agree here, this was going to happen sooner or later and it is really
appreciated.
The fact that we don't require servers to support WebDAV's custom verbs like
PROPPATCH etc. is for three reasons:
1) it's a lot of work to implement this without using an existing WebDAV library
2) in practice, a lot of WebDAV servers get it wrong, or don't implement all of
WebDAV. It's very annoying for client implementers to have to deal with servers
that e.g. chose not to implement LOCK and UNLOCK.
3) we don't really need all these advanced features on top of standard HTTP,
just supporting GET/PUT/DELETE for resources, and adding a simple folder
description format, is enough for most applications.
Other than that, the remoteStorage Internet-Draft specifies a *lot* more than
just how each HTTP verb should behave:
* requiring support for OAuth implicit-grant flow
* requiring ETag support and nested versioning (i.e. the folder's ETag changes
if anything within that folder changes)
* requiring CORS headers
* requiring a WebFinger announcement for service discovery
It would be easy to add these four things on top of WebDAV, instead of putting
them on top of our minimal GET/PUT/DELETE API definition. In fact, we could
probably separate it into two Internet-Drafts: one for the 'RESTful folders' API
which is our simplification of WebDAV, and a second one for
OAuth/ETags/CORS/WebFinger on top of either WebDAV or 'RESTful folders' or
whatever other HTTP-based API you want.
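
The nested versioning requirement in the list above (a folder's ETag changes if anything within that folder changes) can be illustrated with a small sketch. This is only one possible way to satisfy it: deriving ETags from content hashes, the truncation to 16 hex characters, and the dict-based tree are assumptions for illustration, not something taken from the draft.

```python
import hashlib


def document_etag(body: bytes) -> str:
    """Hypothetical document ETag: a truncated hash of the body."""
    return hashlib.sha256(body).hexdigest()[:16]


def folder_etag(tree: dict) -> str:
    """Hypothetical folder ETag derived from the names and ETags of its children.

    `tree` maps item names to bytes (documents) or to nested dicts (subfolders),
    so a change anywhere below a folder changes that folder's ETag as well.
    """
    h = hashlib.sha256()
    for name in sorted(tree):
        child = tree[name]
        etag = folder_etag(child) if isinstance(child, dict) else document_etag(child)
        h.update(name.encode("utf-8") + b"\0" + etag.encode("ascii") + b"\n")
    return h.hexdigest()[:16]
```

A real server could just as well keep a counter or timestamp per folder and bump it on every write below that folder; the hash is used here only because it keeps the sketch self-contained.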
There is one requirement that all synchronisers need to handle: moving
resources. This spec does not specify an alternative to the WebDAV MOVE.
It is true that a MOVE could be replaced with a DELETE + UPLOAD, but that is not
acceptable in most cases due to the inefficiency of such an operation (CPU,
bandwidth consumption ...)
Is there a plan to support such a basic feature?
The ownCloud sync protocol can be implemented on top of the current
remoteStorage spec. The missing bit is tracking those remote moves.
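
To make the cost argument concrete, here is a rough sketch that counts the body bytes a client has to send when a move is emulated with an upload plus a delete, compared with a server-side move. The `InMemoryStorage` class and its `move` method are hypothetical stand-ins; the spec discussed here defines no move operation.

```python
class InMemoryStorage:
    """Hypothetical stand-in for a storage server, tracking upload volume."""

    def __init__(self):
        self.docs = {}
        self.uploaded_bytes = 0  # body bytes sent from client to server

    def put(self, path: str, body: bytes) -> None:
        self.docs[path] = body
        self.uploaded_bytes += len(body)

    def delete(self, path: str) -> None:
        del self.docs[path]

    def move(self, src: str, dst: str) -> None:
        """Hypothetical server-side move: no document body is transferred."""
        self.docs[dst] = self.docs.pop(src)


def move_by_reupload(storage: InMemoryStorage, local_copy: bytes,
                     src: str, dst: str) -> None:
    """Emulate a move with only PUT and DELETE: the whole body is re-sent."""
    storage.put(dst, local_copy)
    storage.delete(src)
```

In this model the emulated move costs as many upload bytes as the document is large, while the server-side move costs none, which is the inefficiency described above.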
I agree with Hugo that MOVE is useful; it is ideal when you just rename a file.
But my question is whether we need to produce two documents. Offering multiple
choices is not good for standardization. In my view, we need a very clear
decision on whether to consider WebDAV or not.
Regards,
Linhui
Cheers
On Wed, Dec 2, 2015 at 11:28 AM, Hugo González Labrador <ietf@hugo.labkode.com> wrote:

On Wed, Dec 2, 2015, at 11:18 AM, Linhui Sun wrote:
Hi
On Wed, Dec 2, 2015 at 17:56, Hugo González Labrador <ietf@hugo.labkode.com> wrote:
On Wed, Dec 2, 2015, at 08:20 AM, Linhui Sun wrote:
Hi
2015-12-02 5:14 GMT+08:00 Hugo González Labrador <ietf@hugo.labkode.com>:
Hi,
from my point of view the remoteStorage project addresses a subset of
the use cases of the WebDAV specification.
The main difference I can observe is that the specification is built on
REST instead of XML-based communication.
I personally like REST much more than WebDAV because it is more
developer-friendly and faster to develop with.
Maybe the remoteStorage API becomes the next WebDAV :)
However, the remoteStorage API does not provide a way of synchronising
data, this task is delegated to the developers.
Is there a plan to provide such a feature based on the outcome of this
draft?
I'm a little confused about why you say remoteStorage does not provide that.
Is it because remoteStorage does not behave like a typical sync service
(e.g. Dropbox), or do you mean something else?
Yes, because it does not offer synchronisation capabilities.
Got it. What I am wondering is whether we need to include those capabilities
in a base specification, since it is hard to standardize capabilities like
dedup or delta. Maybe a better way is to define those in a separate
specification,
Thanks for giving these examples - so by 'synchronisation capabilities' you mean
1) deduplication and 2) delta updates? Anything else or is that an exhaustive
list?
something like extensions. For a base document, we just need to define how to
perform sync operations.
Yes, I agree that maybe an extension of this draft could handle the
synchronisation part.
Our Internet-Draft is heavily focused on the World Wide Web, whose URLs are not
content-addressable; we can't change that. So that architecture is not very
friendly to deduplication, compared to, for instance, BitTorrent. As you already
said, developers can still dedupe on the server-side when storing blobs to disk,
and can also dedupe on the client side before creating the resources the client
uploads.
As far as I know, delta updates are not supported by the ETag system - you
cannot do a range request to find out if certain bytes within a document have
changed. However, the folder system we define does encourage you, for instance
when you develop a To Do List app, to put each task in its own document, and then
query the folder to see which of them changed, instead of putting them all in
one big document and retrieving the whole document each time.
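
As a rough sketch of the To Do List pattern above: a client can remember the ETag of each task document from the last folder listing and compare it with the current listing to decide what to fetch. The listings are simplified here to plain dicts mapping item name to ETag, which is an assumption for illustration and not the folder description format itself; `diff_listing` is a hypothetical helper.

```python
def diff_listing(old: dict, new: dict):
    """Compare two folder listings (item name -> ETag).

    Returns three sorted lists: items added, items whose ETag changed,
    and items removed since the previous listing.
    """
    added = sorted(name for name in new if name not in old)
    changed = sorted(name for name in new if name in old and new[name] != old[name])
    removed = sorted(name for name in old if name not in new)
    return added, changed, removed


# Example: only "task2" needs to be re-fetched, "task4" is new, "task3" is gone.
previous = {"task1": "e1", "task2": "e2", "task3": "e3"}
current = {"task1": "e1", "task2": "e2b", "task4": "e4"}
added, changed, removed = diff_listing(previous, current)
```

Only the added and changed items need a GET, so an edit to one task costs one small request rather than a download of the whole list.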
Cheers,
Michiel.
BTW, I want to introduce ClawIO ( http://clawio.github.io ), a research
project to benchmark different synchronisation protocols against
different data backends with special attention to provide a common sync
API.
A common API for different sync protocols is being created based on the
architecture specified in this draft (control and data servers) and the
I cannot find a distributed architecture in this draft. It seems that they
handle metadata and content data together, just like a normal web service.
ClawIO is fully distributed. Every logical unit is a different server that can
be scaled out. Data and metadata channels are independent from each other and
reside on different servers.
That is widely employed in popular sync services, and it is also beneficial to
privacy to some extent. But in the context of sync in the browser (which is the
main focus of remoteStorage), I don't know whether this is reasonable. Still, I
think we should deploy a distributed architecture, although it will make things
complicated.
Of course, remoteStorage is targeted at browsers, so syncing does not make
too much sense in this case.
With the rise of Linux container micro-service based architectures, the
deployment of such highly complex systems should become easier and faster.
Best,
Hugo
Regards,
Linhui
Regards,
Linhui
one from the CERN EOS project (management, disk and queue servers).
Phase I has implemented the ownCloud Sync Protocol and Phase II will
implement the SeaFile Sync Protocol.
These protocols were chosen among others because they are really
opposed to each other in terms of syncing (delta vs non-delta,
state-based vs log/event/git-based sync …), so finding a common approach
is more challenging.
Providing a base specification/architecture to measure the feasibility
of this draft is one of the objectives of the project.
I believe that the work being done here and in ClawIO are supplementary
to each other and I think mutual collaboration could be beneficial for
both sides.
Also, if there is interest, the remoteStorage API can be added to
ClawIO.
Best regards,
Hugo Gonzalez Labrador
On Tue, Dec 1, 2015, at 09:00 PM, storagesync-request@ietf.org wrote:
> Today's Topics:
>
> 1. New version of draft-dejong-remotestorage Internet-Draft
> available (Michiel de Jong)
> 2. Re: New version of draft-dejong-remotestorage Internet-Draft
> available (Gihan Dias)
> 3. Re: New version of draft-dejong-remotestorage Internet-Draft
> available (Fei Song)
> Email had 3 attachments:
> + [Storagesync] New version of draft-dejong-remotestorage
> Internet-Draft available
> 2k (message/rfc822)
> + Re: [Storagesync] New version of draft-dejong-remotestorage
> Internet-Draft available
> 1k (message/rfc822)
> + Re: [Storagesync] New version of draft-dejong-remotestorage
> Internet-Draft available
> 2k (message/rfc822)
_______________________________________________
Storagesync mailing list
Storagesync@ietf.org
https://www.ietf.org/mailman/listinfo/storagesync