Re: [Storagesync] Storagesync Digest, Vol 5, Issue 1

Michiel de Jong <mbdejong@mozilla.com> Wed, 02 December 2015 15:05 UTC

Date: Wed, 02 Dec 2015 16:05:43 +0100
Message-ID: <CAPpPfeBFfwD3NU0Y_zSFQdsT1o3HO1-Cd5RpokOzBVUa-3fC4Q@mail.gmail.com>
From: Michiel de Jong <mbdejong@mozilla.com>
To: Linhui Sun <lh.sunlinh@gmail.com>
Content-Type: multipart/alternative; boundary="047d7bdc12eab770ca0525eb9bcd"
Archived-At: <http://mailarchive.ietf.org/arch/msg/storagesync/6HmIUARiwn__OpbR5wQujibcIfI>
Cc: storagesync <storagesync@ietf.org>, fkooman <fkooman@tuxed.net>, Hugo González Labrador <ietf@hugo.labkode.com>
Subject: Re: [Storagesync] Storagesync Digest, Vol 5, Issue 1

Ah sure, that's entirely appropriate: remoteStorage treats both the
Content-Type header value and the actual body of a document as opaque
strings of bytes. It doesn't "care" if you use it to store only data blocks
that are chunks of something else.

For instance, you could have a folder on a user's storage that contains
only inode-like JSON documents, each listing the URLs of the binary
documents that hold the 1 KB blocks of the actual data. That is nice for
deduping, delta updates, and also renaming files without reuploading their
content.
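
To make the inode idea concrete, here is a minimal Python sketch. Nothing in
it comes from the remoteStorage spec: the manifest fields ("size", "blocks"),
the storage.example base URL, and the choice to name each block after its
SHA-256 hash are all assumptions for illustration, and it works purely in
memory instead of doing real PUT requests.

```python
import hashlib
import json

BLOCK_SIZE = 1024  # 1 KB blocks, matching the example in the text


def build_manifest(data, base_url):
    """Split data into blocks; return (manifest, blocks keyed by URL).

    The manifest layout and the hash-based block names are hypothetical,
    not something the remoteStorage spec defines.
    """
    blocks = {}
    urls = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        url = base_url + hashlib.sha256(block).hexdigest()
        blocks[url] = block  # identical blocks collapse into one entry
        urls.append(url)
    manifest = {"size": len(data), "blocks": urls}
    return manifest, blocks


def reassemble(manifest, blocks):
    """Rebuild the original bytes by following the manifest's URL list."""
    return b"".join(blocks[url] for url in manifest["blocks"])


data = b"A" * 2048 + b"B" * 100
manifest, blocks = build_manifest(data, "https://storage.example/alice/blocks/")
print(json.dumps(manifest, indent=2))
```

In this sketch, renaming a file only means storing the small manifest under a
new name; the block documents it points to stay where they are.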

But yeah, the argument is that *how* to create and manage these chunks is
still left up to the application developer (or to another spec on top
of the remoteStorage spec).
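
As one example of such an application-level policy, a client could compare
block hashes to decide which blocks of a changed file still need uploading.
This is only a sketch under assumptions: the fixed 1 KB block size, the
SHA-256 block IDs, and the helper names are mine, not part of any spec.

```python
import hashlib

BLOCK_SIZE = 1024  # assumed fixed block size


def block_ids(data):
    """Return the SHA-256 hex digest of each fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
        for i in range(0, len(data), BLOCK_SIZE)
    ]


def blocks_to_upload(old_ids, new_data):
    """Return (new_ids, missing): missing holds IDs the server lacks."""
    known = set(old_ids)
    new_ids = block_ids(new_data)
    missing = []
    for block_id in new_ids:
        if block_id not in known and block_id not in missing:
            missing.append(block_id)
    return new_ids, missing


old = b"A" * 1024 + b"B" * 1024 + b"C" * 1024
new = b"A" * 1024 + b"X" * 1024 + b"C" * 1024  # only the middle block changed
new_ids, missing = blocks_to_upload(block_ids(old), new)
print(len(missing), "of", len(new_ids), "blocks need uploading")
```

Note that fixed-size blocks only help with in-place changes; inserting a
single byte near the start shifts every later block, so all of them would
hash differently.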


Cheers,
Michiel.

On Wed, Dec 2, 2015 at 3:29 PM, Linhui Sun <lh.sunlinh@gmail.com> wrote:

> Hi
>
> 2015-12-02 22:05 GMT+08:00 Michiel de Jong <mbdejong@mozilla.com>:
>
>> Cool! I created https://github.com/remotestorage/spec/issues/137 about
>> the need for a MOVE verb.
>>
>> Application-level chunking is partially supported by HTTP itself through
>> `Content-Range` headers (although it's not clear whether these are allowed
>> on PUT requests as well as on GET requests, see
>> https://github.com/remotestorage/spec/issues/131). The problem is that
>> HTTP defines versioning at the document level: you cannot ask a server to
>> produce or check an ETag for a specific byte-range of a document, only for
>> the entire document.
>>
> Actually, what I mean is chunking before transmission (over HTTP), so
> that the chunks are treated as individual documents from the standpoint
> of HTTP. But I don't know whether this is appropriate for remoteStorage;
> it's just a comment.
>
> Regards,
> Linhui
>
>>
>> A comparison document sounds good! See also
>> http://www.servicedenuages.fr/en/generic-storage-ecosystem.
>>
>>
>> Cheers,
>> Michiel.
>>
>>
>> On Wed, Dec 2, 2015 at 2:32 PM, Linhui Sun <lh.sunlinh@gmail.com> wrote:
>>
>>> That works for me; a separate section for this might make sense.
>>>
>>> Another question: do we need to include application-layer chunking
>>> here (not just for browser sync)? It is essential if we want to extend
>>> other capabilities later.
>>>
>>> Regards,
>>> Linhui
>>>
>>> 2015-12-02 21:03 GMT+08:00 Hugo González Labrador <ietf@hugo.labkode.com>:
>>>
>>>> I propose to come up with a list of advantages and disadvantages of
>>>> using WebDAV and compare them against a JSON/REST based approach, like
>>>> remoteStorage.
>>>>
>>>> Does that sound good?
>>>>
>>>>
>>>> On Wed, Dec 2, 2015, at 01:59 PM, Linhui Sun wrote:
>>>>
>>>>
>>>>
>>>> 2015-12-02 20:43 GMT+08:00 Hugo González Labrador <
>>>> ietf@hugo.labkode.com>:
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Dec 2, 2015, at 01:30 PM, Michiel de Jong wrote:
>>>>
>>>> Hi all!
>>>> Thanks for all your reactions to the remoteStorage Internet-Draft.
>>>>
>>>> We get the question about WebDAV a lot, so in the next version we should
>>>> add a remark about it in the intro. The folder descriptions returned when
>>>> you GET a URL that ends with a / are indeed a deviation from the XML
>>>> returned by a WebDAV server, and this is just because nowadays JSON is
>>>> easier to use than XML for developers, both client-side and server-side.
>>>>
>>>>
>>>>
>>>> I totally agree here; this was going to happen sooner or later, and it
>>>> is really appreciated.
>>>>
>>>>
>>>>
>>>> The fact that we don't require servers to support WebDAV's custom verbs
>>>> like PROPPATCH etc. is for three reasons:
>>>> 1) it's a lot of work to implement this without using an existing
>>>> WebDAV library
>>>> 2) in practice, a lot of WebDAV servers get it wrong, or don't
>>>> implement all of WebDAV. It's very annoying for client implementers to have
>>>> to deal with servers that e.g. chose not to implement LOCK and UNLOCK.
>>>> 3) we don't really need all these advanced features on top of standard
>>>> HTTP, just supporting GET/PUT/DELETE for resources, and adding a simple
>>>> folder description format, is enough for most applications.
>>>>
>>>> Other than that, the remoteStorage Internet-Draft specifies a *lot*
>>>> more than just how each HTTP verb should behave:
>>>> * requiring support for OAuth implicit-grant flow
>>>> * requiring ETag support and nested versioning (i.e. the folder's ETag
>>>> changes if anything within that folder changes)
>>>> * requiring CORS headers
>>>> * requiring a WebFinger announcement for service discovery
>>>> It would be easy to add these four things on top of WebDAV, instead of
>>>> putting them on top of our minimal GET/PUT/DELETE API definition. In fact,
>>>> we could probably separate it into two Internet-Drafts: one for the
>>>> 'RESTful folders' API which is our simplification of WebDAV, and a second
>>>> one for OAuth/ETags/CORS/WebFinger on top of either WebDAV or 'RESTful
>>>> folders' or whatever other HTTP-based API you want.
>>>>
>>>>
>>>>
>>>> There is one requirement that all synchronisers need to handle: moving
>>>> resources. This spec does not specify an alternative to WebDAV's MOVE.
>>>>
>>>> It is true that a MOVE could be replaced with a DELETE + UPLOAD, but
>>>> that is not acceptable in most cases due to the inefficiency of such an
>>>> operation (CPU, bandwidth consumption ...)
>>>>
>>>> Is there a plan to support such a basic feature?
>>>>
>>>> From the current remoteStorage spec, the ownCloud sync protocol can be
>>>> implemented. The missing bit is tracking those remote moves.
>>>>
>>>> I agree with Hugo that MOVE is useful; when you just rename a file, it
>>>> is the perfect fit. But my question is whether we need to make two
>>>> documents. Offering multiple choices is not good for standardization. In
>>>> my view, we need a very clear decision on whether to consider WebDAV or
>>>> not.
>>>>
>>>> Regards,
>>>> Linhui
>>>>
>>>>
>>>>
>>>> Cheers
>>>>
>>>>
>>>> On Wed, Dec 2, 2015 at 11:28 AM, Hugo González Labrador <
>>>> ietf@hugo.labkode.com> wrote:
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Dec 2, 2015, at 11:18 AM, Linhui Sun wrote:
>>>>
>>>> Hi
>>>>
>>>> On Wed, Dec 2, 2015 at 17:56, Hugo González Labrador <
>>>> ietf@hugo.labkode.com> wrote:
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Dec 2, 2015, at 08:20 AM, Linhui Sun wrote:
>>>>
>>>> Hi
>>>>
>>>> 2015-12-02 5:14 GMT+08:00 Hugo González Labrador <ietf@hugo.labkode.com>:
>>>>
>>>> Hi,
>>>>
>>>> from my point of view the remoteStorage project addresses a subset of
>>>> the use cases of the WebDAV specification.
>>>>
>>>> The main difference I can observe is that the specification is built on
>>>> REST instead of XML-based communication.
>>>>
>>>>
>>>> I personally like REST much more than WebDAV because it is more
>>>> developer-friendly and faster to develop with.
>>>> Maybe the remoteStorage API will become the next WebDAV :)
>>>>
>>>> However, the remoteStorage API does not provide a way of synchronising
>>>> data, this task is delegated to the developers.
>>>> Is there a plan to provide such feature based on the outcome of this
>>>> draft?
>>>>
>>>> I'm a little bit confused about why you say remoteStorage does not
>>>> provide that. Is that because remoteStorage does not behave like a typical
>>>> sync service (e.g. Dropbox...), or are you saying something else?
>>>>
>>>> Yes, because it does not offer synchronisation capabilities.
>>>>
>>>> Got it. What I am wondering is whether we need to include those
>>>> capabilities in a base specification, since it is hard to standardize
>>>> capabilities like dedup or delta. Maybe a better way is to define those in
>>>> a separate specification,
>>>>
>>>>
>>>>
>>>>
>>>> Thanks for giving these examples - so by 'synchronisation capabilities'
>>>> you mean 1) deduplication and 2) delta updates? Anything else or is that an
>>>> exhaustive list?
>>>>
>>>>
>>>>
>>>> something like extensions. For a base document, we just need to define
>>>> how to perform sync operations.
>>>>
>>>>
>>>>
>>>> Yes, I agree that maybe an extension of this draft could handle the
>>>> synchronisation part.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Our Internet-Draft is heavily focused on the world wide web, whose URLs
>>>> are not content-addressable, we can't change that. So that architecture is
>>>> not very friendly to deduplication, compared to for instance BitTorrent. As
>>>> you already said, developers can still dedupe on the server-side when
>>>> storing blobs to disk, and can also dedupe on the client side before
>>>> creating the resources the client uploads.
>>>> As far as I know, delta updates are not supported by the ETag system -
>>>> you cannot do a range request to find out if certain bytes within a
>>>> document have changed. However, the folder system we define does encourage
>>>> you, for instance when you develop a To Do List app, to put each task in
>>>> its own document, and then query the folder to see which of them changed,
>>>> instead of putting them all in one big document and retrieving the whole
>>>> document each time.
>>>>
>>>> Cheers,
>>>> Michiel.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> BTW, I want to introduce ClawIO (http://clawio.github.io), a research
>>>> project to benchmark different synchronisation protocols against
>>>> different data backends, with special attention to providing a common
>>>> sync API.
>>>>
>>>> A common API for different sync protocols is being created based on the
>>>> architecture specified in this draft (control and data servers) and the
>>>>
>>>> I cannot find a distributed architecture in this draft. It seems that
>>>> they handle metadata and content data together, just like a normal web
>>>> service.
>>>>
>>>>
>>>> ClawIO is fully distributed. Every logical unit is a different server
>>>> that can be scaled out. Data and metadata channels are independent from
>>>> each other and reside on different servers.
>>>>
>>>> That is widely employed in popular sync services, and it is also
>>>> beneficial to privacy to some extent. But in the context of sync in the
>>>> browser (which is remoteStorage's main focus), I don't know whether
>>>> this is reasonable. Still, I think we should deploy a distributed
>>>> architecture, although it will make things complicated.
>>>>
>>>>
>>>>
>>>> Of course, remoteStorage is targeted at browsers, so syncing does
>>>> not make too much sense in this case.
>>>> With the rise of Linux container micro-service based architectures, the
>>>> deployment of such highly complex systems should become easier and faster.
>>>>
>>>> Best,
>>>>
>>>>
>>>> Hugo
>>>>
>>>>
>>>>
>>>> Regards,
>>>> Linhui
>>>>
>>>>
>>>>
>>>>
>>>> Regards,
>>>> Linhui
>>>>
>>>> one from the CERN EOS project (management, disk and queue servers).
>>>>
>>>> Phase I has implemented the ownCloud Sync Protocol and Phase II will
>>>> implement the SeaFile Sync Protocol.
>>>> The choice of these protocols among others is because they are really
>>>> opposed to each other in terms of syncing (delta vs non-delta,
>>>> state-based vs log/event/git-based sync …), so finding a common approach
>>>> is more challenging.
>>>>
>>>> Providing a base specification/architecture to measure the feasibility
>>>> of this draft is one of the objectives of the project.
>>>>
>>>> I believe that the work being done here and in ClawIO are supplementary
>>>> to each other and I think mutual collaboration could be beneficial for
>>>> both sides.
>>>>
>>>> Also, if there is interest, the remoteStorage API can be added to
>>>> ClawIO.
>>>>
>>>> Best regards,
>>>>
>>>> Hugo Gonzalez Labrador
>>>>
>>>> On Tue, Dec 1, 2015, at 09:00 PM, storagesync-request@ietf.org wrote:
>>>> > Today's Topics:
>>>> >
>>>> >    1. New version of draft-dejong-remotestorage    Internet-Draft
>>>> >       available (Michiel de Jong)
>>>> >    2. Re: New version of draft-dejong-remotestorage Internet-Draft
>>>> >       available (Gihan Dias)
>>>> >    3. Re: New version of draft-dejong-remotestorage Internet-Draft
>>>> >       available (Fei Song)
>>>> > _______________________________________________
>>>> > Storagesync mailing list
>>>> > Storagesync@ietf.org
>>>> > https://www.ietf.org/mailman/listinfo/storagesync
>>>> > Email had 3 attachments:
>>>> > + [Storagesync] New version of draft-dejong-remotestorage
>>>> > Internet-Draft available
>>>> >   2k (message/rfc822)
>>>> > + Re: [Storagesync] New version of draft-dejong-remotestorage
>>>> > Internet-Draft available
>>>> >   1k (message/rfc822)
>>>> > + Re: [Storagesync] New version of draft-dejong-remotestorage
>>>> > Internet-Draft available
>>>> >   2k (message/rfc822)
>>>>