Re: Draft for Resumable Uploads

Glenn Strauss <gs-lists-ietf-http-wg@gluelogic.com> Mon, 11 April 2022 10:43 UTC

Date: Mon, 11 Apr 2022 06:41:09 -0400
From: Glenn Strauss <gs-lists-ietf-http-wg@gluelogic.com>
To: Guoye Zhang <guoye_zhang@apple.com>
Cc: Eric J Bowman <mellowmutt@zoho.com>, Julian Reschke <julian.reschke@gmx.de>, ietf-http-wg <ietf-http-wg@w3.org>
Message-ID: <YlQFxaE2XuwmL23F@xps13>
In-Reply-To: <0C2D6E56-8385-44F3-B195-ACCA25F76C53@apple.com>
Archived-At: <https://www.w3.org/mid/YlQFxaE2XuwmL23F@xps13>

The tus-v2 specification reads to me as an application.  Much thought
and effort has been put into this application, but it is an application
with a protocol for client and server.

Guoye Zhang seems to be proposing tus-v2 to implement
"upload (resumable) *transactions*" and to define new behavior in which
the server must track and handle partial uploads and be able to report
the sparse (not yet received) areas back to the client.  Protocols for
that, such as rsync, already exist, but is this even necessary?  (For
parallel uploads, it might be, though other protocols such as zchunk
may help the client discover server-side state.)
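To make "report the sparse areas" concrete, here is a minimal Python
sketch.  It assumes the server records each stored piece as a half-open
[start, end) byte interval.  `missing_ranges` is a hypothetical helper
and is not defined by tus-v2.

```python
def missing_ranges(received: list[tuple[int, int]],
                   total: int) -> list[tuple[int, int]]:
    """Return the gaps, as half-open [start, end) intervals, not yet stored.

    `received` holds the [start, end) intervals the server has stored,
    in any order and possibly overlapping (hypothetical bookkeeping).
    """
    gaps = []
    cursor = 0  # first byte not yet known to be covered
    for start, end in sorted(received):
        if start > cursor:
            gaps.append((cursor, min(start, total)))
        cursor = max(cursor, end)
        if cursor >= total:
            break
    if cursor < total:
        gaps.append((cursor, total))
    return gaps
```

For example, missing_ranges([(0, 100), (200, 300), (250, 400)], 500)
returns [(100, 200), (400, 500)].  A server that offers this must persist
the interval list for every unfinished upload.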


Large incremental uploads are already achievable using servers that
support partial-PUT, including SabreDAV and lighttpd mod_webdav:
upload the file serially in chunks, extending the file with each chunk.
Cancellation is achievable with WebDAV DELETE.  The client can choose
an alternate filename while uploading, and then use WebDAV MOVE to
rename the file into place once the upload is complete.
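The serial partial-PUT flow can be sketched as follows.  It assumes a
server that treats PUT with a Content-Range header as a write at that
offset.  `Request` and `plan_partial_put_upload` are hypothetical names.
The code only builds request descriptions and leaves the HTTP transport
to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """Hypothetical, transport-agnostic description of one HTTP request."""
    method: str
    path: str
    headers: dict[str, str] = field(default_factory=dict)
    body: bytes = b""


def plan_partial_put_upload(data: bytes, tmp_path: str, final_path: str,
                            chunk_size: int,
                            resume_from: int = 0) -> list[Request]:
    """Plan serial partial-PUTs to tmp_path, then a WebDAV MOVE into place.

    resume_from is the number of bytes the server already holds, e.g.
    learned from a HEAD on tmp_path after a dropped connection (assumption).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    requests = []
    for start in range(resume_from, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        end = start + len(chunk) - 1  # Content-Range end offset is inclusive
        requests.append(Request(
            "PUT", tmp_path,
            {"Content-Range": f"bytes {start}-{end}/{len(data)}"},
            chunk,
        ))
    # Rename into place only after the last chunk has been stored.
    requests.append(Request("MOVE", tmp_path,
                            {"Destination": final_path, "Overwrite": "T"}))
    return requests
```

To cancel, the client would send DELETE for tmp_path instead of the final
MOVE.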

Support for partial-PUT is a non-standard extension to WebDAV,
at least in part due to implementation tradeoffs (resource usage,
lock timeouts, and the resulting impact on UX).  tus-v2 may be trying
to address this.

A robust PUT implementation (whole file replacement) allows downloads
to proceed while a new version is uploaded.  This may be mapped onto
typical filesystems by uploading to a temporary file and atomically
renaming into place when the upload is complete.  WebDAV LOCK can be
used to ensure only one uploader at a time.
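Here is a server-side sketch of the temporary-file-and-rename approach,
written in Python for illustration.  `atomic_put` is a hypothetical
helper and not a lighttpd or SabreDAV API.  It assumes POSIX rename
semantics: clients that already opened the old file keep reading it
while the new version is written.  LOCK handling is omitted.

```python
import os
import shutil
import tempfile
from typing import BinaryIO


def atomic_put(path: str, body: BinaryIO) -> None:
    """Replace `path` with the bytes read from `body`, all or nothing."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".upload-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            shutil.copyfileobj(body, tmp)  # stream the request body
            tmp.flush()
            os.fsync(tmp.fileno())  # make data durable before it is visible
        if os.path.exists(path):
            shutil.copymode(path, tmp_path)  # keep the old file's permissions
        os.replace(tmp_path, path)  # atomic rename into place
    except BaseException:
        # Aborted upload: discard the partial temp file, leave `path` as-is.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```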

When making changes to a file, whole file replacement works well for
small files and often well enough for medium-sized files.  It is for
large files that network bandwidth, reconnection and re-upload costs,
server disk space, and other resource usage can have a larger impact.
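The following rough illustration shows why re-upload cost grows with
file size.  It assumes a single interruption and that a chunked upload
keeps every fully stored chunk.  `bytes_sent` is a hypothetical helper.

```python
def bytes_sent(size: int, interrupted_at: int,
               chunk_size: int) -> tuple[int, int]:
    """Total bytes on the wire for one interrupted upload.

    Returns (whole_file_restart, chunked_resume).
    """
    # Restart: everything sent before the failure is wasted.
    whole_file_restart = interrupted_at + size
    # Resume: only the partially sent last chunk is wasted.
    committed = (interrupted_at // chunk_size) * chunk_size
    chunked_resume = interrupted_at + (size - committed)
    return whole_file_restart, chunked_resume
```

Consider a 10 GiB file interrupted 9 GiB + 10 MiB in, with 64 MiB
chunks.  A restart sends 19 GiB + 10 MiB in total, while a resume sends
10 GiB + 10 MiB.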

On modern filesystems with support for cloning, a server might be able
to clone the file extents into a temporary file, modify a portion of the
temporary file, and then atomically rename it into place.  This could be
done with PATCH or with partial-PUT.  For large files, enforcing use of
WebDAV LOCK is recommended to avoid excessive numbers of large
temporary files, especially if copying a large file to a temporary
file instead of cloning.
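Here is a Linux-oriented sketch of clone-then-modify.  It assumes the
FICLONE ioctl, which reflink-capable filesystems such as Btrfs and XFS
support, and falls back to a full copy elsewhere.  `patch_via_clone` is
a hypothetical helper.  0x40049409 is the Linux FICLONE request number,
used only when the fcntl module does not export FICLONE itself.

```python
import os
import shutil
import sys
import tempfile

try:
    import fcntl
except ImportError:  # fcntl is Unix-only
    fcntl = None

# Linux FICLONE ioctl number; some Python versions expose fcntl.FICLONE.
FICLONE = getattr(fcntl, "FICLONE", 0x40049409)


def patch_via_clone(path: str, offset: int, data: bytes) -> bool:
    """Write `data` at `offset` into a copy of `path`, then rename it in.

    Returns True if the copy was a reflink clone, False if bytes were copied.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".patch-")
    try:
        with os.fdopen(fd, "r+b") as dst, open(path, "rb") as src:
            cloned = False
            if fcntl is not None and sys.platform.startswith("linux"):
                try:
                    fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
                    cloned = True  # extents shared, no data copied
                except OSError:
                    pass  # no reflink support (e.g. ext4, tmpfs)
            if not cloned:
                shutil.copyfileobj(src, dst)  # fallback: copy every byte
            dst.seek(offset)
            dst.write(data)  # overwrite the patched region in the copy
            dst.flush()
            os.fsync(dst.fileno())
        shutil.copymode(path, tmp_path)
        os.replace(tmp_path, path)  # readers switch atomically
    except BaseException:
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    return cloned
```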

Another solution for PATCH-like behavior is to use DVCS protocols,
e.g. git, and serve the completed files from a repository working copy.


From my reading of the tus-v2 spec, only parallel upload is not
addressed by the solutions above.  Parallel upload and file
reconstruction are currently achievable through application-specific
implementations, including tus-v2, and potentially through some DVCS.
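For parallel upload, the server-side reconstruction step can be as
simple as writing each independently received part at its offset.  This
minimal sketch assumes the parts have already been authenticated and
validated.  `reconstruct` is a hypothetical helper.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def reconstruct(path: str, total_size: int, parts: list[tuple[int, bytes]],
                workers: int = 4) -> None:
    """Assemble (offset, bytes) parts, arriving in any order, into one file."""
    with open(path, "wb") as f:
        f.truncate(total_size)  # preallocate (sparse) so every offset is valid

    def store(part: tuple[int, bytes]) -> None:
        offset, chunk = part
        # Each worker uses its own file handle, so seeks do not race.
        with open(path, "r+b") as f:
            f.seek(offset)
            f.write(chunk)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(store, parts))  # list() re-raises any worker exception
```

A real server would more likely write to a temporary file and rename it
into place only after confirming that every byte range has arrived.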


Eric Bowman makes numerous excellent points in prior messages, and I
would like to repeat one:

Eric J Bowman wrote:
> Unless you're coding an endpoint instead of a resource, in which
> case the only help I can offer you, is to think in terms of resources
> not endpoints.

> Indeed! In your example /upload is a tightly-coupled RPC endpoint;
> if the request body is the file content why are you using POST instead
> of PUT? 'HEAD/resource' lets the server know exactly which "upload" is
> referenced: if it was interrupted, the server knows it, and responds
> 206. Because REST.

If tus is aimed at /upload, an endpoint, then in my mind tus is an
application handling that endpoint.  Given the alternatives mentioned
above for handling resources, I do not see why a web server would
implement tus as an HTTP standard when end-users can configure tus as an
application running behind a web server to handle configured endpoints
such as /upload.


As others have pointed out, there is room for improvement in PATCH,
e.g. defining a new media type and associated behavior for PATCH.

Mark Nottingham wrote:
> PATCH intentionally leaves everything up to the media type of the
> PATCH request, not the implementation. With hindsight, at least one
> or two well-defined PATCH media types should have been defined at the
> same time as 5789 - their absence (especially JSON's) created a lot
> of confusion.
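JSON did eventually get well-defined PATCH media types: RFC 6902 (JSON
Patch, application/json-patch+json) and RFC 7396 (JSON Merge Patch,
application/merge-patch+json).  A minimal Python sketch of the RFC 7396
algorithm illustrates Mark's point: the media type, not the server
implementation, defines what PATCH does.  `merge_patch` is a
hypothetical name.

```python
import copy
from typing import Any


def merge_patch(target: Any, patch: Any) -> Any:
    """Apply an RFC 7396 JSON Merge Patch without mutating `target`."""
    if not isinstance(patch, dict):
        return copy.deepcopy(patch)  # a non-object patch replaces the target
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for name, value in patch.items():
        if value is None:
            result.pop(name, None)  # JSON null removes the member
        else:
            result[name] = merge_patch(result.get(name), value)
    return result
```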

Eric J Bowman wrote:
> I think you and mnot are correct that we need better-defined PATCH
> media types, I believe that's where to solve this problem, but how
> any media type is rendered has traditionally and properly been a
> client-side concern in HTTP.

> I don't think anyone has even meant to imply that this isn't a
> problem worth solving. That being said... 20 years ago we figured
> _every_ upload was *replaceable* on failure, while realizing PATCH
> would increase in value hand-in-hand with filesize over time.
> Reckoning day has arrived! ;) Thanks for your contribution, and I
> mean that, otherwise I wouldn't bother.

Cheers, Glenn