Re: Draft v1 Update for Resumable Uploads

Guoye Zhang <guoye_zhang@apple.com> Sun, 19 June 2022 08:08 UTC

From: Guoye Zhang <guoye_zhang@apple.com>
Message-id: <D149DCFE-A5C9-418D-80B4-3B5F138AA497@apple.com>
Content-type: multipart/alternative; boundary="Apple-Mail=_A483B3D7-0219-4C89-8B23-F520B7EBBB02"
MIME-version: 1.0 (Mac OS X Mail 16.0 \(3724.0.1.1.31\))
Date: Sun, 19 Jun 2022 01:04:35 -0700
In-reply-to: <Yq67WGkb0LtJIAP9@xps13>
Cc: ietf-http-wg@w3.org
To: gs-lists-ietf-http-wg@gluelogic.com
References: <BED5A5BC-3F7F-47E2-815E-DC0483328DFD@apple.com> <Yq67WGkb0LtJIAP9@xps13>
Subject: Re: Draft v1 Update for Resumable Uploads
Archived-At: <https://www.w3.org/mid/D149DCFE-A5C9-418D-80B4-3B5F138AA497@apple.com>


> On Jun 18, 2022, at 22:59, gs-lists-ietf-http-wg@gluelogic.com wrote:
> 
> On Thu, Jun 16, 2022 at 02:30:59PM -0700, Guoye Zhang wrote:
>> Our previous resumable upload draft generated a lot of discussions.
> 
> At least in my case, I attempted to be polite after you submitted a
> draft without first doing a survey of existing RFCs.  You admitted no
> knowledge of WebDAV RFCs, which I deemed a large oversight considering
> the nature of the tus-v2 protocol.

We have looked into the WebDAV protocol, but we do not think it’s the direction we want to take. tus-v2 is designed to be a lightweight, single-purpose protocol that’s easily implementable by clients and servers. We do not want to design a discovery method for WebDAV and force servers to implement the full WebDAV just for this one feature.

Sorry I did not directly address your feedback. Happy to expand on our thoughts and discuss more if you wish.
> 
>> I’m glad to announce that we have a new draft ready to address many feedbacks that suggested adopting the PATCH method.
> 
> The draft abstract begins with unsubstantiated claims to justify itself,
> and I believe that almost all of those claims are also misleading.
> 
> "HTTP clients often encounter interrupted data transfers as a result of canceled requests or dropped connections. [...] it is often desirable to issue subsequent requests that transfer only the remainder of the representation."
> 
> The multiple uses of "often" are misrepresentations, IMHO.
> 
> A large percentage of HTTP requests are GET/HEAD and have no body.
> A sizable percentage (if not more) of HTTP POST requests are small,
> e.g. using POST as an alternative to GET along with XSRF tokens.
> 
> What data do you have to support the claims in the draft Abstract?
> What percentage of requests have request bodies, and further have
> request bodies that are sufficiently large that it is excessively
> wasteful to resend the entire representation? (and when safe to do so!)

> 
> For high-quality wired networks, interrupted data transfers are less
> common, though more possible over long-distance links.  For wireless
> and mobile, interruptions may be more common, e.g. while uploading
> pictures and videos.

We can revise the wording, but we don’t think it’s a misleading claim.

Apple has a Feedback Assistant app which allows customers to file bug reports and upload device diagnostics. These diagnostics are usually hundreds of megabytes in size, and if an upload is interrupted, we have to upload them again from the beginning. This has been one of the most common complaints we receive.

> 
> Now, it is true that non-idempotent requests such as POST and PUT
> are not generically safe to automatically retry upon failure.
> 
> If you are trying to come up with a generic solution to recover a
> non-idempotent request, that should be more explicit and better scoped
> in the draft than potentially extending multiple existing HTTP request
> methods.  Such a goal would require specifying that a server not start
> processing the upload in any non-idempotent way until the upload was
> complete.  Other requirements might also be necessary.
> 
This is not true. The resumable upload protocol is designed so the server can start processing data immediately, since clients are required to resume from the exact interruption point. The protocol can be implemented by a CDN so the origin server just receives a regular upload.
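To illustrate the point, here is a minimal sketch, not the draft's wire format. The class and method names are hypothetical, and SHA-256 merely stands in for whatever streaming processing a server might do. It assumes only what is stated above: the server tracks an offset and accepts data only at that exact offset.

```python
import hashlib


class ResumableUpload:
    """Hypothetical server-side state for a single upload."""

    def __init__(self) -> None:
        self.offset = 0                  # bytes accepted so far
        self._digest = hashlib.sha256()  # stand-in for incremental processing

    def append(self, client_offset: int, data: bytes) -> int:
        # The client must resume from the exact interruption point.
        if client_offset != self.offset:
            raise ValueError(
                f"offset mismatch: server has {self.offset}, "
                f"client sent {client_offset}"
            )
        # Data is processed as it arrives; nothing waits for completion.
        self._digest.update(data)
        self.offset += len(data)
        return self.offset

    def hexdigest(self) -> str:
        return self._digest.hexdigest()


upload = ResumableUpload()
upload.append(0, b"hello ")         # first request, interrupted after 6 bytes
resume_at = upload.offset           # client asks the server where to resume
upload.append(resume_at, b"world")  # second request continues from there
```

Because the server never sees a byte twice or out of order, the processed result is the same as for one uninterrupted upload.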
> 
> 
> Using WebDAV HTTP methods for upload:
> 
> I see two categories of targets for large uploads:
> 1. uploading to a target where the target is a resource
> 2. uploading to a target where the target is an endpoint
>   (e.g. script which may process the upload)
> 
> The first is already possible using WebDAV (explained below, yet again).
> The second can be implemented by an application, and IMHO should not
> require any changes to HTTP servers and proxies.  More specifically,
> tus-v2 should not require new resource management by HTTP servers and
> proxies, instead delegating that management to specific user
> applications.  Additionally, the second item might be implemented using
> a similar WebDAV solution as the first:
> 
> RFC 9110 HTTP Semantics
> 14.5. Partial PUT
> https://httpwg.org/specs/rfc9110.html#partial.PUT
> notes that Partial PUT may be implemented by an HTTP server for some
> resources.
> 
> lighttpd 1.4.65+ allows Partial PUT safely with config:
>  webdav.opts += ("partial-put-copy-modify" => "enable")
> This includes extending files.  (This is safe in earlier versions of
> lighttpd, too, but only if the targets are uniquely named so as to
> not possibly be in the process of being downloaded by other clients,
> i.e. temporary files.)
> 
> Using lighttpd mod_webdav and the WebDAV protocol, a client can
> incrementally upload to a temporary file, and then rename the file when
> the upload is complete.  The client could also DELETE the temporary file
> to cancel.
> 
> A client uploading to an endpoint might upload the request body to an
> alternate location on the same server, and when the upload is complete,
> send a request to the endpoint with a request header containing the path
> to the completed upload of the request body.
> 
> Here is an example set of pseudo-HTTP requests, uploading a file in
> 256k chunks, and recovering from a disconnect:
> 
> 
> LOCK /file.XXXXXX HTTP/1.1
> 
> 201 Created
> ETag: "aaaaaa"
> 
> 
> PUT /file.XXXXXX HTTP/1.1
> Content-Range: bytes 0-262143/262144
> If-Match: "aaaaaa"
> 
> 204 No Content
> ETag: "bbbbbb"
> 
> 
> PUT /file.XXXXXX HTTP/1.1
> Content-Range: bytes 262144-524287/524288
> If-Match: "bbbbbb"
> 
> <disconnect>
> 
> 
> # (recovery resynchronization if disconnect occurs)
> 
> HEAD /file.XXXXXX HTTP/1.1
> 
> 200 OK
> Content-Length: 262144
> ETag: "bbbbbb"
> 
> 
> PUT /file.XXXXXX HTTP/1.1
> Content-Range: bytes 262144-524287/524288
> If-Match: "bbbbbb"
> 
> 204 No Content
> ETag: "cccccc"
> 
> 
> # (... further PUT to append additional blocks ...)
> 
> 
> # side-effect of MOVE does equivalent of UNLOCK /file.XXXXXX in lighttpd
> 
> MOVE /file.XXXXXX HTTP/1.1
> Destination: /file
> 
> 201 Created

Partial PUT isn’t a clearly defined standard, and we cannot use “Content-Range” as explained above since the ability to upload with unknown length is required.

We are happy to revise the method and header names used by the Upload Appending Procedure and all other procedures as long as we maintain the capabilities of the tus protocol. If the consensus is that PUT is better than PATCH, we will modify our draft to adopt it.
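For reference, the “Content-Range” values in the quoted example can be derived as in the sketch below. The helper name is hypothetical. RFC 9110 allows “*” for an unknown complete length, but the last byte position still has to be known when the request headers are sent, which is not the case when the body length is unknown up front.

```python
from typing import Optional


def content_range(offset: int, chunk_len: int, total: Optional[int] = None) -> str:
    """Build a Content-Range value for one chunk (hypothetical helper).

    The chunk length must be known before the headers are written,
    even when the complete length is given as "*".
    """
    if offset < 0 or chunk_len <= 0:
        raise ValueError("offset must be >= 0 and chunk_len must be > 0")
    last = offset + chunk_len - 1
    complete = "*" if total is None else str(total)
    return f"bytes {offset}-{last}/{complete}"


CHUNK = 256 * 1024  # 256k chunks, as in the quoted example

first = content_range(0, CHUNK, total=CHUNK)
second = content_range(CHUNK, CHUNK, total=2 * CHUNK)
unknown_total = content_range(CHUNK, CHUNK)
```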
> 
> 
>> 2. Media types
>> 
>> PATCH currently doesn’t define a media type. We went through the list of media types but couldn’t find the appropriate category for the Upload Appending Procedure. It is a generic byte-appending operation that can modify any types of media, so we don’t think it fits into an application media type.
> 
> If tus-v2 is going to use PATCH:
> Why is tus-v2 not handled as PATCH with media-type application/tus-v2?
> tus-v2 is an application protocol.  Content-Type: application/tus-v2
> along with tus-v2 request headers would indicate how the request body is
> treated by PATCH implementations, if they support application/tus-v2.

From my reading of the PATCH standard, the media type should be the type of the content that we are trying to modify. Since this is a generic PATCH that operates on any type of content, picking an application media type seems odd. That being said, the suggestion in the other thread of using “message/byte-range” PATCH could be a good fit.
> 
> 
>> 3. 1xx intermediate response
>> 
>> We surveyed the most popular HTTP libraries in many languages, and nearly all of them consider 1xx responses an internal signaling mechanism so they don’t expose the ability for applications to handle them. (We are also guilty of this as maintainers of URLSession API on Apple platforms.) If we use 1xx response for any critical information, it would prevent nearly all tus-v1 adopters to switch to this new protocol until it’s natively supported in HTTP libraries.
> 
> Multiple 1xx HTTP responses may be sent by an HTTP server before sending
> the final HTTP response.  Not all existing HTTP servers support this,
> and there may be security and resource implications.  lighttpd 1.4.56
> and later forward 1xx responses from a backend to the client, but can be
> configured to ignore 1xx responses (besides 101 Switching Protocols)
> from backends if site security policy dictates.  In short, an
> application behind lighttpd could send an additional "100 Continue"
> with Upload-Token response header.
> 
> Client HTTP libraries already need to be extended to support tus-v2,
> so would access to 1xx response headers be unworkable where a new 104
> HTTP status would succeed?  If client HTTP libraries do not have a
> callback or some other interface for applications to receive HTTP 1xx
> intermediate responses, that would need to be added for tus-v2 feature
> detection, wouldn't it?

Feature detection is an optional part of the protocol. If an application controls both the client and the server (which is the case today with tus-v1), they can implement the protocol without using 1xx status code. We only require feature detection when a generic HTTP client tries to upgrade a regular upload to a resumable upload.
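As a rough sketch of that decision (the function and parameter names are hypothetical, and 104 is the status code proposed in the draft, not a registered one):

```python
from typing import Iterable

# Status code proposed in the draft for "Upload Resumption Supported".
UPLOAD_RESUMPTION_SUPPORTED = 104


def can_resume(interim_statuses: Iterable[int], controls_both_ends: bool = False) -> bool:
    """Decide whether a client may treat an upload as resumable.

    An application that controls both the client and the server knows
    out of band that resumption is supported. A generic HTTP client has
    to see the 1xx feature-detection response first.
    """
    if controls_both_ends:
        return True
    return UPLOAD_RESUMPTION_SUPPORTED in set(interim_statuses)


app_client = can_resume([], controls_both_ends=True)
generic_with_104 = can_resume([100, 104])
generic_without_104 = can_resume([100])
```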
> 
>> We think having just the feature detection part using 1xx response is a good balance, both eliminating any extra round trips for HTTP libraries implementing this protocol and allowing application adopters to ignore it.
> 
> For a sufficiently large upload, clients should send
>  Expect: 100-continue
> and the extra round-trip should be lost in the noise.
> Also, there are many reasons for Expect: 100-continue,
> among others: including verifying authn/authz, and upload size limits
> (i.e. if the server will reject a very large Content-Length)
> before beginning a large upload.
> 
> If the client also sent a hypothetical
>  Upload-Token: .
> then the origin server supporting resumable uploads could respond with
>  100 Continue
>  Upload-Token: /uri/path/to/file.XXXXXX
> to indicate that it is storing the request body as a resource at that
> location, and the client may query and extend it using WebDAV HTTP
> methods should a disconnection occur.  There could be additional
> headers which convey policy information such as how long the
> Upload-Token is valid, i.e. how long the server may store the temporary
> resource before deleting it.  To avoid clients abusing this temporary
> storage and sharing the link with others, it may be advisable to limit
> access to the upload path to HEAD, PUT (partial PUT), PATCH, OPTIONS,
> and perhaps PROPFIND methods, and to reject GET, QUERY, and all other
> methods.
> 
> If 100 Continue is not desirable for some reason, would it be possible to
> repurpose the WebDAV RFC 2518 response status 102 Processing?
> (removed in RFC 4918)
> http://www.iana.org/assignments/http-status-codes/http-status-codes.xhtml
> Why is a new status needed?  (104 Upload Resumption Supported)
> 
We’ve not seen consistent support for “Expect: 100-continue”. Some middleboxes reply with 100 immediately, and some middleboxes drop the 100 response. Therefore, we think a different 1xx status code would work better. We will explore other status codes such as 102, but defining a new status code for a new purpose seems like the most straightforward option, as it is the least likely to break existing software.

> 
> I used "Upload-Token" above as that is in the tus draft, but it could be
> named something else.  Also, the token could be an encrypted unique
> identifier, and the client could query/resume a disconnected request by
> providing the identifier to the endpoint along with request headers to
> indicate resumed upload, e.g. the tus resumable uploads protocol but
> application-specific, without the need for HTTP servers or proxies to
> know about this optional protocol which has application-specific policy
> for resource management of resumable uploads.
> 
> 
>> there isn’t a straightforward way to mechanically change the URI to distinguish between attempts.
> 
> Content-Location?

The protocol currently works with any upload and with no extra configuration. Would the value of “Content-Location” be automatically derived from the current URI? Or does it require manual configuration?
> 
> 
>> Looking forward to continuing the discussions and refinements of the draft.
> 
> The draft fails to indicate why existing, standard WebDAV HTTP methods
> are not sufficient.  I believe they are sufficient and have given
> examples above.  The draft makes no mention of partial PUT and its
> potential shortcomings compared to PATCH.  The draft does not
> distinguish between uploading a resource -- for which WebDAV methods
> are already a viable solution -- and uploading to an endpoint -- for
> which WebDAV HTTP methods may be a viable solution.
> 
> I urge IETF reviewers to strongly recommend that the optional tus-v2
> protocol be implementable by clients and server-side applications
> without requiring new support beyond existing standards (e.g. 1xx
> informational responses and WebDAV) from HTTP servers and proxies.
> 
> Thank you, Glenn

Maybe our goal isn’t very clear from the draft. We don’t just want this to be an application protocol. Yes, it can be implemented by an application on top of existing HTTP libraries, but the reason we are bringing this to the HTTP working group is that we hope to build support for this into the HTTP library itself. The goal is to move toward a future where every upload is resumable.

Guoye