Re: Draft for Resumable Uploads

Austin Wright <aaa@bzfx.net> Sun, 10 April 2022 22:35 UTC

From: Austin Wright <aaa@bzfx.net>
Message-Id: <96617128-CFEB-471D-80BB-71C85684A552@bzfx.net>
Content-Type: multipart/alternative; boundary="Apple-Mail=_6AB73490-630A-4824-AF63-38343002EED0"
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3696.80.82.1.1\))
Date: Sun, 10 Apr 2022 15:31:17 -0700
In-Reply-To: <AFC87471-BF4E-4F50-8CA5-182E90753D33@apple.com>
Cc: Marius Kleidl <marius@transloadit.com>, ietf-http-wg <ietf-http-wg@w3.org>
To: Guoye Zhang <guoye_zhang@apple.com>
References: <CANY19NvMcPQaHRamFe-yy-E38xKo2XrmFCKVRoPbyBMQhoY6vA@mail.gmail.com> <43E868F0-457C-4BFB-A8D8-AAF84A06C3C3@bzfx.net> <AFC87471-BF4E-4F50-8CA5-182E90753D33@apple.com>
Subject: Re: Draft for Resumable Uploads
Archived-At: <https://www.w3.org/mid/96617128-CFEB-471D-80BB-71C85684A552@bzfx.net>

> On Apr 6, 2022, at 01:29, Guoye Zhang <guoye_zhang@apple.com> wrote:
> 
> Originally, an earlier internal draft version had “Upload Creation Procedure” and “Upload Appending Procedure” with PATCH. Then we recognized that they are nearly the same thing with all the same requirements, just different offsets, so we merged them into a single “Upload Transfer Procedure”.
> 
> We can definitely revisit this decision if the consensus is to adopt and improve PATCH.

I understand them to be different things based on how they use HTTP semantics. “Upload Appending Procedure” seemed to follow the method semantics. In contrast, “Upload Transfer Procedure” expands each HTTP method to do something it could not previously do: ignore the head of this request, and instead combine its body with a previous request (violating the understanding that HTTP messages are stateless).

You work around this by requiring that clients not send requests if the origin server would misunderstand the feature. This is never the client’s responsibility in HTTP, because a message may pass through multiple programs and servers, not all of which may understand the expanded semantics. And even with respect to the origin server, signals like a DNS record, or even something as specific as an OPTIONS response on the same URI, cannot technically guarantee that the origin will honor the expanded semantics. I think the difference between using HTTP as a substrate and using it as an application is this: an HTTP substrate merely reuses existing libraries (e.g. message and protocol parsers) but not necessarily the semantics, and does not support the entire ecosystem of caches, gateways, and proxies (which collectively form HTTP The Application).

However, I think some fairly straightforward changes could fix this.

In general, there are three options for adding a new feature to HTTP:

1. The server has to transparently fall back to acceptable behavior (e.g. 200 OK instead of 206 Partial Content)
2. The server has to produce an error (e.g. unknown method, or unknown media type)
3. It has to be implemented at a different layer (e.g. as a feature of TCP, TLS, SCTP, HTTP/2 framing, or QUIC)

I see a combination of these being necessary:

(1) The client should assume no support, and make its first request as normal (requesting support for resumable uploads). If the server supports the feature, it can communicate this back, with a URI representing the attempted operation. This satisfies option 1 (transparently fall back).

(2) If the connection needs to be resumed, the client can use a new method, or PATCH with a new media type, on the server’s selected operation URI. This satisfies option 2 (produce an error).

(3) Do resumable uploads really need to be specific to HTTP? Suppose there were a feature of HTTP/2 framing, where a client could ask the server “Please generate a UUID for this stream and keep it in the background in the event of a disconnection” or “Please tell me where that stream left off, and resume this stream from there”. This satisfies option 3 (a different layer).
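
To make steps (1) and (2) concrete, here is a toy in-memory model in Python. It is only a sketch under assumptions: the media type `application/x-upload-append`, the `/uploads/N` URI shape, the class and method names, and the choice of 404/409/204 are all invented for illustration and are not taken from the draft.

```python
import itertools

# Hypothetical media type meaning "append these bytes at this offset".
APPEND_TYPE = "application/x-upload-append"


class UploadServer:
    """Toy in-memory origin server; not a real HTTP implementation."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.partial = {}  # operation URI -> bytearray received so far

    def start(self, received, wants_resumable):
        """First request, possibly cut short after `received` bytes arrived.

        Returns an operation URI, or None when the feature is not in use,
        in which case the request is just an ordinary upload.
        """
        if not wants_resumable:
            return None
        uri = f"/uploads/{next(self._ids)}"
        self.partial[uri] = bytearray(received)
        return uri

    def offset(self, uri):
        """How many bytes of the operation the server already holds."""
        return len(self.partial[uri])

    def patch(self, uri, content_type, offset, data):
        """Resume request; returns an HTTP status code."""
        if uri not in self.partial:
            return 404
        if content_type != APPEND_TYPE:
            return 415  # unknown media type: fail loudly, never guess
        if offset != len(self.partial[uri]):
            return 409  # client and server disagree on where to resume
        self.partial[uri].extend(data)
        return 204
```

A client would call `start()` with its normal request, remember the returned URI if there is one, and after a disconnect ask for `offset()` and send the rest with `patch()`. A server that has never heard of the media type answers 415 rather than misinterpreting the body.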

Finally, your proposal is general and low-level (to the point where the same effect could be achieved in TLS instead). Since virtually all Web applications are implemented above the transport layer, it may be technically difficult to implement, for very little benefit. It follows general stream semantics (ordered bytes), which prohibits features like multiple parallel uploads, which are important as server workloads become increasingly parallel. There would be only a handful of ways this could be implemented in Web applications:

1. The OS or HTTP server implements resumable uploads, and combines multiple HTTP messages in a manner transparent to the written application. If the subsequent HTTP requests hit a different origin server than the first one (a different node in the cluster), that server would have to hand off the request body somehow.

2. The application is written as some sort of state machine that can be handed off or shared between multiple nodes in a cluster. I’m not aware of any development frameworks that do anything remotely like this.

3. Developers choose the specific resources & methods that are likely to benefit from resumable uploads, and describe how the database stores the intermediate progress of each class of operation. This seems to carry a huge potential for errors, given how infrequently many code paths would be run, and how difficult they are to trigger in tests.

Resumable requests are still reasonable, but I think where we stand to benefit the most is from application-layer approaches. In particular, segmented file uploading, which would enable “parallel PUT”.
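
A minimal Python sketch of what I mean by segmented uploading, with no wire format implied: the function names are hypothetical, and how each segment would actually be sent (one request per segment, in any order or in parallel) is left open.

```python
def split_segments(data: bytes, size: int) -> list[tuple[int, bytes]]:
    """Cut `data` into (offset, chunk) pairs of at most `size` bytes."""
    return [(off, data[off:off + size]) for off in range(0, len(data), size)]


def assemble(total_len: int, segments) -> bytes:
    """Rebuild the file from segments that may arrive in any order."""
    buf = bytearray(total_len)
    seen = 0
    for off, chunk in segments:
        if off < 0 or off + len(chunk) > total_len:
            raise ValueError("segment outside the declared length")
        buf[off:off + len(chunk)] = chunk  # same-length slice assignment
        seen += len(chunk)
    # Cheap byte-count check only; it does not detect every overlap.
    if seen != total_len:
        raise ValueError("missing or duplicated segments")
    return bytes(buf)
```

Because each segment carries its own offset, a lost segment can be re-sent on its own, and nothing requires the segments to share a connection or arrive in order.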

Thanks,

Austin.