Re: Indeterminate-length partial content messages

Glenn Strauss <gs-lists-ietf-http-wg@gluelogic.com> Fri, 03 November 2023 05:52 UTC

Date: Fri, 03 Nov 2023 01:48:40 -0400
From: Glenn Strauss <gs-lists-ietf-http-wg@gluelogic.com>
To: Austin William Wright <aaa@bzfx.net>
Cc: ietf-http-wg <ietf-http-wg@w3.org>
Message-ID: <ZUSJuHPNdy/8MXXP@xps13>
References: <24E33249-6BFD-4625-9AFB-621D91C4E4A8@bzfx.net>
Subject: Re: Indeterminate-length partial content messages
Archived-At: <https://www.w3.org/mid/ZUSJuHPNdy/8MXXP@xps13>

On Thu, Nov 02, 2023 at 09:26:17PM -0700, Austin William Wright wrote:
> Hello HTTP WG,
> 
> HTTP APIs recently adopted my “Byte Range Patch” draft, a media type to overwrite or append to a resource, especially when Partial PUT is not suitable (e.g. when server support is undetermined, or when the patch must be exchanged as a file). It re-uses standard HTTP fields including Content-Range, but notably this field doesn’t support ranges of indeterminate length, so there's no way to encode an indefinitely long write at a specific offset—you must know the length of the write when you begin the request.
> 
> Currently, as a workaround, the draft specifies a special case for the Content-Range syntax. Since workarounds like this are questionable, I’ll specify a different field name (perhaps "Content-Offset"). However, this brings its own problems: There would be a field that only functions in the body of these PATCH requests, and isn’t used in HTTP headers or any other context. While it's plausible for Byte Range PATCH to abandon the HTTP field design, I think it may be simpler on the whole to share semantics with Range requests and multipart messages. And further, standardizing partial content offsets would have greater utility, including in 206 Partial Content responses, and synchronization more broadly.
> 
> This problem was previously observed in streaming media, treated by the experimental RFC 8673 <https://www.rfc-editor.org/rfc/rfc8673.html> (HTTP Random Access and Live Content). This RFC suggests using a very large Range endpoint to request the server stream content as it becomes available. Then, it uses this large number in the response Content-Range to indicate that the response is of indeterminate length.
> 
> While this solution caters to the requirements of Range requests, it cannot be used for uploads, where no Range field is used. And if a client sends “Content-Range: bytes 100-999999999999” in a request, but ends the stream with less content than that, this should only be seen as an error, not as an understanding that the exact length was unknown. So I think a different, more general solution is warranted.
> 
> This would also serve as an important building block for synchronization over HTTP, since this could be used in Partial PUT <https://www.rfc-editor.org/rfc/rfc9110.html#name-partial-put> or a Byte Range PATCH <https://datatracker.ietf.org/doc/draft-ietf-httpapi-patch-byterange/> to append to shift buffers <https://www.rfc-editor.org/rfc/rfc8673.html#name-shift-buffer-representation>, and other “live resources” whose content is not entirely known at request-time, but may be streamed as it becomes available (as opposed to transferring only the data defined at the time of the request). Features that could build on top of this may include:
> 
> - Indicating support for indeterminate-length 206 (Partial Content) responses.
> - Indicating preference for “stream live data” versus “snapshot-at-request-time” messages.
> - Managing sparse resources, including shift buffers and more complicated synchronization (e.g. multiple clients uploading to the same resource in parallel).
> - Optimizing caching for shift buffers (e.g. indicate that content may grow and/or become forgotten, but does not change once defined).
> - Subscribing to changes to an underlying resource (in realtime or as desired).
> 
> The two most obvious ways to define this feature would be (1) to extend Content-Range to a form like “bytes 10-*/*” (where the star indicates exact value unknown), or (2) a new header like “Content-Offset: 10” (or what Resumable Uploads calls “Upload-Offset” <https://httpwg.org/http-extensions/draft-ietf-httpbis-resumable-upload.html#name-upload-offset>).
> 
> I would like to propose “Content-Offset = sf-integer”, since there's a certain symmetry to it (other HTTP fields that change when the message is of indeterminate length). Though, in cacheable responses, modifying Content-Range may be desirable instead, depending on how origins want non-implementing caches to act. While some amount of compatibility must be considered (especially caches), I feel this is a problem that will spawn domain-specific solutions over and over until there’s a general solution.
> 
> Please send me feedback on this proposal. I would especially like to hear from anyone with experience implementing HTTP Random Access and Live Content, or with any of the use cases I’m describing here. Then, if this seems reasonable, I can draft an experimental I-D.

When uploading a live stream, it is inefficient to send one byte of data
at a time.  A quantum of data is sent instead.  This could be 1 second
of audio/video, or could be 5 or 15 or 30 seconds.  Portable CD players
back in the 1990s would read some 30 seconds of data off the disc so
that skips could be recovered before the listener noticed.  But I
digress.

For a server that is storing an upload, writes to disk occur in
blocks, minimum 512 bytes (POSIX) and often 4k or larger.  Even if
storing in memory, memory is allocated from the OS to the process
in blocks of memory page sizes, often 4k or larger.

Sending a chunk of data, preferably block-sized or larger, is
recommended for efficiency.

Appending to the end of a file is a special case of patching a file.
After all, O_APPEND is a POSIX flag to open(2).
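
A minimal sketch of that append case, in Python on a POSIX system.  The
helper name, file name and block size are illustrative assumptions, not
taken from any server implementation:

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size; real filesystems vary


def append_block(path: str, data: bytes) -> int:
    """Append data to path and return the resulting file size."""
    # O_APPEND makes every write land at the current end of file,
    # so the caller never has to supply an offset.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        written = 0
        while written < len(data):  # os.write may write fewer bytes than asked
            written += os.write(fd, data[written:])
        return os.fstat(fd).st_size
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "upload.bin")  # hypothetical upload target
        print(append_block(target, b"a" * BLOCK_SIZE))
        print(append_block(target, b"b" * BLOCK_SIZE))
```

Each call knows exactly how many bytes it is appending; only the final
size of the file is open-ended.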


=> Why must an "indeterminate length" be transmitted in a range request?
Why not send a sequence of append requests, in 1-sec-of-data increments?
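
As a rough sketch of what such a sequence could look like, the Python
below cuts a stream into fixed-size quanta and labels each one with a
Content-Range value whose first and last byte are known, leaving only
the complete length as "*".  The function name and the choice to carry
the range in Content-Range are assumptions for illustration; how a
server would treat these requests is not specified here:

```python
import io
from typing import BinaryIO, Iterator, Tuple


def append_requests(
    stream: BinaryIO, start: int, quantum: int
) -> Iterator[Tuple[str, bytes]]:
    """Yield (Content-Range value, body) pairs, one per quantum read."""
    offset = start
    while True:
        body = stream.read(quantum)
        if not body:
            break  # end of stream: no further requests
        last = offset + len(body) - 1
        # Each request has a known first and last byte; only the
        # complete length of the growing resource is unknown ("*").
        yield f"bytes {offset}-{last}/*", body
        offset = last + 1


if __name__ == "__main__":
    live = io.BytesIO(b"x" * 10)  # stand-in for a live source
    for content_range, body in append_requests(live, start=100, quantum=4):
        print(content_range, len(body))
```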


Separately, another problem with "indeterminate length" is file range
locking, which might prevent scenarios of multiple uploads of different
ranges.  Another is resource allocation on the server, indeterminate
versus guaranteed allocation of a specific size (think posix_fallocate()).
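
To illustrate the allocation point, here is a small Python sketch,
assuming a POSIX system: a reservation needs an offset and a length up
front, which an indeterminate-length write cannot provide.  The helper
name is hypothetical, and the ftruncate() fallback is my own addition
for portability; it does not give the same guarantee:

```python
import os
import tempfile


def reserve_range(fd: int, offset: int, length: int) -> None:
    """Try to reserve storage for bytes [offset, offset + length)."""
    if length <= 0:
        raise ValueError("a reservation needs a known, positive length")
    try:
        # Ask the OS to guarantee the blocks before any data arrives.
        os.posix_fallocate(fd, offset, length)
    except (AttributeError, OSError):
        # Fallback (an assumption for portability): extend the file
        # without guaranteeing blocks, losing the allocation guarantee.
        end = offset + length
        if os.fstat(fd).st_size < end:
            os.ftruncate(fd, end)


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        reserve_range(f.fileno(), offset=100, length=900)
        print(os.fstat(f.fileno()).st_size)
```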

When designing a synchronization mechanism, why is HTTP the right layer
to implement this, versus using HTTP as an opaque transport layer,
similar to TCP or UDP?

Cheers, Glenn