Re: draft-ietf-httpbis-resumable-upload-01

Lucas Pardue <lucaspardue.24.7@gmail.com> Wed, 09 August 2023 21:01 UTC

MIME-Version: 1.0
References: <CAChr6Sys-N_mL5-_GvPHXOm3nnTZbefwq2aHx9ftRrHrwkVtuQ@mail.gmail.com> <671AEB39-DD9A-4993-8D60-40A769A76A60@apple.com> <CALGR9oZQPpL0ijz2+nLDSwKCxnNxQKjb_C+OAWPJCD=cwFMyCg@mail.gmail.com> <CAChr6Szy8SySSWOB+X7ZT+3uP+O5HJ3f3FwADn_-LwMvNJY6_A@mail.gmail.com> <CALGR9oZhr8RuuqU+3a4Ca4+e_xU+T98SZoDAeaGazQg5cXX3iQ@mail.gmail.com> <CANY19Nsoyy9NB9VcNWGx0=R3bFzL1t+SBT1H2ZtCA7-f28djsQ@mail.gmail.com> <CALGR9oYFC8m4f2DxodXoCw0=RD-MoCeB3i-p0KYEWXhooz-kRA@mail.gmail.com> <CANY19NsRNDgoCXTtpCmWUKV+K3qnLcF4wzgGgecUbNC2aSMT6A@mail.gmail.com>
In-Reply-To: <CANY19NsRNDgoCXTtpCmWUKV+K3qnLcF4wzgGgecUbNC2aSMT6A@mail.gmail.com>
From: Lucas Pardue <lucaspardue.24.7@gmail.com>
Date: Wed, 09 Aug 2023 22:01:05 +0100
Message-ID: <CALGR9obpnPHjp+mb7S=mc64Dak5EhXO4ukQutQWx09-rX+0UOQ@mail.gmail.com>
To: Marius Kleidl <marius@transloadit.com>
Cc: Rob Sayre <sayrer@gmail.com>, Guoye Zhang <guoye_zhang@apple.com>, "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>
Content-Type: multipart/alternative; boundary="000000000000f437d9060283c799"
Subject: Re: draft-ietf-httpbis-resumable-upload-01
Archived-At: <https://www.w3.org/mid/CALGR9obpnPHjp+mb7S=mc64Dak5EhXO4ukQutQWx09-rX+0UOQ@mail.gmail.com>

Responses in-line

On Wed, 9 Aug 2023, 20:02 Marius Kleidl, <marius@transloadit.com> wrote:

> Hi Lucas,
>
> thanks for the feedback! I agree that it is probably best if the uploads
> draft briefly mentions the interaction with digest to ensure that
> implementations are compatible.
>

I think that's the crux of the matter: interop depends on what requirements
are put on the applications. Just saying endpoints can use digest fields
isn't enough for broad support and interop, while mandating digests is
probably too much for several use cases.



> > Note however that the representation depends on factors like
> content-encoding. Which makes me consider that a resumable upload is also
> tied to representation. I.e. it isn't possible to start an upload using
> gzip, have it fail part way through, and then resume an upload with brotli.
> If that's true, we probably want some text in the document to make it super
> clear.
>
> That's a good point. I agree that the encoding should not change
> throughout the upload.
>

Not just encoding but anything that might lead to an attempt to resume with
a different representation :-)
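
To make the representation point concrete, here is a minimal sketch (not
text from either draft). It computes a sha-256 Repr-Digest style value for the
same payload under different content codings. The helper name `repr_digest`
and the sample payload are made up for illustration, and zlib/deflate stands in
for brotli only because brotli is not in the Python standard library.

```python
import base64
import gzip
import hashlib
import zlib


def repr_digest(representation: bytes) -> str:
    """Format a sha-256 digest in the `sha-256=:<base64>:` field syntax.

    Hypothetical helper: the digest covers the representation data,
    i.e. the bytes after any content coding has been applied.
    """
    digest = hashlib.sha256(representation).digest()
    return "sha-256=:" + base64.b64encode(digest).decode("ascii") + ":"


# Sample payload, an assumption for the example.
payload = b"hello resumable upload\n" * 64

# Three representations of the same payload.
gzip_repr = gzip.compress(payload, mtime=0)  # mtime=0 keeps output stable
deflate_repr = zlib.compress(payload)        # stand-in for brotli

identity_digest = repr_digest(payload)
gzip_digest = repr_digest(gzip_repr)
deflate_digest = repr_digest(deflate_repr)

# Same content, three different digests: a digest declared for the gzip
# representation cannot be satisfied by resuming with another coding.
print(identity_digest)
print(gzip_digest)
print(deflate_digest)
```

The decoded bytes are identical in all three cases, but the digests are not,
which is why switching coding mid-upload would change what is being resumed.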

>
> > It would be equally fine for a server to just send the digest without
> observing a Want- header. But maybe you're thinking about trying to avoid
> unnecessary server overhead?
>
> True, the server might include unsolicited digest. Avoiding additional
> overhead is a good use for Want-, but maybe I was also just looking for a
> nice use for this header :)
>
> > I agree it seems wrong that a server would reject a partial upload if
> the content-digest didn't match. So perhaps the caveat would be the
> content-digest is only validated when the full content is received.
>
> That's a tough spot. I think that dropping the Content-Digest headers is
> probably equally wrong, because then integrity issues might slip in while
> the client is relying on the digest headers to verify the integrity. Let's
> see what others think about this.
>

I think this depends on the threat model applied to expected deployments.
For example, suppose an upload append was performed for a portion of the
whole resource and the request included a content-digest. What should
happen if the server validates the digest and finds an error? Should the
client retry for some reason? One might expect the server to send back an
error response status, or perhaps not update the upload offset.
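
One of those options can be sketched as follows. This is an illustration
only, not behaviour defined by either draft: the `Upload` class, the
`content_digest` helper and the 204/400 status codes are all assumptions, and
the field value is compared as a plain string instead of being parsed
properly.

```python
import base64
import hashlib
from dataclasses import dataclass, field
from typing import Optional


def content_digest(body: bytes) -> str:
    """Hypothetical helper: sha-256 over exactly one request body."""
    digest = hashlib.sha256(body).digest()
    return "sha-256=:" + base64.b64encode(digest).decode("ascii") + ":"


@dataclass
class Upload:
    """Toy in-memory upload resource (an assumption for this sketch)."""

    data: bytearray = field(default_factory=bytearray)

    @property
    def offset(self) -> int:
        return len(self.data)

    def append(self, body: bytes, digest_field: Optional[str] = None) -> int:
        """Append a fully received body; return a placeholder status code.

        On a digest mismatch nothing is stored, so the upload offset
        does not move and the client can retry from the same offset.
        """
        if digest_field is not None and digest_field != content_digest(body):
            return 400  # placeholder error status
        self.data.extend(body)
        return 204  # placeholder success status


upload = Upload()
ok = upload.append(b"first part", content_digest(b"first part"))
bad = upload.append(b"second part", content_digest(b"corrupted"))
print(ok, bad, upload.offset)
```

With this policy a failed validation looks to the client like an append that
never happened, which is only one of the possible answers to the question
above.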

Also loosely related: the old MICE encoding proposal [1] provided a way to
do progressive integrity checking. The design choice there specifically
wouldn't work for a truncated upload (because it's verified from back to
front) but an alternative design could work for disrupted uploads.
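
As a rough idea of what a forward-verifiable alternative might look like,
here is a minimal sketch. It is not MICE and not a proposal from either draft;
the function names, the all-zero starting value and the chunking are all
assumptions. Each checkpoint hashes the previous checkpoint plus the next
chunk, so a receiver can verify whatever prefix it has, even if the transfer
was cut short.

```python
import hashlib
from typing import Iterable, List, Sequence

# Hypothetical starting value for the chain.
INITIAL_STATE = b"\x00" * 32


def forward_chain(chunks: Iterable[bytes]) -> List[bytes]:
    """Return one checkpoint per chunk, computed front to back."""
    checkpoints = []
    state = INITIAL_STATE
    for chunk in chunks:
        state = hashlib.sha256(state + chunk).digest()
        checkpoints.append(state)
    return checkpoints


def verified_prefix(chunks: Sequence[bytes],
                    checkpoints: Sequence[bytes]) -> int:
    """Count how many leading chunks match their checkpoints."""
    state = INITIAL_STATE
    count = 0
    for chunk, expected in zip(chunks, checkpoints):
        state = hashlib.sha256(state + chunk).digest()
        if state != expected:
            break
        count += 1
    return count


chunks = [b"part-1", b"part-2", b"part-3"]
checkpoints = forward_chain(chunks)

# A truncated transfer: only the first two chunks arrived.
print(verified_prefix(chunks[:2], checkpoints))
# A corrupted second chunk: only the first chunk verifies.
print(verified_prefix([b"part-1", b"oops", b"part-3"], checkpoints))
```

The trade-off in this sketch is that the checkpoints must be communicated
and trusted separately from the data.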

Cheers
Lucas

[1] - https://datatracker.ietf.org/doc/html/draft-thomson-http-mice-03

>
> Best regards
> Marius
>
> On Wed, Aug 9, 2023 at 7:39 PM Lucas Pardue <lucaspardue.24.7@gmail.com>
> wrote:
>
>> Hi Marius,
>>
>> Responding in-line
>>
>>
>>
>> On Wed, Aug 9, 2023 at 6:16 PM Marius Kleidl <marius@transloadit.com>
>> wrote:
>>
>>> Hi Lucas,
>>>
>>> I had a brief read through the digest draft today while thinking how
>>> this could be applied to resumable uploads. The new integrity fields, with
>>> their separation between content and representation, should fit perfectly with
>>> the approach of resumable uploads. I thought I would name a few concrete
>>> examples to see if my understanding of the digest draft is correct:
>>>
>>> 1) Assume that the client knows the digest of the entire file that it
>>> wants to upload. It can then set Repr-Digest in the Upload Creation
>>> Procedure and upload the data. If the transfer is interrupted it is just
>>> resumed as normal with HEAD and PATCH. Once the upload is complete, the
>>> server can verify the integrity of the uploaded data by comparing it to the
>>> Repr-Digest from the Upload Creation Procedure. If they don't match, the
>>> server can reject the upload and return an error to the client or similar.
>>>
>>
>> Yeah I think that mostly holds true. Note however that the representation
>> depends on factors like content-encoding. Which makes me consider that a
>> resumable upload is also tied to representation. I.e. it isn't possible to
>> start an upload using gzip, have it fail part way through, and then resume
>> an upload with brotli. If that's true, we probably want some text in the
>> document to make it super clear.
>>
>>>
>>> 2) If the client does not know the digest of the entire upload at the
>>> beginning (because it is streamed from another resource), it can compute
>>> the digest while the upload is running and then include Repr-Digest as a
>>> trailer on the final Upload Creation Procedure or Upload Appending
>>> Procedure. The server can then verify as in 1).
>>>
>>
>> Yep
>>
>>
>>>
>>> 3) If the client does not compute the digest, but wants to query the
>>> server's digest, it can include the Want-Repr-Digest in the Upload Creation
>>> Procedure. On the final response, the server should then include the
>>> Repr-Digest whose value corresponds to the entire upload that it has
>>> received. The client then may verify the digest or provide it to the
>>> client's users.
>>>
>>
>> The Want- headers are a bit nebulous in reality. This might work or it
>> might not. It would be equally fine for a server to just send the digest
>> without observing a Want- header. But maybe you're thinking about trying to
>> avoid unnecessary server overhead?
>>
>>
>>> 4) If the client sets Content-Digest on an Upload Creation Procedure or
>>> Upload Appending Procedure, it only applies to the specific request body.
>>> So if the request transmission gets interrupted, the server is not able to
>>> verify the integrity and must reject the entire request without appending
>>> its content to the upload. This makes the request effectively
>>> transactional, where either the entire body or nothing is appended to the
>>> upload. This is a bit contrary to the intention of uploads that can be
>>> resumed from the point where they failed, but if you split an upload into
>>> multiple requests, this might still be interesting for some applications.
>>>
>>
>> How endpoints deal with digest validation failures is not defined by the
>> digest spec itself, meaning there are many possible options. I agree it seems
>> wrong that a server would reject a partial upload if the content-digest
>> didn't match. So perhaps the caveat would be the content-digest is only
>> validated when the full content is received.
>>
>>
>>> Do these examples align with the intentions behind the digest draft or
>>> did I get something wrong? All in all, this appears like a great fit!
>>>
>>
>> They seem to align to me, modulo some details or options. Initially I was
>> hesitant to add too much detail or prescribed behaviour but, given your
>> list of use cases, perhaps we should consider what should be in scope for
>> resumable clients or servers to do and state clearly what the interop
>> expectations would be. Thoughts from the WG?
>>
>> Cheers
>> Lucas
>>
>>
>>
>>>