Re: Digest: use in requests

Lucas Pardue <lucaspardue.24.7@gmail.com> Tue, 29 December 2020 12:21 UTC

Hi Julian,

Just adding my 2c as responses in-line:

On Tue, Dec 29, 2020 at 10:28 AM Julian Reschke <julian.reschke@gmx.de>
wrote:

> Hm, that seems like an odd choice for a protocol spec. If the spec
> doesn't say what the Digest means for any request, it's not really
> defining a protocol.
>
> I would *hope* that we can define things so that Digests can be
> automatically produced and checked by user agents (browsers) and servers
> (such as a servlet container).
>

FWIW, subresource integrity (SRI) is implemented in browsers. The specifics
are different: the hash applies to the identity encoding, so UAs need to
reverse any content encoding before validation. The fundamentals carry over,
so it should be possible, but I've not seen any signals that browsers are
interested in automatic Digest validation (yet?).
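
To illustrate the identity-encoding point, here is a minimal Python sketch.
It is not how any browser implements SRI; the function name `sri_matches`,
the sample script body, and the restriction to gzip and sha256 are all
assumptions made for illustration. It only shows that the hash is taken over
the decoded bytes, so a gzip-coded body has to be decompressed first.

```python
import base64
import gzip
import hashlib


def sri_matches(body: bytes, content_encoding: str, integrity: str) -> bool:
    """Check an SRI-style "sha256-<base64>" value against the decoded bytes."""
    # Reverse the content coding first; this sketch handles gzip and identity only.
    if content_encoding == "gzip":
        body = gzip.decompress(body)
    elif content_encoding not in ("", "identity"):
        raise ValueError(f"unsupported content coding: {content_encoding}")
    algorithm, _, expected = integrity.partition("-")
    if algorithm != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algorithm}")
    actual = base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")
    return actual == expected


# Hypothetical resource: the integrity value is computed over the plain bytes,
# while the bytes on the wire are gzip-coded.
script = b"console.log('hello');\n"
integrity = "sha256-" + base64.b64encode(hashlib.sha256(script).digest()).decode("ascii")
wire_body = gzip.compress(script)
```

Calling `sri_matches(wire_body, "gzip", integrity)` succeeds, whereas hashing
the gzip-coded bytes directly would not match.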


> > Reading
> > https://httpwg.org/http-core/draft-ietf-httpbis-semantics-latest.html#rfc.section.6.4.1.p.1
> > ```The purpose of a payload in a request is defined by the method
> > semantics```
> > iiuc the receiver, aware of the request semantic, knows its purpose
> > and how to process it, including whether it conveys a partial
> > representation or not.
>
> But "partial representation" is a term defined by HTTP; there is (or
> should be) an algorithm that - when inspecting *any* HTTP message -
> tells you whether it's "partial" or not. In HTTP, this is defined by the
> appearance of "Content-Range" for some specific response status codes.
>

> <snip>
>
> It *really* would be good to discuss something *concrete* here.
>
> Let's consider an upload protocol that sends multiple chunks, and then
> lets the server combine these into the final resource.
>
> In that protocol, Digest on each chunk would be used to check the
> integrity of each chunk.
>
> For the final step of creating the final full resource, the client could
> send the expected digest of the final resource in a *custom* field
> defined for the upload protocol (it would use the same algorithms etc,
> but use a different way to convey it to the server).
>
> With that, generic libraries could at least verify Digests on each of
> the chunks.
>
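
A rough Python sketch of the scheme described in the quoted text, under
stated assumptions: each chunk carries a Digest computed over that chunk
only, and the digest of the assembled resource travels in a separate custom
field. The field name "Upload-Final-Digest", the helper names, and the
choice of sha-256 are hypothetical and not taken from any spec or API.

```python
import base64
import hashlib


def digest_value(data: bytes) -> str:
    """Format a SHA-256 digest as "sha-256=<base64>"."""
    return "sha-256=" + base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


def chunk_requests(payload: bytes, chunk_size: int):
    """Yield (headers, chunk) pairs, each carrying a Digest of that chunk only."""
    for start in range(0, len(payload), chunk_size):
        chunk = payload[start:start + chunk_size]
        yield {"Digest": digest_value(chunk)}, chunk


def server_assemble(requests, final_digest: str) -> bytes:
    """Verify every chunk generically, then check the assembled resource."""
    parts = []
    for headers, chunk in requests:
        # A generic library could do this step without knowing the upload protocol.
        if headers["Digest"] != digest_value(chunk):
            raise ValueError("chunk failed its Digest check")
        parts.append(chunk)
    resource = b"".join(parts)
    # This step is specific to the (hypothetical) upload protocol.
    if final_digest != digest_value(resource):
        raise ValueError("assembled resource does not match the expected digest")
    return resource


payload = bytes(range(256)) * 10
requests = list(chunk_requests(payload, 1000))
# "Upload-Final-Digest" is a hypothetical custom field name for this sketch.
final_headers = {"Upload-Final-Digest": digest_value(payload)}
assembled = server_assemble(requests, final_headers["Upload-Final-Digest"])
```

The split mirrors the point made above: the per-chunk check needs no
knowledge of the upload protocol, while the final check does.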

This is indeed the most likely use case. A very quick survey indicates that
there seem to be some examples of PUT requests with Content-Range in the
wild. I have no experience with these, nor knowledge of how popular they
actually are.

* Amazon S3 Glacier [1]

"This multipart upload operation uploads a part of an archive. You can
upload archive parts in any order because in your Upload Part request you
specify the range of bytes in the assembled archive that will be uploaded
in this part."

Example:
PUT /AccountId/vaults/VaultName/multipart-uploads/uploadID HTTP/1.1
Host: glacier.Region.amazonaws.com
Date: Date
Authorization: SignatureValue
Content-Range: ContentRange
Content-Length: PayloadSize
Content-Type: application/octet-stream
x-amz-sha256-tree-hash: Checksum of the part
x-amz-content-sha256: Checksum of the entire payload
x-amz-glacier-version: 2012-06-01

* Google Drive [2]

"Upload the content in multiple chunks. Use this approach if you need to
reduce the amount of data transferred in any single request. You might need
to reduce data transferred when there is a fixed time limit for individual
requests, as can be the case for certain classes of Google App Engine
requests."

"Add these HTTP headers:
    Content-Length. Set to the number of bytes in the current chunk.
    Content-Range. Set to show which bytes in the file you upload. For
example, Content-Range: bytes 0-524287/2000000 shows that you upload the
first 524,288 bytes (256 x 1024 x 2) in a 2,000,000 byte file."

* Google Cloud Storage [3]

"This page describes how to make a resumable upload request in the Cloud
Storage JSON and XML APIs. This protocol allows you to resume an upload
operation after a communication failure interrupts the flow of data."

Example:
curl -i -X PUT --data-binary @CHUNK_LOCATION \
    -H "Content-Length: CHUNK_SIZE" \
    -H "Content-Range: bytes CHUNK_FIRST_BYTE-CHUNK_LAST_BYTE/TOTAL_OBJECT_SIZE" \
    "SESSION_URI"

* draft-wright-http-partial-upload-01 (expired) [4]

"This document specifies a new media type intended for use in PATCH
   payloads that allows a resource to be uploaded in several segments,
   instead of a single large request."

Example:
PATCH /uploads/foo HTTP/1.1
   Content-Type: message/byterange
   Content-Length: 283
   If-Match: "xyzzy"
   If-Unmodified-Since: Sat, 29 Oct 1994 19:43:31 GMT

   Content-Range: bytes 100-299/600
   Content-Type: text/plain
   Content-Length: 200
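
The Content-Range values in the examples above follow the same arithmetic.
As a small Python sketch (the function name `content_ranges` is an
assumption for illustration), this computes the field value and matching
Content-Length for each chunk, using the 2,000,000-byte file and
524,288-byte chunk size from the Google Drive quote:

```python
def content_ranges(total_size: int, chunk_size: int):
    """Yield (content_range, content_length) for each chunk of an upload."""
    for first in range(0, total_size, chunk_size):
        # The last byte position is inclusive, hence the "- 1".
        last = min(first + chunk_size, total_size) - 1
        yield f"bytes {first}-{last}/{total_size}", last - first + 1


ranges = list(content_ranges(2_000_000, 524_288))
```

The first entry is "bytes 0-524287/2000000" with a Content-Length of 524288,
matching the quoted example; the final chunk is shorter because the file
size is not a multiple of the chunk size.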

Finally, Dropbox [5] does things a little differently and uses the
Dropbox-API-Arg JSON header field to communicate a cursor containing an
offset of the bytes uploaded so far (which I guess means that parallel
transfers aren't supported).

Example:
curl -X POST https://content.dropboxapi.com/2/files/upload_session/append_v2 \
    --header "Authorization: Bearer" \
    --header "Dropbox-API-Arg: {\"cursor\": {\"session_id\": \"1234faaf0678bcde\",\"offset\": 0},\"close\": false}" \
    --header "Content-Type: application/octet-stream" \
    --data-binary @local_file.txt

To conclude, I'm not exactly sure how these examples influence the
discussion. It seems that there are actually concrete cases of "partial
requests", but it's unclear to me whether these break HTTP semantic rules
and/or whether they should be documented more formally. The examples I've
seen are for APIs that also have their own custom means for integrity
checks, or still use Content-MD5. It would be nice if something like Digest
covered all avenues and we could get folks to switch to it, but I've not
seen any signals that such APIs are interested in Digest. Therefore, I'm
wary of Digest taking on too much work to describe something without any
implementer interest. In the interest of progress, if partial requests for
uploads are something people think needs standardising, I think that could
be done as an independent follow-on work item, e.g. a document that updates
Digest.

Cheers
Lucas

[1] - https://docs.aws.amazon.com/amazonglacier/latest/dev/api-upload-part.html
[2] - https://developers.google.com/drive/api/v3/manage-uploads#http---multiple-requests
[3] - https://cloud.google.com/storage/docs/performing-resumable-uploads
[4] - https://tools.ietf.org/html/draft-wright-http-partial-upload-01
[5] - https://www.dropbox.com/developers/documentation/http/documentation#files-upload_session-append:2