Re: New Version Notification for draft-toomim-httpbis-versions-00.txt

Michael Toomim <toomim@gmail.com> Mon, 22 July 2024 22:31 UTC

Content-Type: multipart/alternative; boundary="------------aDbsDCvmoJwdM9XuKzx9Z2Xi"
Message-ID: <d713500c-c4db-4bf8-8096-edb0b5ff1751@gmail.com>
Date: Mon, 22 Jul 2024 15:30:24 -0700
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
To: Rory Hewitt <rory.hewitt@gmail.com>
Cc: HTTP Working Group <ietf-http-wg@w3.org>, Braid <braid-http@googlegroups.com>
References: <172046173132.445281.15041630415895010148@dt-datatracker-5f88556585-j5r2h> <ff54cd4f-c30e-4447-8744-3297e53b74be@gmail.com> <CAEmMwDxBnLtjRCasVz8ogz1c_Q=9XjYtpNu+sJ6UO==xO4QzJw@mail.gmail.com>
Content-Language: en-US
From: Michael Toomim <toomim@gmail.com>
In-Reply-To: <CAEmMwDxBnLtjRCasVz8ogz1c_Q=9XjYtpNu+sJ6UO==xO4QzJw@mail.gmail.com>
Subject: Re: New Version Notification for draft-toomim-httpbis-versions-00.txt
Archived-At: <https://www.w3.org/mid/d713500c-c4db-4bf8-8096-edb0b5ff1751@gmail.com>

Rory, thanks for these excellent thoughts! It's exciting to see other 
people digging into the versioning problem with us. :)

Responses:

*== Versioning with ETag ==*

You make a good point that ETag headers, like the proposed Version 
header, are opaque strings that can be formatted to express additional 
information if we want to. This is true for both ETag and Version:

    ETag: "Sat, 6 Jul 2024 07:28:00 GMT"
    Version: "Sat, 6 Jul 2024 07:28:00 GMT"

    ETag: "v1.0.2"
    Version: "v1.0.2"

We propose articulating the structure of these version ids using a 
Version-Type header. You could, for instance, use "Version-Type: date" 
for the first example, and "Version-Type: semver" for the second.

The main problem with ETag, though, is that it marks *unique content* 
rather than *unique time*. If you mutate the state of the resource from 
"foo" to "bar" and then back to "foo", you'll revert to the same ETag, 
even though this is at a different point in time. This breaks 
collaborative editing algorithms.
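
To make that concrete, here is a rough Python sketch. It assumes a server that derives its ETag from a hash of the body, which is one common choice but not the only one. The helper names and the "mike-N" ID format are made up for illustration and are not taken from the draft.

```python
import hashlib
from itertools import count


def content_etag(body: bytes) -> str:
    # Assumption: the ETag is a truncated hash of the body bytes.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


_edit_counter = count(1)


def next_version(client_id: str = "mike") -> str:
    # Hypothetical version ID: a fresh value for every edit,
    # regardless of what the content happens to be.
    return f'"{client_id}-{next(_edit_counter)}"'


# Mutate the resource from "foo" to "bar" and back to "foo".
history = [
    (body, content_etag(body), next_version())
    for body in (b"foo", b"bar", b"foo")
]
etags = [etag for _, etag, _ in history]
versions = [version for _, _, version in history]

# The first and third states share an ETag, but each edit has its own version.
print(etags[0] == etags[2], len(set(versions)))
```

The third state is indistinguishable from the first by ETag alone, while the version IDs still identify three separate points in time.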

Finally, I'll note that your claim that ETags don't have to be sensitive 
to content-encoding is only true for *weak* ETags. Strong ETags must 
change whenever the byte sequence of the response body changes. This 
means they should be sensitive to content-encoding. RFC 9110 is also 
explicit that they can depend on Content-Type:

    > A strong validator might change for reasons other than a change
    > to the representation data, such as when a semantically significant
    > part of the representation metadata is changed (e.g., Content-Type)
    https://datatracker.ietf.org/doc/html/rfc9110#section-8.8.1

Consider the case where a user edits a markdown resource:

    PUT /foo
    Content-Type: text/markdown
    Version: "mike-99"

    # This is a markdown file

    Hello world!

And the server then shares this as HTML:

    GET /foo
    Accept: text/html


    HTTP/1.1 200 OK
    Content-Type: text/html
    Version: "mike-99"

    <html>
       <body>
         <h1>This is a markdown file</h1>
         <p>Hello world!</p>
       </body>
    </html>

Using the Version header, we're able to express that these are two 
representations of the resource at the same point in time. You can't do 
this with a strong ETag.
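
As a small sketch of the same idea in Python: one stored point in time, served as two representations. The hash-based strong ETag and the `respond` helper are assumptions for illustration only. The draft does not prescribe how a server stores or renders representations.

```python
import hashlib


def strong_etag(body: bytes) -> str:
    # Assumption: a strong ETag computed from the exact response bytes.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


# One point in time of /foo, identified by a single version ID.
VERSION = '"mike-99"'

# Two representations of that same state (the HTML is hand-written here).
representations = {
    "text/markdown": b"# This is a markdown file\n\nHello world!\n",
    "text/html": (
        b"<html><body><h1>This is a markdown file</h1>"
        b"<p>Hello world!</p></body></html>"
    ),
}


def respond(accept: str) -> dict:
    # Hypothetical handler: choose the representation by exact media type.
    body = representations[accept]
    return {
        "Content-Type": accept,
        "Version": VERSION,
        "ETag": strong_etag(body),
        "body": body,
    }


md = respond("text/markdown")
html = respond("text/html")
print(md["Version"] == html["Version"], md["ETag"] == html["ETag"])
```

Both responses carry the same Version, while their strong ETags differ because the bytes differ.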

*== Version and Parents headers ==*

I think there's been a miscommunication here. The reason there are 
multiple version IDs in the Parents header is for edits that happen *in 
parallel*, not for edits that happen in sequence. This is to represent a 
version DAG:

                   a  <-- oldest version
                  / \
                 b   c
                  \ /
                   d  <-- current version

In this example, the current version "d" would have:

    Parents: "b", "c"

This is not allowed:

    Parents: "d", "b"

Because of this language in the spec:

    For any two version IDs A and B that are specified in a Version or
    Parents header, A cannot be a descendent of B or vice versa. The
    ordering of version IDs within the header carries no meaning.
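
That rule can be checked mechanically. Below is a minimal Python sketch, assuming the receiver holds the version DAG in memory as a map from each version ID to its parents. The `DAG` layout and the function names are hypothetical and not part of the draft.

```python
# Hypothetical in-memory DAG for the diagram above: version ID -> parent IDs.
DAG = {
    "a": set(),
    "b": {"a"},
    "c": {"a"},
    "d": {"b", "c"},
}


def ancestors(dag: dict, version: str) -> set:
    # Collect every version reachable by following parent links.
    seen = set()
    stack = list(dag[version])
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(dag[v])
    return seen


def is_valid_header(dag: dict, ids) -> bool:
    # Valid only if no listed ID is an ancestor of another listed ID.
    # A set is used because the order of IDs carries no meaning.
    ids = set(ids)
    return all(not (ancestors(dag, v) & (ids - {v})) for v in ids)


print(is_valid_header(DAG, ["b", "c"]))  # parallel edits
print(is_valid_header(DAG, ["d", "b"]))  # "b" is an ancestor of "d"
```

With this DAG the first call accepts `"b", "c"` and the second rejects `"d", "b"`, matching the two examples above.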

Good question!

*== Client-generated Version IDs on PUT ==*

Yes, there would be a problem if two clients generated the same version 
ID for two different PUTs. Then the versions would not be unique!

However, requiring the server to generate versions is only one possible 
solution, and it is one that requires a server. We also want to 
support distributed p2p systems, which don't have servers.

In these systems, it's quite common for clients to generate version IDs. 
There are two common ways to solve this problem:

 1. Use a large random hash space so that collisions are extremely
    unlikely. This works well enough for git, for instance.
 2. Each client gets a unique ID, possibly by coordinating with a
    server, and then versions are constructed by concatenating
    "<client-id>:<counter>" for each client.

Does this all make sense?
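
To make the two approaches above concrete, here is a rough Python sketch. The function and class names, the 160-bit ID size, and the starting counter value are my own assumptions for illustration. The draft does not mandate any of them.

```python
import secrets
from itertools import count


def random_version_id(nbytes: int = 20) -> str:
    # Approach 1: draw the ID from a large random space (160 bits here,
    # an assumed size) so that collisions are extremely unlikely.
    return secrets.token_hex(nbytes)


class ClientVersions:
    # Approach 2: "<client-id>:<counter>". This assumes client_id is
    # already unique, for example handed out by a server beforehand.

    def __init__(self, client_id: str):
        self.client_id = client_id
        self._counter = count(1)

    def next_id(self) -> str:
        return f"{self.client_id}:{next(self._counter)}"


alice = ClientVersions("alice")
bob = ClientVersions("bob")
print(random_version_id())
print(alice.next_id(), alice.next_id(), bob.next_id())
```

Neither approach needs a round trip to a server at edit time, which is what makes them usable in p2p settings.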

Again, good questions, and I am glad to see this interest in the topic! 
I think we can do a lot with it!

Michael

On 7/17/24 2:56 PM, Rory Hewitt wrote:
> Hey Michael,
>
> A few thoughts...
>
> First, I agree that the concept of versioning hasn't been thought 
> about enough, and this is definitely a 'good idea (TM)'.
>
> However, I have a few concerns:
>
> *1.1.2 Versioning with ETag*
>
> Because ETags are, by definition, unformatted, while it's true to say 
> that you often can't rely on them to establish a version, that's 
> entirely dependent on the format chosen by the user. An ETag *could* 
> validly be specified as a date:
>
>   ETag: "Sat, 6 Jul 2024 07:28:00 GMT"
>
> or as a version number:
>
>   ETag: "v1.0.2"
>
> or as a random string:
>
>   ETag: "Michael is cool"
>
> IOW, it's totally possible for a site that cares about versioning to 
> use a format that specifies a version number. I recognize this isn't 
> *necessarily* the case, but it helps to be clear here. It should be 
> noted that many web servers that include the creation of ETags 
> natively (e.g. Apache) include an effective version as part of the ETag.
>
> Likewise ETags don't *have* to be sensitive to encoding - there's 
> nothing to stop a server from sending the exact same ETag for two 
> differently-encoded copies of the same underlying resource. It's just 
> that they typically do.
>
> None of this is to say that ETags are better or worse than you 
> describe - just to say that they *can* be better than they are.
>
> *2.3 Version and Parents headers*
>
> You state that the Parents header can include multiple parents 
> (parents, grandparents, great-grandparents?) and provide an example:
>
>     Parents: "ajtva12kid", "cmdpvkpll2"
>
> and then say "Any version can be recreated by first merging its 
> parents, and then applying the its update onto that merger." (Nit: 
> additional "the" in this sentence). However, you also say that the 
> order of the values in a Parents header makes no difference.
>
> Maybe I'm missing something, but in this scenario, how could that 
> work? Using your example above, here are two possible scenarios:
>
> * Version "ajtva12kid" is earlier. Version "cmdpvkpll2" is later and 
> contains an additional section of HTML
> * Version "ajtva12kid" is earlier and contains a section of HTML which 
> is removed in the later "cmdpvkpll2" version
>
> If you merge the two parent versions, then does the outcome (onto 
> which you will apply the update) include that section of HTML?
>
> I guess it just makes sense to me to have the order in the Parents 
> have some meaning - whether oldest first or last. Or you could specify 
> that both Version and Parent values must be integers.
>
> *2.4.3 PUT a new version*
>
> This seems like it could lead to either race conditions or some other 
> issue with duplicate Version values. Surely it's better to have the 
> client submit a new version of a resource (passing the Parents header 
> but *not* passing the Version header) and have the server, which is 
> presumably the prime source of versioning truth, calculate a version 
> (perhaps after retrieving other PUT requests from other clients) and 
> return that value in the Version response header?
>
> I see you discuss this later with the Current-Version header, so 
> perhaps you covered this and my old eyes missed it.
>
> Rory
>
>
> On Mon, Jul 15, 2024 at 6:31 PM Michael Toomim <toomim@gmail.com> wrote:
>
>     Hi everyone in HTTP!
>
>     Last fall we solicited feedback on the Braid State Synchronization
>     proposal [draft
>     <https://datatracker.ietf.org/doc/html/draft-toomim-httpbis-braid-http-04>,
>     slides
>     <https://datatracker.ietf.org/meeting/118/materials/slides-118-httpbis-braid-http-add-synchronization-to-http-00>],
>     which I'd summarize as:
>
>         "We're enthusiastic about the general work, but the proposal
>         is too high-level. Break the spec up into multiple independent
>         specs, and work bottom-up. Focus on concrete 'bits-on-the-wire'."
>
>     So I'm breaking the spec up, and have drafted up the first chunk
>     for you. I would very much like your review on:
>
>         *Versioning of HTTP Resources*
>         draft-toomim-httpbis-versions
>         https://datatracker.ietf.org/doc/html/draft-toomim-httpbis-versions-00
>
>     Versioning is necessary for state synchronization—and occurs in a
>     range of HTTP systems:
>
>       * Caching
>       * Archiving
>       * Version Control
>       * Collaborative Editing
>
>     Today, HTTP has resource versions in the Last-Modified and ETag
>     headers, and sometimes embeds versions in URLs, like with WebDAV.
>     Each of these options serves some needs, but also has specific
>     limitations. An improved general approach is proposed, which
>     provides new features, that could enable cool new applications,
>     such as incrementally-updated RSS feeds, and could simplify
>     existing specifications, such as resumeable uploads, and history
>     compression in OT/CRDT algorithms.
>
>     I would love to know if people find this work interesting. I think
>     we could improve performance, interoperability, and be one step
>     closer to having Google Docs power within HTTP URLs.
>
>     Michael
>