Re: [Wpack] About content-based origins

Martin Thomson <mt@lowentropy.net> Tue, 31 March 2020 04:41 UTC

Mime-Version: 1.0
Message-Id: <defb6ae4-2e3c-4c89-9849-2991ac875049@www.fastmail.com>
In-Reply-To: <CANjwSimZAkAC0JJBjUjZr4k0514QRqDxBReOkq_AGTeGJ2OTzQ@mail.gmail.com>
References: <260dfc2f-8399-483e-859d-08f92821c823@www.fastmail.com> <CANjwSimZAkAC0JJBjUjZr4k0514QRqDxBReOkq_AGTeGJ2OTzQ@mail.gmail.com>
Date: Tue, 31 Mar 2020 15:41:00 +1100
From: Martin Thomson <mt@lowentropy.net>
To: Devin Mullins <twifkak@google.com>
Cc: wpack@ietf.org
Content-Type: text/plain
Archived-At: <https://mailarchive.ietf.org/arch/msg/wpack/z43SEI4wT724koJkgMIOigvTglw>
Subject: Re: [Wpack] About content-based origins
X-BeenThere: wpack@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Web Packaging <wpack.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/wpack>, <mailto:wpack-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/wpack/>
List-Post: <mailto:wpack@ietf.org>
List-Help: <mailto:wpack-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/wpack>, <mailto:wpack-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 31 Mar 2020 04:41:26 -0000

Wow, that's a long list.

I think that a lot of this relates to the inherent difficulty of providing customization under a regime like the one described.

This includes all the experimentation issues, CCPA/GDPR consent notices [1], analytics, loading resources with referring-origin-based ACLs [2], personalization of content, and so on.

This is something that a signature-based system helps with: you can front-load all of this customization into custom bundles that are all signed with the same key, and treat all of them as equivalent.

I think that Ekr and I both separately mentioned this on the call last week, but it's worth writing it down somewhere (because I have failed to do so thus far).  There is a variant of this design that uses signature public keys as the primary identifier and considers all content signed with the corresponding private key to be part of one co-extant origin.  That avoids most of this mess.  There are a few challenges, but as long as the liveness test (Sec-Signer-Origin, maybe) remains, concerns about revocation are approximately the same as they are for protecting the target origin.
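To make that concrete, here is a minimal sketch (entirely illustrative; the draft defines no such scheme or function, and the "signed://" name is invented here) of how content could map to a signer-based origin identifier:

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch: identify a signer-based origin by a digest of the
// signer's public key (SPKI bytes), so that all content signed with the
// matching private key lands in the same origin.
function signerOriginId(spki: Uint8Array): string {
  const digest = createHash("sha256").update(spki).digest("hex");
  return `signed://${digest}`;
}
```

Any two bundles signed with the same key would map to the same origin; rotating the key creates a new origin, which is where the revocation and reputation questions come from.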

The reputation issue remains, and the granularity of key use becomes critical: I don't think we should accept a signer-based origin as valid if the same key is also used for a different target origin or for a certificate.  In other words, don't cross the streams: if you want to attest as example.com that this content is yours, then you will need to use a different key for that purpose.  (I suggest we continue to discuss that question on the other thread.)

The other question you raise is the difficulty of implementing this.  I don't know to what extent state is used in frameworks and sites, so I don't have a good handle on this.  The suggestion to keep a record of pre-transfer state is a good one that seems immediately useful.  However, for the general problem, I posited offline that there are a few cases in which merge strategies could be trivial:

1. Exclusive access means no merges: If a content-based origin has exclusive ownership over a particular store, then merges are trivial.  This is true for a single content-based origin that transitions to a target origin: you can likely assume that the target origin has no pre-existing state before the transfer.  You only have to worry about multiple different content-based origins that write to the same logical store, or a content-based origin that doesn't transfer to the target origin for some reason.  It is quite likely that many transfers will take information for which the target origin has no pre-existing state.
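A sketch of what makes the exclusive-access case trivial (hypothetical types; real storage APIs would look different):

```typescript
// Hypothetical sketch: merging a content-based origin's store into a target
// origin's store, under the assumption that the target usually has no
// pre-existing state for the keys being transferred.
type Store = Map<string, string>;

function transferState(source: Store, target: Store): Store {
  if (target.size === 0) {
    // Exclusive access: the merge is trivial, adopt the source wholesale.
    return new Map(source);
  }
  // Overlap exists: a real implementation needs a merge policy here.
  throw new Error("conflicting state: a merge strategy is required");
}
```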

Signer-based origins are more likely to enjoy exclusive access.  We do have to admit the possibility of a single target origin having multiple signing keys, and so multiple sets of state to reconcile, but it might also be possible to align access to stores with signing keys well enough to avoid the need for merging.

2. Simple merge strategies: If state is trivial, then very simple merge strategies can be effective with very little effort.  Say that you had a control that set a preference.  That might be a boolean (dark mode on/off) or something more complex.  Last change wins is a perfectly workable strategy in case of conflict.  A slightly less simple strategy is to record discrete changes with timestamps and then apply the changes in time order.
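The timestamped variant might look like this (again just a sketch, with invented types):

```typescript
// Hypothetical sketch: each side records discrete, timestamped changes;
// merging means replaying all changes in time order, so the latest write
// to each key wins.
interface Change {
  key: string;
  value: string;
  at: number; // timestamp (e.g. milliseconds since epoch)
}

function mergeLastWriteWins(a: Change[], b: Change[]): Map<string, string> {
  const merged = new Map<string, string>();
  for (const c of [...a, ...b].sort((x, y) => x.at - y.at)) {
    merged.set(c.key, c.value);
  }
  return merged;
}
```

For the dark-mode example: if one side set it on at t=1 and the other set it off at t=2, the merged state is off.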

Finally, for non-trivial state, my hypothesis is that servers will want to record that state server-side, so the question becomes whether any sort of offline modification of that state is desirable anyway.  At the point that a site wants that capability, then yes, this is engineering.  But it is engineering they would have to undertake at that point anyway.

[1] The informed consent doctrine is something I think is despicable, but I can't deny the force of law.
[2] Not secure in any real sense, but this method seems to be pretty effective in practice.

On Wed, Mar 25, 2020, at 17:35, Devin Mullins wrote:
> Hi Martin,
> 
> Thanks for the engineering work on this. For others, I've already 
> reviewed this, had some chats with others on the AMP team, and had a 
> few back-and-forths with Martin (he's been very gracious in engaging 
> his time, thank you). I'm trying to estimate the feasibility of this, 
> both for the AMP use-case, and for the non-AMP use-case [1], which we 
> hope to see flourish to the extent that publishers and users desire it, 
> and only impeded by implementation costs where necessary for the common 
> good.
> 
> With the caveat that this is very early feedback, the high-level bit is 
> that, with some modifications, it seems mostly feasible for AMP. 
> However, there are two main downsides:
>  1. worse UX on sites that run A/B experiments
>  2. technical constraints make this harder for publishers to adopt on 
> non-AMP, as compared to SXG
> 
> A lot of details follow. I suggest to rename the subject if you reply 
> to a detail below, in order to reduce cross-talk and make browsing the 
> archive easier.
> 
> Details:
> 
> We would need the display URL to be the attested URL even before 
> transfer. I suspect a flash in the URL bar would be frustrating for 
> both users and publishers. Jeffrey had proposed using signatures for 
> this. I'd suppose UAs could choose to render this with a "may be stale" 
> indicator.
> 
> In terms of publisher implementation feasibility, I think the most 
> likely implementation would be for a publisher to produce a 
> Sec-Content-Origin (Sec-CO) response for the current version of the 
> resource iff it matches the Sec-CO request. I suspect that keeping 
> state of past versions of the resource would be infeasible, especially 
> as it couldn't be done purely at the edge; it would require something 
> like a region-wide cache. The value of the above stateless 
> implementation would be inversely proportional to how often the 
> resource changes (e.g. "transient content" [2] such as Related 
> Articles). I think this would be a workable constraint for publishers 
> wishing to publish such bundles, but felt it's worth noting, 
> nonetheless.
> 
> Re: fallback behavior on failed state transfer, distributors would need 
> a way to monitor failure rates. This serves two purposes:
>  1. Automatically detecting errors in the distribution pipeline. This 
> is the same purpose served by 
> https://wicg.github.io/webpackage/loading.html#signed-exchange-report.
>  2. Verifying that, for some fraction of navigations, bundles are 
> meeting AMP's stated UX goals (by using the verified content in the 
> bundle and not potentially arbitrary content after the redirect). This 
> is a dynamic equivalent of what can be done mostly statically with SXG, 
> because the distributor can run the same algorithm as the UA, modulo 
> client/server skew in clocks and root stores. I think this is a 
> trade-off AMP could make. The alternative is that the fallback behavior 
> keeps the content-based origin (CBO).
> 
> There is a possible vector for user ID transfer from distributor to 
> publisher; filed as 
> https://github.com/martinthomson/wpack-content/issues/1.
> 
> This limits the ability to run session-sticky experiments (in 
> https://amp.dev/documentation/components/amp-experiment/ and many 
> non-AMP frameworks). The publisher has the following options:
>  1. Delay rendering content under experiment until after state 
> transfer. This hurts UX for the sake of experimentation, creating a 
> trade-off that otherwise doesn't exist on origin (modulo ease of 
> implementation).
>  2. Generate a client-side UUID before state transfer, and join it with 
> the pre-existing user ID after transfer. This means the session could 
> not include CBO pageviews anywhere except in the first pageview.
>  3. Generate variants of the bundle for each experimental state. The 
> distributor chooses which bundle to send to the user (based on what? 
> not sure). For a page with M independent experiments each with N arms, 
> that's N^M variants. For large enough M and N, this is likely both 
> infeasible for publishers and a vector for martinthomson/wpack-content#1. (But I 
> need some guidance from experts on typical ranges for M and N.)
> 
> Perhaps this is a need the UA could address. A straw proposal -- a 
> selectExperimentArm() API that:
>  1. only exposes as many bits as the UA deems OK, and
>  2. exposes different bits to different attested origins, so they can't 
> be joined
> On the one hand, perhaps this is too much scope for wpack. On the 
> other, it addresses a problem that is somewhat particular to CBOs.
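To illustrate the straw proposal (every name and parameter here is invented, not a proposed API): the UA could derive the arm from private per-profile randomness, the attested origin, and a UA-chosen bit budget, so that arms are stable within an origin but cannot be joined across origins.

```typescript
import { createHash } from "node:crypto";

// Straw sketch of a hypothetical selectExperimentArm(): the UA hashes a
// private seed together with the attested origin and experiment name, then
// truncates the result to the number of bits it is willing to expose.
function selectExperimentArm(
  profileSeed: string, // UA-private randomness, never exposed to pages
  origin: string,      // the attested origin asking for an arm
  experiment: string,  // experiment identifier
  bits: number         // how many bits the UA deems OK to expose
): number {
  const digest = createHash("sha256")
    .update(`${profileSeed}|${origin}|${experiment}`)
    .digest();
  return digest.readUInt32BE(0) % (1 << bits);
}
```

Because the seed never leaves the UA, two attested origins see uncorrelated bits even for the same user, which is what prevents joining.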
> 
> This impedes consent management (e.g. dialogs for GDPR and CCPA), 
> compared to SXG. The publisher has a few choices:
>  1. Delay rendering the dialog until after transfer. I think this would 
> cause layout shift (https://web.dev/cls/).
>  2. Render the dialog, and risk discovering that it's already been 
> acknowledged. At that point:
>  a. Hide it, causing layout shift.
>  b. Leave it there. This is way outside my domain expertise, but ISTR 
> there are some rules or at least best practices wrt when it's okay to 
> re-show an opt-out dialog.
> One of these options may be an okay trade-off; I'm not sure.
> 
> Analytics providers (via 
> https://amp.dev/documentation/components/amp-analytics/ and many 
> non-AMP libraries) would have to do one of two things:
>  1. Delay pingback until after transfer. This risks a lower fraction of 
> successful pingbacks, as e.g. the user might close the tab before 
> transfer.
>  2. Generate a client-side UUID, send pingback before transfer, and 
> send a follow-up after transfer to join with pre-existing user ID. This 
> captures those otherwise lost pingbacks pseudonymously (anonymously?), 
> though it requires server-side work on the pingback endpoint.
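Option 2's join step might be sketched like this (hypothetical record shapes; the real pingback endpoint would differ): pingbacks sent before transfer carry a client-generated UUID, and a follow-up after transfer links that UUID to the pre-existing user ID so the early pingbacks are not lost.

```typescript
// Hypothetical sketch: server-side join of pre-transfer pingbacks (keyed by
// a client UUID) with post-transfer join records (UUID -> user ID).
interface Pingback { uuid: string; event: string }
interface JoinRecord { uuid: string; userId: string }

function joinPingbacks(
  pings: Pingback[],
  joins: JoinRecord[]
): Map<string, string[]> {
  const uuidToUser = new Map(
    joins.map(j => [j.uuid, j.userId] as [string, string])
  );
  const byUser = new Map<string, string[]>();
  for (const p of pings) {
    // Pingbacks whose UUID was never joined stay pseudonymous under the UUID.
    const owner = uuidToUser.get(p.uuid) ?? p.uuid;
    const events = byUser.get(owner) ?? [];
    events.push(p.event);
    byUser.set(owner, events);
  }
  return byUser;
}
```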
> 
> For the above client-side mitigations, two feature requests seem 
> necessary:
>  1. Store the transfer metadata (e.g. indexedDB renames by time) 
> somewhere more permanent. For various reasons the event handler may not 
> fire or finish, and it would be good to later scan for stragglers to 
> merge or expunge.
>  2. DOM API to wait for transfer to complete.
> 
> Some subresources are ACL'd by the Origin header (e.g. paid fonts). A likely 
> mitigation is to defer loading until transfer, with a timeout whose 
> duration depends on the publisher's font-display preference. (Martin 
> suggests an optimization where the encrypted subresource is loaded 
> early, and only the decryption key after transfer.)
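That deferral could be as simple as racing the transfer against a timeout (sketch only; the transfer promise and fallback value are stand-ins for real loading logic):

```typescript
// Hypothetical sketch: wait for the state transfer (or any pending work),
// but resolve with a fallback after a timeout derived from the publisher's
// font-display preference.
function withTimeout<T>(work: Promise<T>, ms: number, fallback: T): Promise<T> {
  const timer = new Promise<T>(resolve =>
    setTimeout(() => resolve(fallback), ms)
  );
  return Promise.race([work, timer]);
}
```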
> 
> Last but not least, I am concerned that this is a bar that is 
> disproportionately difficult for non-AMP publishers to meet. The above 
> client-side mitigations are possible, but quite a bit of work.
>  1. For frameworks like AMP that strongly encourage publishers to use a 
> rolling release, it is possible to upgrade many use-cases in the wild 
> with minimal publisher effort.
>  2. For other frameworks, it may be possible to provide this support, 
> but the requirement to upgrade versions may limit applicability. 
> Because upgrades occur less often, they often require more work.
>  3. For custom JS that manages state, authors will need to modify it to 
> move or copy state into indexedDB, and handle transfer events and merge 
> conflicts.
> 
> From a "minimal publisher effort" perspective, I am especially 
> interested in making it possible to have as close to a turnkey solution 
> as is reasonable, for instance at the level of the CDN or CMS. If such 
> a solution exists, sites would likely contain a hybrid of all three 
> cases above, and thus want to opt into such support incrementally. How 
> should a CDN determine which pages to provide Sec-CO responses for?
>  1. URL patterns are the simplest answer, but I suspect hard for 
> publishers to create and maintain with minimal false negatives and 
> positives.
>  2. Would it be helpful to thread an "I support state transfer" 
> annotation for some portion of the build journey, from individual JS 
> function all the way to bundle? Would folks use this?
> 
> I don't have a good solution to the issue of non-AMP developer cost, 
> and thus fear this is a solution that would feature a disproportionate 
> amount of AMP pages compared to the background distribution of the web. 
> But I'm hopeful that other JS framework developers can chime in with 
> respect to feasibility of support for CBO and state transfer.
> 
> I'm leaving out some of Martin's replies to the above; no offense 
> intended. It took me long enough just to summarize the above, and I'd 
> better send it early enough so at least a few people have time to read 
> it before the meeting.
> 
> Thanks,
> Devin
> 
> [1] 
> https://blog.amp.dev/2019/05/22/privacy-preserving-instant-loading-for-all-web-content/
> [2] 
> http://www.seobythesea.com/2011/12/how-google-might-identify-transient-content-on-webpages/
> 
> On Mon, Mar 23, 2020 at 5:35 PM Martin Thomson <mt@lowentropy.net> wrote:
> > Ted's note prompted me to send a much-belated announcement (sorry folks, I forgot).
> > 
> >  The draft is here:
> > https://tools.ietf.org/html/draft-thomson-wpack-content-origin-00
> > 
> >  A nicer version here:
> > https://martinthomson.github.io/wpack-content/draft-thomson-wpack-content-origin.html
> > 
> >  This could be a dramatically different approach to addressing the use cases set out in our charter.
> > 
> >  In short, this aims to address the core question of how offline content might *ultimately* be attributed to a web origin in a fundamentally different way. There are two key concepts:
> > 
> >  1. Content is given its own origin, using a new system for identification.
> > 
> >  2. A target origin can "accept" content and state from one of these new origins.
> > 
> >  There are a lot of details here (read the draft), but the major advantage I see is that you don't have to make an offline decision about authority, and that means you can be offline for much longer (lifting the 7 day limit).
> > 
> >  What it does have in common with the signed exchanges approach is the need for a bundling format, but in its current form it is less dependent on the details of that format. That might allow the format to be simpler, but I'm sure that the need to mint new identifier types will more than make up for any slack there.
> > 
> >  The draft is quite rough. I'm sure that it has the remnants of a few bad ideas still hanging around. Ask questions if you think something is unclear.
> > 
> >  _______________________________________________
> >  Wpack mailing list
> > Wpack@ietf.org
> > https://www.ietf.org/mailman/listinfo/wpack