Re: [Wpack] About content-based origins
Devin Mullins <twifkak@google.com> Wed, 25 March 2020 06:36 UTC
References: <260dfc2f-8399-483e-859d-08f92821c823@www.fastmail.com>
In-Reply-To: <260dfc2f-8399-483e-859d-08f92821c823@www.fastmail.com>
From: Devin Mullins <twifkak@google.com>
Date: Tue, 24 Mar 2020 23:35:44 -0700
Message-ID: <CANjwSimZAkAC0JJBjUjZr4k0514QRqDxBReOkq_AGTeGJ2OTzQ@mail.gmail.com>
To: Martin Thomson <mt@lowentropy.net>
Cc: wpack@ietf.org
Content-Type: multipart/alternative; boundary="00000000000086343805a1a81544"
Archived-At: <https://mailarchive.ietf.org/arch/msg/wpack/zQjwiAeUrCZHHdhmixL9iBup7ag>
Subject: Re: [Wpack] About content-based origins
Hi Martin,

Thanks for the engineering work on this. For others, I've already
reviewed this, had some chats with others on the AMP team, and had a few
back-and-forths with Martin (he's been very gracious with his time,
thank you).

I'm trying to estimate the feasibility of this, both for the AMP
use-case, and for the non-AMP use-case [1], which we hope to see
flourish to the extent that publishers and users desire it, and only
impeded by implementation costs where necessary for the common good.

With the caveat that this is very early feedback, the high-level bit is
that, with some modifications, it seems mostly feasible for AMP.
However, there are two main downsides:

1. worse UX on sites that run A/B experiments
2. technical constraints make this harder for publishers to adopt on
   non-AMP, as compared to SXG

A lot of details follow. I suggest renaming the subject if you reply to
a detail below, in order to reduce cross-talk and make browsing the
archive easier.

Details:

We would need the display URL to be the attested URL even before
transfer. I suspect a flash in the URL bar would be frustrating for both
users and publishers. Jeffrey had proposed using signatures for this.
I'd suppose UAs could choose to render this with a "may be stale"
indicator.

In terms of publisher implementation feasibility, I think the most
likely implementation would be for a publisher to produce a
Sec-Content-Origin (Sec-CO) response for the current version of the
resource iff it matches the Sec-CO request. I suspect that keeping state
of past versions of the resource would be infeasible, especially as it
couldn't be done purely at the edge; it would require something like a
region-wide cache. The value of the above stateless implementation would
be inversely proportional to how often the resource changes (e.g.
"transient content" [2] such as Related Articles).
I think this would be a workable constraint for publishers wishing to
publish such bundles, but I felt it was worth noting nonetheless.

Re: fallback behavior on failed state transfer, distributors would need
a way to monitor failure rates. This serves two purposes:

1. Automatically detecting errors in the distribution pipeline. This is
   the same purpose served by
   https://wicg.github.io/webpackage/loading.html#signed-exchange-report.
2. Verifying that, for some fraction of navigations, bundles are meeting
   AMP's stated UX goals (by using the verified content in the bundle
   and not potentially arbitrary content after the redirect).

This is a dynamic equivalent of what can be done mostly statically with
SXG, because the distributor can run the same algorithm as the UA,
modulo client/server skew in clocks and root stores. I think this is a
trade-off AMP could make. The alternative is that the fallback behavior
keeps the content-based origin (CBO).

There is a possible vector for user ID transfer from distributor to
publisher; filed as
https://github.com/martinthomson/wpack-content/issues/1.

This limits the ability to run session-sticky experiments (in
https://amp.dev/documentation/components/amp-experiment/ and many
non-AMP frameworks). The publisher has the following options:

1. Delay rendering content under experiment until after state transfer.
   This hurts UX for the sake of experimentation, creating a trade-off
   that otherwise doesn't exist on origin (modulo ease of
   implementation).
2. Generate a client-side UUID before state transfer, and join it with
   the pre-existing user ID after transfer. This means the session could
   not include CBO pageviews anywhere except in the first pageview.
3. Generate variants of the bundle for each experimental state. The
   distributor chooses which bundle to send to the user (based on what?
   not sure). For a page with M independent experiments each with N
   arms, that's N^M variants.
   For large enough M and N, this is likely both infeasible for
   publishers and a vector for the user ID transfer described in
   martinthomson/wpack-content#1. (But I need some guidance from experts
   on typical ranges for M and N.)

Perhaps this is a need the UA could address. A straw proposal is a
selectExperimentArm() API that:

1. only exposes as many bits as the UA deems OK, and
2. exposes different bits to different attested origins, so they can't
   be joined

On the one hand, perhaps this is too much scope for wpack. On the other,
it addresses a problem that is somewhat particular to CBOs.

This impedes consent management (e.g. dialogs for GDPR and CCPA),
compared to SXG. The publisher has a few choices:

1. Delay rendering the dialog until after transfer. I think this would
   cause layout shift (https://web.dev/cls/).
2. Render the dialog, and risk discovering that it's already been
   acknowledged. At that point:
   a. Hide it, causing layout shift.
   b. Leave it there. This is way outside my domain expertise, but ISTR
      there are some rules or at least best practices wrt when it's okay
      to re-show an opt-out dialog.

One of these options may be an okay trade-off; I'm not sure.

Analytics providers (via
https://amp.dev/documentation/components/amp-analytics/ and many non-AMP
libraries) would have to do one of two things:

1. Delay pingback until after transfer. This risks a lower fraction of
   successful pingbacks, as e.g. the user might close the tab before
   transfer.
2. Generate a client-side UUID, send pingback before transfer, and send
   a follow-up after transfer to join with pre-existing user ID. This
   captures those otherwise lost pingbacks pseudonymously
   (anonymously?), though it requires server-side work on the pingback
   endpoint.

For the above client-side mitigations, two feature requests seem
necessary:

1. Store the transfer metadata (e.g. indexedDB renames by time)
   somewhere more permanent. For various reasons the event handler may
   not fire or finish, and it would be good to later scan for stragglers
   to merge or expunge.
2. DOM API to wait for transfer to complete.

Some subresources are access-controlled by Origin header (e.g. paid
fonts). Likely mitigation is to defer loading until transfer, with a
timeout whose duration depends on the publisher's font-display
preference. (Martin suggests an optimization where the encrypted
subresource is loaded early, and only the decryption key after
transfer.)

Last but not least, I am concerned that this is a bar that is
disproportionately difficult for non-AMP publishers to meet. The above
client-side mitigations are possible, but quite a bit of work.

1. For frameworks like AMP that strongly encourage publishers to use a
   rolling release, it is possible to upgrade many use-cases in the wild
   with minimal publisher effort.
2. For other frameworks, it may be possible to provide this support, but
   the requirement to upgrade versions may limit applicability. Because
   upgrades occur less often, they often require more work.
3. For custom JS that manages state, authors will need to modify it to
   move or copy state into indexedDB, and handle transfer events and
   merge conflicts.

From a "minimal publisher effort" perspective, I am especially
interested in making it possible to have as close to a turnkey solution
as is reasonable, for instance at the level of the CDN or CMS. If such a
solution exists, sites would likely contain a hybrid of all three cases
above, and thus want to opt into such support incrementally. How should
a CDN determine which pages to provide Sec-CO responses for?

1. URL patterns are the simplest answer, but I suspect hard for
   publishers to create and maintain with minimal false negatives and
   positives.
2. Would it be helpful to thread an "I support state transfer"
   annotation for some portion of the build journey, from individual JS
   function all the way to bundle? Would folks use this?
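
To make option 1 concrete, here is a minimal sketch of how a CDN might
combine publisher-supplied URL patterns with the stateless "current
version only" rule discussed earlier. Everything named here is
hypothetical: the pattern lists, the function names, and the use of
opaque strings for the requested and current content identifiers. How
the draft actually derives and compares those identifiers is not modeled.

```python
from fnmatch import fnmatchcase

# Hypothetical opt-in and opt-out lists maintained by the publisher.
# The paths are illustrative only.
STATE_TRANSFER_PATTERNS = ["/articles/*", "/news/2020/*"]
EXCLUDED_PATTERNS = ["/articles/live/*"]


def supports_state_transfer(path: str) -> bool:
    """Return True if the publisher has opted this path into state transfer.

    Exclusions are checked first, because fnmatch's "*" also matches "/",
    so "/articles/*" would otherwise cover "/articles/live/...".
    """
    if any(fnmatchcase(path, pattern) for pattern in EXCLUDED_PATTERNS):
        return False
    return any(fnmatchcase(path, pattern) for pattern in STATE_TRANSFER_PATTERNS)


def should_send_sec_co(path: str, requested_id: str, current_id: str) -> bool:
    """Stateless check: respond only for the current version of the resource.

    requested_id and current_id are opaque placeholders (an assumption) for
    whatever identifier the Sec-CO request carries and whatever the current
    content maps to. No history of past versions is kept.
    """
    return supports_state_transfer(path) and requested_id == current_id
```

The exclusion list is one way to trim false positives, but it also shows
the maintenance burden: every page that manages state in a way that
cannot be transferred needs its own entry.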

I don't have a good solution to the issue of non-AMP developer cost, and
thus fear this is a solution that would feature a disproportionate
amount of AMP pages compared to the background distribution of the web.
But I'm hopeful that other JS framework developers can chime in with
respect to feasibility of support for CBO and state transfer.

I'm leaving out some of Martin's replies to the above; no offense
intended. It took me long enough just to summarize the above, and I'd
better send it early enough so at least a few people have time to read
it before the meeting.

Thanks,
Devin

[1] https://blog.amp.dev/2019/05/22/privacy-preserving-instant-loading-for-all-web-content/
[2] http://www.seobythesea.com/2011/12/how-google-might-identify-transient-content-on-webpages/

On Mon, Mar 23, 2020 at 5:35 PM Martin Thomson <mt@lowentropy.net> wrote:

> Ted's note prompted me to send a much-belated announcement (sorry folks, I
> forgot).
>
> The draft is here:
> https://tools.ietf.org/html/draft-thomson-wpack-content-origin-00
>
> A nicer version here:
>
> https://martinthomson.github.io/wpack-content/draft-thomson-wpack-content-origin.html
>
> This approach could a dramatically different approach to addressing the
> use cases set out in our charter.
>
> In short, this aims to address the core question of how offline content
> might *ultimately* be attributed to a web origin in a fundamentally
> different way. There are two key concepts:
>
> 1. Content is given its own origin, using a new system for identification.
>
> 2. A target origin can "accept" content and state from one of these new
> origins.
>
> There are a lot of details here (read the draft), but the major advantage
> I see is that you don't have to make an offline decision about authority,
> and that means you can be offline for much longer (lifting the 7 day limit).
>
> What it does have in common with signed exchanges approach is the need for
> a bundling format, but in its current form it is less dependent on the
> details of the format. That might allow that to be simpler, but I'm sure
> that the need to mint new identifier types will more than make up for any
> slack there.
>
> The draft is quite rough. I'm sure that it has the remnants of a few bad
> ideas still hanging around. Ask questions if you think something is
> unclear.
>
> _______________________________________________
> Wpack mailing list
> Wpack@ietf.org
> https://www.ietf.org/mailman/listinfo/wpack
>