Re: [Wpack] About content-based origins

Devin Mullins <twifkak@google.com> Wed, 25 March 2020 06:36 UTC

MIME-Version: 1.0
References: <260dfc2f-8399-483e-859d-08f92821c823@www.fastmail.com>
In-Reply-To: <260dfc2f-8399-483e-859d-08f92821c823@www.fastmail.com>
From: Devin Mullins <twifkak@google.com>
Date: Tue, 24 Mar 2020 23:35:44 -0700
Message-ID: <CANjwSimZAkAC0JJBjUjZr4k0514QRqDxBReOkq_AGTeGJ2OTzQ@mail.gmail.com>
To: Martin Thomson <mt@lowentropy.net>
Cc: wpack@ietf.org
Content-Type: multipart/alternative; boundary="00000000000086343805a1a81544"
Archived-At: <https://mailarchive.ietf.org/arch/msg/wpack/zQjwiAeUrCZHHdhmixL9iBup7ag>
Subject: Re: [Wpack] About content-based origins

Hi Martin,

Thanks for the engineering work on this. For others, I've already reviewed
this, had some chats with others on the AMP team, and had a few
back-and-forths with Martin (he's been very gracious in engaging his time,
thank you). I'm trying to estimate the feasibility of this, both for the
AMP use-case, and for the non-AMP use-case [1], which we hope to see
flourish to the extent that publishers and users desire it, and only
impeded by implementation costs where necessary for the common good.

With the caveat that this is very early feedback, the high-level bit is
that, with some modifications, it seems mostly feasible for AMP. However,
there are two main downsides:
  1. worse UX on sites that run A/B experiments
  2. technical constraints make this harder for publishers to adopt on
non-AMP, as compared to SXG

A lot of details follow. I suggest renaming the subject if you reply to a
detail below, in order to reduce cross-talk and make browsing the archive
easier.

Details:

We would need the display URL to be the attested URL even before transfer.
I suspect a flash in the URL bar would be frustrating for both users and
publishers. Jeffrey had proposed using signatures for this. I'd suppose UAs
could choose to render this with a "may be stale" indicator.

In terms of publisher implementation feasibility, I think the most likely
implementation would be for a publisher to produce a Sec-Content-Origin
(Sec-CO) response for the current version of the resource iff it matches
the Sec-CO request. I suspect that keeping state of past versions of the
resource would be infeasible, especially as it couldn't be done purely at
the edge; it would require something like a region-wide cache. The value of
the above stateless implementation would be inversely proportional to how
often the resource changes (e.g. "transient content" [2] such as Related
Articles). I think this would be a workable constraint for publishers
wishing to publish such bundles, but felt it's worth noting, nonetheless.
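
The stateless approach above could look roughly like the following sketch.
It assumes, purely for illustration, that a content-based origin is a
base64url SHA-256 digest of the response body; the draft's actual identifier
derivation and the Sec-CO header syntax may well differ, and
`content_origin` and `sec_co_response` are hypothetical names.

```python
import base64
import hashlib
from typing import Optional


def content_origin(body: bytes) -> str:
    """Hypothetical content-based identifier (assumption, not from the draft):
    the unpadded base64url encoding of SHA-256 over the body."""
    digest = hashlib.sha256(body).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def sec_co_response(requested: str, current_body: bytes) -> Optional[str]:
    """Attest only if the request names the *current* version of the resource.

    Stateless: no history of past versions is kept, so any request for an
    older version is declined (None) and the UA takes its fallback path.
    """
    current = content_origin(current_body)
    return current if requested == current else None
```

Under these assumptions, every change to the resource invalidates all
outstanding bundles at once, which is why the value of this approach falls
as the resource changes more often.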

Re: fallback behavior on failed state transfer, distributors would need a
way to monitor failure rates. This serves two purposes:
  1. Automatically detecting errors in the distribution pipeline. This is
the same purpose served by
https://wicg.github.io/webpackage/loading.html#signed-exchange-report.
  2. Verifying that, for some fraction of navigations, bundles are meeting
AMP's stated UX goals (by using the verified content in the bundle and not
potentially arbitrary content after the redirect). This is a dynamic
equivalent of what can be done mostly statically with SXG, because the
distributor can run the same algorithm as the UA, modulo client/server skew
in clocks and root stores. I think this is a trade-off AMP could make. The
alternative is that the fallback behavior keeps the content-based origin
(CBO).

There is a possible vector for user ID transfer from distributor to
publisher; filed as https://github.com/martinthomson/wpack-content/issues/1.

This limits the ability to run session-sticky experiments (in
https://amp.dev/documentation/components/amp-experiment/ and many non-AMP
frameworks). The publisher has the following options:
  1. Delay rendering content under experiment until after state transfer.
This hurts UX for the sake of experimentation, creating a trade-off that
otherwise doesn't exist on origin (modulo ease of implementation).
  2. Generate a client-side UUID before state transfer, and join it with
the pre-existing user ID after transfer. This means the session could not
include CBO pageviews anywhere except in the first pageview.
  3. Generate variants of the bundle for each experimental state. The
distributor chooses which bundle to send to the user (based on what? not
sure). For a page with M independent experiments each with N arms, that's
N^M variants. For large enough M and N, this is likely both infeasible for
publishers and a vector for the user ID transfer described in
martinthomson/wpack-content#1. (But I need some guidance
from experts on typical ranges for M and N.)
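
To make the N^M growth in option 3 concrete, here is a small calculation.
The values of M and N are made up for illustration; they are not
measurements of real experiment setups.

```python
from itertools import product


def variant_count(arms: int, experiments: int) -> int:
    """Bundles needed to cover every combination of M independent
    experiments with N arms each: N^M."""
    return arms ** experiments


def variants(arms: int, experiments: int):
    """Yield each combination as a tuple of arm indexes, one per experiment."""
    return product(range(arms), repeat=experiments)


if __name__ == "__main__":
    # Hypothetical (N, M) pairs, chosen only to show the growth rate.
    for n, m in [(2, 3), (2, 10), (3, 10)]:
        print(f"N={n}, M={m}: {variant_count(n, m)} variants")
```

Ten two-arm experiments already require 1024 bundles per page, and each
bundle choice carries 10 bits that the distributor could use to identify
the user.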

Perhaps this is a need the UA could address. A straw proposal -- a
selectExperimentArm() API that:
  1. only exposes as many bits as the UA deems OK, and
  2. exposes different bits to different attested origins, so they can't be
joined
On the one hand, perhaps this is too much scope for wpack. On the other, it
addresses a problem that is somewhat particular to CBOs.
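
One way such an API might behave is sketched below. Everything here is an
assumption made for illustration: the `ArmSelector` class, the per-origin
bit budget, and the use of an HMAC keyed with a UA-local secret are not
part of any proposal or browser API.

```python
import hashlib
import hmac
from typing import Dict


class ArmSelector:
    """Hypothetical UA-side model of selectExperimentArm()."""

    def __init__(self, ua_secret: bytes, bit_budget_per_origin: int = 4) -> None:
        self._secret = ua_secret               # never leaves the UA
        self._budget = bit_budget_per_origin   # property 1: cap on exposed bits
        self._spent: Dict[str, Dict[str, int]] = {}

    def select(self, attested_origin: str, experiment: str, arms: int) -> int:
        if arms < 1:
            raise ValueError("arms must be >= 1")
        bits = (arms - 1).bit_length()         # bits needed to name one arm
        spent = self._spent.setdefault(attested_origin, {})
        total = sum(spent.values()) - spent.get(experiment, 0) + bits
        if total > self._budget:
            raise PermissionError("bit budget exceeded for this origin")
        spent[experiment] = bits
        # Property 2: the origin is part of the MAC input, so the values seen
        # by different attested origins are unrelated and cannot be joined.
        msg = f"{attested_origin}\x00{experiment}".encode("utf-8")
        mac = hmac.new(self._secret, msg, hashlib.sha256).digest()
        # Modulo bias is negligible with a 64-bit input and a small arm count.
        return int.from_bytes(mac[:8], "big") % arms
```

The result is sticky for a given origin and experiment, which is what
session-sticky experiments need, while the budget bounds how much a
publisher could learn about the user before state transfer.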

This impedes consent management (e.g. dialogs for GDPR and CCPA), compared
to SXG. The publisher has a few choices:
  1. Delay rendering the dialog until after transfer. I think this would
cause layout shift (https://web.dev/cls/).
  2. Render the dialog, and risk discovering that it's already been
acknowledged. At that point:
    a. Hide it, causing layout shift.
    b. Leave it there. This is way outside my domain expertise, but ISTR
there are some rules or at least best practices wrt when it's okay to
re-show an opt-out dialog.
One of these options may be an okay trade-off; I'm not sure.

Analytics providers (via
https://amp.dev/documentation/components/amp-analytics/ and many non-AMP
libraries) would have to do one of two things:
  1. Delay pingback until after transfer. This risks a lower fraction of
successful pingbacks, as e.g. the user might close the tab before transfer.
  2. Generate a client-side UUID, send pingback before transfer, and send a
follow-up after transfer to join with pre-existing user ID. This captures
those otherwise lost pingbacks pseudonymously (anonymously?), though it
requires server-side work on the pingback endpoint.
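
The server-side work for option 2 might be as small as the following
sketch. `PingbackStore` and its methods are hypothetical and keep state in
memory; a real pingback endpoint would persist both the events and the
joins.

```python
import uuid
from collections import defaultdict
from typing import Dict, List


def new_temp_id() -> str:
    """Client-side UUID generated before state transfer."""
    return str(uuid.uuid4())


class PingbackStore:
    """Hypothetical pingback endpoint state."""

    def __init__(self) -> None:
        self._events: Dict[str, List[str]] = defaultdict(list)
        self._alias: Dict[str, str] = {}   # temp ID -> pre-existing user ID

    def record(self, client_id: str, event: str) -> None:
        """Store a pingback under whatever ID the client had at the time."""
        self._events[client_id].append(event)

    def join(self, temp_id: str, user_id: str) -> None:
        """Follow-up sent after transfer: link the temp ID to the user ID."""
        self._alias[temp_id] = user_id

    def events_for(self, user_id: str) -> List[str]:
        """Events recorded under the user ID or any temp ID joined to it."""
        ids = [user_id] + [t for t, u in self._alias.items() if u == user_id]
        return [e for i in ids for e in self._events.get(i, [])]
```

Pingbacks whose temp ID is never joined (for example because the tab was
closed before transfer) stay in the store without any link to a user ID.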

For the above client-side mitigations, two feature requests seem necessary:
  1. Store the transfer metadata (e.g. indexedDB renames by time) somewhere
more permanent. For various reasons the event handler may not fire or
finish, and it would be good to later scan for stragglers to merge or
expunge.
  2. DOM API to wait for transfer to complete.

Some subresources are access-controlled by Origin header (e.g. paid fonts).
A likely mitigation is to defer loading until transfer, with a timeout whose
duration depends on the publisher's font-display preference. (Martin
suggests an optimization where the encrypted subresource is loaded early,
and only the decryption key after transfer.)
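
The defer-with-timeout mitigation can be modeled as below. This is a
language-neutral sketch written in Python rather than browser code;
`load_after_transfer`, the `transfer_done` event, and the timeout values
are all hypothetical. The timeouts in particular are placeholders, not
values taken from any spec.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Placeholder timeouts (seconds) per font-display preference; a publisher
# would pick these, and nothing here is normative.
ILLUSTRATIVE_TIMEOUT_S = {"block": 3.0, "swap": 3.0, "fallback": 0.1, "optional": 0.1}


async def load_after_transfer(
    transfer_done: asyncio.Event,
    fetch: Callable[[], Awaitable[bytes]],
    timeout_s: float,
) -> Optional[bytes]:
    """Fetch an Origin-gated subresource only once state transfer completes.

    Returns None if transfer does not complete within timeout_s, in which
    case the page keeps its fallback (e.g. a system font).
    """
    try:
        await asyncio.wait_for(transfer_done.wait(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return None
    return await fetch()
```

A caller would pass something like `ILLUSTRATIVE_TIMEOUT_S["fallback"]` as
`timeout_s` and set `transfer_done` from the transfer event handler.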

Last but not least, I am concerned that this is a bar that is
disproportionately difficult for non-AMP publishers to meet. The above
client-side mitigations are possible, but quite a bit of work.
  1. For frameworks like AMP that strongly encourage publishers to use a
rolling release, it is possible to upgrade many use-cases in the wild with
minimal publisher effort.
  2. For other frameworks, it may be possible to provide this support, but
the requirement to upgrade versions may limit applicability. Because
upgrades occur less often, they often require more work.
  3. For custom JS that manages state, authors will need to modify it to
move or copy state into indexedDB, and handle transfer events and merge
conflicts.

From a "minimal publisher effort" perspective, I am especially interested
in making it possible to have as close to a turnkey solution as is
reasonable, for instance at the level of the CDN or CMS. If such a solution
exists, sites would likely contain a hybrid of all three cases above, and
thus want to opt into such support incrementally. How should a CDN
determine which pages to provide Sec-CO responses for?
  1. URL patterns are the simplest answer, but I suspect hard for
publishers to create and maintain with minimal false negatives and
positives.
  2. Would it be helpful to thread an "I support state transfer" annotation
for some portion of the build journey, from individual JS function all the
way to bundle? Would folks use this?
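
For option 1, a CDN-side check might look like the sketch below. The
patterns and the `supports_state_transfer` function are hypothetical; they
only illustrate the include/exclude list a publisher would have to keep
accurate.

```python
from fnmatch import fnmatchcase
from typing import Sequence
from urllib.parse import urlsplit

# Hypothetical publisher-maintained patterns, matched against the URL path.
INCLUDE = ("/articles/*", "/blog/*")
EXCLUDE = ("/articles/live/*",)


def supports_state_transfer(
    url: str,
    include: Sequence[str] = INCLUDE,
    exclude: Sequence[str] = EXCLUDE,
) -> bool:
    """True if the CDN should provide Sec-CO responses for this URL.

    Exclusions win over inclusions, so a page known to lack state transfer
    support can be carved out of a broader pattern.
    """
    path = urlsplit(url).path
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False
    return any(fnmatchcase(path, pattern) for pattern in include)
```

Every page whose scripts change without a matching pattern update becomes
a false positive or false negative, which is the maintenance burden noted
in option 1.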

I don't have a good solution to the issue of non-AMP developer cost, and
thus fear this is a solution that would feature a disproportionate amount
of AMP pages compared to the background distribution of the web. But I'm
hopeful that other JS framework developers can chime in with respect to
feasibility of support for CBO and state transfer.

I'm leaving out some of Martin's replies to the above; no offense intended.
It took me long enough just to summarize the above, and I'd better send it
early enough so at least a few people have time to read it before the
meeting.

Thanks,
Devin

[1]
https://blog.amp.dev/2019/05/22/privacy-preserving-instant-loading-for-all-web-content/
[2]
http://www.seobythesea.com/2011/12/how-google-might-identify-transient-content-on-webpages/

On Mon, Mar 23, 2020 at 5:35 PM Martin Thomson <mt@lowentropy.net> wrote:

> Ted's note prompted me to send a much-belated announcement (sorry folks, I
> forgot).
>
> The draft is here:
> https://tools.ietf.org/html/draft-thomson-wpack-content-origin-00
>
> A nicer version here:
>
> https://martinthomson.github.io/wpack-content/draft-thomson-wpack-content-origin.html
>
> This approach could be a dramatically different approach to addressing the
> use cases set out in our charter.
>
> In short, this aims to address the core question of how offline content
> might *ultimately* be attributed to a web origin in a fundamentally
> different way.  There are two key concepts:
>
> 1. Content is given its own origin, using a new system for identification.
>
> 2. A target origin can "accept" content and state from one of these new
> origins.
>
> There are a lot of details here (read the draft), but the major advantage
> I see is that you don't have to make an offline decision about authority,
> and that means you can be offline for much longer (lifting the 7 day limit).
>
> What it does have in common with the signed exchanges approach is the need for
> a bundling format, but in its current form it is less dependent on the
> details of the format.  That might allow that to be simpler, but I'm sure
> that the need to mint new identifier types will more than make up for any
> slack there.
>
> The draft is quite rough.  I'm sure that it has the remnants of a few bad
> ideas still hanging around.  Ask questions if you think something is
> unclear.