Re: [Wpack] About content-based origins

Devin Mullins <twifkak@google.com> Fri, 03 April 2020 22:59 UTC

From: Devin Mullins <twifkak@google.com>
Date: Fri, 03 Apr 2020 15:59:00 -0700
Message-ID: <CANjwSik10FB4WJDcJvs6usV-Sf7cLkq82jShb+_U3ONMb8_ThQ@mail.gmail.com>
To: Martin Thomson <mt@lowentropy.net>
Cc: wpack@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/wpack/cSiRCHdj4CVtrnnExBJzXrB1lE0>

>
> I think that Ekr and I both separately mentioned it on the call last week,
> but it's worth writing this down somewhere (because I failed to do so thus
> far).  There is a variant of this design that uses signature public keys as
> the primary identifier and considers all content signed with the
> corresponding private key as the one co-extant origin.  That avoids most of
> this mess.  There are a few challenges, but as the liveness test
> (Sec-Signer-Origin maybe) remains, concerns about revocation are
> approximately the same as it pertains to protecting the target origin.
>

Ah, interesting. This could solve a problem I just learned about.
https://amp.dev/documentation/components/amp-access/#amp-reader-id is used
for access control (e.g. paywalls). It has different user IDs across
different (publisher, delivery, browser) tuples, so this would bring that
to parity with existing behavior.

For experiments, I think the origin would want a way to opt into sharing
some subset of its state with this co-extant origin before the Sec-CO
response arrives (e.g. for those middle-of-session pageviews). Otherwise, it
has to delay rendering or risk generating a session that should be excluded
from analysis. Similar for consent.

For analytics & origin-based ACLs, their libraries/services would need code
changes (either to defer requests until origin is correct, or to accept
some alternate attestation of origin). My guess is that, even outside of
AMP, these are usually rolling releases [1], and thus changes would be
largely achievable. Just a guess.
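
As a rough illustration of the "defer requests until origin is correct"
option, here is a minimal Python sketch. The names DeferredSender, hit, and
confirm_origin are hypothetical and not taken from any analytics library; the
sketch only shows the queue-then-flush idea.

```python
from typing import Callable, List, Optional


class DeferredSender:
    """Queues analytics hits until the page's origin is confirmed.

    Hypothetical helper for illustration only; `send` stands in for
    whatever transport a real analytics library would use.
    """

    def __init__(self, send: Callable[[str, str], None]) -> None:
        self._send = send
        self._origin: Optional[str] = None
        self._pending: List[str] = []

    def hit(self, payload: str) -> None:
        # Before confirmation, hold the hit instead of attributing it
        # to an origin that may turn out to be wrong.
        if self._origin is None:
            self._pending.append(payload)
        else:
            self._send(self._origin, payload)

    def confirm_origin(self, origin: str) -> None:
        # Once the origin is known, flush queued hits in arrival order.
        self._origin = origin
        pending, self._pending = self._pending, []
        for payload in pending:
            self._send(origin, payload)
```

The cost of this approach is that hits are lost if the page is closed before
the origin is confirmed, which is part of why the alternate-attestation
option may be preferable.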

This is of course coming from my document-centric view of the web, since
that's the use case I'm thinking most about.

[1] e.g.
https://developers.google.com/analytics/devguides/collection/analyticsjs

> The other question you raise is the difficulty of implementing this.
> I don't know to what extent state is used in frameworks and sites, so I
> don't have a good handle on this.


It occurred to me it may be possible for a bundling service to determine
whether a page uses state, with ~0 false negatives. Easy version: disallow
any feature that uses state, such as JS. Medium version: hardcode
allowances for certain common 3p scripts. Difficult version: analyze
trivial scripts (classifying "don't know" as "uses state").

This would allow easier implementation for at least some portion of the
long tail of the publisher distribution. For the torso, the publisher could
provide an override annotation for any page that the bundler classifies as
"uses state".
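
To make the easy and medium versions concrete, here is a minimal Python
sketch using only the standard library. Everything in it is an assumption for
illustration: the allowlist contents, the function names, and the idea that
the override arrives as a boolean. It also only looks at scripts and inline
event handlers, so a real bundler would need to cover other stateful features
before claiming ~0 false negatives.

```python
from html.parser import HTMLParser

# Hypothetical allowlist of third-party scripts the bundler knows how to
# handle (the "medium version"). The entry below is only an example.
ALLOWED_SCRIPT_SRCS = {"https://cdn.ampproject.org/v0.js"}


class _ScriptFinder(HTMLParser):
    """Collects external script URLs, inline scripts, and on* attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.script_srcs = []
        self.inline_scripts = 0
        self.event_handlers = 0

    def handle_starttag(self, tag, attrs):
        names = dict(attrs)
        if tag == "script":
            src = names.get("src")
            if src:
                self.script_srcs.append(src)
            else:
                self.inline_scripts += 1
        # Over-count on purpose: any attribute starting with "on" is
        # treated as a possible event handler.
        self.event_handlers += sum(1 for n in names if n.startswith("on"))


def uses_state(html: str, allow_known_scripts: bool = False) -> bool:
    """Conservative check: anything not understood counts as "uses state"."""
    finder = _ScriptFinder()
    finder.feed(html)
    finder.close()
    if finder.inline_scripts or finder.event_handlers:
        return True
    for src in finder.script_srcs:
        if not (allow_known_scripts and src in ALLOWED_SCRIPT_SRCS):
            return True
    return False


def may_bundle_as_stateless(html: str, publisher_override: bool = False) -> bool:
    """Publisher override applies only to pages classified as "uses state"."""
    return publisher_override or not uses_state(html, allow_known_scripts=True)
```

With allow_known_scripts=False this is the easy version (any script counts as
state); with True it is the medium version. The difficult version, analyzing
script contents, is not attempted here.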