Re: [Wpack] On double-hashing (was: Re: About content-based origins)
Devin Mullins <twifkak@google.com> Thu, 26 March 2020 00:16 UTC
MIME-Version: 1.0
References: <260dfc2f-8399-483e-859d-08f92821c823@www.fastmail.com> <CANjwSimZAkAC0JJBjUjZr4k0514QRqDxBReOkq_AGTeGJ2OTzQ@mail.gmail.com> <CANjwSiniWmO+pTfFOdxW9tasy_eQiUiGwWvTsWF2KGR8yGtXqA@mail.gmail.com>
In-Reply-To: <CANjwSiniWmO+pTfFOdxW9tasy_eQiUiGwWvTsWF2KGR8yGtXqA@mail.gmail.com>
From: Devin Mullins <twifkak@google.com>
Date: Wed, 25 Mar 2020 17:16:02 -0700
Message-ID: <CANjwSikzX2fBejL2VbMQqmNHgZn-tPHFKNOZt46UM7+snMjO0A@mail.gmail.com>
To: Martin Thomson <mt@lowentropy.net>
Cc: wpack@ietf.org
Content-Type: multipart/alternative; boundary="000000000000645ffa05a1b6e555"
Archived-At: <https://mailarchive.ietf.org/arch/msg/wpack/vMseijstxuFasTN2pQPfdb9i0TI>
Subject: Re: [Wpack] On double-hashing (was: Re: About content-based origins)

Augh, I forgot that 30x responses change the attested URL. Ignore my
comments about CNAME, I suppose.

On Wed, Mar 25, 2020 at 4:03 PM Devin Mullins <twifkak@google.com> wrote:

> In light of my comment about racing the Sec-CO request, a colleague
> pointed out a potential hole in the double-hash scheme. There may be
> flaws in the below; I haven't thought about it too much.
> <trite tone=apologetic>It may be worth more clearly defining the threat
> model.</trite>
>
> The publisher could delay responding to the Sec-CO request. In the
> interim, the page could send its own hash to
> //publisher.example/this-is-my-hash. Then the publisher would respond
> with the preimage sent by the bundle. This requires:
>
> A way for the bundle to get its own hash. Ideas:
>
>    1. document.location -- we could fix the ni scheme to be based on the
>    double-hash
>    2. a fetch of distributor.example/what-is-the-hash-of-referer
>    3. compute client-side based on document.body, in which the publisher
>    could also include an encoding of the headers
>    4. somebody computes the fixmanifold of a hash function -- is that a
>    thing? i can't math [1]
>
> A way for the publisher to associate the this-is-my-hash request with
> the Sec-CO request. Ideas:
>
>    1. IP+timestamp
>    2. a token as a query param to both requests
>
> This depends on the publisher server having an ability that most don't
> have, but it wouldn't be hard to purpose-build a server that only
> handled this-is-my-hash and Sec-CO requests. By segregating its workload
> from the other content requests, it would be easier to scale -- its need
> for session-stickiness or regional storage wouldn't affect latency of
> the edge-serving of normal content.
>
> Overall, this still requires more effort than responding `Sec-CO: yes`,
> but possibly less than a distributor/publisher backchannel. Also,
> possibly immune to IP privacy, depending on the above.
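
The delayed-response scenario quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes the double-hash scheme identifies content by H(H(content)) and that a Sec-CO response is accepted when it carries a preimage hashing to that identifier. SHA-256, the token value, and the names `RacingPublisher`, `this_is_my_hash`, `answer_sec_co`, and `ua_accepts` are all hypothetical and not taken from the draft.

```python
import hashlib
from typing import Dict, Optional


def h(data: bytes) -> bytes:
    """Hash function for the sketch (SHA-256 is an assumption)."""
    return hashlib.sha256(data).digest()


class RacingPublisher:
    """Hypothetical publisher that never saw the bundle itself.

    It learns the preimage from the page's own this-is-my-hash request
    and replays it in the delayed Sec-CO response.
    """

    def __init__(self) -> None:
        # token -> preimage; the token correlates the two requests
        self.leaked: Dict[str, bytes] = {}

    def this_is_my_hash(self, token: str, preimage: bytes) -> None:
        self.leaked[token] = preimage

    def answer_sec_co(self, token: str) -> Optional[bytes]:
        # A real server would hold the request open; here None means
        # "nothing to answer with yet".
        return self.leaked.get(token)


def ua_accepts(origin_id: bytes, response: Optional[bytes]) -> bool:
    """Assumed UA check: the response must hash to the content origin id."""
    return response is not None and h(response) == origin_id


bundle = b"<html>example bundle</html>"
preimage = h(bundle)       # what the page can compute or fetch about itself
origin_id = h(preimage)    # double hash used as the content-based origin

publisher = RacingPublisher()
token = "t-123"            # hypothetical query param on both requests

accepted_before = ua_accepts(origin_id, publisher.answer_sec_co(token))
publisher.this_is_my_hash(token, preimage)
accepted_after = ua_accepts(origin_id, publisher.answer_sec_co(token))
```

In this sketch the publisher can only answer once the page has leaked its own hash, which is why the quoted message turns to deferring on-origin requests as a possible mitigation.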
>
> I hope first that my above napkin sketch is wrong, and secondly that the
> fix for this isn't deferral of all requests until post-transfer. My
> guess is that would harm performance of the instant navigation use case.
>
> One possible mitigation is deferral of all on-origin requests. Then
> there needs to be a backchannel between the this-is-my-hash origin and
> the state transfer origin. This is maybe vaguely isomorphic to the
> distributor/publisher backchannel? Not really, since subdomains. The
> publisher could easily CNAME sec-co.publisher.example to
> sec-co.distributor.example.
>
> Actually, I wonder if CNAME is an issue regardless of everything above.
> https://tools.ietf.org/html/draft-thomson-wpack-content-origin-00#section-5.1
> states the UA must follow redirects. The publisher could easily encode a
> redirect from publisher.example/foo to sec-co.publisher.example/foo, and
> then CNAME to the distributor, and let the distributor respond with its
> known preimage.
>
> [1] "fixmanifold" being vaguely an extension of a fixpoint. A function
> f(x) s.t. for all x, hash(concat(x, f(x))) == f(x). Then the bundle
> could encode its own hash.
>
> On Tue, Mar 24, 2020 at 11:35 PM Devin Mullins <twifkak@google.com> wrote:
>
>> Hi Martin,
>>
>> Thanks for the engineering work on this. For others, I've already
>> reviewed this, had some chats with others on the AMP team, and had a few
>> back-and-forths with Martin (he's been very gracious in engaging his
>> time, thank you). I'm trying to estimate the feasibility of this, both
>> for the AMP use-case, and for the non-AMP use-case [1], which we hope to
>> see flourish to the extent that publishers and users desire it, and only
>> impeded by implementation costs where necessary for the common good.
>>
>> With the caveat that this is very early feedback, the high-level bit is
>> that, with some modifications, it seems mostly feasible for AMP. However,
>> there are two main downsides:
>>    1. worse UX on sites that run A/B experiments
>>    2. technical constraints make this harder for publishers to adopt on
>>    non-AMP, as compared to SXG
>>
>> A lot of details follow. I suggest renaming the subject if you reply to
>> a detail below, in order to reduce cross-talk and make browsing the
>> archive easier.
>>
>> Details:
>>
>> We would need the display URL to be the attested URL even before
>> transfer. I suspect a flash in the URL bar would be frustrating for both
>> users and publishers. Jeffrey had proposed using signatures for this. I'd
>> suppose UAs could choose to render this with a "may be stale" indicator.
>>
>> In terms of publisher implementation feasibility, I think the most
>> likely implementation would be for a publisher to produce a
>> Sec-Content-Origin (Sec-CO) response for the current version of the
>> resource iff it matches the Sec-CO request. I suspect that keeping state
>> of past versions of the resource would be infeasible, especially as it
>> couldn't be done purely at the edge; it would require something like a
>> region-wide cache. The value of the above stateless implementation would
>> be inversely proportional to how often the resource changes (e.g.
>> "transient content" [2] such as Related Articles). I think this would be
>> a workable constraint for publishers wishing to publish such bundles,
>> but felt it was worth noting, nonetheless.
>>
>> Re: fallback behavior on failed state transfer, distributors would need
>> a way to monitor failure rates. This serves two purposes:
>>    1. Automatically detecting errors in the distribution pipeline. This
>>    is the same purpose served by
>>    https://wicg.github.io/webpackage/loading.html#signed-exchange-report.
>>    2. Verifying that, for some fraction of navigations, bundles are
>>    meeting AMP's stated UX goals (by using the verified content in the
>>    bundle and not potentially arbitrary content after the redirect).
>>    This is a dynamic equivalent of what can be done mostly statically
>>    with SXG, because the distributor can run the same algorithm as the
>>    UA, modulo client/server skew in clocks and root stores. I think this
>>    is a trade-off AMP could make. The alternative is that the fallback
>>    behavior keeps the content-based origin (CBO).
>>
>> There is a possible vector for user ID transfer from distributor to
>> publisher; filed as
>> https://github.com/martinthomson/wpack-content/issues/1.
>>
>> This limits the ability to run session-sticky experiments (in
>> https://amp.dev/documentation/components/amp-experiment/ and many
>> non-AMP frameworks). The publisher has the following options:
>>    1. Delay rendering content under experiment until after state
>>    transfer. This hurts UX for the sake of experimentation, creating a
>>    trade-off that otherwise doesn't exist on origin (modulo ease of
>>    implementation).
>>    2. Generate a client-side UUID before state transfer, and join it
>>    with the pre-existing user ID after transfer. This means the session
>>    could not include CBO pageviews anywhere except in the first
>>    pageview.
>>    3. Generate variants of the bundle for each experimental state. The
>>    distributor chooses which bundle to send to the user (based on what?
>>    not sure). For a page with M independent experiments each with N
>>    arms, that's N^M variants. For large enough M and N, this is likely
>>    both infeasible for publishers and an instance of
>>    martinthomson/wpack-content#1. (But I need some guidance from experts
>>    on typical ranges for M and N.)
>>
>> Perhaps this is a need the UA could address. A straw proposal -- a
>> selectExperimentArm() API that:
>>    1. only exposes as many bits as the UA deems OK, and
>>    2. exposes different bits to different attested origins, so they
>>    can't be joined
>> On the one hand, perhaps this is too much scope for wpack. On the other,
>> it addresses a problem that is somewhat particular to CBOs.
>>
>> This impedes content management (e.g. dialogs for GDPR and CCPA),
>> compared to SXG. The publisher has a few choices:
>>    1. Delay rendering the dialog until after transfer. I think this
>>    would cause layout shift (https://web.dev/cls/).
>>    2. Render the dialog, and risk discovering that it's already been
>>    acknowledged. At that point:
>>       a. Hide it, causing layout shift.
>>       b. Leave it there. This is way outside my domain expertise, but
>>       ISTR there are some rules or at least best practices wrt when
>>       it's okay to re-show an opt-out dialog.
>> One of these options may be an okay trade-off; I'm not sure.
>>
>> Analytics providers (via
>> https://amp.dev/documentation/components/amp-analytics/ and many non-AMP
>> libraries) would have to do one of two things:
>>    1. Delay pingback until after transfer. This risks a lower fraction
>>    of successful pingbacks, as e.g. the user might close the tab before
>>    transfer.
>>    2. Generate a client-side UUID, send pingback before transfer, and
>>    send a follow-up after transfer to join with pre-existing user ID.
>>    This captures those otherwise lost pingbacks pseudonymously
>>    (anonymously?), though it requires server-side work on the pingback
>>    endpoint.
>>
>> For the above client-side mitigations, two feature requests seem
>> necessary:
>>    1. Store the transfer metadata (e.g. indexedDB renames by time)
>>    somewhere more permanent. For various reasons the event handler may
>>    not fire or finish, and it would be good to later scan for stragglers
>>    to merge or expunge.
>>    2. A DOM API to wait for transfer to complete.
>>
>> Some subresources are ACL'd by Origin header (e.g. paid fonts). The
>> likely mitigation is to defer loading until transfer, with a timeout
>> whose duration depends on the publisher's font-display preference.
>> (Martin suggests an optimization where the encrypted subresource is
>> loaded early, and only the decryption key after transfer.)
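
For a sense of scale on the N^M figure in the experiment discussion quoted above, here is a small sketch. The values of M and N are purely illustrative assumptions (the message itself asks for guidance on typical ranges), and `variant_count` and `enumerate_variants` are hypothetical helper names.

```python
from itertools import product
from typing import List, Tuple


def variant_count(m: int, n: int) -> int:
    """Number of bundle variants for m independent experiments, n arms each."""
    return n ** m


def enumerate_variants(m: int, n: int) -> List[Tuple[int, ...]]:
    """Every assignment of an arm index (0..n-1) to each of the m experiments."""
    return list(product(range(n), repeat=m))


# Illustrative values only; real publishers may differ widely.
small = enumerate_variants(3, 2)   # 3 experiments, 2 arms each
large = variant_count(10, 3)       # 10 experiments, 3 arms each
```

Even modest assumed values grow quickly: each additional experiment multiplies the number of bundles to generate by N, and the distributor's choice among them carries log2(N^M) bits, which is the concern raised in connection with martinthomson/wpack-content#1.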
>>
>> Last but not least, I am concerned that this is a bar that is
>> disproportionately difficult for non-AMP publishers to meet. The above
>> client-side mitigations are possible, but quite a bit of work.
>>    1. For frameworks like AMP that strongly encourage publishers to use
>>    a rolling release, it is possible to upgrade many use-cases in the
>>    wild with minimal publisher effort.
>>    2. For other frameworks, it may be possible to provide this support,
>>    but the requirement to upgrade versions may limit applicability.
>>    Because upgrades occur less often, they often require more work.
>>    3. For custom JS that manages state, authors will need to modify it
>>    to move or copy state into indexedDB, and handle transfer events and
>>    merge conflicts.
>>
>> From a "minimal publisher effort" perspective, I am especially
>> interested in making it possible to have as close to a turnkey solution
>> as is reasonable, for instance at the level of the CDN or CMS. If such a
>> solution exists, sites would likely contain a hybrid of all three cases
>> above, and thus want to opt into such support incrementally. How should
>> a CDN determine which pages to provide Sec-CO responses for?
>>    1. URL patterns are the simplest answer, but I suspect hard for
>>    publishers to create and maintain with minimal false negatives and
>>    positives.
>>    2. Would it be helpful to thread an "I support state transfer"
>>    annotation for some portion of the build journey, from individual JS
>>    function all the way to bundle? Would folks use this?
>>
>> I don't have a good solution to the issue of non-AMP developer cost, and
>> thus fear this is a solution that would feature a disproportionate
>> amount of AMP pages compared to the background distribution of the web.
>> But I'm hopeful that other JS framework developers can chime in with
>> respect to feasibility of support for CBO and state transfer.
>>
>> I'm leaving out some of Martin's replies to the above; no offense
>> intended. It took me long enough just to summarize the above, and I'd
>> better send it early enough so at least a few people have time to read
>> it before the meeting.
>>
>> Thanks,
>> Devin
>>
>> [1]
>> https://blog.amp.dev/2019/05/22/privacy-preserving-instant-loading-for-all-web-content/
>> [2]
>> http://www.seobythesea.com/2011/12/how-google-might-identify-transient-content-on-webpages/
>>
>> On Mon, Mar 23, 2020 at 5:35 PM Martin Thomson <mt@lowentropy.net> wrote:
>>
>>> Ted's note prompted me to send a much-belated announcement (sorry
>>> folks, I forgot).
>>>
>>> The draft is here:
>>> https://tools.ietf.org/html/draft-thomson-wpack-content-origin-00
>>>
>>> A nicer version here:
>>>
>>> https://martinthomson.github.io/wpack-content/draft-thomson-wpack-content-origin.html
>>>
>>> This could be a dramatically different approach to addressing the use
>>> cases set out in our charter.
>>>
>>> In short, this aims to address the core question of how offline content
>>> might *ultimately* be attributed to a web origin in a fundamentally
>>> different way. There are two key concepts:
>>>
>>> 1. Content is given its own origin, using a new system for
>>> identification.
>>>
>>> 2. A target origin can "accept" content and state from one of these new
>>> origins.
>>>
>>> There are a lot of details here (read the draft), but the major
>>> advantage I see is that you don't have to make an offline decision
>>> about authority, and that means you can be offline for much longer
>>> (lifting the 7 day limit).
>>>
>>> What it does have in common with the signed exchanges approach is the
>>> need for a bundling format, but in its current form it is less
>>> dependent on the details of the format. That might allow that to be
>>> simpler, but I'm sure that the need to mint new identifier types will
>>> more than make up for any slack there.
>>>
>>> The draft is quite rough. I'm sure that it has the remnants of a few
>>> bad ideas still hanging around. Ask questions if you think something is
>>> unclear.
>>>
>>> _______________________________________________
>>> Wpack mailing list
>>> Wpack@ietf.org
>>> https://www.ietf.org/mailman/listinfo/wpack
>>>
>>