Re: [Wpack] On double-hashing (was: Re: About content-based origins)

Martin Thomson <mt@lowentropy.net> Mon, 30 March 2020 04:28 UTC

Message-Id: <0ae3f1b1-7133-4d12-bf6c-a1ee2c257218@www.fastmail.com>
In-Reply-To: <CANjwSikybC7tnkWJVYCGcE=mc9ScM5oFBP5HWjwtd8+-e1EPFg@mail.gmail.com>
References: <260dfc2f-8399-483e-859d-08f92821c823@www.fastmail.com> <CANjwSimZAkAC0JJBjUjZr4k0514QRqDxBReOkq_AGTeGJ2OTzQ@mail.gmail.com> <CANjwSiniWmO+pTfFOdxW9tasy_eQiUiGwWvTsWF2KGR8yGtXqA@mail.gmail.com> <32395446-c14e-4bca-9c09-4804934c487b@www.fastmail.com> <CANjwSikybC7tnkWJVYCGcE=mc9ScM5oFBP5HWjwtd8+-e1EPFg@mail.gmail.com>
Date: Mon, 30 Mar 2020 15:28:11 +1100
From: Martin Thomson <mt@lowentropy.net>
To: Devin Mullins <twifkak@google.com>
Cc: wpack@ietf.org
Content-Type: text/plain
Archived-At: <https://mailarchive.ietf.org/arch/msg/wpack/zVsugw_2onXTfhOi0MKrc6ch4dE>
Subject: Re: [Wpack] On double-hashing (was: Re: About content-based origins)
Precedence: list

On Sat, Mar 28, 2020, at 07:06, Devin Mullins wrote:
> Ah, good to know. To clarify: Am I right in assuming my above proposed 
> attack is somewhere on the spectrum of 
> https://github.com/WICG/webpackage/pull/424 ? That is, it requires 
> publisher server-side work, but is a potential end-run around 
> mitigations to backchannel sharing such as per-origin IP obfuscation 
> and fingerprint entropy reduction. If so, then what motivates 
> https://github.com/martinthomson/wpack-content/issues/1 ?

This particular threat model is a really difficult one for me to get my head around properly, I have to admit.

I think that the key mechanism this assumes we need to look at is navigation.  What this appears to aim to protect against is the transfer of information from the linking site to the target of a link; in this case, from a distributor to a publisher.

Today, navigation carries information in essentially two obvious places: the URL and the Referer header (I think that's all).  Web packaging potentially adds another path for information to travel: namely, the content of the bundle.

Assuming that a distributor is given license by the target origin to generate bundles, the linking party can pack any state it chooses into a bundle[1].  That information is then available to the target origin via requests to the resources in the bundle, or via requests to state established by the bundle.  It seems fairly obvious that the amount of state that could be transferred this way is considerable.
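To make this channel concrete, here is a minimal Python sketch.  It models a bundle as a hypothetical mapping from resource paths to bytes (this is not the actual bundle format), assumes the distributor is licensed to build bundles for the target origin, and has it inject a per-user resource under a made-up path.  A script served from the target origin can then read the value back, and because the injected bytes differ per user, so does any digest computed over the bundle.

```python
import hashlib
import json

# Hypothetical model: a bundle is a dict of path -> bytes.
# This is NOT the real web bundle format; it only illustrates the data flow.
Bundle = dict[str, bytes]

def build_bundle(base: Bundle, user_state: dict) -> Bundle:
    """Distributor side: copy the publisher's resources and inject user state."""
    bundle = dict(base)
    # "/_state.json" is a hypothetical resource name chosen by the distributor.
    bundle["/_state.json"] = json.dumps(user_state, sort_keys=True).encode()
    return bundle

def read_state(bundle: Bundle) -> dict:
    """Target-origin side: a script in the bundle reads the injected state."""
    return json.loads(bundle["/_state.json"])

def bundle_digest(bundle: Bundle) -> bytes:
    """SHA-256 (32 bytes) over a deterministic serialization of the bundle."""
    h = hashlib.sha256()
    for path in sorted(bundle):
        h.update(path.encode() + b"\0")                # path, NUL-terminated
        h.update(len(bundle[path]).to_bytes(8, "big"))  # length prefix
        h.update(bundle[path])                         # resource body
    return h.digest()

base = {"/index.html": b"<h1>Article</h1>"}
alice = build_bundle(base, {"uid": "alice"})
bob = build_bundle(base, {"uid": "bob"})
print(read_state(alice))                           # {'uid': 'alice'}
print(bundle_digest(alice) != bundle_digest(bob))  # True
```

The same user always gets the same digest, because the serialization is deterministic, while any change to the injected state yields a different one.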

How the target origin receives this information follows similar lines to what you described in your email.  But in practice, it doesn't need to be that fancy.  If we are talking about hashes, the total amount of data involved isn't that much.  I don't think that a site would be particularly put off by the cost of transferring hashes to the sites that they link to: 32 bytes per user per target isn't much data to save or transfer.  You can then rely on post-facto linkage.  I'm sure that RTB (real-time bidding) systems transfer orders of magnitude more than this as part of their normal operation.
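The post-facto linkage can be sketched in a few lines of Python.  In this hypothetical setup the distributor records which user received which 32-byte SHA-256 digest and later hands that table to the publisher, which joins it against its own request log keyed by the same digest (for instance, a content-based origin derived from it).  The function names, log format, and the user and target counts in the size estimate are illustrative assumptions, not figures from any real deployment.

```python
import hashlib

DIGEST_LEN = 32  # SHA-256 output size in bytes

def digest(content: bytes) -> bytes:
    return hashlib.sha256(content).digest()

def record(table: dict[bytes, str], user: str, bundle_bytes: bytes) -> None:
    """Distributor side: remember which user received which bundle digest."""
    table[digest(bundle_bytes)] = user

def link(table: dict[bytes, str],
         request_log: list[tuple[bytes, str]]) -> list[tuple[str, str]]:
    """Publisher side: join its log (digest, event) against the shared table."""
    return [(table[d], event) for d, event in request_log if d in table]

def table_size_bytes(users: int, targets: int) -> int:
    """Raw storage for one digest per user per target, ignoring overhead."""
    return users * targets * DIGEST_LEN

table: dict[bytes, str] = {}
record(table, "alice", b"bundle-for-alice")
record(table, "bob", b"bundle-for-bob")

log = [(digest(b"bundle-for-bob"), "login:bob@example.com")]
print(link(table, log))                        # [('bob', 'login:bob@example.com')]
print(table_size_bytes(10_000_000, 50) / 1e9)  # 16.0 (GB)
```

Even with these made-up numbers (ten million users, fifty link targets), the raw table is on the order of gigabytes, which is modest next to routine ad-tech data flows.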

(As an aside, this isn't exactly equivalent to giving the distributor control over your private key unless you exercise the options in footnote [1], but it comes close.  In any case, I don't believe that there is any material difference between this option and having the distributor ask the publisher to sign bundles containing the data on a per-request basis, aside from perhaps having better performance characteristics at bundle creation time.)

Let's say that we had a system that was able to detect, and maybe block, information transfer in the URL or Referer header[2].  Is the contention that this system would be unable to detect this sort of information transfer when it is carried in general-purpose content?

Separately, your stated desire to provide diversification of content for the purposes of experimentation seems to be in direct tension with this privacy requirement.  I'm interested in knowing how you might choose to trade the two off; I don't have a reference point here, as we haven't spent a whole lot of time looking at this specific problem.

Cheers,
Martin

[1] There is a variant of this that requires only a tiny bit more information to travel between distributor and publisher, but allows the publisher to more closely audit what is being shared.

[2] I don't know if this is possible in general, but I remain open to being convinced otherwise.  The best defense I can think of for this sort of system requires no such detection.  Instead, you try to prevent individual targeting by confirming that others have the same content.  However, those designs are trivially vulnerable to Sybil attacks.
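The "confirm that others have the same content" defense, and why it is trivially vulnerable to Sybil attacks, can be illustrated with a toy Python check: content is accepted only if at least k distinct clients report the same digest.  The threshold k, the report interface, and the client names are all hypothetical.  Reporters are not authenticated here, and that weakness is exactly what a Sybil attacker exploits.

```python
from collections import defaultdict

class ConsensusCheck:
    """Toy 'others have the same content' check with unauthenticated reporters."""

    def __init__(self, k: int) -> None:
        self.k = k  # hypothetical minimum number of distinct reporters
        self.reporters: defaultdict[str, set[str]] = defaultdict(set)

    def report(self, client_id: str, digest: str) -> None:
        # A set means the same client reporting twice is counted once.
        self.reporters[digest].add(client_id)

    def accept(self, digest: str) -> bool:
        return len(self.reporters[digest]) >= self.k

check = ConsensusCheck(k=5)

# A digest targeted at one victim is seen by only that client: rejected.
check.report("victim", "d-victim")
print(check.accept("d-victim"))  # False

# Sybil attack: the attacker fabricates k-1 extra client identities.
for i in range(4):
    check.report(f"sybil-{i}", "d-victim")
print(check.accept("d-victim"))  # True
```

Deduplicating by client ID only helps if IDs are expensive to create; if an attacker can mint them freely, the threshold provides no protection against individual targeting.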

... The Webex calls we were on this week had a URL that was ~422 characters, most of it opaque.  Maybe that was critical information; I don't know.  But I am not sure it would be possible to tell whether it contained information designed to de-anonymize someone.  Similarly, with only a fifth of the characters, the (up to) 244 bits of random gunk in a Google Docs URL might all be used to locate the document, but I could imagine that far fewer bits would suffice if someone wanted to keep some spare.
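The point about spare bits in opaque URL components can be quantified.  A token of n characters drawn uniformly from an alphabet of size A carries at most n·log2(A) bits, while giving every person in a population of N a distinct identifier needs only ceil(log2 N) bits.  The Python sketch below assumes a hypothetical 44-character token over a 64-symbol (base64url-style) alphabet and a population of 8 billion; neither figure is taken from the Webex or Google Docs URLs above.

```python
import math

def capacity_bits(length: int, alphabet_size: int) -> float:
    """Upper bound on bits carried by `length` symbols from the alphabet."""
    return length * math.log2(alphabet_size)

def bits_to_identify(population: int) -> int:
    """Minimum bits to give every member of the population a distinct ID."""
    # (N - 1).bit_length() == ceil(log2 N) for N >= 1, using exact integers.
    return (population - 1).bit_length()

token_bits = capacity_bits(44, 64)        # hypothetical 44-char base64url token
needed = bits_to_identify(8_000_000_000)  # roughly everyone alive
print(token_bits, needed, token_bits - needed)  # 264.0 33 231.0
```

Under these assumptions, a single token has room to identify anyone on Earth with more than 200 bits left over for the token's legitimate purpose, so inspecting URLs alone is unlikely to reveal whether tracking bits are present.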
