Re: [Wpack] On double-hashing (was: Re: About content-based origins)

Martin Thomson <mt@lowentropy.net> Mon, 30 March 2020 04:28 UTC

Message-Id: <0ae3f1b1-7133-4d12-bf6c-a1ee2c257218@www.fastmail.com>
In-Reply-To: <CANjwSikybC7tnkWJVYCGcE=mc9ScM5oFBP5HWjwtd8+-e1EPFg@mail.gmail.com>
References: <260dfc2f-8399-483e-859d-08f92821c823@www.fastmail.com> <CANjwSimZAkAC0JJBjUjZr4k0514QRqDxBReOkq_AGTeGJ2OTzQ@mail.gmail.com> <CANjwSiniWmO+pTfFOdxW9tasy_eQiUiGwWvTsWF2KGR8yGtXqA@mail.gmail.com> <32395446-c14e-4bca-9c09-4804934c487b@www.fastmail.com> <CANjwSikybC7tnkWJVYCGcE=mc9ScM5oFBP5HWjwtd8+-e1EPFg@mail.gmail.com>
Date: Mon, 30 Mar 2020 15:28:11 +1100
From: Martin Thomson <mt@lowentropy.net>
To: Devin Mullins <twifkak@google.com>
Cc: wpack@ietf.org
Content-Type: text/plain
Archived-At: <https://mailarchive.ietf.org/arch/msg/wpack/zVsugw_2onXTfhOi0MKrc6ch4dE>
Subject: Re: [Wpack] On double-hashing (was: Re: About content-based origins)

On Sat, Mar 28, 2020, at 07:06, Devin Mullins wrote:
> Ah, good to know. To clarify: Am I right in assuming my above proposed 
> attack is somewhere on the spectrum of 
> https://github.com/WICG/webpackage/pull/424 ? That is, it requires 
> publisher server-side work, but is a potential end-run around 
> mitigations to backchannel sharing such as per-origin IP obfuscation 
> and fingerprint entropy reduction. If so, then what motivates 
> https://github.com/martinthomson/wpack-content/issues/1 ?

This particular threat model is a really difficult one for me to get my head around properly, I have to admit.

I think that the key mechanism we need to look at here is navigation.  What this appears to aim to protect against is the transfer of information from the linking site to the target of a link.  In this case, from a distributor to a publisher.

Today, navigation carries information in essentially two obvious places: the URL and the Referer header (I think that's all).  Web packaging potentially adds another path whereby information can travel: namely, the content of the bundle.

Assuming that a distributor is given license to generate bundles by the target origin, the linking party can pack any state it chooses into a bundle[1].  That information is then available to the target origin via requests to the resources in the bundle, or via requests to state established by the bundle.  It seems fairly obvious that the amount of state that is transferred could be considerable.

How the target origin receives this information follows similar lines to your email.  But in practice, it doesn't need to be that fancy.  If we are talking about hashes, the total amount of data involved isn't that much.  I don't think that a site would be particularly put off by the cost of transferring hashes to the sites that they link to.  32 bytes per user per target isn't that much data to save or transfer.  You can then rely on post-facto linkage.  I'm sure that RTB systems transfer orders of magnitude more than this as part of their normal operation.
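A rough sketch of the post-facto linkage described above, in Python.  Everything here is an assumption made for illustration: the function names, the way per-user state is packed into the bundle, and the idea that the publisher later learns the hash of the bundle a user loaded.  None of it is taken from any web packaging draft.

```python
import hashlib
import os

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes, matching the figure above


def build_bundle(content: bytes, per_user_state: bytes) -> bytes:
    """Hypothetical distributor step: append per-user state to the content."""
    return content + b"\n<!-- " + per_user_state.hex().encode() + b" -->"


def bundle_hash(bundle: bytes) -> bytes:
    """SHA-256 over the whole bundle; this is the 32-byte value that is logged."""
    return hashlib.sha256(bundle).digest()


def distribute(content: bytes, user_ids):
    """Distributor side: one distinct bundle per user, plus a hash -> user log."""
    bundles = {}
    log = {}
    for uid in user_ids:
        bundle = build_bundle(content, os.urandom(16))
        bundles[uid] = bundle
        log[bundle_hash(bundle)] = uid
    return bundles, log


def link(observed_hashes, distributor_log):
    """Publisher side: join the hashes it observed against the transferred log."""
    return {h: distributor_log[h] for h in observed_hashes if h in distributor_log}


def transfer_cost_bytes(users: int, targets: int) -> int:
    """Storage or transfer cost at one hash per user per target."""
    return HASH_LEN * users * targets
```

Under these assumptions, a million users across ten targets costs transfer_cost_bytes(1_000_000, 10) bytes, which is 320 MB in total.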

(As an aside, this isn't exactly equivalent to giving the distributor control over your private key unless you exercise the options in footnote [1], but it comes close.  In any case, I don't believe that there is any material difference between this option and having the distributor ask the publisher to sign bundles containing the data on a per-request basis, aside from perhaps having better performance characteristics at bundle creation time.)

Let's say that we had a system that was able to detect and maybe block information transfer in the URL or Referer[2].  Is the contention that this system would be unable to detect this sort of information transfer when it was carried in general-purpose content?

Separately, I'm curious about your stated desire to provide diversification of content for the purposes of experimentation here.  It seems to be in direct tension with this privacy requirement.  I'm interested in knowing how you might choose to trade the two off; I don't have any reference here as we haven't spent a whole lot of time looking at this specific problem.

Cheers,
Martin

[1] There is a variant of this that requires only a tiny bit more information to travel between distributor and publisher, but allows the publisher to more closely audit what is being shared.

[2] I don't know if this is possible in general, but I remain open to being convinced otherwise.  The best defense I can think of for this sort of system requires no such detection.  Instead, you try to prevent individual targeting by confirming that others have the same content. However, those designs are trivially vulnerable to sybil attacks.

... The Webex calls we were on this week had a URL that was ~422 characters, most of it opaque.  Maybe that was critical information, I don't know.  But I am not sure it would be possible to tell if this contained information that was designed to de-anonymize someone.  Similarly, with only a fifth of the characters, the (up to) 244 bits of random gunk in a Google docs URL might all be used for finding the document, but I could imagine that far fewer than this would be needed if someone wanted some bits spare.
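To put the "bits spare" point in numbers, here is a small back-of-the-envelope sketch in Python.  The 244-bit figure is the one quoted above; the population of 8 billion and the 128 bits assumed to be enough to keep a document unguessable are illustrative assumptions only, not claims about how any real service allocates its URLs.

```python
def bits_to_identify(population: int) -> int:
    """Smallest number of bits that can give each of `population` people a distinct value."""
    if population < 1:
        raise ValueError("population must be positive")
    # Integer form of ceil(log2(population)), avoiding floating-point rounding.
    return (population - 1).bit_length()


def spare_bits(total_bits: int, needed_bits: int) -> int:
    """Bits left in an opaque token after its legitimate purpose is served."""
    return max(total_bits - needed_bits, 0)


# Assumed figures for illustration.
needed_for_user = bits_to_identify(8_000_000_000)    # 33 bits
left_over = spare_bits(244, 128)                     # 116 bits
can_hide_identifier = left_over >= needed_for_user   # True
```

With these assumed numbers, a unique per-person identifier fits in the leftover bits several times over, which is why inspecting an opaque URL is unlikely to reveal whether it is being used that way.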