[Model-t] Web Tracking

Eric Rescorla <ekr@rtfm.com> Mon, 17 February 2020 17:32 UTC

From: Eric Rescorla <ekr@rtfm.com>
Date: Mon, 17 Feb 2020 09:31:45 -0800
Message-ID: <CABcZeBN-HNe-j2japnCT5HR49__mxR7jiFAJ4NO27CdpuvirXw@mail.gmail.com>
To: model-t@iab.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/model-t/qRU8naXGsWHPZgHutSdfkmwnkK0>
Subject: [Model-t] Web Tracking

And here is our text on Web Tracking:

One of the biggest threats to user privacy on the Web is ubiquitous
third party tracking. This takes advantage of HTTP Cookies [RFC6265]
in what is called a "third party context". The basic idea here is that
whenever a resource is loaded from a server, that server can include a
cookie which will be sent back to the server on future loads. This
includes situations where the resource is loaded as a "subresource" on
a page (e.g., an image, a piece of JavaScript, etc.). In addition,
those loads include a Referer header which contains the URL of the
top-level page that the subresource is being loaded from.
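
As a rough illustration of the cookie mechanism just described, here is
a minimal Python sketch of what a tracker's server might do on each
subresource load. The function name, the "uid" cookie name, the one-year
lifetime, and the example.* hosts are all assumptions made for
illustration; only the standard library's http.cookies module is used.

```python
import uuid
from http.cookies import SimpleCookie


def tracker_response_headers(request_headers):
    """Headers a hypothetical tracker returns for one subresource load."""
    jar = SimpleCookie(request_headers.get("Cookie", ""))
    if "uid" in jar:
        # Returning browser: the existing cookie already identifies it.
        return {}
    # New browser: mint a random identifier and ask the browser to store it.
    cookie = SimpleCookie()
    cookie["uid"] = uuid.uuid4().hex
    cookie["uid"]["max-age"] = 60 * 60 * 24 * 365  # one year, arbitrary
    cookie["uid"]["path"] = "/"
    return {"Set-Cookie": cookie["uid"].OutputString()}


# First load: the image is embedded on news.example, no cookie yet.
first = tracker_response_headers({"Referer": "https://news.example/"})
uid = SimpleCookie(first["Set-Cookie"])["uid"].value

# Later load from a different page: the browser returns the cookie, and
# the Referer header names the page doing the embedding.
second_request = {"Referer": "https://shop.example/", "Cookie": f"uid={uid}"}
second = tracker_response_headers(second_request)
print(second_request["Cookie"], "seen on", second_request["Referer"])
```

The second call returns no Set-Cookie header because the browser has
already presented the identifier issued on the first load.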

The combination of these features makes it fairly straightforward to
build a system which tracks the user across the Web. The way this
works is that the tracker convinces a number of content sites ("first
parties") to include a subresource from the tracker site.  Sometimes
this subresource also performs some other function such as displaying
an ad or providing analytics to the first party site, but sometimes it
is simply a tracker. Then, whenever the user visits one of those
content sites, the tracker receives the pair of (1) the Referer header
and (2) the cookie, which is the same for a given browser regardless
of which first party site the tracker is embedded on. Together these
allow the tracker to build up a picture of the user's browsing
history. This can then be used for various purposes, but is most
commonly used for ad targeting.
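
The tracker's side of this can be sketched in a few lines of Python.
This is only a toy model under stated assumptions: the Tracker class,
the cookie values, and the example.* URLs are invented for
illustration, and a real tracker would of course log far more than
this.

```python
from collections import defaultdict


class Tracker:
    """Toy tracker that groups observed Referer values by cookie value."""

    def __init__(self):
        # cookie identifier -> list of pages on which it was observed
        self.history = defaultdict(list)

    def on_subresource_load(self, cookie_id, referer):
        # Each load delivers the (cookie, Referer) pair described above.
        self.history[cookie_id].append(referer)


tracker = Tracker()
loads = [
    ("abc123", "https://news.example/politics"),
    ("abc123", "https://health.example/symptoms"),
    ("zzz999", "https://news.example/sports"),
    ("abc123", "https://shop.example/cart"),
]
for cookie_id, referer in loads:
    tracker.on_subresource_load(cookie_id, referer)

# Everything the browser holding cookie "abc123" visited, across sites.
for page in tracker.history["abc123"]:
    print(page)
```

Because the cookie value is constant across first party sites, a simple
group-by on it is enough to reconstruct a per-browser history.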

This capability itself constitutes a major threat to user privacy.
However, there are a number of practices which increase the threat:

* Cookie syncing: any given tracker may not be on all sites,
  which gives the tracker incomplete coverage. However, trackers
  often collude (a practice called "cookie syncing") to link the
  different cookies they have each set for the same browser.

* Identifier correlation: sometimes trackers will be embedded
  on a site which collects a user identifier (e.g., an e-mail
  address), in which case the site can pass that identifier to
  the tracker, which allows the tracker to tie it to the cookie.

* Fingerprinting: cookies are a form of explicit state, which allows
  browsers to block or erase them. However, it is also possible to use
  characteristics of the browser to track the user. For instance,
  features such as User-Agent string, plugin and font support, screen
  resolution, and timezone can yield a fingerprint that is sometimes
  unique to a single user [0] and which persists beyond cookie
  deletion. Even in cases where this fingerprint is not unique, the
  anonymity set may be sufficiently small that the fingerprint, when
  coupled with yet more data, yields a unique, per-user identifier.
  Fingerprinting of this type is more prevalent on platforms where
  the set of exposed features varies more from one installation to
  the next, such as desktops, where plugins are more commonly in
  use. Fingerprinting prevention is an active research area; see [1]
  for more information.
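
To make the fingerprinting bullet more concrete, the following Python
sketch hashes a handful of browser attributes into a fingerprint and
counts how many browsers in a small sample share each one (the size of
the anonymity set). The attribute names, the sample values, and the
choice of SHA-256 are assumptions for illustration, not a description
of any particular fingerprinting script.

```python
import hashlib
from collections import Counter


def fingerprint(attrs):
    """Hash a dict of browser attributes into a stable identifier."""
    # Sort keys so the same attributes always produce the same string.
    canonical = "|".join(f"{key}={attrs[key]}" for key in sorted(attrs))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


phone = {"user_agent": "ExamplePhone/1.0", "screen": "1080x2340",
         "timezone": "UTC-8", "fonts": "default"}
desktop_a = {"user_agent": "ExampleDesktop/1.0", "screen": "2560x1440",
             "timezone": "UTC-8", "fonts": "default,Fira Code"}
desktop_b = {"user_agent": "ExampleDesktop/1.0", "screen": "1920x1200",
             "timezone": "UTC+1", "fonts": "default,Garamond"}

# Two phones with identical settings plus two differently set up desktops.
browsers = [phone, dict(phone), desktop_a, desktop_b]
set_sizes = Counter(fingerprint(b) for b in browsers)


def anonymity_set_size(attrs):
    """Number of sampled browsers sharing this browser's fingerprint."""
    return set_sizes[fingerprint(attrs)]


print("phone:", anonymity_set_size(phone))
print("desktop_a:", anonymity_set_size(desktop_a))
```

In this toy sample the two phones are indistinguishable from each
other, while each desktop is alone in its anonymity set and so is
uniquely identified without any cookie.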


A number of browsers have started adding anti-tracking technologies.
This is a rapidly moving field and so it is difficult to characterize
here, but there are several basic ideas:

* Blocking any communication with known trackers.
* Identifying trackers and suppressing their ability to store
  and access cookies and other state.
* "Double keying", in which each third party load on different
  first party sites is treated as a different context, thereby
  isolating cookies and other state, e.g., TLS-layer information.
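
The effect of double keying on cookies can be sketched as follows.
This is a toy model, not how any particular browser implements it: the
CookieJar class, the ids_seen_by_tracker helper, and the example.*
hosts are all hypothetical. The only difference between the two jars
is whether state is keyed by the third party alone or by the (first
party, third party) pair.

```python
import itertools


class CookieJar:
    """Toy cookie store; `double_keyed` selects the partitioning policy."""

    def __init__(self, double_keyed):
        self.double_keyed = double_keyed
        self._store = {}

    def _key(self, first_party, third_party):
        # Double keying adds the top-level site to the lookup key.
        if self.double_keyed:
            return (first_party, third_party)
        return third_party

    def get(self, first_party, third_party):
        return self._store.get(self._key(first_party, third_party))

    def set(self, first_party, third_party, value):
        self._store[self._key(first_party, third_party)] = value


def ids_seen_by_tracker(jar, first_parties, tracker="tracker.example"):
    """Visit each first party in turn; return the ID the tracker sees."""
    next_id = itertools.count(1)
    seen = []
    for site in first_parties:
        uid = jar.get(site, tracker)
        if uid is None:
            # No cookie in this context, so the tracker issues a new one.
            uid = f"uid-{next(next_id)}"
            jar.set(site, tracker, uid)
        seen.append(uid)
    return seen


sites = ["news.example", "shop.example", "news.example"]
classic = ids_seen_by_tracker(CookieJar(double_keyed=False), sites)
keyed = ids_seen_by_tracker(CookieJar(double_keyed=True), sites)
print("unpartitioned:", classic)
print("double keyed: ", keyed)
```

With the unpartitioned jar the tracker sees one identifier on every
site; with double keying it sees a separate identifier per first party
and can no longer join visits across sites by cookie alone.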



[0] Gómez-Boix, Alejandro, Pierre Laperdrix, and Benoit Baudry.
    "Hiding in the crowd: an analysis of the effectiveness of browser
    fingerprinting at large scale." Proceedings of the 2018 World Wide
    Web Conference. 2018.

[1] https://amiunique.org