Privacy and IPv6 Addresses

 == Introduction ==

IPv6 was designed to improve upon IPv4 in many respects, and
mechanisms for address assignment were one such area for improvement.
In addition to static address assignment and DHCP, stateless
autoconfiguration was developed as a less intensive, fate-shared means
of performing address assignment. With stateless autoconfiguration,
routers advertise on-link prefixes and hosts generate their own
suffixes to complete their addresses. Over the years, many suffix
generation mechanisms have been defined:

-- Manual configuration 
-- Link-layer address-derived: 
   -- MAC address (RFC 1972/2464) 
   -- IPv4 address (many RFCs) 
   -- IPv4 address + port (RFC 4380) 
-- Random (RFC 3041/4941) 
-- Hash of public key (RFC 3972)

Deriving the suffix from a globally unique MAC address (RFC 2462) was
one of the earliest mechanisms developed. Although a number of privacy
problems associated with this approach were later identified, major
IPv6 compliance testing suites required (and still require)
implementations to support MAC-derived suffixes in order to be
approved as compliant. Implementations that fail to support
MAC-derived suffixes are therefore largely not eligible to receive the
benefits of compliance certification (e.g., use of the IPv6 logo,
eligibility for government contracts, etc.).
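
As a concrete illustration of the MAC-derived mechanism, the following
Python sketch builds a modified EUI-64 suffix from a 48-bit MAC address
and appends it to an advertised /64 prefix. The function names, the
example MAC address, and the documentation prefix 2001:db8::/64 are
assumptions made for illustration; they do not come from any
particular implementation.

```python
import ipaddress


def mac_to_interface_id(mac: str) -> int:
    """Return the 64-bit modified EUI-64 identifier for a 48-bit MAC."""
    octets = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    # Insert ff:fe between the OUI (first three octets) and the
    # device-specific part (last three octets).
    eui64 = bytearray(octets[:3] + b"\xff\xfe" + octets[3:])
    # Invert the universal/local bit (0x02 in the first octet).
    eui64[0] ^= 0x02
    return int.from_bytes(eui64, "big")


def slaac_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine an advertised /64 prefix with the MAC-derived suffix."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("this sketch assumes a /64 prefix")
    return net.network_address + mac_to_interface_id(mac)


# Hypothetical host with MAC 00:11:22:33:44:55 on 2001:db8::/64.
print(slaac_address("2001:db8::/64", "00:11:22:33:44:55"))
# prints 2001:db8::211:22ff:fe33:4455
```

Because the suffix depends only on the MAC address, the same 64 bits
reappear under every prefix the host attaches to.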

 == Problems with MAC-derived addresses ==

A number of privacy issues related to the MAC-derived mechanism were
identified after its standardization. MAC-derived suffixes are unique,
allowing users to be tracked as they move. As RFC 3041/4941 explain:

"The use of a non-changing interface identifier to form addresses is a
specific instance of the more general case where a constant identifier
is reused over an extended period of time and in multiple independent
activities. Anytime the same identifier is used in multiple contexts,
it becomes possible for that identifier to be used to correlate
seemingly unrelated activity. For example, a network sniffer placed
strategically on a link across which all traffic to/from a particular
host crosses could keep track of which destinations a node
communicated with and at what times. Such information can in some
cases be used to infer things, such as what hours an employee was
active, when someone is at home, etc. ... The use of a constant
identifier within an address is of special concern because addresses
are a fundamental requirement of communication and cannot easily be
hidden from eavesdroppers and other parties. Even when higher layers
encrypt their payloads, addresses in packet headers appear in the
clear. Consequently, if a mobile host (e.g., laptop) accessed the
network from several different locations, an eavesdropper might be
able to track the movement of that mobile host from place to place,
even if the upper layer payloads were encrypted."

RFC 3041 reflects some notion of what the threats against MAC-derived
suffixes could be like -- the network sniffer mentioned above, the
possibility for web sites (alone or together with other sites or
network-based entities) to correlate user activity across visits --
but it did not include any in-depth threat modeling. (Notably, work on
RFC 3041 took place around the time when Intel's inclusion of a
unique, retrievable serial number in its Pentium III processors was
creating some privacy controversy in the press and among regulators.)
Attackers were assumed not to be on the same LAN as the user, which
excludes certain kinds of adversaries (e.g., governments).

The tracking made possible by MAC-derived suffixes is of particular
concern because MAC addresses are much more permanent than, say, DHCP
leases. MAC addresses tend to last roughly the lifetime of a device's
network interface, allowing tracking on the order of years, compared
to days for DHCP.

The structure of the MAC address also makes it easier for an attacker
to identify an individual device than if the suffix were composed of
purely random bits, and it creates a potential security vulnerability
not present in IPv4 addresses. MAC addresses contain 24-bit
organizationally unique identifiers (OUIs) that identify vendors or
manufacturers. Because popular OUIs are well known and easy to find,
their presence in IPv6 address suffixes reduces the number of variable
bits in a MAC-derived address, making it easier for an attacker to
scan for individual addresses. An attacker with knowledge of
device-specific vulnerabilities could also use the presence of known
OUIs as a way to quickly scan for particular devices to exploit. These
problems surfaced only after the tracking issues associated with MAC
addresses had been recognized. Had they been identified during
the development of RFC 1972, using a hash of the MAC address instead
of the address itself to derive the suffix could have provided a
simple way to avoid these problems.
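
The effect of the OUI on address scanning can be sketched in a few
lines of Python. The code below assumes the modified EUI-64 layout
(OUI, then ff:fe, then 24 device-specific bits); the helper names and
the example address are hypothetical. A suffix that merely happens to
contain ff:fe in the same position would be misread by this check.

```python
import ipaddress


def recover_mac(address: str):
    """Return the MAC embedded in a modified EUI-64 suffix, or None."""
    iid = ipaddress.IPv6Address(address).packed[8:]
    # MAC-derived suffixes carry ff:fe in the middle two octets.
    if iid[3:5] != b"\xff\xfe":
        return None
    mac = bytearray(iid[:3] + iid[5:])
    mac[0] ^= 0x02  # undo the universal/local bit inversion
    return ":".join(f"{octet:02x}" for octet in mac)


def candidate_suffixes(known_ouis: int) -> int:
    """Suffixes to probe per prefix when only the OUIs are known.

    Each OUI fixes 24 bits and ff:fe fixes 16 more, which leaves
    24 device-specific bits per OUI.
    """
    return known_ouis * 2**24


mac = recover_mac("2001:db8::211:22ff:fe33:4455")
print(mac, "OUI:", mac[:8])

# One known OUI shrinks the per-prefix search space from 2**64
# suffixes to 2**24, a reduction by a factor of 2**40.
print(2**64 // candidate_suffixes(1) == 2**40)
```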

 == Addressing the problems ==

RFC 3041 (later obsoleted by RFC 4941) sought to address the problems
with user tracking by defining "temporary addresses" (commonly
referred to as "privacy addresses") for outbound connections.
Temporary addresses are meant to supplement the other suffixes that a
device might use, not to replace them. They are randomly generated and
change daily by default. The idea is for temporary addresses to be
used for things like web browsing while maintaining the ability to use
a "public" address when more address stability is required (e.g., in
DNS advertisements).
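
A minimal sketch of generating a randomized suffix is shown below. It
is a simplification: RFC 3041/4941 define their own generation
procedure, whereas this example just draws 64 random bits and clears
the universal/local bit so the result cannot be mistaken for a
globally unique MAC-derived identifier. The function names and the
prefix are assumptions for illustration.

```python
import ipaddress
import secrets

# The universal/local bit is 0x02 in the first octet of the suffix.
U_BIT = 0x02 << 56


def random_interface_id() -> int:
    """Draw a random 64-bit suffix with the universal/local bit cleared."""
    return secrets.randbits(64) & ~U_BIT


def temporary_address(prefix: str) -> ipaddress.IPv6Address:
    """Combine an advertised /64 prefix with a fresh random suffix."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("this sketch assumes a /64 prefix")
    return net.network_address + random_interface_id()


# Each call yields a different address under the same prefix; a host
# would regenerate it when the temporary address lifetime expires.
print(temporary_address("2001:db8::/64"))
```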

For a temporary address to provide protection against tracking and
re-identification, it cannot be mixed together with a public address
for the same device; otherwise, the temporary address could be
correlated to the persistent public address. This aspect has sometimes
been overlooked in applications' use of multiple addresses generated
in different ways.

RFC 3484 specifies that public addresses be used for outbound
connections unless an application explicitly prefers temporary
addresses. The default preference for public addresses was established
to avoid applications potentially failing due to the short lifetime of
temporary addresses or the possibility of a reverse look-up failure or
error. However, RFC 3484 allowed that "implementations for which
privacy considerations outweigh these application compatibility
concerns MAY reverse the sense of this rule and by default prefer
temporary addresses over public addresses." Some major implementations
(e.g., Windows) default to temporary addresses for outbound
connections, but the default preference against using temporary
addresses remains as the normative standard.
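
The public-versus-temporary choice can be modeled as a single
tie-breaking step. The sketch below covers only that one rule; actual
source address selection applies several other rules first, and the
Candidate type, the choose_source function, and the example addresses
are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    address: str
    temporary: bool  # True for a temporary ("privacy") address


def choose_source(candidates, prefer_temporary=False):
    """Pick a source address using only the public/temporary preference.

    The default mirrors the standard's preference for public addresses;
    prefer_temporary=True models an implementation or application that
    reverses the sense of the rule.
    """
    if not candidates:
        raise ValueError("no candidate addresses")
    # A candidate matching the preference sorts first (key False);
    # min() returns the earliest such candidate on ties.
    return min(candidates, key=lambda c: c.temporary != prefer_temporary)


addresses = [
    Candidate("2001:db8::211:22ff:fe33:4455", temporary=False),
    Candidate("2001:db8::5c1e:9a3b:77d0:1f42", temporary=True),
]
print(choose_source(addresses).address)
print(choose_source(addresses, prefer_temporary=True).address)
```

If no address of the preferred kind exists, the function falls back to
whatever is available rather than failing.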

The address-scanning problem can be mitigated by using
randomly-generated suffixes for all addresses, whether they are
public/persistent or temporary. These addresses can still be used for
tracking, but they remove the OUI-associated vulnerability and make
address scanning for IPv6 addresses no easier than for IPv4 addresses.
Windows uses random suffixes for all addresses, but many other
implementations do not.

 == Other IPv6 privacy issues ==

Since IPv6 subnets have unique prefixes, they reveal some information
about the location of the subnet, just as IPv4 addresses do. However,
routing IPv6 traffic at a location where IPv4 NAT is in use can reveal
far more locational granularity than IPv4 alone. Hiding this
information is one motivation for using NAT in IPv6 (see RFC 5902
section 2.4).

Teredo (RFC 4380) specifies a means to generate an IPv6 address from
the underlying IPv4 address and port, leaving many other bits set to
zero. This makes it relatively easy for an attacker to scan for IPv6
addresses by guessing the Teredo client's IPv4 address and port (which
for many NATs is not randomized). After this vulnerability was pointed
out in a number of security analyses, some implementations began
deviating from the standard by including 12 random bits in place of
zero bits. This modification was later standardized in RFC 5991.
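
The Teredo address layout can be sketched as follows. The field order
(32-bit prefix 2001:0000, server IPv4 address, 16 flag bits, port and
client IPv4 address each inverted bit by bit) follows RFC 4380; the
mask selecting the 12 randomized flag bits is an assumption based on
the flag layout in RFC 5991. The function name and the example server,
client, and port values are illustrative only.

```python
import ipaddress
import secrets

TEREDO_PREFIX = 0x20010000  # 2001:0000::/32
CONE_BIT = 0x8000           # high-order flag bit
RANDOM_FLAG_BITS = 0x3CFF   # assumed positions of the 12 random bits


def teredo_address(server, client, port, cone=False, randomize=False):
    """Build a Teredo address from the server, mapped address and port."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port out of range")
    server_int = int(ipaddress.IPv4Address(server))
    client_int = int(ipaddress.IPv4Address(client))
    flags = CONE_BIT if cone else 0
    if randomize:
        # Fill the otherwise-zero flag bits with random values.
        flags |= secrets.randbits(16) & RANDOM_FLAG_BITS
    value = (
        (TEREDO_PREFIX << 96)
        | (server_int << 64)
        | (flags << 48)
        | ((port ^ 0xFFFF) << 32)      # port is stored inverted
        | (client_int ^ 0xFFFFFFFF)    # client address is stored inverted
    )
    return ipaddress.IPv6Address(value)


# Without randomization every bit is determined by guessable inputs.
print(teredo_address("65.54.227.120", "192.0.2.45", 40000, cone=True))
# prints 2001:0:4136:e378:8000:63bf:3fff:fdd2

# With randomization a scanner must also guess 12 more bits.
print(teredo_address("65.54.227.120", "192.0.2.45", 40000, randomize=True))
```

Twelve random bits multiply the scanning effort by 4096 per guessed
address and port pair, which helps but is far from a full 64-bit
random suffix.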

 == Lessons learned ==

1. It is more difficult to retrofit privacy protection onto an
already-established standard than it is to build the protection in
initially. Once the MAC-derived suffix mechanism was standardized, it
was perceived to be required and therefore became part of compliance
suites, which continue to compel implementations to support it many
years after the associated vulnerabilities have been identified. Once
vulnerabilities are discovered, implementors that develop fixes may be
put in the position of choosing to deploy fixed-but-non-standard
implementations while they wait for standardized solutions to catch
up.

2. Even with the vulnerabilities identified, a comprehensive privacy
threat model was never developed (which seems to be a recurring theme
with older protocol development efforts). This may be one reason why
the same underlying vulnerabilities appear again and again (e.g.,
address scanning).

3. Notions of the implications of using particular identifiers that
are contemporaneous with standards development will inform how those
identifiers are used within standards work. In the late 1990s, when
IPv6 stateless autoconfiguration was being developed, notions of what
constituted "personally identifiable information" (PII) were limited
to identifiers such as name, address, and telephone number. If stable
identifiers like MAC addresses had been more widely considered to be
PII, or if the privacy implications of persistent re-use of stable
identifiers had been better understood, the temporary addressing
mechanism would have been more likely to have emerged sooner and with
a stronger normative default. This is particularly important given
current debates about whether it is even possible to continue to draw
strict lines between PII and non-PII.

4. Even when "private" identifiers are available, combining or
correlating them with persistent identifiers may often happen by
accident. Ensuring that identifier silos are maintained often needs to
happen outside of the standards process and can be difficult to
enforce.

5. Implementability/usability can trump privacy. Temporary addresses
are not recommended by default because of the risks that they pose to
applications that rely on stable addresses and/or reverse look-up.