Privacy and IPv6 Addresses

== Introduction ==

IPv6 was designed to improve upon IPv4 in many respects, and address assignment was one such area. In addition to static address assignment and DHCP, stateless autoconfiguration was developed as a less intensive, fate-shared means of performing address assignment. With stateless autoconfiguration, routers advertise on-link prefixes and hosts generate their own suffixes to complete their addresses. Over the years, many suffix generation mechanisms have been defined:

-- Manual configuration
-- Link-layer address-derived:
   -- MAC address (RFC 1972/2464)
-- IPv4 address (many RFCs)
-- IPv4 address + port (RFC 4380)
-- Random (RFC 3041/4941)
-- Hash of public key (RFC 3972)

Deriving the suffix from a globally unique MAC address (RFC 2462) was one of the earliest mechanisms developed. Although a number of privacy problems associated with this approach were later identified, major IPv6 compliance testing suites required (and still require) implementations to support MAC-derived suffixes in order to be approved as compliant. Implementations that fail to support MAC-derived suffixes are therefore largely ineligible for the benefits of compliance certification (e.g., use of the IPv6 logo, eligibility for government contracts, etc.).

== Problems with MAC-derived addresses ==

A number of privacy issues related to the MAC-derived mechanism were identified after its standardization. MAC-derived suffixes are globally unique and do not change when the prefix does, allowing users to be tracked as they move between networks. As RFC 3041/4941 explain:

"The use of a non-changing interface identifier to form addresses is a specific instance of the more general case where a constant identifier is reused over an extended period of time and in multiple independent activities. Anytime the same identifier is used in multiple contexts, it becomes possible for that identifier to be used to correlate seemingly unrelated activity.
For example, a network sniffer placed strategically on a link across which all traffic to/from a particular host crosses could keep track of which destinations a node communicated with and at what times. Such information can in some cases be used to infer things, such as what hours an employee was active, when someone is at home, etc. ... The use of a constant identifier within an address is of special concern because addresses are a fundamental requirement of communication and cannot easily be hidden from eavesdroppers and other parties. Even when higher layers encrypt their payloads, addresses in packet headers appear in the clear. Consequently, if a mobile host (e.g., laptop) accessed the network from several different locations, an eavesdropper might be able to track the movement of that mobile host from place to place, even if the upper layer payloads were encrypted."

RFC 3041 reflects some notion of the threats that MAC-derived suffixes pose (the network sniffer mentioned above, and the possibility for web sites, alone or together with other sites or network-based entities, to correlate user activity across visits), but it did not include any in-depth threat modeling. (Notably, work on RFC 3041 took place around the time when Intel's inclusion of a unique, retrievable serial number in its Pentium III processors was creating some privacy controversy in the press and among regulators.) Attackers were assumed not to be on the same LAN as the user, which leaves certain kinds of adversaries (e.g., governments) out of consideration.

The tracking made possible by MAC-derived suffixes is of particular concern because MAC addresses are much more permanent than, say, DHCP leases. MAC addresses tend to last roughly the lifetime of a device's network interface, allowing tracking on the order of years, compared to days for DHCP.
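The tracking property described above follows directly from how the suffix is built. The following is a minimal Python sketch of the modified EUI-64 derivation (insert ff:fe between the two halves of the 48-bit MAC address and invert the universal/local bit), not a reference implementation. The function names are hypothetical, the MAC address and prefixes are taken from documentation ranges purely for illustration, and a /64 on-link prefix is assumed.

```python
import ipaddress
from typing import Optional


def eui64_interface_id(mac: str) -> bytes:
    """Build the 64-bit modified EUI-64 suffix from a 48-bit MAC address."""
    octets = bytes(int(part, 16) for part in mac.replace("-", ":").split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    # Invert the universal/local bit (0x02 of the first octet) and
    # insert ff:fe between the vendor half and the device half.
    return bytes([octets[0] ^ 0x02]) + octets[1:3] + b"\xff\xfe" + octets[3:]


def slaac_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine an advertised /64 prefix with the MAC-derived suffix."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("this sketch assumes a /64 on-link prefix")
    return net[int.from_bytes(eui64_interface_id(mac), "big")]


def mac_from_address(address: str) -> Optional[str]:
    """Recover the MAC address from a suffix that looks MAC-derived.

    This is a heuristic: a randomly generated suffix could contain
    ff:fe in the same position by coincidence.
    """
    iid = ipaddress.IPv6Address(address).packed[8:]
    if iid[3:5] != b"\xff\xfe":
        return None
    mac = bytes([iid[0] ^ 0x02]) + iid[1:3] + iid[5:]
    return ":".join(f"{octet:02x}" for octet in mac)


if __name__ == "__main__":
    mac = "00:00:5e:00:53:01"  # illustrative MAC from a documentation range
    home = slaac_address("2001:db8:1::/64", mac)
    cafe = slaac_address("2001:db8:2::/64", mac)
    # The prefix differs per network, but the low 64 bits are identical.
    print(home, cafe, home.packed[8:] == cafe.packed[8:])
    # Any observer of the address can recover the original MAC.
    print(mac_from_address(str(cafe)))
```

Because the prefix is the only part that changes from network to network, any party that sees both addresses can link them through the shared low 64 bits, and can also read the first three octets of the recovered MAC address as the vendor identifier.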
The structure of the MAC address also makes it easier for an attacker to identify an individual device than if the suffix were composed of purely random bits, and it creates a potential security vulnerability not present in IPv4 addresses. MAC addresses contain 24-bit organizationally unique identifiers (OUIs) that identify vendors or manufacturers. Because popular OUIs are well known and easy to find, their presence in IPv6 address suffixes reduces the number of variable bits in a MAC-derived address to the 24 device-specific bits of the MAC address, making it easier for an attacker to scan for individual addresses. An attacker with knowledge of device-specific vulnerabilities could also use the presence of known OUIs as a way to quickly scan for particular devices to exploit.

These problems surfaced only after the tracking issues associated with MAC addresses had been recognized. Had they been identified during the development of RFC 1972, using a hash of the MAC address instead of the address itself to derive the suffix could have provided a simple way to avoid these problems.

== Addressing the problems ==

RFC 3041 (later obsoleted by RFC 4941) sought to address the problems with user tracking by defining "temporary addresses" (commonly referred to as "privacy addresses") for outbound connections. Temporary addresses are meant to supplement the other suffixes that a device might use, not to replace them. They are randomly generated and change daily by default. The idea is for temporary addresses to be used for things like web browsing while maintaining the ability to use a "public" address when more address stability is required (e.g., in DNS advertisements).

For a temporary address to provide protection against tracking and re-identification, it cannot be mixed together with a public address for the same device; otherwise the temporary address could be correlated with the persistent public address.
This aspect has sometimes been overlooked in applications' use of multiple addresses generated in different ways.

RFC 3484 specifies that public addresses be used for outbound connections unless an application explicitly prefers temporary addresses. The default preference for public addresses was established to avoid applications potentially failing due to the short lifetime of temporary addresses or the possibility of a reverse look-up failure or error. However, RFC 3484 allowed that "implementations for which privacy considerations outweigh these application compatibility concerns MAY reverse the sense of this rule and by default prefer temporary addresses over public addresses." Some major implementations (e.g., Windows) default to temporary addresses for outbound connections, but the default preference against using temporary addresses remains the normative standard.

The address-scanning problem can be mitigated by using randomly-generated suffixes for all addresses, whether they are public/persistent or temporary. These addresses can still be used for tracking, but they remove the OUI-associated vulnerability and make address scanning for IPv6 addresses no easier than for IPv4 addresses. Windows uses random suffixes for all addresses, but many other implementations do not.

== Other IPv6 privacy issues ==

Since IPv6 subnets have unique prefixes, they reveal some information about the location of the subnet, just as IPv4 addresses do. However, routing IPv6 traffic at a location where IPv4 NAT is in use can reveal far more locational granularity than IPv4 alone. Hiding this information is one motivation for using NAT in IPv6 (see RFC 5902 section 2.4).

Teredo (RFC 4380) specifies a means to generate an IPv6 address from the underlying IPv4 address and port, leaving many other bits set to zero. This makes it relatively easy for an attacker to scan for IPv6 addresses by guessing the Teredo client's IPv4 address and port (which for many NATs is not randomized).
After this vulnerability was pointed out in a number of security analyses, some implementations began deviating from the standard by including 12 random bits in place of zero bits. This modification was later standardized in RFC 5991.

== Lessons learned ==

1. It is more difficult to retrofit privacy protection onto an already-established standard than it is to build the protection in initially. Once the MAC-derived suffix mechanism was standardized, it was perceived to be required and therefore became part of compliance suites, which continue to compel implementations to support it many years after the associated vulnerabilities were identified. Once vulnerabilities are discovered, implementors that develop fixes may be put in the position of choosing to deploy fixed-but-non-standard implementations while they wait for standardized solutions to catch up.

2. Even with the vulnerabilities identified, a comprehensive privacy threat model was never developed (which seems to be a recurring theme with older protocol development efforts). This may be one reason why the same underlying vulnerabilities appear again and again (e.g., address scanning).

3. The notions about the implications of particular identifiers that prevail while a standard is being developed will inform how those identifiers are used within standards work. In the late 1990s, when IPv6 stateless autoconfiguration was being developed, notions of what constituted "personally identifiable information" (PII) were limited to identifiers such as name, address, and telephone number. If stable identifiers like MAC addresses had been more widely considered to be PII, or if the privacy implications of persistent re-use of stable identifiers had been better understood, the temporary addressing mechanism would likely have emerged sooner and with a stronger normative default.
This is particularly important given current debates about whether it is even possible to continue to draw strict lines between PII and non-PII.

4. Even when "private" identifiers are available, combining or correlating them with persistent identifiers may often happen by accident. Ensuring that identifier silos are maintained often needs to happen outside of the standards process and can be difficult to enforce.

5. Implementability/usability can trump privacy. Temporary addresses are not recommended by default because of the risks that they pose to applications that rely on stable addresses and/or reverse look-up.