Re: [ietf-privacy] [Int-area] Privacy thoughts re draft-boucadair-intarea-nat-reveal-analysis

Alissa Cooper <acooper@cdt.org> Thu, 08 December 2011 16:43 UTC

From: Alissa Cooper <acooper@cdt.org>
In-Reply-To: <6.2.5.6.2.20111208052110.0afbab10@resistor.net>
Date: Thu, 08 Dec 2011 16:43:00 +0000
Message-Id: <A29AC308-31B6-4194-A4D4-A7EF4582CFC3@cdt.org>
References: <6D1019B7-4A38-48A0-917F-735BC63132ED@cdt.org> <6.2.5.6.2.20111208052110.0afbab10@resistor.net>
To: SM <sm@resistor.net>
Cc: ietf-privacy@ietf.org
Subject: Re: [ietf-privacy] [Int-area] Privacy thoughts re draft-boucadair-intarea-nat-reveal-analysis

Hi SM,

Copied my original message below for those on ietf-privacy who are not also subscribed to int-area.

On Dec 8, 2011, at 2:03 PM, SM wrote:

> Hi Alissa,
> At 04:10 08-12-2011, Alissa Cooper wrote:
>> Along these lines, in the sections below I've either suggested new text entirely (1.2) or inserted notes/questions in brackets where I think more text would be helpful.
> 
> I am commenting on the privacy angle.
> 
>> While many individual users are unaware of and uninvolved in decisions about whether their unique IPv4 addresses get revealed when they send data via IP, some users realize privacy benefits associated with IP address sharing, and some may even take steps to ensure that NAT functionality sits between them and the public Internet. IP address sharing makes the actions of all users behind the NAT function unattributable to any single host, creating room for abuse but also providing some identity protection for non-abusive users who wish to transmit data with reduced risk of being uniquely identified.
> 
> The above text implies that there are privacy benefits associated with IP address sharing.  That is somewhat like arguing that NAT provides security; it's an argument that has been going around for years.

If an abusive user can hide the traceability of his behavior by sitting behind a NAT (which, as far as I can tell, is a major part of the motivation for draft-boucadair-intarea-nat-reveal-analysis), I don't see how one could argue that NAT does not provide any privacy protection. No matter how old the argument is, it is clearly worth stating in this document.

> 
> The text uses terms in common use in non-technical literature, e.g. (user) identity protection.  A device identifier such as an IPv4 address is necessary for two endpoints to engage in IPv4 communication.  The question of whether the device is tied to a user is out of scope.  The term "identity protection" in the above text makes it in scope.  It's a path that can cause problems for future work.  I suggest rephrasing the above in terms of identifiers instead of (user) identity.
> 

Feel free to suggest changes to the authors; I was just throwing out some hastily prepared starting points.

Best,
Alissa

> Regards,
> -sm 
> 


Original message:

> From: Alissa Cooper <acooper@cdt.org>
> Date: December 8, 2011 12:10:46 PM GMT
> To: int-area@ietf.org
> Subject: Privacy thoughts re draft-boucadair-intarea-nat-reveal-analysis
> 
> I spent some time reviewing draft-boucadair-intarea-nat-reveal-analysis-04 today. These are my preliminary thoughts about the document with respect to privacy. Note that I'm not a network layer expert so in some cases I raise questions to which there may be obvious answers.
> 
> The primary way that I think the text could be improved would be to make it more specific about 4 properties of the various solutions:
> 
> 1) Which identifiers are candidates for being included in the HOST_ID. 
> Looking just at the first 5 solutions discussed in the text (since the last two are only given a cursory treatment anyway), there are at least 8 different kinds of identifiers mentioned:
> 
> full IPv4 address (IP Option, XFF, Proxy Protocol)
> IPv6 prefix (IP Option, XFF)
> any unique 16-bit value (IP-ID)
> lower 16 bits of IPv4 address (TCP Option)
> VLAN ID (TCP Option)
> VRF ID (TCP Option)
> subscriber ID (TCP Option)
> INET + IPv4 address + TCP source port + TCP dest port (Proxy Protocol)
> 
> I realize that the general goal of all of these solutions -- to disambiguate hosts behind the same public IP -- is the same, but the implications of using these different identifiers are not always the same (more on that in my suggested text below). I also realize that the selection of which identifier to use may be carrier-specific or implementation-specific. Nonetheless, I think it would be helpful to be as precise as possible when discussing which identifiers are candidates for being included in each solution proposal.
> 
> 2) Uniqueness of identifiers in HOST_ID
> The document and other documents that it references talk in various places about disambiguation, uniqueness, and global uniqueness, but there is no consistent statement about each solution proposal as to whether the proposal may/should/must/will support identifiers at a certain level of uniqueness. It would be helpful to state this explicitly. I've made some suggestions about this in my suggested text below. Also, if it's possible it would be good to include a recommendation that HOST_IDs be limited to providing local uniqueness rather than global uniqueness where implementers have a choice.
> 
> 3) Refresh rate of HOST_ID
> The reference to the volatility of HOST_ID information in 1.2 is good, but again I would suggest adding solution-specific text about this where each solution is discussed.
> 
> 4) Interactions between multiple solutions
> Section 3.2.2 makes brief mention of interference when multiple solutions are used, but such interaction also has privacy implications (e.g., if a TCP option exposes subscriber ID and XFF exposes IPv4 address). To the extent that combinations like this are being envisioned, they need a more thorough treatment.
> 
> Along these lines, in the sections below I've either suggested new text entirely (1.2) or inserted notes/questions in brackets where I think more text would be helpful.
> 
> 1.2.  HOST_ID and Privacy
> 
> IP address sharing is motivated by a number of different factors. For years, many network operators have conserved the use of public IPv4 addresses by making use of customer premises equipment (CPE) that assigns a single public IPv4 address to all hosts within the customer's local area network and uses NAT to translate between locally unique private IPv4 addresses and the CPE's public address. With the exhaustion of IPv4 address space, address sharing between customers on a much larger scale is likely to become much more prevalent.
> 
> While many individual users are unaware of and uninvolved in decisions about whether their unique IPv4 addresses get revealed when they send data via IP, some users realize privacy benefits associated with IP address sharing, and some may even take steps to ensure that NAT functionality sits between them and the public Internet. IP address sharing makes the actions of all users behind the NAT function unattributable to any single host, creating room for abuse but also providing some identity protection for non-abusive users who wish to transmit data with reduced risk of being uniquely identified.
> 
> The proposals considered in this document add a measure of uniqueness back to hosts that share a public IPv4 address. The extent of that uniqueness depends on which information is included in the HOST_ID and is discussed in each solution proposal section. 
> 
> Similarly, the volatility of the HOST_ID information depends on the particular solution proposal, and in some cases, the particular implementation. In some cases the HOST_ID may be recycled when the host reboots or obtains a new internal IP address, while in other cases the HOST_ID may be persistent. As with persistent IP addresses, persistent HOST_IDs facilitate user tracking over time. 
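To make the two volatility behaviors concrete, here is a minimal Python sketch of a hypothetical address sharing function that hands out 16-bit HOST_IDs. The class `HostIdAllocator`, its `persistent` flag, and the idea of keying hosts on a stable device label (such as a MAC address) are illustrative assumptions, not taken from any of the drafts.

```python
import ipaddress
import secrets


class HostIdAllocator:
    """Hypothetical allocator of 16-bit HOST_IDs for hosts sharing one public IPv4 address.

    persistent=True  -> a device keeps its HOST_ID across internal address changes
                        and reboots, which makes it trackable over time.
    persistent=False -> the HOST_ID is recycled when the device obtains a new
                        internal address or is released (e.g. on reboot).
    """

    def __init__(self, persistent: bool) -> None:
        self.persistent = persistent
        # device label -> (HOST_ID, current internal IPv4 address)
        self._by_device: dict[str, tuple[int, str]] = {}
        self._in_use: set[int] = set()

    def _draw(self) -> int:
        """Pick a random 16-bit value not currently held by another device."""
        if len(self._in_use) >= 2**16:
            raise RuntimeError("more than 2^16 devices share this address")
        while True:
            value = secrets.randbelow(2**16)
            if value not in self._in_use:
                self._in_use.add(value)
                return value

    def host_id(self, device: str, internal_addr: str) -> int:
        """Return the HOST_ID to attach to traffic from `device` at `internal_addr`."""
        ipaddress.IPv4Address(internal_addr)  # reject malformed addresses
        current = self._by_device.get(device)
        if current is not None:
            hid, addr = current
            if self.persistent or addr == internal_addr:
                self._by_device[device] = (hid, internal_addr)
                return hid
            self._in_use.discard(hid)  # new internal address: recycle the old HOST_ID
        hid = self._draw()
        self._by_device[device] = (hid, internal_addr)
        return hid

    def release(self, device: str) -> None:
        """Model a reboot: a non-persistent allocator forgets the device's HOST_ID."""
        if not self.persistent and device in self._by_device:
            hid, _ = self._by_device.pop(device)
            self._in_use.discard(hid)
```

With `persistent=False`, a server that sees the same HOST_ID twice can only link sessions from one device while it keeps the same internal address and does not reboot; with `persistent=True` it can link them indefinitely, which is the tracking concern raised above.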
> 
> As a general matter, the HOST_ID proposals do not seek to make hosts any more identifiable than they would be if they were using a public, non-shared IP address. However, depending on the solution proposal, the addition of HOST_ID information may allow a device to be fingerprinted more easily than it otherwise would be. Should multiple solutions be combined (e.g., TCP Option and XFF) that include different pieces of information in the HOST_ID, fingerprinting may become even easier. 
> 
> The trust placed in the information conveyed in the HOST_ID is likely to be the same as for current practices with source IP addresses. In that sense, a HOST_ID can be spoofed, just as an IP address can be. [Note: Is this statement really true for HOST_ID solutions that rely on something other than IP address, e.g., subscriber ID? What about when SAVI is in use? Also, what are the implications of spoofing for return reachability? It seems that if spoofing is being put forth as some sort of user-enabled protection mechanism, the actual implications of spoofing require further discussion.] Furthermore, users of network-based anonymity services (like Tor) may be capable of stripping HOST_ID information before it reaches its destination.
> 
> [Is it envisioned that the HOST_ID solutions will be used by mobile operators? If so, there is probably a bit more to be said here about a mobile device maintaining its HOST_ID even if its public IP changes.]
> 
> 3.1.  Define an IP Option
> 
> 3.1.1.  Description
> 
>  This proposal aims to define an IP option [RFC0791] to convey a "host
>  identifier".  This identifier can be inserted by the address sharing
>  function to uniquely distinguish a host among those sharing the same
>  IP address.  The option can convey an IPv4 address, the prefix part
>  of an IPv6 address, etc.
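The quoted description leaves the encoding open. As one hedged illustration, the sketch below packs either a full IPv4 address or an IPv6 prefix into an RFC 791-style type-length-value option, padded so the options area ends on a 32-bit boundary (the IPv4 header length is counted in 32-bit words). The option type value (`HOST_ID_OPT_TYPE = 30`) and the family/prefix-length layout of the data field are assumptions for illustration only; the draft does not define them.

```python
import ipaddress

HOST_ID_OPT_TYPE = 30  # hypothetical option type; the draft assigns no value
EOOL = 0               # End of Option List (RFC 791), used here as padding


def encode_host_id_option(host_id: str) -> bytes:
    """Encode an IPv4 address or IPv6 prefix as a padded IPv4 option.

    Assumed data layout: 1 byte address family (4 or 6), 1 byte prefix
    length, then only the significant bytes of the prefix.
    """
    net = ipaddress.ip_network(host_id, strict=False)  # bare address -> /32 or /128
    nbytes = (net.prefixlen + 7) // 8
    prefix = net.network_address.packed[:nbytes]
    data = bytes([net.version, net.prefixlen]) + prefix
    # Option length counts the type and length octets themselves.
    option = bytes([HOST_ID_OPT_TYPE, 2 + len(data)]) + data
    padding = (-len(option)) % 4
    return option + bytes([EOOL]) * padding


print(encode_host_id_option("192.0.2.33").hex())
print(encode_host_id_option("2001:db8:1234::/48").hex())
```

Carrying only the prefix bytes keeps an IPv6 HOST_ID at the granularity the operator chooses, which bears on the uniqueness question raised in the note below.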
> 
> [This seems pretty unspecific. What are all the identifiers that are/could be used here?]
> 
> 3.1.2.  Analysis
> 
>  Unlike the solution presented in Section 3.2, this proposal can apply
>  to any transport protocol.  Nevertheless, it is widely known that
>  routers (and other middle boxes) filter IP options.  IP packets with
>  IP options can be dropped by some IP nodes.  Previous studies
>  demonstrated that "IP Options are not an option" (Refer to
>  [Not_An_Option], [Options]).
> 
> [Depending on the answer to my question posed in 3.1.1, there should be some discussion here of the differences in uniqueness and volatility of the different potential identifiers.]
> 
> 3.2.  Define a TCP Option
> 3.2.2.  Analysis
> 
> [Looking at draft-wing-nat-reveal-option, it does a good job of discussing the max refresh rate of the TCP option, but doesn't discuss the min at all. I presume some implementations might use a persistent identifier if they're not sharing among more than 2^16 hosts. Is it practical to recommend against that (probably would make more sense to do so in draft-wing-nat-reveal-option, but seems like it needs to be discussed somewhere)? 
> 
> I don't know enough about the various kinds of IDs listed, but it seems inadvisable to use something globally unique when all you need is local uniqueness.]
> 
> o  Interference with current usages such as X-Forwarded-For (see
>  Section 3.4) should be elaborated to specify the behavior of
>  servers when both options are used; in particular specify which
>  information to use: the content of the TCP option or what is
>  conveyed in the application headers.
> 
> [If the use of both the TCP option and XFF together is a real possibility, it would be good to be able to recommend that they both contain subsets of the same information (e.g., full IP and lower 16 bits of IP).]
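As a sketch of the kind of consistency check such a recommendation would enable, assume the TCP option carries the lower 16 bits of the host's internal IPv4 address and the address sharing function appends the full address as the last X-Forwarded-For entry. Both assumptions are hypothetical; neither draft mandates them. Under them, a server can confirm the two values agree before using either:

```python
import ipaddress


def lower16(addr: str) -> int:
    """Lower 16 bits of an IPv4 address, one candidate TCP-option HOST_ID."""
    return int(ipaddress.IPv4Address(addr)) & 0xFFFF


def host_ids_consistent(tcp_option_value: int, xff_header: str) -> bool:
    """True if the TCP-option value equals the lower 16 bits of the XFF address.

    Assumes the address sharing function appended exactly one entry and that
    it is the last one; earlier entries may have been supplied by the client.
    """
    last_entry = xff_header.split(",")[-1].strip()
    try:
        return lower16(last_entry) == tcp_option_value
    except ValueError:  # AddressValueError subclasses ValueError
        return False
```

Because one value is a strict subset of the other, the combination reveals nothing beyond what XFF alone already reveals, which is the point of the recommendation.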
> 
> 3.3.  Use the Identification Field of IP Header (IP-ID)
> 3.3.1.  Description
> 
>  IP-ID (Identification field of IP header) can be used to insert
>  information that uniquely distinguishes a host among those sharing
>  the same IPv4 address.  An address sharing function can rewrite the
>  IP-ID field to insert a value unique to the host (16 bits are
>  sufficient to uniquely disambiguate hosts sharing the same IP
>  address).
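A minimal sketch of the rewrite described above, assuming the address sharing function already has a 16-bit per-host value. It overwrites the Identification field (bytes 4-5 of the IPv4 header) and recomputes the RFC 791 header checksum (bytes 10-11). The function names are illustrative, not from the draft.

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """RFC 791 header checksum: one's complement of the one's-complement sum of 16-bit words."""
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    while total > 0xFFFF:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def rewrite_ip_id(packet: bytes, host_id: int) -> bytes:
    """Overwrite the IPv4 Identification field with a per-host value and fix the checksum."""
    if not 0 <= host_id <= 0xFFFF:
        raise ValueError("host_id must fit in 16 bits")
    ihl = (packet[0] & 0x0F) * 4  # header length in bytes
    header = bytearray(packet[:ihl])
    struct.pack_into("!H", header, 4, host_id)  # Identification field
    struct.pack_into("!H", header, 10, 0)       # zero checksum before recomputing
    struct.pack_into("!H", header, 10, ipv4_checksum(bytes(header)))
    return bytes(header) + packet[ihl:]
```

This ignores a real complication: IP-ID is also used to reassemble fragments, so stamping every packet from a host with the same value can confuse reassembly if fragments from different packets of that host are in flight at the same time.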
> 
> [Is it possible to be more specific about what these bits are?]
> 
> 3.4.  Inject Application Headers
> 
> [It seems like this solution raises some broader issues beyond privacy -- does it really make sense to promote a model where access to more and more resources via HTTP is gated on the presence of a non-standardized extension header, with exceptions made on the basis of a Wikipedia-based list of ISPs?]
> 
> Cheers,
> Alissa
>