Re: [Ideas] Addressing the privacy issues exposed by IDEAS

Tom,

Excellent analysis.  Thanks for this.

Bob

On 10/18/2017 12:18 PM, Tom Herbert wrote:
> On Wed, Oct 18, 2017 at 6:04 AM, Robert Moskowitz
> <rgm-ietf@htt-consult.com> wrote:
>> I chose the subject line carefully as you will see by my analysis of the
>> privacy issue(s).  I have discussed this with Padma before bringing this up
>> to the list.
>>
>> Here is the privacy attack, as I see it:
>>
>> It is fairly well established that web sites collect a lot of personal
>> information and information about the device(s) connected to that personal
>> information.  IP addresses, even the actual addresses of NATed clients, are
>> part of the harvest.  Mal and his cousins are busy stealing this information
>> and putting it all together in their own big data pile.
>>
>> Meanwhile, Eve and her cousins are busy watching the network and seeing
>> which IP addresses are communicating with other IP addresses.  Eve and
>> cohorts put their data together with Mal and cohorts' data and then are able
>> to note that:  "Hey look, Alice is talking directly with Barb."  Oh, look,
>> they both moved to new addresses, but we can see it is still Alice and Barb.
>>
>> What is going on here?
>>
>> ID/Loc technologies, enhanced with IDEAS technology, will make Peer-to-Peer
>> communications without any triangular routing achievable.  As long as these
>> P2P communications use the same IP addresses as used in web Client/Server
>> communications, the linkage is there to the privacy leakage occurring through
>> those websites.
>>
>> Three things have to happen to protect the privacy of P2P communications
>> from the swamp of privacy leakage in C/S communications.
>>
>> Identities need to be masked/hidden by both the ID/Loc technologies and
>> IDEAS.
>>
>> Identifiers of all ilk, both in the control channel and the data channel,
>> need to change with each move using some Perfect Forward Secrecy (PFS)
>> technology.
>>
>> Multiple IP addresses MUST be used, at least separating the P2P from C/S
>> communications.  Using different addresses for different P2P connections is wise.
>>
> Bob,
>
> It's more than just using multiple addresses. Today carriers are
> assigning multiple addresses by giving /64s so that a UE is getting 2^64
> addresses. The problem is that this is done by a prefix assignment for
> each device which means the device is easily tracked by that. What we
> want are multiple addresses with some specific properties for privacy.
>
> Here are the properties of addresses that I came up with:
>
>        o They are composed of a global routing prefix and a suffix that
>          is internal to an organization or provider. This is the same
>          property for IP addresses [RFC3513].
>
>        o The registry and organization of an address can be determined by
>          the network prefix. This is true for any global address.
>
>        o The organizational bits in the address should have minimal
>          hierarchy to prevent inferences. It might be reasonable to have
>          an internal prefix that divides identifiers based on broad
>          geographic regions, but detailed information such as location,
>          department in an enterprise, or device type should not be
>          encoded in a globally visible address.
>
>        o Given two addresses and no other information, the
>          desired properties of correlating them are:
>
>           o It can be inferred if they belong to the same organization and
>             registry. This is true for any two global IP addresses.
>
>           o It may be inferred that they belong to the same broad
>             grouping, such as a geographic region, if the information is
>             encoded in the organizational bits of the address.
>
>           o No other correlation can be established. For example, it
>             cannot be inferred that the IP addresses address the same
>             node, the addressed nodes reside in the same subnet, rack, or
>             department, or that the nodes for the two addresses have any
>             geographic proximity to one another.
>
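As a rough illustration of the correlation properties listed above, here is a minimal sketch. It assumes the documentation prefix 2001:db8::/32 stands in for a provider's global routing prefix, and the helper names (same_prefix, random_address) are made up for this example. It shows why a per-device /64 is trackable while a randomized suffix under the organization prefix reveals only the organization.

```python
import ipaddress
import secrets

def same_prefix(a: str, b: str, prefix_len: int) -> bool:
    """True if both addresses fall in the same prefix of the given length."""
    net = ipaddress.ip_network(f"{a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(b) in net

def random_address(org_prefix: str) -> str:
    """Pick a uniformly random address inside the organization prefix."""
    net = ipaddress.ip_network(org_prefix)
    host_bits = net.max_prefixlen - net.prefixlen
    return str(net.network_address + secrets.randbits(host_bits))

# Assumption: a hypothetical provider prefix (documentation range).
ORG = "2001:db8::/32"

# Per-device /64 assignment: both addresses expose the same 64-bit prefix,
# so an observer can tie them to one device.
a1, a2 = "2001:db8:0:1234::1", "2001:db8:0:1234::abcd"
print(same_prefix(a1, a2, 64))   # True

# Randomized suffix: the two addresses share only the organization prefix.
b1, b2 = random_address(ORG), random_address(ORG)
print(same_prefix(b1, b2, 32))   # True
```

With 96 random bits below the /32, two such addresses will almost never share a /64, which is the "no other correlation" property in the list.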
>> Note that if IDEAS-ID/Loc does everything to hide and confuse
>> Identity/Identifier, it is all for naught if multiple IP addresses are not
>> used.  At this point I should mention that TLS 1.3 may have a similar
>> privacy risk, but that is for a different soapbox.
>>
>> Action plan:
>>
>> The IDEAS charter should say something like:
>>
>> "IDEAS will act as an enabling technology for the various ID/Loc
>> technologies currently specified within the IETF.  As such it will result in
>> a wider deployment of mobile Peer-to-Peer communications.  Care will be
>> taken in the design of the IDEAS technology not to enable the privacy
>> leakage attacks in current Client/Server (predominately web-based)
>> communications to be linked to these P2P communications."
>>
>> This means that whatever technology we come up with for IDEAS will mask/hide
>> PII/Identity/Identifier, so that Eve is in the dark and we need only defend
>> the IDEAS data store from Mal.
>>
>> Each ID/Loc technology (and this means ME with HIP) will need revisions to
>> both its control and data planes (this means ESP for HIP) to change how
>> Identity and Identifiers are handled to break privacy tracking by Eve.
>> This may require using IDEAS as an enabler of privacy functions (I suspect I
>> will need it in HIP to deal with the HI in the R1 packet).  TLS 1.3 may also
>> need revisions with its zero-RTT (0-RTT) method.
>>
>> The final, and potentially big one that is outside the IETF's control is
>> that OSs and ISPs MUST enable support for multiple addresses per host and
> ISP support requires a protocol to do bulk address assignment. This is
> supported with DHCP, although it would be nice to have a method to
> compress addresses in a response to 64 bits (identifiers) assuming
> they all have a common 64 bit prefix. Of course Android doesn't
> support DHCPv6 so they're going to need to be convinced that /128
> address assignments are a leap forward.
>
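To make the compression idea above concrete, here is a minimal sketch of one possible encoding: the common 64-bit prefix is sent once, followed by one 64-bit identifier per address. This is a made-up wire layout for illustration, not an existing DHCPv6 option format, and the function names are hypothetical.

```python
import ipaddress
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF

def compress(addresses):
    """Encode addresses sharing a /64 as an 8-byte prefix plus 8 bytes each."""
    if not addresses:
        raise ValueError("no addresses to encode")
    ints = [int(ipaddress.IPv6Address(a)) for a in addresses]
    prefix = ints[0] >> 64
    if any(i >> 64 != prefix for i in ints):
        raise ValueError("addresses do not share a common /64 prefix")
    ids = [i & MASK64 for i in ints]
    # Network byte order: one prefix word, then one word per identifier.
    return struct.pack(f"!Q{len(ids)}Q", prefix, *ids)

def expand(blob):
    """Rebuild the full 128-bit addresses from the compressed form."""
    count = len(blob) // 8 - 1
    prefix, *ids = struct.unpack(f"!Q{count}Q", blob)
    return [str(ipaddress.IPv6Address((prefix << 64) | i)) for i in ids]

addrs = ["2001:db8:0:1::1", "2001:db8:0:1:aaaa:bbbb:cccc:dddd"]
blob = compress(addrs)
print(len(blob), "bytes instead of", 16 * len(addrs))
print(expand(blob))
```

For N addresses this costs 8 + 8N bytes rather than 16N, so the saving approaches half for large bulk assignments.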
> OSes support configuring multiple addresses on an interface (on the
> order of 1000s). But the use of addresses needs to change to support
> privacy. The concept of a different address per outgoing connection
> needs to be implemented. The semantics of INADDR_ANY need to be
> modified to restrict the addresses allowed for incoming connections
> (this is already being worked on for container virtualization). There
> are also a few "philosophical" questions relating to expected uses of
> any assigned address, like how to deal with ICMP. For instance, should
> all of the addresses assigned to a device respond to ping?
>
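The "different address per outgoing connection" idea can be approximated in user space today by binding each socket to an explicit source address before connecting. The sketch below assumes the addresses in the pool are already configured on a local interface; the SourceAddressPool class and connect_from helper are hypothetical names for this example, not an existing OS facility.

```python
import secrets
import socket

class SourceAddressPool:
    """Hands out a different local address for each outgoing connection."""

    def __init__(self, addresses):
        self._free = list(addresses)

    def take(self):
        if not self._free:
            raise RuntimeError("address pool exhausted")
        # Remove a random entry so the order of use reveals nothing.
        return self._free.pop(secrets.randbelow(len(self._free)))

def connect_from(pool, host, port):
    """Open a TCP connection whose source address comes from the pool."""
    src = pool.take()
    # Port 0 asks the OS for an ephemeral source port.
    return socket.create_connection((host, port), source_address=(src, 0))

pool = SourceAddressPool(["2001:db8::a", "2001:db8::b", "2001:db8::c"])
print(pool.take())  # one of the three addresses, never handed out again
```

A real implementation would live in the OS or a library so that applications get this behavior without code changes; the sketch only shows the per-connection selection step.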
>> let technologies within the hosts (like ID/Loc) get addresses to provide
>> privacy separation.  This ALSO extends to MAC addresses!  Eve could be
>> tapping into those IPFIX flows (now there is a BIG privacy leakage attack
>> that no one is talking about) and getting all the MAC/IP address mappings!
>>
> RFC4941 talks about the problem of embedding IEEE identifiers into
> IPv6 addresses. That practice is no longer considered acceptable. In
> some sense, identifier-locator takes this to its logical extreme, where
> the "identifier" used to create addresses changes at the time
> granularity of every new connection.
>
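For readers unfamiliar with the practice RFC4941 addresses, the sketch below shows the modified EUI-64 construction (flip the universal/local bit of the MAC and insert ff:fe in the middle) and how trivially the MAC is recovered from such an address. The prefix and MAC are example values, and the function names are made up for this illustration.

```python
import ipaddress

def eui64_interface_id(mac: str) -> int:
    """Build the modified EUI-64 interface identifier from a 48-bit MAC."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    octets[0] ^= 0x02                      # flip the universal/local bit
    eui = octets[:3] + b"\xff\xfe" + octets[3:]
    return int.from_bytes(eui, "big")

def slaac_address(prefix: str, mac: str) -> str:
    """Combine a /64 prefix with the MAC-derived interface identifier."""
    net = ipaddress.IPv6Network(prefix)
    return str(net.network_address + eui64_interface_id(mac))

def recover_mac(address: str) -> str:
    """Reverse the construction: extract the MAC from the low 64 bits."""
    iid = int(ipaddress.IPv6Address(address)) & 0xFFFFFFFFFFFFFFFF
    raw = bytearray(iid.to_bytes(8, "big"))
    raw[0] ^= 0x02
    return ":".join(f"{b:02x}" for b in raw[:3] + raw[5:])

addr = slaac_address("2001:db8:0:1::/64", "00:11:22:33:44:55")
print(addr)               # 2001:db8:0:1:211:22ff:fe33:4455
print(recover_mac(addr))  # 00:11:22:33:44:55
```

The low 64 bits stay the same whatever prefix the device moves to, which is what makes this form of address a stable tracking handle.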
>> One caveat that makes the multiple address not so big of a challenge is that
>> ISPs are already providing some level of multiple address support by
>> allowing hotspot usage on the mobile devices.  The IP address seen on the
>> network MAY be from a given device or a device using it as a gateway.  This
>> will become increasingly more common with automotive hotspots.  But this is
>> NOT something we should count on as a mitigation of this privacy attack.
>>
> I was thinking about this problem. The normal way to implement a hot
> spot is to give a device a prefix and delegate addresses from that
> prefix. But that means the prefix is encoded in addresses which breaks
> the address privacy properties above. I think the alternative is
> just to assign a hot spot a whole bunch of /128 addresses and let
> it do what it pleases with them. It can delegate addresses to
> its tethered clients.  So devices in the identifier-locator network
> may each be assigned 1000s of addresses, and devices that are hot spots
> for many clients may end up needing 100s of thousands or more. The net
> result is that the mapping system is going to need to scale to very
> large numbers; I am assuming the system will need to track more than
> 1T identifiers at scale. Not going to be easy :-)
>
> Tom
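
A back-of-the-envelope check of the scaling estimate above. The per-device and per-hot-spot address counts come from the message; the device and hot-spot populations and the 32 bytes per mapping entry (a 16-byte identifier plus a 16-byte locator) are assumptions chosen purely for illustration.

```python
def identifiers_needed(devices, per_device, hotspots, per_hotspot):
    """Total identifiers the mapping system would have to track."""
    return devices * per_device + hotspots * per_hotspot

# Assumed populations (illustrative only, not from the thread).
DEVICES = 1_000_000_000      # ordinary devices
HOTSPOTS = 10_000_000        # devices acting as hot spots

total = identifiers_needed(
    devices=DEVICES, per_device=1_000,          # "1000s of addresses"
    hotspots=HOTSPOTS, per_hotspot=100_000,     # "100s of thousands"
)
print(total)                 # 2000000000000

# Assumed entry size: 16-byte identifier + 16-byte locator.
BYTES_PER_ENTRY = 32
print(total * BYTES_PER_ENTRY / 1e12, "TB")     # 64.0 TB
```

Under these assumptions the table passes 1T entries from ordinary devices alone, before any hot spots are counted.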