Re: [Ideas] Addressing the privacy issues exposed by IDEAS

Tom Herbert <tom@herbertland.com> Wed, 18 October 2017 16:18 UTC

In-Reply-To: <9155d3fe-cbe2-ae2d-9c59-f3dee85b1409@htt-consult.com>
References: <9155d3fe-cbe2-ae2d-9c59-f3dee85b1409@htt-consult.com>
From: Tom Herbert <tom@herbertland.com>
Date: Wed, 18 Oct 2017 09:18:36 -0700
Message-ID: <CALx6S371UYq027pvVYTS2F0UE8kknd7LmTk-0z7KAQwu8=q5=w@mail.gmail.com>
To: Robert Moskowitz <rgm-ietf@htt-consult.com>
Cc: "ideas@ietf.org" <ideas@ietf.org>
Content-Type: text/plain; charset="UTF-8"
Archived-At: <https://mailarchive.ietf.org/arch/msg/ideas/K4ww6dsmSx5H3SOyigR5jIFqCtE>
Subject: Re: [Ideas] Addressing the privacy issues exposed by IDEAS

On Wed, Oct 18, 2017 at 6:04 AM, Robert Moskowitz
<rgm-ietf@htt-consult.com> wrote:
> I chose the subject line carefully as you will see by my analysis of the
> privacy issue(s).  I have discussed this with Padma before bringing this up
> to the list.
>
> Here is the privacy attack, as I see it:
>
> It is fairly well established that web sites collect a lot of personal
> information and information about the device(s) connected to that personal
> information.  IP addresses, even the actual address of NATed clients are
> part of the harvest.  Mal and his cousins are busy stealing this information
> and putting it all together in their own big data pile.
>
> Meanwhile, Eve and her cousins are busy watching the network and seeing
> which IP addresses are communicating with other IP addresses.  Eve and
> cohorts put their data together with Mal and cohorts' data and then is able
> to note that:  "Hey look, Alice is talking directly with Barb."  Oh, look,
> they both moved to new addresses, but we can see it is still Alice and Barb.
>
> What is going on here?
>
> ID/Loc technologies, enhanced with IDEAS technology, will make Peer-to-Peer
> communications without any triangular routing achievable.  As long as these
> P2P communications use the same IP addresses as used in web Client/Server
> communications, the linkage is there to the privacy leakage occurring through
> those websites.
>
> Three things have to happen to protect the privacy of P2P communications
> from the swamp of privacy leakage in C/S communications.
>
> Identities need to be masked/hidden by both the ID/Loc technologies and
> IDEAS.
>
> Identifiers of all ilk, both in the control channel and the data channel
> need to change with each move using some Perfect Forward Secrecy (PFS)
> technology.
>
> Multiple IP addresses MUST be used, at least separating the P2P from C/S
> communications.  Different addresses for different P2P connections is wise.
>
Bob,

It's more than just using multiple addresses. Carriers today already
assign multiple addresses by handing out /64s, so a UE gets 2^64
addresses. The problem is that this is done by assigning one prefix
per device, which means the device is easily tracked by that prefix.
What we want are multiple addresses with some specific properties for
privacy.
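
To illustrate why a per-device /64 makes tracking easy, here is a
minimal Python sketch. The function name and the sample addresses
(taken from the 2001:db8::/32 documentation range) are assumptions for
illustration only. An observer just compares the upper 64 bits of two
addresses.

```python
import ipaddress

def same_device_prefix(a: str, b: str, prefix_len: int = 64) -> bool:
    """Return True if b falls inside the /prefix_len network containing a."""
    net = ipaddress.ip_network(f"{a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(b) in net

# Hypothetical addresses: two from one per-UE /64, one from another UE.
first = "2001:db8:1:2:9d3a:71c4:2b6e:10f5"
second = "2001:db8:1:2:4e07:a9b2:c813:66d0"
other_ue = "2001:db8:1:3:9d3a:71c4:2b6e:10f5"

if __name__ == "__main__":
    print(same_device_prefix(first, second))    # same /64, so linkable
    print(same_device_prefix(first, other_ue))  # different /64
```

However random the low 64 bits are, the shared prefix alone links the
two addresses to one device.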

Here are the properties of addresses that I came up with:

      o They are composed of a global routing prefix and a suffix that
        is internal to an organization or provider. This is the same
        property for IP addresses [RFC3513].

      o The registry and organization of an address can be determined by
        the network prefix. This is true for any global address.

      o The organizational bits in the address should have minimal
        hierarchy to prevent inferences. It might be reasonable to have
        an internal prefix that divides identifiers based on broad
        geographic regions, but detailed information such as location,
        department in an enterprise, or device type should not be
        encoded in a globally visible address.

      o Given two addresses and no other information, the
        desired properties of correlating them are:

         o It can be inferred if they belong to the same organization and
           registry. This is true for any two global IP addresses.

         o It may be inferred that they belong to the same broad
           grouping, such as a geographic region, if the information is
           encoded in the organizational bits of the address.

         o No other correlation can be established. For example, it
           cannot be inferred that the IP addresses address the same
           node, the addressed nodes reside in the same subnet, rack, or
           department, or that the nodes for the two addresses have any
           geographic proximity to one another.
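
One possible way to get addresses with these properties is to draw the
whole suffix at random under the organization's routing prefix. The
Python sketch below assumes this, and the message itself does not say
how such addresses would be generated. The /32 prefix and the function
name are hypothetical, and the optional region field is left out.

```python
import ipaddress
import secrets

def random_address(org_prefix: str) -> ipaddress.IPv6Address:
    """Pick a uniformly random address under the organization's prefix."""
    net = ipaddress.IPv6Network(org_prefix)
    suffix_bits = 128 - net.prefixlen
    # Assumption: the entire suffix is random, so it encodes no subnet,
    # rack, department, or device information.
    suffix = secrets.randbits(suffix_bits)
    return ipaddress.IPv6Address(int(net.network_address) | suffix)

if __name__ == "__main__":
    for _ in range(3):
        print(random_address("2001:db8::/32"))
```

Two addresses produced this way share only the routing prefix, so the
only inference left to an observer is the organization and registry.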

> Note that if IDEAS-ID/Loc does everything to hide and confuse
> Identity/Identifier, it is all for naught if multiple IP addresses are not
> used.  At this point I should mention that TLS 1.3 may have a similar
> privacy risk, but that is for a different soapbox.
>
> Action plan:
>
> The IDEAS charter should say something like:
>
> "IDEAS will act as an enabling technology for the various ID/Loc
> technologies currently specified within the IETF.  As such it will result in
> a wider deployment of, mobile, Peer to Peer communications.  Care will be
> taken in the design of the IDEAS technology not to enable the privacy
> leakage attacks in current Client/Server (predominately web-based) to be
> linked to these P2P communications."
>
> This means that whatever technology we come up for IDEAS will mask/hide
> PII/Identity/Identifier.  So that Eve is in the dark and we need only defend
> the IDEAS data store from Mal.
>
> Each ID/Loc technology (and this means ME with HIP) will need revisions to
> both their control and data plane (this means ESP for HIP) to change how
> Identity and Identifiers are handled to break privacy tracking by Eve.
> This may require using IDEAS as an enabler of privacy functions (I suspect I
> will need it in HIP to deal with the HI in the R1 packet).  TLS 1.3 may also
> need revisions with its zero RT method.
>
> The final, and potentially big one that is outside the IETF's control is
> that OSs and ISPs MUST enable support for multiple addresses per host and

ISP support requires a protocol to do bulk address assignment. DHCP
supports this, although it would be nice to have a method to compress
the addresses in a response down to 64 bits each (the identifiers),
assuming they all share a common 64-bit prefix. Of course, Android
doesn't support DHCPv6, so they're going to need to be convinced that
/128 address assignments are a leap forward.
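
The compression idea can be sketched as follows, in Python. This is
not an existing DHCP option or wire format, only an illustration of the
arithmetic: send the common 64-bit prefix once, followed by one 64-bit
identifier per address. The function names are hypothetical.

```python
import ipaddress

MASK64 = (1 << 64) - 1

def compress(addresses):
    """Split addresses sharing a /64 into (prefix, [64-bit identifiers])."""
    if not addresses:
        raise ValueError("no addresses to compress")
    ints = [int(ipaddress.IPv6Address(a)) for a in addresses]
    prefix = ints[0] >> 64
    if any(i >> 64 != prefix for i in ints):
        raise ValueError("addresses do not share a common 64-bit prefix")
    return prefix, [i & MASK64 for i in ints]

def expand(prefix, identifiers):
    """Rebuild the full /128 addresses from the prefix and identifiers."""
    return [ipaddress.IPv6Address((prefix << 64) | ident)
            for ident in identifiers]

if __name__ == "__main__":
    prefix, idents = compress(["2001:db8:0:1::1", "2001:db8:0:1:abcd::2"])
    print(hex(prefix), [hex(i) for i in idents])
    print(expand(prefix, idents))
```

For a bulk assignment of N addresses this carries 8 + 8N bytes of
address data instead of 16N.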

OSes support configuring multiple addresses on an interface (on the
order of 1000s). But the use of addresses needs to change to support
privacy. The concept of a different address per outgoing connection
needs to be implemented. The semantics of INADDR_ANY need to be
modified to restrict the addresses allowed for incoming connections
(this is already being worked on for container virtualization). There
are also a few "philosophical" questions relating to expected uses of
any assigned address, like how to deal with ICMP. For instance, should
all of the addresses assigned to a device respond to ping?
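
At the application level, a different address per outgoing connection
can be approximated today by binding the socket to a chosen source
address before connecting. The Python sketch below assumes a pool of
addresses already configured on the host. The pool contents and helper
names are hypothetical, and a real implementation would more likely
live in the OS than in each application.

```python
import secrets
import socket

def pick_source(pool):
    """Choose a source address unpredictably from the configured pool."""
    return secrets.choice(pool)

def connect_from_pool(pool, dest_host, dest_port, timeout=5.0):
    """Open a TCP connection whose source is a randomly chosen pool address."""
    src = pick_source(pool)
    # source_address makes the socket bind to (src, ephemeral port)
    # before connecting, so the peer sees src rather than the default.
    return socket.create_connection(
        (dest_host, dest_port), timeout=timeout, source_address=(src, 0)
    )

if __name__ == "__main__":
    # Hypothetical pool; these must actually be assigned to an interface
    # for connect_from_pool() to succeed.
    pool = ["2001:db8::a1", "2001:db8::b2", "2001:db8::c3"]
    print(pick_source(pool))
```

The incoming side is the mirror image: a listener binds to specific
addresses instead of the wildcard.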

> let technologies within the hosts (like ID/Loc) to get addresses to provide
> privacy separation.  This ALSO extends to MAC addresses!  Eve could be
> tapping into those IPFIX flows (now there is a BIG privacy leakage attack
> that no one is talking about) and getting all the MAC/IP address mappings!
>
RFC4941 talks about the problem of embedding IEEE identifiers into
IPv6 addresses. That practice is no longer considered acceptable. In
some sense, identifier-locator takes this to its logical extreme, where
the "identifier" used to create addresses changes at the time
granularity of every new connection.

> One caveat that makes the multiple address not so big of a challenge is that
> ISPs are already providing some level of multiple address support by
> allowing hotspot usage on the mobile devices.  The IP address seen on the
> network MAY be from a given device or a device using it as a gateway.  This
> will become increasingly more common with automotive hotspots.  But this is
> NOT something we should count on as a mitigation of this privacy attack.
>
I was thinking about this problem. The normal way to implement a hot
spot is to give a device a prefix and delegate addresses from that
prefix. But that means the prefix is encoded in the addresses, which
breaks the address privacy properties above. I think the alternative
is just to assign a hot spot a whole bunch of /128 addresses and let
it do what it pleases with them. It can delegate addresses to its
tethered clients. So devices in the identifier-locator network may
each be assigned 1000s of addresses, and devices that are hot spots
for many clients may end up needing 100s of thousands or more. The net
result is that the mapping system is going to need to scale to very
large numbers. I am assuming the system will need to track more than
1T identifiers at scale. Not going to be easy :-)
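
A rough back-of-the-envelope check of that scale, in Python. The device
and hot spot counts are assumptions chosen only to show how quickly the
total passes 1T. Only the per-device figures (1000s, 100s of
thousands) come from the paragraph above.

```python
def total_identifiers(devices, per_device, hotspots, per_hotspot):
    """Total identifiers the mapping system would have to track."""
    return devices * per_device + hotspots * per_hotspot

# Assumed population: one billion ordinary devices, one million hot spots.
DEVICES = 1_000_000_000
HOTSPOTS = 1_000_000

if __name__ == "__main__":
    total = total_identifiers(DEVICES, 1_000, HOTSPOTS, 100_000)
    print(f"{total:,} identifiers")
    # 16 bytes per IPv6 address, before any mapping state is added.
    print(f"{total * 16 / 1e12:.1f} TB just to store the addresses")
```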

Tom