Re: Last Call: <draft-hardie-privsec-metadata-insertion-05.txt> (Design considerations for Metadata Insertion) to Informational RFC

Ted Hardie <ted.ietf@gmail.com> Thu, 02 March 2017 18:02 UTC

In-Reply-To: <787AE7BB302AE849A7480A190F8B933009E1B2E6@OPEXCLILMA3.corporate.adroot.infra.ftgroup>
References: <148527996733.12573.15522530300481191993.idtracker@ietfa.amsl.com> <787AE7BB302AE849A7480A190F8B933009DEB0B5@OPEXCLILMA3.corporate.adroot.infra.ftgroup> <CA+9kkMC8d9dRGA0mYm-ALbZOnnq6LTLE=56imUFqK9JZ0wC=pw@mail.gmail.com> <787AE7BB302AE849A7480A190F8B933009E16627@OPEXCLILMA3.corporate.adroot.infra.ftgroup> <CA+9kkMBw-QbaDzDanWs6sH-z7rEteofCvp8-d-qSf9J31zJykA@mail.gmail.com> <787AE7BB302AE849A7480A190F8B933009E183CF@OPEXCLILMA3.corporate.adroot.infra.ftgroup> <CA+9kkMC4-e=HXSa=QX4m1GgKFA1y-PmKsHkwQTg-ckEM2tGUbw@mail.gmail.com> <787AE7BB302AE849A7480A190F8B933009E1B2E6@OPEXCLILMA3.corporate.adroot.infra.ftgroup>
From: Ted Hardie <ted.ietf@gmail.com>
Date: Thu, 02 Mar 2017 10:02:05 -0800
Message-ID: <CA+9kkMBATNM9VAAoRVzAPrvshqeAkNjL_VA_Bwz_JhU75DTRiA@mail.gmail.com>
Subject: Re: Last Call: <draft-hardie-privsec-metadata-insertion-05.txt> (Design considerations for Metadata Insertion) to Informational RFC
To: "mohamed.boucadair@orange.com" <mohamed.boucadair@orange.com>
Content-Type: multipart/alternative; boundary="94eb2c1917a2e59ea90549c33b12"
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf/KUeAeVZ7Rb93CQR6agFsBE7Nf8s>
Cc: "draft-hardie-privsec-metadata-insertion@ietf.org" <draft-hardie-privsec-metadata-insertion@ietf.org>, "ietf@ietf.org" <ietf@ietf.org>
List-Id: IETF-Discussion <ietf.ietf.org>

wing are missing from the document:

> It's difficult to say how something will be used in the future.
>
> [Med] An advice that is not implementable makes more troubles, IMHO.
>
Sorry, I thought you were asking what wgs or protocols planned to reference
this.  For that, I don't know.  The intent is that it is information useful
to those considering whether restoring metadata lost to encryption in
mid-network is the right way to go.


> My intent (and the understanding of other reviewers) is to highlight that
> these mechanisms have a privacy-damaging result and that this should be
> considered.
>
> [Med] I do think existing documents already make that job. I do think we
> need more.
>
>
Sorry, did you mean "do not think we need more"?  If so, I obviously
disagree.  This design pattern is used uncritically enough that a brief
document describing why it isn't safe still seems to me useful.  Were it
incorporated into a more general document (as noted before), that would
also work.  If it later is, that more general work could obsolete this
(though that's a bid for an informational document).


>  In particular, I'm concerned that some application functions in the
> network (e.g. recursive resolvers or proxies) do not consider the positive
> privacy implications of their aggregation and so do not consider adding
> this data back as problematic.
>
> [Med] I’m also concerned with that, too (see e.g.,
> http://www1.icsi.berkeley.edu/~narseo/papers/hotm42-vallinarodriguez.pdf).
> In the meantime, I’m also concerned with (1) some applications that leak
> privacy information without the consent of the user and (2) some
> application servers that may correlate various information shared by an
> application client to track users (e.g., https://panopticlick.eff.org/).
> BTW, I see that you are using “application function” which may not have the
> same meaning as the general “protocol” wording used in draft-hardie-*. Do
> you consider a DHCP relay as an “application function”?
>
>    Highlighting this enables them to see this traffic in a different
> context.
>
> [Med] Isn’t this already assumed by some protocol designers (e.g.,
> RFC6973, SIP)? BTW, there are subtleties when proxies are in the same trust
> domain of the client or server.
>
There are certainly some protocol designers that have internalized this,
but my experience has been that this is not always the case.  In a fair few
cases, folks deploy methods like this because they see encryption of
metadata in data integrity terms or see aggregation only in terms of data
usage minimization.  They restore the metadata mid-network because it is
the quickest solution for them to get back to the status quo ante for their
understanding of the system.




> * that data may not be always available to the endhost
>
> Understood, but even in this case, it is better to make the permission to
> add the data explicit.
>
> [Med] This may be easy to implement for some applications, but this may
> not be generalized to ** all ** protocols.
>

You are certainly correct that many deployed protocols would find it hard
to retrofit this consent model into their existing flows.    This is,
however, advice for folks at the design phase.  If RFC 6788 were being
written after the publication of this document, its authors might well have
looked at the protocol mechanics in section 5.2:

   The AN intercepts and then tunnels the received Router Solicitation
   in a newly created IPv6 datagram with the Line-Identification Option
   (LIO).  The AN forms a new IPv6 datagram whose payload is the
   received Router Solicitation message as described in [RFC2473],
   except that the Hop Limit field of the Router Solicitation message
   MUST NOT be decremented.

and asked whether the circuit identifier corresponding to the logical
access loop port of the AN from which the RS was initiated is PII.  If so,
this document would have them consider whether transparent interception
is the appropriate choice.  There clearly are flows in which the AN's role
would be explicit.

I don't know, frankly, which choice is right in this case, but I would
prefer that the choice be made with an easy reference to the implications
of inserting metadata at hand.

> Putting aside the interaction with a user to get a consent and how that
> consent will need to be changed when another user uses the same device to
> connect to the Internet. Consider a user who does not want an upstream DHCP
> relay to insert the line-id (https://tools.ietf.org/html/rfc6788) to the
> server, and let’s suppose the relay received a signal (by some means, to be
> yet specified) that for this particular DHCP client, the line-id must not
> be inserted. For this case, connectivity won’t be provided to that user.
> This would mean extra calls to the hotline for that network provider. This
> is not desirable for both customers and network providers.
>
> If this can be done in parallel with other actions, then the latency impact
> can be minimized.
>
> [Med] These are assumptions and implications that are worth to be added to
> the draft.
>

Okay, how about adding the following text to section 5?

There are also tensions with latency of operation. For example, where the end
system does not initially know the information which would be added by
on-path devices, it must engage the protocol mechanisms to determine it.
Determining a public IP address to include in a locally supplied header
might require a STUN exchange, and the additional latency of this exchange
discourages deployment of host-based solutions.  To minimize this latency,
engaging those mechanisms may need to be done in parallel with or in
advance of the core protocol exchanges with which this metadata would be
supplied.
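
To make the "in parallel with" option concrete, here is a minimal Python
sketch.  Everything in it is an illustrative assumption rather than text
for the draft: the function names, the delays, and the documentation-range
address are made up, and the address discovery is only a stand-in for a
STUN-style exchange.  It shows the end system starting discovery alongside
the core exchange, and skipping discovery entirely when it chooses to omit
the metadata.

```python
import asyncio


async def discover_public_address(events):
    # Hypothetical stand-in for a STUN-style exchange; delay and address are made up.
    events.append("discover:start")
    await asyncio.sleep(0.02)
    events.append("discover:done")
    return "192.0.2.10"


async def core_handshake(events):
    # Hypothetical stand-in for the core protocol exchange.
    events.append("handshake:start")
    await asyncio.sleep(0.02)
    events.append("handshake:done")
    return "session-ready"


async def connect_with_metadata(include_address):
    events = []
    if not include_address:
        # The end system chose to omit the metadata, so no discovery happens.
        session = await core_handshake(events)
        return session, None, events
    # Run both exchanges concurrently so discovery adds little extra latency.
    address, session = await asyncio.gather(
        discover_public_address(events), core_handshake(events)
    )
    return session, address, events


def run(include_address):
    return asyncio.run(connect_with_metadata(include_address))
```

Calling run(True) returns the session, the discovered address, and an
event log in which both exchanges start before either finishes;
run(False) returns no address at all.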



> BTW, this falls into this general discussion in
> https://tools.ietf.org/html/rfc6973:
>
>    a.  Trade-offs.  Does the protocol make trade-offs between privacy
>        and usability, privacy and efficiency, privacy and
>        implementability, or privacy and other design goals?  Describe
>        the trade-offs and the rationale for the design chosen.
>
> * a misbehaving node may be tempted to spoof the data to be injected. A
> remote device that will use that data to enforce policies will be broken.
>
> This point was discussed extensively in the GEOPRIV work and essentially a
> single carve-out was made:  for emergency services, where falsely asserted
> location data could be used to SWAT individuals or consume safety
> resources.    I don't think that falls into this narrow advice, but I would
> be willing to add something like this to the security considerations:
>
> "Note that some emergency service recipients, notably PSAPs (Public Safety
> Answering Points) may prefer data provided by a network to data provided by
> end system, because an end system could use false data to attack others or
> consume resources.   While this has the consequence that the data available
> to the PSAP is often more coarse than that available to the end system, the
> risk of false data being provided involved a risk to the lives of those
> targeted."
>
> [Med] Thank you. Providing PSAP as an example is OK, but I’d like the
> issue to be called out as a generic one while PSAP is provided as an
> example. What about the following:
>
>
>
> "Note that some servers (e.g., emergency service recipients, notably PSAPs
> (Public Safety Answering Points) [RFC6443]) may prefer data provided by a
> network to data provided by the end system, because an end system could use
> false data to attack others or consume resources.  While this has the
> consequence that the data available to the server is often more coarse than
> that available to the end system, the risk of false data being provided
> involved a risk to the lives of those targeted."
>
>
>

I don't think that shifting emergency service recipients to an example
works here, because it broadens the carve-out.  In the emergency services
case, the resources consumed are fire trucks, ambulances, and SWAT teams.
For other servers, the resources consumed could simply be CPU cycles or
disk; that's really not the same.  Balancing location consent requirements
against one was agreed; balancing it against the other was not.



> * it was reported in the past that some browsers leak the MSISDN and other
> sensitive data.
>
> This is true, but it seems to me unrelated to the point of the document.
>
> [Med] It is related because blindly trusting an application client (and
> server) has its own privacy risks. This is even exacerbated given the rich
> data that is available to an application client and also because of the
> visibility on various layers available to an application server.
>

I agree that it has its own privacy risks, but I don't think this is the
document that should explore them.

> From that flow some of your other concerns about audience, at least as I
> understand.  As written, this is narrow advice for a broad audience:
> basically, anyone who would consider the form of metadata insertion it
> describes.  You would, if I understand you, prefer a narrower description
> of the audience in a larger context.
>
>
>
> [Med] The key point here is about the practicality of implementing the
> advice NOT changing the scope. For example, the document says that it is
> better that a host is injecting the data but the document does not question
> whether that supplied data can be trusted or not,
>
>
Broadening this a bit, you're looking at two cases: one in which the data
the host has is wrong and one in which there is an adversarial
relationship.  For the first case, we can add text saying that when an end
system supplies data it is the end system's responsibility to ensure that
it is correct; don't use a STUN result from last week as fresh, for
example.  For the second case, in which the server treats user-supplied
data as potentially misleading because the user may wish to circumvent
restrictions, I'll point out that the Wikimedia example demonstrates that
simply shifting the trust to a mid-point entity doesn't work; it has to be
shifted to an entity within the trust domain of the server.  So the question
isn't really whether "end-user system supplied data can be trusted or not";
the same question applies to whoever supplies the data.
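
As an illustration of the second case, here is a hedged Python sketch of
how a server might accept X-Forwarded-For entries only from senders inside
its own trust domain.  The function name, the trusted range, and the
addresses are assumptions for illustration; this is not Wikimedia's
implementation.  The header is walked from right to left, and an entry is
believed only while the party that reported it is itself trusted.

```python
import ipaddress


def effective_client(peer_ip, xff_header, trusted_proxies):
    """Return the address the server should attribute the request to.

    peer_ip is the address of the directly connected peer; trusted_proxies
    is a list of CIDR strings the server operator has chosen to trust.
    """
    trusted = [ipaddress.ip_network(net) for net in trusted_proxies]

    def is_trusted(addr):
        ip = ipaddress.ip_address(addr)
        return any(ip in net for net in trusted)

    hops = [h.strip() for h in xff_header.split(",") if h.strip()]
    current = peer_ip
    # Only a trusted sender may vouch for the hop listed before it.
    while hops and is_trusted(current):
        candidate = hops.pop()
        try:
            ipaddress.ip_address(candidate)
        except ValueError:
            break  # malformed entry: keep the last address that was vouched for
        current = candidate
    return current
```

With a trusted range of 198.51.100.0/24, a header arriving directly from
203.0.113.7 is ignored, while the same header arriving from 198.51.100.5
is believed one hop back.  The sketch says nothing about whether the
trusted proxy's own information is accurate, which is the point above.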

> or how the consent will be obtained from a user.

You're right that I'm leaving aside the question of how the user sets the
policies, because it may vary by protocol and type of device too much to
make general advice useful.  If you would like me to add an explicit
statement to that effect, I am happy to note that it is not covered.


>
>
> In general, the point of the document is that the host should be able to
> omit the data without mid-network devices adding it back.  That's the point
> of protecting the traffic in the first place, after all.  I am saying that
> if the protocols require the data, then getting it from the end host has
> better privacy properties than getting from it from mid-network entities.
>
> [Med] I’m not sure we can have such general statement because the data may
> not be available (e.g., DHCP for example) to clients + the data supplied by
> clients (when possible) may not be reliable + enforcing policies based on
> client-supplied data may have implication on other users (e.g., spoofing
> XFF for example). Obviously, getting some of the information from a client
> may have implications on QoE…the user needs to understand the root causes
> of a degradation of QoE. Of course, these implications may not be new for
> users who are familiar with disabling Java scripts and cie.
>
>
>
> For example, the document states that the information in a Forward-For
> header can be supplied by the host itself and then communicated to a remote
> consumer. This is indeed possible, but because of abusing hosts some
> servers implement whitelists to trust proxies; see
> https://meta.wikimedia.org/w/extensions/TrustedXFF/trusted-hosts.txt.
>
>
>
>
>
>
> The Wikimedia case is a very interesting one to raise, because it derives
> from a set of assumptions about the network that are somewhat flawed and
> then attempts to patch those flaws in ways that actually damage the
> mechanisms of the system they originally built.
>
> Wikimedia wants to allow folks to edit without login credentials.  This
> allows for anonymous users to make corrections or additions; this is a
> goal.  The consequence of that goal being achieved is that trolls or
> malicious editors can have at anything they want.
>
> Rather than institute credentials and ACLs, Wikimedia attempts to
> substitute blocking by IP for blocking by credential.  The property they
> are looking for in IPs is not really there, though:  they are not unique to
> individuals, especially over time.
>
> This damages those who share IP addresses (due to NATs or proxies).  As
> far as I can tell, the NAT problem is simply treated as collateral damage.
> For the proxies, they attempt to work around the damage using XFF.  That's
> spoofable, though, so they attempt to limit it to specific proxies whose
> XFF they trust--many of which require logins.  That shifts the information
> about who is editing Wikipedia out of their hands, but leaves it in the
> network and thus not truly anonymous.  I understand the engineering balance
> they are trying to strike, but I'm not sure I can recommend their solution.
>
>
>
> [Med] I’m not recommending their solution either, but I’m trying to raise
> the point that an engineering balance is out there. ACKing that deployment
> reality is better than ignoring it.
>
>
>

The deployment considerations text is meant to point out the engineering
balance.  I'm happy to add the text noted above (on latency, the end user
responsibility for correct data, the PSAP carve out, and the explicit note
that the document does not treat how to obtain consent from a user so that
an end system can supply data).

I'm less happy to add language on adversarial treatment of client-supplied
data.  This is partly because many of the systems which use
network-supplied data are based on a misunderstanding of the properties of
the data being added.  It is partly because the adversarial relationship
can extend to network-supplied data.  It is also because a fair few of them
are simply security theater.  If you have a specific edit you would like to
propose, though, I will consider it.

Thanks again,

Ted