Re: [IAB] Call for Comment: 'Privacy Considerations for Internet Protocols'

Alissa Cooper <acooper@cdt.org> Wed, 10 April 2013 12:22 UTC

Subject: Re: [IAB] Call for Comment: 'Privacy Considerations for Internet Protocols'
From: Alissa Cooper <acooper@cdt.org>
In-Reply-To: <5141D8E3.4000605@dcrocker.net>
Date: Wed, 10 Apr 2013 14:21:46 +0200
Content-Transfer-Encoding: quoted-printable
Message-Id: <BB682B03-F1E3-487D-B56D-7BDBEC4EAA78@cdt.org>
References: <5141D8E3.4000605@dcrocker.net>
To: dcrocker@bbiw.net
X-Mailer: Apple Mail (2.1499)
Cc: IAB IAB <iab@iab.org>, draft-iab-privacy-considerations.all@tools.ietf.org, IETF Discussion <ietf@ietf.org>

Hi Dave,

Thanks for your review. Some comments are inline. A pre-publication -08 version is available at <http://www.alissacooper.com/files/draft-iab-privacy-considerations-08.txt>. The diff from the -07 is available at <https://www.cdt.org/Z4Q>. 

On Mar 14, 2013, at 10:04 AM, Dave Crocker <dhc@dcrocker.net> wrote:

> Apologies for my sending this after the deadline.  I hope the comments are still usable...
> 
> 
> 
> Review of:    Privacy Considerations for Internet Protocols
> 
> I-D:          draft-iab-privacy-considerations-07.txt
> 
> Reviewed by:  D. Crocker
> 
> Review date:  14 March 2013
> 
> 
> Summary:
> 
>   The document provides a broad introduction to the needs, nature and details of adding privacy considerations to IETF specifications. Broadly, it is divided into introduction, terminology, generic exposure/analysis model, threats, mitigations, and analysis guidelines.  The document is generally well-organized and written clearly.  An example analysis is provided that concretely demonstrates the approach to doing a considerations analysis; it was intentionally chosen as a difficult case, with inherent tradeoffs between privacy and required functionality.
> 
>   As an introduction to the topic, the document is accessible and practical.
> 
>   A glaring deficiency of the document is its conscious choice to refrain from defining the term 'privacy'.  The choice is understandable, given a long, messy and varying real-world history with the term. However, the reader is left having to formulate their own -- possibly unvoiced and therefore entirely ambiguous -- working definition.  For doing the technical work needed in a specification, this simply does not give the reader the linchpin to the topic, needed to anchor their understanding in a way that will be consistent across authors and readers of specifications.  The draft needs to choose a definition, in spite of the fact that other groups, people and contexts will use other definitions.  We do specifications and this starts with definitions.  It simply makes no sense to be missing a definition for the key word.
> 
> By way of priming that pump, I'll proffer the simplest definition that seems plausible here:
> 
>     Privacy is the concern for protecting information
>     of or about an individual person.
> 
> Tweak this or replace it entirely, but /please/ provide a concrete, pragmatic definition that explicitly defines what is in scope and what is out, for them to focus their considerations on.

This suggestion has been debated at length within the IAB privacy program over the life of this document. Our thinking is that trying to define "privacy" in one sentence would be as counterproductive as trying to define "security" or "extensibility" in one sentence. All of those concepts are rich and nuanced enough to have entire documents dedicated to explaining them as concepts and exploring how those concepts should be tackled in the IETF. That is the purpose of this document. Given the extent to which we outline all the different facets of privacy threats, the feeling is that it would undercut the value of the document to boil privacy down to one sentence. What we want readers to do is take in the nuance and not think of privacy as one box they can simply tick off during the design process.

> 
> Also, given the challenges of this topic and the desire to get useful privacy considerations into IETF work, I suggest creating a privacy directorate, which can be asked to assist authors and review their work.  Think of it as a topic-specific mentoring group…

This has been tried before and did not work out so well <http://www.ietf.org/mail-archive/web/privacydir/current/maillist.html>, but there is some talk of trying again.

> 
> Except for the requirement to define its motivating term, the draft is usable in its current form, although a number of specific improvements cited in the detailed comments are recommended.

Great!

> 
> 
> 
> Detailed Comments:
> 
> The following comments are left raw, written as I read the draft...
> 
> 
>> Abstract
>> 
>>   This document offers guidance for developing privacy considerations
>>   for inclusion in protocol specifications.  It aims to make protocol
>>   designers aware of privacy-related design choices.  It suggests that
>>   whether any individual RFC warrants a specific privacy considerations
>>   section will depend on the document's content.
> 
> Given the degree of ambiguity in the word 'privacy' -- since there is such a wide range of definitions people assign it, as noted in the second paragraph of the Introduction -- the Abstract needs to provide a summary of its definition here, so that the reader can understand the focus and scope of the term's use in this document.  The definitional text needs to refrain from using the word 'privacy' as part of the definition…

Per my comment above, it is probably better to avoid stating this as concisely as would be necessary in the abstract.

> 
> 
>> 1. Introduction
>> 
>> 
>>   [RFC3552] provides detailed guidance to protocol designers about both
>>   how to consider security as part of protocol design and how to inform
>>   readers of protocol specifications about security issues.  This
>>   document intends to provide a similar set of guidance for considering
>>   privacy in protocol design.
>> 
>>   Privacy is a complicated concept with a rich history that spans many
>>   disciplines.  With regard to data, often it is a concept applied to
> 
> "With regard to data" implies that it could be with regard to something else.  What?

Peeping toms, for example. In many circles a distinction is made between "data protection" and "privacy," which can comprise aspects of personal intrusion that are not associated with stored or transmitted data.

> 
> 
>>   "personal data," information relating to an identified or
>>   identifiable individual.  Many sets of privacy principles and privacy
>>   design frameworks have been developed in different forums over the
>>   years.  These include the Fair Information Practices [FIPs], a
>>   baseline set of privacy protections pertaining to the collection and
>>   use of personal data (often based on the principles established in
>>   [OECD], for example), and the Privacy by Design concept, which
>>   provides high-level privacy guidance for systems design (see [PbD]
>>   for one example).  The guidance provided in this document is inspired
>>   by this prior work, but it aims to be more concrete, pointing
>>   protocol designers to specific engineering choices that can impact
>>   the privacy of the individuals that make use of Internet protocols.
>> 
>>   Different people have radically different conceptions of what privacy
>>   means, both in general, and as it relates to them personally
>>   [Westin].  Furthermore, privacy as a legal concept is understood
>>   differently in different jurisdictions.  The guidance provided in
>>   this document is generic and can be used to inform the design of any
>>   protocol to be used anywhere in the world, without reference to
>>   specific legal frameworks.
>> 
>>   Whether any individual document warrants a specific privacy
>>   considerations section will depend on the document's content.
>>   Documents whose entire focus is privacy may not merit a separate
> 
> OK.  Enough is enough.  It's fine to have a quick survey of earlier work, but that's not sufficient.
> 
> You keep using the word privacy, and I don't know what you mean.
> 
> The typical writer and reader of RFCs is not experienced in the topic of privacy.  They won't know what you mean either:  they need very concrete guidance about the word's meaning.
> 
> Telling me that different people mean different things with the term merely assures me that I have no idea what /you/ mean unless you tell me.  Having each reader make guesses about the meaning is a way to ensure non-interoperability of the construct.

Per my note above, the expectation is that the document as a whole will provide a rich explanation of what is meant by privacy. Of course, some people won't read the whole document, or even parts of it, but others will, and hopefully more so over time.

> 
> Guidance can't be very helpful if the reader has no idea when to apply it.

If the reader is unsure about whether to go through the thought process outlined in section 7, there is no harm (other than the use of the reader's time) in doing it and then finding out that a particular specification is already solidly designed when it comes to privacy.

> 
> 
>>   section (for example, "Private Extensions to the Session Initiation
>>   Protocol (SIP) for Asserted Identity within Trusted Networks"
>>   [RFC3325]).  For certain specifications, privacy considerations are a
>>   subset of security considerations and can be discussed explicitly in
> 
> I strongly suggest that any explicit privacy discussion be required to be entirely separate from the 'security considerations' section.
> 
> My reasoning is simple:  This community sees 'security' in terms of encryption and signing, traffic analysis, and other such mechanical, relatively low-level components.  Privacy is an entirely different and broader and more human beast, even when its details devolve to these familiar mechanics.
> 
> At the least, making it a separate section will help writers and readers to distinguish privacy from the security stuff we are used to seeing discussed.

I think sections 4 and 7 demonstrate that privacy and security are interrelated at least in some respects. 

There are two motivations for suggesting that privacy can be incorporated into security considerations in some cases. First, in a way we are trying to key off of the familiarity that people already have with security, and asking them to expand their security thinking a bit might be an easier sell than making a whole new/separate requirement. Second, it is a means to avoid duplication. It already happens that when authors insert a separate privacy considerations section it ends up making a bunch of references to the security considerations section. We don't want to recommend a document structure that will just end up seeming extraneous.

> 
> 
>>   the security considerations section.  Some documents will not require
>>   discussion of privacy considerations (for example, "Definition of the
>>   Opus Audio Codec" [RFC6716]).  The guidance provided here can and
>>   should be used to assess the privacy considerations of protocol,
>>   architectural, and operational specifications and to decide whether
>>   those considerations are to be documented in a stand-alone section,
>>   within the security considerations section, or throughout the
>>   document.
> 
> Not sure whether this is a question or a suggestion; if it's the latter, I'm not sure what to suggest:  privacy issues often develop as a combinatorial problem -- 'correlation' as you note farther down -- that is, developing out of unpredicted integration of information from discrete services.  While any specific IETF specification might have its own, direct privacy issues needing consideration, where should these combinatorial dangers be discussed?

That is a good question and I'm not sure I know the answer. Of course there is nothing to prevent people from writing drafts about the privacy considerations associated with the combination of discrete services/protocols. 

> 
> 
>> 2. Terminology
>> 
>> 
>>   This section defines basic terms used in this document, with
>>   references to pre-existing definitions as appropriate.  As in
>>   [RFC4949], each entry is preceded by a dollar sign ($) and a space
>>   for automated searching.  Note that this document does not try to
>>   attempt to define the term 'privacy' itself.  Instead privacy is the
>>   sum of what is contained in this document.  We therefore follow the
>>   approach taken by [RFC3552].
> 
> Sorry.  Not workable, if you want meaningful consideration by authors and meaningful understanding by readers.

See above.

> 
> 
>> 
>> 2.1. Entities
>> 
>> 
>>   Several of these terms are further elaborated in Section 3.
>> 
>>   $ Attacker:   An entity that intentionally works against some privacy
>>      protection goal.  Unlike observers, attackers' behavior is
>>      unauthorized.
> 
> This precludes accidental privacy violations?

Fixed.

> 
> 
>> 
>>   $ Eavesdropper:   A type of attacker that passively observes an
>>      initiator's communications without the initiator's knowledge or
>>      authorization.  See [RFC4949].
>> 
>>   $ Enabler:   A protocol entity that facilitates communication between
>>      an initiator and a recipient without being directly in the
>>      communications path.
> 
> For example…?

This is elaborated in section 3.

> 
> 
>> 2.3. Identifiability
> ...
> 
>>   $ Personal Name:   A natural name for an individual.  Personal names
>>      are often not unique, and often comprise given names in
>>      combination with a family name.  An individual may have multiple
>>      personal names at any time and over a lifetime, including official
>>      names.  From a technological perspective, it cannot always be
>>      determined whether a given reference to an individual is, or is
>>      based upon, the individual's personal name(s) (see Pseudonym).
> 
> Official Names also are typically not unique.

Added a note to this effect.

> 
> 
>>   $ Pseudonym:   A name assumed by an individual in some context,
>>      unrelated to the individual's personal names known by others in
>>      that context, with an intent of not revealing the individual's
>>      identities associated with her other names.
> 
> (Might be worth mentioning that this is sometimes called "persona".)
> 
> Pseudonyms also often are not unique.
> 
> My point is that it's good that you mentioned this issue and should repeat it for each term to which it applies.

Fixed.

> 
> 
>> 3. Communications Model
>> 
>> 
>>   To understand attacks in the privacy-harm sense, it is helpful to
>>   consider the overall communication architecture and different actors'
>>   roles within it.  Consider a protocol entity, the "initiator," that
>>   initiates communication with some recipient.  Privacy analysis is
>>   most relevant for protocols with use cases in which the initiator
>>   acts on behalf of an individual (or different individuals at
>>   different times).  It is this individual whose privacy is potentially
>>   threatened.
> 
> If I receive a credit dunning notice or a legal notification, I'm the recipient, but unauthorized disclosure of such messages would be privacy-harm for me.  It isn't just initiator-side individuals.

Fixed.

> 
> 
>> 
>>   Communications may be direct between the initiator and the recipient,
>>   or they may involve an application-layer intermediary (such as a
>>   proxy or cache) that is necessary for the two parties to communicate.
> 
> proxy or cache -> proxy, cache or (mail) relay

Fixed.

> 
> 
>>   In some cases this intermediary stays in the communication path for
>>   the entire duration of the communication and sometimes it is only
>>   used for communication establishment, for either inbound or outbound
>>   communication.  In rare cases there may be a series of intermediaries
> 
> For email, it isn't rare at all.  In fact, it's universal, probably for literally every email sent.

Fixed.

> 
> 
>>   that are traversed.  At lower layers, additional entities are
>>   involved in packet forwarding that may interfere with privacy
>>   protection goals as well.
> ...
> 
>>   Protocol design is often predicated on the notion that recipients,
>>   intermediaries, and enablers are assumed to be authorized to receive
>>   and handle data from initiators.  As [RFC3552] explains, "we assume
>>   that the end-systems engaging in a protocol exchange have not
>> 
>> 
>> 
>> Cooper, et al.           Expires August 27, 2013               [Page 10]
>> 
>> 
>> Internet-Draft           Privacy Considerations            February 2013
>> 
>> 
>>   themselves been compromised."  However, by its nature privacy
> 
> which nature?
> 
> seriously, how is the reader to know (or even guess) what exactly is being implied?

Fixed.

> 
> 
>>   analysis requires questioning this assumption since systems are often
>>   compromised for the purpose of obtaining personal data.
>> 
>>   Although recipients, intermediaries, and enablers may not generally
>>   be considered as attackers, they may all pose privacy threats
>>   (depending on the context) because they are able to observe, collect,
> 
> exactly!
> 
> 
> 
>> 4. Privacy Threats
> ...
> 
>>   This section lists common privacy threats (drawing liberally from
>>   [Solove], as well as [CoE]), showing how each of them may cause
>>   individuals to incur privacy harms and providing examples of how
>>   these threats can exist on the Internet.
>> 
>>   Some privacy threats are already considered in IETF protocols as a
> 
> cite some examples.

These are explained throughout section 4. This is just the introductory text to the section.

> 
> 
>>   matter of routine security analysis.  Others are more pure privacy
> 
> What does it mean to be a "more pure privacy threat"?  Really, I can't guess.

Same as above -- this is explained throughout section 4.

> 
> 
>>   threats that existing security considerations do not usually address.
>>   The threats described here are divided into those that may also be
>>   considered security threats and those that are primarily privacy
>>   threats.
>> 
>>   Note that an individual's awareness of and consent to the practices
>>   described below may change an individual's perception of and concern
>>   for the extent to which they threaten privacy.  If an individual
>>   authorizes surveillance of his own activities, for example, the
>>   individual may be able to take actions to mitigate the harms
>>   associated with it, or may consider the risk of harm to be tolerable.
>> 
>> 4.1. Combined Security-Privacy Threats
> 
> The fact that you have a string like "Combined Security-Privacy" supports the view that Privacy Considerations is distinct from Security and should not be in the Security Considerations section…

Actually I think it's the opposite. Section 4 makes concrete distinctions between privacy threats that are already commonly covered by security considerations and those that are not.

> 
> 
> 
>> 4.1.4. Misattribution
>> 
>> 
>>   Misattribution occurs when data or communications related to one
>>   individual are attributed to another.  Misattribution can result in
>>   adverse reputational, financial, or other consequences for
>>   individuals that are misidentified.
> 
> It's probably worth mentioning that for spam, this is often called "spoofing".

Fixed.

> 
> 
> 
>> 5.1. Data Minimization
>> 
> ...
> 
>>   However, the most direct application of data minimization to protocol
>>   design is limiting identifiability.  Reducing the identifiability of
>>   data by using pseudonyms or no identifiers at all helps to weaken the
>>   link between an individual and his or her communications.  Allowing
>>   for the periodic creation of new identifiers reduces the possibility
> 
> also randomization of chosen identifiers

Fixed.
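
To make the identifier point concrete, here is a minimal sketch (not taken from the draft) of an identifier that is chosen at random and replaced periodically. The class name RotatingIdentifier, the max_age_seconds parameter, and the injectable clock are all assumptions made for illustration; a real protocol would also have to decide how peers learn about the new value.

```python
import secrets
import time


class RotatingIdentifier:
    """Hypothetical helper: issues a random identifier and replaces it
    once it is older than max_age_seconds."""

    def __init__(self, max_age_seconds, clock=time.monotonic):
        self.max_age_seconds = max_age_seconds
        self._clock = clock  # injectable so the behavior can be tested
        self._rotate()

    def _rotate(self):
        # 128 random bits: the value carries no information about the
        # user or device, unlike a name or hardware address.
        self._value = secrets.token_hex(16)
        self._issued_at = self._clock()

    def current(self):
        # Replace the identifier once it has outlived its lifetime, so
        # interactions before and after the change are harder to link.
        if self._clock() - self._issued_at >= self.max_age_seconds:
            self._rotate()
        return self._value
```

A shorter lifetime reduces how much activity can be tied to one identifier, at the cost of more state churn for whatever has to track the current value.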

> 
> 
> 
>> 5.2. User Participation
>> 
>> 
>>   As explained in Section 4.2.5, data collection and use that happens
>>   "in secret," without the individual's knowledge, is apt to violate
>>   the individual's expectation of privacy and may create incentives for
>>   misuse of data.  As a result, privacy regimes tend to include
>>   provisions to require informing individuals about data collection and
>>   use and involving them in decisions about the treatment of their
>>   data.  In an engineering context, supporting the goal of user
>>   participation usually means providing ways for users to control the
>>   data that is shared about them.  It may also mean providing ways for
>>   users to signal how they expect their data to be used and shared.
> 
> There is a serious downside to this.  It presumes that this burden on users is reasonable.  For many scenarios, it isn't.  Rather, the focus on user participation is often used as an alternative to the difficult work (or research) on mechanisms that require less user participation.

I agree that sole reliance on user participation is undesirable. It's listed here as one of several protections, so sole reliance is not implied. For protocol design I would actually argue that we tend not to think about user participation enough (whereas I agree that privacy policy tends to focus on it too much).

> 
> 
> 
> 
>> 6. Scope of Privacy Implications of Internet Protocols
>> 
>> 
>>   Internet protocols are often built flexibly, making them useful in a
>>   variety of architectures, contexts, and deployment scenarios without
>>   requiring significant interdependency between disparately designed
>>   components.  Although protocol designers often have a particular
>>   target architecture or set of architectures in mind at design time,
>>   it is not uncommon for architectural frameworks to develop later,
>>   after implementations exist and have been deployed in combination
>>   with other protocols or components to form complete systems.
> 
> Independent of the purpose of this draft, the above paragraph is quite a nice bit of text about an aspect of IETF technical work.

Thanks.

> 
> 
>>   As a consequence, the extent to which protocol designers can foresee
>>   all of the privacy implications of a particular protocol at design
>>   time is limited.  An individual protocol may be relatively benign on
>>   its own, and it may make use of privacy and security features at
>>   lower layers of the protocol stack (Internet Protocol Security,
>>   Transport Layer Security, and so forth) to mitigate the risk of
>>   attack.  But when deployed within a larger system or used in a way
>>   not envisioned at design time, its use may create new privacy risks.
>>   Protocols are often implemented and deployed long after design time
>>   by different people than those who did the protocol design.  The
>>   guidelines in Section 7 ask protocol designers to consider how their
>>   protocols are expected to interact with systems and information that
>>   exist outside the protocol bounds, but not to imagine every possible
>>   deployment scenario.
>> 
>>   Furthermore, in many cases the privacy properties of a system are
>>   dependent upon the complete system design where various protocols are
>>   combined together to form a product solution; the implementation,
>>   which includes the user interface design; and operational deployment
>>   practices, including default privacy settings and security processes
>>   within the company doing the deployment.  These details are specific
>>   to particular instantiations and generally outside the scope of the
>>   work conducted in the IETF.  The guidance provided here may be useful
>>   in making choices about these details, but its primary aim is to
>>   assist with the design, implementation, and operation of protocols.
> 
> Perhaps the largest challenge I repeatedly see in the IETF is what I call "systems thinking", which is considering an integrated set of components and their interactions.  The above three paragraphs very nicely target exactly that scope of concern, in the context of privacy.
> 
> So I /strongly/ suggest you move the three paragraphs up to the Introduction.  Note that this would largely resolve the concern I raised there, that the Introduction really doesn't introduce cross-component (multi-specification) scoping issues for privacy.  Add a citation in it to this section.

This section has been moved to directly after the introduction.

> 
> 
>>   Transparency of data collection and use -- often effectuated through
>>   user interface design -- is normally a key factor in determining the
> 
> I realize that's a common view, but has it been validated or is it merely the default perspective that user permission solves everything?

Good point. I've added some text to indicate that this is what often happens, whether rightly or wrongly.

> 
> 
>>   privacy impact of a system.  Although most IETF activities do not
>>   involve standardizing user interfaces or user-facing communications,
>>   in some cases understanding expected user interactions can be
>>   important for protocol design.  Unexpected user behavior may have an
>>   adverse impact on security and/or privacy.
> 
> While a generically reasonable view, the challenge with its application in the IETF is our general tendency to think that we understand UI and UX issues, although few in the IETF actually have the background for it.  For example we tend to think that simply giving users more information is a universal palliative.  Most discussions here about "expected user interactions" are simply wrong.  Worse, I've no idea what to suggest to counter this for the draft.

Yeah, I'm not sure the draft can fix this problem. But I agree that it's a problem.

> 
> 
>> 7. Guidelines
>> 
>> 
>>   This section provides guidance for document authors in the form of a
>>   questionnaire about a protocol being designed.  The questionnaire may
>>   be useful at any point in the design process, particularly after
>>   document authors have developed a high-level protocol model as
>>   described in [RFC4101].
>> 
>>   Note that the guidance does not recommend specific practices.  The
>>   range of protocols developed in the IETF is too broad to make
>>   recommendations about particular uses of data or how privacy might be
>>   balanced against other design goals.  However, by carefully
>>   considering the answers to each question, document authors should be
>>   able to produce a comprehensive analysis that can serve as the basis
>>   for discussion of whether the protocol adequately protects against
>>   privacy threats.
> 
> For some years after Security Considerations were made mandatory, authors mostly floundered with the topic, given their/our lack of background for assessing security considerations.  Eventually there was IETF focus on making the section useful.
> 
> While this draft goes a long way to making the nature and requirements of a Privacy Considerations section substantive, it's going to be some time before the community develops helpful skills at writing these sections.
> 

Agree.

> I suggest setting up a Privacy Directorate, essentially as a consulting/review service for authors to use in developing their text for the section in their documents.  The Directorate might also take initiative at reviewing new documents.

Perhaps this can be resurrected.

> 
> 
>>   The framework is divided into four sections that address each of the
>>   mitigation classes from Section 5, plus a general section.  Security
>>   is not fully elaborated since substantial guidance already exists in
>>   [RFC3552].
>> 
>> 7.1. Data Minimization
>> 
>> 
>>      a.  Identifiers.  What identifiers does the protocol use for
>>      distinguishing initiators of communications?  Does the protocol
>>      use identifiers that allow different protocol interactions to be
>>      correlated?  What identifiers could be omitted or be made less
>>      identifying while still fulfilling the protocol's goals?
> 
> I'd think that retention of recipient identifiers might also be an issue?

This is covered in 7.1g.

> 
> 
>>      b.  Data.  What information does the protocol expose about
>>      individuals, their devices, and/or their device usage (other than
>>      the identifiers discussed in (a))?  To what extent is this
>>      information linked to the identities of the individuals?  How does
>>      the protocol combine personal data with the identifiers discussed
>>      in (a)?
>> 
>>      c.  Observers.  Which information discussed in (a) and (b) is
>>      exposed to each other protocol entity (i.e., recipients,
>>      intermediaries, and enablers)?  Are there ways for protocol
>>      implementers to choose to limit the information shared with each
>>      entity?  Are there operational controls available to limit the
>>      information shared with each entity?
>> 
>>      d.  Fingerprinting.  In many cases the specific ordering and/or
>>      occurrences of information elements in a protocol allow users,
>>      devices, or software using the protocol to be fingerprinted.  Is
>>      this protocol vulnerable to fingerprinting?  If so, how?  Can it
>>      be designed to reduce or eliminate the vulnerability?  If not, why
>>      not?
>> 
>>      e.  Persistence of identifiers.  What assumptions are made in the
>>      protocol design about the lifetime of the identifiers discussed in
>>      (a)?  Does the protocol allow implementers or users to delete or
>>      replace identifiers?  How often does the specification recommend
>>      to delete or replace identifiers by default?  Can the identifiers,
>>      along with other state information, be set to automatically
>>      expire?
>> 
>>      f.  Correlation.  Does the protocol allow for correlation of
>>      identifiers?  Are there expected ways that information exposed by
> 
> Is it productive to also look for 'unexpected' ways?  This could be a silly and wasteful exercise, or thinking creatively about strange combinations might trigger better insight.  I've no direct experience, so can't judge.

I think eventually that might be something we want to load onto protocol designers, but at this point even just scoping this to expected ways would be helpful IMO.
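
As a rough illustration of the "expected" kind of correlation that question 7.1f asks about, the sketch below joins records from two logs on a shared identifier. The function name correlate, the log layout, and the sample records are invented for this example and do not come from the draft or from any real protocol.

```python
from collections import defaultdict


def correlate(*logs):
    """Group observations from several logs by the identifier they carry.

    Each log is a list of (identifier, observation) pairs.
    """
    linked = defaultdict(list)
    for log in logs:
        for identifier, observation in log:
            linked[identifier].append(observation)
    # Only identifiers seen more than once link separate observations.
    return {ident: obs for ident, obs in linked.items() if len(obs) > 1}


# Hypothetical logs kept by two unrelated services.
service_a = [("id-42", "looked up clinic hours"), ("id-7", "checked weather")]
service_b = [("id-42", "sent location update")]

profile = correlate(service_a, service_b)
```

Here only "id-42" appears in both logs, so it is the only identifier for which the two services' observations can be combined into a single profile.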

> 
> 
>> 8. Example
> ...
> 
>>   The fundamental architecture defined in RFC 2778 and RFC 3859 is a
>>   mediated one.  Clients (presentities in RFC 2778 terms) publish their
>>   presence information to presence servers, which in turn distribute
>>   information to authorized watchers.  Presence servers thus retain
>>   presence information for an interval of time, until it either changes
>>   or expires, so that it can be revealed to authorized watchers upon
>>   request.  This architecture mirrors existing pre-standard deployment
>>   models.  The integration of an explicit authorization mechanism into
>>   the presence architecture has been widely successful in involving the
>>   end users in the decision making process before sharing information.
>>   Nearly all presence systems deployed today provide such a mechanism,
>>   typically through a reciprocal authorization system by which a pair
>>   of users, when they agree to be "buddies," consent to divulge their
>>   presence information to one another.  Buddylists are managed by
>>   servers but controlled by end users.  Users can also explicitly block
>>   one another through a similar interface, and in some deployments it
>>   is desirable to provide "polite blocking" of various kinds.
> 
> As the discussion moves into the details of analyzing each type of privacy concern, I suggest making the format be bulleted and/or tabular.  This will make each segment of analysis more accessible to the reader and easier to correlate with the lists of privacy concerns/attributes provided earlier in the document.  It will also aid scanning for review and later consultation.

The goal of this section was to review how privacy decisions were made within the confines of one example architecture. In the IAB privacy program we plan to try applying the guidance to some other existing protocols/architectures, in more formal write-ups of the kind you suggest (likely selected from the reviews we've already done: <http://www.iab.org/activities/programs/privacy-program/privacy-reviews/>).

> 
> 
>>   From a perspective of privacy design, however, the classical presence
>>   architecture represents nearly a worst-case scenario.  In terms of
>>   data minimization, presentities share their sensitive information
>>   with presence services, and while services only share this presence
>>   information with watchers authorized by the user, no technical
>>   mechanism constrains those watchers from relaying presence to further
> 
> Offhand, I don't know what mechanisms are practical to impose such a constraint, in a protocol specification.  It would help to see an example.

I don't think the text implies that such mechanisms exist.

> 
> 
>>   third parties.  Any of these entities could conceivably log or retain
>>   presence information indefinitely.  The sensitivity cannot be
>>   mitigated by rendering the user anonymous, as it is indeed the
>>   purpose of the system to facilitate communications between users who
>>   know one another.  The identifiers employed by users are long-lived
>>   and often contain personal information, including personal names and
>>   the domains of service providers.  While users do participate in the
>>   construction of buddylists and blacklists, they do so with little
>>   prospect for accountability: the user effectively throws their
>>   presence information over the wall to a presence server that in turn
>>   distributes the information to watchers.  Users typically have no way
>>   to verify that presence is being distributed only to authorized
>>   watchers, especially as it is the server that authenticates watchers,
>>   not the end user.  Connections between the server and all publishers
>>   and consumers of presence data are moreover an attractive target for
>>   eavesdroppers, and require strong confidentiality mechanisms, though
>>   again the end user has no way to verify what mechanisms are in place
>>   between the presence server and a watcher.
> 
> Again, what would be realistic choices for fixing this?  (It's possible that there aren't any and that privacy considerations would merely need to document an inherent and unfixable exposure.  In terms of guidance to writers of privacy considerations, that's ok, but it's worth making this point clear.)

As noted above, this was meant as a review of how the architecture was conceived at the time it was designed.

> 
> ...
> 
>>   Privacy concerns about presence information largely arise due to the
>>   built-in mediation of the presence architecture.  The need for a
>>   presence server is motivated by two primary design requirements of
>>   presence: in the first place, the server can respond with an
>>   "offline" indication when the user is not online; in the second
>>   place, the server can compose presence information published by
>>   different devices under the user's control.  Additionally, to
>>   preserve the use of URIs as identifiers for entities, some service
> 
> "preserve"?

Fixed.

> 
> 
>>   must operate a host with the domain name appearing in a presence URI,
>>   and in practical terms no commercial presence architecture would
>>   force end users to own and operate their own domain names.  Many end
>>   users of applications like presence are behind NATs or firewalls, and
>>   effectively cannot receive direct connections from the Internet - the
>>   persistent bidirectional channel these clients open and maintain with
>>   a presence server is essential to the operation of the protocol.
> 
> So?  I'm not understanding what makes this a privacy issue.
> 

This explains why the mediated model was chosen.

> 
>>   One must first ask if the trade-off of mediation for presence is
>>   worth it.  Does a server need to be in the middle of all publications
> 
>   worth it -> worthwhile.

Fixed.

> 
> 
>>   of presence information?  It might seem that end-to-end encryption of
>>   the presence information could solve many of these problems.  A
> 
> Not as described:  You'd still have mediation.  That is, the solution you offer does not answer the question you ask.
> 
> I think you mean to ask whether the intermediary needs to see all presence information in the clear.  If you really intend to suggest that an intermediary isn't needed, then you need to describe a scenario without one.

I think mediation is understood broadly here -- not as the question of whether an intermediary exists, but whether it actually mediates all aspects of the interaction.

> 
> 
>>   presentity could encrypt the presence information with the public key
>>   of a watcher, and only then send the presence information through the
>>   server.  The IETF defined an object format for presence information
>>   called the Presence Information Data Format (PIDF), which for the
>>   purposes of conveying location information was extended to the PIDF
>>   Location Object (PIDF-LO) - these XML objects were designed to
>>   accommodate an encrypted wrapper.  Encrypting this data would have
>>   the added benefit of preventing stored cleartext presence information
>>   from being seized by an attacker who manages to compromise a presence
>>   server.  This proposal, however, quickly runs into usability
>>   problems.  Discovering the public keys of watchers is the first
>>   difficulty, one that few Internet protocols have addressed
>>   successfully.  This solution would then require the presentity to
>>   publish one encrypted copy of its presence information per authorized
>>   watcher to the presence service, regardless of whether or not a
>>   watcher is actively seeking presence information - for a presentity
>>   with many watchers, this may place an unacceptable burden on the
>>   presence server, especially given the dynamism of presence
>>   information.  Finally, it prevents the server from composing presence
>>   information reported by multiple devices under the same user's
>>   control.  On the whole, these difficulties render object encryption
>>   of presence information a doubtful prospect.
>> 
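
To put a rough number on the "one encrypted copy per authorized watcher" burden described above, here is a minimal sketch. It only counts objects uploaded by the presentity; it does no actual encryption, and the names (PublicationLoad, publication_load) and the example figures are made up for illustration, not taken from the draft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PublicationLoad:
    """Objects a presentity uploads under two publication models."""
    mediated_cleartext: int      # one object per update; the server fans out
    per_watcher_encrypted: int   # one object per update per authorized watcher


def publication_load(updates: int, watchers: int) -> PublicationLoad:
    """Count uploads for `updates` presence changes and `watchers` watchers."""
    if updates < 0 or watchers < 0:
        raise ValueError("updates and watchers must be non-negative")
    return PublicationLoad(
        mediated_cleartext=updates,
        per_watcher_encrypted=updates * watchers,
    )


# Hypothetical figures: 200 presence changes, 150 authorized watchers.
load = publication_load(updates=200, watchers=150)
print(load.mediated_cleartext)      # 200
print(load.per_watcher_encrypted)   # 30000
```

The multiplication applies whether or not any watcher is actually asking for presence at the time, which is the point the quoted text makes.
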
>>   Some protocols that provide presence information, such as SIP, can
> 
> hmmm.  I didn't think that SIP, itself, provided presence information...?  SIMPLE uses SIP, but it isn't SIP doing the presence work.

Fixed.

> 
> 
>>   operate intermediaries in a redirecting mode, rather than a
>>   publishing or proxying mode.  Instead of sending presence information
>>   through the server, in other words, these protocols can merely
>>   redirect watchers to the presentity, and then presence information
>>   could pass directly and securely from the presentity to the watcher.
>>   It is worth noting that this would disclose the IP address of the
>>   presentity to the watcher, which has its own set of risks.  In that
>>   case, the presentity can decide exactly what information it would
>>   like to share with the watcher in question, it can authenticate the
>>   watcher itself with whatever strength of credential it chooses, and
>>   with end-to-end encryption it can reduce the likelihood of any
>>   eavesdropping.  In a redirection architecture, a presence server
>>   could still provide the necessary "offline" indication, without
>>   requiring the presence server to observe and forward all information
>>   itself.  This mechanism is more promising than encryption, but also
>>   suffers from significant difficulties.  It too does not provide for
>>   composition of presence information from multiple devices - it in
>>   fact forces the watcher to perform this composition itself.  The
>>   largest single impediment to this approach is however the difficulty
>>   of creating end-to-end connections between the presentity's device(s)
>>   and a watcher, as some or all of these endpoints may be behind NATs
>>   or firewalls that prevent peer-to-peer connections.  While there are
>>   potential solutions for this problem, like STUN and TURN, they add
>>   complexity to the overall system.
> 
> Given the pragmatics, I'm surprised you'd call this 'promising'.

It's phrased in relative terms: more promising than encryption.

> 
> 
>> 
>>   Consequently, mediation is a difficult feature of the presence
>>   architecture to remove, and due especially to the requirement for
>>   composition it is hard to minimize the data shared with
>>   intermediaries.  Control over sharing with intermediaries must
>>   therefore come from some other explicit component of the
>>   architecture.  As such, the presence work in the IETF focused on
>>   improving the user participation over the activities of the presence
>>   server.  This work began in the GEOPRIV working group, with controls
>>   on location privacy, as location of users is perceived as having
>>   especially sensitive properties.  With the aim to meet the privacy
>>   requirements defined in [RFC2779] a set of usage indications, such as
>>   whether retransmission is allowed or when the retention period
>>   expires, have been added to PIDF-LO that always travel with location
>>   information itself.  These privacy preferences apply not only to the
>>   intermediaries that store and forward presence information, but also
>>   to the watchers who consume it.
>> 
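
To make the two usage indications mentioned above concrete (whether retransmission is allowed, and when the retention period expires), here is a minimal sketch of how a recipient might consult them. The UsageRules class and the two helper functions are hypothetical names for illustration; this is not the PIDF-LO XML format or any real library API, and nothing here enforces the rules. A recipient has to choose to honor them.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class UsageRules:
    """Hypothetical in-memory view of the two usage indications."""
    retransmission_allowed: bool
    retention_expiry: datetime  # timezone-aware


def may_forward(rules: UsageRules, now: datetime) -> bool:
    """A recipient may relay the object only if allowed and not yet expired."""
    return rules.retransmission_allowed and now < rules.retention_expiry


def must_discard(rules: UsageRules, now: datetime) -> bool:
    """A recipient should stop retaining the object once the period ends."""
    return now >= rules.retention_expiry


# Example values are assumptions chosen for illustration.
rules = UsageRules(
    retransmission_allowed=False,
    retention_expiry=datetime(2013, 8, 27, tzinfo=timezone.utc),
)
now = datetime(2013, 4, 10, tzinfo=timezone.utc)
print(may_forward(rules, now))   # False
print(must_discard(rules, now))  # False
```

Because the rules travel with the object, the same checks would apply to an intermediary and to a watcher alike.
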
>>   This approach very much follows the spirit of Creative Commons [CC],
>>   namely the usage of a limited number of conditions (such as 'Share
>>   Alike' [CC-SA]).  Unlike Creative Commons, the GEOPRIV working group
>>   did not, however, initiate work to produce legal language nor to
>>   design graphical icons since this would fall outside the scope of the
> 
> hmmm.  This raises a possible issue with finding and liaising with other groups relevant to privacy and with complementary skills.  So, for example, here's a case of needing work to aid privacy that was identified but needed to be handed off to another group.
> 
> Lining up such contacts ahead of time could be a useful bit of work for a privacy directorate?

Perhaps. It would depend on the nature of the use cases envisioned for the particular privacy preference expressions being built into protocols.

Thanks,
Alissa

> 
> 
> 
> d/
> 
> 
> -- 
> Dave Crocker
> Brandenburg InternetWorking
> bbiw.net
>