Re: [privacydir] Sean Turner's Discuss on draft-ietf-ipfix-anon-05: (with DISCUSS and COMMENT)

Sean Turner <turners@ieca.com> Thu, 20 January 2011 13:25 UTC

I logged a lengthy DISCUSS against this draft based on Nick's comments.
Here are the diffs from the old version:

Diff from previous version:
http://tools.ietf.org/rfcdiff?url2=draft-ietf-ipfix-anon-06

I haven't yet had a chance to review the changes to confirm that they 
actually addressed the comments.  I did however want everyone to see 
that there's been some impact.  Thanks for the review Nick.

spt

On 1/20/11 5:52 AM, Brian Trammell wrote:
> Hi, Sean, all,
>
> We have just posted an -06 revision of the ipfix-anon draft, which we believe addresses all these issues.
>
> Best regards,
>
> Brian
>
> On Jan 6, 2011, at 5:01 PM, Sean Turner wrote:
>
>> Sean Turner has entered the following ballot position for
>> draft-ietf-ipfix-anon-05: Discuss
>>
>> When responding, please keep the subject line intact and reply to all
>> email addresses included in the To and CC lines. (Feel free to cut this
>> introductory paragraph, however.)
>>
>> Please refer to http://www.ietf.org/iesg/statement/discuss-criteria.html
>> for more information about IESG DISCUSS and COMMENT positions.
>>
>>
>>
>> ----------------------------------------------------------------------
>> DISCUSS:
>> ----------------------------------------------------------------------
>>
>> #1) General
>>
>> The discussion and definition of 'anonymization' here may be the one more usually used when referring to data flows, but there are larger fields at work here that it might be a good idea to harmonize with.  The idea of deleting fields or aggregating records in a data set is more closely associated with the topic of inference control (see also work on "k-anonymity").  For these, the Inference Control subsection of Ross Anderson's "Security Engineering" is a reasonably good introduction.  The idea of anonymity in general seems more closely tied to notions of untraceability and unlinkability; for those see Pfitzmann and Hansen "A terminology for talking about privacy by data minimization".  It's no disaster (and certainly not unprecedented!) to have the meaning of "anonymize" be context-dependent, but it _is_ important to say what property you actually mean to achieve thereby.
>>
>> It would also be useful to note that this draft _only_ considers the kind of inference control you can achieve through transformation of individual flow records.  Aggregation-based anonymization is not considered --and not even included in this draft's definition of anonymization!-- even though it provides more robust privacy results.
>>
>> #2) Section 3
>>
>> So it seems that the goal of anonymization, as stated here, is to prevent IP flow data from being traced to the networks, hosts, or users that participated in it.  But this is only one possible goal of these techniques.
>>
>> Other uses of the anonymization techniques in this draft include:
>>
>>    - One-way untraceability.  Under some circumstances, it is fine
>>      to identify the originator/recipient of a given flow, but not
>>      both.  For example, it might be fine to identify users so long
>>      as the services they use can't be inferred, and it might be
>>      fine to identify servers so long as you can't tell who their
>>      users are.
>>
>>    - Resistance to partner profiling.  Even if no particular flow can
>>      be linked to a particular entity, it might be undesirable for
>>      the set of flows as a whole to be useful for statistically
>>      inferring certain properties of networks, hosts, or users.  For
>>      example, even if an attacker can trace no specific flow to
>>      users Alice and Bob with confidence greater than 0.01%, if they
>>      could nevertheless infer that Alice and Bob communicate
>>      regularly with P=99%, Alice and Bob would reasonably consider
>>      their privacy to have been compromised.
>>
>>      For a more rigorous example of an attack that achieves profiling without
>>      tracing specific interactions, see Danezis's Statistical
>>      Disclosure attack.
>>
>>    - Non-observability.  It might be undesirable for a flow or set
>>      of flows to confirm that a particular entity was in fact
>>      present or absent at a given time.
>>
>> Some related desirable properties include:
>>
>>    - Non-linkability.  It might be undesirable for two flows
>>      generated by the same entity to be linked to one another.
>>      Linkability between flows is a strong amplifier for traceability
>>      attacks: if through mischance, misdesign, or external
>>      knowledge, an adversary manages to trace a single flow to one
>>      of its entities, then linkability between flows means that
>>      _all_ of that entity's flows are now traced.
>>
>>      Website Fingerprinting ("Fingerprinting Websites Using Traffic
>>      Analysis", Andrew Hintz, 2002) is an example of a more subtle
>>      attack enabled by flow linkability.
>>
>>    - Correlation resistance.  It might be undesirable for
>>      anonymized flows processed by different IPFIX installations to
>>      be correlated to one another.  For example, suppose that the
>>      same flow is anonymized one way as it travels through network
>>      A, and another way as it travels through network B.  It might
>>      be the case that neither anonymized flow on its own has enough
>>      information to identify a user, but that both flows, taken
>>      together, can identify the user.  If this is so, and an attacker
>>      might see both anonymized flows, then it becomes critical to
>>      ensure that the adversary cannot easily learn that the two
>>      anonymized flows refer to the same flow.
>>
>> I'm not proposing that every IPFIX user should want all of these properties under all circumstances, but without them, the untraceability properties become more fragile and much harder to achieve.
>>
>> It seems that elsewhere in the document, requirements _like_ these are considered, though they're usually not explicit. Instead, the document only says that certain properties that are not themselves identifiers "can be used to identify hosts and users", in some cases without much consideration of how.
>>
>> #3) Section 3
>>
>> (Perhaps this applies better to the security consideration section. Either way, without a discussion of known attacks against entities' privacy, it's hard to have a meaningful discussion of how privacy can be achieved.)
>>
>> Privacy, like security, requires us to consider threat models.  In other words, we need to state our privacy requirements in terms of an attacker's resources, and what counts as a successful attack. The part of this draft that worries me most is that, when discussing "untraceability", it does so with no actual explicit attacker in mind.
>>
>> Because there isn't an explicit threat model or collection of threat models, it's not really possible to say whether some of the attacks and caveats below are really "valid" attacks against their anonymization methods, because "without a threat model, there are no vulnerabilities--only surprising features."
>>
>> For example, some places in the draft discuss "traffic injection" attacks, implying an active attacker.  But other places in the draft claim that anonymization techniques are effective when they _do not_ resist an active attacker.
>>
>> #4) Section 4
>>
>> Throughout this section, it seems potentially misleading to say what various anonymization techniques are "intended to defeat", and so on.  A naive reader could take this to mean that a technique _actually does_ defeat one of these attacks, or that it _actually will_ provide a given degree of privacy, which I think is not what the authors are trying to say.
>>
>> On the other hand, if this draft _is_ trying to say which techniques achieve what, then it needs to be much, much more specific about the threat models and circumstances for which its statements are true.
>>
>> #5) Section 4.1.3
>>
>> We should say something about the security requirements of the permutation function.  There is all the difference in the world between, say, a block cipher and an xor with a known constant, but this section doesn't actually make that distinction.  Below, in section 5.4, the draft says that the permutation and its parameters SHOULD NOT be exported, but that's not quite the same as saying that the permutation SHOULD be hard to invert without knowing its parameters.
>>
>> Similarly, the recommendation to use a hash function can fail badly if the hash function is known to the attacker: it is trivial for the attacker to brute force all IPv4 addresses to deanonymize subjects if a known hash is used.  HMAC with a secret key would be more appropriate.
>>
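A minimal sketch of the hashing point above, in Python. It assumes the "known hash" is unsalted SHA-256 over the dotted-quad string and restricts the search to a single /16 so that it runs quickly. The address, the prefix, the key handling, and all function names are illustrative assumptions, not anything specified by the draft.

```python
import hashlib
import hmac
import ipaddress
import os
from typing import Callable, Optional


def hash_anon(addr: str) -> str:
    """Anonymize with a public, unkeyed hash (the fragile case)."""
    return hashlib.sha256(addr.encode("ascii")).hexdigest()


def hmac_anon(addr: str, key: bytes) -> str:
    """Anonymize with HMAC-SHA256 under a locally held secret key."""
    return hmac.new(key, addr.encode("ascii"), hashlib.sha256).hexdigest()


def brute_force(target: str, network: str,
                anon: Callable[[str], str]) -> Optional[str]:
    """Try every address in `network` until one maps to `target`."""
    for ip in ipaddress.ip_network(network):
        if anon(str(ip)) == target:
            return str(ip)
    return None


def attacker_hmac(addr: str) -> str:
    """The attacker knows the algorithm but has to guess the key."""
    return hmac_anon(addr, b"attacker's guess at the key")


secret_key = os.urandom(32)            # known only to the exporter
observed_hash = hash_anon("192.0.2.77")
observed_hmac = hmac_anon("192.0.2.77", secret_key)

# With a known hash, enumerating the address space recovers the address.
recovered = brute_force(observed_hash, "192.0.0.0/16", hash_anon)

# With HMAC, the same enumeration fails unless the key is also known.
not_recovered = brute_force(observed_hmac, "192.0.0.0/16", attacker_hmac)
```

The same loop over all of IPv4 is only 2^32 hash evaluations, which is why the unkeyed variant offers little protection in practice.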
>> #6) Section 4.3.2
>>
>> You should note that an active attacker who can create recognizable flows can turn an enumerated timestamp dataset into a precision-degraded dataset by periodically injecting a recognizable flow.
>>
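The injection attack on enumerated timestamps can be sketched as follows. This is an illustration under assumed data: the flow names, the 30-second marker interval, and the helper names are all hypothetical. The attacker knows when each marker flow was sent and can recognize it in the exported data, so every other flow is bounded by the markers on either side of it in the enumeration.

```python
import bisect


def enumerate_timestamps(flows):
    """Replace each start time with its rank, keeping only the ordering."""
    ordered = sorted(flows, key=lambda f: f[1])
    return [(name, rank) for rank, (name, _time) in enumerate(ordered)]


def bound_times(enumerated, marker_times):
    """Bound each non-marker flow by the known times of adjacent markers."""
    markers = [(rank, marker_times[name])
               for name, rank in enumerated if name in marker_times]
    marker_ranks = [rank for rank, _ in markers]
    bounds = {}
    for name, rank in enumerated:
        if name in marker_times:
            continue
        i = bisect.bisect_left(marker_ranks, rank)
        lower = markers[i - 1][1] if i > 0 else None
        upper = markers[i][1] if i < len(markers) else None
        bounds[name] = (lower, upper)
    return bounds


# Flows the exporter wants to protect (true times are never exported).
victim_flows = [("a", 12.0), ("b", 47.5), ("c", 95.0)]
# Recognizable flows injected by the attacker every 30 seconds.
marker_times = {"m0": 0.0, "m1": 30.0, "m2": 60.0, "m3": 90.0, "m4": 120.0}

exported = enumerate_timestamps(victim_flows + list(marker_times.items()))
bounds = bound_times(exported, marker_times)
# bounds == {"a": (0.0, 30.0), "b": (30.0, 60.0), "c": (90.0, 120.0)}
```

The resulting precision is set by the attacker's injection interval rather than by the exporter.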
>> #7) Section 4.3.3
>>
>> You should note that adding a uniform random shift is remarkably fragile: if the adversary can identify the correct time for even one flow, he can learn the times for all other flows.  Worse, if datasets are generated continuously, with each one starting right after the previous one finishes, then the attacker who knows the shift for one dataset can place bounds on the shifts for all close-in-time datasets by induction.
>>
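To make the fragility concrete, here is a small sketch assuming a single shift drawn once per dataset and applied to every flow. The timestamps, the one-hour shift range, and the function names are made up for illustration.

```python
import random


def shift_dataset(times, max_shift, rng):
    """Add one uniformly random offset to every timestamp in the dataset."""
    shift = rng.uniform(-max_shift, max_shift)
    return [t + shift for t in times], shift


def recover_times(shifted, known_index, known_true_time):
    """Undo the shift given the true time of a single flow."""
    shift = shifted[known_index] - known_true_time
    return [t - shift for t in shifted]


true_times = [1000.0, 1012.5, 1100.25, 1460.0]
shifted, secret_shift = shift_dataset(true_times, 3600.0, random.Random(7))

# The adversary learns the real time of flow 2 through some side channel.
recovered = recover_times(shifted, known_index=2, known_true_time=1100.25)
```

One known flow is enough because the shift preserves every inter-flow interval exactly.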
>> #8) Sections 5.3 & 5.4
>>
>> Statements to the effect of "Information about [the particular anonymization technique used] SHOULD NOT be exported" are a total violation of Kerckhoffs' principle: the security of a system should depend only on the secrecy of key-like parameters, not on the secrecy of its algorithms.
>>
>> In practice, after all, any competent attacker will know which permutation functions and binning functions are implemented by the popular IPFIX vendors.  Any aspect of permutation/binning which the attacker must not learn needs to be keyed with a secret key that can be changed locally.
>>
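One way to picture "public algorithm, secret key" for an address permutation is a small Feistel network whose round function is HMAC-SHA256. This is only a sketch of the idea: the construction, the round count, and the names are my assumptions, it is not from the draft, and a 32-bit domain with 16-bit halves should not be read as a vetted design.

```python
import hashlib
import hmac
import ipaddress


def _round_value(key: bytes, round_no: int, half: int) -> int:
    """Keyed round function: 16 bits of HMAC-SHA256(key, round || half)."""
    msg = bytes([round_no]) + half.to_bytes(2, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "big")


def permute(addr: int, key: bytes, rounds: int = 4) -> int:
    """Map a 32-bit address to another 32-bit value, one-to-one."""
    left, right = addr >> 16, addr & 0xFFFF
    for round_no in range(rounds):
        left, right = right, left ^ _round_value(key, round_no, right)
    return (left << 16) | right


def unpermute(value: int, key: bytes, rounds: int = 4) -> int:
    """Invert permute(); only possible for a holder of the key."""
    left, right = value >> 16, value & 0xFFFF
    for round_no in reversed(range(rounds)):
        left, right = right ^ _round_value(key, round_no, left), left
    return (left << 16) | right


local_key = b"example key, replace locally"   # illustrative only
addr = int(ipaddress.IPv4Address("192.0.2.77"))
pseudonym = permute(addr, local_key)
round_trip = unpermute(pseudonym, local_key)
```

Publishing this code gives an attacker nothing without `local_key`, and the key can be rotated locally without changing the algorithm, which is the property Kerckhoffs' principle asks for.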
>> #9) Section 9
>>
>> A risk that it could be worthwhile to mention: Frequently, anonymized data will be treated by administrators as "not privacy-sensitive" when in fact it should only be treated as "less privacy-sensitive."  (For examples in other fields, see the results concerning user reidentification from AOL's search terms, or Netflix film queues.)  The anonymization techniques described here do indeed make entities associated with flows harder to trace ... but there is a risk that when they are applied, administrators will treat flow data as "completely safe" when in fact it has only become "less harmful if misused".
>>
>>
>> ----------------------------------------------------------------------
>> COMMENT:
>> ----------------------------------------------------------------------
>>
>> #1) General
>>
>> The authors should decide whether they're going to use American or British spellings.  The draft uses the American spellings for categorize, organize, minimize, behavior, and so on, but unaccountably uses the British "-ise" in anonymise and pseudonymise.  In the research literature, and in the other relevant RFCs, "anonymize" seems to be more popular, but either spelling type is fine so long as it's consistent.
>>
>> #2) Section 1
>>
>> It might be wise to repeat here (or even in the abstract) the note from the Security Considerations section that this draft is only meant to explain how to interchange anonymized data, not to provide any recommendations as to which anonymization techniques to use, or even any guarantee that any particular technique achieves any particular purpose.  Otherwise, it is easy to misread some parts of section 4 as promising that particular techniques will prevent particular attacks, which is not in fact the case for reasonable threat models.
>>
>> #3) Section 4.2
>>
>> Brute-forcing a 48-bit MAC address is harder than brute-forcing a 32-bit IPv4 address, but not out of reach even for a hobbyist.
>>
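A back-of-the-envelope estimate for the two search spaces, as a sketch. The rate of 10^9 guesses per second is an assumed round number for illustration, not a measurement of any particular hardware.

```python
def exhaustive_search_seconds(bits: int, guesses_per_second: float) -> float:
    """Worst-case time to try every value of a `bits`-wide identifier."""
    return 2 ** bits / guesses_per_second


ASSUMED_RATE = 1e9  # guesses per second; an assumption, not a benchmark

ipv4_seconds = exhaustive_search_seconds(32, ASSUMED_RATE)
mac_seconds = exhaustive_search_seconds(48, ASSUMED_RATE)
mac_days = mac_seconds / 86400

print(f"IPv4: {ipv4_seconds:.1f} s, MAC: {mac_days:.1f} days")
```

At that assumed rate the full MAC space takes a few days rather than a few seconds. Since the first 24 bits of a MAC address are a vendor prefix and only a fraction of prefixes are assigned, the effective search space is likely smaller still.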
>> #4) Section 4.3
>>
>> There is existing research on the extent to which the beginning and ending times of related flows can be used to link an anonymized view of a flow to a non-anonymized view of the flow.  Can we add a pointer to Murdoch and Zielinski's "Sampled Traffic Analysis by Internet-Exchange-Level Adversaries" [ http://petworkshop.org/2007/papers/PET2007_preproc_Sampled_traffic.pdf ]?
>>
>> #5) Sections 4.3 & 4.4
>>
>> There is a pretty extensive literature about the extent to which perturbing timing and volume information prevents correlation, linkability, and website fingerprinting. Check out the traffic analysis section of freehaven.net/anonbib, and also check out the literature on "stepping stone detection".
>>
>> The results are unintuitive to many people; in general, to resist correlation and linkability attacks, you need to use perturbations of higher-variance or bins of larger size than many implementors would expect.
>>
>> Seems like there should be a reference added to these.
>
>