Re: [TLS] About encrypting SNI - Traffic Analysis Attacks?
Tom Ritter <tom@ritter.vg> Thu, 15 May 2014 03:31 UTC
In-Reply-To: <53727940.2070900@nthpermutation.com>
References: <2A0EFB9C05D0164E98F19BB0AF3708C7120A04ED40@USMBX1.msg.corp.akamai.com> <534C3D5A.3020406@fifthhorseman.net> <474FAE5F-DE7D-4140-931E-409325168487@akamai.com> <D2CB0B72-A548-414C-A926-A9AA45B962DA@gmail.com> <2A0EFB9C05D0164E98F19BB0AF3708C7120B490162@USMBX1.msg.corp.akamai.com> <CACsn0cmusUc3Rsb2Wof+dn0PEg3P0bPC3ZdJ75b9kkZ5LDGu_A@mail.gmail.com> <534DB18A.4060408@mit.edu> <CABcZeBOJ7k8Hb9QqCAxJ_uev9g_cb4j361dp7ANvnhOOKsT7NA@mail.gmail.com> <CA+cU71kFo6EihTVUrRRtBYEHbZwCa9nZo-awt4Sub2qXcKHC7g@mail.gmail.com> <CAK3OfOi1x9huaazwcO=d72mfOFuV_RyXnfHmFRduhhbJE2miYw@mail.gmail.com> <CALCETrWukS2QJSb01n7OpXD2iaK43OhZr4E8YZyJ6JaorCdBKw@mail.gmail.com> <CAKC-DJjgFrAmxkC-MsmL+-uRWpN_mDPGkV_g-6DhbVH+69EQEQ@mail.gmail.com> <2A0EFB9C05D0164E98F19BB0AF3708C7130ABEA050@USMBX1.msg.corp.akamai.com> <53725C34.8060105@fifthhorseman.net> <53727940.2070900@nthpermutation.com>
From: Tom Ritter <tom@ritter.vg>
Date: Wed, 14 May 2014 23:31:04 -0400
Message-ID: <CA+cU71nD89Wos7WLBggvo69DWfDKdCOX5_N9wFh3jUFP4an8yg@mail.gmail.com>
To: Michael StJohns <msj@nthpermutation.com>
Content-Type: text/plain; charset="ISO-8859-1"
Archived-At: http://mailarchive.ietf.org/arch/msg/tls/RgasPjSIZ4NH4khPhH5cPQmHA74
Cc: "tls@ietf.org" <tls@ietf.org>
Subject: Re: [TLS] About encrypting SNI - Traffic Analysis Attacks?
I have a lot of points on Encrypted SNI, so I'm looking forward to tomorrow, but I wanted to start putting some into email. I'd like to start with the notion that HTTPS fingerprinting makes it useless.

One of the best surveys of the literature is Roger Dingledine's blog post at https://blog.torproject.org/blog/critique-website-traffic-fingerprinting-attacks. (It does not include the most recent paper "I know why you went to the Clinic" by Miller et al.) There are lots of scenarios for HTTPS fingerprinting, and it's important to note that he concerns himself with Tor's use case: the adversary is trying to do fingerprinting when the client is potentially accessing the entire Internet.

Some of the other scenarios considered in the literature are a) determining which webPAGE is visited in a specific webSITE and b) determining which webPAGE you are visiting among many webPAGES on many (but finite) webSITES.

An argument that Encrypted SNI or Server Certificate (taken together, Encrypted Handshake) is not useful is absolutely correct if there is only one webSITE hosted on an IP address. The IP gives it away, regardless of DNS Privacy, Encrypted Handshake, etc. (so long as the attacker can index the site, and we assume they can).

So I'd like to instead talk about the case when there are multiple webSITEs hosted at an IP, and an attacker wishes to determine which webSITE in a set of webSITEs the user is visiting, because that is the metadata that Encrypted Handshake protects.

Unfortunately, there are no studies that attempt this, of course. But we can still talk about some stuff.

First off, because we assume that the attacker knows all the websites hosted on an IP, it's a closed-world study. That is an advantage to the attacker: closed-world studies are much easier. However, even a 'closed world' is not truly closed. For a true closed world, an attacker must be able to train their classification engine on all the pages in the site.
Dynamic content that changes via authentication, and just general additional data added to the page since classification (think comments, posts, etc.), make this impossible. None of the studies attempt to deal with this problem. The attacker's advantage is not taken away, but it is ever so slightly reduced.

But there are other things that make the closed world less closed: client behavior. Caching, AdBlocking, Plugin Click-To-Play, Third-Party Cookies Enabled/Disabled, Ads, Javascript Disabled and so on. Advantage reduced a little bit again, although Miller attempts to address the caching one.

Next off: False Positives Matter. A Lot. I'm basically just going to point you to the same section in Roger's blog post. Summed up and applied to us, it states that if your goal is to determine if a user is visiting Site A on IP X, instead of Site B, the false positive rate keeps adding up on each page load. You're not going to get a binary answer; you're more likely to get an answer that looks like "Well, we had 8 hits for SiteA, 10 for SiteB, 3 for SiteC". Advantage to the Defender.

Finally, there are defenses a website concerned about this can employ. There are a number of different padding mechanisms for defenders to deploy, collected from many papers and outlined in Section 7 of the Miller paper. All of these defenses can be deployed unilaterally by the server (no client changes needed), inside of the TLS protocol, with the simple ability to insert random padding ignored by the client. I've seen this topic come up before, and I hope people are not opposed to this optional feature being present.

So at this point I'd like to talk about the most recent paper, "I know why you went to the Clinic" by Miller et al. At a high level, they attempt to identify which webPAGE you are viewing inside a known webSITE. Different from our goal, but we're not speaking different languages.
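
To make the false-positive point above concrete, here is a minimal Python sketch. It is only an illustration: the function names (`tally_guesses`, `precision`) are hypothetical, the 8/10/3 tally is the one quoted above, and the rates passed to `precision` are made-up numbers, not results from any paper. It shows that per-page-load guesses produce a ranking with a small margin rather than a yes/no answer, and that a modest false positive rate drags down the fraction of "Site A" calls that are actually correct when Site A visits are rare.

```python
from collections import Counter


def tally_guesses(guesses):
    """Count per-page-load classifier guesses by site, most frequent first."""
    return Counter(guesses).most_common()


def precision(true_positive_rate, false_positive_rate, prior):
    """Fraction of 'target site' verdicts that are correct (Bayes' rule).

    prior is the assumed share of page loads that really go to the target site.
    """
    true_hits = true_positive_rate * prior
    false_hits = false_positive_rate * (1.0 - prior)
    if true_hits + false_hits == 0:
        raise ValueError("classifier never reports the target site")
    return true_hits / (true_hits + false_hits)


# The tally quoted in the text: 8 hits for SiteA, 10 for SiteB, 3 for SiteC.
guesses = ["SiteA"] * 8 + ["SiteB"] * 10 + ["SiteC"] * 3
ranking = tally_guesses(guesses)
(top_site, top_hits), (runner_up, runner_hits) = ranking[0], ranking[1]
margin = top_hits - runner_hits

print(ranking)
print(f"{top_site} leads {runner_up} by only {margin} page loads")

# Illustrative (assumed) rates: 90% true positives, 5% false positives,
# and only 10% of the observed page loads actually going to the target site.
print(round(precision(0.9, 0.05, 0.1), 3))
```

The attacker in this sketch ends up with a two-load margin between the top candidates, which is the kind of non-binary answer described above.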
I'm going to liberally take excerpts from the Miller paper that, in the real world, will reduce their impressive claim of 89% accuracy.

- They don't operate on whole sites; they took a subset of the site, 500 'labels' (unique pages), followed redirections, and ran it off that
- They didn't operate on single loads; they visited 75 pages in a session, and each page they collected 16 times.
- They discover and remark that "caching significantly decreases the number of unique packet sizes observed for samples of a given label" and that "A reduction in the number of unique packet sizes reduces the number of non-zero features and creates difficulty in distinguishing samples."
  - said another way: caching makes it harder
- They also find that visiting webPAGEs on a webSITE (as opposed to different webSITEs) causes decreased traffic volume, which also makes it harder. This is our scenario.
- Their accuracy went from 72% to 89% by using a Hidden Markov Model. While this is not bad, strictly speaking, it assumes that the user follows the link structure of the site strictly. No bookmarks. They also don't factor in the possibility of 'Back' buttons (although I suspect most websites that link A->B also link B->A, so it's probably accounted for most of the time).
- They used 500 labels for their subset; the 4 sites whose redirections ballooned that up (to ~1000 labels) had the worst accuracy. Logical, but nice to see confirmation that the larger the site, the worse the accuracy.
- They do not factor in browser differences, OS differences, or user configuration of the browser
- They present a defense that takes their own accuracy from 89% to 27%, with only 9% traffic overhead

Perhaps most unrealistic: they assume users browsing with a single tab and easily delineated page requests. No mashing the refresh button, no multiple tabs, no opening in a background tab, no background tab that's doing AJAX polling, etc.
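
The observation above that fewer unique packet sizes means fewer distinguishing features also suggests why padding helps. The following Python sketch is an assumption-laden illustration, not the defense evaluated in the Miller paper and not any real TLS mechanism: `pad_to_bucket`, `padding_overhead`, the 512-byte bucket, and the sample sizes are all hypothetical. It rounds each payload length up to a bucket boundary, then reports how many distinct sizes remain and what the padding costs in extra bytes.

```python
def pad_to_bucket(length, bucket=512):
    """Round a payload length up to the next multiple of bucket bytes."""
    if length < 0 or bucket <= 0:
        raise ValueError("length must be >= 0 and bucket must be > 0")
    return -(-length // bucket) * bucket  # ceiling division, then scale back up


def padding_overhead(lengths, bucket=512):
    """Extra bytes sent due to padding, as a fraction of the original bytes."""
    original = sum(lengths)
    if original == 0:
        raise ValueError("need at least one non-empty payload")
    padded_total = sum(pad_to_bucket(n, bucket) for n in lengths)
    return (padded_total - original) / original


# Hypothetical payload sizes an observer might see for one page load.
observed = [1300, 1420, 300, 2900]
padded = [pad_to_bucket(n) for n in observed]

print("distinct sizes before:", len(set(observed)))
print("distinct sizes after: ", len(set(padded)))
print("overhead:", round(padding_overhead(observed), 3))
```

A larger bucket collapses more sizes together at a higher byte cost; the numbers here say nothing about how much any real classifier's accuracy would drop.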
There are some very sexy presentations on HTTPS Traffic Fingerprinting, probably the sexiest of which is watching you load Google Map Tiles over HTTPS, and detecting where you're zooming in on. But that presentation, and the papers, all assume a very unrealistic operating environment. Which is not to denigrate them; I think they're generally excellent work and foundations for further research. But you can't point to them and say the situation is hopeless.

At this point, I'm actually pretty optimistic that attacks in the real world will be quite difficult. Trying to defeat Encrypted Handshake using HTTPS Traffic Analysis has a lot of things going against it:

1) The adversary's determination of Site A vs Site B is made significantly more difficult by cumulative false positives
2) The greater the number of sites hosted at an address, the more difficult the adversary's job becomes.
3) The attacker has to do a considerable amount of work to train their classifier for the specific user preferences that match the user they're attacking
4) And factor in the current cache state of the user
5) A site can actively try to defend against it, and we have indications this would be very effective

To put a point on active defenses: I think a lot of people assume that would never happen, but Twitter is currently padding profile images to resist traffic analysis. In the future, I expect more sites will start to care, and more people will ask them to.

One of the things I think would be awesome is a research project on this exact problem. I think that a CDN could provide accurate numbers for the characteristics and numbers of sites hosted on an IP address, which would be much better than arbitrarily chosen numbers. However, negative results (that is, results that say "We couldn't get good accuracy") often are not published, so we'd want to see those too ;)

-tom