Re: [TLS] A closer look at ROBOT, BB Attacks, timing attacks in general, and what we can do in TLS

Peter Gutmann <pgut001@cs.auckland.ac.nz> Wed, 20 December 2017 15:37 UTC

Return-Path: <pgut001@cs.auckland.ac.nz>
X-Original-To: tls@ietfa.amsl.com
Delivered-To: tls@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 8149B1270A7 for <tls@ietfa.amsl.com>; Wed, 20 Dec 2017 07:37:53 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -4.21
X-Spam-Level:
X-Spam-Status: No, score=-4.21 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, RCVD_IN_DNSWL_MED=-2.3, T_RP_MATCHES_RCVD=-0.01] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=auckland.ac.nz
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id JI66dnHpWGgH for <tls@ietfa.amsl.com>; Wed, 20 Dec 2017 07:37:50 -0800 (PST)
Received: from mx4.auckland.ac.nz (mx4.auckland.ac.nz [130.216.125.248]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 3E21C126CD8 for <tls@ietf.org>; Wed, 20 Dec 2017 07:37:49 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=auckland.ac.nz; i=@auckland.ac.nz; q=dns/txt; s=mail; t=1513784270; x=1545320270; h=from:to:subject:date:message-id:references:in-reply-to: content-transfer-encoding:mime-version; bh=dccg8mUdJVSgpPk5lZuTXRyCkeo+Tp5x6aLeYlT6tuw=; b=oWPPNy4NYZA6CxKq8pWux6SmUT1Hl9UGhn677Hx2dtqkFtB2FN6AzKV+ FaIqODHwyBiZKnYx9eNeQP5e6fv4Zw1a50Nb92agDJNRm+w8SF/gZKZWd xQDf1OVHafL9QhhNZGwGnW0l66qhdcNHKyNo8W3x6hbNmkEG7+Dm+9YTA lN1BVthepJ3KkMa9nWq/pl9n3VJlOKA7q3S1+cyNj2bgs5eceBdfucp1f 6QJntfu6JZJiForeU3f59Wi8mfp1bPPLEPchi7pgM+nD59DvipgbG7H7E kFpo/5oVNrE69sfSUADHJQV3AUlDrEAzGxCeabqA/wb/944avRXUmDkBV g==;
X-IronPort-AV: E=Sophos;i="5.45,432,1508756400"; d="scan'208";a="204884956"
X-Ironport-HAT: MAIL-SERVERS - $RELAYED
X-Ironport-Source: 10.6.2.2 - Outgoing - Outgoing
Received: from uxcn13-ogg-a.uoa.auckland.ac.nz ([10.6.2.2]) by mx4-int.auckland.ac.nz with ESMTP/TLS/AES256-SHA; 21 Dec 2017 04:37:47 +1300
Received: from uxcn13-ogg-d.UoA.auckland.ac.nz (10.6.2.5) by uxcn13-ogg-a.UoA.auckland.ac.nz (10.6.2.22) with Microsoft SMTP Server (TLS) id 15.0.1263.5; Thu, 21 Dec 2017 04:37:46 +1300
Received: from uxcn13-ogg-d.UoA.auckland.ac.nz ([10.6.2.25]) by uxcn13-ogg-d.UoA.auckland.ac.nz ([10.6.2.25]) with mapi id 15.00.1263.000; Thu, 21 Dec 2017 04:37:46 +1300
From: Peter Gutmann <pgut001@cs.auckland.ac.nz>
To: Colm MacCárthaigh <colm@allcosts.net>, "tls@ietf.org" <tls@ietf.org>
Thread-Topic: [TLS] A closer look at ROBOT, BB Attacks, timing attacks in general, and what we can do in TLS
Thread-Index: AQHTdSndiFmdFk2VuUqeUDLf4NX5NqNMZdeu
Date: Wed, 20 Dec 2017 15:37:46 +0000
Message-ID: <1513784265636.74656@cs.auckland.ac.nz>
References: <CAAF6GDeeo2xjv1Xu7SFXVZ_zM=XUVJHT=eqH4_-G3+4UHsfvgg@mail.gmail.com>
In-Reply-To: <CAAF6GDeeo2xjv1Xu7SFXVZ_zM=XUVJHT=eqH4_-G3+4UHsfvgg@mail.gmail.com>
Accept-Language: en-NZ, en-GB, en-US
Content-Language: en-NZ
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-ms-exchange-transport-fromentityheader: Hosted
x-originating-ip: [130.216.158.4]
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Archived-At: <https://mailarchive.ietf.org/arch/msg/tls/yDxU0utYBNNExqGHUonBU7PR2TU>
Subject: Re: [TLS] A closer look at ROBOT, BB Attacks, timing attacks in general, and what we can do in TLS
X-BeenThere: tls@ietf.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: "This is the mailing list for the Transport Layer Security working group of the IETF." <tls.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tls>, <mailto:tls-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tls/>
List-Post: <mailto:tls@ietf.org>
List-Help: <mailto:tls-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tls>, <mailto:tls-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 20 Dec 2017 15:37:53 -0000

Colm MacCárthaigh <colm@allcosts.net> writes:

>since it's important not to leak timing, we use a special
>"constant_time_copy_or_dont" routine to do the over-write:
>
>https://github.com/awslabs/s2n/blob/master/utils/s2n_safety.c#L70

Has anyone actually been able to measure the impact of issues like this?  I
know about papers like "Remote timing attacks are practical" that work with
things like Montgomery modmult, but can much be practically done with a few
instructions here and there, not repeated in a loop like a modmult?

The reason I ask is a combination of two things, firstly I've done timing
measurements on my own code to try and detect differences in behaviour and
couldn't find anything among the general noise of code execution, and secondly
any constant-time code is a tradeoff between readability/auditability and
(near) constant-time operation ("near" because you have no control over what a
compiler will do to optimise your code, so it may not be anywhere near
constant-time once it's in binary form).  Something like
s2n_constant_time_copy_or_dont() is barely passable, but I've seen constant-
time PKCS #1 decoding and similar that's essentially incomprehensible, there's
no way to look at it and see that it's correct.  Even something as simple as:

https://blog.cloudflare.com/yet-another-padding-oracle-in-openssl-cbc-ciphersuites/

had errors, because the constant-time code is so difficult to audit.

>Next, when the handshake does fail, we do two non-standard things. The first
>is that we don't return an alert message, we just close the connection. 

That's actually really annoying, because it implies something other than a TLS
crypto problem, e.g. a misconfigured firewall or who knows what.  I don't know
how diversely-used s2n is, in other words whether you can assume the clients
and servers employed will know how to react to this, but in general use it'll
make getting things working a major pain since it changes the error indicator
from "the TLS handshake failed due to crypto issues" to "something failed
somewhere, probably at the network level".

>The second non-standard thing we do is that in all error cases, s2n behaves
>as if something suspicious is going on and in case timing is involved, we add
>a random delay. It's well known that random delays are only partially
>effective against timing attacks, but we add a very very big one. We wait a
>random amount of time between a minimum of 10 seconds, and a maximum of 30
>seconds. With the delay being granular to nanoseconds. 

I do the same thing, but I use 1s rather than 10 (although I've just bumped
this to 3 in the latest code):

/* Delay by a small random amount, used to dither the results of (failed) 
   crypto operations to make timing attacks harder.
   
   As with anything involving randomness, there's been a lot of smoke and 
   not much light generated over the use of random delays to address timing 
   attacks.  The generic answer is that it doesn't help so there's no point 
   in doing it and everyone should just write constant-time code, another 
   prime example of le mieux est l'ennemi du bien.

   If the arguments ever get beyond abstract theory (theoretical constant-
   time code is better than theoretical random delays), the argument given 
   is usually that we're measuring timing differences in us or ns while the 
   delay functions are typically in ms.  Another argument is that all an 
   attacker has to do is repeat the measurements in order to filter out the 
   delays.

   The counterargument for the former is that if you've got an attacker 
   sitting on your local LAN segment making ns-scale timing measurements on 
   your crypto then you've got bigger things to worry about than side-
   channel attacks (over more distant connections, you can at best get tens 
   to hundreds of us using thousands of measurements, see e.g. 
   "Opportunities and Limits of Remote Timing Attacks" by Crosby, Reid, and 
   Wallach, which also required multiple days of data-gathering beforehand 
   to characterise the network).  In addition in cryptlib's case since all 
   operations go via the kernel, you don't have direct (timing) access to 
   the low-level crypto ops but only get strongly dithered results at that 
   level.

   As a result, while you can still use repeated measurements to eventually
   filter out the dithering, it's now made the attacker's job much, much 
   harder.  Since it's essentially free (it's only invoked if an operation 
   fails), it makes sense to add the delay.

   Another issue is how long to make the delay.  Obviously the longer, the
   better, however if we make it too long then it can end up looking like a
   different problem, e.g. a networking issue rather than a crypto defence,
   which will be a pain to diagnose for users.  3s seems to be a good tradeoff
   between dithering operations and slowing down an attacker while not adding
   an excessive delay that looks like it's caused by something else */

Again, it's for ease of deployment, a 1s (or 3s) delay won't get noticed while
a 10s delay looks like a network problem, and 30s looks like a major issue
somewhere.

>The solutions here typically are not small routines that are rigorously
>constant time, but instead rely on a degree of code-balancing: taking code
>paths that are "similar enough" in timing so as not to be measurable.

Yup, that's what I try and do.

>trying to be rigorously constant time on large complicated algorithms that
>were not designed to be constant time can seriously backfire by introducing
>logic errors in the very hard-to-follow constant-time adaptations.

You don't even need large complicated algorithms, you can create logic errors
in a few lines of code, as your cited example shows.  That's why I prefer
obvious correctness (or probably-correctness :-) and defences in other places.

Peter.