[Cfrg] Memory-efficient evaluation of data-independent memory-hard functions

Joel Alwen <jalwen@ist.ac.at> Fri, 12 February 2016 16:12 UTC

To: cfrg@irtf.org
From: Joel Alwen <jalwen@ist.ac.at>
Message-ID: <56BE045F.4060103@ist.ac.at>
Date: Fri, 12 Feb 2016 17:12:15 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Archived-At: <http://mailarchive.ietf.org/arch/msg/cfrg/aV9AWoN1eiIMYLuHPXNFOJ79vYY>
Subject: [Cfrg] Memory-efficient evaluation of data-independent memory-hard functions
Precedence: list

Hi,

I thought some of you might be interested in Jeremiah Blocki and my
recent paper that just appeared on eprint
(http://eprint.iacr.org/2016/115). It complements the recent work on
data-dependent MHF (i.e. scrypt) by instead focusing on data
*independent* MHFs (e.g. Both Catena functions, Argon2i, all three
Balloon Hashing functions, etc.). Some of you may find it relevant to
recent discussion on the list about finding a password-hashing standard.

In a nutshell we develop a new class of parallel evaluation algorithms
for several rather general classes of MHFs (including all those I just
listed) and show that the resulting memory complexity is a fair bit
lower then one might hope for when allowing parallelism (like in an
ASIC). Put differently, we give some better than hoped for
time-memory-trade-offs by using parallelism to evaluate several
instances of the MHF at once resulting in a low "Time X Memory" per
instance. In other words just the kind of algorithm one might find
useful for launching cheap(er) brute-force attacks on these iMHFs.

>From a more theoretical perspective we also show a general lower-bound.
It states that no data-independent MHF (say doing 1-pass over n memory
blocks) can have even only Omega(n^2/log(n)) cumulative
memory-complexity (let alone the Omega(n^2) which people have been
shooting for up till now).

So where do these results leave us? AFAIK the closest we've got to an
iMHF with a proof for large memory-complexity is the iMHF in my paper
with Vladimir Serbinenko. Unfortunately the construction is based on a
recursively built graph. What that boils down to is that if we want to
compute the incoming edges of a node (i.e. which blocks from memory go
into the next call to the compression function H) the straightforward
computation (unravelling the recursion) takes log(n) operations. But
really we want constant time (say a single call to a compression
function + a mod operation as in scrypt) as otherwise the throughput of
any implementation is going to suffer. Also the proof only shows
something like Omega(n^2/log^10(n)) (with probably nasty constants)
which is asymptotically way better then any of above mentioned iMHFs but
actually pretty useless for practical parameters. I think its easy  to
tighten the proof down to something closer to Omega(n^2/log^7(n)) but
thats still pretty unsatisfactory and any way it doesnt help making the
incoming-edges function easier to compute.

Anyway I hope you enjoy the paper and that it provides a bit of fuel for
thought. :-) If anything, IMHO it points out room for a lot more work in
this area (both from a practical and theoretical point of view) before I
would feel confident in proposing a standard for wider adoption.

- Joel




PS. What does the theoretical lower-bound mean for our search for a new
password-hashing standard? What can we still hope for?

Here are a couple thoughts about those questions.

1) Well, we can always just ignore it if we find an iMHF which no one
manages to attack after a convincing amount of effort has been exerted.
After all we know efficient Random Oracles can't exist yet we still
effectively ask that our now hash function candidates to behave like
one. This route does seem to give up on proof based evidence for (at
least asymptotic) security though.

2) We could decide to settle for Omega(n^2/log(n)) assuming we can find
a clean, simple and strongly explicit iMHF with accompanying proof
matching this bound with small constants. (Notice that in the recent
scrypt paper we only show Omega(n^2/log^2(n)) so its not even clear that
a data-*dependent* MHF could break that bound.)

3) Alternatively, if we want to still shoot for a proof of Omega(n^2)
complexity, then one potential way to still get there could be to make
an analysis for concrete practical parameter ranges (rather then
asymptotic behaviour). The large constants in the lower-bound do not
rule this out. (Though the parameters of our evaluation algorithms for
almost all the iMHFs I listed above do rule this out for those
functions.) See figures in the paper...

4) Another less preferable way IMHO would be to increase the number of
inputs to the underlying hash function. (Something like the Single and
Double Buffer balloon hashing functions do as they use 21 inputs.) E.g.
I think the lower-bound lets us have Omega(n^2) memory-hardness if we
use order log(n) (or maybe log^2(n)) inputs to the hash function
(instead of a constant like 2 or 3 as in most current iMHFs). There are
at least two reasons why I'm personally not super in favour of this
approach though. Practically it means worse throughput for an
implementation as evaluating the hash function takes a lot longer now.
Theoretically I'd also call it cheating because it really boils down
making log(n) calls to a compression function (as was pointed out
already in a couple papers on the subject). In particular, really we
should be looking at MHFs defined over compression functions, not
arbitrary input length hashes. But if look at what happens to any proof
of memory-hardness for the high-indegree MHF when view the function as
being over a compression function then AFAIK the proof looses a
multiplicative cubic(!) of the indegree. So if we prove O(n^2) for an
MHF with d inputs to the hash function then I only see how to prove that
the same MHF viewed over the underlying compression function really has
O(n^2/d^3) complexity. (See Alwen Serbinenko for how that works.)


Also I'd love to hear other ideas for how to deal with the lower-bound
if you have any!

[Cfrg] Memory-efficient evaluation of data-indepe… Joel Alwen
Re: [Cfrg] Memory-efficient evaluation of data-in… Paul Grubbs
Re: [Cfrg] Memory-efficient evaluation of data-in… Bill Cox
Re: [Cfrg] Memory-efficient evaluation of data-in… Joel Alwen
Re: [Cfrg] Memory-efficient evaluation of data-in… Bill Cox