[Cfrg] Memory-efficient evaluation of data-independent memory-hard functions

Joel Alwen <jalwen@ist.ac.at> Fri, 12 February 2016 16:12 UTC

Return-Path: <jalwen@ist.ac.at>
X-Original-To: cfrg@ietfa.amsl.com
Delivered-To: cfrg@ietfa.amsl.com
Received: from localhost (ietfa.amsl.com [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 5CB911A21AF for <cfrg@ietfa.amsl.com>; Fri, 12 Feb 2016 08:12:23 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -0.433
X-Spam-Level:
X-Spam-Status: No, score=-0.433 tagged_above=-999 required=5 tests=[BAYES_50=0.8, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HELO_EQ_AT=0.424, HOST_EQ_AT=0.745, RCVD_IN_DNSWL_MED=-2.3, RP_MATCHES_RCVD=-0.001, SPF_PASS=-0.001] autolearn=ham
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Hm7sW0OQE7E1 for <cfrg@ietfa.amsl.com>; Fri, 12 Feb 2016 08:12:20 -0800 (PST)
Received: from mx2.ist.ac.at (mx2.ist.ac.at [193.170.62.130]) by ietfa.amsl.com (Postfix) with ESMTP id 667081A21A8 for <cfrg@irtf.org>; Fri, 12 Feb 2016 08:12:18 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ist.ac.at; i=@ist.ac.at; q=dns/txt; s=ist2009; t=1455293540; x=1486829540; h=to:from:subject:message-id:date:mime-version: content-transfer-encoding; bh=0sTHRvFoxmbZK6F4QSeqeqPrwg61nF1PmRAJ1ck/b8U=; b=abeh3EKrxOoQnCtQYsoVkld8IbnGlHDrM69REYQ9QIx8nF0lbwhjmsJk f7O87BIutt90/vK16IMs99xXbiF+hGvVt8Ts0w+09E7L9yNkGOXC9GZsF BSB8G4gIGckqsNISF6MW0rZyooMFWYZB3bpBCF1cjPwU70umf42Sle/tp w0kOmOmORyPEE/pQF+NBUKSIC1orJUnVza+DDaJqSjM0gh8u/Q1PHdSi0 QyFIxa0V/roeZDPMcMuaH7aeTi3clGSPZEqVp0xjsgRpW45F7x6kd4rqO 8GfsutaH4ieOSleteHYtmxKJVAOEcAnaIjGw68PhRsFT0T2/zUskGM35E w==;
X-IronPort-AV: E=Sophos;i="5.22,436,1449529200"; d="scan'208";a="3196842"
Received: from lserv46.ista.local ([10.15.21.55]) by ironport2-intern.ista.local with ESMTP; 12 Feb 2016 17:12:17 +0100
Received: from sslmail1.ist.ac.at (sslmail1.ista.local [10.15.21.69]) by lserv46.ista.local (8.14.4/8.14.4/Debian-4) with ESMTP id u1CGCFpc024993 for <cfrg@irtf.org>; Fri, 12 Feb 2016 17:12:16 +0100
Received: from [10.1.48.150] (fw.ist.ac.at [193.170.138.132]) (authenticated bits=0) by sslmail1.ist.ac.at (8.14.4/8.14.4/Debian-4) with ESMTP id u1CGCF9f020831 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES128-SHA bits=128 verify=NOT) for <cfrg@irtf.org>; Fri, 12 Feb 2016 17:12:15 +0100
To: cfrg@irtf.org
From: Joel Alwen <jalwen@ist.ac.at>
X-Enigmail-Draft-Status: N1110
Message-ID: <56BE045F.4060103@ist.ac.at>
Date: Fri, 12 Feb 2016 17:12:15 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Archived-At: <http://mailarchive.ietf.org/arch/msg/cfrg/aV9AWoN1eiIMYLuHPXNFOJ79vYY>
Subject: [Cfrg] Memory-efficient evaluation of data-independent memory-hard functions
X-BeenThere: cfrg@irtf.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: Crypto Forum Research Group <cfrg.irtf.org>
List-Unsubscribe: <https://www.irtf.org/mailman/options/cfrg>, <mailto:cfrg-request@irtf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/cfrg/>
List-Post: <mailto:cfrg@irtf.org>
List-Help: <mailto:cfrg-request@irtf.org?subject=help>
List-Subscribe: <https://www.irtf.org/mailman/listinfo/cfrg>, <mailto:cfrg-request@irtf.org?subject=subscribe>
X-List-Received-Date: Fri, 12 Feb 2016 16:12:23 -0000

Hi,

I thought some of you might be interested in Jeremiah Blocki and my
recent paper that just appeared on eprint
(http://eprint.iacr.org/2016/115). It complements the recent work on
data-dependent MHF (i.e. scrypt) by instead focusing on data
*independent* MHFs (e.g. Both Catena functions, Argon2i, all three
Balloon Hashing functions, etc.). Some of you may find it relevant to
recent discussion on the list about finding a password-hashing standard.

In a nutshell we develop a new class of parallel evaluation algorithms
for several rather general classes of MHFs (including all those I just
listed) and show that the resulting memory complexity is a fair bit
lower then one might hope for when allowing parallelism (like in an
ASIC). Put differently, we give some better than hoped for
time-memory-trade-offs by using parallelism to evaluate several
instances of the MHF at once resulting in a low "Time X Memory" per
instance. In other words just the kind of algorithm one might find
useful for launching cheap(er) brute-force attacks on these iMHFs.

>From a more theoretical perspective we also show a general lower-bound.
It states that no data-independent MHF (say doing 1-pass over n memory
blocks) can have even only Omega(n^2/log(n)) cumulative
memory-complexity (let alone the Omega(n^2) which people have been
shooting for up till now).

So where do these results leave us? AFAIK the closest we've got to an
iMHF with a proof for large memory-complexity is the iMHF in my paper
with Vladimir Serbinenko. Unfortunately the construction is based on a
recursively built graph. What that boils down to is that if we want to
compute the incoming edges of a node (i.e. which blocks from memory go
into the next call to the compression function H) the straightforward
computation (unravelling the recursion) takes log(n) operations. But
really we want constant time (say a single call to a compression
function + a mod operation as in scrypt) as otherwise the throughput of
any implementation is going to suffer. Also the proof only shows
something like Omega(n^2/log^10(n)) (with probably nasty constants)
which is asymptotically way better then any of above mentioned iMHFs but
actually pretty useless for practical parameters. I think its easy  to
tighten the proof down to something closer to Omega(n^2/log^7(n)) but
thats still pretty unsatisfactory and any way it doesnt help making the
incoming-edges function easier to compute.

Anyway I hope you enjoy the paper and that it provides a bit of fuel for
thought. :-) If anything, IMHO it points out room for a lot more work in
this area (both from a practical and theoretical point of view) before I
would feel confident in proposing a standard for wider adoption.

- Joel




PS. What does the theoretical lower-bound mean for our search for a new
password-hashing standard? What can we still hope for?

Here are a couple thoughts about those questions.

1) Well, we can always just ignore it if we find an iMHF which no one
manages to attack after a convincing amount of effort has been exerted.
After all we know efficient Random Oracles can't exist yet we still
effectively ask that our now hash function candidates to behave like
one. This route does seem to give up on proof based evidence for (at
least asymptotic) security though.

2) We could decide to settle for Omega(n^2/log(n)) assuming we can find
a clean, simple and strongly explicit iMHF with accompanying proof
matching this bound with small constants. (Notice that in the recent
scrypt paper we only show Omega(n^2/log^2(n)) so its not even clear that
a data-*dependent* MHF could break that bound.)

3) Alternatively, if we want to still shoot for a proof of Omega(n^2)
complexity, then one potential way to still get there could be to make
an analysis for concrete practical parameter ranges (rather then
asymptotic behaviour). The large constants in the lower-bound do not
rule this out. (Though the parameters of our evaluation algorithms for
almost all the iMHFs I listed above do rule this out for those
functions.) See figures in the paper...

4) Another less preferable way IMHO would be to increase the number of
inputs to the underlying hash function. (Something like the Single and
Double Buffer balloon hashing functions do as they use 21 inputs.) E.g.
I think the lower-bound lets us have Omega(n^2) memory-hardness if we
use order log(n) (or maybe log^2(n)) inputs to the hash function
(instead of a constant like 2 or 3 as in most current iMHFs). There are
at least two reasons why I'm personally not super in favour of this
approach though. Practically it means worse throughput for an
implementation as evaluating the hash function takes a lot longer now.
Theoretically I'd also call it cheating because it really boils down
making log(n) calls to a compression function (as was pointed out
already in a couple papers on the subject). In particular, really we
should be looking at MHFs defined over compression functions, not
arbitrary input length hashes. But if look at what happens to any proof
of memory-hardness for the high-indegree MHF when view the function as
being over a compression function then AFAIK the proof looses a
multiplicative cubic(!) of the indegree. So if we prove O(n^2) for an
MHF with d inputs to the hash function then I only see how to prove that
the same MHF viewed over the underlying compression function really has
O(n^2/d^3) complexity. (See Alwen Serbinenko for how that works.)


Also I'd love to hear other ideas for how to deal with the lower-bound
if you have any!