Re: [Cfrg] Balloon-Hashing or Argon2i.

Good work, we greatly appreciate the new results and ongoing public
scrutiny of Argon2.

I have not checked the details of improvements yet, but I feel that one of
conclusions is incorrect.

The authors say  "More generally, the parameter selection process in the
proposal is flawed in that it tends towards producing parameters for which
the attack is successful". Concretely, for 1 GB on Figure 1 we see that the
attack on 1-pass Argon2i has quality 5, 3-pass -- quality 3, 4-pass --
quality 2.5, 6-pass -- quality 2, 10 passes -- quality 1.5.

The IRTF proposal and the Argon2 spec recommend the practitioners to select
the maximum allowed amount of memory first, and then increase the number of
passes until it is too long.

Why so? Suppose for the sake of simplicity that 1 pass of Argon2 can fill 5
GB of RAM per second, and you have 1 second to spend on hashing. How to
choose the parameters to maximize the adversary's costs? If we set 10
passes, only 0.5 GB of RAM will be filled, so a naive ASIC implementation
would cost 0.5 GB-sec in area-time units, but the AB attack reduces it to
0.33 GB-sec. However, if 4 passes are selected, then 1.25 GB can be
utilized, the naive implementation would cost 1.25 GB-sec, and the
optimized one -- 0.5 GB-sec, i.e. more costs than the 10-pass one. For 3
passes the adversary would spend 0.55 GB-sec, and for 1 pass - 1 GB-sec (!).

Therefore more passes actually benefit the adversary, that's why we
proposed the procedure above. To compare the security of different number
of passes, not the attack quality but quality*passes must be used.

Dmitry

On Wed, Aug 10, 2016 at 3:55 PM, Paul Grubbs <pag225@cornell.edu> wrote:

> Relevant to this discussion:
>
> Towards Practical Attacks on Argon2i and Balloon Hashing, Alwen and Blocki
> http://eprint.iacr.org/2016/759.pdf
>
> Abstract:
>
> The algorithm Argon2i-B of Biryukov, Dinu and Khovratovich is currently
> being considered by the IRTF (Internet Research Task Force) as a new
> de-facto standard for password hashing. An older version (Argon2i-A) of the
> same algorithm was chosen as the winner of the recent Password Hashing
> Competition. An important competitor to Argon2i-B is the recently
> introduced Balloon Hashing (BH) algorithm of Corrigan-Gibs, Boneh and
> Schechter. A key security desiderata for any such algorithm is that
> evaluating it (even using a custom device) requires a large amount of
> memory amortized across multiple instances. Alwen and Blocki (CRYPTO 2016)
> introduced a class of theoretical attacks against Argon2i-A and BH. While
> these attacks yield large asymptotic reductions in the amount of memory, it
> was not, a priori, clear if (1) they can be extended to the newer
> Argon2i-B, (2) the attacks are effective on any algorithm for practical
> parameter ranges (e.g., 1GB of memory) and (3) if they can be effectively
> instantiated against any algorithm under realistic hardware constrains. In
> this work we answer all three of these questions to the affirmative for all
> three algorithms. It is also the first work to analyze the security of
> Argon2i-B. In more detail, we extend the theoretical attacks of Alwen and
> Blocki (CRYPTO 2016) to the recent Argon2i-B proposal demonstrating severe
> asymptotic deficiencies in its security. Next we introduce several novel
> heuristics for improving the attack’s concrete memory efficiency even when
> on-chip memory bandwidth is bounded. We then simulate our attacks on
> randomly sampled Argon2i-A, Argon2iB and BH instances and measure the
> resulting memory consumption for various practical parameter ranges and for
> a variety of upperbounds on the amount of parallelism available to the
> attacker. Finally we describe, implement and test a new heuristic for
> applying the Alwen-Blocki attack to functions employing a technique
> developed by Corrigan-Gibs et al. for improving concrete security of
> memory-hard functions. We analyze the collected data and show the effects
> various parameters have on the memory consumption of the attack. In
> particular, we can draw several interesting conclusions about the level of
> security provided by these functions.
> • For the Alwen-Blocki attack to fail against practical memory parameters,
> Argon2i-B must be instantiated with more than 10 passes on memory. The
> current IRTF proposal calls even just 6 passes as the recommended
> “paranoid” setting.
> • More generally, the parameter selection process in the proposal is
> flawed in that it tends towards producing parameters for which the attack
> is successful (even under realistic constraints on parallelism).
> • The technique of Corrigan-Gibs for improving security can also be
> overcome by the Alwen-Blocki attack under realistic hardware constraints.
> • On a positive note, both the asymptotic and concrete security
> of Argon2i-B seem to improve on that of Argon2i-A.
>
> On Wed, Jun 1, 2016 at 11:01 AM, Dmitry Khovratovich <
> khovratovich@gmail.com> wrote:
>
>> Hi Henry,
>>
>> Both for Balloon and for Argon2 there are now provable bounds of O(n^2)
>> with different constants, not very surprising since the mode of operation
>> is very simple (Argon is close to the simplest you can imagine and Baloon
>> is close to just Argon2 with 3 addresses instead of 1).
>>
>> Argon2 is close to the fastest you can achieve since it was purposefully
>> fine-tuned for speed vs memory gains of attacker. If Balloon or other
>> design will want to approach similar speeds, design and parameter choices
>> will bring it very close to Argon2 design.
>>
>> The Balloon mode though admits a simple ASIC attack with factor 3: a 3x
>> faster memory bus allows parallel loading of your 3 random blocks, whereas
>> on x64 you have to load them (almost) sequentially. This is much more
>> practical attack than theoretical attacks on Argon2i requiring thousands of
>> Blake2b cores on chip and terabyte/sec bandwidth. We knew of it while
>> designing Argon2 and thus made delta=1.
>>
>> For fair comparison with Argon2 and all the other PHC primitives you need
>> to fix concrete parameters and  compression function and test
>> speed/security on this fixed instance.
>>
>> Best regards,
>> Dmitry
>>
>> On Tue, May 31, 2016 at 5:57 PM, Henry Corrigan-Gibbs <
>> henrycg@stanford.edu> wrote:
>>
>>> Hi Dmity,
>>>
>>> Thanks for taking a look at the paper.
>>>
>>> For the purposes of the security proofs, we model the cryptographic
>>> compression function used in Argon2, Balloon, and scrypt, as a random
>>> oracle (as is standard). In practice, it is possible to instantiate Argon2i
>>> or Balloon or scrypt with any cryptographic compression function that
>>> "behaves like a random oracle." We are agnostic to the choice of
>>> cryptographic compression function -- we just evaluated our design with a
>>> few standard ones.
>>>
>>> My sense is that Balloon (instantiated with parameter delta=3) and
>>> Argon2i will run at roughly the same speed, no matter the choice of
>>> cryptographic compression function. That's because three-pass Argon2i on N
>>> blocks of memory makes roughly 3N calls to the underlying compression
>>> function during its execution. Two-round Balloon (the equivalent
>>> construction, since it makes three passes over memory) makes roughly 7N
>>> calls to the underlying compression function during its execution.
>>>
>>> However, the known attacks on three-pass Argon2i allow the attacker to
>>> use roughly 2.5x less space than the defender with no time penalty, so to
>>> force the attacker to use N blocks of space requires the defender to run
>>> Argon2i with a ~2.5x larger memory buffer (i.e., a buffer of N' = 2.5N
>>> blocks). In this setting, both functions would require ~7N invocations of
>>> the underlying cryptographic compression function, so the performance
>>> should be roughly equivalent for the two.
>>>
>>> Changing the underlying cryptographic compression function will change
>>> the absolute performance of both Argon2i and Balloon, but I would expect
>>> their relative performance to stay close to constant. Indeed, if I
>>> instantiate Argon2i and Balloon with the compression function defined in
>>> the Argon2 spec, then I get, on an Intel i7-6700 @ 3.4 GHz:
>>>   Argon2i (2.5 MB buffer): 153 hashes/sec
>>>   Balloon (1.0 MB buffer): 167 hashes/sec
>>>
>>> The surprise to us is that getting this sort of performance with Balloon
>>> is still possible, even though we are able to prove some meaningful
>>> statements about its security. (Of course, the Alwen-Blocki attacks still
>>> apply for large enough parameter sizes.)
>>>
>>> Henry
>>>
>>>
>>>
>>> On 5/31/16 7:02 AM, Dmitry Khovratovich wrote:
>>>
>>>> Dear Henry,
>>>>
>>>> thank you for the updated paper.
>>>>
>>>> We appreciate your work on lower-bounding the Argon2i tradeoff
>>>> resistance, it is a very interesting result, and we hope to mention it
>>>> in the specification. However for your proof to work you instantiate
>>>> Argon2  with a collision resistant hash function. During the competition
>>>> it became clear that collision-resistance of the inner function is not
>>>> needed in practice, since the attacker does not control intermediate
>>>> variables. It is similar to the situation with block-ciphers like AES
>>>> where a single round is cryptographically weak, though you might want it
>>>> to be a keyed PRP for the sake of a proof. Worse enough, a full-round
>>>> hash function with a narrow block implies more frequent random memory
>>>> access and more CPU load, which together make a terrible impact on the
>>>> performance. That's why we never defined Argon2 for such hash functions,
>>>> and recommend not calling such modifications "Argon2".
>>>>
>>>> This observation and possibly use of your own code for Argon2 causes
>>>>  performance comparison of your Balloon Hash design with Argon2i and
>>>> other designs to be incorrect.
>>>>
>>>> in the introduction you write
>>>> "if we ... run the construction for five “rounds” of hashing, and set
>>>> the space parameter to require the attacker to use 1 MiB of working
>>>> space to compute the function, then we can compute Balloon Hashes at the
>>>> rate of 13 hashes per second on a modern server, compared with 12.8 for
>>>> Argon2i, and 2.1 for Catena DBG".
>>>>
>>>> This statement underestimates the Argon2i performance by the factor of
>>>> 30 (!) or so.
>>>>
>>>> In detail. Argon2i makes much more than 13 hashes per second using 5
>>>> iterations and 1 MiB of memory. On my 2-core 2.3 GHz (i5-6200) laptop
>>>>  it computes 1000 hashes in 5.1-5.3 Gigacycles, thus averaging more than
>>>> 400 hashes per second. I modified our benchmark settings for that, but
>>>> even with default ones (3 iterations, 1 hash) everyone can observe
>>>> similar performance by running 'make bench' +'./bench' using the source
>>>> in https://github.com/P-H-C/phc-winner-argon2
>>>>
>>>> In the further text you write
>>>> "Let an authentication server must perform 100 hashes per second per
>>>> four-core machine, ... At the same authentication rate, Argon2i requires
>>>> the attacker to use a smaller buffer—roughly 1.5 MiB for the one-pass
>>>> variant".
>>>>
>>>> This is again incorrect. Here are the numbers for 1-pass Argon2i: 2MiB:
>>>> 225Mcycles, 16MiB: 1174 Mcycles, 32 MiB: 2532 Mcycles (a bit more than 1
>>>> second).
>>>>
>>>> So there is a factor of 20 underestimate here. (We also remind that
>>>> 1-pass Argon2i was never  recommended for practical use.)
>>>>
>>>> Best regards,
>>>> the Argon2 team
>>>>
>>>> On Mon, May 30, 2016 at 7:53 PM, Henry Corrigan-Gibbs
>>>> <henrycg@stanford.edu <mailto:henrycg@stanford.edu>> wrote:
>>>>
>>>>     Hi Joel,
>>>>
>>>>     Since I am one of the designers of the Balloon Hashing algorithm, it
>>>>     would be difficult for me to give an unbiased opinion on the pros
>>>>     and cons of Argon2i versus Balloon.
>>>>
>>>>     That said, we just updated the Balloon Hashing paper on ePrint:
>>>>     http://eprint.iacr.org/2016/027
>>>>
>>>>     The revised paper includes some new results that might be relevant
>>>>     to CFRG discussions on password hashing. Let me summarize the new
>>>>     results here:
>>>>
>>>>
>>>>     (1) We prove (in the single-instance, random-oracle model) a new,
>>>>     stronger memory-hardness claim about the Balloon algorithm. We show
>>>>     that computing one instance of the Balloon function with space S and
>>>>     time T requires a space-time product that (roughly) satisfies:
>>>>     S . T >= n^2 / 8.
>>>>     As the adversary's space usage drops below a certain threshold
>>>>     (which depends on the parameter settings) the trade-off gets even
>>>>     worse for the adversary.
>>>>
>>>>
>>>>     (2) We refine the analysis of the single-buffer Balloon algorithm,
>>>>     and the refined analysis allows us to dramatically improve the
>>>>     parameters of the scheme. After this change, the performance of the
>>>>     single-buffer Balloon function *meets or exceeds* that of Argon2i
>>>>     when we instantiate both with Blake2b as the underlying
>>>>     cryptographic hash function. (Consult the paper for the details on
>>>>     this.) The single-buffer Balloon algorithm runs so quickly now that
>>>>     we decided that the other two Balloon variants (double-buffer and
>>>>     linear) were superfluous. The revised paper includes only the
>>>>     single-buffer scheme, which we now just call the "Balloon
>>>> algorithm".
>>>>
>>>>
>>>>     (3) We discuss your (very cool) recent paper on parallel attacks on
>>>>     iMHFs and give some ideas on how to ameliorate them in the context
>>>>     of password hashing. Our recommendation is to follow Balloon Hashing
>>>>     with one round of scrypt (or even data-dependent Balloon). This
>>>>     composition apparently defends against either parallel attacks or
>>>>     cache attacks -- just not both at the same time. The design of the
>>>>     Argon2id function uses a similar idea. To us, this seems like a nice
>>>>     compromise between functionality and security.
>>>>
>>>>
>>>>     (4) We apply our analysis techniques to prove that Argon2i and a
>>>>     simplified variant of scrypt are also memory-hard, though with
>>>>     looser constants than one might like. We show that these two
>>>>     algorithms (again, in the single-instance random-oracle model)
>>>>     require space S and time T such that:
>>>>     S . T >= n^2 / C
>>>>     to compute with good probability. For Argon2i, we can prove this
>>>>     statement for C=192, and for scrypt with C=24. In contrast, we prove
>>>>     the claim with C=8 for Balloon, and we additionally prove much
>>>>     stronger time-space trade-offs for Balloon for very-small-space
>>>>     adversaries. From a "provable security" perspective then, Balloon
>>>>     seems to have an edge over the other practical constructions out
>>>> there.
>>>>
>>>>
>>>>     If you find bugs or omissions in the ePrint draft, please let me
>>>> know.
>>>>
>>>>     Henry
>>>>
>>>>
>>>>
>>>>     On 05/25/2016 12:50 PM, Joel Alwen wrote:
>>>>
>>>>         I was wondering what peoples opinion is on standardizing the
>>>>         double-buffer balloon hashing (DB) construction rather than
>>>> Argon2i.
>>>>         Both in terms of arguments for and against.
>>>>
>>>>         I'm sure other people have thought about this much more though
>>>>         so I'd
>>>>         love to hear what people think...
>>>>
>>>>         - Joel
>>>>
>>>>         _______________________________________________
>>>>         Cfrg mailing list
>>>>         Cfrg@irtf.org <mailto:Cfrg@irtf.org>
>>>>         https://www.irtf.org/mailman/listinfo/cfrg
>>>>
>>>>
>>>>     _______________________________________________
>>>>     Cfrg mailing list
>>>>     Cfrg@irtf.org <mailto:Cfrg@irtf.org>
>>>>     https://www.irtf.org/mailman/listinfo/cfrg
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Dmitry Khovratovich
>>>>
>>>
>>>
>>
>>
>> --
>> Best regards,
>> Dmitry Khovratovich
>>
>> _______________________________________________
>> Cfrg mailing list
>> Cfrg@irtf.org
>> https://www.irtf.org/mailman/listinfo/cfrg
>>
>>
>

-- 
Best regards,
Dmitry Khovratovich