Re: [Cfrg] Balloon-Hashing or Argon2i.

Hi Bill,

First, thanks for all the details. I greatly appreciate hearing about
the perspective of someone working closely with hardware about this stuff.

Let me see if I understood correctly.

So one thing you saying is that if we want to talk about total energy
consumption we should really be including the cost of read/write to memory.

One way to incorporate this into the PROM model (at least for all
algorithms we've ever considered in it) is to change the definition of R
so that it also includes that energy consumption. In particular data is
only ever read/written in conjunction with one call to the compression
function. However if we do this then my understanding is your saying
then R >> n.

To me that sounds like (assuming for a second that energy consumption is
what matters most) the adversary cares primarily about using an
algorithm that minimize the number of calls to the compression function.
So then the optimal algorithm is always going to be the honest one for
pretty much any iMHF I can think of so then really there isn't a lot to
be done here...

However, this line of reasoning is quite sensitive to the relation
between R and n. It would be great to have more concrete data on this
(see bellow).

One thing I want to add is that the way "Energy-complexity" in the PROM
is defined has the following property. If we use

         R = area-core / area-storing-one-block

which is roughly how we actually set R in our experiments in the recent
paper, then

      Energy-complexity =< AT-complexity per instance

Moreover, I think the gap between the two is relatively small.
Especially if R is set to be large enough to account for the area not
just of a core but also the routing needed per core. Though, in this
case, your previous comments on how ASIC routing scales implies one
should probably also have R scale with the the number of parallel cores.

Security Statements in the PROM
-------------------------------
So with that in mind a statement of the form:

       "No algorithm can compute F() with less than X energy-complexity
in the PROM model."

is then really also making a statement about the amortized AT-complexity
of an adversary. My understanding is that the consensus in various works
(including the scrypt paper, the Catena paper and the cryptanalysis of
various MHFs be Biryukov and Khovratovich) is that large AT complexity
is indeed a desirable defence. In fact, in the former works bounding
area for just the memory components is actually already a good defence.
While in the latter (and most of the PROM works) we also take into
account the area of the cores on chip. So with this in mind I believe
such "security statements" in the PROM remain of interest.

Attack Complexity in the PROM
-----------------------------
Turning to statements of the form:

     "Algorithm A evaluates F() with Energy-complexity at most X in the
PROM."

like in AB16, it seems to me that when we evaluate its practical
implications it is key to have the right value of R set. If we care
about the energy efficiency of A then we can derive a meaningful
statement about the actual energy consumption of a custom device
implementing algorithm A as long as we set R to include both the energy
for evaluating a core and the energy for one read and one write.

Similarly, if we want to derive a statement about the AT complexity of
such a device (amortized per instance) then we should set R to include
not just the area of a core but also of the routing.

What do you guys think about this line of reasoning? does it make sense?
are there serious issues here?

And what do you guys think realistic values of R would be. Lets say in
the case of the Argon2 compression function for say 10, 25, 50, 200, 500
and 1000 cores (if such a device with that amount of cores is even
possible of course). I'd love to have data both when R captures the
energy consumption of read+core+write and when R captures area of
core+routing.

- Joel

> This is an interesting model, but does not match real hardware very 
> well, IMO.  Instead of modeling the DRAM power by the static power
> which is comparatively low, why not model the power as a constant
> plus a constant times the rate of read/writes?  I think that would be
> a lot closer to what we see in DDR3 and GDDR5.  I think it also will
> hold with a theoretical through-silicon-via based stacked-die DRAM +
> ASIC.
> 
> Then evaluating Scrypt (RoMix) using the naive algorithm (which only
> has sequential access to H(.)) for n blocks of memory costs n^2+R*r
> \approx n^2 (as typically R << n).
> 
> 
> Since R is the cost of evaluating H, can I assume r is the number of 
> times we call H?  In Scrypt, normally r = 2*n.  In this case, I
> assume n^2 is the static power loss in the memory.  If so, then I
> think R >> n.  Also, we've left out the dynamic power of the memory,
> which is likely on the order of the power of computing H for a read
> or write.
> 
> I would prefer to model the power more like this.  Assume h is the
> power of evaluating H and m is the power of a read or write to memory
> of one block.  I would ignore static power in the DRAM, which
> contributes little to heading real DRAMs.  For Scrypt, the normal
> energy used would be 2*h + 2*m.
> 
> Recently it was shown that, assuming the underlying building block
> H(.) is modelled as a random function, even an adversary who can make
> exponentially many calls to H(.) *for free* and amortize over 
> exponentially many calls, will have cost z*n^2 for some constant
> z>0.
> 
> 
> You lost me there.  If I can compute H for free (zero power), then
> I'd simply recompute every value from the initial block, and avoid
> the power needed to read or write to DRAM.  Wouldn't the total power
> be zero?  Is the argument that the cores each need to have 1 block
> saved, and the static power of storing those block is still O(n^2)?
> Is this some sort of hardened by static power defense?
> 
> Foundries routinely design low-power processes with many orders of 
> magnitude less static power than what we see in modern CPUs.
> Hardening MHFs based on static power is not going to work out well
> for the defender.  The energy cost of storing bits that are not
> accessed can be made very close to zero.  Consider flash memory.  The
> power to read and write is huge, but just sitting there waiting to be
> accessed, these tiny capacitors holding the charge will maintain that
> charge reliably for over 10 years.  We should assume that the power
> is dominated by read/write, not the power required to store some
> number of bits over time.
> 
> 
> All of the iMHF submitted to the PHC can be evaluated at amortized
> cost
> 
> n^c + cost(R,n)
> 
> for some constant c<2 and where cost(R,n) is the cost of evaluating 
> H(.) n^c' (for some c'<2) times. The constant c is <1.75 for 
> Argon2i,Rig.v2 and Gambit, <1.83 for Pomelo, <1.63 for Lyra2, <1.67 
> for Catena (for Pomelo and Lyra this only applies to the first --
> data independent -- phase).
> 
> 
> Is the n^c term the static power term for storing the data?  If so,
> it would be best to ignore it.  Read/write costs will dominate over
> it.
> 
> 
> Unlike in the security proof, for an attack we cannot ignore the 
> cost(R,n) term; whether we have a practical attack depends on
> whether the attack quality
> 
> n^2 / (n^c+cost(R,n))
> 
> is greater than 1.
> 
> 
> What about the read/write cost?  If n^2 is the static power loss,
> which we can reduce by a factor of a million easily in the fab, then
> I would not count on n^2 being significant.
> 
> 
> The cost(R,n) term is different for the various constructions and
> depends on the hardware. So, if you want to use any of these
> constructions, you must certify that for the parameters you're
> interested in, any hardware options and all trade-offs (e.g. on the
> number of parallel cores) that the [AB16] attacks allows, this 
> quality is small, and then pray that no better attacks are found. Or 
> you could not care about that and use a mode where the parallel 
> cumulative memory complexity and the sequential space/time
> complexity are proven to be of the same order (Scrypt for data
> dependent or a mode based on an optimally depth-robust graph for
> data-independent MHFs).
> 
> Krzysztof
> 
> 
> I do prefer the data-dependent modes, like Argon2id for most use
> cases. The data-independent algorithms are much harder to make
> secure, IMO, and I think that we saw that when Argon2i-A was
> partially broken, forcing Argon2i-B to be created.  I don't mean to
> sound negative on the research into iMHFs.  These are required in
> some cases, and I think the research will likely yield significant
> improvements.
> 
> I would try to argue that it makes sense to move one of the 
> data-dependent algorithms forward in CFRG while waiting on results
> from current research on the data-independent version, but I think
> pretty much everyone would hate that idea, so I do not propose it :)
> 
> Bill
> 
> 
> _______________________________________________ Cfrg mailing list 
> Cfrg@irtf.org https://www.irtf.org/mailman/listinfo/cfrg
>