[Cfrg] Internal collisions

"D. J. Bernstein" <djb@cr.yp.to> Sun, 26 July 2015 19:43 UTC

Date: Sun, 26 Jul 2015 19:43:06 -0000
Message-ID: <20150726194306.14873.qmail@cr.yp.to>
From: "D. J. Bernstein" <djb@cr.yp.to>
To: cfrg@irtf.org
Mail-Followup-To: cfrg@irtf.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
Archived-At: <http://mailarchive.ietf.org/arch/msg/cfrg/E-q_6ABdKfPU5XG2N6IVAs_sNhA>
Subject: [Cfrg] Internal collisions
Precedence: list

This message elaborates on Tanja's comment about "internal collisions"
at the CFRG meeting. The quick summary is that Schnorr's H(R,M) adds
real protection against the possibility of collisions in H, while the
alternative H(M,R) doesn't, even when H is a wide-pipe hash function.

As a concrete example, take H to be 24 rounds of SHA-512/256 or SHA-384.
Both of these functions have

   * outputs that are too big for brute-force collision attacks, and
   * even wider pipes.

However, internally these functions process messages the same way as
SHA-512 (starting from different IVs and truncating the output), and so
you can break these hash functions inside H(M,R) signatures as follows:

   * Start from 2008 Sanadhya--Sarkar, which showed how to very quickly
     find distinct 1024-bit blocks B,B' that produce colliding 512-bit
     states after 24 rounds starting from the initial SHA-512 state.

   * Notice that the Sanadhya--Sarkar method also works for (e.g.) the
     initial SHA-384 state.
     
     I'm not saying that these particular blocks B,B' have a noticeable
     chance of producing collisions for other starting states. This
     would be an impressive break of the SHA-512 data flow, a
     "differential attack without message modification", a technique
     that has trouble breaking MD5.
     
     What I'm saying is that, _given_ any starting state h (e.g., the
     initial SHA-384 state or the initial SHA-512/256 state), the method
     _computes_ distinct 1024-bit blocks B,B' that produce colliding
     512-bit states after 24 rounds starting from h.

   * Notice that 24 rounds of SHA-384 now collide on any inputs of the
     form (B,X) and (B',X), since the states after B and B' are the
     same---this is what's called an "internal collision".

   * Find an X such that (B,X) is an innocent message that the signer is
     willing to sign while (B',X) is a dangerous message. Obtain the
     signer's signature (R,S) on the innocent message (B,X).

   * Forge the message (B',X) with the same signature (R,S). This is a
     valid signature since H(B',X,R) = H(B,X,R).

What went wrong here is that H(M,R) isn't resilient to _internal_
collisions in H. It wasn't useful to watch for collisions in H as a
canary for potential security problems in H(M,R)---the victim died a
moment after the canary did.

H(M,R) for wide-pipe H has the weak property of being resilient to mere
_output_ collisions---for example, if someone manages to find an H
collision by brute force, then there's no reason to think that appending
R to the inputs will also produce a collision---but brute force isn't
the real threat against any reasonably large hash output. The way that
cryptanalysts have found collisions much more quickly than brute force

   * in MD5, and
   * in 76 out of 80 steps of the SHA-1 compression function, and
   * in 38 out of 64 steps of the SHA-256 compression function, and
   * in 4 out of 24 rounds of SHA3-256,
   
is by finding ways that small input differences can stay on a path of
small differences through the entire hash computation---it's hard to
imagine how differences would have a noticeable chance of disappearing
otherwise! Output truncation has very little effect on this type of
attack: it can remove a few conditions from the end of the computation,
but most of the attack cost comes from many more conditions needed to
control diffusion in many earlier rounds of the computation.

(Probably even more than 24 rounds of SHA-384 etc. are feasible to
break. There has been considerable progress since 2008, and the latest
collision attacks break 38 out of 80 rounds of the SHA-512 compression
function, although breaking 38 rounds of the SHA-512 _hash_ function
will require hooking the SHA-512 initial state to the two initial states
used in this compression-function collision.)

For comparison, what happens to this attack if H(M,R) is replaced by
Schnorr's H(R,M)? This time R immediately affects the hash processing,
and this change in data flow creates serious obstacles for the attacker.
Here's what happens:

   * _Given_ R, the attacker computes two extensions (R,M) and (R,M')
     that produce an internal collision.

   * But wait---why is this collision useful? To forge M' as a
     substitute for M, the attacker needs a signature involving H(R,M).

   * So the attacker asks the signer to sign M.

   * The signer now generates a signature of M---but this signature
     involves a new R' that the attacker has never seen before.

There's no reason to think that H(R',M) and H(R',M') will collide when
R' is chosen randomly. Asking for this to happen is asking for something
much stronger than a collision---as above, this would be an impressive
break of the SHA-512 data flow. The attacker really wants to see R
before generating M and M', but the signer won't generate R without
seeing M.

So the attacker starts by computing just one message M, then sees R from
the signer, and then tries to compute a second message M' with H(R,M') =
H(R,M). The usual techniques for trying to find preimages much more
quickly than brute force are again to take M' close to M and hope that
the differences stay under control. However, the fact that M can't
change during the second computation makes this preimage search much
more difficult than a collision search:

   * The attacker can't simply use the best difference M'-M: if this M'
     doesn't work, then the attacker has to move on to the second-best
     difference, then the third-best, etc. In collision attacks the
     attacker was free to keep trying the best difference with many
     choices of M.

   * The attacker can't apply "message modification", working backwards
     from as many of the path conditions as possible to a subset of M
     that's feasible to enumerate.

The attacker can of course try to modify M during the first computation
so that the path conditions are more likely to be satisfied, but this
has to be done in a way that gets past the obstacle of R being
subsequently randomized.

In short, the fact that H(R,M) is resilient to _internal_ collisions
reflects a real gap in difficulty between finding H collisions and
breaking H(R,M). Anyone who disputes this should be able to demonstrate
feasible attacks on H(R,M) using MD5, using 24 rounds of SHA-384, and
using 4 rounds of SHA3-256, all of which are broken for H(M,R).

---Dan

[Cfrg] Internal collisions D. J. Bernstein
Re: [Cfrg] Internal collisions Dan Brown
Re: [Cfrg] Internal collisions David Jacobson
Re: [Cfrg] Internal collisions Ilari Liusvaara
Re: [Cfrg] Internal collisions Dan Brown
Re: [Cfrg] Internal collisions Paterson, Kenny
Re: [Cfrg] Internal collisions D. J. Bernstein
Re: [Cfrg] Internal collisions Dan Brown
Re: [Cfrg] Internal collisions D. J. Bernstein
Re: [Cfrg] Internal collisions Dan Brown
Re: [Cfrg] Internal collisions Bryan Ford
Re: [Cfrg] Internal collisions D. J. Bernstein