Re: [Cfrg] uniform random distribution in ECDH public key

Hi Uri,

Thanks for the response.  

Regarding your point about pseudorandomness: would you confirm that you merely mean that the SHA-1 and SHA-2 standards do not claim the functions to have "pseudorandom" (ignoring definability and provability see below *), and that you do not mean there is some kind of distinguisher of SHA-1 and SHA-2 (i.e. for non-contrived inputs, see **)?

If not, then please elaborate.  

If so, then I would add, that although the original standards for SHA-1 and SHA-2 did not make pseudorandomness claims, many others have gone ahead to make such claims (at least implicitly).  Such claims include: most random oracle proofs (if they use the term "hash function", which implicitly includes SHA-1 or SHA-2, unless the papers explicitly exclude them); standards RSA-OAEP and RSA-PSS; SHA-based KDF (the topic of this thread); HMAC-KDF (SP 800-56C); Hash-based DRBG and HMAC-DRBG in SP 800-90A.   These claims have created a large incentive to find distinguishability properties of SHA-1 and SHA-2.  

So now I clarify my original point in 2 ways: (1) the abundantly apparent pseudorandomness of SHA-1 (at least relevant pseudorandomness, see **), and of SHA-2, means that their use in the KDF is a definite general-purpose improvement over the easy distinguishability of the raw ECDH (or DH) shared secret, and (2) if one takes a much looser meaning of "hash" that includes easily-distinguishable hash functions, then hashing of the raw ECDH shared secret does add value.

(*) Definability and provability: it's hard (if not impossible) to even define general pseudorandomness of a general keyless function (and therefore to hard prove anything). In the context of KDFs, with fixed input lengths, or even when fixed input domains (e.g. an EC point), one may be able to at least define something: e.g. KDF(uniformly distributed EC point) -> (bit string computationally indistinguishable from uniformly distributed bit string).

(**) A contrived distinguisher would be to select inputs such that hash output has some bias, this is the main obstacle to defining pseudorandomness (see * above).  A distinguisher would need to be faster than a distinguisher for a random oracle.  I suppose that the SHA-1 collision algorithms can be viewed as a distinguisher, though not with any impact on most typical pseudorandom hash applications such as KDF.  Is there a more direct distinguisher for SHA-1 or SHA-2?

Daniel Brown
Research In Motion Limited

> -----Original Message-----
> From: Blumenthal, Uri - 0668 - MITLL [mailto:uri@ll.mit.edu]
> Sent: Friday, August 24, 2012 1:12 PM
> To: Dan Brown; dmjacobson@sbcglobal.net; cfrg@irtf.org
> Subject: Re: [Cfrg] uniform random distribution in ECDH public key
> 
> On 8/23/12 8:27 , "Dan Brown" <dbrown@certicom.com> wrote:
> 
> 
> >One general advantage: after hashing the shared secret is pseudorandom.
> 
> No it is not. Pseudorandomness of the output has not been a design
> criteria of hash functions until the SHA3 competition.
> 
> >A second advantage, nonces and identifiers can be thrown into the hash
> >input, for various gains, such as generating several keys, one for AES
> >and for HMAC.
> 
> That holds and helps, though (again, until SHA3 is selected and deployed)
> it is hard to tell how well (or otherwise) these extra bits get
> "intermixed" or spread over in the output value.
> 
> >A third advantage, if the shared key is later revealed, then the hash
> >prevents the calculation of the raw ECDH value
> 
> That holds, and is a nice property to have.
> 
> 
> >----- Original Message -----
> >From: David Jacobson [mailto:dmjacobson@sbcglobal.net]
> >Sent: Thursday, August 23, 2012 01:17 AM
> >To: cfrg@irtf.org <cfrg@irtf.org>
> >Subject: Re: [Cfrg] uniform random distribution in ECDH public key
> >
> >On 08/14/2012 11:23 AM, Dan Harkins wrote:
> >>    Hi Bob,
> >>
> >> On Tue, August 14, 2012 11:01 am, Robert Moskowitz wrote:
> >>> I understand from RFC 6090 and 5869 that the secret key produced from
> >>>an
> >>> ECDH exchange is not uniformly randomly distributed and that is why we
> >>> have the 'Extract' phase in HKDF.  Got that.
> >>>
> >>> This question is about the public key, g^j:
> >>>
> >>> I understand that like j, it must be a point on the curve, thus if the
> >>> curve is p-256, both j and g^j are 256 bits long.  But is g^j uniformly
> >>> randomly distributed like j is suppose to be?
> >>    No, it's not. It's it's a special pair (x,y) that satisfy the
> >>equation
> >> of the
> >> curve:  y^2 = x^3 + ax + b. Not all pairs will satisfy that equation. I
> >> believe about half of them will and about half won't.
> >>
> >>    For x to be random, each number between 0 and p would have equal
> >> probability. But that's not the case since about half won't.
> >>
> >>> Side question:  I am still unclear on the length of the exchanged
> >>>secret
> >>> (g^j)^k, is it 256 bits (for p-256) or larger (perhaps 512 bits)?
> >>    The result of an ECDH is an element in the group so it's also an
> >>(x,y)
> >> pair but the secret that you use in your KDF is the x coordinate of that
> >> result. The y coordinate is discarded.
> >>
> >>    regards,
> >>
> >>    Dan.
> >>
> >>
> >> _______________________________________________
> >> Cfrg mailing list
> >> Cfrg@irtf.org
> >> http://www.irtf.org/mailman/listinfo/cfrg
> >>
> >So now that we are into tutorial mode on this, I'd like to ask a
> >question.  Standard procedure for Diffie-Hellman key exchange is to
> >construct the session key from the X-coordinate by hashing.  Now suppose
> >that I'm using the NIST P-256 curve and the symmetric encryption
> >functions is AES-256.
> >
> >The number of possible shared key values is the order of the curve - 1
> >(point at infinity isn't used), which is extremely close to 2^256.
> >These points come in pairs, if there is a point at an X value, there are
> >2, one at Y and the other -Y.  So essentially all X values occur with
> >probability  very close to 2/2^256, which means that the X-coordinate
> >after the DH procedure can be thought of as a source with 255 bits of
> >min-entropy.  If we hash the X coordinate with SHA-256, we actually lose
> >a little bit of entropy, since some X values will collide and produce
> >some session key with probability higher than 2/2^256, lowering the
> >min-entropy.
> >
> >So what is the advantage of the hash operation?
> >
> >Thank you,
> >
> >     --David Jacobson
> >
> >_______________________________________________
> >Cfrg mailing list
> >Cfrg@irtf.org
> >http://www.irtf.org/mailman/listinfo/cfrg
> >
> >---------------------------------------------------------------------
> >This transmission (including any attachments) may contain confidential
> >information, privileged material (including material protected by the
> >solicitor-client or other applicable privileges), or constitute
> >non-public information. Any use of this information by anyone other than
> >the intended recipient is prohibited. If you have received this
> >transmission in error, please immediately reply to the sender and delete
> >this information from your system. Use, dissemination, distribution, or
> >reproduction of this transmission by unintended recipients is not
> >authorized and may be unlawful.
> >_______________________________________________
> >Cfrg mailing list
> >Cfrg@irtf.org
> >http://www.irtf.org/mailman/listinfo/cfrg

---------------------------------------------------------------------
This transmission (including any attachments) may contain confidential information, privileged material (including material protected by the solicitor-client or other applicable privileges), or constitute non-public information. Any use of this information by anyone other than the intended recipient is prohibited. If you have received this transmission in error, please immediately reply to the sender and delete this information from your system. Use, dissemination, distribution, or reproduction of this transmission by unintended recipients is not authorized and may be unlawful.