[Cfrg] Answers to HKDF questions

Hugo Krawczyk <hugo@ee.technion.ac.il> Thu, 22 October 2009 01:38 UTC

Return-Path: <hugokraw@gmail.com>
X-Original-To: cfrg@core3.amsl.com
Delivered-To: cfrg@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id 547EC3A68D6 for <cfrg@core3.amsl.com>; Wed, 21 Oct 2009 18:38:30 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.676
X-Spam-Level:
X-Spam-Status: No, score=-1.676 tagged_above=-999 required=5 tests=[AWL=0.300, BAYES_00=-2.599, FM_FORGED_GMAIL=0.622, HTML_MESSAGE=0.001]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id fKXHkQSVl2i5 for <cfrg@core3.amsl.com>; Wed, 21 Oct 2009 18:38:28 -0700 (PDT)
Received: from qw-out-2122.google.com (qw-out-2122.google.com [74.125.92.24]) by core3.amsl.com (Postfix) with ESMTP id 2790F3A6876 for <cfrg@irtf.org>; Wed, 21 Oct 2009 18:38:28 -0700 (PDT)
Received: by qw-out-2122.google.com with SMTP id 5so992416qwd.7 for <cfrg@irtf.org>; Wed, 21 Oct 2009 18:38:36 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:received:from:date :x-google-sender-auth:message-id:subject:to:content-type; bh=GpaKxlnbDPs5Dgp4xw7BNKi8FhEFFox8F+6+VyYz9AE=; b=KuKSYXETQTNbfMxNFa0jfCH0GxBveYcSPIjUhZRff6TOjZPMaa8IapBHQ6ETm9HjKt lltl/7BVDlLtlB8QUZ19n7cxs2W6FlGhgGLB4zfoJhogGRivUTJN/RIH9h83sn0QQHoD JKnz5m6Ts0MQKlAFwWqX3G0aCISLm0yGsW5XY=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:from:date:x-google-sender-auth:message-id :subject:to:content-type; b=kbvnaD0KtPdSm920Vtt7gtGSgYTMKnKl3ysai8122PMifPtnwRdnPqiUrXb6VIKCg8 XVhDga1sXKTHZG6pg4nypMkLucsz/CrjS7RpAmy+UQBzW0oib8ObYKwt+S75Qm0beH77 yYZlq7HSag8AY/bivXvBaj8zlXjLGj5QKC4Fs=
MIME-Version: 1.0
Sender: hugokraw@gmail.com
Received: by 10.224.30.209 with SMTP id v17mr4336093qac.188.1256175516133; Wed, 21 Oct 2009 18:38:36 -0700 (PDT)
From: Hugo Krawczyk <hugo@ee.technion.ac.il>
Date: Wed, 21 Oct 2009 21:38:16 -0400
X-Google-Sender-Auth: 273d73088cf8ee45
Message-ID: <e89b43830910211838x2e1ca67cgaf48d02cd4008710@mail.gmail.com>
To: cfrg@irtf.org
Content-Type: multipart/alternative; boundary="00c09f89928f6a2be204767c2757"
Subject: [Cfrg] Answers to HKDF questions
X-BeenThere: cfrg@irtf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: Crypto Forum Research Group <cfrg.irtf.org>
List-Unsubscribe: <http://www.irtf.org/mailman/listinfo/cfrg>, <mailto:cfrg-request@irtf.org?subject=unsubscribe>
List-Archive: <http://www.irtf.org/mail-archive/web/cfrg>
List-Post: <mailto:cfrg@irtf.org>
List-Help: <mailto:cfrg-request@irtf.org?subject=help>
List-Subscribe: <http://www.irtf.org/mailman/listinfo/cfrg>, <mailto:cfrg-request@irtf.org?subject=subscribe>
X-List-Received-Date: Thu, 22 Oct 2009 01:38:30 -0000

Here is a list of questions people asked related to HKDF in the last two
days
with some answers. I wrote it in this way as it seems more manageable than
answering 20 or so messages, many raising similar questions.
SEE ALSO MY LONG MESSAGE FROM EARLIER TODAY WITH MORE
GENERAL RATIONALE.

>Many comments ask "why we need yet another KDF, if we have so many".

Answer: Because we have too many! (*)
Designers of new protocols/applications do not have any guidance to choose
between them. No one took the effort to sort out what properties we are
looking
for. What does each of these schemes provide and what they lack, etc etc.
Well, I did.
My work is far from a full taxonomy of properties, scenarios and existing
schemes. But it does provide the main guidelines, and it does show (I want
to
believe) that analysis does favor some schemes over others. In particular,
HKDF
is the closest I could get to something that is general purpose enough and
well-founded in analytics.

(*) The best answer to the above question is given by Peter's own question:
> In fact every protocol that I know of has invented its own KDF,
incompatible
> with every other KDF out there, thus my question about why we need yet
another
> one.
We do not need another one, we need one...

> Dan Harkins and others: How is HMAC(0,X) better than SHA256(X||X).

Merkle-Damgard (MD), the iteration mode underlying SHA256, does not preserve
"randomness" in the following sense: Even if the compression function that
you
use as the underlying hash is purely random, the result you get from M-D
iterating this compression function is NOT random (actually not even a good
extractor). HMAC, on the other hand, if you start with a random compression
function you end with a random oracle (i.e., a function that looks random to
an
adversary that can only query the function in a bounded number of points).
This is known in the literature as "random oracle indifferentiability".
It is not a result about actual hash functions but a strong result about the
STRUCTURAL STRENGTH OF HMAC (if it starts with something good it preserves
the
goodness). There are several papers on this; the main ones are [23,17] in my
paper.

>What is the value of HKDF when salt is set to 0.

In this case we can only say useful things if we assume the compression
function
to be random. In this case, as said above in the context of
indifferentiability,
HMAC implies a random oracle and then good extraction properties while in
the
case of a plain M-D hash an underlying random compression function does not
imply randomness and not even extraction. Moreover, the fact that you apply
HMAC
only once (with salt=0) to concentrate the input entropy in one key, and
then
use this key with the PRF is much better than the naive (unsalted)
repetition
SHA(X||1)||SHA(X||2)||... common to other KDFs -- for more on this see the
paper.

>Dan: HKDF is a GENERALIZATION of IKE's HKDF.

True, but only for the unsalted case. Note that IKEv1 illustrates nicely the
versatility of HKDF. It can use public nonces for salt in the signature
mode,
and it can use a secret salt in PK encryption mode. The latter gives you a
much
stronger KDF. One that is only based on the PRF properties of HMAC (since
the
salt acts as a secret key to the PRF). IKE is also a good example of how
applications may use/find salt. Most other protocols do not use salt even if
they have it available (e.g., SP 800-56A). Salting the KDF has important
benefits for randomization and independence.

>Uri repeatedly asks what makes HMAC appropriate for extraction/KDF.

You should read the paper and the results cited there (especially section
8).
Keep in mind that HMAC can only inherit its properties from the underlying
hash
function. It cannot transform a bad hash function into a good one, but for
some
functionalities (KDF is one of them) it can use weaker hash functions for
stronger purposes than alternative schemes using the plain hash.
If you are not willing to assume anything from the underlying hash function
other than collision resistance, then there is only one result I can offer
to
you: Look at the result from [23] quoted in page 24 of my paper (second
bullet).
That result assumes properties of the hash function implied by collision
resistance only. The downside: It applies only when you truncate the output
of
the extract part (for example, you use SHA-512 for extraction and SHA-256
for
PRF). But it may not be totally foolish (smiley inserted) for you to make a
heuristic leap of faith and accept this as an indication of strength also
when
you do not truncate.

>Peter Gutman and Simon J.: PBKDF.

PBKDF is quite specific to passwords since it has the iterations count that
is
not needed in general and you certainly do now want to artificially slow
down
your KDF when your source has decent entropy. I believe that PBKDF2 is a
good
design for passwords: it uses an iteration counter, salt, and HMAC as the
underlying PRF (better in that sense than PBKDF1). You could make it general
purpose by having the option of using it with iteration-count=1. However,
even
in this case the design is not great as a general purpose KDF since it uses
the
source keying material directly as a PRF key. There is nothing we can say in
general if you key a PRF with a non-uniform key (some PRFs do not even
accept
arbitrary-long keys). That's why you first extract (or concentrate) the
entropy
of the input into a short key and then use that key for keying the PRF in
the
extraction stage.

> Can we use HKDF for passwords?

Not as defined. For passwords you need to replace the entropy of the source
with
slowing iterations of the function to slow down dictionary attacks (see page
25
in my paper). This can be added, but it is not included now. For example,
for
the specific case of passwords one could replace the extract part of HKDF
with
the F(P,S,c,1) function from PBKDF2. I am not sure if this is something we
want
to include in our draft. I will discuss with Pasi.


>The input source is random (a strong cryptographic key), can we skip
extract?

When your input is already a strong cryptographic key you may choose to skip
the
extract part and go directly to expansion. For this any (variable output)
PRF
will do. The expansion part of HKDF is an implementation of such a PRF.

>David McGrew raises the question that we may need different KDFs for
different
>settings.

I believe that having one general-purpose KDF is IMPORTANT. It does not
preclude
an application to choose another one or a variant. But if we start
standardizing
a KDF for each usage scenario (including different types of key material
sources)
we will have to leave in the user's or protocol designer's hands to choose
what
is best for his setting. Not only this is not an easy task (understanding
entropy and extraction in full generality is hard) but what you may assume
for
an application today may be different tomorrow. The one case where a special
variant may be needed is for passwords (see above).

>Are block ciphers better than hash for KDF?

Block ciphers are good as a basis for a PRF (e.g via a variable input PRF
mode).
However, they are more problematic for extraction. For block ciphers as
extractors we can only say things under the assumption that the block cipher
is
a truly random permutation. In the case of HMAC we have much richer results
including those that do not idealize the hash function.
(BTW, those that said to believe more on ideal ciphers than in random
oracles,
should pay attention to the related-key attacks against AES-256!)

>David McGrew: Should we standardize on the extract part and allow for
multiple
>expansion modules?
Good question. From a modularity point of view the answer is yes.
>From the IETF point of view that wants to have as specific transforms as
possible this may be problematic. In other words, my academic side would
prefer
modularity in the spec and independence between extraction and expansion
(as long as the latter uses a good appropriate PRF mode). But I think that
the
IETF (well represented in this work by Pasi :) will not like that.

>Zooko: How about outputting a random (algebraic) group element instead of a
>(close to) uniform string?

Even a "general purpose" KDF has limits to its generality. To me, the main
need
for applications is to have a KDF that produces uniform (or "close to
uniform")
strings of a certain length(s). In particular, the problem with choosing a
random group element is that techniques to do so are group-dependent (you do
it
differently in a finite field multiplicative group than in an RSA group than
in
an elliptic curve; you may even have different techniques for different
types of
curves). So I do not see that you can find a "fits all" solution here.
(Neither is HKDF a fit-all solution - for example it doesn't do what you
want -
but it as close as it get to serve the most common applications).

> David McGrew: what is the precise security statement for HKDF?
> What assumptions does one need to make about the hash function used in
HKDF in
> order that the security analysis applies? The paper says that "it is shown
in
> [23] (see Section 8) that using HMAC with a truncated output as an
extractor
> allows to prove security under considerably weaker assumptions on the
> underlying hash function." However, both of the Lemmas in that paper (and
the
> implication in Section 8) make random oracle assumptions.

Section 8 contains a summary of results from [23,17] regarding the
properties of
HMAC extractor. These results are quantitative, relating the strength of the
underlying hash function with the quality of extraction. They cover a
variety of
situations depending on the entropy of the source and the existence of salt.
The truncated output result that does not require any form of idealized
assumption (uses almost universality and/or collision resistance) is quoted
in
my paper page 24 second bullet and is proven in [23].

I apologize for questions I missed.

Hugo