Re: [Gen-art] Gen-ART Last Call review of draft-josefsson-scrypt-kdf-03

Simon Josefsson <simon@josefsson.org> Fri, 20 November 2015 12:29 UTC

Return-Path: <simon@josefsson.org>
X-Original-To: gen-art@ietfa.amsl.com
Delivered-To: gen-art@ietfa.amsl.com
Received: from localhost (ietfa.amsl.com [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 3221B1B2F6B for <gen-art@ietfa.amsl.com>; Fri, 20 Nov 2015 04:29:33 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: 1.149
X-Spam-Level: *
X-Spam-Status: No, score=1.149 tagged_above=-999 required=5 tests=[BAYES_50=0.8, HELO_EQ_SE=0.35, SPF_PASS=-0.001] autolearn=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id cBpuYoCzcwm2 for <gen-art@ietfa.amsl.com>; Fri, 20 Nov 2015 04:29:30 -0800 (PST)
Received: from duva.sjd.se (duva.sjd.se [IPv6:2001:9b0:1:1702::100]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 06F511B2F51 for <gen-art@ietf.org>; Fri, 20 Nov 2015 04:29:29 -0800 (PST)
Received: from latte.josefsson.org ([155.4.17.2]) (authenticated bits=0) by duva.sjd.se (8.14.4/8.14.4/Debian-4) with ESMTP id tAKCTFR8007326 (version=TLSv1/SSLv3 cipher=AES128-GCM-SHA256 bits=128 verify=NOT); Fri, 20 Nov 2015 13:29:16 +0100
X-Hashcash: 1:22:151120:draft-josefsson-scrypt-kdf.all@tools.ietf.org::5XUSfJ1c4Us/6+CM:39Z4
From: Simon Josefsson <simon@josefsson.org>
To: Paul Kyzivat <pkyzivat@alum.mit.edu>
References: <55DCB0A2.5050102@alum.mit.edu>
OpenPGP: id=54265E8C; url=http://josefsson.org/54265e8c.txt
X-Hashcash: 1:22:151120:draft-josefsson-scrypt-kdf-03.all@tools.ietf.org::ypACwCE3b5/ik9OQ:t6t
X-Hashcash: 1:22:151120:gen-art@ietf.org::t3JTdoaEjXgXwa0J:2QeZ
X-Hashcash: 1:22:151120:pkyzivat@alum.mit.edu::aGjC1ByhfOfcpast:fmkF
Date: Fri, 20 Nov 2015 13:29:14 +0100
In-Reply-To: <55DCB0A2.5050102@alum.mit.edu> (Paul Kyzivat's message of "Tue, 25 Aug 2015 14:14:58 -0400")
Message-ID: <87d1v4hn1h.fsf@latte.josefsson.org>
User-Agent: Gnus/5.130014 (Ma Gnus v0.14) Emacs/24.4 (gnu/linux)
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha256; protocol="application/pgp-signature"
X-Virus-Scanned: clamav-milter 0.98.7 at duva.sjd.se
X-Virus-Status: Clean
Archived-At: <http://mailarchive.ietf.org/arch/msg/gen-art/plsf7qSODwBKNCcskog1uv6ykvs>
Cc: draft-josefsson-scrypt-kdf.all@tools.ietf.org, General Area Review Team <gen-art@ietf.org>
Subject: Re: [Gen-art] Gen-ART Last Call review of draft-josefsson-scrypt-kdf-03
X-BeenThere: gen-art@ietf.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: "GEN-ART: General Area Review Team" <gen-art.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/gen-art>, <mailto:gen-art-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/gen-art/>
List-Post: <mailto:gen-art@ietf.org>
List-Help: <mailto:gen-art-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/gen-art>, <mailto:gen-art-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 20 Nov 2015 12:29:33 -0000

Executive Summary: I have submitted
https://tools.ietf.org/html/draft-josefsson-scrypt-kdf-04
that (hopefully) resolve the gen-art and secdir feedback.

Paul Kyzivat <pkyzivat@alum.mit.edu> writes:

> This draft is on the right track but has open issues, described in the
> review.

Hello Paul.  I am sorry for the delay in answering your thorough (much
appreciated!) review.  Your review was emailed to the wrong
@tools.ietf.org address, but I can't blame that for my delay since I was
notified of your review through the secdir review.

> (Issue-1): Intended status
>
> The intended status of this document is Informational. I question why
> it is not a normative document. As best I can tell, this is the formal
> specification of the algorithm. Those that use it would presumably
> want to claim conformance to it. The introduction describes this as an
> alternative to other KDF functions, including only one with an RFC
> reference - RFC2898. That one is also informational, but it is a
> restatement of an algorithm specified elsewhere, so that RFC can be
> viewed as an informational supplement to the actual definition. The
> same is not true of this document.
>
> (Of course changing this to a normative document would require
> significant changes, including adding 2119 language. And it probably
> could then not be handled as an AD-sponsored document.)

As I understand it, the IETF tradition is that descriptions of crypto
algorithms that were invented elsewhere are not published as Standards
Tracks documents.  So this is in line with that tradition.

> (Issue-2): Type mismatch on call to scryptROMix
>
> The scryptROMix function calls scryptBlockMix with an octet vector of
> length 128*r octets. But the definition of scryptBlockMix specifies
> that the input argument should be a vector of 2*r 64-octet blocks.
>
> Clearly these don't match. One way to make them match would be to
> divide the single 128*r octet vector into two 64-octet vectors, and
> then to treat r as 2 inside of scryptBlockMix. I don't know if that is
> the intent.

I believe this is a presentation issue.  The intention is indeed that
conversion between these formats is transparent.  In performant
implementations, B will refers to the same memory area.  The document
was confusing here, and I believe the document was further problematic
since it described the inputs as an array of elements, rather than a
concatenation of octets.  It now reads:

   Input:
            B[0] || B[1] || ... || B[2 * r - 1]
                   Input octet string (of size 128 * r octets),
                   treated as 2 * r 64-octet blocks.

   Output:
            B'[0] || B'[1] || ... || B'[2 * r - 1]
                   Output octet string.

I hope this improves this aspect.

> (Issue-3): Definition of Integerify
>
> The Integerify function is not clearly defined. The following is given
> in the document:
>
>   j = Integerify (X) mod N
>       where Integerify (B[0] ... B[2 * r - 1]) is defined
>       as the result of interpreting B[2 * r - 1] as a
>       little-endian integer.
>
> I can make no sense of this definition of Integerify. The description
> implies that B must be an array containing elements up to index
> 2*r-1. But the definition of B is "Input octet vector of length 128 *
> r octets". Taking the definition literally, B[2*r-1] must be an octet,
> and 2*r must be less than 128. That seems like nonsense to me.
>
> I found the following in the [SCRYPT] paper:
>
> "We expect that for reasons of performance and simplicity,
> implementors will restrict N to being a power of 2, in which case
> the function Integerify can be replaced by reading the first (or
> last) machine-length word from a k-bit block."
>
> Simply reading a machine-length word ignores the differences between
> little-endian and big-endian machines, and machines with different
> word sizes. Conveniently, [SALSA20SPEC] defines a littleendian
> function that yields a 32-bit integer from four bytes. That should be
> sufficient bits for computing "j". So Integerify(X) could be defined
> as:
>
>    littleendian(X[0],X[1],X[2],X[3])
>
> or
>
>    littleendian(X[128*r-4],X[128*r-3],X[128*r-2],X[128*r-1])
>
> (I don't think it matters which, as long as everyone does it the same way.)
>
> In any case, the language is ambiguous and needs to be clarified.

Right.  I have changed this into:

          j = Integerify (X) mod N
              where Integerify (X) is defined as the result of
              interpreting the last four octets of X as a little-
              endian integer, i.e.:
                  littleendian(X[128*r-4], X[128*r-3],
                               X[128*r-2], X[128*r-1])

> -------------
> Minor issues:
>
> (Issue-4): Identifiers reused for different meanings
>
> In scrypt, "B" is an array of "p" vectors, each of which is 128*r
> octets. In scryptROMix, "B" is a single vector of 128*r octets. In
> scryptBlockMix, "B" is a vector of 2*r 64-octet blocks.

I'm not sure that changing variable names here is a good idea.  For any
performant implementation, these variables will refer to the same memory
area and thus a consistent variable name helps to signal that.  It is
just the interpretation of that memory area that is different in these
functions.

> In both scrypt and scryptROMix "r" is the same block size
> parameter. But in scryptROMix it is only used in the (broken)
> definition of Integerify.

It is used in the variable descriptions too, to indicate length of the
variables.

> In scryptBlockMix "r" is (apparently, if I have figured things out)
> always 2, and unrelated to the other "r".

No, r is the same throughout.

> The document would be clearer if distinct identifiers were used for
> each unique concept.

I believe the identifier refers to unique concepts.  For B, what differs
is how that memory area is interpreted in each algorithm description.

Can you think of some way to make this more clear, if the changes I've
made now aren't sufficient?  I believe some changes I have made already
make this somewhat clearer though.

> For those identifiers whose value is intended to be constant and
> common across all the functions (such as "N"), it would be better to
> define them once, globally.
>
> (Issue-5): Confusing/misleading names/definitions of identifiers
>
> The "Block size" parameter ("r") does not denote the size of a
> block. It is a factor in the size of blocks, varying from function to
> function. Exactly what concept it denotes, and how one would choose
> it, isn't clear to me.
>
> The definition of the "N" parameter (CPU/Memory cost parameter) isn't
> especially clear. It appears that increasing N increases the cost both
> of CPU and memory. But the "p" (parallelization) parameter acts as a
> multiplier on N, also increasing the cost. It is far from clear how
> one would choose appropriate values for N and p. For a given value of
> N*p, is it better for N to be large, or p to be large?
>
> I suggest that more thought be given to what these things mean in the
> context of this application, and then choose identifier names and
> descriptions accordingly. It may be better to refactor these some
> other way.

This came from the secdir review as well -- I have added a section
"Scrypt parameters" to discuss this.

2.  Scrypt Parameters

   The scrypt function takes several parameters.  The passphrase P is
   typically a human-chosen password.  The salt is normally uniquely and
   randomly generated [RFC4086].  The parameter r ("blockSize") specify
   the block size.  The CPU/Memory cost parameter N ("costParameter")
   must be larger than 1, a power of 2 and less than 2^(128 * r / 8).
   The parallelization parameter p ("parallelizationParameter"), a
   positive integer less than or equal to ((2^32-1) * 32) / (128 * r).
   The intended output length dkLen in octets of the derived key
   ("keyLength"); a positive integer less than or equal to (2^32 - 1) *
   32.

   Users of scrypt can tune the parameters N, r, and p according to the
   amount of memory and computing power available, the latency-bandwidth
   product of the memory subsystem, and the amount of parallelism
   desired.  At the current time, taking r=8 and p=1 appears to yield
   good results, but as memory latency and CPU parallelism increase it
   is likely that the optimum values for both r and p will increase.
   Note also that since the computations of SMix are independent, a
   large value of p can be used to increase the computational cost of
   scrypt without increasing the memory usage; so we can expect scrypt
   to remain useful even if the growth rates of CPU power and memory
   capacity diverge.

> The ASN.1 in section 6 assigns names to several of these
> identifiers. It would be helpful to readers if the names used in
> defining the algorithms were also the names used here.

I've added these names to the "Scrypt Parameters" section above.

> (Issue-6): Dubious stability of references
>
> I looked for prior discussion of this draft, and found some on the
> saag mailing list regarding the references.
>
> The definition of the Salsa20 hash function in
> http://cr.yp.to/snuffle/spec.pdf seems clear enough, but is the
> document reference stable? It might be safer to replicate the
> definition in this document (in an appendix) with attribution. It
> doesn't appear that there is any copyright in the referenced
> document.

I have included the brief C snippet that defines the function in the
section, which hopefully is sufficient clear for implementers (together
with the already included test vector) to transpose it into something
working in any language.  This allowed me to move those two references
to informative ones.

> I'll also note that call to this hash function in scryptBlockMix is to
> "Salsa", not Salso20. It ought to be consistent with the definition.

This is explained in the section:

 Below, Salsa(T) corresponds to the Salsa20/8 Core function applied to
   the octet vector T.

> ------------------------
> Nits/editorial comments:
>
> (Issue-7): IdNits reported errors
>
> IdNits reports:
>
> -- Obsolete informational reference (is this intentional?): RFC 5208
>      (Obsoleted by RFC 5958)
>
> This is triggered by the following in section 6:
>
> "To be usable in PKCS#8 [RFC5208] and Asymmetric Key Packages
> [RFC5958] the following extension of the PBES2-KDFs type is needed."
>
> Since RFC5958 is referenced, and it obsoletes RFC5208, what is the
> reason for referencing both?

The reason was that a lot of protocols refers to 'PKCS#8' and RFC 5208
is the IETF-canonical reference for that term, and a way to illustrate
that the scrypt ASN.1 schema is compliant with "old" PKCS#8.  RFC 5958
obsoletes RFC 5208 but the identifier 'PKCS#8' was lost in the process.

I'm fine with dropping 5208 as a reference, but I do believe it provides
slightly more value to the reader.

> (There is also an error about 2119, but it is bogus, triggered by the
> use of OPTIONAL in the ASN.1.)
>
> This completes my review.

Thank you!

/Simon