[Gen-art] Gen-ART Last Call review of draft-josefsson-scrypt-kdf-03

Paul Kyzivat <pkyzivat@alum.mit.edu> Tue, 25 August 2015 18:15 UTC

Return-Path: <pkyzivat@alum.mit.edu>
X-Original-To: gen-art@ietfa.amsl.com
Delivered-To: gen-art@ietfa.amsl.com
Received: from localhost (ietfa.amsl.com [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 59FC01A87A9 for <gen-art@ietfa.amsl.com>; Tue, 25 Aug 2015 11:15:03 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: 0.664
X-Spam-Level:
X-Spam-Status: No, score=0.664 tagged_above=-999 required=5 tests=[BAYES_20=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, SPF_SOFTFAIL=0.665] autolearn=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id hsyyO5lT0Z90 for <gen-art@ietfa.amsl.com>; Tue, 25 Aug 2015 11:15:01 -0700 (PDT)
Received: from resqmta-ch2-12v.sys.comcast.net (resqmta-ch2-12v.sys.comcast.net [IPv6:2001:558:fe21:29:69:252:207:44]) (using TLSv1.2 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 2AC951A88F1 for <gen-art@ietf.org>; Tue, 25 Aug 2015 11:15:01 -0700 (PDT)
Received: from resomta-ch2-10v.sys.comcast.net ([69.252.207.106]) by resqmta-ch2-12v.sys.comcast.net with comcast id 96EH1r0092JGN3p016F0Ch; Tue, 25 Aug 2015 18:15:00 +0000
Received: from Paul-Kyzivats-MacBook-Pro.local ([50.138.229.151]) by resomta-ch2-10v.sys.comcast.net with comcast id 96Ez1r00P3Ge9ey016EzkZ; Tue, 25 Aug 2015 18:15:00 +0000
From: Paul Kyzivat <pkyzivat@alum.mit.edu>
To: draft-josefsson-scrypt-kdf-03.all@tools.ietf.org
Message-ID: <55DCB0A2.5050102@alum.mit.edu>
Date: Tue, 25 Aug 2015 14:14:58 -0400
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:38.0) Gecko/20100101 Thunderbird/38.2.0
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Transfer-Encoding: 8bit
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20140121; t=1440526500; bh=qWEin/TCQFlTsZC7+7Li6ryAKPWxqU3RfzWN3LgtjNw=; h=Received:Received:From:Subject:To:Message-ID:Date:MIME-Version: Content-Type; b=orSj9EQJbRDyhGX7lRkNqd6zohW7KNzXKnr7DkDTQSdHODjB1To6sMSJwcJzcqpyE hmr0p6pgAzPhGkzLM9fERv4J+pw9S1chGjbvZi4eQpqdUnZNBuXQ51nHK0esDsE/Mk 03AOf1B9ED8PWIRMj3E3KKj5xNx0Nln09t4P1oieiesKynTndWjREa994CHekKXhZW DgRsKXmi/xQPTvY/rKjd3Glsrvjt9PSPQPgn0RzK2Wg6Sut4lkzgfVQKg457laq6Kh xPd0QQ48NUW49Bx3y+YGQ19iC4SeltXamNb7rNsuLIH0nEXXgdPyQ3NOcvnWOo1t6y N3xZvnu/XL8rA==
Archived-At: <http://mailarchive.ietf.org/arch/msg/gen-art/fToZiioHo-6x5ZRQWNcTr-aUYVk>
Cc: General Area Review Team <gen-art@ietf.org>
Subject: [Gen-art] Gen-ART Last Call review of draft-josefsson-scrypt-kdf-03
X-BeenThere: gen-art@ietf.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: "GEN-ART: General Area Review Team" <gen-art.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/gen-art>, <mailto:gen-art-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/gen-art/>
List-Post: <mailto:gen-art@ietf.org>
List-Help: <mailto:gen-art-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/gen-art>, <mailto:gen-art-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 25 Aug 2015 18:15:03 -0000

I am the assigned Gen-ART reviewer for this draft. For background on 
Gen-ART, please see the FAQ at <​ 
http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.

Please resolve these comments along with any other comments you may receive.

Summary:

This draft is on the right track but has open issues, described in the 
review.

Disclaimers:

I am not schooled about crypto algorithms and techniques. I have 
reviewed this draft from the perspective of a software engineer, trying 
to understand if I could implement from this specification.

I also know nothing about using ASN.1 to configure KDFs. So I can't 
comment on section 6. This draft ought to be reviewed by somebody 
capable of checking that.

I have not attempted to verify the test vectors in sections 7-12.

-------------
Major issues:

(Issue-1): Intended status

The intended status of this document is Informational. I question why it 
is not a normative document. As best I can tell, this is the formal 
specification of the algorithm. Those that use it would presumably want 
to claim conformance to it. The introduction describes this as an 
alternative to other KDF functions, including only one with an RFC 
reference - RFC2898. That one is also informational, but it is a 
restatement of an algorithm specified elsewhere, so that RFC can be 
viewed as an informational supplement to the actual definition. The same 
is not true of this document.

(Of course changing this to a normative document would require 
significant changes, including adding 2119 language. And it probably 
could then not be handled as an AD-sponsored document.)

(Issue-2): Type mismatch on call to scryptROMix

The scryptROMix function calls scryptBlockMix with an octet vector of 
length 128*r octets. But the definition of scryptBlockMix specifies that 
the input argument should be a vector of 2*r 64-octet blocks.

Clearly these don't match. One way to make them match would be to divide 
the single 128*r octet vector into two 64-octet vectors, and then to 
treat r as 2 inside of scryptBlockMix. I don't know if that is the intent.

(Issue-3): Definition of Integerify

The Integerify function is not clearly defined. The following is given 
in the document:

   j = Integerify (X) mod N
       where Integerify (B[0] ... B[2 * r - 1]) is defined
       as the result of interpreting B[2 * r - 1] as a
       little-endian integer.

I can make no sense of this definition of Integerify. The description 
implies that B must be an array containing elements up to index 2*r-1. 
But the definition of B is "Input octet vector of length 128 * r 
octets". Taking the definition literally, B[2*r-1] must be an octet, and 
2*r must be less than 128. That seems like nonsense to me.

I found the following in the [SCRYPT] paper:

"We expect that for reasons of performance and simplicity,
implementors will restrict N to being a power of 2, in which case
the function Integerify can be replaced by reading the first (or
last) machine-length word from a k-bit block."

Simply reading a machine-length word ignores the differences between 
little-endian and big-endian machines, and machines with different word 
sizes. Conveniently, [SALSA20SPEC] defines a littleendian function that 
yields a 32-bit integer from four bytes. That should be sufficient bits 
for computing "j". So Integerify(X) could be defined as:

    littleendian(X[0],X[1],X[2],X[3])

or

    littleendian(X[128*r-4],X[128*r-3],X[128*r-2],X[128*r-1])

(I don't think it matters which, as long as everyone does it the same way.)

In any case, the language is ambiguous and needs to be clarified.

-------------
Minor issues:

(Issue-4): Identifiers reused for different meanings

In scrypt, "B" is an array of "p" vectors, each of which is 128*r 
octets. In scryptROMix, "B" is a single vector of 128*r octets. In 
scryptBlockMix, "B" is a vector of 2*r 64-octet blocks.

In both scrypt and scryptROMix "r" is the same block size parameter. But 
in scryptROMix it is only used in the (broken) definition of Integerify. 
In scryptBlockMix "r" is (apparently, if I have figured things out) 
always 2, and unrelated to the other "r".

The document would be clearer if distinct identifiers were used for each 
unique concept.

For those identifiers whose value is intended to be constant and common 
across all the functions (such as "N"), it would be better to define 
them once, globally.

(Issue-5): Confusing/misleading names/definitions of identifiers

The "Block size" parameter ("r") does not denote the size of a block. It 
is a factor in the size of blocks, varying from function to function. 
Exactly what concept it denotes, and how one would choose it, isn't 
clear to me.

The definition of the "N" parameter (CPU/Memory cost parameter) isn't 
especially clear. It appears that increasing N increases the cost both 
of CPU and memory. But the "p" (parallelization) parameter acts as a 
multiplier on N, also increasing the cost. It is far from clear how one 
would choose appropriate values for N and p. For a given value of N*p, 
is it better for N to be large, or p to be large?

I suggest that more thought be given to what these things mean in the 
context of this application, and then choose identifier names and 
descriptions accordingly. It may be better to refactor these some other way.

The ASN.1 in section 6 assigns names to several of these identifiers. It 
would be helpful to readers if the names used in defining the algorithms 
were also the names used here.

(Issue-6): Dubious stability of references

I looked for prior discussion of this draft, and found some on the saag 
mailing list regarding the references.

The definition of the Salsa20 hash function in 
http://cr.yp.to/snuffle/spec.pdf seems clear enough, but is the document 
reference stable? It might be safer to replicate the definition in this 
document (in an appendix) with attribution. It doesn't appear that there 
is any copyright in the referenced document. I'll also note that call to 
this hash function in scryptBlockMix is to "Salsa", not Salso20. It 
ought to be consistent with the definition.

------------------------
Nits/editorial comments:

(Issue-7): IdNits reported errors

IdNits reports:

-- Obsolete informational reference (is this intentional?): RFC 5208
      (Obsoleted by RFC 5958)

This is triggered by the following in section 6:

"To be usable in PKCS#8 [RFC5208] and Asymmetric Key Packages [RFC5958] 
the following extension of the PBES2-KDFs type is needed."

Since RFC5958 is referenced, and it obsoletes RFC5208, what is the 
reason for referencing both?

(There is also an error about 2119, but it is bogus, triggered by the 
use of OPTIONAL in the ASN.1.)

This completes my review.

	Thanks,
	Paul Kyzivat