Re: [Gen-art] Gen-ART Last Call review of draft-josefsson-scrypt-kdf-03

Paul Kyzivat <> Fri, 20 November 2015 15:43 UTC

Return-Path: <>
Received: from localhost ( []) by (Postfix) with ESMTP id 6B1231B2AA9 for <>; Fri, 20 Nov 2015 07:43:22 -0800 (PST)
X-Virus-Scanned: amavisd-new at
X-Spam-Flag: NO
X-Spam-Score: -1.235
X-Spam-Status: No, score=-1.235 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, SPF_SOFTFAIL=0.665] autolearn=no
Received: from ([]) by localhost ( []) (amavisd-new, port 10024) with ESMTP id 5oRtSd696MCg for <>; Fri, 20 Nov 2015 07:43:19 -0800 (PST)
Received: from ( [IPv6:2001:558:fe21:29:69:252:207:41]) (using TLSv1.2 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by (Postfix) with ESMTPS id 8F3951B2AA8 for <>; Fri, 20 Nov 2015 07:43:18 -0800 (PST)
Received: from ([]) by with comcast id jrhM1r0012XD5SV01rjJGm; Fri, 20 Nov 2015 15:43:18 +0000
Received: from Paul-Kyzivats-MacBook-Pro.local ([]) by with comcast id jrjG1r00T3KdFy101rjGhe; Fri, 20 Nov 2015 15:43:18 +0000
To: Stephen Farrell <>, Simon Josefsson <>
References: <> <> <>
From: Paul Kyzivat <>
Message-ID: <>
Date: Fri, 20 Nov 2015 10:43:15 -0500
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:38.0) Gecko/20100101 Thunderbird/38.3.0
MIME-Version: 1.0
In-Reply-To: <>
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Transfer-Encoding: 7bit
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;; s=q20140121; t=1448034198; bh=GWN2SE6+4G8C93olK+3xchJOgWTViAEe6NRDClzQU8M=; h=Received:Received:Subject:To:From:Message-ID:Date:MIME-Version: Content-Type; b=o9Lkaoflw8+zOGFZcpOjA2ETOfMkmAYq1q7RZGbAvoFs8OYIlrOMYeX2Wbau5KsYa 5KtphumOsZyaCzruqsb1dx4toRdaN30IRvq7NwrAO9tg9MMOGVEEJaso+lHUqiISV9 BM9nrwYXHFalkJkp0Zm/Ujhg1UNLQunsCkHQk1Fv/684Mccc5tMfIi5XYBAIgm47yD cPTWwSxixXmBJaFrINUvCGmslPqBjtJf6b0WJOKYVTDjcWX/a1LIDRZdzHP44HyEqL nYe5sNwHi6j2D8nCeSNdLIt+Qeiv3WpTehe04srh5VmG02jJlnsgYoyfCmUkGDY8o8 gGnBt3jZXSbrg==
Archived-At: <>
Cc:, General Area Review Team <>,,
Subject: Re: [Gen-art] Gen-ART Last Call review of draft-josefsson-scrypt-kdf-03
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: "GEN-ART: General Area Review Team" <>
List-Unsubscribe: <>, <>
List-Archive: <>
List-Post: <>
List-Help: <>
List-Subscribe: <>, <>
X-List-Received-Date: Fri, 20 Nov 2015 15:43:22 -0000

On 11/20/15 10:22 AM, Stephen Farrell wrote:
> (This combines the distributions for Simon's mail to gen-art
> and secdir. I think one thread responding to my question(s)
> below will work better.)


I was out of my depth reviewing this because I am not familiar with 
common practice in defining security algorithms. I reviewed this from 
the perspective of a software engineer and found it very difficult to 

So it is probably better if secdir looks at my comments and decides if 
they have any merit and need further attention.


> Hi Simon, all,
> Thanks for the reviews and updates.
> Does anyone think we need more review for this or is
> it now ready for IESG eval? Modulo one question below,
> I think it is ready to move forward, but there are a
> lot of detailed changes in this revision, so it might
> be prudent to try get a few eyeballs on those.
> One other thing below...
> On 20/11/15 12:29, Simon Josefsson wrote:
>> Executive Summary: I have submitted
>> that (hopefully) resolve the gen-art and secdir feedback.
>> Paul Kyzivat <> writes:
>>> This draft is on the right track but has open issues, described in the
>>> review.
>> Hello Paul.  I am sorry for the delay in answering your thorough (much
>> appreciated!) review.  Your review was emailed to the wrong
>> address, but I can't blame that for my delay since I was
>> notified of your review through the secdir review.
>>> (Issue-1): Intended status
>>> The intended status of this document is Informational. I question why
>>> it is not a normative document. As best I can tell, this is the formal
>>> specification of the algorithm. Those that use it would presumably
>>> want to claim conformance to it. The introduction describes this as an
>>> alternative to other KDF functions, including only one with an RFC
>>> reference - RFC2898. That one is also informational, but it is a
>>> restatement of an algorithm specified elsewhere, so that RFC can be
>>> viewed as an informational supplement to the actual definition. The
>>> same is not true of this document.
>>> (Of course changing this to a normative document would require
>>> significant changes, including adding 2119 language. And it probably
>>> could then not be handled as an AD-sponsored document.)
>> As I understand it, the IETF tradition is that descriptions of crypto
>> algorithms that were invented elsewhere are not published as Standards
>> Tracks documents.  So this is in line with that tradition.
>>> (Issue-2): Type mismatch on call to scryptROMix
>>> The scryptROMix function calls scryptBlockMix with an octet vector of
>>> length 128*r octets. But the definition of scryptBlockMix specifies
>>> that the input argument should be a vector of 2*r 64-octet blocks.
>>> Clearly these don't match. One way to make them match would be to
>>> divide the single 128*r octet vector into two 64-octet vectors, and
>>> then to treat r as 2 inside of scryptBlockMix. I don't know if that is
>>> the intent.
>> I believe this is a presentation issue.  The intention is indeed that
>> conversion between these formats is transparent.  In performant
>> implementations, B will refers to the same memory area.  The document
>> was confusing here, and I believe the document was further problematic
>> since it described the inputs as an array of elements, rather than a
>> concatenation of octets.  It now reads:
>>     Input:
>>              B[0] || B[1] || ... || B[2 * r - 1]
>>                     Input octet string (of size 128 * r octets),
>>                     treated as 2 * r 64-octet blocks.
>>     Output:
>>              B'[0] || B'[1] || ... || B'[2 * r - 1]
>>                     Output octet string.
>> I hope this improves this aspect.
>>> (Issue-3): Definition of Integerify
>>> The Integerify function is not clearly defined. The following is given
>>> in the document:
>>>    j = Integerify (X) mod N
>>>        where Integerify (B[0] ... B[2 * r - 1]) is defined
>>>        as the result of interpreting B[2 * r - 1] as a
>>>        little-endian integer.
>>> I can make no sense of this definition of Integerify. The description
>>> implies that B must be an array containing elements up to index
>>> 2*r-1. But the definition of B is "Input octet vector of length 128 *
>>> r octets". Taking the definition literally, B[2*r-1] must be an octet,
>>> and 2*r must be less than 128. That seems like nonsense to me.
>>> I found the following in the [SCRYPT] paper:
>>> "We expect that for reasons of performance and simplicity,
>>> implementors will restrict N to being a power of 2, in which case
>>> the function Integerify can be replaced by reading the first (or
>>> last) machine-length word from a k-bit block."
>>> Simply reading a machine-length word ignores the differences between
>>> little-endian and big-endian machines, and machines with different
>>> word sizes. Conveniently, [SALSA20SPEC] defines a littleendian
>>> function that yields a 32-bit integer from four bytes. That should be
>>> sufficient bits for computing "j". So Integerify(X) could be defined
>>> as:
>>>     littleendian(X[0],X[1],X[2],X[3])
>>> or
>>>     littleendian(X[128*r-4],X[128*r-3],X[128*r-2],X[128*r-1])
>>> (I don't think it matters which, as long as everyone does it the same way.)
>>> In any case, the language is ambiguous and needs to be clarified.
>> Right.  I have changed this into:
>>            j = Integerify (X) mod N
>>                where Integerify (X) is defined as the result of
>>                interpreting the last four octets of X as a little-
>>                endian integer, i.e.:
>>                    littleendian(X[128*r-4], X[128*r-3],
>>                                 X[128*r-2], X[128*r-1])
>>> -------------
>>> Minor issues:
>>> (Issue-4): Identifiers reused for different meanings
>>> In scrypt, "B" is an array of "p" vectors, each of which is 128*r
>>> octets. In scryptROMix, "B" is a single vector of 128*r octets. In
>>> scryptBlockMix, "B" is a vector of 2*r 64-octet blocks.
>> I'm not sure that changing variable names here is a good idea.  For any
>> performant implementation, these variables will refer to the same memory
>> area and thus a consistent variable name helps to signal that.  It is
>> just the interpretation of that memory area that is different in these
>> functions.
>>> In both scrypt and scryptROMix "r" is the same block size
>>> parameter. But in scryptROMix it is only used in the (broken)
>>> definition of Integerify.
>> It is used in the variable descriptions too, to indicate length of the
>> variables.
>>> In scryptBlockMix "r" is (apparently, if I have figured things out)
>>> always 2, and unrelated to the other "r".
>> No, r is the same throughout.
>>> The document would be clearer if distinct identifiers were used for
>>> each unique concept.
>> I believe the identifier refers to unique concepts.  For B, what differs
>> is how that memory area is interpreted in each algorithm description.
>> Can you think of some way to make this more clear, if the changes I've
>> made now aren't sufficient?  I believe some changes I have made already
>> make this somewhat clearer though.
>>> For those identifiers whose value is intended to be constant and
>>> common across all the functions (such as "N"), it would be better to
>>> define them once, globally.
>>> (Issue-5): Confusing/misleading names/definitions of identifiers
>>> The "Block size" parameter ("r") does not denote the size of a
>>> block. It is a factor in the size of blocks, varying from function to
>>> function. Exactly what concept it denotes, and how one would choose
>>> it, isn't clear to me.
>>> The definition of the "N" parameter (CPU/Memory cost parameter) isn't
>>> especially clear. It appears that increasing N increases the cost both
>>> of CPU and memory. But the "p" (parallelization) parameter acts as a
>>> multiplier on N, also increasing the cost. It is far from clear how
>>> one would choose appropriate values for N and p. For a given value of
>>> N*p, is it better for N to be large, or p to be large?
>>> I suggest that more thought be given to what these things mean in the
>>> context of this application, and then choose identifier names and
>>> descriptions accordingly. It may be better to refactor these some
>>> other way.
>> This came from the secdir review as well -- I have added a section
>> "Scrypt parameters" to discuss this.
>> 2.  Scrypt Parameters
>>     The scrypt function takes several parameters.  The passphrase P is
>>     typically a human-chosen password.  The salt is normally uniquely and
>>     randomly generated [RFC4086].  The parameter r ("blockSize") specify
>>     the block size.  The CPU/Memory cost parameter N ("costParameter")
>>     must be larger than 1, a power of 2 and less than 2^(128 * r / 8).
>>     The parallelization parameter p ("parallelizationParameter"), a
>>     positive integer less than or equal to ((2^32-1) * 32) / (128 * r).
>>     The intended output length dkLen in octets of the derived key
>>     ("keyLength"); a positive integer less than or equal to (2^32 - 1) *
>>     32.
>>     Users of scrypt can tune the parameters N, r, and p according to the
>>     amount of memory and computing power available, the latency-bandwidth
>>     product of the memory subsystem, and the amount of parallelism
>>     desired.  At the current time, taking r=8 and p=1 appears to yield
>>     good results, but as memory latency and CPU parallelism increase it
>>     is likely that the optimum values for both r and p will increase.
>>     Note also that since the computations of SMix are independent, a
>>     large value of p can be used to increase the computational cost of
>>     scrypt without increasing the memory usage; so we can expect scrypt
>>     to remain useful even if the growth rates of CPU power and memory
>>     capacity diverge.
> I think that new text will almost certainly generate debate in
> the IESG. You introduce a whole bunch of parameters but only
> give recommended values for two. You will for sure be asked
> what's good for the others. So.... what's good for the others
> and why not include that in the draft?
> Cheers,
> S.
>>> The ASN.1 in section 6 assigns names to several of these
>>> identifiers. It would be helpful to readers if the names used in
>>> defining the algorithms were also the names used here.
>> I've added these names to the "Scrypt Parameters" section above.
>>> (Issue-6): Dubious stability of references
>>> I looked for prior discussion of this draft, and found some on the
>>> saag mailing list regarding the references.
>>> The definition of the Salsa20 hash function in
>>> seems clear enough, but is the
>>> document reference stable? It might be safer to replicate the
>>> definition in this document (in an appendix) with attribution. It
>>> doesn't appear that there is any copyright in the referenced
>>> document.
>> I have included the brief C snippet that defines the function in the
>> section, which hopefully is sufficient clear for implementers (together
>> with the already included test vector) to transpose it into something
>> working in any language.  This allowed me to move those two references
>> to informative ones.
>>> I'll also note that call to this hash function in scryptBlockMix is to
>>> "Salsa", not Salso20. It ought to be consistent with the definition.
>> This is explained in the section:
>>   Below, Salsa(T) corresponds to the Salsa20/8 Core function applied to
>>     the octet vector T.
>>> ------------------------
>>> Nits/editorial comments:
>>> (Issue-7): IdNits reported errors
>>> IdNits reports:
>>> -- Obsolete informational reference (is this intentional?): RFC 5208
>>>       (Obsoleted by RFC 5958)
>>> This is triggered by the following in section 6:
>>> "To be usable in PKCS#8 [RFC5208] and Asymmetric Key Packages
>>> [RFC5958] the following extension of the PBES2-KDFs type is needed."
>>> Since RFC5958 is referenced, and it obsoletes RFC5208, what is the
>>> reason for referencing both?
>> The reason was that a lot of protocols refers to 'PKCS#8' and RFC 5208
>> is the IETF-canonical reference for that term, and a way to illustrate
>> that the scrypt ASN.1 schema is compliant with "old" PKCS#8.  RFC 5958
>> obsoletes RFC 5208 but the identifier 'PKCS#8' was lost in the process.
>> I'm fine with dropping 5208 as a reference, but I do believe it provides
>> slightly more value to the reader.
>>> (There is also an error about 2119, but it is bogus, triggered by the
>>> use of OPTIONAL in the ASN.1.)
>>> This completes my review.
>> Thank you!
>> /Simon