Re: [secdir] Gen-ART Last Call review of draft-josefsson-scrypt-kdf-03

Stephen Farrell <stephen.farrell@cs.tcd.ie> Fri, 20 November 2015 15:23 UTC

To: Simon Josefsson <simon@josefsson.org>, Paul Kyzivat <pkyzivat@alum.mit.edu>
References: <55DCB0A2.5050102@alum.mit.edu> <87d1v4hn1h.fsf@latte.josefsson.org>
From: Stephen Farrell <stephen.farrell@cs.tcd.ie>
Openpgp: id=D66EA7906F0B897FB2E97D582F3C8736805F8DA2; url=
Message-ID: <564F3AB9.7060400@cs.tcd.ie>
Date: Fri, 20 Nov 2015 15:22:33 +0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0
MIME-Version: 1.0
In-Reply-To: <87d1v4hn1h.fsf@latte.josefsson.org>
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Archived-At: <http://mailarchive.ietf.org/arch/msg/secdir/xcFMN9mGaiCaTAId17NK3-EYri0>
Cc: draft-josefsson-scrypt-kdf.all@tools.ietf.org, General Area Review Team <gen-art@ietf.org>, secdir@ietf.org
Subject: Re: [secdir] Gen-ART Last Call review of draft-josefsson-scrypt-kdf-03
Precedence: list

(This combines the distributions for Simon's mail to gen-art
and secdir. I think one thread responding to my question(s)
below will work better.)

Hi Simon, all,

Thanks for the reviews and updates.

Does anyone think we need more review for this or is
it now ready for IESG eval? Modulo one question below,
I think it is ready to move forward, but there are a
lot of detailed changes in this revision, so it might
be prudent to try get a few eyeballs on those.

One other thing below...

On 20/11/15 12:29, Simon Josefsson wrote:
> Executive Summary: I have submitted
> https://tools.ietf.org/html/draft-josefsson-scrypt-kdf-04
> that (hopefully) resolve the gen-art and secdir feedback.
> 
> Paul Kyzivat <pkyzivat@alum.mit.edu> writes:
> 
>> This draft is on the right track but has open issues, described in the
>> review.
> 
> Hello Paul.  I am sorry for the delay in answering your thorough (much
> appreciated!) review.  Your review was emailed to the wrong
> @tools.ietf.org address, but I can't blame that for my delay since I was
> notified of your review through the secdir review.
> 
>> (Issue-1): Intended status
>>
>> The intended status of this document is Informational. I question why
>> it is not a normative document. As best I can tell, this is the formal
>> specification of the algorithm. Those that use it would presumably
>> want to claim conformance to it. The introduction describes this as an
>> alternative to other KDF functions, including only one with an RFC
>> reference - RFC2898. That one is also informational, but it is a
>> restatement of an algorithm specified elsewhere, so that RFC can be
>> viewed as an informational supplement to the actual definition. The
>> same is not true of this document.
>>
>> (Of course changing this to a normative document would require
>> significant changes, including adding 2119 language. And it probably
>> could then not be handled as an AD-sponsored document.)
> 
> As I understand it, the IETF tradition is that descriptions of crypto
> algorithms that were invented elsewhere are not published as Standards
> Tracks documents.  So this is in line with that tradition.
> 
>> (Issue-2): Type mismatch on call to scryptROMix
>>
>> The scryptROMix function calls scryptBlockMix with an octet vector of
>> length 128*r octets. But the definition of scryptBlockMix specifies
>> that the input argument should be a vector of 2*r 64-octet blocks.
>>
>> Clearly these don't match. One way to make them match would be to
>> divide the single 128*r octet vector into two 64-octet vectors, and
>> then to treat r as 2 inside of scryptBlockMix. I don't know if that is
>> the intent.
> 
> I believe this is a presentation issue.  The intention is indeed that
> conversion between these formats is transparent.  In performant
> implementations, B will refers to the same memory area.  The document
> was confusing here, and I believe the document was further problematic
> since it described the inputs as an array of elements, rather than a
> concatenation of octets.  It now reads:
> 
>    Input:
>             B[0] || B[1] || ... || B[2 * r - 1]
>                    Input octet string (of size 128 * r octets),
>                    treated as 2 * r 64-octet blocks.
> 
>    Output:
>             B'[0] || B'[1] || ... || B'[2 * r - 1]
>                    Output octet string.
> 
> I hope this improves this aspect.
> 
>> (Issue-3): Definition of Integerify
>>
>> The Integerify function is not clearly defined. The following is given
>> in the document:
>>
>>   j = Integerify (X) mod N
>>       where Integerify (B[0] ... B[2 * r - 1]) is defined
>>       as the result of interpreting B[2 * r - 1] as a
>>       little-endian integer.
>>
>> I can make no sense of this definition of Integerify. The description
>> implies that B must be an array containing elements up to index
>> 2*r-1. But the definition of B is "Input octet vector of length 128 *
>> r octets". Taking the definition literally, B[2*r-1] must be an octet,
>> and 2*r must be less than 128. That seems like nonsense to me.
>>
>> I found the following in the [SCRYPT] paper:
>>
>> "We expect that for reasons of performance and simplicity,
>> implementors will restrict N to being a power of 2, in which case
>> the function Integerify can be replaced by reading the first (or
>> last) machine-length word from a k-bit block."
>>
>> Simply reading a machine-length word ignores the differences between
>> little-endian and big-endian machines, and machines with different
>> word sizes. Conveniently, [SALSA20SPEC] defines a littleendian
>> function that yields a 32-bit integer from four bytes. That should be
>> sufficient bits for computing "j". So Integerify(X) could be defined
>> as:
>>
>>    littleendian(X[0],X[1],X[2],X[3])
>>
>> or
>>
>>    littleendian(X[128*r-4],X[128*r-3],X[128*r-2],X[128*r-1])
>>
>> (I don't think it matters which, as long as everyone does it the same way.)
>>
>> In any case, the language is ambiguous and needs to be clarified.
> 
> Right.  I have changed this into:
> 
>           j = Integerify (X) mod N
>               where Integerify (X) is defined as the result of
>               interpreting the last four octets of X as a little-
>               endian integer, i.e.:
>                   littleendian(X[128*r-4], X[128*r-3],
>                                X[128*r-2], X[128*r-1])
> 
>> -------------
>> Minor issues:
>>
>> (Issue-4): Identifiers reused for different meanings
>>
>> In scrypt, "B" is an array of "p" vectors, each of which is 128*r
>> octets. In scryptROMix, "B" is a single vector of 128*r octets. In
>> scryptBlockMix, "B" is a vector of 2*r 64-octet blocks.
> 
> I'm not sure that changing variable names here is a good idea.  For any
> performant implementation, these variables will refer to the same memory
> area and thus a consistent variable name helps to signal that.  It is
> just the interpretation of that memory area that is different in these
> functions.
> 
>> In both scrypt and scryptROMix "r" is the same block size
>> parameter. But in scryptROMix it is only used in the (broken)
>> definition of Integerify.
> 
> It is used in the variable descriptions too, to indicate length of the
> variables.
> 
>> In scryptBlockMix "r" is (apparently, if I have figured things out)
>> always 2, and unrelated to the other "r".
> 
> No, r is the same throughout.
> 
>> The document would be clearer if distinct identifiers were used for
>> each unique concept.
> 
> I believe the identifier refers to unique concepts.  For B, what differs
> is how that memory area is interpreted in each algorithm description.
> 
> Can you think of some way to make this more clear, if the changes I've
> made now aren't sufficient?  I believe some changes I have made already
> make this somewhat clearer though.
> 
>> For those identifiers whose value is intended to be constant and
>> common across all the functions (such as "N"), it would be better to
>> define them once, globally.
>>
>> (Issue-5): Confusing/misleading names/definitions of identifiers
>>
>> The "Block size" parameter ("r") does not denote the size of a
>> block. It is a factor in the size of blocks, varying from function to
>> function. Exactly what concept it denotes, and how one would choose
>> it, isn't clear to me.
>>
>> The definition of the "N" parameter (CPU/Memory cost parameter) isn't
>> especially clear. It appears that increasing N increases the cost both
>> of CPU and memory. But the "p" (parallelization) parameter acts as a
>> multiplier on N, also increasing the cost. It is far from clear how
>> one would choose appropriate values for N and p. For a given value of
>> N*p, is it better for N to be large, or p to be large?
>>
>> I suggest that more thought be given to what these things mean in the
>> context of this application, and then choose identifier names and
>> descriptions accordingly. It may be better to refactor these some
>> other way.
> 
> This came from the secdir review as well -- I have added a section
> "Scrypt parameters" to discuss this.
> 
> 2.  Scrypt Parameters
> 
>    The scrypt function takes several parameters.  The passphrase P is
>    typically a human-chosen password.  The salt is normally uniquely and
>    randomly generated [RFC4086].  The parameter r ("blockSize") specify
>    the block size.  The CPU/Memory cost parameter N ("costParameter")
>    must be larger than 1, a power of 2 and less than 2^(128 * r / 8).
>    The parallelization parameter p ("parallelizationParameter"), a
>    positive integer less than or equal to ((2^32-1) * 32) / (128 * r).
>    The intended output length dkLen in octets of the derived key
>    ("keyLength"); a positive integer less than or equal to (2^32 - 1) *
>    32.
> 
>    Users of scrypt can tune the parameters N, r, and p according to the
>    amount of memory and computing power available, the latency-bandwidth
>    product of the memory subsystem, and the amount of parallelism
>    desired.  At the current time, taking r=8 and p=1 appears to yield
>    good results, but as memory latency and CPU parallelism increase it
>    is likely that the optimum values for both r and p will increase.
>    Note also that since the computations of SMix are independent, a
>    large value of p can be used to increase the computational cost of
>    scrypt without increasing the memory usage; so we can expect scrypt
>    to remain useful even if the growth rates of CPU power and memory
>    capacity diverge.

I think that new text will almost certainly generate debate in
the IESG. You introduce a whole bunch of parameters but only
give recommended values for two. You will for sure be asked
what's good for the others. So.... what's good for the others
and why not include that in the draft?

Cheers,
S.


> 
>> The ASN.1 in section 6 assigns names to several of these
>> identifiers. It would be helpful to readers if the names used in
>> defining the algorithms were also the names used here.
> 
> I've added these names to the "Scrypt Parameters" section above.
> 
>> (Issue-6): Dubious stability of references
>>
>> I looked for prior discussion of this draft, and found some on the
>> saag mailing list regarding the references.
>>
>> The definition of the Salsa20 hash function in
>> http://cr.yp.to/snuffle/spec.pdf seems clear enough, but is the
>> document reference stable? It might be safer to replicate the
>> definition in this document (in an appendix) with attribution. It
>> doesn't appear that there is any copyright in the referenced
>> document.
> 
> I have included the brief C snippet that defines the function in the
> section, which hopefully is sufficient clear for implementers (together
> with the already included test vector) to transpose it into something
> working in any language.  This allowed me to move those two references
> to informative ones.
> 
>> I'll also note that call to this hash function in scryptBlockMix is to
>> "Salsa", not Salso20. It ought to be consistent with the definition.
> 
> This is explained in the section:
> 
>  Below, Salsa(T) corresponds to the Salsa20/8 Core function applied to
>    the octet vector T.
> 
>> ------------------------
>> Nits/editorial comments:
>>
>> (Issue-7): IdNits reported errors
>>
>> IdNits reports:
>>
>> -- Obsolete informational reference (is this intentional?): RFC 5208
>>      (Obsoleted by RFC 5958)
>>
>> This is triggered by the following in section 6:
>>
>> "To be usable in PKCS#8 [RFC5208] and Asymmetric Key Packages
>> [RFC5958] the following extension of the PBES2-KDFs type is needed."
>>
>> Since RFC5958 is referenced, and it obsoletes RFC5208, what is the
>> reason for referencing both?
> 
> The reason was that a lot of protocols refers to 'PKCS#8' and RFC 5208
> is the IETF-canonical reference for that term, and a way to illustrate
> that the scrypt ASN.1 schema is compliant with "old" PKCS#8.  RFC 5958
> obsoletes RFC 5208 but the identifier 'PKCS#8' was lost in the process.
> 
> I'm fine with dropping 5208 as a reference, but I do believe it provides
> slightly more value to the reader.
> 
>> (There is also an error about 2119, but it is bogus, triggered by the
>> use of OPTIONAL in the ASN.1.)
>>
>> This completes my review.
> 
> Thank you!
> 
> /Simon
>

Re: [secdir] Gen-ART Last Call review of draft-jo… Stephen Farrell
Re: [secdir] Gen-ART Last Call review of draft-jo… Paul Kyzivat
Re: [secdir] Gen-ART Last Call review of draft-jo… Paul Kyzivat