Re: [Gen-art] Gen-ART Last Call review of draft-josefsson-scrypt-kdf-03

Stephen Farrell <stephen.farrell@cs.tcd.ie> Thu, 10 December 2015 13:11 UTC

Return-Path: <stephen.farrell@cs.tcd.ie>
X-Original-To: gen-art@ietfa.amsl.com
Delivered-To: gen-art@ietfa.amsl.com
Received: from localhost (ietfa.amsl.com [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id E54EA1AD213 for <gen-art@ietfa.amsl.com>; Thu, 10 Dec 2015 05:11:34 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -4.311
X-Spam-Level:
X-Spam-Status: No, score=-4.311 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_PASS=-0.001, T_RP_MATCHES_RCVD=-0.01] autolearn=ham
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id CUf0oByOKkRH for <gen-art@ietfa.amsl.com>; Thu, 10 Dec 2015 05:11:31 -0800 (PST)
Received: from mercury.scss.tcd.ie (mercury.scss.tcd.ie [134.226.56.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id AC5B11B2A89 for <gen-art@ietf.org>; Thu, 10 Dec 2015 05:11:30 -0800 (PST)
Received: from localhost (localhost [127.0.0.1]) by mercury.scss.tcd.ie (Postfix) with ESMTP id 2DE71BE79; Thu, 10 Dec 2015 13:11:29 +0000 (GMT)
X-Virus-Scanned: Debian amavisd-new at scss.tcd.ie
Received: from mercury.scss.tcd.ie ([127.0.0.1]) by localhost (mercury.scss.tcd.ie [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id zL3Uge0P2j-8; Thu, 10 Dec 2015 13:11:26 +0000 (GMT)
Received: from [10.0.10.19] (unknown [212.76.224.242]) by mercury.scss.tcd.ie (Postfix) with ESMTPSA id 66F23BE73; Thu, 10 Dec 2015 13:11:25 +0000 (GMT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=cs.tcd.ie; s=mail; t=1449753086; bh=S9HIV00y18P3uuILw5b7lKNhgl7z9owdc5jOgLrRpVM=; h=Subject:To:References:Cc:From:Date:In-Reply-To:From; b=uIRRSk3Ji3KxRAJx/tyz0zQf6DIvoPYkxPpOx5DE1LifEe1wHguEKoPtnlwVTrpDR opM2EFqX6gXsACsFJHs2+vZ3TuNYn9L/nrdsb9/VRaUGQSC9lJp6ZTyBEqm77LQh5S ZcVxw12QXzKqGJwGpmDy9S4t99wtxb0eoRh8DFik=
To: Simon Josefsson <simon@josefsson.org>, Paul Kyzivat <pkyzivat@alum.mit.edu>
References: <55DCB0A2.5050102@alum.mit.edu> <87d1v4hn1h.fsf@latte.josefsson.org>
From: Stephen Farrell <stephen.farrell@cs.tcd.ie>
Openpgp: id=D66EA7906F0B897FB2E97D582F3C8736805F8DA2; url=
Message-ID: <566979FD.6030004@cs.tcd.ie>
Date: Thu, 10 Dec 2015 13:11:25 +0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0
MIME-Version: 1.0
In-Reply-To: <87d1v4hn1h.fsf@latte.josefsson.org>
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Archived-At: <http://mailarchive.ietf.org/arch/msg/gen-art/QgdXztVO64I2is5y8xO6iw0ezxQ>
Cc: draft-josefsson-scrypt-kdf.all@tools.ietf.org, General Area Review Team <gen-art@ietf.org>
Subject: Re: [Gen-art] Gen-ART Last Call review of draft-josefsson-scrypt-kdf-03
X-BeenThere: gen-art@ietf.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: "GEN-ART: General Area Review Team" <gen-art.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/gen-art>, <mailto:gen-art-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/gen-art/>
List-Post: <mailto:gen-art@ietf.org>
List-Help: <mailto:gen-art-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/gen-art>, <mailto:gen-art-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 10 Dec 2015 13:11:35 -0000

Hiya,

Are we all set with this one now?

I won't put it on the Dec17 IESG telechat (as I've already
got a bunch of stuff on that one) but will tee it up for
Jan7 if nobody tells me otherwise.

Thanks,
S.

On 20/11/15 12:29, Simon Josefsson wrote:
> Executive Summary: I have submitted
> https://tools.ietf.org/html/draft-josefsson-scrypt-kdf-04
> that (hopefully) resolve the gen-art and secdir feedback.
> 
> Paul Kyzivat <pkyzivat@alum.mit.edu> writes:
> 
>> This draft is on the right track but has open issues, described in the
>> review.
> 
> Hello Paul.  I am sorry for the delay in answering your thorough (much
> appreciated!) review.  Your review was emailed to the wrong
> @tools.ietf.org address, but I can't blame that for my delay since I was
> notified of your review through the secdir review.
> 
>> (Issue-1): Intended status
>>
>> The intended status of this document is Informational. I question why
>> it is not a normative document. As best I can tell, this is the formal
>> specification of the algorithm. Those that use it would presumably
>> want to claim conformance to it. The introduction describes this as an
>> alternative to other KDF functions, including only one with an RFC
>> reference - RFC2898. That one is also informational, but it is a
>> restatement of an algorithm specified elsewhere, so that RFC can be
>> viewed as an informational supplement to the actual definition. The
>> same is not true of this document.
>>
>> (Of course changing this to a normative document would require
>> significant changes, including adding 2119 language. And it probably
>> could then not be handled as an AD-sponsored document.)
> 
> As I understand it, the IETF tradition is that descriptions of crypto
> algorithms that were invented elsewhere are not published as Standards
> Tracks documents.  So this is in line with that tradition.
> 
>> (Issue-2): Type mismatch on call to scryptROMix
>>
>> The scryptROMix function calls scryptBlockMix with an octet vector of
>> length 128*r octets. But the definition of scryptBlockMix specifies
>> that the input argument should be a vector of 2*r 64-octet blocks.
>>
>> Clearly these don't match. One way to make them match would be to
>> divide the single 128*r octet vector into two 64-octet vectors, and
>> then to treat r as 2 inside of scryptBlockMix. I don't know if that is
>> the intent.
> 
> I believe this is a presentation issue.  The intention is indeed that
> conversion between these formats is transparent.  In performant
> implementations, B will refers to the same memory area.  The document
> was confusing here, and I believe the document was further problematic
> since it described the inputs as an array of elements, rather than a
> concatenation of octets.  It now reads:
> 
>    Input:
>             B[0] || B[1] || ... || B[2 * r - 1]
>                    Input octet string (of size 128 * r octets),
>                    treated as 2 * r 64-octet blocks.
> 
>    Output:
>             B'[0] || B'[1] || ... || B'[2 * r - 1]
>                    Output octet string.
> 
> I hope this improves this aspect.
> 
>> (Issue-3): Definition of Integerify
>>
>> The Integerify function is not clearly defined. The following is given
>> in the document:
>>
>>   j = Integerify (X) mod N
>>       where Integerify (B[0] ... B[2 * r - 1]) is defined
>>       as the result of interpreting B[2 * r - 1] as a
>>       little-endian integer.
>>
>> I can make no sense of this definition of Integerify. The description
>> implies that B must be an array containing elements up to index
>> 2*r-1. But the definition of B is "Input octet vector of length 128 *
>> r octets". Taking the definition literally, B[2*r-1] must be an octet,
>> and 2*r must be less than 128. That seems like nonsense to me.
>>
>> I found the following in the [SCRYPT] paper:
>>
>> "We expect that for reasons of performance and simplicity,
>> implementors will restrict N to being a power of 2, in which case
>> the function Integerify can be replaced by reading the first (or
>> last) machine-length word from a k-bit block."
>>
>> Simply reading a machine-length word ignores the differences between
>> little-endian and big-endian machines, and machines with different
>> word sizes. Conveniently, [SALSA20SPEC] defines a littleendian
>> function that yields a 32-bit integer from four bytes. That should be
>> sufficient bits for computing "j". So Integerify(X) could be defined
>> as:
>>
>>    littleendian(X[0],X[1],X[2],X[3])
>>
>> or
>>
>>    littleendian(X[128*r-4],X[128*r-3],X[128*r-2],X[128*r-1])
>>
>> (I don't think it matters which, as long as everyone does it the same way.)
>>
>> In any case, the language is ambiguous and needs to be clarified.
> 
> Right.  I have changed this into:
> 
>           j = Integerify (X) mod N
>               where Integerify (X) is defined as the result of
>               interpreting the last four octets of X as a little-
>               endian integer, i.e.:
>                   littleendian(X[128*r-4], X[128*r-3],
>                                X[128*r-2], X[128*r-1])
> 
>> -------------
>> Minor issues:
>>
>> (Issue-4): Identifiers reused for different meanings
>>
>> In scrypt, "B" is an array of "p" vectors, each of which is 128*r
>> octets. In scryptROMix, "B" is a single vector of 128*r octets. In
>> scryptBlockMix, "B" is a vector of 2*r 64-octet blocks.
> 
> I'm not sure that changing variable names here is a good idea.  For any
> performant implementation, these variables will refer to the same memory
> area and thus a consistent variable name helps to signal that.  It is
> just the interpretation of that memory area that is different in these
> functions.
> 
>> In both scrypt and scryptROMix "r" is the same block size
>> parameter. But in scryptROMix it is only used in the (broken)
>> definition of Integerify.
> 
> It is used in the variable descriptions too, to indicate length of the
> variables.
> 
>> In scryptBlockMix "r" is (apparently, if I have figured things out)
>> always 2, and unrelated to the other "r".
> 
> No, r is the same throughout.
> 
>> The document would be clearer if distinct identifiers were used for
>> each unique concept.
> 
> I believe the identifier refers to unique concepts.  For B, what differs
> is how that memory area is interpreted in each algorithm description.
> 
> Can you think of some way to make this more clear, if the changes I've
> made now aren't sufficient?  I believe some changes I have made already
> make this somewhat clearer though.
> 
>> For those identifiers whose value is intended to be constant and
>> common across all the functions (such as "N"), it would be better to
>> define them once, globally.
>>
>> (Issue-5): Confusing/misleading names/definitions of identifiers
>>
>> The "Block size" parameter ("r") does not denote the size of a
>> block. It is a factor in the size of blocks, varying from function to
>> function. Exactly what concept it denotes, and how one would choose
>> it, isn't clear to me.
>>
>> The definition of the "N" parameter (CPU/Memory cost parameter) isn't
>> especially clear. It appears that increasing N increases the cost both
>> of CPU and memory. But the "p" (parallelization) parameter acts as a
>> multiplier on N, also increasing the cost. It is far from clear how
>> one would choose appropriate values for N and p. For a given value of
>> N*p, is it better for N to be large, or p to be large?
>>
>> I suggest that more thought be given to what these things mean in the
>> context of this application, and then choose identifier names and
>> descriptions accordingly. It may be better to refactor these some
>> other way.
> 
> This came from the secdir review as well -- I have added a section
> "Scrypt parameters" to discuss this.
> 
> 2.  Scrypt Parameters
> 
>    The scrypt function takes several parameters.  The passphrase P is
>    typically a human-chosen password.  The salt is normally uniquely and
>    randomly generated [RFC4086].  The parameter r ("blockSize") specify
>    the block size.  The CPU/Memory cost parameter N ("costParameter")
>    must be larger than 1, a power of 2 and less than 2^(128 * r / 8).
>    The parallelization parameter p ("parallelizationParameter"), a
>    positive integer less than or equal to ((2^32-1) * 32) / (128 * r).
>    The intended output length dkLen in octets of the derived key
>    ("keyLength"); a positive integer less than or equal to (2^32 - 1) *
>    32.
> 
>    Users of scrypt can tune the parameters N, r, and p according to the
>    amount of memory and computing power available, the latency-bandwidth
>    product of the memory subsystem, and the amount of parallelism
>    desired.  At the current time, taking r=8 and p=1 appears to yield
>    good results, but as memory latency and CPU parallelism increase it
>    is likely that the optimum values for both r and p will increase.
>    Note also that since the computations of SMix are independent, a
>    large value of p can be used to increase the computational cost of
>    scrypt without increasing the memory usage; so we can expect scrypt
>    to remain useful even if the growth rates of CPU power and memory
>    capacity diverge.
> 
>> The ASN.1 in section 6 assigns names to several of these
>> identifiers. It would be helpful to readers if the names used in
>> defining the algorithms were also the names used here.
> 
> I've added these names to the "Scrypt Parameters" section above.
> 
>> (Issue-6): Dubious stability of references
>>
>> I looked for prior discussion of this draft, and found some on the
>> saag mailing list regarding the references.
>>
>> The definition of the Salsa20 hash function in
>> http://cr.yp.to/snuffle/spec.pdf seems clear enough, but is the
>> document reference stable? It might be safer to replicate the
>> definition in this document (in an appendix) with attribution. It
>> doesn't appear that there is any copyright in the referenced
>> document.
> 
> I have included the brief C snippet that defines the function in the
> section, which hopefully is sufficient clear for implementers (together
> with the already included test vector) to transpose it into something
> working in any language.  This allowed me to move those two references
> to informative ones.
> 
>> I'll also note that call to this hash function in scryptBlockMix is to
>> "Salsa", not Salso20. It ought to be consistent with the definition.
> 
> This is explained in the section:
> 
>  Below, Salsa(T) corresponds to the Salsa20/8 Core function applied to
>    the octet vector T.
> 
>> ------------------------
>> Nits/editorial comments:
>>
>> (Issue-7): IdNits reported errors
>>
>> IdNits reports:
>>
>> -- Obsolete informational reference (is this intentional?): RFC 5208
>>      (Obsoleted by RFC 5958)
>>
>> This is triggered by the following in section 6:
>>
>> "To be usable in PKCS#8 [RFC5208] and Asymmetric Key Packages
>> [RFC5958] the following extension of the PBES2-KDFs type is needed."
>>
>> Since RFC5958 is referenced, and it obsoletes RFC5208, what is the
>> reason for referencing both?
> 
> The reason was that a lot of protocols refers to 'PKCS#8' and RFC 5208
> is the IETF-canonical reference for that term, and a way to illustrate
> that the scrypt ASN.1 schema is compliant with "old" PKCS#8.  RFC 5958
> obsoletes RFC 5208 but the identifier 'PKCS#8' was lost in the process.
> 
> I'm fine with dropping 5208 as a reference, but I do believe it provides
> slightly more value to the reader.
> 
>> (There is also an error about 2119, but it is bogus, triggered by the
>> use of OPTIONAL in the ASN.1.)
>>
>> This completes my review.
> 
> Thank you!
> 
> /Simon
>