Re: [Cfrg] Timing of libsodium, curve25519-donna, MSR ECCLib, and openssl-master

Andrey Jivsov <crypto@brainhub.org> Mon, 18 August 2014 04:50 UTC

Message-ID: <53F18607.3000005@brainhub.org>
Date: Sun, 17 Aug 2014 21:50:15 -0700
From: Andrey Jivsov <crypto@brainhub.org>
To: cfrg@irtf.org
References: <53F0010B.6080101@brainhub.org> <CD159876-F061-4EB8-B1DC-FAB8E4798E26@shiftleft.org> <53F108CF.4040704@brainhub.org>
In-Reply-To: <53F108CF.4040704@brainhub.org>
Archived-At: http://mailarchive.ietf.org/arch/msg/cfrg/fljlJUzqe5YOhfXllQA6sP2H2YU

Michael Hamburg's question about hyperthreading made me wonder what kind
of performance I can actually get on the whole CPU, as opposed to a
single core.

I confirmed, using Curve25519 as an example, that the single-threaded
op/sec value can be multiplied by the number of cores to arrive at
approximately the total count.

This gives us over 60,000 256-bit variable-base scalar multiplications
per second on a whole CPU. Recall that my reported number was 15764.3
op/sec.

To perform the measurement I wrapped the curve25519-donna code in a
multithreaded harness.
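
For readers who want to try the same kind of measurement, here is a
minimal sketch of a per-thread timing harness. It is not the code from
the diff: the names stand_in_op, worker and run are hypothetical, and
the workload is a SHA-256 hash over a large buffer standing in for one
scalar multiplication (chosen only because CPython's hashlib releases
the interpreter lock on large inputs, so the threads can run in
parallel). Each thread does a fixed number of operations and reports
its own op/sec; the total is the sum of the per-thread rates.

```python
import hashlib
import threading
import time


def stand_in_op(buf):
    # Hypothetical stand-in for one scalar multiplication.
    # hashlib releases the GIL for large inputs, so threads overlap.
    hashlib.sha256(buf).digest()


def worker(thread_id, ops, results):
    # Run a fixed number of operations and record this thread's rate.
    buf = bytes(64 * 1024)
    start = time.perf_counter()
    for _ in range(ops):
        stand_in_op(buf)
    elapsed = time.perf_counter() - start
    results[thread_id] = ops / elapsed


def run(num_threads, ops=200):
    # Start all threads, wait for them, and sum the per-thread rates.
    results = {}
    threads = [
        threading.Thread(target=worker, args=(i + 1, ops, results))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, sum(results.values())


if __name__ == "__main__":
    for n in (1, 2, 4):
        per_thread, total = run(n)
        for tid in sorted(per_thread):
            print(f"thread {tid}: {per_thread[tid]:.1f} op/sec")
        print(f"Total for CPU: {total:.1f} op/sec\n")
```

The absolute numbers from this sketch say nothing about Curve25519;
only the shape of the scaling with thread count is comparable.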

As the prints below show, the per-thread op/sec number stays at roughly
15,000, deteriorating slightly, until the number of threads reaches the
number of cores. The total operation count for the CPU peaks when the
number of threads equals the number of cores and then stays constant for
higher thread counts. (Actually, it peaks at the number of cores + 1,
but that is a small increase.)


Prints from the program follow. The diff is at
https://github.com/brainhub/curve25519-donna/commit/8fe32587444b9be3b9ec06cb2334a2cd3fc356cd

Waiting in thread 1 ...
thread 1: 15616.2 op/sec in 6 sec
Total for CPU: 15616.2 op/sec

Waiting in thread 1 ...
Waiting in thread 2 ...
thread 1: 15221 op/sec in 7 sec
thread 2: 15224.6 op/sec in 7 sec
Total for CPU: 30445.5 op/sec

Waiting in thread 1 ...
Waiting in thread 2 ...
Waiting in thread 3 ...
thread 1: 14950.4 op/sec in 7 sec
thread 2: 14962.4 op/sec in 7 sec
thread 3: 14959.9 op/sec in 7 sec
Total for CPU: 44872.7 op/sec

Waiting in thread 1 ...
Waiting in thread 2 ...
Waiting in thread 3 ...
Waiting in thread 4 ...
thread 1: 14900.6 op/sec in 7 sec
thread 2: 14960.9 op/sec in 7 sec
thread 3: 14937.9 op/sec in 7 sec
thread 4: 14308.3 op/sec in 7 sec
Total for CPU: 59107.7 op/sec

Waiting in thread 4 ...
Waiting in thread 5 ...
Waiting in thread 3 ...
Waiting in thread 1 ...
Waiting in thread 2 ...
thread 1: 12298.4 op/sec in 8 sec
thread 2: 13157.2 op/sec in 8 sec
thread 3: 12187.7 op/sec in 8 sec
thread 4: 11612.9 op/sec in 9 sec
thread 5: 11201.8 op/sec in 9 sec
Total for CPU: 60457.9 op/sec

... and there is no improvement beyond 5 threads.
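
The scaling in the prints above can be checked with a few lines of
arithmetic. The sketch below copies the per-thread rates from the
output, sums them, and divides by (threads x single-thread rate) to get
a scaling efficiency. The function name scaling and the use of the
1-thread run (15616.2 op/sec) as the baseline are my own choices for
illustration.

```python
# Per-thread op/sec values copied from the program output above.
RUNS = {
    1: [15616.2],
    2: [15221.0, 15224.6],
    3: [14950.4, 14962.4, 14959.9],
    4: [14900.6, 14960.9, 14937.9, 14308.3],
    5: [12298.4, 13157.2, 12187.7, 11612.9, 11201.8],
}


def scaling(runs, single_rate):
    # For each thread count: (threads, total op/sec, efficiency),
    # where efficiency = total / (threads * single-thread rate).
    rows = []
    for n in sorted(runs):
        total = sum(runs[n])
        rows.append((n, total, total / (n * single_rate)))
    return rows


if __name__ == "__main__":
    for n, total, eff in scaling(RUNS, RUNS[1][0]):
        print(f"{n} threads: total {total:.1f} op/sec, efficiency {eff:.3f}")
```

The efficiency stays close to 1 up to 4 threads (about 0.95 at 4) and
drops at 5 threads, while the total barely moves, which matches the
observation that the total levels off once every core is busy.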