Re: [Cfrg] Timing of libsodium, curve25519-donna, MSR ECCLib, and openssl-master

Andrey Jivsov <crypto@brainhub.org> Sun, 17 August 2014 19:56 UTC

Message-ID: <53F108CF.4040704@brainhub.org>
Date: Sun, 17 Aug 2014 12:55:59 -0700
From: Andrey Jivsov <crypto@brainhub.org>
To: Michael Hamburg <mike@shiftleft.org>
References: <53F0010B.6080101@brainhub.org> <CD159876-F061-4EB8-B1DC-FAB8E4798E26@shiftleft.org>
In-Reply-To: <CD159876-F061-4EB8-B1DC-FAB8E4798E26@shiftleft.org>
Archived-At: http://mailarchive.ietf.org/arch/msg/cfrg/64o56J4FxuHBZ8v1WGNq3sBXGOg
Cc: "cfrg@irtf.org" <cfrg@irtf.org>
Subject: Re: [Cfrg] Timing of libsodium, curve25519-donna, MSR ECCLib, and openssl-master

On 08/17/2014 10:09 AM, Michael Hamburg wrote:
>
>> On Aug 16, 2014, at 6:10 PM, Andrey Jivsov <crypto@brainhub.org> wrote:
>>
>> I timed libsodium, curve25519-donna, MSR ECCLib, and openssl-master.
>
> Thanks for the data!
>
>> In all cases minor tweaks to the source code were added to measure and report the timing. I made sure to time the variable base scalar multiplication. I also timed the fixed base multiplication and precomputation (only needed for MSR ECCLib).
>>
>> Operations are reported as operations per second. I used default compile options.
>>
>> MSR ECCLib was slightly faster in variable base operations. It uses assembler code.
>>
>> Interestingly, MSR ECCLib Weierstrass a=-3 curves are only 10% slower than curve25519-donna. At the same time, all pseudo-Mersenne prime curves are ~5 times faster than NIST P-256 (this is better than the factor-of-2 back-of-envelope difference in modp multiplication performance).
>>
>> The factor of 2+ improvement for fixed base calculation in MSR ECCLib is impressive. Note, however, the significant penalty that precalculation step adds. If the pre-calculation is included in timing, we could do ~50% more EDH agreements with NIST P-256.
>
> Precomputation on the curve’s base point shouldn't be included in timing, because it will be hard-coded in a production version.  Handling it will be cheap, unless it’s big enough to cause CPU cache thrashing.

That's a possibility. Tables need to be reasonably small to make this 
attractive.

I think the optimal solution will be application-specific.

While powerful servers are good candidates to take advantage of 
pre-computation, they can also run a separate pipeline that 
pre-generates ephemeral key pairs. The entire on-line cost of the DH 
protocol is then 1 variable-base scalar multiplication, something that 
faster crypto alone cannot possibly beat.
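
The pre-generation idea can be sketched roughly as below. This is a toy 
illustration only: it uses finite-field DH over a small Mersenne prime 
(Python's built-in pow) as a stand-in for a real curve, and the names 
EphemeralPool, refill, take, and agree are hypothetical. The point is the 
cost split: key generation (the fixed-base operation) happens off the 
critical path, so each handshake pays for one variable-base operation.

```python
import secrets
from collections import deque

# Toy parameters, NOT secure: a small Mersenne prime and generator,
# standing in for a real curve and its base point.
P = 2**127 - 1
G = 3


class EphemeralPool:
    """Holds ephemeral (private, public) pairs generated ahead of time."""

    def __init__(self):
        self._pairs = deque()

    def refill(self, count):
        # Fixed-base work, done off the critical path (e.g. by an idle core).
        for _ in range(count):
            priv = secrets.randbelow(P - 2) + 1
            self._pairs.append((priv, pow(G, priv, P)))

    def take(self):
        # Each pair is handed out once and never returned to the pool.
        return self._pairs.popleft()


def agree(priv, peer_pub):
    # The only per-handshake cost: one variable-base operation.
    return pow(peer_pub, priv, P)


server_pool = EphemeralPool()
server_pool.refill(4)

server_priv, server_pub = server_pool.take()
client_priv = secrets.randbelow(P - 2) + 1
client_pub = pow(G, client_priv, P)

shared_server = agree(server_priv, client_pub)
shared_client = agree(client_priv, server_pub)
```

Both sides end up with the same shared value, and the server did no 
fixed-base work during the handshake itself.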

Small systems may care about executable binary size (code+static data), 
memory size, and pre-computation cost.

Also, let's not rule out the possibility of briefly re-using an ephemeral value.

>
>> CPU: Intel(R) Core(TM) i5-3550 CPU @ 3.30GHz, no AVX2. Fedora Core 20 64 bit.
>
> Do you know if TurboBoost was on?  That CPU can boost to 3.7 GHz, so the cycle counts could be 12% different one way or another.

I checked my BIOS setting. It says "Intel HT Technology" Not Supported.

I followed:
http://xmodulo.com/2012/06/how-to-find-number-of-cpu-cores-on.html
and also:
http://www.linuxforums.org/forum/red-hat-fedora-linux/155256-3-ways-tell-if-ht-enabled-without-going-through-bios-problems.html

These commands report that I have 1 CPU, 4 cores, and 4 processors in 
Linux. I take this to mean that Hyper-Threading is not enabled.
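
A minimal sketch of the same check done programmatically, assuming an 
x86 Linux system whose /proc/cpuinfo exposes the "siblings" and 
"cpu cores" fields. The function name and the sample text are 
illustrative, not output captured from the machine above; the sample 
merely mirrors the 4 cores / 4 processors reported.

```python
def hyperthreading_enabled(cpuinfo_text):
    """Return True if any package reports more siblings than cores."""
    siblings = cores = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "siblings":
            siblings = int(value)
        elif key == "cpu cores":
            cores = int(value)
        if siblings is not None and cores is not None:
            # More logical processors than physical cores means SMT is on.
            if siblings > cores:
                return True
            siblings = cores = None
    return False


# Illustrative excerpt: 4 logical processors on 4 physical cores.
SAMPLE = "processor\t: 0\nsiblings\t: 4\ncore id\t\t: 0\ncpu cores\t: 4\n"

ht_on = hyperthreading_enabled(SAMPLE)
```

On a real machine one would pass the contents of /proc/cpuinfo instead 
of SAMPLE.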

I tried a few BIOS options that might make the measurements more 
consistent. One made a difference: "Intel Turboboost: Enabled", which 
boosts the clock speed. With it disabled I get slightly lower op/sec 
values, but the relationships stayed about the same, with all 
measurements going down.

E.g. op/sec before/after disabling Turboboost:

donna: 15722/14105=1.11
MSR variable base Weierstrass: 14047/12602=1.11
openssl: 3245/3025.2=1.07
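
The ratios can be recomputed with a few lines, shown here next to the 
upper bound implied by the 3.7 GHz boost figure Michael mentioned. The 
numbers are the ones quoted in this message; the variable names are 
illustrative.

```python
# (Turboboost enabled, Turboboost disabled) in op/sec, as quoted above.
measurements = {
    "donna": (15722, 14105),
    "MSR variable base Weierstrass": (14047, 12602),
    "openssl": (3245, 3025.2),
}

NOMINAL_GHZ = 3.3
MAX_TURBO_GHZ = 3.7  # boost clock figure from Michael's question

ratios = {name: on / off for name, (on, off) in measurements.items()}
clock_ceiling = MAX_TURBO_GHZ / NOMINAL_GHZ

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
print(f"clock ratio ceiling: {clock_ceiling:.2f}")
```

All three ratios stay at or below the 3.7/3.3 clock ratio, which is 
what one would expect if the difference is due to the clock boost.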

The numbers, as seen on the terminal, are at the bottom.

>
>> https://github.com/jedisct1/libsodium
>> modified tests in libsodium/test/default to take the timing:
>> crypto_scalarmult_curve25519_base: 15620.2 op/s
>> crypto_scalarmult_curve25519: 15602.8 op/s
>>
>> https://github.com/agl/curve25519-donna:
>> make ./speed-curve25519-donna-c64 && ./speed-curve25519-donna-c64
>> 63 us, 15722.1 op/s
>> (also modified to check variable base v.s. generator 9 -- no difference)
>>
>> OpenSSL 1.0.1e-fips 11 Feb 2013:
>> openssl speed ecdhp256 (ECDH_compute_key)
>> 256 bit ecdh (nistp256)   0.0003s   3245.4  op      op/s
>> and from git://git.openssl.org/openssl.git:
>> 256 bit ecdh (nistp256)   0.0003s   3406.7  op      op/s
>>
>> MSR ECCLib http://research.microsoft.com/en-us/downloads/149804d4-b5f5-496f-9a17-a013b242c02d/
>>
>> In the function that prints "Crypto operations: Weierstrass a=-3 over GF(2^256-189)":
>> with variable base (ecdh_secret_agreement_Jac256) 14047.9 op/sec
>> with fixed base (ecdh_keygen_Jac256) 35370 op/sec
>> table precomp (ecdh_generator_table_Jac256) 1284.03 op/sec
>> table precomp+keygen+variable base 1056.86 op/sec
>> "ECDH(E) runs in [...] 328926 cycles"
>>
>> In the function that prints "Crypto operations: twisted Edwards a=-1 over GF(2^256-189)"
>> with variable base (ecdh_secret_agreement_Ted256): 17482 op/sec
>> with fixed base (ecdh_keygen_Ted256): 45762.9 op/sec
>> table precomp (ecdh_generator_table_Ted256) 1346.98 op/sec
>> table precomp+keygen+variable base 1195.89 op/sec
>> "ECDH(E) runs in [...] 261385 cycles"
>>
>> memcpy of the 32 bytes: 595968511 op/sec, see attached code
>> ( i.e. memcpy count / crypto_scalarmult_curve25519 count = 38042 )
>> <memcpy-timing.c>


[andrey@ivy libsodium]$
crypto_scalarmult_curve25519_base: 14052.8 op/s
crypto_scalarmult_curve25519: 14072.8 op/s

[andrey@ivy curve25519-donna]$
71 us, 14079.1 op/s (fixed base)
70 us, 14105.4 op/s (variable base)

[andrey@ivy openssl-master]$
  256 bit ecdh (nistp256)   0.0003s   3042.2

[andrey@ivy MSR_ECClib_v1.1]$
Crypto operations: Weierstrass a=-3 over GF(2^256-189)

   ECDH(E) runs in ......................... 366436 cycles
   ECDH with variable base ................. 12602 op/sec
   ECDH with fixed base (keygen) ........... 31787.3 op/sec
   ECDH table precomp ...................... 1148.5 op/sec
   ECDH table precomp+keygen+variablebase .. 1066.89 op/sec

Crypto operations: twisted Edwards a=-1 over GF(2^256-189)

   ECDH(E) runs in ......................... 291183 cycles
   ECDH with one component ................. 15648.9 op/sec
   ECDH with fixed base (keygen) ........... 41296.2 op/sec
   ECDH table precomp ...................... 1243.78 op/sec
   ECDH table precomp+keygen+variablebase .. 1120.32 op/sec
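
As a rough cross-check, the op/sec figures in the MSR ECCLib output can 
be related to the reported cycle counts. The sketch below rests on two 
assumptions that are not taken from the MSR ECCLib documentation: that 
"ECDH(E)" covers one fixed-base keygen plus one variable-base 
agreement, and that the cycle counter ticks at the nominal 3.3 GHz.

```python
NOMINAL_HZ = 3.3e9  # assumption: the cycle counter runs at the nominal clock

# (variable-base op/sec, fixed-base op/sec, reported ECDH(E) cycles),
# copied from the MSR ECCLib terminal output in this message.
runs = {
    "Weierstrass a=-3": (12602, 31787.3, 366436),
    "twisted Edwards a=-1": (15648.9, 41296.2, 291183),
}


def estimated_cycles(variable_ops, fixed_ops, hz=NOMINAL_HZ):
    # Assumption: ECDH(E) = one keygen (fixed base) + one agreement
    # (variable base), so the per-operation times simply add up.
    seconds = 1.0 / variable_ops + 1.0 / fixed_ops
    return seconds * hz


estimates = {}
for name, (var_ops, fixed_ops, reported) in runs.items():
    est = estimated_cycles(var_ops, fixed_ops)
    estimates[name] = est
    diff = 100 * (est - reported) / reported
    print(f"{name}: estimated {est:.0f}, reported {reported}, {diff:+.2f}%")
```

Under these assumptions the estimates land within 1% of the reported 
cycle counts for both curves.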