Re: [Cfrg] Publicly verifiable benchmarks
Andrey Jivsov <crypto@brainhub.org> Sun, 12 October 2014 02:54 UTC
Return-Path: <crypto@brainhub.org>
X-Original-To: cfrg@ietfa.amsl.com
Delivered-To: cfrg@ietfa.amsl.com
Received: from localhost (ietfa.amsl.com [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id CA46F1A0A6B for <cfrg@ietfa.amsl.com>; Sat, 11 Oct 2014 19:54:51 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.301
X-Spam-Level:
X-Spam-Status: No, score=-1.301 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, J_CHICKENPOX_15=0.6, SPF_PASS=-0.001] autolearn=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id maKLxgZYnlZU for <cfrg@ietfa.amsl.com>; Sat, 11 Oct 2014 19:54:49 -0700 (PDT)
Received: from resqmta-po-11v.sys.comcast.net (resqmta-po-11v.sys.comcast.net [IPv6:2001:558:fe16:19:96:114:154:170]) (using TLSv1 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 07D751A19E4 for <cfrg@irtf.org>; Sat, 11 Oct 2014 19:54:48 -0700 (PDT)
Received: from resomta-po-14v.sys.comcast.net ([96.114.154.238]) by resqmta-po-11v.sys.comcast.net with comcast id 22uo1p00158ss0Y012uoZt; Sun, 12 Oct 2014 02:54:48 +0000
Received: from [192.168.1.2] ([71.202.164.227]) by resomta-po-14v.sys.comcast.net with comcast id 22un1p00E4uhcbK012unqH; Sun, 12 Oct 2014 02:54:48 +0000
Message-ID: <5439ED77.3010701@brainhub.org>
Date: Sat, 11 Oct 2014 19:54:47 -0700
From: Andrey Jivsov <crypto@brainhub.org>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.1
MIME-Version: 1.0
To: cfrg@irtf.org
References: <20141010071847.9478.qmail@cr.yp.to>
In-Reply-To: <20141010071847.9478.qmail@cr.yp.to>
Content-Type: text/plain; charset="windows-1252"; format="flowed"
Content-Transfer-Encoding: 7bit
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20140121; t=1413082488; bh=4RXdlie6oXfQjdDs56DIU0xPyGRYdJe6OSAU2dGk6yg=; h=Received:Received:Message-ID:Date:From:MIME-Version:To:Subject: Content-Type; b=FLglQdaTtdlFqqJ8aI/T7drHwPCbw+S1VghHrZz6jWq/8jNv9KL1mH1vcyVBwj+Sg PBMGILA7FfgpnzNM9wQF6ftxWQmRckx13444gBSfA1vxK6gF82Gc1wNOLnNFMzx/0K 4wKtPa9BmdZMoU/892G6aBRB/c+lnpVQPw+IC+qHqIQ4/vMwesAiC5RPwfyOBAQ4ew 1ib/Ugt9eoYbf4ULL+dZg3vp6WxtPIHsXCRHWT1SR1HWvW2khGgWlDZmpu0Wbv/kw3 IaFUsVlLtVp/iipvJRn5ImV4Qpa1zkvTI/56PemNrFed2GdKFk3Q+wVYhgzp/U17EY H9wU+H7fQm02A==
Archived-At: http://mailarchive.ietf.org/arch/msg/cfrg/wTV6zTfn22q-rD78EkJqYUlbbr4
Subject: Re: [Cfrg] Publicly verifiable benchmarks
X-BeenThere: cfrg@irtf.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: Crypto Forum Research Group <cfrg.irtf.org>
List-Unsubscribe: <http://www.irtf.org/mailman/options/cfrg>, <mailto:cfrg-request@irtf.org?subject=unsubscribe>
List-Archive: <http://www.irtf.org/mail-archive/web/cfrg/>
List-Post: <mailto:cfrg@irtf.org>
List-Help: <mailto:cfrg-request@irtf.org?subject=help>
List-Subscribe: <http://www.irtf.org/mailman/listinfo/cfrg>, <mailto:cfrg-request@irtf.org?subject=subscribe>
X-List-Received-Date: Sun, 12 Oct 2014 02:54:52 -0000
On 10/10/2014 12:18 AM, D. J. Bernstein wrote: > Parkinson, Sean writes: >> Making a decision on new elliptic curves based on data that hasn't >> been corroborated by a 3rd party is bad practice. > > More than 1500 implementations of various cryptographic functions have > been contributed to, and are publicly available as part of, the > state-of-the-art SUPERCOP benchmarking framework: > > http://bench.cr.yp.to/supercop.html > > Practically all of the implementations are free to use, and many of them > are in fact widely used. Most of the researchers producing speed records > for cryptography are contributing their software to the same system. > > The eBACS benchmarking project systematically and reproducibly measures > these implementations on more than 20 different microarchitectures: > > http://bench.cr.yp.to/computers.html > > It's easy for other people to download and run the benchmarks on their > favorite CPU, and to contribute the results to the central system, where > detailed speed reports are posted publicly for other people to see and > verify. eBACS has practically eliminated the measurement errors and > disputes that plagued previous approaches to cryptographic benchmarking. > eBASH, the hash component of eBACS, was mentioned 30 times in NIST's > final report on the SHA-3 competition. > > As a concrete example, now that Mike has sent crypto_dh/ed448goldilocks > in for benchmarking, eBACS is automatically filling lines into > > http://bench.cr.yp.to/results-dh.html > http://bench.cr.yp.to/impl-dh/ed448goldilocks.html > > whenever machines finish benchmark runs: 529066 cycles on titan0, 689020 > cycles on hydra8, 757676 cycles on h6sandy, etc. These don't exactly > confirm Mike's comparisons to the Sandy Bridge numbers that Microsoft > claimed in http://eprint.iacr.org/2014/130.pdf---although they do seem > adequate to support his point about ed448goldilocks hitting a sweet spot > on the security/speed curve while Microsoft's design strategy > compromises the security/speed tradeoff: > > * ed448goldilocks isn't quite twice as fast as numsp512t1 > (ed-512-mers): 757676 cycles vs. 1293000 cycles. > > * ed448goldilocks is about 23% slower than numsp384t1 (ed-384-mers): > 757676 cycles vs. 617000 cycles. > > Of course, if Mike or anyone else thinks that ed448goldilocks can be > computed more efficiently, he's welcome to prove it by contributing a > better implementation of that function to SUPERCOP, and then the > benchmarks will be updated appropriately. He can also raise reasonable > questions about the accuracy of Microsoft's claims; if Microsoft's > numbers are actually correct then Microsoft can dispel the skepticism > by contributing their own code to SUPERCOP. > > As a more detailed example of reproducibility, let's look at what the > benchmarks say about X25519 on Haswell. Checking > > http://bench.cr.yp.to/results-dh.html > > we see a median of 145907 cycles (quartiles: 144894 and 147191 cycles) > for the crypto_dh/curve25519 software on an Intel Xeon E3-1275 V3. > Clicking on "titan0" shows more information: the best speeds found for > crypto_scalarmult/curve25519 on this machine used > > gcc-4.8.1 -m64 -O -march=native -mtune=native -fomit-frame-pointer > > to compile the "amd64-51" implementation. Anyone can use the same free > implementation with the same free compiler and will obtain the same > compiled code running in the same number of Haswell cycles: > > wget https://hyperelliptic.org/ebats/supercop-20140924.tar.bz2 > tar -xf supercop-20140924.tar.bz2 > cd supercop-20140924 > # compile and measure everything: nohup sh data-do & > # alternatively, extract X25519 as follows: > mkdir x25519 > cp measure-anything.c x25519 > cp crypto_scalarmult/measure.c x25519 > cp crypto_scalarmult/curve25519/amd64-51/* x25519 > cp include/randombytes.h x25519 > cp cpucycles/amd64cpuinfo.h x25519/cpucycles.h > cp cpucycles/amd64cpuinfo.c x25519/cpucycles.c > cp cpucycles/osfreq.c x25519/osfreq.c > cd x25519 > ( sed s/CRYPTO_/crypto_scalarmult_/ < api.h > echo '#define crypto_scalarmult_IMPLEMENTATION "amd64-51"' > echo '#define crypto_scalarmult_VERSION "-"' > ) > crypto_scalarmult.h > echo 'static const char cpuid[] = {0};' > cpuid.h > gcc -m64 -O -march=native -mtune=native -fomit-frame-pointer \ > -D COMPILER='"gcc"' \ > -D LOOPS=1 \ > -o measure measure-anything.c measure.c cpucycles.c \ > mont* fe*.c *.s > ./measure Thank you for this information. I added timing code and built the "measure" program in the x25519 directory (see above) on Intel(R) Core(TM) i5-3550 CPU @ 3.30GHz, Ivy Bridge (not Haswell). 3 results are in op/sec: above x25519 variable base : openssl : curve25519-donna: 18167.7 : 11030.5 : 14098.2 18167.7/11030.5 = 65% The ./measure reported 182112 cycles for variable base before and after my changes (which looks correct: 3300000000 Hz/182112=18120 sec). Specifically, I timed the crypto_scalarmult(). I won't have access to my Haswell machine for a few days. > > For example, on one core of Andrey's 3.4GHz i7-4770 (Haswell), this > X25519 code will take the same ~146000 cycles: i.e., more than 23000 > operations/second, whereas the latest Haswell-optimized OpenSSL NIST > P-256 ECDH code that he measured was only 15000 operations/second. > > This is, by the way, rather old Curve25519 code optimized for Nehalem, > the microarchitecture of the first Core i7 CPUs in 2008---but on Intel's > latest Haswell CPUs it's still solidly beating NIST P-256 code that's > optimized for Haswell. There's ample literature explaining that > > * reductions mod 2^255-19 are faster than reductions mod > 2^256-2^224+2^192+2^96-1 on a broad range of platforms, and that > > * Montgomery scalarmult is faster than Weierstrass scalarmult, > > so the performance gap is unsurprising. > > Why did Andrey report only 17289 operations/second for X25519 on > Haswell? The answer, in a nutshell, is that there's an active ecosystem > of Curve25519/X25519/Ed25519 implementations, and it's easy to find > implementations that prioritize simplicity over speed---including the > one implementation included in Andrey's manual benchmarks. Of course, > any application developer who needs more speed will look for, and find, > the faster X25519 implementations. > > ---Dan It's natural for a developer, the consumer of these open source libraries, to test-drive the speed tests the way I did. If the sequence of steps needed to run the tests is clear and short, the results are easily verifiable. In case of openssl the results depends on: * the exact version of the source code * the exact version of the compiler and assembler tools * the options given to the config, configure * CPU OpenSSL will autodetect the gcc and build the binary by excluding or including certain highly-optimized pieces without any relevant reports. Some beneficial configuration options are not always the defaults. Not surprisingly, the openssl that comes with the latest OS I test on, Fedora Core 20, is 3+ times slower than what I report above. I find that the quickest way for me to confirm that I am timing the right code is to use the debugger. For example, I confirmed that ./measure is spending time in crypto_scalarmult_curve25519_amd64_51_ladderstep in a hand-written assembler file. http://bench.cr.yp.to looks like an excellent way to try code on platforms one doesn't own.
- [Cfrg] When's the decision? Watson Ladd
- Re: [Cfrg] When's the decision? Yoav Nir
- Re: [Cfrg] When's the decision? Stephen Farrell
- Re: [Cfrg] When's the decision? Watson Ladd
- Re: [Cfrg] When's the decision? David Jacobson
- Re: [Cfrg] When's the decision? Watson Ladd
- Re: [Cfrg] When's the decision? Michael Hamburg
- Re: [Cfrg] When's the decision? David Jacobson
- Re: [Cfrg] When's the decision? D. J. Bernstein
- [Cfrg] Publicly verifiable benchmarks D. J. Bernstein
- Re: [Cfrg] When's the decision? Parkinson, Sean
- Re: [Cfrg] When's the decision? Watson Ladd
- Re: [Cfrg] When's the decision? Parkinson, Sean
- Re: [Cfrg] When's the decision? Mike Hamburg
- Re: [Cfrg] When's the decision? Parkinson, Sean
- Re: [Cfrg] When's the decision? Phillip Hallam-Baker
- Re: [Cfrg] When's the decision? Mike Hamburg
- Re: [Cfrg] When's the decision? Parkinson, Sean
- Re: [Cfrg] Publicly verifiable benchmarks David Jacobson
- Re: [Cfrg] Publicly verifiable benchmarks Michael Hamburg
- Re: [Cfrg] Publicly verifiable benchmarks Andrey Jivsov
- Re: [Cfrg] Publicly verifiable benchmarks Watson Ladd
- Re: [Cfrg] Publicly verifiable benchmarks Parkinson, Sean
- Re: [Cfrg] Publicly verifiable benchmarks D. J. Bernstein
- Re: [Cfrg] Publicly verifiable benchmarks Michael Hamburg
- [Cfrg] Constant-time implementations D. J. Bernstein
- Re: [Cfrg] Constant-time implementations David Jacobson
- Re: [Cfrg] Constant-time implementations Adam Langley
- Re: [Cfrg] Constant-time implementations Yoav Nir
- Re: [Cfrg] Constant-time implementations Watson Ladd
- Re: [Cfrg] Constant-time implementations Mike Hamburg
- Re: [Cfrg] When's the decision? Paterson, Kenny
- Re: [Cfrg] When's the decision? Parkinson, Sean
- Re: [Cfrg] When's the decision? Ilari Liusvaara
- Re: [Cfrg] When's the decision? Yoav Nir
- [Cfrg] ed448goldilocks vs. numsp384t1 and numsp51… D. J. Bernstein
- Re: [Cfrg] ed448goldilocks vs. numsp384t1 and num… Ilari Liusvaara
- Re: [Cfrg] ed448goldilocks vs. numsp384t1 and num… Michael Hamburg
- Re: [Cfrg] ed448goldilocks vs. numsp384t1 and num… Ilari Liusvaara
- Re: [Cfrg] ed448goldilocks vs. numsp384t1 and num… Michael Hamburg