Re: [Cfrg] Publicly verifiable benchmarks

"Parkinson, Sean" <sean.parkinson@rsa.com> Sun, 12 October 2014 22:58 UTC

From: "Parkinson, Sean" <sean.parkinson@rsa.com>
To: "djb@cr.yp.to" <djb@cr.yp.to>
Date: Sun, 12 Oct 2014 18:58:07 -0400
Thread-Topic: [Cfrg] Publicly verifiable benchmarks
Thread-Index: Ac/l01yrAL/KVl2rTruiGBepwBK1qgAnE5+A
Message-ID: <2FBC676C3BBFBB4AA82945763B361DE60A76B077@MX17A.corp.emc.com>
References: <20141010071847.9478.qmail@cr.yp.to> <5439ED77.3010701@brainhub.org> <CACsn0cnHJmydwsf5i9tHjvgawHN4fmQ8NwXaJMRgLEnEkcthEA@mail.gmail.com>
In-Reply-To: <CACsn0cnHJmydwsf5i9tHjvgawHN4fmQ8NwXaJMRgLEnEkcthEA@mail.gmail.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/cfrg/xoGKG6T8HUXgLB4Tyfi6OguQRbs
Cc: "cfrg@irtf.org" <cfrg@irtf.org>
Subject: Re: [Cfrg] Publicly verifiable benchmarks

Is there a REST API to the benchmark data?

Sean
--
Sean Parkinson | Consultant Software Engineer | RSA, The Security Division of EMC
Office +61 7 3032 5232 | Fax +61 7 3032 5299
www.rsa.com


-----Original Message-----
From: Cfrg [mailto:cfrg-bounces@irtf.org] On Behalf Of Watson Ladd
Sent: Sunday, 12 October 2014 2:16 PM
To: Andrey Jivsov
Cc: cfrg@irtf.org
Subject: Re: [Cfrg] Publicly verifiable benchmarks

On Sat, Oct 11, 2014 at 7:54 PM, Andrey Jivsov <crypto@brainhub.org> wrote:
> On 10/10/2014 12:18 AM, D. J. Bernstein wrote:
>>
>> Parkinson, Sean writes:
>>>
>>> Making a decision on new elliptic curves based on data that hasn't 
>>> been corroborated by a 3rd party is bad practice.
>>
>>
>> More than 1500 implementations of various cryptographic functions 
>> have been contributed to, and are publicly available as part of, the 
>> state-of-the-art SUPERCOP benchmarking framework:
>>
>>     http://bench.cr.yp.to/supercop.html
>>
>> Practically all of the implementations are free to use, and many of 
>> them are in fact widely used. Most of the researchers producing speed 
>> records for cryptography are contributing their software to the same system.
>>
>> The eBACS benchmarking project systematically and reproducibly 
>> measures these implementations on more than 20 different microarchitectures:
>>
>>     http://bench.cr.yp.to/computers.html
>>
>> It's easy for other people to download and run the benchmarks on 
>> their favorite CPU, and to contribute the results to the central 
>> system, where detailed speed reports are posted publicly for other 
>> people to see and verify. eBACS has practically eliminated the 
>> measurement errors and disputes that plagued previous approaches to cryptographic benchmarking.
>> eBASH, the hash component of eBACS, was mentioned 30 times in NIST's 
>> final report on the SHA-3 competition.
>>
>> As a concrete example, now that Mike has sent 
>> crypto_dh/ed448goldilocks in for benchmarking, eBACS is automatically 
>> filling lines into
>>
>>     http://bench.cr.yp.to/results-dh.html
>>     http://bench.cr.yp.to/impl-dh/ed448goldilocks.html
>>
>> whenever machines finish benchmark runs: 529066 cycles on titan0, 
>> 689020 cycles on hydra8, 757676 cycles on h6sandy, etc. These don't 
>> exactly confirm Mike's comparisons to the Sandy Bridge numbers that 
>> Microsoft claimed in http://eprint.iacr.org/2014/130.pdf---although 
>> they do seem adequate to support his point about ed448goldilocks 
>> hitting a sweet spot on the security/speed curve while Microsoft's 
>> design strategy compromises the security/speed tradeoff:
>>
>>     * ed448goldilocks isn't quite twice as fast as numsp512t1
>>       (ed-512-mers): 757676 cycles vs. 1293000 cycles.
>>
>>     * ed448goldilocks is about 23% slower than numsp384t1 (ed-384-mers):
>>       757676 cycles vs. 617000 cycles.
>>
>> Of course, if Mike or anyone else thinks that ed448goldilocks can be 
>> computed more efficiently, he's welcome to prove it by contributing a 
>> better implementation of that function to SUPERCOP, and then the 
>> benchmarks will be updated appropriately. He can also raise 
>> reasonable questions about the accuracy of Microsoft's claims; if 
>> Microsoft's numbers are actually correct then Microsoft can dispel 
>> the skepticism by contributing their own code to SUPERCOP.
>>
>> As a more detailed example of reproducibility, let's look at what the 
>> benchmarks say about X25519 on Haswell. Checking
>>
>>     http://bench.cr.yp.to/results-dh.html
>>
>> we see a median of 145907 cycles (quartiles: 144894 and 147191 
>> cycles) for the crypto_dh/curve25519 software on an Intel Xeon E3-1275 V3.
>> Clicking on "titan0" shows more information: the best speeds found 
>> for crypto_scalarmult/curve25519 on this machine used
>>
>>     gcc-4.8.1 -m64 -O -march=native -mtune=native -fomit-frame-pointer
>>
>> to compile the "amd64-51" implementation. Anyone can use the same 
>> free implementation with the same free compiler and will obtain the 
>> same compiled code running in the same number of Haswell cycles:
>>
>>     wget https://hyperelliptic.org/ebats/supercop-20140924.tar.bz2
>>     tar -xf supercop-20140924.tar.bz2
>>     cd supercop-20140924
>>     # compile and measure everything: nohup sh data-do &
>>     # alternatively, extract X25519 as follows:
>>     mkdir x25519
>>     cp measure-anything.c x25519
>>     cp crypto_scalarmult/measure.c x25519
>>     cp crypto_scalarmult/curve25519/amd64-51/* x25519
>>     cp include/randombytes.h x25519
>>     cp cpucycles/amd64cpuinfo.h x25519/cpucycles.h
>>     cp cpucycles/amd64cpuinfo.c x25519/cpucycles.c
>>     cp cpucycles/osfreq.c x25519/osfreq.c
>>     cd x25519
>>     ( sed s/CRYPTO_/crypto_scalarmult_/ < api.h
>>       echo '#define crypto_scalarmult_IMPLEMENTATION "amd64-51"'
>>       echo '#define crypto_scalarmult_VERSION "-"'
>>     ) > crypto_scalarmult.h
>>     echo 'static const char cpuid[] = {0};' > cpuid.h
>>     gcc -m64 -O -march=native -mtune=native -fomit-frame-pointer \
>>     -D COMPILER='"gcc"' \
>>     -D LOOPS=1 \
>>     -o measure measure-anything.c measure.c cpucycles.c \
>>     mont* fe*.c *.s
>>     ./measure
>
>
>
> Thank you for this information. I added timing code and built the "measure"
> program in the x25519 directory (see above) on
>
>    Intel(R) Core(TM) i5-3550 CPU @ 3.30GHz, Ivy Bridge (not Haswell).
>
> 3 results are in op/sec:
>
> above x25519 variable base : openssl : curve25519-donna:
>
> 18167.7 : 11030.5 : 14098.2
>
> 18167.7/11030.5 = 1.65, i.e. the x25519 code above is about 65% faster than openssl.
>
> The ./measure reported 182112 cycles for variable base before and 
> after my changes (which looks correct: 3300000000 Hz / 182112 cycles = 
> 18120 op/sec). Specifically, I timed crypto_scalarmult().
>
> I won't have access to my Haswell machine for a few days.
>
>>
>> For example, on one core of Andrey's 3.4GHz i7-4770 (Haswell), this
>> X25519 code will take the same ~146000 cycles: i.e., more than 23000 
>> operations/second, whereas the latest Haswell-optimized OpenSSL NIST
>> P-256 ECDH code that he measured was only 15000 operations/second.
>>
>> This is, by the way, rather old Curve25519 code optimized for 
>> Nehalem, the microarchitecture of the first Core i7 CPUs in 
>> 2008---but on Intel's latest Haswell CPUs it's still solidly beating 
>> NIST P-256 code that's optimized for Haswell. There's ample 
>> literature explaining that
>>
>>     * reductions mod 2^255-19 are faster than reductions mod
>>       2^256-2^224+2^192+2^96-1 on a broad range of platforms, and that
>>
>>     * Montgomery scalarmult is faster than Weierstrass scalarmult,
>>
>> so the performance gap is unsurprising.
>>
>> Why did Andrey report only 17289 operations/second for X25519 on 
>> Haswell? The answer, in a nutshell, is that there's an active 
>> ecosystem of Curve25519/X25519/Ed25519 implementations, and it's easy 
>> to find implementations that prioritize simplicity over 
>> speed---including the one implementation included in Andrey's manual 
>> benchmarks. Of course, any application developer who needs more speed 
>> will look for, and find, the faster X25519 implementations.
>>
>> ---Dan
>
>
> It's natural for a developer, the consumer of these open source 
> libraries, to test-drive the speed tests the way I did. If the 
> sequence of steps needed to run the tests is clear and short, the results are easily verifiable.
>
> In the case of openssl, the results depend on:
> * the exact version of the source code
> * the exact version of the compiler and assembler tools
> * the options given to config/Configure
> * the CPU
>
> OpenSSL will autodetect gcc and build the binary by excluding or 
> including certain highly-optimized pieces without reporting which 
> ones. Some beneficial configuration options are not always the 
> defaults. Not surprisingly, the openssl that comes with the latest OS 
> I test on, Fedora Core 20, is 3+ times slower than what I report above.
>
> I find that the quickest way for me to confirm that I am timing the 
> right code is to use the debugger. For example, I confirmed that 
> ./measure is spending time in 
> crypto_scalarmult_curve25519_amd64_51_ladderstep in a hand-written assembler file.
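
The cycle counts and op/sec figures quoted above are related by a single division: operations per second is the clock rate divided by cycles per operation. A minimal shell sketch of that check follows. The helper name ops_per_sec is made up for illustration, and it assumes the CPU runs at its nominal clock (no turbo or frequency scaling) and that one core does all the work.

```shell
# ops_per_sec HZ CYCLES
# Print whole operations per second for a clock rate in Hz and a
# cost in cycles per operation (the fraction is truncated).
ops_per_sec() {
    awk -v hz="$1" -v cycles="$2" 'BEGIN { printf "%d\n", int(hz / cycles) }'
}

ops_per_sec 3300000000 182112   # 3.3 GHz Ivy Bridge, 182112 cycles
ops_per_sec 3400000000 146000   # 3.4 GHz Haswell, ~146000 cycles
```

The first call reproduces the 18120 op/sec figure, and the second gives the "more than 23000" estimate for the 3.4GHz Haswell.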

Take a look at http://bench.cr.yp.to/impl-scalarmult/curve25519.html.
There are multiple hand-written assembly implementations, with differing performance characteristics and portability. In particular, AMD and Intel chips favor different implementations. Furthermore, the same code can be compiled in multiple ways by varying the compiler options.
eBATS handles most of this transparently.

Most applications will not need to go to these extremes. But for those that do, customized assembler per processor version and automated compiler tickling are not uncommon. Measurement techniques that don't account for this will not produce realistic results. Asking about minimax (the implementation whose slowest speed across platforms is the fastest), or about maximal performance from pure C, is reasonable, as these are common industrial constraints. But I think there will be very few cases where public key crypto speed is the limiting factor.
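
To make the minimax question concrete, here is a rough sketch of how one might pick such an implementation from a table of measurements. It assumes input lines of the form "implementation machine cycles"; that layout, the function name minimax, and the sample names and numbers are all invented for illustration and are not the eBACS data format or real results.

```shell
# minimax: read "implementation machine cycles" lines on stdin and print
# the implementation whose worst (largest) cycle count is the smallest,
# followed by that worst-case count.
minimax() {
    awk '
        # Record the largest cycle count seen for each implementation.
        NF >= 3 && (!($1 in worst) || $3 + 0 > worst[$1]) { worst[$1] = $3 + 0 }
        END {
            best = ""
            # Choose the implementation with the smallest worst case.
            for (impl in worst)
                if (best == "" || worst[impl] < worst[best])
                    best = impl
            if (best != "")
                print best, worst[best]
        }
    '
}

# Hypothetical measurements: names and cycle counts are made up.
printf '%s\n' \
    'impl_a intel_box 150000' \
    'impl_a amd_box 260000' \
    'impl_b intel_box 180000' \
    'impl_b amd_box 200000' | minimax
```

With this made-up input, impl_a is the fastest on one machine, but impl_b is the minimax choice because its slowest result (200000) beats impl_a's slowest (260000). If two implementations tie, which one is printed is unspecified.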

Sincerely,
Watson

>
> http://bench.cr.yp.to looks like an excellent way to try code on 
> platforms one doesn't own.
>
>
> _______________________________________________
> Cfrg mailing list
> Cfrg@irtf.org
> http://www.irtf.org/mailman/listinfo/cfrg



--
"Those who would give up Essential Liberty to purchase a little Temporary Safety deserve neither  Liberty nor Safety."
-- Benjamin Franklin
