Re: [Cfrg] Publicly verifiable benchmarks

David Jacobson <dmjacobson@sbcglobal.net> Fri, 10 October 2014 15:44 UTC

On 10/10/14, 12:18 AM, D. J. Bernstein wrote:
> Parkinson, Sean writes:
>> Making a decision on new elliptic curves based on data that hasn't
>> been corroborated by a 3rd party is bad practice.
> More than 1500 implementations of various cryptographic functions have
> been contributed to, and are publicly available as part of, the
> state-of-the-art SUPERCOP benchmarking framework:
>
>     http://bench.cr.yp.to/supercop.html
>
> Practically all of the implementations are free to use, and many of them
> are in fact widely used. Most of the researchers producing speed records
> for cryptography are contributing their software to the same system.
>
> The eBACS benchmarking project systematically and reproducibly measures
> these implementations on more than 20 different microarchitectures:
>
>     http://bench.cr.yp.to/computers.html
>
> It's easy for other people to download and run the benchmarks on their
> favorite CPU, and to contribute the results to the central system, where
> detailed speed reports are posted publicly for other people to see and
> verify. eBACS has practically eliminated the measurement errors and
> disputes that plagued previous approaches to cryptographic benchmarking.
> eBASH, the hash component of eBACS, was mentioned 30 times in NIST's
> final report on the SHA-3 competition.
>
> As a concrete example, now that Mike has sent crypto_dh/ed448goldilocks
> in for benchmarking, eBACS is automatically filling lines into
>
>     http://bench.cr.yp.to/results-dh.html
>     http://bench.cr.yp.to/impl-dh/ed448goldilocks.html
>
> whenever machines finish benchmark runs: 529066 cycles on titan0, 689020
> cycles on hydra8, 757676 cycles on h6sandy, etc. These don't exactly
> confirm Mike's comparisons to the Sandy Bridge numbers that Microsoft
> claimed in http://eprint.iacr.org/2014/130.pdf---although they do seem
> adequate to support his point about ed448goldilocks hitting a sweet spot
> on the security/speed curve while Microsoft's design strategy
> compromises the security/speed tradeoff:
>
>     * ed448goldilocks isn't quite twice as fast as numsp512t1
>       (ed-512-mers): 757676 cycles vs. 1293000 cycles.
>
>     * ed448goldilocks is about 23% slower than numsp384t1 (ed-384-mers):
>       757676 cycles vs. 617000 cycles.
>
> Of course, if Mike or anyone else thinks that ed448goldilocks can be
> computed more efficiently, he's welcome to prove it by contributing a
> better implementation of that function to SUPERCOP, and then the
> benchmarks will be updated appropriately. He can also raise reasonable
> questions about the accuracy of Microsoft's claims; if Microsoft's
> numbers are actually correct then Microsoft can dispel the skepticism
> by contributing their own code to SUPERCOP.
>
> As a more detailed example of reproducibility, let's look at what the
> benchmarks say about X25519 on Haswell. Checking
>
>     http://bench.cr.yp.to/results-dh.html
>
> we see a median of 145907 cycles (quartiles: 144894 and 147191 cycles)
> for the crypto_dh/curve25519 software on an Intel Xeon E3-1275 V3.
> Clicking on "titan0" shows more information: the best speeds found for
> crypto_scalarmult/curve25519 on this machine used
>
>     gcc-4.8.1 -m64 -O -march=native -mtune=native -fomit-frame-pointer
>
> to compile the "amd64-51" implementation. Anyone can use the same free
> implementation with the same free compiler and will obtain the same
> compiled code running in the same number of Haswell cycles:
>
>     wget https://hyperelliptic.org/ebats/supercop-20140924.tar.bz2
>     tar -xf supercop-20140924.tar.bz2
>     cd supercop-20140924
>     # compile and measure everything: nohup sh data-do &
>     # alternatively, extract X25519 as follows:
>     mkdir x25519
>     cp measure-anything.c x25519
>     cp crypto_scalarmult/measure.c x25519
>     cp crypto_scalarmult/curve25519/amd64-51/* x25519
>     cp include/randombytes.h x25519
>     cp cpucycles/amd64cpuinfo.h x25519/cpucycles.h
>     cp cpucycles/amd64cpuinfo.c x25519/cpucycles.c
>     cp cpucycles/osfreq.c x25519/osfreq.c
>     cd x25519
>     ( sed s/CRYPTO_/crypto_scalarmult_/ < api.h
>       echo '#define crypto_scalarmult_IMPLEMENTATION "amd64-51"'
>       echo '#define crypto_scalarmult_VERSION "-"'
>     ) > crypto_scalarmult.h
>     echo 'static const char cpuid[] = {0};' > cpuid.h
>     gcc -m64 -O -march=native -mtune=native -fomit-frame-pointer \
>     -D COMPILER='"gcc"' \
>     -D LOOPS=1 \
>     -o measure measure-anything.c measure.c cpucycles.c \
>     mont* fe*.c *.s
>     ./measure
>
> For example, on one core of Andrey's 3.4GHz i7-4770 (Haswell), this
> X25519 code will take the same ~146000 cycles: i.e., more than 23000
> operations/second, whereas the latest Haswell-optimized OpenSSL NIST
> P-256 ECDH code that he measured was only 15000 operations/second.
>
> This is, by the way, rather old Curve25519 code optimized for Nehalem,
> the microarchitecture of the first Core i7 CPUs in 2008---but on Intel's
> latest Haswell CPUs it's still solidly beating NIST P-256 code that's
> optimized for Haswell. There's ample literature explaining that
>
>     * reductions mod 2^255-19 are faster than reductions mod
>       2^256-2^224+2^192+2^96-1 on a broad range of platforms, and that
>       
>     * Montgomery scalarmult is faster than Weierstrass scalarmult,
>
> so the performance gap is unsurprising.
>
> Why did Andrey report only 17289 operations/second for X25519 on
> Haswell? The answer, in a nutshell, is that there's an active ecosystem
> of Curve25519/X25519/Ed25519 implementations, and it's easy to find
> implementations that prioritize simplicity over speed---including the
> one implementation included in Andrey's manual benchmarks. Of course,
> any application developer who needs more speed will look for, and find,
> the faster X25519 implementations.
>
> ---Dan
>
You have amassed a lot of data.  Wow!  But this seems to be concerned
with raw speed.  It would be nice if you tagged implementations (of
algorithms where it matters) according to leakage resistance.
There could be one tag for implementations optimized solely for speed,
another for implementations that are timing-leak resistant (either
constant time or blinded), etc.
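
As a rough illustration of the distinction such tags would draw, here is
a minimal C sketch.  The enum and the function names are hypothetical;
nothing like them exists in SUPERCOP or eBACS as far as this message is
concerned.  The two comparison routines return the same answers, but the
first exits at the first mismatch, so its running time depends on the
data, while the second always does the same amount of work at the source
level (a compiler could in principle still change that, so real code
would need its output checked).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical leakage tags; illustrative names only, not part of
   SUPERCOP or eBACS. */
enum leak_tag {
    LEAK_TAG_SPEED_ONLY,     /* optimized solely for speed */
    LEAK_TAG_CONSTANT_TIME,  /* no secret-dependent branches or indices */
    LEAK_TAG_BLINDED         /* secret-dependent timing masked by randomization */
};

/* Would be tagged LEAK_TAG_SPEED_ONLY: returns at the first mismatch,
   so the running time reveals the length of the matching prefix. */
int leaky_equal(const uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}

/* Would be tagged LEAK_TAG_CONSTANT_TIME: always reads all n bytes and
   never branches on their values.  Returns 1 if equal, 0 otherwise. */
int ct_equal(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint8_t diff = 0;
    for (size_t i = 0; i < n; i++)
        diff |= (uint8_t)(a[i] ^ b[i]);
    /* diff is in 0..255; (diff - 1) >> 8 has bit 0 set only if diff == 0 */
    return (int)((((uint32_t)diff - 1u) >> 8) & 1u);
}
```

A benchmark page could then report speed per tag, so that a reader who
needs timing-leak resistance compares only implementations in the
second or third class.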

    --David Jacobson