Re: [Cfrg] On relative performance of Edwards v.s. Montgomery Curve25519, variable base

Mike Hamburg <mike@shiftleft.org> Mon, 05 January 2015 09:35 UTC

Return-Path: <mike@shiftleft.org>
X-Original-To: cfrg@ietfa.amsl.com
Delivered-To: cfrg@ietfa.amsl.com
Received: from localhost (ietfa.amsl.com [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 4859D1A212D for <cfrg@ietfa.amsl.com>; Mon, 5 Jan 2015 01:35:19 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: 4.255
X-Spam-Level: ****
X-Spam-Status: No, score=4.255 tagged_above=-999 required=5 tests=[BAYES_50=0.8, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FH_HOST_EQ_D_D_D_D=0.765, FH_HOST_EQ_D_D_D_DB=0.888, HELO_MISMATCH_ORG=0.611, HOST_MISMATCH_NET=0.311, RDNS_DYNAMIC=0.982, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001] autolearn=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Uxs3e7YHhUNn for <cfrg@ietfa.amsl.com>; Mon, 5 Jan 2015 01:35:17 -0800 (PST)
Received: from aspartame.shiftleft.org (199-116-74-168-v301.PUBLIC.monkeybrains.net [199.116.74.168]) (using TLSv1.1 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 904381A1F20 for <cfrg@irtf.org>; Mon, 5 Jan 2015 01:35:17 -0800 (PST)
Received: from [192.168.1.102] (unknown [192.168.1.1]) by aspartame.shiftleft.org (Postfix) with ESMTPSA id 016F83AA43; Mon, 5 Jan 2015 01:32:47 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=shiftleft.org; s=sldo; t=1420450368; bh=cLZqOTIKlOurWrnUUo68tfCdzUf/65a4ooMmNCBWtEE=; h=Date:From:To:Subject:References:In-Reply-To:From; b=iiuRjnA+hjZcFQloNqJA1xKvOqHpQpWpXYmwQFOTEdjFMNOLsyou1qc6+DXMlkwFD bvn6F4Nl1Of8czYE2uS7zG/61X+aihwn4XQFryiHqDESp5a+z1hFxDRBdTecCwLlaD 1QgYLkRTu3etgdlLBklyWQyNOmvr/if7VB1uIYOY=
Message-ID: <54AA5AD3.9020009@shiftleft.org>
Date: Mon, 05 Jan 2015 01:35:15 -0800
From: Mike Hamburg <mike@shiftleft.org>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0
MIME-Version: 1.0
To: Andrey Jivsov <crypto@brainhub.org>, "cfrg@irtf.org" <cfrg@irtf.org>
References: <54AA4AB9.70505@brainhub.org>
In-Reply-To: <54AA4AB9.70505@brainhub.org>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Archived-At: http://mailarchive.ietf.org/arch/msg/cfrg/UnyRsKYkqM95DOrVYLxQ4AafXVs
Subject: Re: [Cfrg] On relative performance of Edwards v.s. Montgomery Curve25519, variable base
X-BeenThere: cfrg@irtf.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: Crypto Forum Research Group <cfrg.irtf.org>
List-Unsubscribe: <http://www.irtf.org/mailman/options/cfrg>, <mailto:cfrg-request@irtf.org?subject=unsubscribe>
List-Archive: <http://www.irtf.org/mail-archive/web/cfrg/>
List-Post: <mailto:cfrg@irtf.org>
List-Help: <mailto:cfrg-request@irtf.org?subject=help>
List-Subscribe: <http://www.irtf.org/mailman/listinfo/cfrg>, <mailto:cfrg-request@irtf.org?subject=subscribe>
X-List-Received-Date: Mon, 05 Jan 2015 09:35:19 -0000

On 01/05/2015 12:26 AM, Andrey Jivsov wrote:
>
> I was expecting that a Montgomery ladder will be faster. Which brings 
> the question how much faster should we expect Montgomery ladder to be 
> for p ~= 2^256 x86 or other architecture? I don't see it on x86 in my 
> quick tests.
>
> Also, I saw a remark in 
> https://www.ietf.org/mail-archive/web/cfrg/current/msg05712.html "At 
> the 128-bit security level the ladder can be faster. At the > 200-bit+ 
> security levels the ladder is slower."
Hi Andrey,

I don't know about Donna, but I have fairly comprehensive benchmarks on 
a 252-bit curve (Ed252-MontgomeryStation) and the Goldilocks curves.

Ed252 runs a shared secret computation in 133k cycles on my 3.6GHz 
i7-4790 machine (TB off, HT off).  The microbenchmark breaks this down 
as 120k cycles of constant-time Montgomery laddering, about 13k cycles 
compression, and about 1300 cycles unaccounted.

Constant-time windowed Edwards would replace the ladder with 126k 
cycles.  Variable-time wNAF Edwards takes 112k cycles. So variable-time 
Edwards is 7% faster, and constant-time Edwards is 5% slower, than the 
Montgomery ladder, not counting the final compression.  But the 
Montgomery ladder takes compressed points as input.  Decompression takes 
13.4k  cycles to Edwards, or about 11% of the ladder.  So if you're 
operating on compressed points throughout, the Montgomery ladder is 
about 16% faster (and the whole ECDH about 14-15% faster) than 
constant-time Edwards.


For Ed448, it's a little closer.  The Montgomery ladder and compression 
take 529kcy, and the Edwards consttime window and compression (but not 
decompression) take 540kcy.  The compression function itself is about 
48kcy for either, and decompression is 49kcy.  So with no decompression, 
Montgomery is 2% faster, and with decompression it's 11% faster.

However, replace that Edwards with a fixed-window with cache-vulnerable 
indexing, and it goes down to 503kcy.  Replace it with wNAF and it goes 
down to 490kcy.  At that point it's still slower than the Montgomery 
ladder on compressed points, but only by 2%.

This ratio should be affected by the S/M/m ratio of the field 
arithmetic, the speed of constant-time lookup and condswap, and the size 
of the scalars.  Montgomery does more S,m, and Edwards does more M, 
except that the decompression is also S heavy.  As the curve gets 
larger, the windows can get larger which favors Edwards.  But Edwards 
also burns some 7-8% of its time on constant-time lookups on Ed448, and 
that's with AVX2 to speed them up.

Microsoft's benchmarks on Ed-384-mers, and even some of my earlier Ed448 
numbers disagree with the Ed448 data above.  My earlier numbers were 
about 6% less favorable to Montgomery, and Microsoft's numbers were 
about 10-15% less favorable to Montgomery.  I don't know why my latest 
tests are more favorable to Montgomery than my earlier ones; it may be 
machine or compiler-dependent, or it may have been caused by source 
changes since then.  I tried to hash out the discrepancy between my 
numbers and Microsoft's with Patrick Longa Pierola, but that's as close 
as we got.

So, for larger primes, the Montgomery ladder is probably slightly faster 
than Edwards when operating on compressed points, and slower or the same 
otherwise.


Cheers,
-- Mike