Re: [Cfrg] On relative performance of Edwards v.s. Montgomery Curve25519, variable base

Michael Hamburg <> Tue, 06 January 2015 00:04 UTC

Return-Path: <>
Received: from localhost ( []) by (Postfix) with ESMTP id 6373E1A9092 for <>; Mon, 5 Jan 2015 16:04:47 -0800 (PST)
X-Virus-Scanned: amavisd-new at
X-Spam-Flag: NO
X-Spam-Score: 2.156
X-Spam-Level: **
X-Spam-Status: No, score=2.156 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FH_HOST_EQ_D_D_D_D=0.765, FH_HOST_EQ_D_D_D_DB=0.888, HELO_MISMATCH_ORG=0.611, HOST_MISMATCH_NET=0.311, HTML_MESSAGE=0.001, J_CHICKENPOX_54=0.6, RDNS_DYNAMIC=0.982, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001] autolearn=no
Received: from ([]) by localhost ( []) (amavisd-new, port 10024) with ESMTP id 9gKA8QPXUdmT for <>; Mon, 5 Jan 2015 16:04:45 -0800 (PST)
Received: from ( []) (using TLSv1.1 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by (Postfix) with ESMTPS id 88DCE1A9096 for <>; Mon, 5 Jan 2015 16:04:45 -0800 (PST)
Received: from [] (unknown []) by (Postfix) with ESMTPSA id 918EE3AA43; Mon, 5 Jan 2015 16:02:13 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple;; s=sldo; t=1420502533; bh=mUIVUU/rZ9h7/DNVwYv+XjBrcn3s5uYgDy+d8LpxiLk=; h=Subject:From:In-Reply-To:Date:Cc:References:To:From; b=Jbm4UxQhPTs726dmQfjWGfJRRjvn4zUB+gdKta8NAnt/wyenVZBaez/5cwCFRCT37 TIjrFw8YX2sLslJiWLieCovle6MbCwnsRn1L/LOI9ftefTTvCb2dwi2OuAmbey4MC1 droe5C09IljvxADr6L8WUoKc7Hc477mD0bN1sbEw=
Content-Type: multipart/alternative; boundary="Apple-Mail=_71F50B2B-A9DB-4D5D-BDB9-9F4BB68084CD"
Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2064\))
From: Michael Hamburg <>
In-Reply-To: <>
Date: Mon, 05 Jan 2015 16:04:42 -0800
Message-Id: <>
References: <> <> <> <> <>
To: Andrey Jivsov <>
X-Mailer: Apple Mail (2.2064)
Cc: "" <>
Subject: Re: [Cfrg] On relative performance of Edwards v.s. Montgomery Curve25519, variable base
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: Crypto Forum Research Group <>
List-Unsubscribe: <>, <>
List-Archive: <>
List-Post: <>
List-Help: <>
List-Subscribe: <>, <>
X-List-Received-Date: Tue, 06 Jan 2015 00:04:47 -0000

> On Jan 5, 2015, at 3:07 PM, Andrey Jivsov <> wrote:
> The method uses window exponentiation-style scalar multiplication with doubling for each bit. The data-dependent decisions are done for additions, which are additional ~ 50% of these. The fix may lower the performance gap, but I think it will be within these 15%.

The difference is about 15% in my code.

> The openssl does this for in ECDSA P-256 code with w=7. My comments are regarding the generalization of this code for w=13, which means you are doing 20 ECC additions for the entire P-256 fixed-base scalar multiplication. At this rate the memory access is what slows you down, and things like unneeded inverse in ECDSA v.s. EdDSA.

2^13 >> 2^7.

> It may not be necessary to have side-channel protected code that generates one-time ECDH pair, otherwise, one can add cache line SC protection.

Partial SC protection can be very tricky to analyze.  I don’t know what you mean by “cache line SC protection”.  A cache line may not even hold 2^13 bits.

The same fixed-base code is likely to be used for both ECDH ephemeral keygen, and for signing.  This sort of leak in signing code could be disastrous.

> However, "free" assumes that you use compression. Removing compression gives you 10% performance gain, if you don't need to do conversions to isogenous curves.
> Is it worth to subject the X.509 cert verifiers to the 10% penalty for 32 byte saving?

Maybe.  We’re talking maybe a 50µs difference on a 1GHz Cortex-A9 phone.  While you can send 32 bytes in 50µs over a 5Mbit LTE link, I don’t know which will use more power.  On a laptop, it’s about 5µs (on one core), which equates to a 50Mbit link.  Obviously not all points will be sent over slow links, but the point is that the actual performance penalty is on the same order as the space savings.

Also, converting to an isogenous curve will be done in projective form in practice, so it’s “free”.

>>> Down the road, something like EdDSA+ECDH, uncompressed points in Edwards form, looks like the fastest alternative. Essentially this enables the system design with a single ECC primitive that does x*P for same field size, which is very fast when P=G, the base point.
>> Faster than compressed, yes.  Better, though… maybe?  Speaking of faster, most verifies will use double-scalar multiply because it’s faster, and they usually have different code between signing/ECDH and verification for side-channel reasons.
> "Batching" is a good thing. When x * basepoint is just a few additions, the main benefit of doing double scalar multiply is probably due to "simultaneous inverse" (single conversion to affine v.s. twice). TLS 1.3 enables additional "batching" between ECDH and signature. These cases can be handled by a single x*P primitive with a n-point export function.

I don’t mean batching, I mean Straus’ algorithm (aka Shamir’s trick) for linear combinations, as used within a single ECDSA verify.

— Mike