Re: [Cfrg] On relative performance of Edwards v.s. Montgomery Curve25519, variable base

Michael Hamburg <> Mon, 05 January 2015 21:45 UTC


> On Jan 5, 2015, at 12:07 PM, Andrey Jivsov <> wrote:
> Thank you, Mike, this is helpful. This brings a few thoughts, below.

You’re welcome.

> On 01/05/2015 01:35 AM, Mike Hamburg wrote:
>> Microsoft's benchmarks on Ed-384-mers, and even some of my earlier Ed448 numbers disagree with the Ed448 data above.  My earlier numbers were about 6% less favorable to Montgomery, and Microsoft's numbers were about 10-15% less favorable to Montgomery.  ...
> My tests seem to confirm this: Montgomery is 10-15% less favourable as well.

But it looks to me like you compared a constant-time Montgomery implementation to a variable-time Edwards one.
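To make the comparison concrete, here is a minimal Python sketch of a constant-time-style Montgomery ladder for Curve25519 (illustrative only: Python integers are not actually constant time, and a real implementation would use fixed-width field arithmetic, but the branch-free conditional swap and fixed operation sequence are the structural point):

```python
# Minimal x-only Montgomery ladder for Curve25519 (illustrative sketch).
P = 2**255 - 19
A24 = 121665  # (486662 - 2) / 4 for Curve25519

def cswap(swap, a, b):
    # Branch-free conditional swap; swap is 0 or 1.
    mask = -swap                 # 0, or -1 (all ones)
    dummy = mask & (a ^ b)
    return a ^ dummy, b ^ dummy

def ladder(k, u):
    x1 = u % P
    x2, z2 = 1, 0                # point at infinity
    x3, z3 = x1, 1
    swap = 0
    for t in reversed(range(255)):
        bit = (k >> t) & 1
        swap ^= bit
        x2, x3 = cswap(swap, x2, x3)
        z2, z3 = cswap(swap, z2, z3)
        swap = bit
        # Combined double-and-add step in projective x-only coordinates.
        a = (x2 + z2) % P; aa = a * a % P
        b = (x2 - z2) % P; bb = b * b % P
        e = (aa - bb) % P
        c = (x3 + z3) % P; d = (x3 - z3) % P
        da = d * a % P; cb = c * b % P
        x3 = (da + cb) % P; x3 = x3 * x3 % P
        z3 = (da - cb) % P; z3 = z3 * z3 % P * x1 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    x2, x3 = cswap(swap, x2, x3)
    z2, z3 = cswap(swap, z2, z3)
    return x2 * pow(z2, P - 2, P) % P   # single field inversion at the end
```

Note the single inversion at the end: that, plus the fixed per-bit work, is what the Montgomery timings above are measuring.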

>> So, for larger primes, the Montgomery ladder is probably slightly faster than Edwards when operating on compressed points, and slower or the same otherwise. 
> A common assumption is that we need point compression in ECDH, from which it follows that Montgomery representation is a wash for p ~ 2^256.
> However, for the sake of technical argument let's leave aside the weight assigned to "widespread existing practice" and:
> * assume that the curve is specified in Edwards form, e.g. as in
> * assume that the point is sent uncompressed.
> This is fine for authenticated keys, e.g. because these will be distributed in X.509 certificates, DNS records, etc. Point validation is cheaper than decompression in other cases, and this is how it's presently done on the Internet. Optional point compression is also well-understood.

Sure, this is an OK option.  I prefer always-compressed for reasons which have come up before, but optional compression is reasonable.

> * The fixed-base scalar multiplication is very fast, e.g. 15 times faster with 5.24 MB of pre-computed static tables for a 13-bit window.

Most designers don’t do this, because it’s vulnerable to timing attacks and can have caching problems (important for anything other than benchmarks).  More common table sizes are somewhere between 3 KiB and 20 KiB, with constant-time table lookups, yielding about a 3x performance improvement instead of 15x.
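The constant-time lookup here is typically done by scanning the whole table and selecting with masks, never indexing or branching on the secret. A toy Python sketch of the idiom (illustrative only: entries are plain integers standing in for precomputed points, and Python itself gives no timing guarantees):

```python
# Constant-time-style table lookup: touch every entry and combine with
# a mask derived arithmetically from the index comparison.

def ct_select(table, secret_index):
    result = 0
    for i, entry in enumerate(table):
        diff = i ^ secret_index
        # mask is -1 (all ones) exactly when diff == 0, else 0;
        # the borrow trick below assumes indices below 2**64.
        mask = (diff - 1) >> 64
        result |= entry & mask
    return result
```

Since every entry is read on every lookup, the cost grows with the table, which is one reason the practical sweet spot stays at a few KiB rather than megabytes.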

> This is beneficial for protocols or implementations that cannot reuse an ephemeral EC key, e.g. a command-line application launched to encrypt data, a browser launched with open tabs, etc.
> Given that the calculation of a client ECDH share is so fast relative to an ECDH shared-secret point calculation, an implementation may as well always calculate it fresh. This reduces complexity and fits many APIs, because there is no need to maintain a "state" with a reused ephemeral share.
> * The same F(p) and ECC code can be used for signing and ECDH. Also, a small d in the Edwards equation is used.

Implementations which compress points and/or use the ladder will share F(p) code.  The small d in the Edwards equation can be used in either the isomorphic curve (as Robert Ransom and I found) or the isogenous one (a la Microsoft).  If an implementor wants to use the Edwards code for Montgomery-x ECDH as well, it costs no more than with any other point compression scheme.
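For concreteness, the conversion that lets shared F(p) code serve both forms for the 25519 pair is just the standard birational map from the Edwards y-coordinate to the Montgomery u-coordinate, u = (1 + y) / (1 - y). A small Python sketch:

```python
# Standard map from edwards25519's y-coordinate to Curve25519's
# Montgomery u-coordinate: u = (1 + y) / (1 - y) mod p.
P = 2**255 - 19

def edwards_y_to_montgomery_u(y):
    return (1 + y) * pow(1 - y, P - 2, P) % P  # one field inversion
```

The Edwards base point has y = 4/5, which this map sends to u = 9, the Curve25519 base point; the one inversion here is the ~10% conversion cost discussed below.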

> Down the road, something like EdDSA+ECDH with uncompressed points in Edwards form looks like the fastest alternative. Essentially this enables a system design with a single ECC primitive that computes x*P for the same field size, which is very fast when P=G, the base point.

Faster than compressed, yes.  Better, though… maybe?  Speaking of faster, most verifiers will use a double-scalar multiply because it’s faster, and they usually have different code for signing/ECDH and for verification for side-channel reasons.
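To illustrate the double-scalar multiply: verification computes something of the shape a*P + b*Q, and interleaving the two scalars shares a single chain of doublings (Shamir's trick). A toy Python sketch, with plain integers standing in for curve points so the interleaving is visible; everything a verifier touches is public, which is exactly why these data-dependent branches are acceptable on the verify side but not on the signing/ECDH side:

```python
# Joint double-scalar multiplication (Shamir's trick), shown over a toy
# additive group: integers stand in for points, '+' for point addition.

def double_scalar_mul(a, P, b, Q):
    PQ = P + Q                      # precompute P + Q once
    R = 0                           # identity element
    for t in reversed(range(max(a.bit_length(), b.bit_length()))):
        R = R + R                   # one shared doubling per bit position
        bits = ((a >> t) & 1, (b >> t) & 1)
        if bits == (1, 0):
            R = R + P
        elif bits == (0, 1):
            R = R + Q
        elif bits == (1, 1):
            R = R + PQ              # a single addition covers both scalars
    return R
```

Two separate scalar multiplies would cost two full doubling chains; the joint form costs one, which is where the verification speedup comes from.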

> * Traditional ECC assumed that one can add points. This may affect adoption of the new curve.
> This is the case for NIST P-256. It may be desirable to sign with an ECDH key, e.g. for proof of key possession, without a special protocol.
> * Curves that allow point addition offer space/speed tradeoff option through the window size, and other methods.
> Unfortunately, the conversion between representations costs ~10% due to an F(p) inverse. Decompression costs about the same. We also know that the performance difference between Montgomery/Edwards operations is in a similar range. Therefore, the selection of the format on the wire determines the implementation. If we are sending a Montgomery x, we are committing to a Montgomery ladder implementation. Is this justified?

Exactly what features are possible or desirable in a point encoding — and at what cost in complexity and generality — is an interesting question.

The point format in the experimental Goldilocks software (and in Ed252) allows adding points, because it transmits sign information.  The same format is used for laddered ECDH and for signatures, and it is suitable for both Montgomery-ladder and Edwards windowed implementations.  Furthermore, it reduces the curve’s cofactor to 2 (see also my post on reducing the cofactor to 1), and it allows the Montgomery ladder to reject twist points.  It costs about the same as an ordinary point compression/decompression system, but it is slightly more complex and not batchable for either compression or decompression.

Others have also advocated a simpler format with similar properties, such as (Montgomery x, low bit of Edwards x).  This doesn’t reduce the cofactor, but it’s easier to describe and it allows compression to be batched.
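A Python sketch of that simpler format for the 25519 curve pair: transmit the Montgomery u plus one parity bit of the Edwards x, and decompress with one square root. The constants below are the standard Curve25519/edwards25519 ones; this is an illustration of the idea, not any particular library's wire format:

```python
# (Montgomery u, low bit of Edwards x) point format, illustrative sketch.
P = 2**255 - 19
D = (-121665 * pow(121666, P - 2, P)) % P  # twisted Edwards d
SQRT_M1 = pow(2, (P - 1) // 4, P)          # sqrt(-1) mod p

def sqrt_mod_p(a):
    # p = 5 (mod 8): candidate root a^((p+3)/8), fixed up by sqrt(-1).
    r = pow(a, (P + 3) // 8, P)
    if r * r % P != a % P:
        r = r * SQRT_M1 % P
    if r * r % P != a % P:
        raise ValueError("not a square")
    return r

def encode(x, y):
    u = (1 + y) * pow(1 - y, P - 2, P) % P   # Montgomery u from Edwards y
    return u, x & 1

def decode(u, sign):
    y = (u - 1) * pow(u + 1, P - 2, P) % P   # invert the map above
    # Solve -x^2 + y^2 = 1 + d x^2 y^2 for x^2, then take a square root.
    xx = (y * y - 1) * pow((D * y * y + 1) % P, P - 2, P) % P
    x = sqrt_mod_p(xx)
    if x & 1 != sign:
        x = P - x
    return x, y
```

Decompression here is one square root plus inversions, i.e. about the cost of an ordinary compressed format, and since only the u-side needs a field division on encode, compression can be batched with a shared inversion.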

— Mike