Re: [Cfrg] On relative performance of Edwards v.s. Montgomery Curve25519, variable base

Andrey Jivsov <crypto@brainhub.org> Mon, 05 January 2015 20:22 UTC

Return-Path: <crypto@brainhub.org>
X-Original-To: cfrg@ietfa.amsl.com
Delivered-To: cfrg@ietfa.amsl.com
Received: from localhost (ietfa.amsl.com [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id A30421A8A50 for <cfrg@ietfa.amsl.com>; Mon, 5 Jan 2015 12:22:01 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.301
X-Spam-Level:
X-Spam-Status: No, score=-1.301 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, J_CHICKENPOX_54=0.6, SPF_PASS=-0.001] autolearn=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 1sg34gXfygiR for <cfrg@ietfa.amsl.com>; Mon, 5 Jan 2015 12:21:59 -0800 (PST)
Received: from resqmta-po-02v.sys.comcast.net (resqmta-po-02v.sys.comcast.net [IPv6:2001:558:fe16:19:96:114:154:161]) (using TLSv1.2 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id EBF381A891A for <cfrg@irtf.org>; Mon, 5 Jan 2015 12:07:26 -0800 (PST)
Received: from resomta-po-10v.sys.comcast.net ([96.114.154.234]) by resqmta-po-02v.sys.comcast.net with comcast id cL7S1p00353iAfU01L7Snz; Mon, 05 Jan 2015 20:07:26 +0000
Received: from [IPv6:::1] ([71.202.164.227]) by resomta-po-10v.sys.comcast.net with comcast id cL7Q1p00l4uhcbK01L7RHT; Mon, 05 Jan 2015 20:07:26 +0000
Message-ID: <54AAEEFC.9060309@brainhub.org>
Date: Mon, 05 Jan 2015 12:07:24 -0800
From: Andrey Jivsov <crypto@brainhub.org>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0
MIME-Version: 1.0
To: Mike Hamburg <mike@shiftleft.org>, "cfrg@irtf.org" <cfrg@irtf.org>
References: <54AA4AB9.70505@brainhub.org> <54AA5AD3.9020009@shiftleft.org>
In-Reply-To: <54AA5AD3.9020009@shiftleft.org>
Content-Type: text/plain; charset="windows-1252"; format="flowed"
Content-Transfer-Encoding: 7bit
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20140121; t=1420488446; bh=paFND5hPRAyR4tkZZl6zQHOJ2sR4jUKMyym9PWy0A8s=; h=Received:Received:Message-ID:Date:From:MIME-Version:To:Subject: Content-Type; b=jP6SS4UgbvN7RiOCNnaZMa5gKYXDjI8N4gPZrMVVx/xXnWsSItS20b2PiZTwvHUvc v7ELcPJqra8Fn/cVJatXYdIur8UN1iPoRMc+6qbCbmnVO4KLMQW169UpipWZeebzBA m44FdVxpC+MvrJKXMnwwoMGxF1VTVNqFTcEN80TRgTt9m7HbAjy/noz55z/Zd8crqQ 8OSrRStXrF0wMvcOB2uJEYDZ77EIr8e/SBOt0XFPqnTHzxsTSMcplhcsC4/IpyGtk/ 56XDy1Gkq+VPGUDKMOSafqnRubpqtCIEx7tAnbOgIXzteesXXnHrcTxJ9S0p5TbkWk apfIVhzWGu2oQ==
Archived-At: http://mailarchive.ietf.org/arch/msg/cfrg/PWs7GAogP55-FNbdjzFICxwWavw
Subject: Re: [Cfrg] On relative performance of Edwards v.s. Montgomery Curve25519, variable base
X-BeenThere: cfrg@irtf.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: Crypto Forum Research Group <cfrg.irtf.org>
List-Unsubscribe: <http://www.irtf.org/mailman/options/cfrg>, <mailto:cfrg-request@irtf.org?subject=unsubscribe>
List-Archive: <http://www.irtf.org/mail-archive/web/cfrg/>
List-Post: <mailto:cfrg@irtf.org>
List-Help: <mailto:cfrg-request@irtf.org?subject=help>
List-Subscribe: <http://www.irtf.org/mailman/listinfo/cfrg>, <mailto:cfrg-request@irtf.org?subject=subscribe>
X-List-Received-Date: Mon, 05 Jan 2015 20:22:01 -0000

Thank you, Mike, this is helpful. This brings a few thoughts, below.

On 01/05/2015 01:35 AM, Mike Hamburg wrote:
> Microsoft's benchmarks on Ed-384-mers, and even some of my earlier 
> Ed448 numbers disagree with the Ed448 data above.  My earlier numbers 
> were about 6% less favorable to Montgomery, and Microsoft's numbers 
> were about 10-15% less favorable to Montgomery.  ...
My tests seem to confirm this that Montgomery is 10-15% less favourable 
as well.

>
> So, for larger primes, the Montgomery ladder is probably slightly 
> faster than Edwards when operating on compressed points, and slower or 
> the same otherwise. 

A common assumption is that we need point compression in ECDH, from 
which it follows that Montgomery representation is a wash for p ~ 2^256.

However, for the sake of technical argument let's leave aside the weight 
assigned to "widespread existing practice" and:
* assume that the curve is specified in Edwards form, e.g. as in 
http://www.ietf.org/id/draft-black-rpgecc-01.txt
* assume that the point is sent uncompressed.


This is fine for authenticated keys, e.g. because these will be 
distributed in X.509 certificates, DNS records, etc. Point validation is 
cheaper than decompression in other cases, and this is is how it's 
presently done on the Internet. Optional point compression is also 
well-understood.

* The fixed-base scalar multiplication is very fast, e.g. 15 times 
faster with 5.24 Mb of pre-computed static tables for the 13 bit window.

This is beneficial for protocols or implementation that cannot reuse an 
ephemeral EC key. E.g. a command-line application launched to encrypt 
data, a browser launched with open tabs, etc.

Given that the calculation of a client ECDH share is so fast relatively 
to an ECDH shared secret point calculation, an implementation may as 
well always calculate it fresh. This reduces complexity and fits many 
APIs. This is because there is no need to to maintain a "state" with 
reused ephemeral share.

* The same F(p) and ECC code can be used for signing and ECDH. Also, a 
small d in the Edwards equation is used.

Down the road, something like EdDSA+ECDH, uncompressed points in Edwards 
form, looks like the fastest alternative. Essentially this enables the 
system design with a single ECC primitive that does x*P for same field 
size, which is very fast when P=G, the base point.

* Traditional ECC assumed that one can add points. This may affect 
adoption of the new curve.

This is the case for NIST P-256. It may be desirable to sign with an 
ECDH key, e.g. for proof of key possession, without a special protocol.

* Curves that allow point addition offer space/speed tradeoff option 
through the window size, and other methods.


Unfortunately, the conversion between representations costs ~10% due to 
F(p) inverse. Decompression costs about the same. We also know that the 
performance difference between Montgomery/Edwards operations is in the 
similar range. Therefore, selection of the format on the wire determines 
the implementation. If we are sending a Montgomery x, we are committing 
to Montgomery ladder implementation. Is this justified?