Re: [Cfrg] On relative performance of Edwards v.s. Montgomery Curve25519, variable base

Peter Dettman <peter.dettman@bouncycastle.org> Mon, 12 January 2015 05:26 UTC

Return-Path: <peter.dettman@bouncycastle.org>
X-Original-To: cfrg@ietfa.amsl.com
Delivered-To: cfrg@ietfa.amsl.com
Received: from localhost (ietfa.amsl.com [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id BC78C1A88A7 for <cfrg@ietfa.amsl.com>; Sun, 11 Jan 2015 21:26:11 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -0.202
X-Spam-Level:
X-Spam-Status: No, score=-0.202 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, HELO_EQ_AU=0.377, HOST_EQ_AU=0.327, RCVD_IN_DNSWL_NONE=-0.0001, RELAY_IS_203=0.994] autolearn=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 7Z99gDaOieto for <cfrg@ietfa.amsl.com>; Sun, 11 Jan 2015 21:26:09 -0800 (PST)
Received: from tauceti.org.au (mail.tauceti.org.au [203.32.61.25]) by ietfa.amsl.com (Postfix) with ESMTP id EADAC1A8895 for <cfrg@irtf.org>; Sun, 11 Jan 2015 21:26:08 -0800 (PST)
X-Default-Received-SPF: pass (skip=loggedin (res=PASS)) x-ip-name=ppp-49-237-36-181.revip6.asianet.co.th;
Message-ID: <54B35ADE.8020209@bouncycastle.org>
Date: Mon, 12 Jan 2015 12:25:50 +0700
From: Peter Dettman <peter.dettman@bouncycastle.org>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0) Gecko/20100101 Thunderbird/31.3.0
MIME-Version: 1.0
To: cfrg@irtf.org
References: <54AA4AB9.70505@brainhub.org> <54B315CA.6040900@brainhub.org>
In-Reply-To: <54B315CA.6040900@brainhub.org>
Content-Type: text/plain; charset="windows-1252"; format="flowed"
Content-Transfer-Encoding: 7bit
X-Authenticated-User: peter.dettman@bouncycastle.org
Archived-At: <http://mailarchive.ietf.org/arch/msg/cfrg/6rMNRTs67fqn8k68fosr-329nUQ>
Subject: Re: [Cfrg] On relative performance of Edwards v.s. Montgomery Curve25519, variable base
X-BeenThere: cfrg@irtf.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: Crypto Forum Research Group <cfrg.irtf.org>
List-Unsubscribe: <http://www.irtf.org/mailman/options/cfrg>, <mailto:cfrg-request@irtf.org?subject=unsubscribe>
List-Archive: <http://www.irtf.org/mail-archive/web/cfrg/>
List-Post: <mailto:cfrg@irtf.org>
List-Help: <mailto:cfrg-request@irtf.org?subject=help>
List-Subscribe: <http://www.irtf.org/mailman/listinfo/cfrg>, <mailto:cfrg-request@irtf.org?subject=subscribe>
X-List-Received-Date: Mon, 12 Jan 2015 05:26:11 -0000

On 12/01/2015 7:31 am, Andrey Jivsov wrote:
> -----------------------------------------------------------------
> "Jacobian coordinates with a4=0 for short Weierstrass curves", window 
> method
> -----------------------------------------------------------------
> http://hyperelliptic.org/EFD/g1p/auto-shortw-jacobian-0.html
>
> doubling: 2M+5S
> addition:  11M+5S
>
> 2^(w-2)*(11*M+5*S) + 2^(w-2)*(2*M+5*S) + 
> w*(ceil(n/w)-1)*(2*M+5*S)+(ceil(n/w)-1)*(11*M+5*S) + 11*M + 245*S
>
> w=5: 1165*M + 1825*S
>
> for S=M cost
> 2990*M
>
> for S=0.625M cost
> 2305*M
[NOTE: I realise GLV curves are not on the table in the current curve 
selection process; this is just a response to the presented numbers, not 
advocacy]

The state-of-the-art precomputation for Weierstrass window method uses 
co-z arithmetic at a cost of ~(5M+2S)/point as set out in 
https://eprint.iacr.org/2008/051. In that paper it's shown how to 
convert the table to affine with a single inversion and total cost of 
~(9M+2S)/point + 1I, observing that the full Montgomery trick is not 
needed because the Z coordinates form a multiplicative chain already. 
However, based on the same observation, it is straightforward to bring 
all precomputed points to a common Z coordinate, avoid the inversion, 
and proceed to calculate on an isomorphic curve where those points are 
affine (by dropping the Z coordinate and setting a4' = Z^4.a4). A single 
multiplication is needed to bring the result point back to the original 
curve.

For a4=0, the isomorphism doesn't affect the formulae, so in the 
windowing, doubling costs (2M+5S), (mixed) addition costs (7M+4S). Also, 
a4=0 is chosen to allow use of a fast endomorphism, so numbers should 
really reflect its usage. It reduces the doublings by half, and (for 
secp256k1) adds an extra 1M per precomputed point (and doubles storage, 
but not lookup costs). Splitting the scalar costs a few multiplications.

So, slightly more roughly than above:
2^(w-1)*(10M + 2S) + (n/2)*(2M + 5S) + (n/w)*(7M + 4S) + (15M+255S) 
<--[Actual cost of inversion in a secp256k1 implementation]

n=256,w=5: 795*M + 1135*S (S=M: 1930*M, S=0.625M: 1505*M).

Regards,
Pete Dettman