Re: [Cfrg] On relative performance of Edwards v.s. Montgomery Curve25519, variable base

Andrey Jivsov <crypto@brainhub.org> Tue, 06 January 2015 06:03 UTC

Message-ID: <54AB7AB9.10604@brainhub.org>
Date: Mon, 05 Jan 2015 22:03:37 -0800
From: Andrey Jivsov <crypto@brainhub.org>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0
MIME-Version: 1.0
To: Michael Hamburg <mike@shiftleft.org>
References: <54AA4AB9.70505@brainhub.org> <54AA5AD3.9020009@shiftleft.org> <54AAEEFC.9060309@brainhub.org> <EBF3EC55-3057-496A-8BAE-7EAD405518A7@shiftleft.org> <54AB194A.6020104@brainhub.org> <322BA3AD-7F5F-4A77-ADBA-7E5260DC690A@shiftleft.org>
In-Reply-To: <322BA3AD-7F5F-4A77-ADBA-7E5260DC690A@shiftleft.org>
Content-Type: text/plain; charset="windows-1252"; format="flowed"
Content-Transfer-Encoding: 8bit
Archived-At: http://mailarchive.ietf.org/arch/msg/cfrg/HiThyeTV1-jmBZCmPn-IXDvZslg
Cc: "cfrg@irtf.org" <cfrg@irtf.org>
Subject: Re: [Cfrg] On relative performance of Edwards v.s. Montgomery Curve25519, variable base
Precedence: list

On 01/05/2015 04:04 PM, Michael Hamburg wrote:
>
>> On Jan 5, 2015, at 3:07 PM, Andrey Jivsov <crypto@brainhub.org
>> <mailto:crypto@brainhub.org>> wrote:
>> The method uses window exponentiation-style scalar multiplication with
>> doubling for each bit. The data-dependent decisions are done for
>> additions, which are additional ~ 50% of these. The fix may lower the
>> performance gap, but I think it will be within these 15%.
>
> The difference is about 15% in my code.
>
>> The openssl does this for in ECDSA P-256 code with w=7. My comments
>> are regarding the generalization of this code for w=13, which means
>> you are doing 20 ECC additions for the entire P-256 fixed-base scalar
>> multiplication. At this rate the memory access is what slows you down,
>> and things like unneeded inverse in ECDSA v.s. EdDSA.
>
> 2^13 >> 2^7.

Understood, but diverse implementations would be free to choose any the 
most suitable value of w for the same format on the wire.

Contrast this with Montgomery coordinate on the wire, which can be 
viewed as an implementation with fixed w=1 (but fewer F(p) per w).

>
>> It may not be necessary to have side-channel protected code that
>> generates one-time ECDH pair, otherwise, one can add cache line SC
>> protection.
>
> Partial SC protection can be very tricky to analyze.  I don’t know what
> you mean by “cache line SC protection”.  A cache line may not even hold
> 2^13 bits.

As a side note, I think that only 2^w / (256/w) need be protected.  W.g. 
for w=13 there are 20 sub-tables Ti, where Ti is always accesses in step 
i=[0,19]. The access within each Ti may need side channel protection, if 
this fits a threat model. These Tis are public, read-only tables that 
can be shared between processes and can reside in ROM.

>
> The same fixed-base code is likely to be used for both ECDH ephemeral
> keygen, and for signing.  This sort of leak in signing code could be
> disastrous.

Shouldn't it be possible to get a value T[i] from a public table T with 
a private i in a secure fashion?

In the worst case an implementation can use Montgomery ladder internally 
if this is the most suitable option.

>
>> However, "free" assumes that you use compression. Removing compression
>> gives you 10% performance gain, if you don't need to do conversions to
>> isogenous curves.
>>
>> Is it worth to subject the X.509 cert verifiers to the 10% penalty for
>> 32 byte saving?
>
> Maybe.  We’re talking maybe a 50µs difference on a 1GHz Cortex-A9 phone.
>   While you can send 32 bytes in 50µs over a 5Mbit LTE link, I don’t
> know which will use more power.  On a laptop, it’s about 5µs (on one
> core), which equates to a 50Mbit link.  Obviously not all points will be
> sent over slow links, but the point is that the actual performance
> penalty is on the same order as the space savings.
>
> Also, converting to an isogenous curve will be done in projective form
> in practice, so it’s “free”.

Thinking more about it, here is an interesting observation (probably 
that's what you meant).

Let's say that the wire coordinates are twisted Edwards (x,y).

The implementation that wants to use the t.Edwards curve will use (x,y) 
directly, validating the point as needed first.

An implementation that wants to use Montgomery ladder will do

   X = -(y+1)
   Z =   y-1

and proceed with the currently written X25519 code, other than it will 
use initial Z=y-1 v.s. Z=1. An interesting fact is that the (x,y) can be 
compressed as (y) and this will work without decompression penalty with 
a Montgomery ladder.

This is significant because doing it as in Adam's draft doesn't seem to 
avoid 10% penalty in some cases, specifically, if an implementation uses 
Edwards formula I think one needs to obtain Y for Montgomery X first, 
which involves a square root. This makes sense because Edwards 
calculation needs both (x,y), while Montgomery formula -- only X (and, 
even better, X happen to depend only on y).

This is largely from 
http://www.ietf.org/mail-archive/web/cfrg/current/msg05113.html, which 
relies on http://eprint.iacr.org/2008/013.pdf .

This means that the format on the wire is

   compressed:   t.Edwards (y)
   uncompressed: t.Edwards (x,y)

and the conversion to Montgomery is indeed free.

...

[Cfrg] On relative performance of Edwards v.s. Mo… Andrey Jivsov
Re: [Cfrg] On relative performance of Edwards v.s… Mike Hamburg
Re: [Cfrg] On relative performance of Edwards v.s… Andrey Jivsov
Re: [Cfrg] On relative performance of Edwards v.s… Michael Hamburg
Re: [Cfrg] On relative performance of Edwards v.s… Andrey Jivsov
Re: [Cfrg] On relative performance of Edwards v.s… Michael Hamburg
Re: [Cfrg] On relative performance of Edwards v.s… Michael Hamburg
Re: [Cfrg] On relative performance of Edwards v.s… Watson Ladd
Re: [Cfrg] On relative performance of Edwards v.s… Andrey Jivsov
Re: [Cfrg] On relative performance of Edwards v.s… Andrey Jivsov
Re: [Cfrg] On relative performance of Edwards v.s… Mike Hamburg
Re: [Cfrg] On relative performance of Edwards v.s… Andrey Jivsov
Re: [Cfrg] On relative performance of Edwards v.s… Peter Dettman
Re: [Cfrg] On relative performance of Edwards v.s… Michael Hamburg
Re: [Cfrg] On relative performance of Edwards v.s… Andrey Jivsov
Re: [Cfrg] On relative performance of Edwards v.s… Michael Hamburg
Re: [Cfrg] On relative performance of Edwards v.s… Watson Ladd
Re: [Cfrg] On relative performance of Edwards v.s… Michael Hamburg
Re: [Cfrg] On relative performance of Edwards v.s… Watson Ladd
Re: [Cfrg] On relative performance of Edwards v.s… Kurt Roeckx
Re: [Cfrg] On relative performance of Edwards v.s… Andrey Jivsov
Re: [Cfrg] On relative performance of Edwards v.s… Watson Ladd
Re: [Cfrg] On relative performance of Edwards v.s… Andrey Jivsov
Re: [Cfrg] On relative performance of Edwards v.s… Watson Ladd
Re: [Cfrg] On relative performance of Edwards v.s… Andrey Jivsov
Re: [Cfrg] On relative performance of Edwards v.s… Watson Ladd
Re: [Cfrg] On relative performance of Edwards v.s… Andrey Jivsov