Re: [Cfrg] On relative performance of Edwards v.s. Montgomery Curve25519, variable base

Michael Hamburg <mike@shiftleft.org> Tue, 06 January 2015 00:04 UTC

From: Michael Hamburg <mike@shiftleft.org>
In-Reply-To: <54AB194A.6020104@brainhub.org>
Date: Mon, 05 Jan 2015 16:04:42 -0800
Message-Id: <322BA3AD-7F5F-4A77-ADBA-7E5260DC690A@shiftleft.org>
References: <54AA4AB9.70505@brainhub.org> <54AA5AD3.9020009@shiftleft.org> <54AAEEFC.9060309@brainhub.org> <EBF3EC55-3057-496A-8BAE-7EAD405518A7@shiftleft.org> <54AB194A.6020104@brainhub.org>
To: Andrey Jivsov <crypto@brainhub.org>
Archived-At: http://mailarchive.ietf.org/arch/msg/cfrg/6hObRkYmpelc4XjHhOg3GKuaGdo
Cc: "cfrg@irtf.org" <cfrg@irtf.org>
Subject: Re: [Cfrg] On relative performance of Edwards v.s. Montgomery Curve25519, variable base
Precedence: list

> On Jan 5, 2015, at 3:07 PM, Andrey Jivsov <crypto@brainhub.org> wrote:
> The method uses window exponentiation-style scalar multiplication with doubling for each bit. The data-dependent decisions are done for additions, which are additional ~ 50% of these. The fix may lower the performance gap, but I think it will be within these 15%.

The difference is about 15% in my code.

> The openssl does this for in ECDSA P-256 code with w=7. My comments are regarding the generalization of this code for w=13, which means you are doing 20 ECC additions for the entire P-256 fixed-base scalar multiplication. At this rate the memory access is what slows you down, and things like unneeded inverse in ECDSA v.s. EdDSA.

2^13 >> 2^7.
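
To put rough numbers on that gap, here is a back-of-the-envelope sketch. Everything in it is an assumption made for illustration rather than a description of any particular library: one precomputed table per window position, unsigned digits (2^w entries per table), and 64-byte affine P-256 points. The name comb_cost is hypothetical.

```python
import math

SCALAR_BITS = 256   # P-256 scalar length in bits
ENTRY_BYTES = 64    # assumption: affine (x, y), 32 bytes per coordinate

def comb_cost(w):
    """Return (additions, entries per table, total table bytes) for window width w.

    Assumes one table per window position and unsigned digits;
    signed digits would halve the number of entries per table.
    """
    additions = math.ceil(SCALAR_BITS / w)   # one addition per window
    entries = 2 ** w                         # candidate multiples per window
    total_bytes = additions * entries * ENTRY_BYTES
    return additions, entries, total_bytes

for w in (7, 13):
    additions, entries, total = comb_cost(w)
    print(f"w={w}: {additions} additions, {entries} entries/table, {total / 1024:.0f} KiB")
```

Under these assumptions w=13 needs 20 additions, matching the figure quoted above, but each table has 64 times as many entries as with w=7.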

> It may not be necessary to have side-channel protected code that generates one-time ECDH pair, otherwise, one can add cache line SC protection.

Partial SC protection can be very tricky to analyze.  I don’t know what you mean by “cache line SC protection”.  A cache line may not even hold 2^13 bits.

The same fixed-base code is likely to be used for both ECDH ephemeral keygen, and for signing.  This sort of leak in signing code could be disastrous.
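
For comparison, one common full (not partial) countermeasure for a secret-indexed table lookup is to read every entry and pick the wanted one with a mask. The sketch below only illustrates that access pattern: Python integers are not constant-time, the name ct_select is hypothetical, and entries are assumed to be non-negative integers (for example, packed coordinates) with indices below 2^63.

```python
def ct_select(table, secret_index):
    """Return table[secret_index] while touching every entry of the table.

    Illustrates the access pattern only; Python itself gives no
    constant-time guarantees.
    """
    result = 0
    for i, entry in enumerate(table):
        diff = i ^ secret_index
        # is_equal is 1 when diff == 0 and 0 otherwise, computed without a branch
        is_equal = ((diff - 1) >> 63) & 1
        mask = -is_equal            # -1 (all ones) on a match, 0 otherwise
        result |= entry & mask
    return result

table = [i * i + 7 for i in range(128)]
print(ct_select(table, 42) == table[42])
```

The cost of such a scan grows linearly with the table size, so it is far cheaper for a 2^7-entry table than for a 2^13-entry one.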

> However, "free" assumes that you use compression. Removing compression gives you 10% performance gain, if you don't need to do conversions to isogenous curves.
> 
> Is it worth to subject the X.509 cert verifiers to the 10% penalty for 32 byte saving?

Maybe.  We’re talking about roughly a 50 µs difference on a 1 GHz Cortex-A9 phone.  You can send 32 bytes in 50 µs over a 5 Mbit/s LTE link, but I don’t know which will use more power.  On a laptop, it’s about 5 µs (on one core), which equates to a 50 Mbit/s link.  Obviously not all points will be sent over slow links, but the point is that the actual performance penalty is on the same order as the space savings.
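
The link-speed equivalence above is just the 32 saved bytes divided by the extra decompression time. A small sketch of that arithmetic, using the rough 50 µs and 5 µs figures from the paragraph (the function name is hypothetical):

```python
POINT_SAVING_BITS = 32 * 8   # compression saves 32 bytes per point

def break_even_mbit_per_s(extra_time_us):
    """Link speed at which sending 32 extra bytes takes extra_time_us microseconds."""
    # bits per microsecond is numerically the same as megabits per second
    return POINT_SAVING_BITS / extra_time_us

print(break_even_mbit_per_s(50))   # phone estimate
print(break_even_mbit_per_s(5))    # laptop estimate
```

On links slower than the break-even speed, sending the 32 extra bytes takes longer than decompressing; on faster links it takes less.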

Also, converting to an isogenous curve will be done in projective form in practice, so it’s “free”.

>>> Down the road, something like EdDSA+ECDH, uncompressed points in Edwards form, looks like the fastest alternative. Essentially this enables the system design with a single ECC primitive that does x*P for same field size, which is very fast when P=G, the base point.
>> Faster than compressed, yes.  Better, though… maybe?  Speaking of faster, most verifies will use double-scalar multiply because it’s faster, and they usually have different code between signing/ECDH and verification for side-channel reasons.
> "Batching" is a good thing. When x * basepoint is just a few additions, the main benefit of doing double scalar multiply is probably due to "simultaneous inverse" (single conversion to affine v.s. twice). TLS 1.3 enables additional "batching" between ECDH and signature. These cases can be handled by a single x*P primitive with a n-point export function.

I don’t mean batching, I mean Straus’ algorithm (aka Shamir’s trick) for linear combinations, as used within a single ECDSA verify.
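
As a minimal sketch of that technique: Straus' algorithm computes a*P + b*Q with a single shared chain of doublings and at most one addition per bit, using a precomputed P+Q. The code below is generic over the group operations. For the demonstration it uses the multiplicative group modulo a prime as a toy stand-in for curve points (so "add" is modular multiplication), and all names are hypothetical. It branches on the scalar bits, so it suits verification with public inputs, not signing or ECDH.

```python
def straus_double_mul(a, P, b, Q, add, double, identity):
    """Compute a*P + b*Q by interleaving both scalars over one doubling chain."""
    table = {(1, 0): P, (0, 1): Q, (1, 1): add(P, Q)}
    acc = identity
    for i in reversed(range(max(a.bit_length(), b.bit_length()))):
        acc = double(acc)
        bits = ((a >> i) & 1, (b >> i) & 1)
        if bits != (0, 0):   # data-dependent branch: public inputs only
            acc = add(acc, table[bits])
    return acc

# Toy group standing in for an elliptic curve: integers modulo a prime
# under multiplication, so k*P corresponds to pow(P, k, P_MOD).
P_MOD = 2**61 - 1

def toy_add(x, y):
    return x * y % P_MOD

def toy_double(x):
    return x * x % P_MOD

a, b, P, Q = 0x1234567, 0xABCDEF01, 3, 5
result = straus_double_mul(a, P, b, Q, toy_add, toy_double, 1)
print(result == pow(P, a, P_MOD) * pow(Q, b, P_MOD) % P_MOD)
```

Compared with two separate scalar multiplications, the doublings are paid for once, which is where the speedup for verification comes from.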

Cheers,
— Mike
