Re: [Cfrg] On relative performance of Edwards v.s. Montgomery Curve25519, variable base

Andrey Jivsov <crypto@brainhub.org> Tue, 06 January 2015 06:03 UTC

Return-Path: <crypto@brainhub.org>
X-Original-To: cfrg@ietfa.amsl.com
Delivered-To: cfrg@ietfa.amsl.com
Received: from localhost (ietfa.amsl.com [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 4FD561A1B86 for <cfrg@ietfa.amsl.com>; Mon, 5 Jan 2015 22:03:42 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.901
X-Spam-Level:
X-Spam-Status: No, score=-1.901 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, SPF_PASS=-0.001] autolearn=ham
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 5PTfYkJMCK4L for <cfrg@ietfa.amsl.com>; Mon, 5 Jan 2015 22:03:39 -0800 (PST)
Received: from resqmta-ch2-11v.sys.comcast.net (resqmta-ch2-11v.sys.comcast.net [IPv6:2001:558:fe21:29:69:252:207:43]) (using TLSv1.2 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 36B311A90F6 for <cfrg@irtf.org>; Mon, 5 Jan 2015 22:03:38 -0800 (PST)
Received: from resomta-ch2-15v.sys.comcast.net ([69.252.207.111]) by resqmta-ch2-11v.sys.comcast.net with comcast id cW3Z1p0052Qkjl901W3e3P; Tue, 06 Jan 2015 06:03:38 +0000
Received: from [192.168.1.2] ([71.202.164.227]) by resomta-ch2-15v.sys.comcast.net with comcast id cW3d1p0074uhcbK01W3dvU; Tue, 06 Jan 2015 06:03:38 +0000
Message-ID: <54AB7AB9.10604@brainhub.org>
Date: Mon, 05 Jan 2015 22:03:37 -0800
From: Andrey Jivsov <crypto@brainhub.org>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0
MIME-Version: 1.0
To: Michael Hamburg <mike@shiftleft.org>
References: <54AA4AB9.70505@brainhub.org> <54AA5AD3.9020009@shiftleft.org> <54AAEEFC.9060309@brainhub.org> <EBF3EC55-3057-496A-8BAE-7EAD405518A7@shiftleft.org> <54AB194A.6020104@brainhub.org> <322BA3AD-7F5F-4A77-ADBA-7E5260DC690A@shiftleft.org>
In-Reply-To: <322BA3AD-7F5F-4A77-ADBA-7E5260DC690A@shiftleft.org>
Content-Type: text/plain; charset="windows-1252"; format="flowed"
Content-Transfer-Encoding: 8bit
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20140121; t=1420524218; bh=Au0K+wWv5xcDimh902PIyusd/lX53Q2FyaBx2Rfu2vU=; h=Received:Received:Message-ID:Date:From:MIME-Version:To:Subject: Content-Type; b=CXOu4Q6LA5Xi+Ohb/yrzeETa5QFHYtR9KPikGkA7YtV+KLP7f15Ca5kp0ei4Z+IAE 6wjOU0bkkPMYsH8PAZ54SJYwAVkVBcn3Q1YOc/G5QkvCdDfZ/x58sAato+5GAVndHq FKSh5T+I9H5ZoyFoz0Nx9HjiZCoaI0F2abBA0MsdzysmrLe7f9PTv1JP8ASMWd28vp VVgIqDtSlh7lVS0BQz4ruBv/aOuddYAXDx0HOmVtNlJAoCxEzvIPj895K37zvi/mBF JDiwaf3ysjX1CADFZYCgzNKY9nIVUu6QiBDf+ptF3k9XcduqKW7hsrRiy2b4QLl+vs QN4z3cycIhyJQ==
Archived-At: http://mailarchive.ietf.org/arch/msg/cfrg/HiThyeTV1-jmBZCmPn-IXDvZslg
Cc: "cfrg@irtf.org" <cfrg@irtf.org>
Subject: Re: [Cfrg] On relative performance of Edwards v.s. Montgomery Curve25519, variable base
X-BeenThere: cfrg@irtf.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: Crypto Forum Research Group <cfrg.irtf.org>
List-Unsubscribe: <http://www.irtf.org/mailman/options/cfrg>, <mailto:cfrg-request@irtf.org?subject=unsubscribe>
List-Archive: <http://www.irtf.org/mail-archive/web/cfrg/>
List-Post: <mailto:cfrg@irtf.org>
List-Help: <mailto:cfrg-request@irtf.org?subject=help>
List-Subscribe: <http://www.irtf.org/mailman/listinfo/cfrg>, <mailto:cfrg-request@irtf.org?subject=subscribe>
X-List-Received-Date: Tue, 06 Jan 2015 06:03:42 -0000

On 01/05/2015 04:04 PM, Michael Hamburg wrote:
>
>> On Jan 5, 2015, at 3:07 PM, Andrey Jivsov <crypto@brainhub.org
>> <mailto:crypto@brainhub.org>> wrote:
>> The method uses window exponentiation-style scalar multiplication with
>> doubling for each bit. The data-dependent decisions are done for
>> additions, which are additional ~ 50% of these. The fix may lower the
>> performance gap, but I think it will be within these 15%.
>
> The difference is about 15% in my code.
>
>> The openssl does this for in ECDSA P-256 code with w=7. My comments
>> are regarding the generalization of this code for w=13, which means
>> you are doing 20 ECC additions for the entire P-256 fixed-base scalar
>> multiplication. At this rate the memory access is what slows you down,
>> and things like unneeded inverse in ECDSA v.s. EdDSA.
>
> 2^13 >> 2^7.

Understood, but diverse implementations would be free to choose any the 
most suitable value of w for the same format on the wire.

Contrast this with Montgomery coordinate on the wire, which can be 
viewed as an implementation with fixed w=1 (but fewer F(p) per w).

>
>> It may not be necessary to have side-channel protected code that
>> generates one-time ECDH pair, otherwise, one can add cache line SC
>> protection.
>
> Partial SC protection can be very tricky to analyze.  I don’t know what
> you mean by “cache line SC protection”.  A cache line may not even hold
> 2^13 bits.

As a side note, I think that only 2^w / (256/w) need be protected.  W.g. 
for w=13 there are 20 sub-tables Ti, where Ti is always accesses in step 
i=[0,19]. The access within each Ti may need side channel protection, if 
this fits a threat model. These Tis are public, read-only tables that 
can be shared between processes and can reside in ROM.

>
> The same fixed-base code is likely to be used for both ECDH ephemeral
> keygen, and for signing.  This sort of leak in signing code could be
> disastrous.

Shouldn't it be possible to get a value T[i] from a public table T with 
a private i in a secure fashion?

In the worst case an implementation can use Montgomery ladder internally 
if this is the most suitable option.

>
>> However, "free" assumes that you use compression. Removing compression
>> gives you 10% performance gain, if you don't need to do conversions to
>> isogenous curves.
>>
>> Is it worth to subject the X.509 cert verifiers to the 10% penalty for
>> 32 byte saving?
>
> Maybe.  We’re talking maybe a 50µs difference on a 1GHz Cortex-A9 phone.
>   While you can send 32 bytes in 50µs over a 5Mbit LTE link, I don’t
> know which will use more power.  On a laptop, it’s about 5µs (on one
> core), which equates to a 50Mbit link.  Obviously not all points will be
> sent over slow links, but the point is that the actual performance
> penalty is on the same order as the space savings.
>
> Also, converting to an isogenous curve will be done in projective form
> in practice, so it’s “free”.

Thinking more about it, here is an interesting observation (probably 
that's what you meant).

Let's say that the wire coordinates are twisted Edwards (x,y).

The implementation that wants to use the t.Edwards curve will use (x,y) 
directly, validating the point as needed first.

An implementation that wants to use Montgomery ladder will do

   X = -(y+1)
   Z =   y-1

and proceed with the currently written X25519 code, other than it will 
use initial Z=y-1 v.s. Z=1. An interesting fact is that the (x,y) can be 
compressed as (y) and this will work without decompression penalty with 
a Montgomery ladder.

This is significant because doing it as in Adam's draft doesn't seem to 
avoid 10% penalty in some cases, specifically, if an implementation uses 
Edwards formula I think one needs to obtain Y for Montgomery X first, 
which involves a square root. This makes sense because Edwards 
calculation needs both (x,y), while Montgomery formula -- only X (and, 
even better, X happen to depend only on y).

This is largely from 
http://www.ietf.org/mail-archive/web/cfrg/current/msg05113.html, which 
relies on http://eprint.iacr.org/2008/013.pdf .

This means that the format on the wire is

   compressed:   t.Edwards (y)
   uncompressed: t.Edwards (x,y)

and the conversion to Montgomery is indeed free.

...