Re: [Cfrg] Timing of libsodium, curve25519-donna, MSR ECCLib, and openssl-master

Andrey Jivsov <crypto@brainhub.org> Thu, 09 October 2014 05:47 UTC

Message-ID: <54362162.8070506@brainhub.org>
Date: Wed, 08 Oct 2014 22:47:14 -0700
From: Andrey Jivsov <crypto@brainhub.org>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.1
MIME-Version: 1.0
To: Watson Ladd <watsonbladd@gmail.com>
References: <53F0010B.6080101@brainhub.org> <CD159876-F061-4EB8-B1DC-FAB8E4798E26@shiftleft.org> <53F108CF.4040704@brainhub.org> <53F18607.3000005@brainhub.org> <5406C23E.80205@brainhub.org> <5407C176.3000109@brainhub.org> <5435DE66.7080803@brainhub.org> <29E067B7-C1F3-427C-8E4A-14F2096A71E4@shiftleft.org> <543616FF.4010503@brainhub.org> <CACsn0cnDKbiHjjOAAC_xb8bseCLHoS8bKExutMC5DKk8utYVjQ@mail.gmail.com>
In-Reply-To: <CACsn0cnDKbiHjjOAAC_xb8bseCLHoS8bKExutMC5DKk8utYVjQ@mail.gmail.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Archived-At: http://mailarchive.ietf.org/arch/msg/cfrg/sIiWJU2waq2UW7rZZUWTNZ-fK2c
Cc: "cfrg@irtf.org" <cfrg@irtf.org>
Subject: Re: [Cfrg] Timing of libsodium, curve25519-donna, MSR ECCLib, and openssl-master

On 10/08/2014 10:05 PM, Watson Ladd wrote:
>> was 14131.5/5231.7=2.7 (reported on 09/03/2013)
>> now: 14251.3/11105.2=1.27
>> (apparently due to Montgomery-style assembler code specialized for P-256
>> prime)
>>
>> This is even more interesting. These performance improvements apparently
>> cover most of x86 CPUs in use today, clients and servers.
> Wouldn't the speedups from reducing the number of field operations by
> changing the curve shape stack on top of these? I don't really see the
> relevance to picking which wire format to use.
>
Just to be clear, I was saying that the apparent factor of ~2
improvement of P-256 on a non-AVX2 machine appears to be due to highly
tuned Montgomery reduction code, hardcoded for the P-256 prime.
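
To illustrate what "hardcoded for the P-256 prime" can buy, here is a
minimal sketch of word-by-word Montgomery reduction in Python (3.8+ for
the three-argument pow with a negative exponent). It is not the
assembler code that was timed. The names redc, to_mont, mont_mul and
from_mont, and the 64-bit word size, are assumptions made for the
example. The one fact it relies on is that the P-256 prime is congruent
to -1 modulo 2^64, so the per-word constant -p^(-1) mod 2^64 is 1 and
the usual multiplication by that constant disappears.

```python
# Word-by-word Montgomery reduction, sketched with Python integers.
# Illustration only: not constant time, and not the code that was timed.

P256 = 2**256 - 2**224 + 2**192 + 2**96 - 1   # NIST P-256 field prime
W = 64                                         # assumed word size in bits
N_WORDS = 4                                    # 4 x 64 = 256 bits
R = 1 << (W * N_WORDS)                         # Montgomery radix 2^256
MASK = (1 << W) - 1

# Generic Montgomery reduction needs n0 = -p^(-1) mod 2^W.
# For this prime p = -1 mod 2^64, so n0 works out to 1.
N0 = (-pow(P256, -1, 1 << W)) & MASK


def redc(t, p=P256, n0=N0):
    """Return t * R^(-1) mod p, for 0 <= t < p * R."""
    for _ in range(N_WORDS):
        m = ((t & MASK) * n0) & MASK   # with n0 == 1, m is just the low word
        t = (t + m * p) >> W           # low word is now zero; shift it out
    return t - p if t >= p else t      # result was < 2p, so one subtraction


def to_mont(a):
    """Map a into the Montgomery domain: a * R mod p."""
    return (a * R) % P256


def mont_mul(a_m, b_m):
    """Multiply two Montgomery-domain values; result stays in the domain."""
    return redc(a_m * b_m)


def from_mont(a_m):
    """Map a Montgomery-domain value back to an ordinary residue."""
    return redc(a_m)
```

For example, from_mont(mont_mul(to_mont(a), to_mont(b))) equals
a * b mod p for any residues a and b. A tuned implementation would work
on machine words and exploit the sparse form of p in the m * p step as
well; this sketch only shows the structure.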

A Montgomery curve needs fewer underlying field operations. The
performance benefit from this will be smaller than the benefit from
prime reduction and hardware/instruction assistance. However, given that
the numbers are fairly close now, we can expect the lead to change
depending on the mix of features. For example, a hypothetical
implementation combining the P-256 underlying field operations found in
the code that I timed with a Montgomery curve on top would probably take
the lead in the tests I performed.
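
As a rough illustration of the "fewer field operations" point, here is a
sketch of an x-only Montgomery ladder, using Curve25519 as the example
curve and following the ladder step given in RFC 7748. The function
names clamp and ladder are chosen for this example, and the code is not
constant time. Each ladder step costs 5 general multiplications, 4
squarings and 1 multiplication by a small constant, with no separate
point-addition formula.

```python
# x-only Montgomery ladder on Curve25519 (illustration, not constant time).

P = 2**255 - 19      # Curve25519 field prime
A24 = 121665         # (486662 - 2) / 4, as used in RFC 7748


def clamp(k):
    """Clamp a 256-bit scalar as X25519 does."""
    k &= ~7                  # clear the three low bits
    k &= (1 << 255) - 1      # clear bit 255
    k |= 1 << 254            # set bit 254
    return k


def ladder(k, u):
    """Return the u-coordinate of k * (point with u-coordinate u)."""
    x1 = u % P
    x2, z2 = 1, 0            # point at infinity
    x3, z3 = x1, 1           # input point
    swap = 0
    for t in reversed(range(255)):
        kt = (k >> t) & 1
        swap ^= kt
        if swap:             # a real implementation uses a constant-time swap
            x2, x3 = x3, x2
            z2, z3 = z3, z2
        swap = kt
        a = (x2 + z2) % P
        aa = a * a % P                          # squaring 1
        b = (x2 - z2) % P
        bb = b * b % P                          # squaring 2
        e = (aa - bb) % P
        c = (x3 + z3) % P
        d = (x3 - z3) % P
        da = d * a % P                          # multiplication 1
        cb = c * b % P                          # multiplication 2
        x3 = (da + cb) ** 2 % P                 # squaring 3
        z3 = x1 * ((da - cb) ** 2 % P) % P      # squaring 4, multiplication 3
        x2 = aa * bb % P                        # multiplication 4
        z2 = e * ((aa + A24 * e) % P) % P       # small-constant mult, mult 5
    if swap:
        x2, x3 = x3, x2
        z2, z3 = z3, z2
    return x2 * pow(z2, P - 2, P) % P           # x2 / z2 via Fermat inversion
```

With clamped scalars a and b and base point u = 9, ladder(a, ladder(b, 9))
and ladder(b, ladder(a, 9)) agree, which is the Diffie-Hellman property
the ladder is used for.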

P-256 has the advantage that it is in standards, is widely deployed, can
do point additions (without the penalty of coordinate conversion), and
you can get X.509 certs with it. It would have been easier to argue
about its disadvantages if it had worse performance than it appears to
have. I am aware of other disadvantages of P-256.

In your other e-mail, Watson, regarding AVX2/vector operations + X25519, 
it's an interesting question. The issues here are:
* will this hide some benefits of the 2^n-1 prime?
* will it increase code complexity?
* it seems that this is of no use to mobile devices (in the near future
anyway)
* but servers will benefit from this.