Re: [Cfrg] revised requirements for new curves

Andrey Jivsov <crypto@brainhub.org> Mon, 15 September 2014 04:04 UTC

Return-Path: <crypto@brainhub.org>
X-Original-To: cfrg@ietfa.amsl.com
Delivered-To: cfrg@ietfa.amsl.com
Received: from localhost (ietfa.amsl.com [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id AC3951A01E2 for <cfrg@ietfa.amsl.com>; Sun, 14 Sep 2014 21:04:04 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -0.002
X-Spam-Level:
X-Spam-Status: No, score=-0.002 tagged_above=-999 required=5 tests=[BAYES_40=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, SPF_PASS=-0.001] autolearn=ham
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id H59gK6kWmP26 for <cfrg@ietfa.amsl.com>; Sun, 14 Sep 2014 21:04:01 -0700 (PDT)
Received: from qmta14.westchester.pa.mail.comcast.net (qmta14.westchester.pa.mail.comcast.net [IPv6:2001:558:fe14:44:76:96:59:212]) by ietfa.amsl.com (Postfix) with ESMTP id 065801A0572 for <cfrg@irtf.org>; Sun, 14 Sep 2014 21:04:00 -0700 (PDT)
Received: from omta21.westchester.pa.mail.comcast.net ([76.96.62.72]) by qmta14.westchester.pa.mail.comcast.net with comcast id rG3X1o0011ZXKqc5EG402p; Mon, 15 Sep 2014 04:04:00 +0000
Received: from [192.168.1.2] ([71.202.164.227]) by omta21.westchester.pa.mail.comcast.net with comcast id rG3z1o0044uhcbK3hG3zxg; Mon, 15 Sep 2014 04:04:00 +0000
Message-ID: <5416652E.2080405@brainhub.org>
Date: Sun, 14 Sep 2014 21:03:58 -0700
From: Andrey Jivsov <crypto@brainhub.org>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.7.0
MIME-Version: 1.0
To: cfrg@irtf.org
References: <CAMfhd9UaJtcaRurEOtN289ribT7ZH6OUB55or+T1NN6U8jv9Rw@mail.gmail.com> <CAMm+LwhATMG9Pco-Bt1n-rgr3vEmgFKbpbvA0f9V9fW9t36UVw@mail.gmail.com> <C6D1200E-0CD5-4716-9948-D0CF039F7F3A@shiftleft.org> <CAMm+LwipUBToG2b1A3z7NTDEzMC6og0zwjyQ=eGsdrCZGfwoQQ@mail.gmail.com> <9B3EDBDA-F4F4-441C-9F21-14DEC8BB530E@shiftleft.org>
In-Reply-To: <9B3EDBDA-F4F4-441C-9F21-14DEC8BB530E@shiftleft.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20140121; t=1410753840; bh=AG7UpV4+eC//foo/eafNuQZbUuUOhPiLAp4Igbs1JOk=; h=Received:Received:Message-ID:Date:From:MIME-Version:To:Subject: Content-Type; b=aN4QfbsYb8Ry53iZclHDwqD4/luBowYqr/Ykkj6MgvkxeKAyQspdKDJfMaJ4OfTLR +Lw4pM8tafNwfaGnN4D0vry+kYMEhfnmmaeOOTMD95MNGI3WejRn2RSEK8mx4t8HF4 RedT9Bq8qSJ3TWJ9ma824qZkWuqST9xYzGfu6EneBtSNUQGtRKCdwkgZFTocVStFq5 AkCLHHbaGsq/ijqfJTXE7HHpQFeU75jXFKXgJdbizOIqi4wLPa4my8zl22j0yuwB6e brvdmvEDofITwa9Gs0FSNBz0d1VzlpLMUXz7aRiK/Y5DyU2TeQ9Eqi/5S98IDbCJe1 DXZlKf5Rosfxg==
Archived-At: http://mailarchive.ietf.org/arch/msg/cfrg/46zXzViql9d87eRkVQn1y4NEO8c
Subject: Re: [Cfrg] revised requirements for new curves
X-BeenThere: cfrg@irtf.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: Crypto Forum Research Group <cfrg.irtf.org>
List-Unsubscribe: <http://www.irtf.org/mailman/options/cfrg>, <mailto:cfrg-request@irtf.org?subject=unsubscribe>
List-Archive: <http://www.irtf.org/mail-archive/web/cfrg/>
List-Post: <mailto:cfrg@irtf.org>
List-Help: <mailto:cfrg-request@irtf.org?subject=help>
List-Subscribe: <http://www.irtf.org/mailman/listinfo/cfrg>, <mailto:cfrg-request@irtf.org?subject=subscribe>
X-List-Received-Date: Mon, 15 Sep 2014 04:04:05 -0000

The chart looks consistent with my measurements
http://www.ietf.org/mail-archive/web/cfrg/current/msg05053.html for x86-64.

Relying on approximate numbers with visual extrapolation and my eye sight:

1. Fully optimized with AVX2:
    365/220=1.66

2. Really apples-to-apples OpenSSL v.s. NUMS, in particular:
* 64 bit C code with uint128
* same author
* code that takes advantage of special structure of the primes
    570/220=2.60

In addition, I timed the #2 scenario on the discontinued Xeon CPU E5540 
@ 2.53GHz, LGA 1366, Nehalem architecture, for similar 2.79 factor (v.s. 
my reference 2.70 for Ivy bridge). I was timing curve25519-donna-master, 
but NUMS/Curve25519 this should not matter when comparing with P-256.

The architecture-neutral C code should demonstrate a factor of about 2.7 
slower performance between P-256 and Curve25519. In the case of OpenSSL 
it must be properly configured. I don't see why ARM code would be 
different here.

The factor of ~5 corresponds to OpenSSL doing Montgomery reduction 
treating NIST prime as a random prime. The factor of 5, therefore, 
corresponds to the code doing Brainpool-style random prime Fp 
operations. Prime-specific optimization should be allowed in both cases 
for fairness.

I've mentioned a couple of times that Curve25519 (and NUMS) are only 50% 
faster than P-256 for AVX2-optimized code. There is a similar unfairness 
present in this claim, however, this is a bit less unfair because:
* the comparison uses the best code easily available today
* there is a chance that Curve25519 optimized for AVX2 will see lower 
payout because the parallelism of AVX2 may reduce the benefit of the 
special structure of the pseudo-Mersenne prime.

The ratio that should be used by default is ~2.7.

On 09/14/2014 05:46 PM, Michael Hamburg wrote:
> While this is partly true, it’s also an example of the NIST curves not getting an entirely fair shake because I just ran `openssl speed` with whatever was on my Mac, whereas most of the other curves are optimized.
>
> I’ve added teal dots from the Gueron-Krasnov NIST-P256 paper (citing also the Käsper-Langley paper) to show this effect: at least for NIST-P256, the Mavericks OpenSSL release is about 3.4x worse than the state of the art.
>
> — Mike
>
>> On Sep 14, 2014, at 5:22 PM, Phillip Hallam-Baker <phill@hallambaker.com> wrote:
>>
>> Well one thing that leaps out is that I can get 512 bits of the NUMS
>> curves for the price of 256 bits of the NIST curves. And that is a BIG
>> difference as far as I am concerned.
>>
>> The performance difference between the NUMS curves and the various
>> 'cherry picked' primes is visible but much less significant.
>>
>> On Sun, Sep 14, 2014 at 6:59 PM, Michael Hamburg <mike@shiftleft.org> wrote:
>>>
>>> On Sep 14, 2014, at 7:22 AM, Phillip Hallam-Baker <phill@hallambaker.com>
>>> wrote:
>>>
>>> But if we plot the points on graphs of defender effort vs attacker
>>> work factor and look at the curves we can probably see quite easily
>>> what we are buying with the different approaches.
>>>
>>>
>>> I hacked up some plots for ECDH (protected variable-base scalarmul) on Intel
>>> Sandy Bridge and Cortex A8, A9 platforms at
>>>
>>> http://ed448goldilocks.sourceforge.net/comparison/
>>>
>>> Lots of caveats follow!
>>>
>>> It’s important to note that the curves aren’t all on equal footing here.
>>> The measurements are from the NUMS paper, the Curve41417 paper, `openssl
>>> speed`, the Goldilocks bench utility (on SBR and Tegra 2); curve25519-donna
>>> (Curve25519 on Tegra2), SUPERCOP (Curve25519 on other platforms, Goldilocks
>>> on A8+NEON).  Of these, SUPERCOP and `openssl speed` almost certainly give
>>> less favorable numbers.
>>>
>>> I haven’t fully implemented Ed480-Ridinghood.  The point in the graph is an
>>> estimate based on the field arithmetic being identical to Ed448-Goldilocks
>>> on 64-bit except for the shift amounts.  This is also why it doesn’t appear
>>> in the 32-bit numbers: it cannot use the same arithmetic on 32-bit
>>> platforms, but instead would take ~50% longer, which is why I didn’t propose
>>> it.
>>>
>>> Curve25519, Curve41417, Goldilocks, and the MS “mont” curves are performing
>>> point (de)compression, but OpenSSL and the NUMS “ed” curves are not.  Point
>>> decompression would be about a 10% performance hit, but this hit may be
>>> mitigated (for stronger curves) or negated (for weaker ones) by the use of
>>> the Montgomery ladder.  Some of the software is also hashing the results and
>>> some are not.  (Goldilocks hashes; NUMS do not; I don’t know about the
>>> others.)  Goldilocks tests whether points are on the curve or twist while
>>> Curve25519 does not, but this test is cheap and adds little security.
>>>
>>> The software benchmarked here has different amounts of optimization and is
>>> designed for different platforms.  Curve41417 has only been benchmarked on
>>> Cortex A8+NEON, and the NUMS curves have only been benchmarked on SBR.
>>>
>>> The OpenSSL curves don’t include the latest optimizations, since they’re
>>> from 1.0.1 or 1.0.1f, compiled however the OS maintainers decided.
>>>
>>> Curve25519-donna is not at all optimized for ARM, but I don’t have any other
>>> vectorless ARM numbers to go by, since I couldn’t get Curve25519 working in
>>> SUPERCOP on that platform.
>>>
>>> I expect that the NUMS curves would perform passably well in scalar code on
>>> the Cortex-A9, because UMAAL can help with the carry handling.  But they
>>> would be at a disadvantage in NEON, where carry handling is much more
>>> expensive.  So I would expect the situation on Tegra 2 (with its fast scalar
>>> core but no NEON) to be comparable to that on Sandy Bridge, but on A8+NEON I
>>> would expect it to be considerably worse for the NUMS curves.
>>>
>>> In the other direction, I expect that the numbers for Curve41417 would be
>>> similar on Sandy Bridge to how they are on NEON.  Field arithmetic is
>>> probably comparable to Ed448-Goldilocks, just as on NEON, so curve ops
>>> should be about 8% faster due to the 8% smaller curve.
>>>
>>> I know that variable-base scalar multiplication is not the only benchmark
>>> which matters, but it’s consistently available across many curves.
>>>
>>> I believe that all of the software contains comparable levels of
>>> side-channel protection.
>>>
>>> Cheers,
>>> — Mike
>
> _______________________________________________
> Cfrg mailing list
> Cfrg@irtf.org
> http://www.irtf.org/mailman/listinfo/cfrg
>