Re: [Cfrg] ed448goldilocks vs. numsp384t1 and numsp512t1

Michael Hamburg, Mon, 20 October 2014 22:48 UTC

> On Oct 20, 2014, at 2:24 PM, Ilari Liusvaara wrote:
> On Sat, Oct 18, 2014 at 08:30:17PM -0000, D. J. Bernstein wrote:
>> Michael Hamburg writes:
>>> I didn’t do this last time, which is (part of?) why the numbers from
>>> my own benchmarks do not match DJB’s numbers; see below.
>> Numbers are now coming into eBATS (see for Mike's
>> fixed ed448goldilocks software, and confirm what Mike said about speed
>> compared to Microsoft's claimed speed. Here's the updated comparison
>> chart on Sandy Bridge, the microarchitecture selected by Microsoft for
>> benchmarks in
>>    617000 cycles claimed: numsp384t1 (ed-384-mers),    ~2^192 security.
>>    666544 cycles measured on h6sandy: ed448goldilocks, ~2^224 security.
>>   1293000 cycles claimed: numsp512t1 (ed-512-mers),    ~2^256 security.
> IIRC, Mike has said that Ed448 software is not quite optimized as far
> as it would go.

Maybe not “as far as it would go”, but those numbers are for recent, optimized code that includes the fix for the bug that was holding it back in previous SUPERCOP benchmarks.

> Also, these all are apples-to-apples comparisons (either all uncompressed
> or all compressed), right?

No, Goldilocks is compressed and NUMS is uncompressed.

I’ll see if I can get (possibly untwisted) NUMS curves working in the Goldilocks source tree, which would make a slightly more apples-to-apples comparison.

>> These DH ratios don't _perfectly_ predict ratios for other operations---
>> the instruction mix changes, and speeds of other operations depend on
>> choices of precomputed table size---but at this point it's unsurprising
>> to see ed448goldilocks close to numsp384t1 for signature generation:
>>    231000 cycles claimed: numsp384t1 (ed-384-mers),    ~2^192 security.
>>    234844 cycles measured on h6sandy: ed448goldilocks, ~2^224 security.
>>    446000 cycles claimed: numsp512t1 (ed-512-mers),    ~2^256 security.
>> Also signature verification:
>>    624000 cycles claimed: numsp384t1 (ed-384-mers),    ~2^192 security.
>>    729152 cycles measured on h6sandy: ed448goldilocks, ~2^224 security.
>>   1320000 cycles claimed: numsp512t1 (ed-512-mers),    ~2^256 security.
> Here Ed448 seems to be slightly slow for some reason.
> I would have estimated (on very dubious grounds) ~680k.
> -Ilari

It’s due in large part to point decompression.  Point compression costs about the same as affine serialization.

For signing, Goldilocks needs to compress points but not decompress them, so it doesn’t take a speed penalty.  Also, its precomputed tables are slightly bigger than the ECCLib tables.  On the other hand, signing is a bit slower than I’d have guessed; it might just be that the constant-time selection is vectorized, and Sandy Bridge doesn’t have AVX2, so it falls back to SSE2.
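
For readers unfamiliar with the term: constant-time selection means reading every entry of the precomputed table and keeping the wanted one with a mask, so the memory access pattern does not depend on the secret index. A minimal sketch in Python follows. The names ct_select and table are illustrative and are not taken from the Goldilocks source, and Python integers are not actually constant-time, so this only shows the data flow that a vectorized SSE2 or AVX2 version would carry out.

```python
def ct_select(table, index):
    """Return table[index] by touching every entry (illustrative sketch).

    Assumes entries are non-negative ints and 0 <= index < len(table) < 2**32.
    """
    result = 0
    for i, entry in enumerate(table):
        diff = i ^ index                   # zero only when i == index
        is_equal = ((diff - 1) >> 63) & 1  # 1 if diff == 0, else 0, no branch
        mask = -is_equal                   # all ones (-1) or all zeros
        result |= entry & mask             # keep the entry only under the mask
    return result


# Hypothetical 4-entry table; a real table would hold precomputed points.
table = [0x11, 0x22, 0x33, 0x44]
print(hex(ct_select(table, 2)))
```

The cost is proportional to the table size for every lookup, which is why the width of the available vector unit can matter for signing speed.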

For ECDH, Goldilocks uses the Montgomery ladder, which avoids decompression at the cost of a slightly slower scalarmul.  So it has a small disadvantage vs ECCLib.
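
To illustrate why the ladder avoids decompression, here is a minimal x-only Montgomery ladder sketch in Python. It never needs the second coordinate of the input point, so nothing has to be recovered from a compressed encoding. The curve parameters are an assumption made for illustration: they are the well-known Curve25519 values used as a stand-in, not the Goldilocks curve, and the function names are hypothetical. Python big integers are not constant-time, so this shows the structure only.

```python
# Stand-in parameters: Curve25519 (v^2 = u^3 + A*u^2 + u over GF(P)),
# NOT the Goldilocks curve. Chosen only because they are widely known.
P = 2**255 - 19
A = 486662
A24 = (A - 2) // 4


def cswap(swap, a, b):
    """Swap a and b when swap == 1, using a mask instead of a branch."""
    mask = -swap
    t = mask & (a ^ b)
    return a ^ t, b ^ t


def ladder(k, u, bits=255):
    """Return the u-coordinate of k*Q given only the u-coordinate of Q."""
    x1 = u % P
    x2, z2 = 1, 0      # R0 = point at infinity
    x3, z3 = x1, 1     # R1 = Q; the ladder keeps R1 - R0 = Q throughout
    swap = 0
    for t in reversed(range(bits)):
        bit = (k >> t) & 1
        swap ^= bit
        x2, x3 = cswap(swap, x2, x3)
        z2, z3 = cswap(swap, z2, z3)
        swap = bit
        a = (x2 + z2) % P
        b = (x2 - z2) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        c = (x3 + z3) % P
        d = (x3 - z3) % P
        da, cb = d * a % P, c * b % P
        x3 = (da + cb) ** 2 % P          # differential addition R0 + R1
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P                 # doubling 2*R0
        z2 = e * (aa + A24 * e) % P
    x2, x3 = cswap(swap, x2, x3)
    z2, z3 = cswap(swap, z2, z3)
    return x2 * pow(z2, P - 2, P) % P    # projective -> affine u


# Toy Diffie-Hellman check with small, unclamped example scalars.
a, b, base = 0x1234567, 0x7654321, 9
print(ladder(a, ladder(b, base)) == ladder(b, ladder(a, base)))
```

Each ladder step performs one doubling and one differential addition regardless of the scalar bit, which is the "slightly slower scalarmul" side of the trade-off.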

For verification, Goldilocks has to decompress and ECCLib doesn’t, which costs almost 10%.
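
For a rough sense of scale, the cycle counts quoted earlier in the thread can be turned into relative overheads. The short Python sketch below does only that arithmetic. The dictionary and function names are illustrative, and the comparison inherits the caveats above: the numsp384t1 figures are claimed rather than measured, and they are for uncompressed points.

```python
# Cycle counts quoted earlier in this thread (Sandy Bridge).
# numsp384t1: Microsoft's claimed numbers.
# ed448goldilocks: eBATS measurements on h6sandy.
numsp384t1 = {"dh": 617000, "sign": 231000, "verify": 624000}
ed448goldilocks = {"dh": 666544, "sign": 234844, "verify": 729152}


def overhead_percent(measured, reference):
    """Percentage by which `measured` exceeds `reference`."""
    return 100.0 * (measured - reference) / reference


for op in ("dh", "sign", "verify"):
    pct = overhead_percent(ed448goldilocks[op], numsp384t1[op])
    print(f"{op:6s}: ed448goldilocks is {pct:.1f}% slower than numsp384t1")

# Ilari's rough estimate for verification, from the quoted text.
ilari_estimate = 680000
gap = overhead_percent(ed448goldilocks["verify"], ilari_estimate)
print(f"verify vs. ~680k estimate: {gap:.1f}% slower")
```

This works out to about 8% for DH, under 2% for signing, and about 17% for verification. The measured verification time is about 7% above the ~680k estimate.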

— Mike