Re: [Cfrg] ECC reboot

Phillip Hallam-Baker <> Thu, 23 October 2014 20:52 UTC

On Thu, Oct 23, 2014 at 4:14 PM, Michael Hamburg <> wrote:

> > On Oct 23, 2014, at 11:22 AM, Andy Lutomirski <> wrote:
> Goldilocks should work well in 512-bit registers.  If Intel has
> single-cycle PMUL[U]DQ on 512 bits, that will become the largest multiplier
> on the chip: 8x32x32 -> 8x64 compared to the 64x64->128 scalar multiplier.
> The ARM NEON Karatsuba implementation should translate pretty well to that
> primitive, but the devil is in the details.

OK so that is an important data point.

What I am trying to get to here is to get people to explain those details
in terms more abstract than 'I ran X on Y and got time Z'.
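
As one way to state that detail more abstractly, here is a minimal Python
model of the packed multiply described in the quoted text: eight 64-bit
lanes, each multiplying the low 32 bits of its operands into a 64-bit
product. The name `pmuludq_512` and the bit-product count are illustrative
assumptions, not Intel's specification and not a measurement.

```python
MASK32 = (1 << 32) - 1
LANES = 8  # assumed: a 512 bit register viewed as eight 64-bit lanes


def pmuludq_512(a_lanes, b_lanes):
    """Model a packed unsigned 32x32 -> 64 multiply over eight lanes.

    Hypothetical helper for illustration only.
    """
    if len(a_lanes) != LANES or len(b_lanes) != LANES:
        raise ValueError("expected eight lanes per operand")
    # Only the low 32 bits of each 64-bit lane take part in the product.
    return [(a & MASK32) * (b & MASK32) for a, b in zip(a_lanes, b_lanes)]


# Crude size comparison, counted in single-bit partial products.
packed_bit_products = LANES * 32 * 32  # eight 32x32 multiplies
scalar_bit_products = 64 * 64          # one 64x64 -> 128 multiply
```

By that crude count the packed unit does twice the multiplication work of
the scalar multiplier per instruction, which is the sense in which it would
be the largest multiplier on the chip.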

> On the other hand, Broadwell is slated to introduce ADCX/ADOX scalar
> carry-handling primitives which should mitigate some of the carry handling
> issues for packed NUMSp512 math.  It probably won’t be as big a difference
> as AVX-512, but it’s something.

That's the point: if there is a 512 bit wide register, and there is an
application that can be accelerated if 512 bit wide instructions are
available, there is a reasonable expectation that we can get them implemented.

Adding in a Manchester carry chain isn't a big cost, nor is providing a
primitive for a fast 512 bit multiply.
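
To make the carry-handling cost concrete, here is a minimal sketch of a
512 bit addition done as eight 64-bit limbs with a ripple carry. The helper
names (`to_limbs`, `from_limbs`, `add_512`) are hypothetical. Each loop
iteration depends on the carry from the previous one, which is the serial
dependency at issue.

```python
MASK64 = (1 << 64) - 1
LIMBS = 8  # eight 64-bit limbs make up one 512 bit value


def to_limbs(x):
    """Split a non-negative integer below 2**512 into little-endian limbs."""
    return [(x >> (64 * i)) & MASK64 for i in range(LIMBS)]


def from_limbs(limbs):
    """Recombine little-endian 64-bit limbs into one integer."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))


def add_512(a_limbs, b_limbs):
    """Add two 512 bit values limb by limb; return (sum limbs, carry out)."""
    carry = 0
    out = []
    for x, y in zip(a_limbs, b_limbs):
        s = x + y + carry      # at most 65 bits wide
        out.append(s & MASK64)
        carry = s >> 64        # 0 or 1, feeds the next limb
    return out, carry
```

Adding 1 to 2**512 - 1 is the worst case: the carry ripples through all
eight limbs before the result is known.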

> @PHB: How much experience do you have actually implementing elliptic
> curves or similar primitives?  You have an awful lot to say about why 512
> will be fast in the long run, and I’m curious where that comes from.  Is it
> “just common sense”, or have you done experiments?

Having built large parallel machines and used them for very large tasks, I
am going by a little more than common sense. 512 bit arithmetic operations
are certainly going to be faster than 521 bit ones. Now what that implies for
Knotted Weierstrass curves with a double Irish Dutch sandwich or whatever
is what I am looking for an explanation of in terms that I can explain to
the general security community.

What I began arguing is that 521 is going to break stride on a 512 bit
architecture and should be taken off the table unless the speed advantage
is enormous. I don't think we need to implement the algorithm to see that.

Now if you are arguing that 512 will also break stride on a 512 bit
architecture, then that is an important data point that, by the same logic,
argues for looking at a curve that does fit, and maybe Goldilocks is actually
the one.
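
The stride argument can be made concrete with a small counting sketch. It
asks how many 512 bit registers one field element occupies, first with full
64-bit limbs and then with a reduced radix of 28-bit limbs held in 32-bit
lanes. The 28-bit radix is an assumption, chosen to leave headroom for
carries under a 32x32 multiplier; it is not taken from this thread, and the
function names are hypothetical.

```python
REGISTER_BITS = 512  # assumed vector register width


def ceil_div(a, b):
    """Integer ceiling division, avoiding floating point."""
    return -(-a // b)


def limbs_needed(field_bits, limb_bits):
    """Number of limbs of limb_bits each needed to hold field_bits."""
    return ceil_div(field_bits, limb_bits)


def registers_needed(field_bits, limb_bits, lane_bits):
    """Registers occupied when each limb sits in its own lane of lane_bits."""
    return ceil_div(limbs_needed(field_bits, limb_bits) * lane_bits,
                    REGISTER_BITS)


def stride_table():
    """Map field size -> (registers at full radix, registers at reduced radix)."""
    table = {}
    for field_bits in (448, 512, 521):
        full = registers_needed(field_bits, 64, 64)     # 64-bit limbs, 64-bit lanes
        reduced = registers_needed(field_bits, 28, 32)  # assumed 28-bit limbs, 32-bit lanes
        table[field_bits] = (full, reduced)
    return table
```

Under these assumptions a 521 bit element spills into a second register in
both layouts, a 512 bit element fits in one register only with full-radix
limbs, and a 448 bit element fits in one register either way.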

But before conceding that point I would like to see more of an explanation
of the optimization being used and how much it delivers.