Re: [Cfrg] ECC reboot

Watson Ladd <> Thu, 23 October 2014 21:43 UTC

Date: Thu, 23 Oct 2014 14:43:38 -0700
From: Watson Ladd <>
To: Phillip Hallam-Baker <>
Subject: Re: [Cfrg] ECC reboot
List-Id: Crypto Forum Research Group <>

On Oct 23, 2014 1:52 PM, "Phillip Hallam-Baker" wrote:
> On Thu, Oct 23, 2014 at 4:14 PM, Michael Hamburg wrote:
>> > On Oct 23, 2014, at 11:22 AM, Andy Lutomirski wrote:
>> Goldilocks should work well in 512-bit registers.  If Intel has
>> single-cycle PMUL[U]DQ on 512 bits, that will become the largest multiplier
>> on the chip: 8x32x32 -> 8x64 compared to the 64x64->128 scalar multiplier.
>> The ARM NEON Karatsuba implementation should translate pretty well to that
>> primitive, but the devil is in the details.
> OK so that is an important data point.
> What I am trying to get to here is to get people to explain those details
> in terms more abstract than 'I ran X on Y and got time Z'.
>> On the other hand, Broadwell is slated to introduce ADCX/ADOX scalar
>> carry-handling primitives which should mitigate some of the carry handling
>> issues for packed NUMSp512 math.  It probably won’t be as big a difference
>> as AVX-512, but it’s something.
> That's the point: if there is a 512-bit-wide register and there is an
> application that can be accelerated if 512-bit-wide instructions are
> available, there is a reasonable expectation that we can get them implemented.

This is wrong. Many vector units operate on two independent halves
internally, introducing an extra delay for any operation that crosses the
halves. Furthermore, the maximum gate delay that fits in a clock cycle
strictly limits circuit depth.

ADOX, ADCX, and MULX are nothing special in comparison.  They use existing
hardware in slightly nicer ways. The carryless multiply and AES
instructions are much more similar to what you propose, and took a long
time to get into silicon. Even then the multiply instruction was a
sixteenth of the size of what you propose.
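
To make "existing hardware in slightly nicer ways" concrete, here is a rough
Python model of a schoolbook multiply on 64-bit limbs. It relies only on the
documented behaviour that MULX leaves the flags alone, ADCX uses only CF, and
ADOX uses only OF, so two carry chains can run side by side. The helper names
(mulx, adc, mul, to_limbs, from_limbs) are made up for this sketch; it is an
illustration, not anyone's actual implementation.

```python
MASK64 = (1 << 64) - 1

def mulx(a, b):
    """Model of MULX: 64x64 -> 128 product as (hi, lo), no flags touched."""
    p = a * b
    return p >> 64, p & MASK64

def adc(x, y, carry):
    """Model of one add-with-carry step: returns (64-bit sum, carry out)."""
    s = x + y + carry
    return s & MASK64, s >> 64

def to_limbs(x, n):
    """Split x into n little-endian 64-bit limbs."""
    return [(x >> (64 * i)) & MASK64 for i in range(n)]

def from_limbs(limbs):
    """Recombine little-endian 64-bit limbs into an integer."""
    return sum(l << (64 * i) for i, l in enumerate(limbs))

def mul(a, b):
    """Schoolbook product of two limb vectors using two carry chains per row."""
    n, m = len(a), len(b)
    acc = [0] * (n + m + 1)  # one spare limb so the last row has room
    for j in range(m):
        cf = of = 0  # cf models the ADCX chain, of models the ADOX chain
        for i in range(n):
            hi, lo = mulx(a[i], b[j])
            # Low halves go down the CF chain ...
            acc[j + i], cf = adc(acc[j + i], lo, cf)
            # ... high halves go down the OF chain, one limb higher.
            acc[j + i + 1], of = adc(acc[j + i + 1], hi, of)
        # Flush both chains into the top limbs of this row.
        acc[j + n], cf = adc(acc[j + n], 0, cf)
        acc[j + n + 1], _ = adc(acc[j + n + 1], of, cf)
    assert acc[n + m] == 0  # the full product always fits in n + m limbs
    return acc[:n + m]
```

The multiplier in this model is still the ordinary 64x64->128 one; the only
thing the two flags buy is that the low-half and high-half additions no longer
have to wait on a single carry flag.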

> Adding in a Manchester carry chain isn't a big cost, nor is providing a
> primitive for a fast 512 bit multiply. 512 bit multiply
>> @PHB: How much experience do you have actually implementing elliptic
>> curves or similar primitives?  You have an awful lot to say about why 512
>> will be fast in the long run, and I’m curious where that comes from.  Is it
>> “just common sense”, or have you done experiments?
> Having built large parallel machines and used them for very large tasks,
> I am going by a little more than common sense. 512 bit arithmetic
> operations are certainly going to be faster than 521. Now what that implies
> for implementing Knotted Weierstrass curves with a double Irish Dutch
> sandwich or whatever is what I am looking for an explanation of in terms
> that I can explain to the general security community.

But it impacts performance in ways that don't show up in benchmarks. If you
think 512 can go faster, roll up your sleeves and send it to SUPERCOP. If
what you want is hardware support, you will have to explain why it hasn't
already happened.

If you want to say performance doesn't matter, and the prime picking we use
isn't rigid enough, make that argument.  But don't say "this is faster"
when the benchmarks say otherwise.
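
As a rough illustration of why bit length alone does not settle this, here is
a Python sketch of reduction modulo a prime of the form 2^k - c, using
2^k = c (mod p). The function name fold_reduce is made up, and c = 569 is only
an illustrative constant for a 2^512 - c modulus, not something taken from
this thread. For the Mersenne prime 2^521 - 1 we have c = 1, so the fold is a
shift and an add with no multiplication at all.

```python
def fold_reduce(x, k, c):
    """Reduce x modulo p = 2**k - c by folding the high bits back in."""
    p = (1 << k) - c
    mask = (1 << k) - 1
    while x >> k:
        # hi * 2**k + lo is congruent to hi * c + lo modulo p
        x = (x & mask) + c * (x >> k)
    # After folding, x < 2**k < 2 * p, so one conditional subtraction suffices.
    return x - p if x >= p else x

P521 = (1 << 521) - 1      # Mersenne prime: c = 1
P512 = (1 << 512) - 569    # illustrative 2**512 - c modulus (assumed constant)

x = (P521 - 1) ** 2
y = (P512 - 1) ** 2
r521 = fold_reduce(x, 521, 1)
r512 = fold_reduce(y, 512, 569)
```

Also note that with unsaturated 58-bit limbs a 521-bit element fits in nine
limbs (9 * 58 = 522), and a 512-bit element needs nine limbs at that width too
(8 * 58 = 464). None of this is a benchmark; it only shows that "512 fits the
register" does not by itself decide the question.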

Watson Ladd
> What I began arguing is that 521 is going to break stride on a 512 bit
> architecture and should be taken off the table unless the speed advantage
> is enormous. I don't think we need to implement the algorithm to see that.
> Now if you are arguing that 512 will also break stride on a 512 bit
> architecture then that is an important data point that by the same logic
> argues for looking at a curve that does fit and maybe Goldilocks is actually
> the one.
> But before conceding that point I would like to see more of an
> explanation of the optimization being used and how much it delivers.