Re: [Cfrg] ECC reboot

Michael Hamburg <mike@shiftleft.org> Thu, 23 October 2014 23:14 UTC

From: Michael Hamburg <mike@shiftleft.org>
In-Reply-To: <54497F87.1070801@dei.uc.pt>
Date: Thu, 23 Oct 2014 16:14:35 -0700
To: Samuel Neves <sneves@dei.uc.pt>
Archived-At: http://mailarchive.ietf.org/arch/msg/cfrg/iypS29tVeEjYF9JZZksBYhUax6s
Cc: "cfrg@irtf.org" <cfrg@irtf.org>
Subject: Re: [Cfrg] ECC reboot

> On Oct 23, 2014, at 3:21 PM, Samuel Neves <sneves@dei.uc.pt> wrote:
> 
> On 23-10-2014 21:14, Michael Hamburg wrote:
>> Goldilocks should work well in 512-bit registers.  If Intel has single-cycle PMUL[U]DQ on 512 bits, that will become the largest multiplier on the chip: 8x32x32 -> 8x64 compared to the 64x64->128 scalar multiplier.  The ARM NEON Karatsuba implementation should translate pretty well to that primitive, but the devil is in the details.
> 
> AVX-512 has better than VPMUL[U]DQ. Not only does AVX-512* have VPMULLQ (64x64->64), which is useful on its own, but
> also VPMADD52{L,H}UQ, which does a 52-bit multiply followed by a 64-bit addition of either the lower or upper 52 bits of
> the product. This latter instruction seems to expose the floating-point circuitry to the user.
> 
> * The fancy version of AVX-512, expected to be in Skylake; Knight's Corner only has a limited subset named AVX-512F.

Oh, huh, that’s pretty neat.  At first glance, it looks like it’d be most useful for implementing arithmetic modulo a 384-bit prime with radix 2^48 (since 8*48 = 384).
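To make the radix-2^48 idea concrete, here is a minimal Python sketch that models a single 64-bit lane of VPMADD52LUQ / VPMADD52HUQ and uses it for an 8-limb schoolbook product. The helper names (`madd52lo`, `madd52hi`, `limbs48`, `mul_radix48`) and the sample values are made up for illustration, and modular reduction is left out. The point it tries to show is that 48-bit limbs leave 4 spare bits under the 52-bit multiplier input, so a limb-wise sum of two elements can be multiplied without propagating carries first.

```python
MASK52 = (1 << 52) - 1
MASK64 = (1 << 64) - 1

def madd52lo(acc, a, b):
    """Model of one VPMADD52LUQ lane: acc + low 52 bits of the 104-bit product."""
    prod = (a & MASK52) * (b & MASK52)   # inputs are cut to their low 52 bits
    return (acc + (prod & MASK52)) & MASK64

def madd52hi(acc, a, b):
    """Model of one VPMADD52HUQ lane: acc + high 52 bits of the 104-bit product."""
    prod = (a & MASK52) * (b & MASK52)
    return (acc + (prod >> 52)) & MASK64

def limbs48(x, n=8):
    """Split x into n little-endian limbs of 48 bits each."""
    return [(x >> (48 * i)) & ((1 << 48) - 1) for i in range(n)]

def mul_radix48(a, b):
    """Schoolbook product of two radix-2^48 limb vectors, returned as an integer."""
    n = len(a)
    lo = [0] * (2 * n)   # per-column sums of low product halves
    hi = [0] * (2 * n)   # per-column sums of high product halves
    for i in range(n):
        for j in range(n):
            lo[i + j] = madd52lo(lo[i + j], a[i], b[j])
            hi[i + j] = madd52hi(hi[i + j], a[i], b[j])
    # Each product is lo52 + hi52 * 2^52; column k has weight 2^(48*k).
    return sum((lo[k] + (hi[k] << 52)) << (48 * k) for k in range(2 * n))

# Arbitrary sample values below 2^384 (illustrative only).
x = (1 << 384) - 317
y = (1 << 383) + 12345
z = 3 ** 240

# Limb-wise sum: limbs are up to 49 bits wide, still below the 52-bit cutoff.
lazy = [xi + yi for xi, yi in zip(limbs48(x), limbs48(y))]
lazy_product_is_exact = mul_radix48(lazy, limbs48(z)) == (x + y) * z
```

With at most 8 terms of under 2^52 per column, the 64-bit accumulators in this model never wrap.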

Curve41417 or a curve over the field mod 2^416-2^208-1 would also work with radix 2^52 (since 8*52 = 416), but you’d have to reduce almost completely before multiplying, which would offset the gains.
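The "reduce almost completely" constraint can be seen in a small Python model. This is only a sketch: `mul52` stands in for the VPMADD52LUQ/VPMADD52HUQ pair by cutting each input to 52 bits and returning the full product, and the helper names and sample values are assumptions for illustration. With 52-bit limbs there is no headroom, so a limb that has grown past 2^52 gets silently truncated.

```python
MASK52 = (1 << 52) - 1

def mul52(a, b):
    """Product as the 52-bit multiplier sees it: both inputs cut to 52 bits."""
    return (a & MASK52) * (b & MASK52)

def limbs52(x, n=8):
    """Split x into n little-endian limbs of 52 bits each."""
    return [(x >> (52 * i)) & MASK52 for i in range(n)]

def mul_radix52(a, b):
    """Schoolbook product of two radix-2^52 limb vectors, returned as an integer."""
    return sum(mul52(ai, bj) << (52 * (i + j))
               for i, ai in enumerate(a)
               for j, bj in enumerate(b))

# Arbitrary sample values below 2^416 (illustrative only).
x = (1 << 416) - 12345
y = (1 << 415) + 6789
z = 3 ** 200

# Fully carried inputs multiply correctly.
reduced_ok = mul_radix52(limbs52(x), limbs52(z)) == x * z

# A limb-wise sum overflows the top limb past 52 bits, and the
# multiplier drops the extra bit, so the product is wrong.
lazy = [xi + yi for xi, yi in zip(limbs52(x), limbs52(y))]
lazy_ok = mul_radix52(lazy, limbs52(z)) == (x + y) * z

# Carrying first (here into a 9th limb, with no modular reduction)
# restores correctness.
carried_ok = mul_radix52(limbs52(x + y, n=9), limbs52(z)) == (x + y) * z
```

In a real implementation the carry step would be folded into reduction modulo the prime rather than spilling into an extra limb; that part is not modeled here.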

A final possibility is that since the register file of AVX-512 is gi-freaking-normous, you could try to do 2 to 4 multiplies at a time using the extended addition formula.  Then you wouldn’t have to worry as much about the 416-bit input size of this operation, and it would be easier to apply to other primes.
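As a rough illustration of where 2 to 4 independent multiplies could come from, the sketch below assumes "the extended addition formula" means addition in extended twisted Edwards coordinates (X:Y:Z:T) with a = -1, in the style of Hisil, Wong, Carter and Dawson. That reading is an assumption, as are the toy prime 1009, the toy constant d = 11, and all function names. The field multiplications fall into two batches of four with no dependencies inside a batch, which is the shape one would map onto vector lanes; `batch_mul` is a plain-Python stand-in for such a step, not vector code.

```python
P = 1009   # toy prime with P % 4 == 1, so a = -1 is a square mod P
D = 11     # toy curve constant; a non-square mod P, so the law is complete

def batch_mul(xs, ys):
    """Stand-in for one vector step: independent field multiplications."""
    return [(x * y) % P for x, y in zip(xs, ys)]

def to_ext(pt):
    """Affine (x, y) to extended (X, Y, Z, T) with Z = 1 and T = x*y."""
    x, y = pt
    return (x, y, 1, x * y % P)

def ext_add(p1, p2):
    """Extended-coordinate addition on -x^2 + y^2 = 1 + D*x^2*y^2."""
    X1, Y1, Z1, T1 = p1
    X2, Y2, Z2, T2 = p2
    # Batch 1: four independent products (2*D*T2 could be cached per point).
    A, B, C, Dd = batch_mul([Y1 - X1, Y1 + X1, T1, Z1],
                            [Y2 - X2, Y2 + X2, 2 * D * T2, 2 * Z2])
    E, F, G, H = B - A, Dd - C, Dd + C, B + A
    # Batch 2: four more independent products.
    X3, Y3, T3, Z3 = batch_mul([E, G, E, F], [F, H, H, G])
    return (X3, Y3, Z3, T3)

def affine_add(p1, p2):
    """Reference affine addition law for the same curve (Python 3.8+ for pow)."""
    (x1, y1), (x2, y2) = p1, p2
    t = D * x1 * x2 * y1 * y2 % P
    x3 = (x1 * y2 + y1 * x2) * pow(1 + t, -1, P) % P
    y3 = (y1 * y2 + x1 * x2) * pow(1 - t, -1, P) % P
    return (x3, y3)

# Brute-force the points of the toy curve.
points = [(x, y) for x in range(P) for y in range(P)
          if (y * y - x * x - 1 - D * x * x * y * y) % P == 0]
```

Each of the eight products here is a full field multiplication, so on a wide register file several of them could in principle run side by side, one per group of lanes, instead of spreading a single multiplication across all lanes.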

— Mike