### Re: [Cfrg] Progress on curve recommendations for TLS WG

Andrey Jivsov <crypto@brainhub.org> Fri, 15 August 2014 22:52 UTC

Return-Path: <crypto@brainhub.org>

X-Original-To: cfrg@ietfa.amsl.com

Delivered-To: cfrg@ietfa.amsl.com

Received: from localhost (ietfa.amsl.com [127.0.0.1])
by ietfa.amsl.com (Postfix) with ESMTP id 1B3E81A009A
for <cfrg@ietfa.amsl.com>; Fri, 15 Aug 2014 15:52:56 -0700 (PDT)

X-Virus-Scanned: amavisd-new at amsl.com

X-Spam-Flag: NO

X-Spam-Score: 0.799

X-Spam-Level:

X-Spam-Status: No, score=0.799 tagged_above=-999 required=5
tests=[BAYES_50=0.8, DKIM_SIGNED=0.1, DKIM_VALID=-0.1,
SPF_PASS=-0.001] autolearn=ham

Received: from mail.ietf.org ([4.31.198.44])
by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024)
with ESMTP id eE2uT03RKH6B for <cfrg@ietfa.amsl.com>;
Fri, 15 Aug 2014 15:52:54 -0700 (PDT)

Received: from qmta15.emeryville.ca.mail.comcast.net
(qmta15.emeryville.ca.mail.comcast.net [IPv6:2001:558:fe2d:44:76:96:27:228])
by ietfa.amsl.com (Postfix) with ESMTP id 670AD1A0089
for <cfrg@irtf.org>; Fri, 15 Aug 2014 15:52:54 -0700 (PDT)

Received: from omta18.emeryville.ca.mail.comcast.net ([76.96.30.74])
by qmta15.emeryville.ca.mail.comcast.net with comcast
id f9dR1o0031bwxycAFAsttW; Fri, 15 Aug 2014 22:52:53 +0000

Received: from [IPv6:::1] ([71.202.164.227])
by omta18.emeryville.ca.mail.comcast.net with comcast
id fAss1o00E4uhcbK8eAstP7; Fri, 15 Aug 2014 22:52:53 +0000

Message-ID: <53EE8F45.9070908@brainhub.org>

Date: Fri, 15 Aug 2014 15:52:53 -0700

From: Andrey Jivsov <crypto@brainhub.org>

User-Agent: Mozilla/5.0 (X11; Linux x86_64;
rv:24.0) Gecko/20100101 Thunderbird/24.7.0

MIME-Version: 1.0

To: cfrg@irtf.org

References: <CFFB1371.2916E%kenny.paterson@rhul.ac.uk>
<20140808141506.GA24645@LK-Perkele-VII> <53EDEE67.6090308@secunet.com>
<A1FCAAE8-4F20-4AC1-A635-AED20E8F7D5C@shiftleft.org>

In-Reply-To: <A1FCAAE8-4F20-4AC1-A635-AED20E8F7D5C@shiftleft.org>

Content-Type: text/plain; charset=UTF-8; format=flowed

Content-Transfer-Encoding: 8bit

DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net;
s=q20140121; t=1408143173;
bh=eDVj/tsElHOXC64pBddPPE0wDFfevP/P7MGrdZTItic=;
h=Received:Received:Message-ID:Date:From:MIME-Version:To:Subject:
Content-Type;
b=uwra4BB6xchGDqpT2sRa1bnVzZq32FSegOftVi6+H3rRti+OUMgCvr4y6rP8gi/ny
atLu8tN6Znj4Lz6PGAnmnlbWrQzwsGMedMc0SZ6KCe3Q/Ub5j0oPcES+rbIHdeVsoW
4WFrybvRq8SPLkbisxPbAMs1kbR4xKNK0//6lb2FOdwYWknYWtIsgXRtI1dAV74lnV
CFxZP/XcKxIMd+0V3K1O5X1vwScoo2qgmDSO0Y79n6wLnBt2dPXb/bK+iaD+yv3ZWI
tiBWJBlq22m7JhZqMAh4mgAzKVKPu1omOaWG+BXqxVFsdfx2YwcHvzQ4taym9CSvpA
2VD/iZctOphFQ==

Archived-At: http://mailarchive.ietf.org/arch/msg/cfrg/ET22GmnWV2-DSkYXje_vRPI23_A

Subject: Re: [Cfrg] Progress on curve recommendations for TLS WG

X-BeenThere: cfrg@irtf.org

X-Mailman-Version: 2.1.15

Precedence: list

List-Id: Crypto Forum Research Group <cfrg.irtf.org>

List-Unsubscribe: <http://www.irtf.org/mailman/options/cfrg>,
<mailto:cfrg-request@irtf.org?subject=unsubscribe>

List-Archive: <http://www.irtf.org/mail-archive/web/cfrg/>

List-Post: <mailto:cfrg@irtf.org>

List-Help: <mailto:cfrg-request@irtf.org?subject=help>

List-Subscribe: <http://www.irtf.org/mailman/listinfo/cfrg>,
<mailto:cfrg-request@irtf.org?subject=subscribe>

X-List-Received-Date: Fri, 15 Aug 2014 22:52:56 -0000

Here is a simple sketch how the prime affects the mop operations. Comparing multiplication of 2 n-limb numbers with the reduction mod n-limb number p: * in case of a random prime, we have n^2 multiplies. This gives 2n number, n higher limbs of which will need to be reduced. To accomplish this we multiply p by each of n high limbs, for the total n^2. We have 2n^2 multiplies * in case of p=2^k-C, we have the same n^2 multiplies, but the reduction results in only n multiplies of high limbs by C. We have n^2 + n So this is (n^2 + n) / 2n^2 ~= 1/2 (pseudo-Mersenne primes have ~50% fewer limb multiplies for multiply+reduce) This estimate underestimates the reduction that takes place for the general p. One needs an additional n divisions (of a limp by the highest limb of p) in denominator, which gives the Mike Hamburg's quantity exactly (which uses the Montgomery reduction method). I didn't count additions/subtractions. On 08/15/2014 02:28 PM, Mike Hamburg wrote: > On 8/15/2014 4:26 AM, Johannes Merkle wrote: >> Ilari Liusvaara wrote on 08.08.2014 16:15: >>> Picking prime randomly has _devastating_ impact on performance. >> I do not mean to doubt that statement, I'm just looking for hard numbers: Do you have any reference for a proper >> analysis supporting that? I mean something beyond "oh, that's common sense". > So, I don't have a proper reference, but in my experience the performance ratio is slightly more than 2, and probably creeps towards 2.5 or 3 as the number of bits increases towards 512. > > An n-digit multiply with no reduction algorithm costs n^2 muls, schoolbook. A Montgomery reduction costs n^2+n additional multiplies, for a total of 2n^2 + n. There may also be a conditional subtract, depending on the modulus. A special prime still requires some reduction, and Montgomery ends up being slightly less than twice as expensive as schoolbook. > > Supporting numbers: a few years ago I implemented 252-bit special-Montgomery primes on Sandy Bridge (mod the special prime, reduction isn't free but costs n=4 mulaccs), and a multiply took about 55 cycles, vs about 100 cycles for a 252-bit general prime. The algorithms were the same, and both implementations were written in assembly. This is what you'd have predicted from the above guesses, since (n^2 + n) / (2n^2 + n) = 20 / 36 = 55.555...%. > > Squaring costed about 10 cycles less in either case, for a flat ratio of 90:45 = 2:1. This is because Montgomery reduction does not go any faster when you square, which takes n(n+1)/2 multiplies schoolbook style, but 3x that with a Montgomery reduction. These asymptotics are not even close to true with n=4, though. > > As p increases, the ratio will get worse. The reduction mod the special prime is a smaller fraction of the total work, and squaring gets closer to asymptotic. Montgomery reduction does not improve as much with Karatsuba multiplication. Karatsuba may cut the cost of the multiplication by 25% or more at 384 or 512 bits, but it’s rather tricky to combine with a general Montgomery reduction efficiently. Furthermore, if you try to do this, the state will be larger and may spill out of the register file. For example, my 448-bit NEON code exactly uses the entire register file with no spilling; anything extra would add significant cost. > > Exactly 64n-bit aligned primes will be slightly worse when randomized, because numbers need to be reduced more often. > > Finally, powering ladders like inverse or square root are denser and therefore slower with a random prime A powering ladder for a special prime usually has a multiply density of around 5%, but for a random prime it is likely to be 20%. Inversion is about 10% of a variable-base and 25% of a fixed-base scalarmul, so this effect is unlikely to exceed 5% of the total time. > > In combination, these will push the ratio to ~2x at 256 bits, and higher for longer numbers, but probably not so high as 3x even at 500 bits (but maybe at 512 or 521 bits). Not having implemented both a large random and a large special prime together on the same platform, I’m not sure exactly what the ratio will be. > > Cheers, > -- Mike > > _______________________________________________ > Cfrg mailing list > Cfrg@irtf.org > http://www.irtf.org/mailman/listinfo/cfrg

- [Cfrg] Progress on curve recommendations for TL... Paterson, Kenny
- Re: [Cfrg] Progress on curve recommendations fo... Watson Ladd
- Re: [Cfrg] Progress on curve recommendations fo... Russ Housley
- Re: [Cfrg] Progress on curve recommendations fo... D. J. Bernstein
- Re: [Cfrg] Progress on curve recommendations fo... Dan Brown
- Re: [Cfrg] Progress on curve recommendations fo... Ilari Liusvaara
- Re: [Cfrg] Progress on curve recommendations fo... Robert Ransom
- Re: [Cfrg] Progress on curve recommendations fo... Johannes Merkle
- Re: [Cfrg] Progress on curve recommendations fo... Johannes Merkle
- Re: [Cfrg] Progress on curve recommendations fo... Alyssa Rowan
- Re: [Cfrg] Progress on curve recommendations fo... Johannes Merkle
- Re: [Cfrg] Progress on curve recommendations fo... Watson Ladd
- Re: [Cfrg] Progress on curve recommendations fo... Dan Brown
- Re: [Cfrg] Progress on curve recommendations fo... Johannes Merkle
- Re: [Cfrg] Progress on curve recommendations fo... Watson Ladd
- Re: [Cfrg] Progress on curve recommendations fo... Dan Brown
- Re: [Cfrg] Progress on curve recommendations fo... Dan Brown
- Re: [Cfrg] Progress on curve recommendations fo... Watson Ladd
- Re: [Cfrg] Progress on curve recommendations fo... Andy Lutomirski
- Re: [Cfrg] Progress on curve recommendations fo... Dan Brown
- Re: [Cfrg] Progress on curve recommendations fo... Mike Hamburg
- Re: [Cfrg] Progress on curve recommendations fo... Andrey Jivsov
- Re: [Cfrg] Progress on curve recommendations fo... Michael Hamburg
- Re: [Cfrg] Progress on curve recommendations fo... Watson Ladd
- Re: [Cfrg] Progress on curve recommendations fo... D. J. Bernstein
- Re: [Cfrg] Progress on curve recommendations fo... D. J. Bernstein
- Re: [Cfrg] Progress on curve recommendations fo... Andrey Jivsov
- Re: [Cfrg] Progress on curve recommendations fo... Michael Hamburg