### Re: [Cfrg] Curve selection revisited

Andrey Jivsov <crypto@brainhub.org> Wed, 30 July 2014 07:24 UTC

Return-Path: <crypto@brainhub.org>

X-Original-To: cfrg@ietfa.amsl.com

Delivered-To: cfrg@ietfa.amsl.com

Received: from localhost (ietfa.amsl.com [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 04F151B2953 for <cfrg@ietfa.amsl.com>; Wed, 30 Jul 2014 00:24:40 -0700 (PDT)

X-Virus-Scanned: amavisd-new at amsl.com

X-Spam-Flag: NO

X-Spam-Score: -1.9

X-Spam-Level:

X-Spam-Status: No, score=-1.9 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, HTML_MESSAGE=0.001, SPF_PASS=-0.001] autolearn=ham

Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id o2yIM-XJurFS for <cfrg@ietfa.amsl.com>; Wed, 30 Jul 2014 00:24:35 -0700 (PDT)

Received: from qmta06.emeryville.ca.mail.comcast.net (qmta06.emeryville.ca.mail.comcast.net [IPv6:2001:558:fe2d:43:76:96:30:56]) by ietfa.amsl.com (Postfix) with ESMTP id 7196A1B2954 for <cfrg@irtf.org>; Wed, 30 Jul 2014 00:24:34 -0700 (PDT)

Received: from omta01.emeryville.ca.mail.comcast.net ([76.96.30.11]) by qmta06.emeryville.ca.mail.comcast.net with comcast id YXMb1o0040EPchoA6XQZN5; Wed, 30 Jul 2014 07:24:33 +0000

Received: from [192.168.1.2] ([71.202.164.227]) by omta01.emeryville.ca.mail.comcast.net with comcast id YXQY1o00R4uhcbK8MXQZeJ; Wed, 30 Jul 2014 07:24:33 +0000

Message-ID: <53D89DB0.9020505@brainhub.org>

Date: Wed, 30 Jul 2014 00:24:32 -0700

From: Andrey Jivsov <crypto@brainhub.org>

User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0

MIME-Version: 1.0

To: Robert Ransom <rransom.8774@gmail.com>

References: <CA+Vbu7xroa68=HOZtbf=oz7kK2EeUv_z1okpnjxHPR0ZtHD5cA@mail.gmail.com> <CFF7E184.28E9F%kenny.paterson@rhul.ac.uk> <53D2781B.8030605@sbcglobal.net> <CACsn0ckqFigWoH2+OOEHSd2VWPp8y6=m8H5OsFRyjXmjK7+m4w@mail.gmail.com> <CABqy+srxMNuG0AaQd0SaegHvZWgbW762EQq+iAHL_fbu6sOJJQ@mail.gmail.com> <53D420B3.10707@brainhub.org> <CABqy+so6JcL3drjXuiQfLhm-LPMOJuS9ES5Hyb1UQRhi-gV2jA@mail.gmail.com>

In-Reply-To: <CABqy+so6JcL3drjXuiQfLhm-LPMOJuS9ES5Hyb1UQRhi-gV2jA@mail.gmail.com>

Content-Type: multipart/alternative; boundary="------------010301070108070304020403"

DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20140121; t=1406705073; bh=tFJASxSTQhPBb5J2PQPtMkwa2IHcf7JLAdw9YKtJQJU=; h=Received:Received:Message-ID:Date:From:MIME-Version:To:Subject: Content-Type; b=P6wlcy/eu2xbPJ7UiJvsua6SSnKOoF/eQBw+m+zYpRxIR2Cmk952FPJiECZWIOJal OvClhQnfgf7K9qpiUYrhTjT/KYHf1FCizE2rRYQ46bIfqiQR5mfMDpsqWmYL1S4n8J a6YKD4/sNYJUbN2F+zWCUMYDX95L+chk932vUOxPZMTpQfiMqepN+d1BBAxKMiz2HI UqSyyKD/B+XO4U4vzyHRBWh7GvJAVWeOfs/qxrVExeuEq/eHumDCVl6a6i1or1KMj8 7Nb+oMCQ4YSsiLIhwosJK3PRJFMgDJEIEFRUYzAE80eyXhGibGyCO4XLpqpKakklbs hJsiKoYYkGIqg==

Archived-At: http://mailarchive.ietf.org/arch/msg/cfrg/-yzHPyRwz7zKlHRJ63ZZ6MJwczM

Cc: cfrg@irtf.org

Subject: Re: [Cfrg] Curve selection revisited

X-BeenThere: cfrg@irtf.org

X-Mailman-Version: 2.1.15

Precedence: list

List-Id: Crypto Forum Research Group <cfrg.irtf.org>

List-Unsubscribe: <http://www.irtf.org/mailman/options/cfrg>, <mailto:cfrg-request@irtf.org?subject=unsubscribe>

List-Archive: <http://www.irtf.org/mail-archive/web/cfrg/>

List-Post: <mailto:cfrg@irtf.org>

List-Help: <mailto:cfrg-request@irtf.org?subject=help>

List-Subscribe: <http://www.irtf.org/mailman/listinfo/cfrg>, <mailto:cfrg-request@irtf.org?subject=subscribe>

X-List-Received-Date: Wed, 30 Jul 2014 07:24:40 -0000

On 07/29/2014 04:14 AM, Robert Ransom wrote: > On 7/26/14, Andrey Jivsov <crypto@brainhub.org> wrote: >> On 07/25/2014 07:25 PM, Robert Ransom wrote: >>> The exact value of c determines whether a portable or vectorized >>> implementation can delay carries until after reduction. For example, >>> after curve25519-donna performs a multiply or squaring, each of 19 >>> 64-bit limbs has absolute value less than 14*2^54. When >>> freduce_degree reduces a field element to 10 limbs without carrying, >>> the result has absolute value less than (c+1)*14*2^54 = 280*2^54, or a >>> bit more than 2^62. In the Microsoft field 2^255 - 765, or even 2^256 >>> - 189, this would not be possible. >>> >>> Carries between limbs can be performed more efficiently than in >>> curve25519-donna -- curve25519-donna goes to extra effort to keep >>> limbs signed. In the Montgomery ladder, this optimization does not >>> add any extra carry steps, because the Montgomery-ladder formulas >>> never perform more than two add/subtract operations between a >>> multiply/square operation. With Edwards-curve operations, this >>> optimization may require an additional carry step in the doubling >>> formula. >>> >>> The exact value of s determines whether carries can be vectorized >>> within a coordinate-field element. For example, 2^251 - 9 can be >>> represented using one 26-bit limb (in the least significant position) >>> and 9 25-bit limbs, but this interferes with the vectorization >>> strategy described in the neoncrypto paper (by Dr. Bernstein et al.). >>> Since c=9 is so small, this can be worked around by representing >>> coordinates modulo 2^252 - 18 instead, but this trick is rarely >>> possible without running into the bound on c. >> The main argument for p = 2^255-19 v.s. p = 1^256-189 is that the C=19 >> is smaller than C=189 and this helps to "manage" the carries. > Actually, the main argument for p = 2^255 - 19 is that Curve25519 is > the de-facto standard curve for high-performance ECC in a group of > near-256-bit order. Part of the reason for that is that implementers > have, in practice, been able to rely on the small value of what you > denote as “C” for that field in optimizing their software, so no one > has felt a need to switch to a different field of that form for > performance reasons. Yes, I accept it. > > (Mike Hamburg introduced ‘special Montgomery primes’ and showed > evidence that they can provide better performance in some > implementations than the Curve25519 coordinate field in > <https://eprint.iacr.org/2012/309.pdf>, but that's a different form of > prime.) > > >> Let's assume that an F(p) integer is represented as n limbs (words). >> >> 256 bit Montgomery curve will have on the order of (5 mul + 4 squares) >> and about the same number F(p) additions per one scalar bit. >> >> Therefore, we can talk about one unreduced F(p) multiplication + F(p) >> reduction (where C plays its role) + F(p) addition. With this in mind >> the contribution of C can be estimated within this group. >> >> F(p) multiplications (counting squares here as well) dominate the >> performance of EC operations: >> * the result of the multiplication is 2*n limbs and the complexity of >> this operation is n^2 limb multiplications and O(n^2) additions. >> * the 2*n limbs must be reduced (mod p) to n limbs immediately after >> multiplication >> * there is on average ~1 addition/subtraction of n limbs per each >> multiplication due to EC formula >> >> Each multiplication needs degree reduction and settling of accumulated >> carries before the next multiplication. One can only postpone settling >> of carries while he does additions (prior to the next multiplciation). >> >> We have about n^2 limb additions, plus n limb additions due to EC >> formular, plus we must perform n additions of the form <some_limb>*C >> (for modp reduction). >> >> Assuming n=10 limbs, the entire contribution of n <some_limb>*C is bound >> by ~10% of all the additions. >> >> We are not talking about C=0, however, but rather about |C|=4 bits v.s. >> |C| = 8 bits difference. >> >> A rough estimate here is 5% penalty. > As Dr. Bernstein has pointed out, a larger value of C requires 3*n > carries instead of n carries (see > <https://moderncrypto.org/mail-archive/curves/2014/000237.html>), and > carries have high latency (see section 4, subsection ‘Reduction’ of > naclcrypto). > > >> One reason why I think this is a good estimate is that there may be a >> benefit of more fine-tuned control of carries. The highest accumulation >> of the carries happens in the inner limbs of the 2*n integer (imagine >> paper-and-pencil multiplication of two n-limb numbers: the stack is the >> highest in the middle). Therefore, if we have the need to control >> overflow within, say, multiplication, perhaps it's possible to >> accomplish this within a subset of n limbs. > So you expect greater implementation effort and complexity to reduce > the added cost, but not eliminate it? I think it's fair to say that lower C will be faster if you only consider F(p) ( multiplies and squares; the additions/subtractions don't matter). > > >> ( BTW, I don't see a clear argument why can't any F(p) be vectorized >> (some more efficiently than others). The vectorization using x86 SSE2 is >> known to speeds up RSA operations for generic F(p)s. ) > As I said, the vectorization strategy used in the neoncrypto paper is > incompatible with 251-bit moduli. That strategy requires pairing > together limbs of the same size, so that a vectorized shift operation > can be used to carry out of both limbs at the same time. > > In RSA, where (in your notation) C is necessarily huge, there is no > performance penalty from making C a few bits larger, so it's easy to > lengthen the modulus and make every limb the same size, so it's easy > to vectorize carries across any pair of limbs. But that's not very > relevant to the primes used for fast ECC. That's correct. For RSA with SSE you do similar things you do here, but the reduction is generic and slow. > > > Do you see any benefit of p = 2^256 - 189 over p = 2^255 - 19 that you > believe offsets the costs of switching to a different coordinate > field? They are : * aesthetics * it happened that square root of p = 2^255 - 19 requires multiplication by a sqrt(-1) in half the cases when decompression is involved. Both are tolerable from aesthetics point of view IMO if we (effectively) standardize on p = 2^255 - 19. p = 2^255 - 19's slightly higher performance than 2^256 - 189 should balance *sqrt(-1) out. ( Compare this with 2^414-17, which won't be trivial to match with other algorithms. )

- [Cfrg] Curve selection revisited Benjamin Black
- Re: [Cfrg] Curve selection revisited Yoav Nir
- Re: [Cfrg] Curve selection revisited Paterson, Kenny
- Re: [Cfrg] Curve selection revisited David Jacobson
- Re: [Cfrg] Curve selection revisited Watson Ladd
- Re: [Cfrg] Curve selection revisited Robert Ransom
- Re: [Cfrg] Curve selection revisited Watson Ladd
- Re: [Cfrg] Curve selection revisited Robert Ransom
- Re: [Cfrg] Curve selection revisited Robert Ransom
- Re: [Cfrg] Curve selection revisited Watson Ladd
- Re: [Cfrg] Curve selection revisited Andrey Jivsov
- Re: [Cfrg] Curve selection revisited Ilari Liusvaara
- Re: [Cfrg] Curve selection revisited Robert Ransom
- Re: [Cfrg] Curve selection revisited Robert Moskowitz
- Re: [Cfrg] Curve selection revisited Michael Hamburg
- Re: [Cfrg] Curve selection revisited Michael Jenkins
- Re: [Cfrg] Curve selection revisited Michael Hamburg
- Re: [Cfrg] Curve selection revisited Hannes Tschofenig
- Re: [Cfrg] Curve selection revisited Hannes Tschofenig
- Re: [Cfrg] Curve selection revisited Robert Moskowitz
- Re: [Cfrg] Curve selection revisited Stephen Farrell
- Re: [Cfrg] Curve selection revisited Michael Hamburg
- Re: [Cfrg] Curve selection revisited Robert Moskowitz
- Re: [Cfrg] Curve selection revisited Robert Moskowitz
- Re: [Cfrg] Curve selection revisited Robert Moskowitz
- Re: [Cfrg] Curve selection revisited Robert Moskowitz
- Re: [Cfrg] Curve selection revisited Michael Hamburg
- Re: [Cfrg] Curve selection revisited Robert Moskowitz
- Re: [Cfrg] Curve selection revisited Paul Lambert
- Re: [Cfrg] Curve selection revisited Paul Lambert
- Re: [Cfrg] Curve selection revisited Mike Hamburg
- Re: [Cfrg] Curve selection revisited Robert Ransom
- Re: [Cfrg] Curve selection revisited Andrey Jivsov
- Re: [Cfrg] Curve selection revisited Robert Ransom
- Re: [Cfrg] Curve selection revisited Robert Ransom
- Re: [Cfrg] Curve selection revisited Phillip Hallam-Baker
- Re: [Cfrg] Curve selection revisited Robert Moskowitz
- Re: [Cfrg] Curve selection revisited Russ Housley
- Re: [Cfrg] Curve selection revisited Salz, Rich
- Re: [Cfrg] Curve selection revisited Phillip Hallam-Baker
- Re: [Cfrg] Curve selection revisited Salz, Rich