Re: [Cfrg] New names for draft-ladd-safecurves

Mike Hamburg <> Tue, 21 January 2014 08:29 UTC

Return-Path: <>
Received: from localhost ( []) by (Postfix) with ESMTP id 540621A0087 for <>; Tue, 21 Jan 2014 00:29:13 -0800 (PST)
X-Virus-Scanned: amavisd-new at
X-Spam-Flag: NO
X-Spam-Score: 4.256
X-Spam-Level: ****
X-Spam-Status: No, score=4.256 tagged_above=-999 required=5 tests=[BAYES_50=0.8, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FH_HOST_EQ_D_D_D_D=0.765, FH_HOST_EQ_D_D_D_DB=0.888, HELO_MISMATCH_ORG=0.611, HOST_MISMATCH_NET=0.311, RDNS_DYNAMIC=0.982, SPF_PASS=-0.001] autolearn=no
Received: from ([]) by localhost ( []) (amavisd-new, port 10024) with ESMTP id tfn-1wfD8_U2 for <>; Tue, 21 Jan 2014 00:29:11 -0800 (PST)
Received: from ( []) by (Postfix) with ESMTP id 00B791A029B for <>; Tue, 21 Jan 2014 00:29:10 -0800 (PST)
Received: from [] (unknown []) by (Postfix) with ESMTPSA id ED25E3AA04; Tue, 21 Jan 2014 00:27:03 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple;; s=sldo; t=1390292824; bh=K8DuXg35+Nq962UhIK+hDbFFdmryF1cIeMLrsInLA/4=; h=Subject:From:In-Reply-To:Date:Cc:References:To:From; b=HG86/eRAkkxOcrbzfmmnZXwd4K095CRowmb53aPhz3lgG4za+Exc5TIm7boZpJIW2 Rf2zAWwaO/Vi/9X1zCsolepIHq59QfWjWbgo7FuI1uRYULrdx7VxYMUXCsdSziDbMO IzNmIlJ5lwQLTz8qt1+8kquJDmEdrn+XTBGlQXrw=
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 7.1 \(1827\))
From: Mike Hamburg <>
In-Reply-To: <>
Date: Tue, 21 Jan 2014 00:29:08 -0800
Content-Transfer-Encoding: quoted-printable
Message-Id: <>
References: <> <> <> <> <> <> <> <> <>
To: Robert Ransom <>
X-Mailer: Apple Mail (2.1827)
Subject: Re: [Cfrg] New names for draft-ladd-safecurves
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: Crypto Forum Research Group <>
List-Unsubscribe: <>, <>
List-Archive: <>
List-Post: <>
List-Help: <>
List-Subscribe: <>, <>
X-List-Received-Date: Tue, 21 Jan 2014 08:29:13 -0000

On Jan 20, 2014, at 10:44 PM, Robert Ransom <> wrote:

> On 1/20/14, Mike Hamburg <> wrote:
>> On Jan 20, 2014, at 9:06 PM, Robert Ransom <> wrote:
>>> I would suggest using the Montgomery-form x coordinate with the sign
>>> bit of the Edwards-form x coordinate.  (In fact, I *did* suggest that:
>>> <>
>>> <>)
>> Amusingly, if you say that "sign" is "Jacobi symbol" (when p==3 mod 4), the
>> two are the same for q-torsion points.  But either way, the difference is a
>> couple of field muls vs a slightly messier spec.
> The least significant bit of the least positive residue is easier to
> compute during point compression, and is well-defined regardless of
> the field order, so I would use that instead.

Yeah, I love the Jacobi thing because it's a cool trick, but I still think the low bit is probably better for these reasons.

>>> And yes, the Brier-Joye formulas to recover Montgomery-form y after
>>> the Montgomery ladder would be faster than Brauer's algorithm on
>>> Edwards-form points for variable-base single-scalar multiplication.
>> Is this still true for large curves?  I don't think it's true
>> asymptotically, and if people switch from inverse-by-exp to
>> inverse-by-blind-and-EGCD, then you lose your free square root and the
>> tradeoff might happen before 521 bits.  You might get your freebie back
>> again if you spec that sign is Jacobi symbol and use a blind EGCD
>> inverse-and-Jacobi-symbol, but again this only works for p==3 mod 4.
> The Montgomery-form ladder takes 5M+4S+1*param+... per scalar bit.
> Edwards-form Brauer (assuming a=-1) takes 3M+4S+... per doubling, and
> 1M (conversion to extended coordinates) + 7M+... (madd-2008-hwcd-4;
> assuming the table is affine) per scalar limb, so it could (just
> barely, counting multiplications only) appear to break even using
> 4-bit limbs.  But the table precomputation for that includes multiple
> point additions and an inversion, and each 4-bit scalar limb requires
> a constant-time lookup from a 7-element table, so even asymptotically,
> you're eaten by overhead.

For practical purposes, you're right: it's probably at most a wash even at longer bit lengths.

I actually measured it for a 252-bit prime, 250-bit curve by microbenching some old code.  I measured a constant-time Edwards scalar multiply (no conversion to/from affine) vs a constant-time Montgomery scalar multiply (start in affine, but no conversion back to affine at the end). The result was pretty close, within the margin of error of the benchmark on Sandy Bridge.

The code is using "extensible coordinates" for the accumulating point (a lazy man's version of Hisil et al's extended with lookahead, which would give the same result) and gives you that 8M/add without putting the table in affine (it's in "projective half-Niels coordinates", i.e. 4-coordinate readd form).

Otherwise your math is fine.  You're trading off a constant-time lookup in a big table every k bits, and some precomputation (slightly cheaper than the lookups) in the Edwards version vs a conditional swap every bit, a multiplication by a small number and a couple of extra additions for the Montgomery ladder.

After about 300 bits, extrapolation suggests that a 5-bit window should be faster than a 4-bit one, if only marginally.  All in all, by 448 bits (extrapolating to Ed448-Goldilocks timing measurements) the Edwards implementation is maybe half a multiply per bit ahead of the Montgomery one, before compression and decompression are taken into account.

But then there's compression and decompression.  I'm pretty sure a Montgomery implementation can always use the inverse square root trick at worst, whereas Edwards always has to take a square root and a full inversion separately (to decompress and to compress, respectively).  I don't have a timing for blind-and-EGCD on me.  Maybe it's half a mul per bit, and then the whole thing is a wash.  Maybe it's 0.3, and that giant pile of complexity and memory thrashing bought you a 2% win.  Woo.

So Montgomery is the better choice on Sandy Bridge out to 448 bits and probably to 521 as well.

-- Mike