Re: [Cfrg] Elliptic Curves - curve form and coordinate systems

Andrey Jivsov <crypto@brainhub.org> Mon, 16 March 2015 05:27 UTC

Return-Path: <crypto@brainhub.org>
X-Original-To: cfrg@ietfa.amsl.com
Delivered-To: cfrg@ietfa.amsl.com
Received: from localhost (ietfa.amsl.com [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 0B46E1A014C for <cfrg@ietfa.amsl.com>; Sun, 15 Mar 2015 22:27:00 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.899
X-Spam-Level:
X-Spam-Status: No, score=-1.899 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, HTML_MESSAGE=0.001, LOTS_OF_MONEY=0.001, SPF_PASS=-0.001] autolearn=ham
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id aTIdz4JhsLwX for <cfrg@ietfa.amsl.com>; Sun, 15 Mar 2015 22:26:54 -0700 (PDT)
Received: from resqmta-po-09v.sys.comcast.net (resqmta-po-09v.sys.comcast.net [IPv6:2001:558:fe16:19:96:114:154:168]) (using TLSv1.2 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 93FF81A0118 for <cfrg@irtf.org>; Sun, 15 Mar 2015 22:26:54 -0700 (PDT)
Received: from resomta-po-14v.sys.comcast.net ([96.114.154.238]) by resqmta-po-09v.sys.comcast.net with comcast id 45SW1q00458ss0Y015StQA; Mon, 16 Mar 2015 05:26:53 +0000
Received: from [192.168.1.2] ([71.202.164.227]) by resomta-po-14v.sys.comcast.net with comcast id 45Ss1q0064uhcbK015SsA8; Mon, 16 Mar 2015 05:26:53 +0000
Message-ID: <5506699C.3070006@brainhub.org>
Date: Sun, 15 Mar 2015 22:26:52 -0700
From: Andrey Jivsov <crypto@brainhub.org>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.5.0
MIME-Version: 1.0
To: cfrg@irtf.org
References: <20150316002255.28855.qmail@cr.yp.to>
In-Reply-To: <20150316002255.28855.qmail@cr.yp.to>
Content-Type: multipart/alternative; boundary="------------060106090503030607060405"
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20140121; t=1426483613; bh=LeMfSl939AEf68B6omefMWPkzU51VFT84nxv7NUtfiw=; h=Received:Received:Message-ID:Date:From:MIME-Version:To:Subject: Content-Type; b=wZfaRh21zdnWFXLo0do+FsqcijkrO8bXa0Lknxjo1YyVRjVV84kSkArXxcw6dl3E0 L26ta0+Ra+sLD5gjhSohzbUVMV1Y9D08g7qdDDqVMtEBHigBjTXyDJ30sGv2BGnUWH J/OfSd/KdyNGUmHD8I6Q9rY+r9rb7R2Deu6pKIj4vkUyfLy7icpcwLExz2xmxcNwV/ XL+bTBKqGEBBuN/gN48yhbWFCgcGps/bHQQvp9l9vKxkPTaRJ8CjdZNgZLZgS+w88v BGo2lrhUiHThj1uX6Om9LmkkmpqfiievMcxx7n4RScpD2CYiohipLo3gGIW6Y2zeMI l/ADB8yt1C47A==
Archived-At: <http://mailarchive.ietf.org/arch/msg/cfrg/Z21C6wXgU7NSVgLGrAmB1K5Moaw>
Subject: Re: [Cfrg] Elliptic Curves - curve form and coordinate systems
X-BeenThere: cfrg@irtf.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: Crypto Forum Research Group <cfrg.irtf.org>
List-Unsubscribe: <http://www.irtf.org/mailman/options/cfrg>, <mailto:cfrg-request@irtf.org?subject=unsubscribe>
List-Archive: <http://www.irtf.org/mail-archive/web/cfrg/>
List-Post: <mailto:cfrg@irtf.org>
List-Help: <mailto:cfrg-request@irtf.org?subject=help>
List-Subscribe: <http://www.irtf.org/mailman/listinfo/cfrg>, <mailto:cfrg-request@irtf.org?subject=subscribe>
X-List-Received-Date: Mon, 16 Mar 2015 05:27:00 -0000

On 03/15/2015 05:22 PM, D. J. Bernstein wrote:
> One of the OpenSSL security fixes this year was correcting a bug in
> point validation. The bug is estimated to allow more than 2^192 bogus
> (x,y) pairs for NIST P-256, and presumably a few of those have order
> below 2^66. Finding these medium-order points is an open problem for the
> moment, but anyone who _can_ find them can use an active invalid-point
> attack to break Microsoft's two-hour ECDH keys with <2^70 computations,
> and thus to decrypt all other connections made during that two-hour
> period. (The computations don't have to finish within the two hours.
> Maybe this work can even be shared across multiple ECDH keys; I'd have
> to look at other protocol details to figure this out.)
>
> How many years did it take for this bug to be found in a very widely
> used cryptographic library? Who's reviewing all the other NIST P-256
> implementations to see whether they even _try_ to validate points, never
> mind whether they get it right?

Dan, are you referring to
https://www.openssl.org/news/secadv_20150108.txt,

CVE-2014-3570: "1/2^64 on the single affected 32-bit platform (MIPS) and
1/2^128 on affected 64-bit platforms", "No exploits are known and
straightforward bug attacks fail"?

I read this as saying that the bug doesn't apply to 32-bit x86 and that,
MIPS aside, the probability is 1/2^128 on 64-bit platforms, with no
known exploit (because the bug was at a very low level).

>
> With the Montgomery ladder, the curve is reused in every step, and a bug
> that corrupts computations will simply move to different points on the
> same curve or its twist. Next-generation curves such as Curve25519
> guarantee that _none_ of those points have dangerous orders. The fault
> attack in
>
>     https://www.di.ens.fr/~fouque/pub/fdtc08.pdf
>
> worked only because it targeted old-fashioned curves that weren't
> twist-secure, such as NIST P-224. With proper choices (twist-secure
> curve, and scalar bits set accordingly) Montgomery-ladder ECDH doesn't
> need _any_ input validation.
>
> In particular, input 0 is handled correctly without any special tests,
> contrary to Rene's comments and earlier comments from Microsoft. See
> https://www.ietf.org/mail-archive/web/cfrg/current/msg05004.html.
>
> Here's another interesting data point regarding code simplicity. In
> flipping through Microsoft's "constant-time" ECCLib_v1.1, I noticed
>
>     temp = (scalar[0] & val1) - val2; // ki = (k mod 2^w)/2^(w-1)
>     *digits = (int)temp;
>     digits++;
>     res = scalar[0] - temp;           // k = (k-ki)/2^(w-1)
>     borrow = (temp > 0) & (scalar[0] < (unsigned int)temp);
>
> inside the fixed_window_recode() function. The scalar is, apparently,
> supplied by an attacker, and I would expect the compiler in most cases
> to turn the comparisons into variable-time branches, leaking scalar bits
> through timing---clearly violating the "constant-time" promise. With the
> Montgomery ladder, the scalar bits are used in a much simpler way,
> minimizing the number of opportunities to screw up the implementation.
>
> Andrey Jivsov writes:
>> A sample of the interest in unified format can be found here:
> You misunderstand. There are two different types of ECDH keygen code
> available for X25519:
>
>     * _simple_ key generators that use the Montgomery ladder;
>     * _fast_ key generators that use precomputed Edwards points and
>       convert the output to Montgomery format.
>
> Both of these approaches produce identical Montgomery x-coordinates on
> the wire. The fact that the fast implementations use Edwards internally
> doesn't mean that all implementations want to do this (for most people
> simplicity is more important than this extra burst of keygen speed) and
> certainly doesn't mean that it's a good idea to use anything other than
> Montgomery x on the wire for ECDH.

I like the _fast_ choice above. Most people will use libraries, and I 
think it makes sense for a library to implement the _fast_ method for 
keygen. OpenSSL has already made this distinction for ECC: its P-256 
ECDSA is faster than its variable-base ECDH. When we get to using 
signatures with Curve25519, fast keygen benefits the speed of 
signatures on the server (e.g. a TLS server). Signers cannot reuse 
ephemeral keys and thus have the performance incentive to justify more 
complexity, following the OpenSSL model; likewise, signature 
verification needs point additions and can't use the Montgomery ladder. 
When a library has _fast_ keygen for signatures, it's logical to also 
use this code for ECDHE key generation. Fast ECDHE key generation 
removes the need to reuse the ephemeral key pair, which gives higher 
security and the simplicity of not caching. Command-line applications, 
sandboxed code like JavaScript, etc. typically cannot or should not 
cache ephemeral keys, so _fast_ keygen benefits e.g. 'openpgp-cmd 
--encrypt recipient -o file_out.pgp file_in.txt'.

In my proposal v, the second coordinate, is sent only in the first 
flight (or as part of the public key in general). That's where we can 
expect the Montgomery ladder to be less common. The second flight, the 
shared-secret calculation, doesn't use v, so there is no change to the 
Montgomery ladder.
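As a minimal sketch of this split (the function names and the 64-byte
little-endian layout here are purely illustrative, not a wire spec):
the first flight carries both coordinates, while the ECDH response
carries only the Montgomery u.

```python
# Illustrative framing only -- not from any spec or implementation.
# First flight / public key: (u, v); second flight: u alone.

def encode_public_key(u: int, v: int) -> bytes:
    # 64 bytes: both coordinates, each 32-byte little-endian
    return u.to_bytes(32, "little") + v.to_bytes(32, "little")

def encode_dh_response(u: int) -> bytes:
    # 32 bytes: Montgomery u only, as in plain X25519
    return u.to_bytes(32, "little")

def decode_public_key(blob: bytes):
    u = int.from_bytes(blob[:32], "little")
    v = int.from_bytes(blob[32:64], "little")  # a ladder-only receiver may ignore v
    return u, v
```

A Montgomery-ladder-only receiver would simply drop v after decoding;
an add-capable receiver gets the full point without decompressing.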

>
>    [ send an extra coordinate v along with the Montgomery x ]
>> An implementation that uses a Montgomery ladder simply ignores v
> Well, yes, this is what any sensible ECDH receiver will do, so why waste
> the network traffic? Meanwhile you're forcing the ECDH key generator to
> compute v in the first place---a significant increase in the complexity
> of the simplest (Montgomery ladder) keygen software.

The main rationale is that this allows more implementation choices. 
Past research on ECC optimizations has counted on the ability to add 
points. If the recipient must decompress, that acts as a roughly 10% 
toll on present and future improvements.


For what it's worth, this weekend I added the calculation of v at the 
end of the Montgomery ladder to the curve25519-donna tree. The diff is: 
https://github.com/brainhub/curve25519-donna/commit/abc601836b75ba6399c775842647e0f3b66061c4

The diff includes point validation, which is presently not invoked (or 
needed). v is not serialized, but otherwise the pair of coordinates 
(u,v) is properly computed. The sign of u is not calculated here (let's 
say this is an ECDH case). Most of this work is copying and modifying 
the existing code, based on "High-speed high-security signatures", 
sec. 5. I had to solve the chicken-and-egg problem of calculating 
u = x/z and v = SQRT(u^3 + 486662*u^2 + u) in one pass (and there may 
be a faster way to do this?).
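For reference, the v-recovery itself can be sketched in a few lines of
plain Python (this is not the donna C code, just the math): since
p = 2^255 - 19 satisfies p ≡ 5 (mod 8), a candidate square root of w is
w^((p+3)/8), corrected by sqrt(-1) = 2^((p-1)/4) when needed.

```python
# Recover v from u on Curve25519: v^2 = u^3 + 486662*u^2 + u over GF(p).
p = 2**255 - 19
A = 486662

def recover_v(u: int):
    w = (u*u*u + A*u*u + u) % p        # right-hand side, i.e. v^2
    v = pow(w, (p + 3) // 8, p)        # candidate root (p ≡ 5 mod 8)
    if (v * v) % p != w:
        # multiply by sqrt(-1); 2 is a non-residue mod p, so
        # 2^((p-1)/4) squares to -1
        v = (v * pow(2, (p - 1) // 4, p)) % p
    if (v * v) % p != w:
        return None                    # w is a non-residue: u is on the twist
    return v
```

This returns one of the two roots ±v; which sign to pick is a separate
convention (RFC 7748 gives the base point as u = 9 with a specific v).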

The performance penalty to calculate v is 3.0% (e.g. 14050 vs. 13660 
op/sec on a 3.30 GHz Ivy Bridge x86_64).

>
>> implement both signatures and ECDH
> Switching from a "unified" format to Montgomery x for ECDH usually makes
> this type of software _simpler_. The underlying issue is that there are
> actually three different scalar-multiplication operations of interest:
>
>     * double-scalar multiplication (signature verification);
>     * fixed-base-point single-scalar multiplication (signing/keygen); and
>     * variable-base-point single-scalar multiplication (ECDH).
>
> For example, in Microsoft's library these are ECC_DBLMUL_TE,
> ECC_MUL_FIXED_TE, and ECC_MUL_TE. Ripping all the ECDH-specific
> functions out of the library (ECC_MUL_TE, fixed_window_recode, etc.),
> and replacing them with a Montgomery ladder, would _simplify_ the code.
>
> Furthermore, even if you can find a signature+ECDH library where a
> "unified" format _reduces_ complexity rather than _increasing_ it, the
> percentage saved will obviously be quite small---whereas sticking to
> Montgomery x for ECDH saves a _much_ larger percentage of a pure ECDH
> implementation.

If a particular application must optimize code and data size, it should 
use the Montgomery ladder. There are many applications for which code 
size of the public key crypto is at the bottom of the priority list.

>
>> a single set of test vectors, code reuse, and the benefits of sharing
>> pre-computed values
> Every function needs its own test vectors in any case. Pre-computed
> values are shared in any case. As for code reuse, see above.
>
>> no (measurable) performance penalty
> The extra network traffic is a measurable performance penalty, and is
> more important than the CPU time saved for decompression inside
> signature verification. See below.
>
> Michael Hamburg writes:
>    [ compression ]
>> saves time and energy on radio links
> Yes, obviously.
>
>> but costs time and energy on wired ones
> No.
>
> Time: Curve25519 decompression takes roughly 5 microseconds on a typical
> server CPU core, but this is less than the time that the core needs to
> receive 256 bits from the Internet---unless you've bought a single
> quad-core server to handle more than 200Mbps of Internet traffic; this
> doesn't make economic sense.

Each 32-byte value arrives every 1/((200*10^6/8)/32) = 0.00000128 
seconds (every 1.28 microseconds at 200 Mbps). While we seem OK 
dedicating the *whole* quad-core CPU, the CPU is a precious resource on 
a busy server (so in reality we are not OK in this scenario). Why pay a 
10% penalty?
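Spelling out the arrival-rate arithmetic above (pure arithmetic, no
protocol assumptions):

```python
# Time for one 32-byte (256-bit) coordinate to arrive at 200 Mbps.
link_bps = 200 * 10**6          # 200 Mbps line rate
key_bits = 32 * 8               # one 32-byte field element
seconds_per_key = key_bits / link_bps
# 256 / (2 * 10^8) = 1.28e-06 s, i.e. one key every 1.28 microseconds
```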

If we look at https://tools.ietf.org/html/rfc5246#section-6.2.1, the 
world is wasting 2 bytes per each 2^14 bytes of traffic in the 
best-case scenario. There is also some waste in the 4K handshake and in 
block-cipher padding that could be eliminated.


>
> Energy: 10 USD of electricity will pay for about 10 watt-years of this
> 125-watt quad-core CPU, i.e., a month of CPU time, i.e., 2000000000000
> decompressions, saving 64 terabytes of Internet traffic---a month of
> continuous traffic at 200Mbps, as above. Some ISPs do sell noticeable
> fractions of 200Mbps for 10 USD/month, but that's because they know that
> most users won't use the network anywhere near continuously.
>
>> or when the point is cached
> No. You pay _once_ to decompress a point the first time you use it, and
> after that you cache the decompressed version. (I'm assuming that it's
> cheaper for you to store the extra bytes than to recompute them; if not,
> you should cache the compressed version.) Local caching doesn't affect
> the performance impact of compression on the wire.

This means that the crypto no longer works as a black box; it takes on 
a dependency on caches.

Applying this to TLS, I came up with the following "decompression 
cache" scenario. A Web browser caches the public signing key of the 
leaf CA that issued the server cert, with the idea that this key will 
often be used to sign multiple Web server certs. This doesn't cover 
other cases like non-Montgomery ECDH or CRLs, and CA validation will 
cache the whole cert status anyway. Our "decompression cache" doesn't 
avoid the first hit. It would be interesting to see whether such 
caching is beneficial in practice; bloated caches may cause a slowdown.

Also, please note that helping a receiver avoid a 10% performance 
penalty lowers the latency.

Thank you.