Re: [Cfrg] Hardware requirements for elliptic curves

Torsten Schuetze <torsten.schuetze@gmx.net> Thu, 11 September 2014 08:00 UTC

Return-Path: <torsten.schuetze@gmx.net>
X-Original-To: cfrg@ietfa.amsl.com
Delivered-To: cfrg@ietfa.amsl.com
Received: from localhost (ietfa.amsl.com [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 67E371A0667 for <cfrg@ietfa.amsl.com>; Thu, 11 Sep 2014 01:00:47 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -0.852
X-Spam-Level:
X-Spam-Status: No, score=-0.852 tagged_above=-999 required=5 tests=[BAYES_50=0.8, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_NONE=-0.0001, RP_MATCHES_RCVD=-1.652, SPF_PASS=-0.001] autolearn=ham
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id I-DTpKGGdTtt for <cfrg@ietfa.amsl.com>; Thu, 11 Sep 2014 01:00:45 -0700 (PDT)
Received: from mout.gmx.net (mout.gmx.net [212.227.15.18]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id A26C71A064B for <cfrg@irtf.org>; Thu, 11 Sep 2014 01:00:44 -0700 (PDT)
Received: from [192.168.0.12] ([109.192.91.63]) by mail.gmx.com (mrgmx002) with ESMTPSA (Nemesis) id 0Lvl0u-1YQZe62FO0-017UO6; Thu, 11 Sep 2014 10:00:14 +0200
Message-ID: <5411569F.6080102@gmx.net>
Date: Thu, 11 Sep 2014 10:00:31 +0200
From: Torsten Schuetze <torsten.schuetze@gmx.net>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.1.1
MIME-Version: 1.0
To: cfrg@irtf.org
In-Reply-To: <12A4E7B4-8303-449F-A04B-8366BBC5B1E3@shiftleft.org>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Provags-ID: V03:K0:3AycQYS0IOck31j2IiplGutQxrLw61yJcpchChJXSNWznbeNB+i 5WsTd6ZJ0tt+qoPblFtGFRQaK72NoBeriYwLhxSUzSu4mIx27FGjxQofGtrvnoxNDMZ4Kr6 A0mjcJtm9peQZ1lF5Fe7yOCZXKRlqtxqJVuzMEeJ7TBQiyET7qT0pJLuCEN+aaJlzpztKiU OLdw4wC36AVGgFISdltIg==
X-UI-Out-Filterresults: notjunk:1;
Archived-At: http://mailarchive.ietf.org/arch/msg/cfrg/awgbdV_SOXGRH9C2IYlWhk4-thM
Subject: Re: [Cfrg] Hardware requirements for elliptic curves
X-BeenThere: cfrg@irtf.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: Crypto Forum Research Group <cfrg.irtf.org>
List-Unsubscribe: <http://www.irtf.org/mailman/options/cfrg>, <mailto:cfrg-request@irtf.org?subject=unsubscribe>
List-Archive: <http://www.irtf.org/mail-archive/web/cfrg/>
List-Post: <mailto:cfrg@irtf.org>
List-Help: <mailto:cfrg-request@irtf.org?subject=help>
List-Subscribe: <http://www.irtf.org/mailman/listinfo/cfrg>, <mailto:cfrg-request@irtf.org?subject=subscribe>
X-List-Received-Date: Thu, 11 Sep 2014 08:00:47 -0000

On Sep 7, 2014, at 22:25:57 -0700 Paul Lambert <paul at marvell.com>
wrote:

> On 9/2/14, 9:31 AM, "Michael Hamburg" <mike at shiftleft.org> wrote:
>
>> I agree with Alyssa that hardware performance isn't our concern here.
>>
>> However, I'm curious about Joppe Bos' assertion that random primes are
>> just as fast and more secure in hardware.  What kind of hardware are you
>> thinking about?  Some kind of residue number system?  Or a bit-at-a-time
>> reduction system?
>
> I work at a chip company checking on design considerations:
> - random primes and special run at the same speed
> - flexibility and die size are more important than prime specific
> optimization
>
> Paul

I fully agree with Paul's statement on flexibility and random primes.
I work at Rohde & Schwarz SIT in the high-security/high-assurance
crypto business. We use ASICs, FPGAs and smart cards and even some
software crypto implementations in our products. Our requirements on
ECC are (in this order) 1. flexibility, 2. security, and, only then 3.
performance.

First, I would like to add, that in my opinion the discussion in this
thread should not be about software versus hardware rather than

       software in a secure environment, e.g., on a secured server,
       i.e., where only the global timing as side-channel is relevant

       versus

       software on constrained devices or hardware in a hostile
       environment, i.e., where the full set of local and global
       side-channels applies.

Most of the discussion until Joppe Bos' thread was on the first group
only. I represent the second group, for short, hardware.

Some proponents of highly optimized curves with special primes seem
only be focused on the software scenario with supposedly fast ability
for bug fixes (and short product cycles?). But TLS meanwhile is not
only used in the typical web server scenario. In Germany, it is
standardized for smart metering communication (with much longer
product cycles than typical Internet business and, later,
thousands/millions of devices rolled out in the field). ECC will be
used in Car2X and in digital tachographs (where even the legitimate
driver is a potential attacker).

Let's now come to the flexibility issue: In our company we must be
able to work with hitherto unknown domain parameters, i.e., arbitrary
random primes. The only thing we can assume is that the curves are
Brainpool like. Beyond that, the curves are part of the crypto setup.
This is a hard requirement of the government agency, Federal Office of
Information Security in our case, and is required by some (not all)
other nations as part of the crypto customization.

I suppose that crypto customization may sound strange to some people
on the list who think that we can live with one and only one curve.
But that's the reality in high-assurance crypto.

Besides that we have to use different curves (all in short Weierstrass
form, not different curve types as Montgomery etc.) for signature and
ECDH. Additional measures are taken, for example, for ECDH such that
an attacker cannot easily learn the domain parameters from the data on
the wire (point compression not for the sake of saving bandwidth,
additional symmetric schemes).

As a company representative I could relax and say that we are not
affected by standardization of Edwards-, Montgomery- or even Brainpool
curves as we have to use Brainpool-like curves. But my concerns go a
step further: I'm afraid that we as a crypto community are heading
for the wrong direction. We need flexibility, i.e., no curves with a
very special structure, especially no curves with a specific prime
structure as NIST-primes or other proposed primes.

The last statement was common consensus at the recent Brainpool
meeting. Other discussed features were not that unanimously. Most of
the participants could live with twist-secure curves, less favorable
would be the neglection of the property b non-square mod p (because of
distinguished point attacks), even less favorable the cofactor
question. But it seems to me that all these specific questions can be
solved, but the random primes issue is a remaining one. All three
participating hardware manufactures (two smart card manufacturers, NXP
and Infineon, and one HSM manufacturer) stated that they currently use
(and will use) random primes in their coprocessors.

Finally, let's come to the question of Mike Hamburg. I can only speak
about the Infineon coprocessor as I was involved in that. NXP and
other smart card manufacturers use completely different designs.

The current Infineon coprocessor is a unified coprocessor for GF(p)
and GF(2^n) based on a serial-parallel multiplier which can work with
up to 2303 bit (smaller bit length are possible and favorable for ECC).
Essentially all ECC (and RSA) operations boil down to a number of
modular multiplications Z=M*C mod N. This is the essential operation
of the coprocessor.

The algorithms of all other operations, e. g., operations in
GF(2)[x]/<N[x]> aka GF(2^n), can be derived from this basic algorithm.

The algorithm for modular multiplication basically consists of a
modified Booth algorithm with variable shift length for multiplication
coarsely interleaved with a modular reduction. For modular reduction,
the ZDN-principle developed by Sedlak is used. This reduction is based
on the repeated comparison of intermediate results Z with the value Zwei
Drittel N, i.e. 2/3*N. At the end of each step we have

2/3 N < |Z| <= 4/3 N

The multiplication part results in

Z=Z+a*2^{s_C}*C

with "sign" a = -1,0,+1 and shift 0<= s_c <= const_1.
The reduction part results in

Z=Z+b*2^{s_N}*N

with "sign" b=-1,0,+1 and shift -const_2 <= s_N <= const_2. These two
operational parts are combined into a Three Operand Addition

(#) Z=Z+a*2^{s_C}*C + b*2^{s_N}*N

The difficult algorithmic part is the data-dependent determination of
optimal "signs" a,b and shift values s_C, s_N. It should be noted that
the described scheme is that of the old ACE (i.e. before 2002), the
new Crypto@2304T uses further, rather technical refinements.

The 3-Operand-Addition is done by transformation into Carry-Save
Addition form -> (C,S) (fully parallelized) and further Carry Look
Ahead Adders.

The nice fact about this coprocessor is that in every step
approximately E(s_z,s_C) \approx 2.7 ... 2.8 bits of the multiplier
are processed and that Booth and ZDN are in a certain sense
optimal/dual.

One can easily see from (#) that a special structure of the modulus N
is of no use here. This can be verified by simulations and
measurements (take an arbitrary IFX smart card with public-key
coprocessor).

To sum up, especially for hardware vendors special primes are of no
use and can even be dangerous, see the separate mail by Wieland
Fischer on blinding issues.

Regards,

Torsten
-- 
ROHDE & SCHWARZ SIT GmbH
Hemminger Str. 41
D-70499 Stuttgart/Weilimdorf
Tel ++ 49 711 69945 183
Fax ++ 49 711 69945 170
http://www.sit.rohde-schwarz.com
mailto:torsten.schuetze@rohde-schwarz.com