Re: [Cfrg] Requirements for curve candidate evaluation update

"D. J. Bernstein" <djb@cr.yp.to> Thu, 14 August 2014 20:15 UTC

Return-Path: <djb-dsn2-1406711340.7506@cr.yp.to>
X-Original-To: cfrg@ietfa.amsl.com
Delivered-To: cfrg@ietfa.amsl.com
Received: from localhost (ietfa.amsl.com [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 31FD21A0683 for <cfrg@ietfa.amsl.com>; Thu, 14 Aug 2014 13:15:49 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: 1.1
X-Spam-Level: *
X-Spam-Status: No, score=1.1 tagged_above=-999 required=5 tests=[BAYES_40=-0.001, J_CHICKENPOX_210=0.6, J_CHICKENPOX_29=0.6, J_CHICKENPOX_31=0.6, RCVD_IN_DNSWL_LOW=-0.7, UNPARSEABLE_RELAY=0.001] autolearn=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id zl1nWBxVfnZP for <cfrg@ietfa.amsl.com>; Thu, 14 Aug 2014 13:15:47 -0700 (PDT)
Received: from mace.cs.uic.edu (mace.cs.uic.edu [131.193.32.224]) by ietfa.amsl.com (Postfix) with SMTP id 3A82C1A056D for <cfrg@irtf.org>; Thu, 14 Aug 2014 13:15:47 -0700 (PDT)
Received: (qmail 22843 invoked by uid 1011); 14 Aug 2014 20:15:45 -0000
Received: from unknown (unknown) by unknown with QMTP; 14 Aug 2014 20:15:45 -0000
Received: (qmail 8909 invoked by uid 1001); 14 Aug 2014 19:38:59 -0000
Date: Thu, 14 Aug 2014 19:38:59 -0000
Message-ID: <20140814193859.8907.qmail@cr.yp.to>
From: "D. J. Bernstein" <djb@cr.yp.to>
To: cfrg@irtf.org
Mail-Followup-To: cfrg@irtf.org
In-Reply-To: <CA+Vbu7zfbx-OqU=ggXgutDb+GNwvS3QpkTwzU1c+2Lcv=3Gawg@mail.gmail.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/cfrg/t6fOganOca6UY7UWRqQitL-ZVmw
Subject: Re: [Cfrg] Requirements for curve candidate evaluation update
X-BeenThere: cfrg@irtf.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: Crypto Forum Research Group <cfrg.irtf.org>
List-Unsubscribe: <http://www.irtf.org/mailman/options/cfrg>, <mailto:cfrg-request@irtf.org?subject=unsubscribe>
List-Archive: <http://www.irtf.org/mail-archive/web/cfrg/>
List-Post: <mailto:cfrg@irtf.org>
List-Help: <mailto:cfrg-request@irtf.org?subject=help>
List-Subscribe: <http://www.irtf.org/mailman/listinfo/cfrg>, <mailto:cfrg-request@irtf.org?subject=subscribe>
X-List-Received-Date: Thu, 14 Aug 2014 20:15:49 -0000

The current topic is the simplicity of secure implementations: in
particular, how this is affected by choosing Montgomery coordinates
rather than Edwards coordinates for DH, assuming that one has already
chosen Edwards coordinates for signatures (in violation of standards
such as ECDSA, contrary to MSR's claims, but that's a separate issue).
Everyone seems to agree that simplicity of secure implementations is
desirable, so I presume that it will end up on the list of desiderata.

Benjamin Black writes:
> Using the same scalar multiplication code for kx and signatures is the
> norm. 

Really? Then how do you explain MSR ECCLib? I peeked briefly at the code
and found a rather long function

   ECCRYPTO_STATUS ECC_DBLMUL_INTERNAL_TE(POINT_PRECOMP_EXTAFF_TE
   *P_table, dig *k, POINT_TE Q, dig *l, POINT_TE R, unsigned int w_P,
   PCurveStruct TedCurve)
   { // Double-base scalar multiplication R = k.P + l.Q using wNAF with Interleaving

which doesn't seem to be used in your DH performance numbers, along with
another rather long function

   ECCRYPTO_STATUS ECC_MUL_TE(POINT_TE P, dig *k, POINT_TE Q, PCurveStruct TedCurve)
   { // Variable-base scalar multiplication Q = k.P using fixed-window method 

which doesn't seem to be used in your signature (mislabeled "ECDSA")
performance numbers. I don't immediately find functions for signing and
verification (is this actually a complete library?), and I suppose that
with some extra work you could implement signature verification using
ECC_MUL_TE rather than ECC_DBLMUL_INTERNAL_TE, but this doesn't seem
consistent with the performance claims. Am I misunderstanding something?

> As illustrated in the NaCl and TweetNaCl libraries having multiple
> required models makes implementation more challenging

Have you actually looked at the code? It doesn't support your claims.

Let's start with TweetNaCl, which prioritizes simplicity (but is still
fast enough for many applications). Here's the complete high-security
constant-time TweetNaCl code for DH shared-secret computation using the
Montgomery x-coordinate for Curve25519---excluding the subroutines that
are already used in TweetNaCl's implementation of Ed25519 signatures:

  int crypto_scalarmult(u8*q,const u8*n,const u8*p){u8 z[32];i64 x[80],r,i;
  gf a,b,c,d,e,f;FOR(i,31)z[i]=n[i];z[31]=(n[31]&127)|64;z[0]&=248;
  unpack25519(x,p);FOR(i,16){b[i]=x[i];d[i]=a[i]=c[i]=0;}a[0]=d[0]=1;
  for(i=254;i>=0;--i){r=(z[i>>3]>>(i&7))&1;sel25519(a,b,r);sel25519(c,d,r);
  A(e,a,c);Z(a,a,c);A(c,b,d);Z(b,b,d);S(d,e);S(f,a);M(a,c,a);M(c,b,e);
  A(e,a,c);Z(a,a,c);S(b,a);Z(c,d,f);M(a,c,_121665);A(a,a,d);M(c,c,a);
  M(a,d,f);M(d,b,x);S(b,e);sel25519(a,b,r);sel25519(c,d,r);}
  FOR(i,16){x[i+16]=a[i];x[i+32]=c[i];x[i+48]=b[i];x[i+64]=d[i];}
  inv25519(x+32,x+32);M(x+16,x+16,x+32);pack25519(q,x+16);return 0;}

For DH keygen you also have to include the following lines:

  static const u8 _0[16], _9[32] = {9};
  int crypto_scalarmult_base(u8 *q,const u8 *n)
  { return crypto_scalarmult(q,n,_9); }

That's it.

The total complexity of the combined DH+signature implementation in
TweetNaCl is much higher: it includes the top-level signature code and
verification code, a hash function, and (shared between signatures and
DH) the underlying code for arithmetic mod 2^255-19, serialization, etc.
An implementation focused on DH---exactly what many applications need---
can skip many lines of signature-specific code, although obviously it
still needs the arithmetic mod 2^255-19. An implementation focused on
signatures---exactly what many other applications need---would only be
able to skip the above lines.

The arithmetic mod 2^255-19 is also the part that requires much more
work for a high-speed implementation. There's no reason to deviate from
the simple high-level Montgomery structure, but top performance requires
careful attention to the lower-level details of arithmetic.

The arithmetic mod 2^255-19 is _also_ the part that requires by far the
most work for auditors. Our auditing standards for NaCl are far beyond
normal industry practice, and the bug you mentioned was in code that was
never accepted into NaCl---a bug not in something high-level such as the
Montgomery ladder, but rather in the lower-level details of highly
optimized arithmetic. Note that the subsequent paper

   http://cryptojedi.org/papers/#verify25519

(ACM CCS 2014) formally verifies the Montgomery-ladder steps in two
highly optimized assembly-language implementations of Curve25519---
including all the details of arithmetic mod 2^255-19.

You correctly observe that the signature portion of TweetNaCl is simpler
than the DH+signature portion: it saves the lines of code shown above. 
But what you're claiming is something completely different (and
incorrect), namely that Montgomery+Edwards on the wire for DH+signatures
"makes implementation more challenging" compared to Edwards+Edwards.

In fact, switching from Montgomery coordinates to Edwards coordinates
for DH would make the simplest DH-specific implementations noticeably
more complex, and certainly wouldn't allow a DH+signature implementation
to be less "challenging." For higher-speed implementations, whether pure
DH or DH+signature, much more code is required for fast arithmetic mod
the prime, further reducing any possible value in these speculative
high-level DH+signature simplifications.

---Dan