Re: [Cfrg] Curve selection revisited

Which remark?

Schnorr signatures are well-known to scale down to 3/4 the size, because they do not rely on collision resistance of the hash function.  See for example, http://www.neven.org/papers/schnorr.pdf.

That you need a hardware accelerator for v2v: well, partly company-confidential reasons.  But back of the envelope, an Edwards Schnorr sig verify is likely to cost around 3000 field multiplies at the 256-bit level, give or take.  Maybe more if you’re memory-constrained.  Weierstrass is slower.  ECDSA is slower.  Each 256-bit field multiply, implemented naively, costs 1024 MACs on an 8-bit machine.  That’s 3 million MACs, and each one is going to cost multiple cycles because of load / store.  Call it 12 million cycles minimum, and you need a few milliseconds latency / inverse throughput tops.  Even a 16-bit micro isn’t enough, since you’re not running at a GHz.

That binary curves are smaller and faster in hardware: see any paper or spec sheet on them.  For example, http://www.ipcores.com/elliptic_curve_crypto_ip_core.htm

About inversion mod r, I’ve implemented it.  A Schnorr verification routine does not need any math mod r, except that it may want to compute or check reduction mod r for speed / malleability reasons.  Depending on the exact form of Schnorr, it may not even need to reserialize points (just to check equality), so it may not need an inversion routine.  If it uses point compression, it will need a square root routine mod p, which can be overloaded into inversion mod p for almost no cost.  If the implementation uses inversion mod p, exponentiation is usually fast enough to be better than extra code for EGCD.

An ECDSA verify needs to invert and multiply mod r.  The inversion can be EGCD, but you would not have needed EGCD otherwise; or it can be a powering ladder.  The powering ladder would take 6% of the time if multiplication mod r is as fast as mod p.  But r isn’t a special form prime, so actually it will take at least twice as long, at least 12%.  Also, the powering ladder will be a different subroutine from the one mod p for the inverse/square root, and r isn’t sparse/special so it’s a longer routine.

In hardware, you have to choose between implementing reasonably fast multiplication mod r and a powering ladder, or implementing EGCD.  Both of these add significant complexity, which is the hardware architect’s enemy.  If you’re using a binary curve, you could have verified with no integer math at all, but now you can’t.

In at least two implementations of “ECDSA”, I've had the signer tweak the signature by inverting s, since there isn’t any reason that the verifier should do it and it doesn’t even cost the signer anything.  Obviously this doesn’t change the security, but it breaks standard point formats.

Likewise Schnorr signing does not need an inversion routine mod r, just multiply and add/sub mod r.  The multiply is used only once, so it can be very slow.  But in ECDSA, an inversion is required.  It needs DPA protection, which will make it slower.  Signatures are in principle faster than verifies, taking perhaps under 1000 multiplies, so the powering ladder starts at about 20% if multiply mod r and mod p are the same, but actually it creeps up toward 35% because it’s not a special-form prime, and perhaps even more because of DPA protection.  Again EGCD is an option, and it’s not required otherwise, and now it has to be blinded for DPA protection as well.

All of this would be fine if ECDSA added security vs Schnorr.  In fact it adds nothing but complexity.

Cheers,
— Mike

On Jul 28, 2014, at 11:12 AM, Hannes Tschofenig <hannes.tschofenig@gmx.net> wrote:

> Mike, do you have data that supports your remark?
> 
> Ciao
> Hannes
> 
> 
> On 07/28/2014 07:57 PM, Michael Hamburg wrote:
>> Actually, one more thing: in constrained environments, you really want Schnorr instead of ECDSA.  That extra inversion mod r is a giant pain.  It’s slow and often doesn’t share code with anything.  Plus Schnorr can go down to 3/4 the size if necessary.
>> 
>> Cheers,
>> — Mike
>> 
>> On Jul 28, 2014, at 10:05 AM, Michael Hamburg <mike@shiftleft.org> wrote:
>> 
>>> You’re going to need a hardware accelerator for some of these small things.  Eg for v2v safety, the real ask has to be “fast and small and easy in hardware”.  The best solution to this is binary curves, though they are considered slightly edgy right now.
>>> 
>>> But yeah, 8-bit processor performance is important too.
>>> 
>>> Cheers,
>>> — Mike
>>> 
>>> On Jul 28, 2014, at 7:58 AM, Robert Moskowitz <rgm-sec@htt-consult.com> wrote:
>>> 
>>>> The whole world is NOT TLS.  For example I am working in IEEE 1609 which is Vehicle-to-vehicle safety communications and their need is for signed data objects that are quickly generated and verified (you have ~100ms from the braking event to all the cars being able to act on that information).  I also work in IEEE 802.15.4 highly constrained devices (some much smaller than what many IETF groups accept as their lower limit).
>>>> 
>>>> 8 bit processors are the norm, and NO, Dr. Moore is NOT going to change that.  Lower energy demand is much higher on the design plan than faster processing.  Also whatever is done will be running on fielded devices for the next 10 - 20 years.  Hardware acceleration is acceptable, but that could make in-field corrections impossible. Over-the-air firmware updates may also be impossible (802.15.4k is 40kb/s) or only done in the dealership (yes!  another recall!).
>>>> 
>>>> So getting on to my thoughts of curve selection requirements for the world of small:
>>>> 
>>>> 1) Low over-the-air demands.  Every BIT counts.  We have to live with 32 bytes; but 64 bytes may be too much.  Oh, IEEE 1609.2 has its own 'certificate' and signature format as even the standard asn.1 is too wasteful of the very limited channel.  This applies also in 802.15.4 where adding another MAC frame might mean that the whole thing fails because now it is more likely that one frame got lost.
>>>> 
>>>> 2) Easy/size of implementation.  There has been lots of chatter here about how NIST p256 can be implemented wrongly.  Here it is of greater concern, as it might require a physical swap-out to correct an implementation error.  Manufacturers look at the cost of a programmer to save 5K code cheaper than that extra 5K footprint.  So little code and easy to code.  This world is extremely price sensitive.
>>>> 
>>>> 3) Performance on an 8 bit  processor.  Performance is measured in time and energy.  In-vehicle devices tend to have a ready source of electrons (except for tire gauges!), but some devices have limited battery (your front door lock) or NO battery (MEMEs energy harvesters on bridge stress sensors).  So low energy is as important as quickly done.  Here 10% faster IS 10% faster; and it could make a life-or-death difference. (In Positive Train Control, the automatics MUST take over if the human can no longer respond in time and bullet trains can take 7Km to stop safely)
>>>> 
>>>> 4) preserving ECDSA and ECDHE.  This is what is currently in use, and in some cases written into regulations (urgh), so we have to still keep them working.  But if something 'new' and provably good makes a big difference in one of the previous needs I suspect there will be an attentive audience.
>>>> 
>>>> I have tried to keep things general so as not too much to limit choices.  I never claimed to be a crytographer or to have the math skills to evaluate the various contenders.  Others working in this area might have their own reading on the needs of constrained devices/communications.
>>>> 
>>>> But please just don't solve this for TLS on octo-core 64-bit systems.
>>>> 
>>>> 
>>>> On 07/25/2014 01:10 AM, Benjamin Black wrote:
>>>>> Inspired by some comments from other and in hopes of returning to a more productive debate rather than the pointed back and forth, of which I am absolutely guilty, I'm suggesting a few constraints. These are not priority ordered and none of them deal with the math, instead focusing on process and practice.
>>>>> 
>>>>> 1) As we are discussing this in the context of TLS, the performance of various kx options must be evaluated for ECDHE as typically deployed. ECDH is neither common nor recommended in the TLS BCP draft. In addition, I hope there will be discussion of benchmarking methodology, with the understanding the ultimate choice is for the CFRG chairs.
>>>>> 
>>>>> 2) Performance must be measured using "production-quality" implementations. By this I mean implementations which employ the sort of techniques/optimizations appropriate for large scale deployment. This is specifically intended to exclude discussion of how simple or fast an implementation _could_ be, in favor of what they actually are in practice. However, the goal is to select curves which strike the best balance between various requirements, not simply the fastest.
>>>>> 
>>>>> 3) The timeframe for a decision constrains the scope of this process. If a decision is desired within a few months, then it is difficult to include options beyond new curves. If a decision can take a year, new algorithms could be considered. Given the importance, impact, and contentiousness of new algorithms, I believe it would be difficult to complete a thorough, careful algorithm and curve selection process in a short amount of time.
>>>>> 
>>>>> 4) The precise set of security levels needed should be specified in advance (whether by TLS or CFRG) and curve recommendations delivered together for all levels. A stated goal is to use these curves in new drafts, and it minimizes work to specify all security levels for a proposal in the same draft. This is strictly a practical matter recognizing the chairs and area directors have limited bandwidth and implementors can get everything done in a single go.
>>>>> 
>>>>> 5) Selections must efficiently support TLS 1.2. Adoption of new curves, and potentially new algorithms, can't be gated on completion and widespread deployment of TLS 1.3.
>>>>> 
>>>>> 6) Drop-in replacement for the NIST curves is _not_ a requirement in this process.
>>>>> 
>>>>> 7) I don't think the chairs are signing up for a full-on, NIST-style selection competition and we should aim to keep things are lightweight as possible while still achieving the required level of rigor. I hope they will address this point soon.
>>>> 
>>>> _______________________________________________
>>>> Cfrg mailing list
>>>> Cfrg@irtf.org
>>>> http://www.irtf.org/mailman/listinfo/cfrg
>>> 
>>> _______________________________________________
>>> Cfrg mailing list
>>> Cfrg@irtf.org
>>> http://www.irtf.org/mailman/listinfo/cfrg
>> 
>> _______________________________________________
>> Cfrg mailing list
>> Cfrg@irtf.org
>> http://www.irtf.org/mailman/listinfo/cfrg
>> 
>