Re: [Cfrg] Point format endian (was: Adoption of draft-ladd-spake2 as a RG document)

Watson Ladd <> Tue, 27 January 2015 06:00 UTC

Return-Path: <>
Received: from localhost ( []) by (Postfix) with ESMTP id C3C301B2B9D for <>; Mon, 26 Jan 2015 22:00:56 -0800 (PST)
X-Virus-Scanned: amavisd-new at
X-Spam-Flag: NO
X-Spam-Score: 0.7
X-Spam-Status: No, score=0.7 tagged_above=-999 required=5 tests=[BAYES_50=0.8, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FREEMAIL_FROM=0.001, SPF_PASS=-0.001] autolearn=ham
Received: from ([]) by localhost ( []) (amavisd-new, port 10024) with ESMTP id w2w-cn_uiOgP for <>; Mon, 26 Jan 2015 22:00:53 -0800 (PST)
Received: from ( [IPv6:2607:f8b0:4002:c01::22b]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by (Postfix) with ESMTPS id 9DACB1A008C for <>; Mon, 26 Jan 2015 22:00:53 -0800 (PST)
Received: by with SMTP id 29so5337834yhl.2 for <>; Mon, 26 Jan 2015 22:00:53 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=bcvlFtKoNEYyQLNVF6VkdxBk8Jceu5ouGf4LLi2wSK8=; b=APARXJs537aTXNk01sBrXVQsqaZes25Mc06drrcVd7/2JqOox1XDAAZvs4qQqgYqGt QJJham8HhPnj0tFB0NHo9ydjsBJ7TPi7T6yOu/LRB5HjNelD4ZeSFkdxo/buSCVv9mee o/OzdC6A/mtQg/fWHFUzUbECd2jJ0mSzotzpAjUcZBsTTrgAesQojqLVjrccKdfuLq5F BmxJXdJsnGeQwKgYzgAowvBupSJ9SmV/S2ICrq7IkSKy9c5b5A+d0b10dXGUpLYJx5pv 7ghnPJeTIWpq71WMbcwYnj+PIZ+p+fNY86ikO/LlHqmywbwJh9Hw8p6qraaxNZvpVFRW DyAw==
MIME-Version: 1.0
X-Received: by with SMTP id v8mr10383883yhc.44.1422338452841; Mon, 26 Jan 2015 22:00:52 -0800 (PST)
Received: by with HTTP; Mon, 26 Jan 2015 22:00:52 -0800 (PST)
In-Reply-To: <>
References: <> <>
Date: Mon, 26 Jan 2015 22:00:52 -0800
Message-ID: <>
From: Watson Ladd <>
To: Michael Clark <>
Content-Type: text/plain; charset="UTF-8"
Archived-At: <>
Cc: "" <>, Peter Gutmann <>
Subject: Re: [Cfrg] Point format endian (was: Adoption of draft-ladd-spake2 as a RG document)
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: Crypto Forum Research Group <>
List-Unsubscribe: <>, <>
List-Archive: <>
List-Post: <>
List-Help: <>
List-Subscribe: <>, <>
X-List-Received-Date: Tue, 27 Jan 2015 06:00:57 -0000

On Mon, Jan 26, 2015 at 4:57 AM, Michael Clark <> wrote:
> On 26/1/15 6:20 pm, Peter Gutmann wrote:
>> Mike Hamburg <> writes:
>>> But will the endian complicate the implementation of the new curves?
>> Given that every single crypto bignum representation that I'm aware of (X.509,
>> S/MIME, PGP, SSH, TLS, OCSP, CMP, TSP, and many more) are all big-endian,
>> there are probably lots of crypto libraries that simply don't have little-
>> endian import/export routines.  Looking at BN_bn2bin()/BN_bin2bn() from one
>> widely-used library that powers half the Internet (and mobile phone market),
>> it only supports the big-endian format.  So that's automatic non-support for
>> the world's most widely-used crypto library outside of MS CryptoAPI (which
>> doesn't expose its bignum library).
>> I can't believe we're even having this endless discussion.  The universal
>> standard for external representations of bignums in IETF protocols is big-
>> endian.  Even the PGP and S/MIME, and SSH and TLS, folks have managed to agree
>> on that one.
> Yes. Legacy. "network byte order" is big-endian.
> It's tradition versus efficiency. big-endian is tradition. little-endian
> is how most CPUs work these days. The reality is Sparc9, ARMv3, and
> Power8 are now all bi-endian and are adopting little endian ABIs. Power8
> being the most recent to change (ppc64le).

No it's not because of efficiency. Nor is C code necessarily endian
dependent. See
for details, and explanations of why code that depends on endianness
is usually wrong. Rather, it has to do with the format of existing
implementations: everyone who grabs the NaCl implementation is using
an little-endian encoding. Changing this would mean that there are no
existing implementations. It would cause significant confusion between
old and new versions.

> Given the algo is new, one could /consider/ a BN_bn2binle and
> BN_binle2bn (or an endian argument variant). It's a small change.

Why do people feel the need to write their own buggy implementations
of Curve25519, when there are several public domain ones for the
taking, especially when this means giving up the optimizations related
to reduced radix format?

> I see the current code is working byte by byte due to endianness. It
> takes 2nd order little endian (with host endian limbs due to the shifts)
> and converts to big endian wire format. It reads from the distant big
> end (little-endian) and writes out big endian. It bswaps.
>         n=i=BN_num_bytes(a);
>         while (i--)
>                 {
>                 l=a->d[i/BN_BYTES];
>                 *(to++)=(unsigned char)(l>>(8*(i%BN_BYTES)))&0xff;
>                 }
>         return(n);
> This code could do with some optimization, bswap(32|64) intrinsic (uses
> BSWAP machine instruction on x86_64, ARM, etc), working with a machine
> word size loop with a small tail loop to handle the modulus of the
> difference in length of the bignum from the machine word size. It is
> working byte by byte. It may not even register on a profile (we can use
> gperftools to check). This code is *absolutely fine*. That said we could
> make BN_bn2bin faster, use BSWAP intrinsic on x86_64, sparc, arm, ppc
> and use 64-bit loads and stores (including test cases and corner cases).
> Let me know if anyone wants to sponsor?

And you have benchmarks showing that this code is the bottleneck, and
not the bignum multiply inner loop?

> Also BN_num_bits_word could use __builtin_clz or __builtin_clzll
> (_BitscanForward, _BitscanReverse on MSC). No cache surface for a table
> lookup. Modern compilers have these intrinsics. On Pentium III there is
> BSR, and in SSE4 there is LZCNT instruction, and there is even a cache
> friendly alternative that uses Henry S. Warren's Hacker's Delight tricks:
> Let me know if you can find me a sponsor. I'm currently (as a passtime)
> doing cycle counts on intrinsic and C fallbacks for these kinds of
> routines.. (to fill in for compilers without them). Testing on
> MSC/GCC/Clang on Darwin/Android/Windows/Linux.
> With little endian wire format and use of 32-bit or 64-bits limbs on all
> modern systems (sparc9, power8, x86, arm) BN_bn2binle routine would
> essentially be memcpy. That's the sad part.
> So with big-endian wire-formats, practically all machines are going to
> bswap as the internal rep will be little endian. That cost is probably
> low compared to the cost of the computation, so there is a reasonable
> rationale to stick to big-endian.
>   BSWAP is one 1 cycle per 32-bit word on i386
>   BSWAP is one 1 cycle per 64-bit word on x86_64
>   BSWAP has been around since i486
> Someone starting today would indicate endianness statically or
> dyanamically in their number format and use the hot potato principle.
> Get it off your hands at the lowest cost to yourself, which nowadays is
> little-endian. Remember GHASH is Bit Reflected (base 2 little endian)
> and it made it into a spec.

 Re GHASH. It doesn't matter in hardware either way, as reflection is
just relabeling, but in software you have to do some weird twists to
make it work out. Even Intel's carryless multiplies do things the
natural way. By contrast the Curve25519 format we are deciding is
between s[0..31] is s[0]+256*s[1]+..256^(31)*s[31], or
256^(31)*s[0]+256^(30)*s[1]+..s[31]: all systems agree on the value of
each byte, and the question is which bignum it represents. It's easy
to turn this into whatever internal representation you need, unlike
swapping bits in a byte.

Let me see if I get your second line correct. Instead of picking one
format, and encoding to and from it with a trivial amount of portable
C, we should use architecture specific micro-optimizations, and force
the implementation and testing of two distinct output and input
methods, for a nearly unmeasurable speed gain. Of course, there are
many architectures where you cannot serialize in the way you suggest:
unaligned loads and stores are not supported on many MIPS processors,
and there is no guarantee that alignment of bignum fields will be

> Peter makes a pretty good case to stick with big-endian, from a
> consistency point of view, but not from an efficiency point of view.

He argued that this would mean automatic OpenSSL nonsupport. Given
that Rich Salz is on the development team of OpenSSL, I consider this
unlikely. Furthermore, you can take curve25519-donna and use it today:
it doesn't entail modifications (which are themselves trivial) to the
bignum library.

I am amazed that we are repeating discussions that were had in TLS. Or
maybe I shouldn't be. At what point can we declare that Curve25519 can
be adopted by IETF protocols? It would be nice if we could report
something by Austin, and maybe have a plan to finish up higher
security levels and signatures by the summer.


Watson Ladd
> _______________________________________________
> Cfrg mailing list

"Those who would give up Essential Liberty to purchase a little
Temporary Safety deserve neither  Liberty nor Safety."
-- Benjamin Franklin