Re: [Cfrg] Point format endian (was: Adoption of draft-ladd-spake2 as a RG document)

Michael Clark <> Mon, 26 January 2015 12:58 UTC

Return-Path: <>
Received: from localhost ( []) by (Postfix) with ESMTP id 88E161A8A81 for <>; Mon, 26 Jan 2015 04:58:24 -0800 (PST)
X-Virus-Scanned: amavisd-new at
X-Spam-Flag: NO
X-Spam-Score: 0.441
X-Spam-Status: No, score=0.441 tagged_above=-999 required=5 tests=[BAYES_20=-0.001, DKIM_SIGNED=0.1, IP_NOT_FRIENDLY=0.334, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001, T_DKIM_INVALID=0.01] autolearn=no
Received: from ([]) by localhost ( []) (amavisd-new, port 10024) with ESMTP id VJNSY2rXKp3c for <>; Mon, 26 Jan 2015 04:58:22 -0800 (PST)
Received: from ( []) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by (Postfix) with ESMTPS id CE92D1A8A78 for <>; Mon, 26 Jan 2015 04:58:22 -0800 (PST)
Received: from monty.local ( [] (may be forged)) (authenticated bits=0) by (8.14.4/8.14.4/Debian-4) with ESMTP id t0QDK7cA031921 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES128-SHA bits=128 verify=NOT); Mon, 26 Jan 2015 13:20:10 GMT
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple;; s=klaatu; t=1422278412; bh=Z2TzjBvzqYZsq3AubSlfm9+ULXHcYdAuQQMPCBGyoTs=; h=Date:From:To:Subject:References:In-Reply-To:From; b=m+eB07m7Io5G6gbJLMiurGb1GWE5aptm5lWsbhhC4HW5fTVfakzk38tqUpuzjUoul lXQG0Sx8uLXeNMnb4X83cqE/dUF3U/s6at33nf1y07uWWnyPU9X4Mi4Y6lX/rot5kL 3bzR0Yw6NuKY912PPDpY9L4ggrWqhKAQg6EFRyWc=
Message-ID: <>
Date: Mon, 26 Jan 2015 20:57:57 +0800
From: Michael Clark <>
Organization: Metaparadigm
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0) Gecko/20100101 Thunderbird/31.4.0
MIME-Version: 1.0
To: Peter Gutmann <>, "''" <>
References: <>
In-Reply-To: <>
Content-Type: text/plain; charset="windows-1252"
Content-Transfer-Encoding: 8bit
X-Virus-Scanned: clamav-milter 0.98.4 at
X-Virus-Status: Clean
Archived-At: <>
Subject: Re: [Cfrg] Point format endian (was: Adoption of draft-ladd-spake2 as a RG document)
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: Crypto Forum Research Group <>
List-Unsubscribe: <>, <>
List-Archive: <>
List-Post: <>
List-Help: <>
List-Subscribe: <>, <>
X-List-Received-Date: Mon, 26 Jan 2015 12:58:24 -0000

On 26/1/15 6:20 pm, Peter Gutmann wrote:
> Mike Hamburg <> writes:
>> But will the endian complicate the implementation of the new curves?
> Given that every single crypto bignum representation that I'm aware of (X.509,
> S/MIME, PGP, SSH, TLS, OCSP, CMP, TSP, and many more) are all big-endian,
> there are probably lots of crypto libraries that simply don't have little-
> endian import/export routines.  Looking at BN_bn2bin()/BN_bin2bn() from one
> widely-used library that powers half the Internet (and mobile phone market),
> it only supports the big-endian format.  So that's automatic non-support for
> the world's most widely-used crypto library outside of MS CryptoAPI (which
> doesn't expose its bignum library).
> I can't believe we're even having this endless discussion.  The universal
> standard for external representations of bignums in IETF protocols is big-
> endian.  Even the PGP and S/MIME, and SSH and TLS, folks have managed to agree
> on that one.

Yes. Legacy. "network byte order" is big-endian.

It's tradition versus efficiency. big-endian is tradition. little-endian
is how most CPUs work these days. The reality is Sparc9, ARMv3, and
Power8 are now all bi-endian and are adopting little endian ABIs. Power8
being the most recent to change (ppc64le).

Given the algo is new, one could /consider/ a BN_bn2binle and
BN_binle2bn (or an endian argument variant). It's a small change.

I see the current code is working byte by byte due to endianness. It
takes 2nd order little endian (with host endian limbs due to the shifts)
and converts to big endian wire format. It reads from the distant big
end (little-endian) and writes out big endian. It bswaps.

        while (i--)
                *(to++)=(unsigned char)(l>>(8*(i%BN_BYTES)))&0xff;

This code could do with some optimization, bswap(32|64) intrinsic (uses
BSWAP machine instruction on x86_64, ARM, etc), working with a machine
word size loop with a small tail loop to handle the modulus of the
difference in length of the bignum from the machine word size. It is
working byte by byte. It may not even register on a profile (we can use
gperftools to check). This code is *absolutely fine*. That said we could
make BN_bn2bin faster, use BSWAP intrinsic on x86_64, sparc, arm, ppc
and use 64-bit loads and stores (including test cases and corner cases).
Let me know if anyone wants to sponsor?

Also BN_num_bits_word could use __builtin_clz or __builtin_clzll
(_BitscanForward, _BitscanReverse on MSC). No cache surface for a table
lookup. Modern compilers have these intrinsics. On Pentium III there is
BSR, and in SSE4 there is LZCNT instruction, and there is even a cache
friendly alternative that uses Henry S. Warren's Hacker's Delight tricks:

Let me know if you can find me a sponsor. I'm currently (as a passtime)
doing cycle counts on intrinsic and C fallbacks for these kinds of
routines.. (to fill in for compilers without them). Testing on
MSC/GCC/Clang on Darwin/Android/Windows/Linux.

With little endian wire format and use of 32-bit or 64-bits limbs on all
modern systems (sparc9, power8, x86, arm) BN_bn2binle routine would
essentially be memcpy. That's the sad part.

So with big-endian wire-formats, practically all machines are going to
bswap as the internal rep will be little endian. That cost is probably
low compared to the cost of the computation, so there is a reasonable
rationale to stick to big-endian.

  BSWAP is one 1 cycle per 32-bit word on i386
  BSWAP is one 1 cycle per 64-bit word on x86_64
  BSWAP has been around since i486

Someone starting today would indicate endianness statically or
dyanamically in their number format and use the hot potato principle.
Get it off your hands at the lowest cost to yourself, which nowadays is
little-endian. Remember GHASH is Bit Reflected (base 2 little endian)
and it made it into a spec.

Peter makes a pretty good case to stick with big-endian, from a
consistency point of view, but not from an efficiency point of view.

Seems like we will be swizzling onto and off the network into
eternity... unless someone makes a decision otherwise...