Re: [Cfrg] RG Last Call on draft-irtf-cfrg-gcmsiv-06

Watson Ladd <watsonbladd@gmail.com> Mon, 18 September 2017 19:35 UTC

In-Reply-To: <71d10985-4c46-4a7c-e634-76a822102a61@openssl.org>
References: <EA4347BF-D26F-4303-9A8D-E7B28986DE56@isode.com> <71d10985-4c46-4a7c-e634-76a822102a61@openssl.org>
Message-ID: <CACsn0cnSq9nJpdjpDQ-HpHX7i6W-0=JkCOB-WenBRoMSKO9ypA@mail.gmail.com>
To: Andy Polyakov <appro@openssl.org>
Cc: "cfrg@irtf.org" <cfrg@irtf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cfrg/x0kXFGt8iTk1Ys028pLAjFFWHtY>

On Mon, Sep 18, 2017 at 10:51 AM, Andy Polyakov <appro@openssl.org> wrote:
> Hi,
>
> I apologize for chiming in so late, but I wasn't aware of this endeavour
> till very recently. The question I find myself struggling with is that I
> can't find introduction of new primitives justified, at least not in
> this context. I did my best sieving through
> https://mailarchive.ietf.org/arch/search/?q=gcmsiv) to locate relevant
> prior comments, so even with those in mind. One of arguments was that
> POLYVAL gets bit order right in comparison to GHASH. Well, comparing the
> two actually boils down to one question: what is it you choose to
> bit-flip, input or polynomial? [Do note "bit-flip", not "byte-flip".]
> What I'm trying to say is that we have to recognize that POLYVAL is
> simply formulated in terms of GHASH polynomial in reverse bit order. On
> the other hand on more practical level, i.e. from software
> implementation viewpoint, answer to the question is self-obvious, it's
> more efficient to bit-flip single polynomial than each fragment of the
> input. This last thing effectively means that sensible GHASH
> implementation would actually use reverse-bit polynomial (and at least
> OpenSSL implementations all do). This in turn means that the only
> essential difference between GHASH and POLYVAL is byte order. Now, it
> was argued that little-endian byte order gives performance edge [on
> platform of popular choice]. But relevant question in the context is if
> the edge actually justifies introduction of new algorithm. It was
> asserted that it provides 20% improvement and it was attributed to the
> fact that PCLMULQDQ-based GHASH implementations use vector byte swap
> instructions, *one* per block. But we have to recognize that said
> improvement coefficient is actually specific to Skylake. For example on
> Haswell we'll observe ... 0% improvement, on Broadwell - 6%, on Ryzen -
> 3%. Let's even have a look at absolute cycles per processed byte values
> for GHASH:

It's important not to conflate the following three distinct things:
- The maximum performance attainable by the best implementation on
each microarchitecture
- The maximum performance attainable by a single implementation on
each microarchitecture
- The maximum performance attainable by the best implementation for a
single implementation strategy on each microarchitecture

It's also important not to conflate the following four things:
- The arrangement of limbs in a bignum
- The internal arrangement of bits in a limb
- The arrangement of bits in a byte
- Reduction strategy

I don't get a clear impression from your email of which implementations
you used, or which strategies and representations.

POLYVAL takes the least significant bit of a byte to be the constant
coefficient. No matter what architecture (big or little endian) you
have, this is clearly the right choice as it is exactly the choice
made when considering a byte to be an integer in the range 0 to 255.
GHASH does something weird that I don't recall exactly.
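
To make the two bit conventions concrete, here is a minimal Python
sketch. It assumes the GHASH convention described in NIST SP 800-38D,
where the leftmost bit of the first byte is the constant coefficient.
The function names are hypothetical and chosen only for illustration.
Both functions return an integer whose bit i is the coefficient of x^i.

```python
def polyval_coeffs(block: bytes) -> int:
    """POLYVAL: least significant bit of the first byte is x^0."""
    assert len(block) == 16
    return int.from_bytes(block, "little")


def ghash_coeffs(block: bytes) -> int:
    """GHASH (assumed per SP 800-38D): most significant bit of the
    first byte is x^0, so the whole 128-bit string is read in
    reverse bit order."""
    assert len(block) == 16
    n = int.from_bytes(block, "big")
    # format() lists bits from the first byte's MSB onward; reversing
    # the string puts the x^0 coefficient in the integer's bit 0.
    return int(format(n, "0128b")[::-1], 2)


def reverse_bits_per_byte(block: bytes) -> bytes:
    """Reverse the bit order inside each byte, keeping byte order."""
    return bytes(int(format(b, "08b")[::-1], 2) for b in block)


if __name__ == "__main__":
    one = b"\x01" + bytes(15)
    print(hex(polyval_coeffs(one)), hex(ghash_coeffs(one)))
```

Under these assumptions the same 16 bytes denote different
polynomials: the byte 0x01 in first position is the constant 1 for
POLYVAL but x^7 for GHASH. In this view the two readings differ by a
bit reversal within each byte.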

dot(a,b) is the result of a Montgomery reduction of the product ab
with Montgomery modulus x^128. This is true no matter what
representation of the polynomials is chosen. Most fast GHASH
implementations use Montgomery form internally, due to the advantages
of Montgomery form. POLYVAL thus removes conversions into and out of
Montgomery form. These conversions might be extremely cheap, but they
are not free. If combined with the GHASH weirdness they may be very
cheap indeed, but POLYVAL conversions are not more expensive even if
you have to swap byte order in a word on read.
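
The Montgomery-reduction view of dot(a,b) can be sketched as follows.
This is a bit-serial illustration, not a fast implementation. It
assumes the field polynomial x^128 + x^127 + x^126 + x^121 + 1 from
the draft, represents a polynomial as a Python integer whose bit i is
the coefficient of x^i, and uses hypothetical function names.

```python
# Field polynomial assumed from the draft:
# x^128 + x^127 + x^126 + x^121 + 1
P = (1 << 128) | (1 << 127) | (1 << 126) | (1 << 121) | 1


def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)[x]) product of two polynomials."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def reduce_plain(v: int) -> int:
    """Ordinary reduction: v mod P, clearing bits from the top."""
    for i in range(v.bit_length() - 1, 127, -1):
        if (v >> i) & 1:
            v ^= P << (i - 128)
    return v


def mont_reduce(v: int) -> int:
    """Montgomery reduction: v * x^-128 mod P, one bit per step.
    P has constant term 1, so XORing it clears the low bit and
    makes the division by x exact."""
    for _ in range(128):
        if v & 1:
            v ^= P
        v >>= 1
    return v


def dot(a: int, b: int) -> int:
    """dot(a, b) = a * b * x^-128 mod P."""
    return mont_reduce(clmul(a, b))


# x^128 mod P, the Montgomery representation of 1.
X128 = reduce_plain(1 << 128)

if __name__ == "__main__":
    a = int.from_bytes(bytes(range(1, 17)), "little")
    b = int.from_bytes(bytes(range(17, 33)), "little")
    # Multiplying dot(a, b) back by x^128 recovers a*b mod P.
    lhs = reduce_plain(clmul(dot(a, b), X128))
    print(lhs == reduce_plain(clmul(a, b)))
```

The check at the end multiplies dot(a,b) by x^128 again and compares
with the plainly reduced product, which is the defining property of
the Montgomery form under the assumptions above.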

I'm thus extremely doubtful that your numbers below are correct. They
may be, but I'm skeptical. Benchmarking is really hard, and I know
I've screwed up plenty of these measurements myself.

Sincerely,
Watson Ladd
