Re: [Cfrg] RG Last Call on draft-irtf-cfrg-gcmsiv-06

Watson Ladd <> Mon, 18 September 2017 19:35 UTC

Return-Path: <>
Received: from localhost (localhost []) by (Postfix) with ESMTP id 8C5A81321AC for <>; Mon, 18 Sep 2017 12:35:23 -0700 (PDT)
X-Virus-Scanned: amavisd-new at
X-Spam-Flag: NO
X-Spam-Score: -2
X-Spam-Status: No, score=-2 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_PASS=-0.001] autolearn=ham autolearn_force=no
Authentication-Results: (amavisd-new); dkim=pass (2048-bit key)
Received: from ([]) by localhost ( []) (amavisd-new, port 10024) with ESMTP id MDhwP-n6PMf0 for <>; Mon, 18 Sep 2017 12:35:21 -0700 (PDT)
Received: from ( [IPv6:2607:f8b0:400c:c08::22c]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by (Postfix) with ESMTPS id 6625D1326ED for <>; Mon, 18 Sep 2017 12:35:21 -0700 (PDT)
Received: by with SMTP id s15so951106uag.1 for <>; Mon, 18 Sep 2017 12:35:21 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=sUVAW3TB90U+2XkFOT/gpPAQdVUfesxcpSb8Z4kUWTU=; b=QBB4KFUgrK+5dexoA7mpQqESwVkd8522lv5F7pKBGltg6SiLU3A09dO1zfR//hleCB V+e9Yl6/VRO1PcD8NeGo//3j6pehqhMqEEKQCeBbQeSTE00qX8ZOfZ0Mdbb7POm6KR6r cpLMaxC3yAJxzLjii/3rW7zQJfyIi2ugYyrjgb+pXpXUAtEUuHGhFH1tcu8iHxmyxEMv ziRyh+KONIP/RhviJk77KQUJgz810HDEdky/KrR3EELkL/4fwwKDweaMQuInYqnDRcxC 78+xHKjt7d/DgMPNP9W115i+6srZBgCg3BeKmkY9LiRM7t+sQP3ytWB5NhPqawgy3fkL 8jvw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=sUVAW3TB90U+2XkFOT/gpPAQdVUfesxcpSb8Z4kUWTU=; b=GzULP2R1rtr/3iRHmIpLl2/0dOzAlwC7FR1jiYUPeqhx8iTsGuw/jI2zszDXZNjZVR iseIBHQvkrfuAo7EVeqcTs+JxHsEJcaKwO23T/byIXcFxYNML0dxF3u5pMkK7a/9DtVw 5QLJsWwD/vGafaed7dyAzoD4qaecbboSyshbzd3zaIi3lyogNg4dARQYYU2Y+EhaP5W2 dYLZ5bYkjd50mMAQ692NMbYpoyLw68M1bBX4etEGdb73H7mQ8BHcST5b0Dcyn0zAF9SI joo75DyI+sMFt7d5t22TCa5sQLaetrT5HMMhkAaXck/DXQKs8hSTurlCUrPMmoc+EYKk YKjQ==
X-Gm-Message-State: AHPjjUhtdi/khYmSd3Weira+vH9MTtMyLdearmHsmVVuJgOtYghk91WU trFtnsbfHA4esbSaK07Pd8qqBjD5bvQz1MHDqR98kQ==
X-Google-Smtp-Source: AOwi7QD/lEB6DUsAe//FK1ztr6Jq+G51ZfiGM3g8mUAW2ppNF0ZfiQEEZ1F1jqfY+2i6dXplPP9jMfazQFngMfbXQMo=
X-Received: by with SMTP id l48mr26866409uad.144.1505763320267; Mon, 18 Sep 2017 12:35:20 -0700 (PDT)
MIME-Version: 1.0
Received: by with HTTP; Mon, 18 Sep 2017 12:35:19 -0700 (PDT)
In-Reply-To: <>
References: <> <>
From: Watson Ladd <>
Date: Mon, 18 Sep 2017 12:35:19 -0700
Message-ID: <>
To: Andy Polyakov <>
Cc: "" <>
Content-Type: text/plain; charset="UTF-8"
Archived-At: <>
Subject: Re: [Cfrg] RG Last Call on draft-irtf-cfrg-gcmsiv-06
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Crypto Forum Research Group <>
List-Unsubscribe: <>, <>
List-Archive: <>
List-Post: <>
List-Help: <>
List-Subscribe: <>, <>
X-List-Received-Date: Mon, 18 Sep 2017 19:35:23 -0000

On Mon, Sep 18, 2017 at 10:51 AM, Andy Polyakov <> wrote:
> Hi,
> I apologize for chiming in so late, but I wasn't aware of this endeavour
> till very recently. The question I find myself struggling with is that I
> can't find introduction of new primitives justified, at least not in
> this context. I did my best sieving through
> to locate relevant
> prior comments, so even with those in mind. One of arguments was that
> POLYVAL gets bit order right in comparison to GHASH. Well, comparing the
> two is actually boils down to one question: what is it you choose to
> bit-flip, input or polynomial? [Do note "bit-flip", not "byte-flip".]
> What I'm trying to say is that we have to recognize that POLYVAL is
> simply formulated in terms of GHASH polynomial in reverse bit order. On
> the other hand on more practical level, i.e. from software
> implementation viewpoint, answer to the question is self-obvious, it's
> more efficient to bit-flip single polynomial than each fragment of the
> input. This last thing effectively means that sensible GHASH
> implementation would actually use reverse-bit polynomial (and at least
> OpenSSL implementations all do). This in turn means that the only
> essential difference between GHASH and POLYVAL is byte order. Now, it
> was argued that little-endian byte order gives performance edge [on
> platform of popular choice]. But relevant question in the context is if
> the edge actually justifies introduction of new algorithm. It was
> asserted that it provides 20% improvement and it was attributed to the
> fact that PCLMULQDQ-based GHASH implementations use vector byte swap
> instructions, *one* per block. But we have to recognize that said
> improvement coefficient is actually specific to Skylake. For example on
> Haswell we'll observe ... 0% improvement, on Broadwell - 6%, on Ryzen -
> 3%. Let's even have a look at absolute cycles per processed byte values
> for GHASH:

It's important to not conflate the following 3 distinct things:
- The maximum performance attainable by the best implementation on
each microarchitecture
- The maximum performance attainable by a single implementation on
each microarchitecture
- The maximum performance attainable by the best implementation for a
single implementation strategy on each microarchitecture

It's also important not to conflate the following 4 things:
- The arrangement of limbs in a bignum
- The internal arrangement of bits in a limb
- The arrangement of bits in a byte
- Reduction strategy

I don't get a clear impression from your email what implementations
you used or what strategies and representation.

POLYVAL takes the least significant bit of a byte to be the constant
coefficient. No matter what architecture (big or little endian) you
have, this is clearly the right choice as it is exactly the choice
made when considering a byte to be an integer in the range 0 to 255.
GHASH does something weird I don't recall exactly.

dot(a,b) is the result of a Montgomery reduction of the product ab
with Montgomery modulus x^128. This is true no matter what
representation of the polynomials is chosen. Most fast GHASH
implementations use Montgomery form internally, due to the advantages
of Montgomery form. POLYVAL thus removes conversions into and out of
Montgomery form. These conversions might be extremely cheap, but they
are not free. If combined with the GHASH weirdness they may be very
cheap indeed, but POLYVAL conversions are not more expensive even if
you have to swap byte order in a word on read.

I'm thus extremely doubtful that your numbers below are correct: they
may be. But I'm skeptical. Benchmarking is really hard, and I know
I've screwed up plenty of these measurements myself.

Watson Ladd