Re: [Cfrg] RG Last Call on draft-irtf-cfrg-gcmsiv-06

Andy Polyakov, Tue, 19 September 2017 07:46 UTC

To: Adam Langley

>> But it's also the case that little-endian machines now dominate. It's
>> nice that some big cores can largely hide the cost of doing the byte
>> swap, but hiding work is not the same as avoiding it (as your Skylake
>> numbers show) and smaller cores may not be able to do the
>> out-of-order, superscalar magic needed.
> The assertion that a 20% improvement is an anomaly rather than the rule has
> less to do with cores being "little" or "big". And I feel that the logic
> is getting twisted here. "Big" cores are considered more likely to
> amortize the additional cost of a byte swap, right? Skylake is a "big" core
> and it does have the computational resources to do so. And it fails miserably.

I think I misinterpreted the original remark, as my reply now, in the
morning, appears orthogonal to it. Yes, some processors will fail to
amortize the cost of the byte swap, but the point is that even when they
fail, the unamortized cost *customarily* won't be anywhere near 20%. Yes,
avoiding the byte swap would remove the question altogether, but the
question remains whether the gain (and we should be talking about
non-anomalous cases) justifies introducing new primitives. Yes, they might
appear more elegant to a little-endian eye, but performance and elegance
are not the only terms in the equation. Not to mention that every term has
a weight coefficient. Performance's weight is the improvement coefficient
(6% on Skylake, none to effectively negligible on others, and negative for
those who choose the easy way out(*)), and elegance's weight is...

(*) As mentioned earlier, the "easy way out" refers to byte-swapping the
GHASH input and output, and building the modified CTR mode from a
single-block cipher subroutine, as opposed to investing in high-performance
code that would have to be dedicated, even if unstitched.
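For illustration, here is a minimal, unoptimized Python sketch of that "easy way out". It is not taken from the draft or from any library, and it is neither fast nor constant-time. POLYVAL is computed once directly and once by byte-reversing GHASH input and output, with the key additionally multiplied by x. This is the POLYVAL/GHASH relation as later published in RFC 8452, Appendix A. The modified CTR mode (initial counter block = tag with the top bit of the last byte set, then a 32-bit little-endian counter in the first four bytes) is driven by a caller-supplied single-block cipher function. All names (`polyval_via_ghash`, `ctr_gcmsiv`, `encrypt_block`) are hypothetical. `encrypt_block` is an assumed stand-in for a real AES block encryption, which the Python standard library does not provide.

```python
from typing import Callable, List

# GHASH field: x^128 + x^7 + x^2 + x + 1, bit-reflected (block read as big-endian int)
R = 0xE1 << 120

def mulx_ghash(v: int) -> int:
    """Multiply by x in the GHASH field (a right shift in this representation)."""
    return (v >> 1) ^ R if v & 1 else v >> 1

def ghash_mul(x: int, y: int) -> int:
    """Bitwise GHASH multiplication, following NIST SP 800-38D Algorithm 1."""
    z = 0
    for i in range(128):
        if (x >> (127 - i)) & 1:
            z ^= y
        y = mulx_ghash(y)
    return z

def ghash(h: bytes, blocks: List[bytes]) -> bytes:
    hh, y = int.from_bytes(h, "big"), 0
    for blk in blocks:
        y = ghash_mul(y ^ int.from_bytes(blk, "big"), hh)
    return y.to_bytes(16, "big")

# POLYVAL field: x^128 + x^127 + x^126 + x^121 + 1 (block read as little-endian int)
P = (1 << 128) | (1 << 127) | (1 << 126) | (1 << 121) | 1

def polyval_dot(a: int, b: int) -> int:
    """dot(a, b) = a * b * x^-128 mod P: carry-less multiply, then divide by x 128 times."""
    c = 0
    while b:
        if b & 1:
            c ^= a
        a <<= 1
        b >>= 1
    for _ in range(128):
        if c & 1:
            c ^= P  # adding P clears the low bit, so the shift is an exact division by x
        c >>= 1
    return c

def polyval(h: bytes, blocks: List[bytes]) -> bytes:
    hh, s = int.from_bytes(h, "little"), 0
    for blk in blocks:
        s = polyval_dot(s ^ int.from_bytes(blk, "little"), hh)
    return s.to_bytes(16, "little")

def polyval_via_ghash(h: bytes, blocks: List[bytes]) -> bytes:
    """POLYVAL(H, X) = ByteReverse(GHASH(mulX_GHASH(ByteReverse(H)), ByteReverse(X_i)...))."""
    h_ghash = mulx_ghash(int.from_bytes(h[::-1], "big")).to_bytes(16, "big")
    return ghash(h_ghash, [blk[::-1] for blk in blocks])[::-1]

# Modified CTR built on a hypothetical single-block cipher callback
def ctr_gcmsiv(encrypt_block: Callable[[bytes], bytes], tag: bytes, data: bytes) -> bytes:
    block = bytearray(tag)
    block[15] |= 0x80  # initial counter block: tag with top bit of last byte set
    out = bytearray()
    for off in range(0, len(data), 16):
        ks = encrypt_block(bytes(block))  # one block-cipher call per 16 bytes
        out += bytes(d ^ k for d, k in zip(data[off:off + 16], ks))
        # first four bytes are a little-endian counter, wrapping modulo 2^32
        ctr = (int.from_bytes(block[:4], "little") + 1) & 0xFFFFFFFF
        block[:4] = ctr.to_bytes(4, "little")
    return bytes(out)
```

The `[::-1]` reversals in `polyval_via_ghash` are exactly the byte swaps whose cost is being weighed here. A dedicated little-endian POLYVAL implementation skips them, and that is where any speed-up would come from.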