Re: [Cfrg] RG Last Call on draft-irtf-cfrg-gcmsiv-06

Andy Polyakov <> Mon, 18 September 2017 22:33 UTC

To: Adam Langley <>
From: Andy Polyakov <>
Date: Tue, 19 Sep 2017 00:33:34 +0200
List-Id: Crypto Forum Research Group <>

>> Well, it's not like I actually reject the sentiment to favour
>> little-endian platform[s], question rather is if *mode* specification is
>> right place for it. Protocols are about *communication* and there are
>> more factors in play, most notably using already available primitives
>> would facilitate adoption. And if little-endian-centric primitives are
>> deemed desirable, wouldn't it be more appropriate to define them
>> separately to promote modularity and re-usability?
> Anyone who has implemented GHASH has had to wrestle with the
> specification's handling of bit ordering, it's confusing beyond
> questions of little- vs big-endian.

As mentioned in my first message, it's all about what you bit-flip:
the polynomial or the input. Or rather, how you view the polynomial. If the
GCM specification is considered confusing (I'm not saying that it isn't),
then why not address *that*? By re-specifying it with bit-flipped

> But it's also the case that little-endian machines now dominate. It's
> nice that some big cores can largely hide the cost of doing the byte
> swap, but hiding work is not the same as avoiding it (as your Skylake
> numbers show) and smaller cores may not be able to do the
> out-of-order, superscalar magic needed.

The assertion that the 20% improvement is an anomaly rather than the rule
has little to do with cores being "little" or "big". And I feel that the
logic is getting twisted here. "Big" cores are considered more likely to
amortize the additional cost of the byte swap, right? Skylake is a "big"
core and it does have the computational resources to do so. And it fails
miserably.

> POLYVAL is GHASH done right
> for the world as we find it.

Then let's say exactly that. Let's specify it as a re-usable primitive.
Does it actually have to be interwoven into *this* or *any* specific
mode specification?

> As for diversity: that's a cost. Although it's smaller here than in
> many cases. Where the diversity costs really bubble up is when they
> spread up the stack and we have, say, RSA-with-SHAKE128 to test and
> validate everywhere.

I don't quite follow. This is rather an argument against introducing new
primitives, because as far as testing and validation go, there is no
difference between these cases. I mean, you get more to test in both
cases. But there would be less to test and validate if the suggested mode
used existing primitives.

> I think the diversity cost of AES-GCM-SIV is
> concentrated in its very existence (i.e. that it's not AES-GCM). For
> cases where AES-GCM-SIV isn't being written as a stitched asm blob
> (where the full implementation cost of either POLYVAL or GHASH is
> paid anyway), BoringSSL uses the mapping to GHASH to avoid another
> implementation, as specified in the draft.

I sense a contradiction here. It's argued that the alternative primitives
provide better performance. But attaining the claimed performance *will*
take dedicated modules, i.e. significantly increase diversity, even
without a stitched implementation. If the expectation is that the majority
will take the easy way out (byte-swap the GHASH input and output, and
build the modified CTR from a single-block cipher subroutine), then the
question of whether the alternative primitives are well justified becomes
even more acute...
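
For illustration, the "easy way out" for the hash can be sketched in
Python roughly as follows. This is a minimal sketch under my reading of
the draft's POLYVAL-to-GHASH mapping: byte-reverse the key and every
input block, multiply the reversed key by x in GHASH's field, run GHASH,
and byte-reverse the result. All function names are hypothetical, and the
bit-by-bit multiplications are meant for checking the algebra, not for
speed or constant-time behaviour.

```python
# Hypothetical helper names; a sketch, not a production implementation.
R_GHASH = 0xE1 << 120  # GHASH reduction constant (bit-reflected view)
# POLYVAL modulus: x^128 + x^127 + x^126 + x^121 + 1
P_POLYVAL = (1 << 128) | (1 << 127) | (1 << 126) | (1 << 121) | 1


def mulx_ghash(v: int) -> int:
    """Multiply by x in GHASH's field (block read as a big-endian int)."""
    return (v >> 1) ^ (R_GHASH if v & 1 else 0)


def ghash_mul(x: int, y: int) -> int:
    """Field multiplication in GHASH's convention (MSB is the x^0 term)."""
    z = 0
    for i in range(127, -1, -1):
        if (x >> i) & 1:
            z ^= y
        y = mulx_ghash(y)
    return z


def polyval_dot(a: int, b: int) -> int:
    """POLYVAL's dot(a, b) = a * b * x^-128 (blocks read little-endian)."""
    z = 0
    for i in range(128):
        if (b >> i) & 1:
            z ^= a
        if z & 1:          # make z divisible by x, then divide by x
            z ^= P_POLYVAL
        z >>= 1
    return z


def ghash(h: bytes, blocks: list) -> bytes:
    hk, y = int.from_bytes(h, "big"), 0
    for blk in blocks:
        y = ghash_mul(y ^ int.from_bytes(blk, "big"), hk)
    return y.to_bytes(16, "big")


def polyval(h: bytes, blocks: list) -> bytes:
    hk, s = int.from_bytes(h, "little"), 0
    for blk in blocks:
        s = polyval_dot(s ^ int.from_bytes(blk, "little"), hk)
    return s.to_bytes(16, "little")


def polyval_via_ghash(h: bytes, blocks: list) -> bytes:
    """POLYVAL computed with byte swaps around an existing GHASH."""
    key = mulx_ghash(int.from_bytes(h[::-1], "big")).to_bytes(16, "big")
    return ghash(key, [blk[::-1] for blk in blocks])[::-1]
```

With a 16-byte key h and a list of 16-byte blocks xs, polyval(h, xs) and
polyval_via_ghash(h, xs) should agree. The only extra work in the second
path is the byte reversal of each block plus a one-time key adjustment,
which is exactly the cost being argued about above.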

[On a side note, as for the stitched implementation, the core function in should be reusable if GHASH and standard CTR were used.]
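
To make the "modified CTR" versus "standard CTR" distinction concrete,
here is a minimal Python sketch. It assumes, per my reading of the draft,
that the modified mode increments the first 32 bits of the counter block
as a little-endian integer, whereas GCM's CTR increments the last 32 bits
as a big-endian integer. The function names are hypothetical, and
encrypt_block stands for whatever single-block cipher subroutine is
available; it is a parameter here, not a real library call.

```python
def inc32_gcm(block: bytes) -> bytes:
    """GCM-style step: last 4 bytes are a big-endian counter, mod 2**32."""
    ctr = (int.from_bytes(block[12:], "big") + 1) & 0xFFFFFFFF
    return block[:12] + ctr.to_bytes(4, "big")


def inc32_gcmsiv(block: bytes) -> bytes:
    """Assumed GCM-SIV-style step: first 4 bytes little-endian, mod 2**32."""
    ctr = (int.from_bytes(block[:4], "little") + 1) & 0xFFFFFFFF
    return ctr.to_bytes(4, "little") + block[4:]


def ctr_keystream(encrypt_block, initial: bytes, inc, nblocks: int) -> bytes:
    """CTR keystream built from a single-block cipher subroutine.

    encrypt_block: hypothetical callable mapping 16 bytes to 16 bytes.
    inc: counter-increment function, e.g. inc32_gcm or inc32_gcmsiv.
    """
    out, ctr = b"", initial
    for _ in range(nblocks):
        out += encrypt_block(ctr)
        ctr = inc(ctr)
    return out
```

The loop is identical in both cases and only the increment differs,
which is why an existing optimized CTR routine cannot be dropped in
unchanged while a block-at-a-time construction can.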