Re: [Cfrg] AES GCM SIV analysis

On Mon, Jan 30, 2017 at 1:58 PM, Alex Cope <alexcope@google.com> wrote:
> I'm also unconvinced that defining POLYVAL as modulo x^128 + x^127 + x^126 +
> x^121 + 1 is better than defining it modulo x^128 + x^7 + x^2 + x + 1.
>
> My reasons for thinking that  x^128 + x^7 + x^2 + x + 1 is a better reducing
> polynomial are:
> 1) On X86, no performant implementation of POLYVAL will reduce to GHASH, as
> that would result in unnecessarily byte swapping twice then calling
> PCLMULQDQ. The same seems true for any implementation in a little-endian
> machine.

I think it is important to look at how endianness matters in
polynomial arithmetic. GHASH takes the least significant bit of a byte
to be x^7, then takes a sequence of bytes (a_0, a_1, ..., a_15) to
represent a_0 + a_1*x^8 + a_2*x^16 + ... + a_15*x^120. This is
"little-endian": the least significant byte is at the lowest index.
However, it is a choice independent of the representation and layout
of words in memory.
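
To make the convention concrete, here is a minimal Python sketch of the
byte-to-polynomial mapping described above. The function names are mine
and purely illustrative; integers stand in for polynomials, with bit k
holding the coefficient of x^k. For contrast it also shows the POLYVAL
convention from the draft, which takes the least significant bit of the
first byte to be x^0.

```python
def ghash_block_to_poly(block: bytes) -> int:
    """Map a 16-byte GHASH block to a polynomial (bit k = coeff of x^k).

    Within each byte the most significant bit is the lowest power, so
    the least significant bit of byte i is the coefficient of x^(8*i+7).
    """
    if len(block) != 16:
        raise ValueError("GHASH blocks are 16 bytes")
    poly = 0
    for i, byte in enumerate(block):
        for j in range(8):
            # Bit with weight 2**(7-j) in byte i is the coeff of x^(8*i+j).
            if (byte >> (7 - j)) & 1:
                poly |= 1 << (8 * i + j)
    return poly


def polyval_block_to_poly(block: bytes) -> int:
    """Map a 16-byte POLYVAL block to a polynomial.

    Here the least significant bit of the first byte is x^0, which is
    exactly a little-endian integer load.
    """
    if len(block) != 16:
        raise ValueError("POLYVAL blocks are 16 bytes")
    return int.from_bytes(block, "little")
```

Under these conventions the field element 1 is the block 80 00 ... 00
for GHASH and 01 00 ... 00 for POLYVAL. Neither mapping says anything
about how a machine lays out words in memory.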

Intel's implementation of GHASH does not reflect bytes, nor is byte
reflection sufficient in a naive implementation. For details, see:
https://software.intel.com/sites/default/files/m/4/1/2/2/c/1230-Carry-Less-Multiplication-and-The-GCM-Mode_WP_.pdf

It's true that the specified polynomial doesn't have a Barrett-friendly
form. But the division by x^128 falls out naturally from a
Montgomery-based approach: precompute H*x^-8, perform a
multiplication, add a multiple of p(x) to clear out the terms from x^0
to x^120, then shift down by a whole number of bytes. Since p(x) can
be written as (x^8+x^7+x^6+x^1)*x^120+1, the multiplication by p(x)
doesn't need that many invocations of multiply. (Also, the operations
are linear: there are many ways to reduce the number of reductions.)
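
A rough Python sketch of the idea, with integers as polynomials (bit k
is the coefficient of x^k). All names here (clmul, poly_mod, fold,
mont_mul) are hypothetical helpers of mine, not anyone's API. It
assumes that "clear out from x^0 to x^120" means clearing the low 120
bits and shifting down by 15 bytes, and it also shows the common
variant with two 64-bit folds. It is written for clarity, is not
constant time, and is not meant as a real implementation.

```python
HIGH = (1 << 8) | (1 << 7) | (1 << 6) | (1 << 1)  # x^8 + x^7 + x^6 + x
P = (HIGH << 120) ^ 1                             # p(x), degree 128


def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)[x]) product of two polynomials."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def poly_mod(a: int, m: int) -> int:
    """Remainder of a modulo m by plain long division (reference only)."""
    dm = m.bit_length()
    while a.bit_length() >= dm:
        a ^= m << (a.bit_length() - dm)
    return a


def fold(t: int, w: int) -> int:
    """One Montgomery step: return (t + m*p) / x^w with m = t mod x^w.

    Because p = 1 (mod x^121), m is just the low w bits for w <= 121,
    and m*p = m + (m*HIGH)*x^120 costs one multiply by a 9-bit constant.
    """
    assert w <= 121
    m = t & ((1 << w) - 1)
    t ^= m ^ (clmul(m, HIGH) << 120)  # low w bits are now zero
    return t >> w


def mont_mul(a: int, b: int) -> int:
    """a*b*x^-128 mod p via two 64-bit folds; result has degree < 128."""
    return fold(fold(clmul(a, b), 64), 64)


# x^-1 mod p: p = x*q + 1, so x*q = 1 (mod p) with q = (p + 1) / x.
X_INV = (P ^ 1) >> 1
X_INV8 = 1
for _ in range(8):
    X_INV8 = poly_mod(clmul(X_INV8, X_INV), P)  # ends as x^-8 mod p

a = 0x0123456789ABCDEF0123456789ABCDEF
h = 0xFEDCBA9876543210FEDCBA9876543210
h8 = poly_mod(clmul(h, X_INV8), P)  # precomputed H*x^-8

# Single 120-bit fold: the result can have degree up to 134, so one
# more small reduction is still needed (done here with poly_mod).
single = poly_mod(fold(clmul(a, h8), 120), P)
```

In this sketch, single and mont_mul(a, h) are the same field element,
namely a*H*x^-128 mod p(x).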

> 2) If the concern is making a less performance-sensitive implementation of
> GCM-SIV easier to write given a GCM implementation, the relationship between
> POLYVAL as currently defined and GHASH does help slightly. However,
> implementing finite field multiplication with POLYVAL's byte ordering modulo
> x^128 + x^7 + x^2 + x + 1 given a reference GHASH implementation is quite
> easy*, and most accidental errors in such implementation will be detected by
> test vectors.  Thus I don't think the current design choice makes GCM-SIV
> substantially easier to implement correctly.
> 3) x^128 + x^7 + x^2 + x + 1 is more 'natural' as the lexicographically first
> irreducible polynomial.  It seems likely that future work that relies on
> finite field multiplication will opt to use the byte ordering of POLYVAL
> because it makes the most sense on little endian machines, and will also
> reduce modulo  x^128 + x^7 + x^2 + x + 1.  I doubt anyone else will want to
> use GHASH as currently defined, but if it were defined modulo x^128 + x^7 +
> x^2 + x + 1, the implementation could be nicely reused. I think that in the
> long term this will lead to more code reuse than having POLYVAL as a one off
> finite field representation.
>
> If you are strongly concerned reusing dedicated hardware or big-endian
> implementations of GHASH then the current design makes some sense as a
> compromise, but I'm unaware of such requirements.

I don't know what "big-endian" is supposed to mean here. But yes,
reusing polynomial- and field-specific hardware is important on devices
that aren't Intel chips, many of which will want to use hardcoded
keys and may not be able to generate nonces effectively.

>
> Regards
> -Alex
>
> *I did it recently for a table-based implementation. You can look at the
> patch here: https://patchwork.kernel.org/patch/9428397/
>
> On Fri, Jan 27, 2017 at 12:21 PM, Adam Langley <agl@imperialviolet.org>
> wrote:
>>
>> On Thu, Jan 26, 2017 at 6:26 PM, Dan Harkins <dharkins@lounge.org> wrote:
>> >   But that is the definition used in the seminal work on the matter,
>> > [1].
>> > If you want to have a different notion concerning a lesser restriction
>> > on
>> > nonce reuse then you should use a different term.
>>
>> That paper formalises an advantage that an attacker might have over an
>> ideal scheme. In the same way that block ciphers aren't ideal PRFs,
>> nonce-misuse-resistant schemes aren't hitting that ideal either. RFC
>> 5297 doesn't hit it, at minimum because it'll run out of counter-space
>> after enough messages. AES-GCM-SIV isn't hitting it either.
>>
>> But that doesn't mean that they aren't practically useful.
>>
>> I'm not sure what an ideal NMR AEAD would look like, but it's probably
>> quite different to both RFC 5297 and AES-GCM-SIV and probably looks
>> like a wide-block construction. If someone can point at something they
>> think does hit it, that would be interesting to me at least. (Although
>> perhaps it's well trodden ground for those who are more familiar with
>> the literature than I.)
>>
>> >   Which brings up a question I've resisted asking: Why are you doing
>> > this?
>> >
>> >   If you want to have an AEAD scheme that is nonce-misuse resistant that
>> > can use a fast(er) authentication scheme then why not just do RFC 5297
>> > with GHASH instead of AES-CMAC?
>>
>> AES-GCM has led to a state of the world where our large machines,
>> which need to encrypt and decrypt lots of data, end up having hardware
>> support for AES and GF(2^128) operations. We would like to take
>> advantage of that because it's hard to beat the speed and power
>> efficiency of dedicated hardware, but sometimes we want not to have to
>> worry about nonces.
>>
>> > You're defining a new irreducible
>> > polynomial that, to my knowledge, is not in existing hardware the way
>> > that PCLMULQDQ using x^128 + x^7 + x^2 + x + 1, is in Intel chips.
>>
>> PCLMULQDQ (and other hardware implementations, to my knowledge) is not
>> specific to that polynomial. Also, the polynomial in AES-GCM-SIV is
>> that polynomial, just with some ordering oddities addressed. See
>> https://tools.ietf.org/html/draft-irtf-cfrg-gcmsiv-03#appendix-A.
>>
>> > You're defining a(nother) KDF /inside/ the cipher mode itself instead of
>> > just letting a KDF, which all users of AES-GCM-SIV will use, generate a
>> > double-wide key. And I don't see the reason for either.
>>
>> Our KDF is per-nonce; it's not the same as having a double-width key
>> and partitioning it internally. We do this in order to get better
>> bounds when encrypting very large numbers of messages.
>>
>>
>> Cheers
>>
>> AGL
>>
>> _______________________________________________
>> Cfrg mailing list
>> Cfrg@irtf.org
>> https://www.irtf.org/mailman/listinfo/cfrg

-- 
"Man is born free, but everywhere he is in chains".
--Rousseau.