[Cfrg] Reviews of AES-GCM-SIV (draft-irtf-cfrg-gcmsiv-04.txt)

"Paterson, Kenny" <Kenny.Paterson@rhul.ac.uk> Wed, 05 July 2017 07:43 UTC

From: "Paterson, Kenny" <Kenny.Paterson@rhul.ac.uk>
To: Shay Gueron <shay.gueron@gmail.com>, Adam Langley <agl@imperialviolet.org>, Yehuda Lindell <Yehuda.Lindell@biu.ac.il>
CC: "cfrg@irtf.org" <cfrg@irtf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cfrg/Hkp2l5_x_vx7SO1HSmWV6Hvt6zM>

Dear Adam, Shay and Yehuda,

We've now received 3 reviews on draft-irtf-cfrg-gcmsiv-04.txt from the
CFRG crypto review panel, see below - thanks to Scott Fluhrer, Tibor Jager
and Bjoern Tackmann for producing these.

Please read these reviews carefully and take them into account in the next
version of your draft. It would be helpful if you could respond on list as
to how you have handled the various comments.

Note that you will not be able to post a new draft for a while because of
the cutoff in advance of the forthcoming IETF meeting.

Once the next version is available, we will go to last call.

Best wishes,

Kenny (for the chairs)


==============================
Reviewer: Scott Fluhrer

Summary: Almost Ready

Major Concerns:

None - from a security perspective, it looks pretty good


Minor Concern:

One thing that may be problematic for an implementor is the nonces listed
in the test vectors.  AES-GCM-SIV takes 12-byte nonces, but the nonces
listed are 16 bytes long.  While this is unlikely to be a major source of
confusion for an implementor, I suggest that the nonces be trimmed down
before publishing.


Nits:

The encryption/decryption algorithms are given using fairly terse English
descriptions.  While they are moderately clear to me, I wonder if they'd
be as clear to someone else; I'm wondering if a pseudocode description
would work better?  On the other hand, the test vectors give intermediate
cipherstates (which would help a lot).

While the test vectors are quite good in general, they use only one nonce,
and two keys (one for each key length).  While there is certainly value in
reusing the same key and nonce for slightly different plaintexts/AADs, I
believe there would also be value in showing how different keys/nonces
work.  One example: at one point, AES-GCM-SIV XORs the nonce into the
POLYVAL result.  If someone wrote an incorrect implementation where (say)
they XORed only the first 4 or 8 bytes of the nonce, the current
test vectors would still pass.
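
To illustrate the gap, here is a minimal sketch in Python. It assumes the
nonce is XORed into the leading 12 bytes of a 16-byte POLYVAL result; the
helper name and all sample values are hypothetical and not taken from the
draft. A nonce whose trailing bytes are all zero cannot distinguish a full
XOR from a truncated one, whereas a nonce with non-zero bytes throughout
can.

```python
def xor_nonce(polyval_result: bytes, nonce: bytes, nbytes: int = 12) -> bytes:
    """XOR the first `nbytes` bytes of the nonce into the POLYVAL result.

    nbytes=12 models the intended behaviour; a smaller value models the
    buggy implementation described above.
    """
    out = bytearray(polyval_result)
    for i in range(nbytes):
        out[i] ^= nonce[i]
    return bytes(out)


# Hypothetical 16-byte POLYVAL output (illustrative values only).
s = bytes(range(0x10, 0x20))

sparse_nonce = bytes([0x03]) + bytes(11)  # only the first byte is non-zero
dense_nonce = bytes(range(1, 13))         # every byte is non-zero

# The truncated XOR matches the full XOR for the sparse nonce ...
sparse_hides_bug = xor_nonce(s, sparse_nonce, 4) == xor_nonce(s, sparse_nonce, 12)
# ... but differs from it for the dense nonce, so a test vector would fail.
dense_catches_bug = xor_nonce(s, dense_nonce, 4) != xor_nonce(s, dense_nonce, 12)
```

Test vectors with several keys and nonces of the second kind would catch
this class of mistake.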

The bytes in the test vector are listed LSB to MSB.  This rather assumes
that the implementor is using little-endian byte ordering; I would suggest
that this be changed to a more endian-neutral notation (possibly by just
omitting the LSB and MSB labels).



================================================================================

Reviewer: Tibor Jager

Summary: almost ready

========================================
Major concerns:
========================================

none

========================================
Minor comments and recommendations:
========================================


The draft is somewhat unclear whether plaintexts and additional data
(AD) are always assumed to be given as *byte*-strings, or whether it is
also possible to encrypt arbitrary-length *bit*-strings whose length is
not a multiple of 8.


On the one hand, at the beginning of Section 4 it is clearly stated that
the encryption algorithm takes "arbitrary-length plaintext & additional
data *byte*-strings".

On the other hand, it would then be somewhat more intuitive and
consistent to include
the *byte*-length of plaintext and AD in the length block. The current
draft includes the bit-length. (This is of course technically fine and
essentially just a different notation, but *could* be misleading for
developers.) Also the example in Section 8 mentions the bit-length of
plaintext/AD.

Hence, I suggest stating this assumption about the plaintext
length more clearly, even if it seems quite standard and will most
likely hold for most applications anyway.

In the worst case, if a developer misunderstands this and allows
encryption of arbitrary bit-length plaintext and AD, then this may
make it possible to break the integrity of ciphertexts (at least in
theory), if the bytelen() function is implemented in the natural way.

Let C = Enc(k,m,d,n) be a ciphertext, encrypting plaintext m with key k,
nonce n, and additional data d. Suppose that |d| = 7 bits, and that
bytelen() is implemented such that bytelen(d) = 1 (which seems natural,
even though bytelen(d) = 7/8 would actually be correct). Note that the
encryption algorithm pads d with zeroes to a multiple of 16 bytes before
it is processed by POLYVAL, such that in particular it holds that

  C = Enc(k,m,d,n) = Enc(k,m,d||0,n)

and the decryption algorithm accepts both d and d||0 as "valid"
additional data for C.

Of course this attack is rather theoretical, but it can easily be
avoided by either including the precise *bit*-length of plaintext and AD
into the length block, or by letting the encryption algorithm abort, if
the lengths of plaintexts or AD are not a multiple of 8 bits (and one
could ignore this check in applications where this is guaranteed by the
environment - but this is of course something that only the application
developer can decide).
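
The collision can be reproduced with a small Python sketch. Everything in
it is an assumption made for illustration: bit strings are modelled as
(data, nbits) pairs with unused trailing bits set to zero, the length
block is taken to be two little-endian 64-bit integers, and all function
names are hypothetical. It only builds the padded AD, the padded
plaintext, and the length block; it does not compute POLYVAL or any
encryption.

```python
import struct

BLOCK = 16


def pad16(data: bytes) -> bytes:
    """Zero-pad data to a multiple of 16 bytes."""
    return data + bytes(-len(data) % BLOCK)


def bytelen_rounded_up(nbits: int) -> int:
    """The 'natural' bytelen(): bytes needed to hold nbits bits."""
    return (nbits + 7) // 8


def polyval_input(ad, pt, exact_bits: bool) -> bytes:
    """Return padded AD || padded plaintext || length block.

    ad and pt are (data, nbits) pairs. With exact_bits=False the length
    block holds bytelen() * 8; with exact_bits=True it holds the true
    bit lengths.
    """
    (ad_bytes, ad_bits), (pt_bytes, pt_bits) = ad, pt
    if exact_bits:
        lengths = (ad_bits, pt_bits)
    else:
        lengths = (bytelen_rounded_up(ad_bits) * 8,
                   bytelen_rounded_up(pt_bits) * 8)
    length_block = struct.pack("<QQ", *lengths)
    return pad16(ad_bytes) + pad16(pt_bytes) + length_block


pt = (b"message", 56)
d7 = (b"\xa4", 7)  # 7-bit AD 1010010, stored with the unused bit zero
d8 = (b"\xa4", 8)  # the same bits followed by a 0 bit, i.e. d||0

# With the rounded-up bytelen(), both ADs give the same authenticated input.
collides = polyval_input(d7, pt, exact_bits=False) == polyval_input(d8, pt, exact_bits=False)
# With exact bit lengths in the length block, the inputs differ.
separated = polyval_input(d7, pt, exact_bits=True) != polyval_input(d8, pt, exact_bits=True)
```

Rejecting inputs whose length is not a multiple of 8 bits, the other fix
suggested above, would rule out d7 before this point is ever reached.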


========================================
Nitpicking:
========================================

Section 1 "Introduction", 1st paragraph: I suggest replacing
  "...that is easier for practitioners to use correctly."
with
  "that is easier to use correctly."


In Section 4, first paragraph, the text suggests that plaintexts and
additional data of arbitrary length can be encrypted. However, the
description of the decryption procedure in Section 5 rejects ciphertexts
of size larger than 2^36+16 bytes, and Section 6 gives upper bounds on
the plaintext and AD sizes P_MAX and A_MAX.


In Section 4, last paragraph, the result of encryption is the "resulting
ciphertext ... followed by the tag". Thus, in this notation, the tag is
not part of the "ciphertext", but it is separate and sent along with the
ciphertext.
However, at the beginning of Section 5, the decryption algorithm receives
as input a key, nonce, AD, and a ciphertext, and the ciphertext is split
into the encrypted plaintext and the tag, thus the "ciphertext" contains
the tag here. One could unify this by always considering the tag as part
of the ciphertext.
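
Under the unified convention, the first step of decryption could look
like the following Python sketch. The 16-byte tag length is inferred from
the 2^36+16 bound quoted above, and the function and constant names are
hypothetical.

```python
from typing import Tuple

TAG_LEN = 16                         # tag size in bytes (assumed)
MAX_CIPHERTEXT_LEN = 2**36 + TAG_LEN  # bound quoted from Section 5


def split_ciphertext(ciphertext: bytes) -> Tuple[bytes, bytes]:
    """Split a tag-inclusive ciphertext into (encrypted plaintext, tag)."""
    if len(ciphertext) < TAG_LEN:
        raise ValueError("ciphertext is shorter than the tag")
    if len(ciphertext) > MAX_CIPHERTEXT_LEN:
        raise ValueError("ciphertext exceeds the maximum length")
    return ciphertext[:-TAG_LEN], ciphertext[-TAG_LEN:]
```

With this convention the word "ciphertext" always means the value that
goes on the wire, and the length check sits next to the split.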


Section 8, very very nitpicking: One could mention here that the
plaintexts are the bit strings corresponding to the *ASCII encoding* of
"Hello world" and "example".


Section 8, 5th paragraph, again very nitpicking: Some developers may
have difficulties in understanding immediately which numbers are given
in hexadecimal notation, and which in decimal notation. For clarity, one
could write here something like:
"example": 7 characters = 56 bits = 0x38 bits
"Hello world": 11 characters = 88 bits = 0x58 bits
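
These figures can be checked mechanically. A small Python sketch,
assuming ASCII encoding as suggested above (the function name is made up
for illustration):

```python
def describe_length(text: str) -> str:
    """Report the length of an ASCII string in characters and bits."""
    encoded = text.encode("ascii")
    nbits = len(encoded) * 8
    return f'"{text}": {len(encoded)} characters = {nbits} bits = {nbits:#x} bits'


for text in ("example", "Hello world"):
    print(describe_length(text))
# prints:
# "example": 7 characters = 56 bits = 0x38 bits
# "Hello world": 11 characters = 88 bits = 0x58 bits
```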


Section 9, 7th paragraph: "Suzuki et al. [multibirthday]", the reference
lists Kazuhiro as first author, so it seems this should be Kazuhiro et al.


I did not check the test vectors.


Regarding Scott's comment on the verbal description of the encryption
and decryption algorithms: I had the same impression, some pseudocode
may be helpful to clarify what is happening here.


Apart from the above minor comments, I think that this is an excellent
RFC, which is very clear, precise, easy to understand, and
readable. The large number of test vectors will certainly be
considered very helpful by many implementers. I think it is very useful
to have a nonce misuse-resistant encryption scheme defined in an RFC, in
particular if it is as competitive with weaker solutions regarding
implementation difficulty and computational efficiency as this one.

================================================================================

Reviewer: Bjoern Tackmann

Summary: Almost ready

Major issues: none

Minor issues:

I found the description of the encryption method comprehensible but a bit
too informal. Perhaps a pseudocode description would be easier to
understand?

In that description, I stumbled particularly over the “_length block_” -
why the underscores? Also, I think it would make sense to clarify
that the 32-bit counter value for the CTR part is explicitly
allowed to overflow.
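
A sketch of what "allowed to overflow" would mean in code, in Python. The
position and byte order of the counter (first four bytes of the block,
little-endian) are assumptions made for this example, and the function
name is hypothetical.

```python
import struct


def increment_counter_block(block: bytes) -> bytes:
    """Advance the 32-bit counter in a 16-byte CTR block, wrapping on overflow.

    Assumption: the counter occupies the first four bytes, little-endian.
    The remaining twelve bytes are never touched, even on wrap-around.
    """
    if len(block) != 16:
        raise ValueError("counter block must be 16 bytes")
    (counter,) = struct.unpack_from("<I", block, 0)
    counter = (counter + 1) & 0xFFFFFFFF  # wrap to zero instead of carrying
    return struct.pack("<I", counter) + block[4:]
```

The point worth stating in the draft is the masking step: the carry out of
the 32-bit counter is discarded and does not propagate into the rest of
the block.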

One aspect that read as a bit arbitrary to me was the domain separation for
AES by setting the first bit of the last byte to 1/0, and in particular
that seemed a bit wasteful since it is evaluated only once with the bit
set to 0. Wouldn't it be simpler (and possibly even with better security,
but at the cost of limiting the length to 2^32 - 1 blocks) to not fiddle
with the bits but just use the first counter-block to compute the tag?
Note that this would certainly require re-doing some part of the security
analysis, and the gains seem moderate, so I’m fine with this comment being
discarded. (Just wanted to make sure it is discarded consciously.)
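
For readers without the draft at hand, the bit manipulation in question
can be sketched in Python as follows. It assumes that "first bit" means
the most significant bit of the last byte of a 16-byte block; the function
names are hypothetical.

```python
def as_tag_input(block: bytes) -> bytes:
    """Clear the top bit of the last byte (the single AES call for the tag)."""
    return block[:15] + bytes([block[15] & 0x7F])


def as_counter_block(block: bytes) -> bytes:
    """Set the top bit of the last byte (AES calls in the CTR part)."""
    return block[:15] + bytes([block[15] | 0x80])
```

Because the two kinds of AES input always differ in that one bit, no tag
input can coincide with a counter block, which is the separation the
alternative proposed above would have to achieve by other means.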

After discussion with the other reviewers: The draft is ambiguous with
respect to byte-strings vs. bit-strings. My interpretation, stemming from
the beginning of Section 4 explicitly mentioning byte-strings, is that all
strings must be properly byte-aligned. But given that, using
bytelen(additional_data) * 8 and bytelen(plaintext) * 8, i.e.
bit-length, is confusing. If the draft is supposed to deal with
byte-aligned strings only (which appears practically sensible), then this
should be made clear in the document, and byte-length should be used in
the encryption. Otherwise, the algorithms have to be specified more
clearly for the case of bit-strings that are not byte-aligned. In either
case, the draft should be clarified in this respect.
