Re: [TLS] draft-agl-tls-snapstart-00

Adam Langley <agl@google.com> Tue, 22 June 2010 17:01 UTC

In-Reply-To: <48520393-14A8-4E46-B9B4-ED18A3F0216C@cisco.com>
References: <48520393-14A8-4E46-B9B4-ED18A3F0216C@cisco.com>
Date: Tue, 22 Jun 2010 13:01:53 -0400
Message-ID: <AANLkTikdqgjBPdqdMQjVw8qV1ayy9L-2C5FwxFfCHNwT@mail.gmail.com>
From: Adam Langley <agl@google.com>
To: David McGrew <mcgrew@cisco.com>
Cc: cfrg@irtf.org, tls@ietf.org

On Mon, Jun 21, 2010 at 10:53 AM, David McGrew <mcgrew@cisco.com> wrote:
> What anti-replay assurance does the server have, and how exactly is it
> provided? If I understand correctly, the intent is that the client
> suggests a truly random value for the server_random, and the server is
> responsible for rejecting duplicate values, and that a time-window strategy
> can be used by the server to minimize storage. This would put a tension
> between storage requirements (which would want a short window) and snapstart
> re-establishment (which would want a long window).

I believe that your understanding of the replay prevention is correct.
However, I don't see where the tension for wanting a long window comes
from. Is 'snapstart re-establishment' session resumption with snap
start? Each new connection (be it resuming a session or not) uses new
nonces, so resuming a session from 24-hours ago does not require that
the server still have its strike register from that point in time.

The pressures on the strike register (the data structure which is
recording used server_randoms) are the obvious need to keep storage
requirements down, and tolerance of clock skew. In an ideal world, all
clients' clocks would be perfectly synced so the only source of skew
would be the network latency. The server could then reject all
server_randoms whose timestamp is more than a couple of seconds old.

In reality, clients' clocks aren't that good and the server will want
to cope with skews of tens of seconds or more.
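
To make the mechanism concrete, here is a minimal sketch of a strike
register. It is an illustration only, not the draft's reference
implementation. It assumes a 32-byte server_random laid out as a 4-byte
big-endian timestamp, an 8-byte orbit and 20 random bytes; the class name
StrikeRegister and its methods are hypothetical. It also assumes the
server's own clock never runs backwards.

```python
import time
from typing import Dict, Optional


class StrikeRegister:
    """Hypothetical sketch: remembers server_randoms seen inside a time window."""

    def __init__(self, orbit: bytes, window_seconds: int = 60):
        self.orbit = orbit              # assumed to be 8 bytes
        self.window = window_seconds    # allowed skew, +/- this many seconds
        self.seen: Dict[bytes, int] = {}  # 24-byte key -> timestamp

    def accept(self, server_random: bytes, now: Optional[int] = None) -> bool:
        """Return True only the first time a fresh server_random is seen."""
        if len(server_random) != 32:
            return False
        now = int(time.time()) if now is None else now
        timestamp = int.from_bytes(server_random[:4], "big")
        if server_random[4:12] != self.orbit:
            return False                # meant for a different orbit
        if abs(now - timestamp) > self.window:
            return False                # outside the allowed clock skew
        self._expire(now)
        # The orbit is constant per register, so only 24 bytes are stored.
        key = server_random[:4] + server_random[12:]
        if key in self.seen:
            return False                # duplicate: treat as a replay
        self.seen[key] = timestamp
        return True

    def _expire(self, now: int) -> None:
        # Entries older than the window can never be accepted again, so
        # they are safe to drop. A linear scan keeps the sketch short; a
        # real server would want something cheaper.
        cutoff = now - self.window
        for key in [k for k, ts in self.seen.items() if ts < cutoff]:
            del self.seen[key]
```

Because anything outside the window is rejected on its timestamp alone,
the register only ever has to hold one window's worth of entries.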

There is an additional downwards pressure on the strike register's
size: an attacker could delay a handshake and force the client to
retry, which amounts to a pseudo-replay attack. The security
considerations here have to include the details of the application
level protocol. For HTTPS, this attack is already possible and
limiting the allowed clock skew mostly brings us back to the status
quo.

> I suggest adding this as a requirement for the "orbit": it MUST be generated
> uniquely at random by each server.  If virtualization is in use, each
> instance of the server MUST generate and use a distinct orbit value.

I think I disagree. There are situations where servers would want to
share an orbit, as described in section 7.

> An option that might be worth considering: have the server issue the client
> a range of nonce values that can be included in the server_random, and
> require the client to use them sequentially (e.g. as integers) or at least
> in a non-decreasing way.  In this scheme, a server can check for a replay by
> storing a single integer, instead of the set of server_random values within
> the time window.  This would have the benefit of removing the tension on
> window size.  Of course, it will be necessary for the server_random to
> contain an actual random value, because we want to retain that property of
> the protocol to the greatest extent possible (e.g. to retain analyses of the
> key derivation processes).

I don't think the space savings are worth the additional complexity.
Consider a cluster which is terminating a million TLS
connections/second. The strike register takes 24 bytes to store a
server_random (no need to save the orbit) and 8 bytes of overhead per
entry. If we allow a window of +/- 60 seconds then that's roughly 4GB
of state (a million entries/second x 120 seconds x 32 bytes), which is
nothing for such a cluster.
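
The arithmetic behind that estimate can be checked with a few lines. This
is only a back-of-envelope sketch; the function name and parameter names
are made up for illustration, and the figures are the ones quoted above.

```python
def strike_register_bytes(conns_per_second: int,
                          skew_seconds: int,
                          entry_bytes: int = 24,
                          overhead_bytes: int = 8) -> int:
    """Bytes of state for a window of +/- skew_seconds."""
    window_seconds = 2 * skew_seconds   # the window extends both ways
    entries = conns_per_second * window_seconds
    return entries * (entry_bytes + overhead_bytes)


total = strike_register_bytes(conns_per_second=1_000_000, skew_seconds=60)
print(total)  # 3840000000 bytes, i.e. about 3.84 GB
```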

> "In order for the server to be assured of uniqueness, it will have to
> remember every "server_random" value that has been used so that it may
> reject duplicates."   I suggest changing this to " ... it will need to be
> able to detect potential duplicates", which is less pessimistic about the
> amount of server state.

Agreed. Thanks.

> It would be disastrous if an attacker could cause a collision in the
> predicted_server_handshake, that is, if the client and server had different
> ideas about what the response flow actually was (e.g. if the client thought
> that RC4 was offered, while the server meant to offer only AES-128).  I
> suggest either changing the computation of this field to use a
> collision-resistant hash, or documenting how the current spec avoids this
> problem.

I think the discussion in section 6 ('Active attack considerations')
covers this. I believe that any modification of the handshake will be
detected by the Finished hash, although if I've omitted any cases
please let me know.

> There should be some discussion about denial of service attacks.

Yes, probably. I don't think there are any new DoS possibilities except
for flooding or algorithmic complexity attacks against the strike
register, but that's worth a mention in the next revision.

Cheers

AGL
