Re: [TLS] Security review of TLS1.3 0-RTT

Victor Vasiliev <vasilvv@google.com> Tue, 30 May 2017 21:38 UTC

To: Colm MacCárthaigh <colm@allcosts.net>
Cc: "tls@ietf.org" <tls@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tls/x8PS_3uJQ2Apm_vnWj3qAqwOVXQ>

Hi Colm,

Thank you for your analysis!  I appreciate the attention to the security
properties of 0-RTT requests, as they are the more delicate part of the
protocol.  It took me a while to get through the entire review, and there are
many things I would like to comment on, but if I did that for every one of
them, this reply would be too long.  Hence, I’ll try to concentrate on the key
points.

1) The security of a system is an end-to-end property.

TLS isn’t a magical “transport security solution”; it provides a very specific
set of explicit security guarantees to the applications that use it.  It can be
used as a building block for secure systems, but at the end of the day, the
security of a system hinges on the designers’ understanding of the security
contracts provided by individual components.

TLS 1.3 as-is does not remove any of the replay protection guarantees provided
by TLS 1.2.  However, if the user chooses to waive said protection in order to
do 0-RTT, they can do that with an API explicitly designed for that purpose.

What I believe is important here is that we are very explicit about the
security guarantees the application is expected to get from TLS.  PR#1005
currently does not answer this question in a satisfying manner, since it says
that the server SHOULD implement three different mechanisms with very
different properties.  If we provide any replay protection at all, it needs to
be well-defined as a security property.  That is,

2) We have to define explicitly what security properties TLS 0-RTT has.

There are two obvious levels of protection we can provide:

 A. No protection.  Applications are expected to handle idempotency
    themselves, which is something they probably should already be doing
    anyway.

 B. Full protection.  Applications can rely on the assumption that once data
    is written to the TLS stack, it will arrive at the receiver at most once,
    and TLS will enforce that even in the presence of passive and active
    attackers on the path.

I think all of us agree that we should go for level B if we can have it.  The
“if” here is important, since...

3) Full replay protection for 0-RTT is not realistically feasible at the TLS
   layer, because the TLS layer is not the right place for it.

You’ve already pointed out the existence of the DKG attack, where the attacker
forces the client to retry the request with 1-RTT.  In this scenario, the
attacker has a buffered 0-RTT request and is ready to insert it at any
convenient point in time, violating the initial ordering assumptions.

There is also a wide variety of cases in which a web browser or an HTTP proxy
can be induced to retry a request, with mechanisms often as simple as a TCP
RST.  Even if the browser refuses to do that, most users will hit “reload”
themselves.  This is why applications *need* to protect against replays
end-to-end -- asking users to roll back an extra $1000 transaction is not
actually acceptable.

With respect to non-browser HTTP applications, I am not sure I am sold on the
notion of “careful clients”.  My understanding is that you suggest treating a
0-RTT failure as semantically equivalent to a regular connection failure,
under the assumption that all of your clients already have to handle that case
properly.  I see how this is supposed to work out, but most systems of this
nature work well because connections succeed most of the time -- and the
further you go down the path of promising full replay protection, the more
0-RTT handshakes you will break.  Both session resumption in general and 0-RTT
in particular are designed with the assumption that they can be declined at
any point, that this is a normal and expected event, and that at worst it
would result in a minor performance degradation; “careful clients” violate
that assumption.

As for non-HTTP applications, that would have to be determined by their
respective application profiles, but I suspect they are going to be less
complex.  For example, I expect 0-RTT for SMTP to be as simple as just sending
“EHLO mail.example.org\r\n”, which can be replayed regardless.

My key point here is that the problem of protecting against duplicate
messages in a system is sufficiently complex that it can only be reliably
solved end-to-end.

(In fact, the original End-to-End Argument paper also considers that case:
“if the application level has to have a duplicate-suppressing mechanism
anyway, that mechanism can also suppress any duplicates generated inside the
communication network, so the function can be omitted from that lower level.”
[Saltzer et al., 1981] -- here, I suggest thinking of the “communication
network” as the system of HTTP-layer boxes and middleware that modern systems
are made of.)
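To make the end-to-end point concrete, here is a minimal sketch, assuming the
application attaches a client-generated idempotency key to each logical action
(the class, method, and key scheme are hypothetical, not part of TLS or of any
particular framework).  The server executes each key at most once, so a
replayed 0-RTT request, a proxy retry, and a user-initiated reload are all
absorbed by the same mechanism.

```python
import threading
import uuid
from typing import Dict


class IdempotentLedger:
    """Hypothetical application-level duplicate suppression.

    The first request with a given idempotency key is executed; every later
    request with the same key gets the stored result instead, no matter which
    layer (TLS 0-RTT, proxy, browser, user) produced the duplicate.
    """

    def __init__(self, balance: int) -> None:
        self._lock = threading.Lock()
        self._results: Dict[str, int] = {}  # idempotency key -> first result
        self.balance = balance

    def transfer(self, key: str, amount: int) -> int:
        with self._lock:
            if key in self._results:
                # Duplicate: return the original outcome without re-executing.
                return self._results[key]
            self.balance -= amount
            self._results[key] = self.balance
            return self.balance


ledger = IdempotentLedger(balance=5000)
key = str(uuid.uuid4())                # generated once per logical action
first = ledger.transfer(key, 1000)
replayed = ledger.transfer(key, 1000)  # e.g. a replayed 0-RTT copy
```

Both calls return 4000 and the balance is debited only once.  A real
deployment would also need to persist and eventually expire keys, which this
sketch omits.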

------------

So, I hope we can agree that we cannot really guarantee B.  There is another,
third level of protection that, surprisingly, is worth considering:

 C. “Nebulous” replay bound.  0-RTT data can be replayed, but only a finitely
    bounded number of times.  I initially wanted to call this “weak replay
    protection”, but that felt too generous.

Normally I would not consider this a useful security property due to its
inherent vagueness, for the same reasons that “your server is too far away for
nanosecond-level timing attacks on Intel CPUs” is a property that no serious
cryptographer would admit in a research paper.  However, you’ve pointed out
some interesting side-channel leaks, which exist even without 0-RTT but can be
amplified by replaying 0-RTT queries.  C, like B, mitigates this, so it is a
meaningful property to provide.

Let’s talk about which mechanisms are and are not viable for the nebulous
bound promise.

4) Operational experience on datacenter-local strike registers is negative.

We deployed strike registers in the initial QUIC deployment, and they were an
operational hassle, so once we discovered that they do not provide full replay
protection (due to all the issues outlined above), the cost/benefit analysis
became decidedly not in their favor.

Our deployment experience also suggests that the negative impact of limiting
0-RTT to the same datacenter is not negligible.

I feel like you are underestimating the cost and complexity of the proposed
distributed solutions:
 - RAM might be cheap in a big datacenter, but on a CDN node, the opportunity
   cost of not using that RAM for something else is higher.
 - There is nothing simple about running distributed storage, given that on
   top of the usual operational requirements (the ability to rescale and some
   degree of fault tolerance), you also require an atomic read-and-delete.
 - Strike registers with strong guarantees are probably even worse in that
   regard, since inserting a strike record requires a consistent write.
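For reference, the single-process version of an atomic read-and-delete is
trivial.  The sketch below (hypothetical class and method names) uses a lock
so that a ticket's resumption secret can be consumed exactly once; a second
0-RTT attempt with the same ticket finds nothing and has to fall back to a
full handshake.  The operational difficulty described above comes from
providing the same guarantee across many nodes, which this sketch does not
attempt.

```python
import threading
from typing import Dict, Optional


class SingleUseTicketStore:
    """Hypothetical in-process store for single-use resumption tickets."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._tickets: Dict[bytes, bytes] = {}  # ticket ID -> resumption secret

    def issue(self, ticket_id: bytes, resumption_secret: bytes) -> None:
        with self._lock:
            self._tickets[ticket_id] = resumption_secret

    def consume(self, ticket_id: bytes) -> Optional[bytes]:
        # Atomic read-and-delete: returns the secret on first use,
        # None if the ticket was already used or never issued.
        with self._lock:
            return self._tickets.pop(ticket_id, None)


store = SingleUseTicketStore()
store.issue(b"ticket-1", b"secret-1")
first_use = store.consume(b"ticket-1")
second_use = store.consume(b"ticket-1")
```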

5) Nebulous replay bounds are possible but very much deployment-dependent.

If your deployment consists of one server that can accept a given ticket, you
can just run a single strike register in memory.  Not particularly painful,
and it solves the problem for simple deployments.
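A minimal sketch of such a single-server strike register, assuming it is keyed
on the ClientHello random and that the server can infer when the ClientHello
was sent (in TLS 1.3 this would be derived from the ticket age; here it is
simply passed in as `client_time`).  The time window exists only to bound
memory: a byte-for-byte replay carries the same `client_time`, so once an
entry ages out of the window, any replay of it is rejected by the window check
instead.  Class and parameter names are hypothetical.

```python
import time
from typing import Dict, Optional


class StrikeRegister:
    """Hypothetical in-memory strike register for a single server."""

    def __init__(self, window: float = 10.0) -> None:
        self.window = window
        self._seen: Dict[bytes, float] = {}  # ClientHello random -> client_time

    def accept_0rtt(self, client_random: bytes, client_time: float,
                    now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Forget entries old enough that the window check alone rejects replays.
        self._seen = {r: t for r, t in self._seen.items()
                      if now - t <= self.window}
        if abs(now - client_time) > self.window:
            return False  # outside the window: decline 0-RTT, full handshake
        if client_random in self._seen:
            return False  # replay within the window: decline 0-RTT
        self._seen[client_random] = client_time
        return True


reg = StrikeRegister(window=10.0)
first = reg.accept_0rtt(b"random-1", client_time=100.0, now=101.0)
replay = reg.accept_0rtt(b"random-1", client_time=100.0, now=105.0)
late_replay = reg.accept_0rtt(b"random-1", client_time=100.0, now=200.0)
```

The first attempt is accepted; the replay inside the window is caught by the
register, and the late replay is caught by the window after its entry has
been pruned.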

If you wish to accept tickets across multiple servers to improve the 0-RTT
rate, we now have the tension between wanting cross-server 0-RTT and avoiding
distributed cross-server state.  But if your servers are individually
addressable, we can put the client-presumed server IP into the ClientHello and
use machine-local strike registers.  Servers decline 0-RTT if the
client-supplied IP does not match.  This gives us global replay protection
without the need to share state.
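A sketch of how a server might apply this check, assuming a hypothetical
ClientHello field carrying the client-presumed server IP (no such extension
exists in the draft; all names below are illustrative).  Assuming the field is
covered by the PSK binder like other ClientHello extensions, an attacker
cannot rewrite it, so a replay aimed at a different server is declined by the
IP check, and each machine only has to catch replays aimed at itself.  Memory
pruning, as in a time-windowed register, is omitted for brevity.

```python
import ipaddress
from typing import Optional, Set


class IpBoundServer:
    """Hypothetical server that binds 0-RTT to the client-presumed server IP."""

    def __init__(self, own_ip: str) -> None:
        self.own_ip = ipaddress.ip_address(own_ip)
        self._seen: Set[bytes] = set()  # machine-local strike register

    def accept_0rtt(self, client_random: bytes,
                    presumed_ip: Optional[str]) -> bool:
        if presumed_ip is None or ipaddress.ip_address(presumed_ip) != self.own_ip:
            return False  # missing or mismatched IP: decline 0-RTT
        if client_random in self._seen:
            return False  # replay on this machine: decline 0-RTT
        self._seen.add(client_random)
        return True


servers = [IpBoundServer("192.0.2.1"), IpBoundServer("192.0.2.2")]
hello = (b"random-1", "192.0.2.1")  # ClientHello intended for the first server
# The original, then one replay to every server: only the original succeeds.
results = [servers[0].accept_0rtt(*hello)] + [s.accept_0rtt(*hello) for s in servers]
```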

If you are using QUIC, all of your IPs are terminated by the same load
balancer, and the load balancers use the connection ID to pick the backend,
you will also normally arrive at the same TLS server, provided that the server
pool was not resharded.  At that point, a local strike register is
sufficient; in fact, if you have a time-wait list, you get it almost for free.
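This works because a replayed 0-RTT packet carries the same connection ID as
the original, so connection-ID-based routing delivers it to the same backend,
whose local strike register (or time-wait list) can reject it.  The sketch
below assumes simple hash-modulo-N routing purely for illustration (real load
balancers may use consistent hashing or encode routing information in the
connection ID); it also shows why resharding breaks the property, since
changing the pool size remaps most connection IDs.

```python
import hashlib
from typing import List


def pick_backend(connection_id: bytes, pool: List[str]) -> str:
    """Hypothetical routing: hash the connection ID, modulo the pool size."""
    digest = hashlib.sha256(connection_id).digest()
    return pool[int.from_bytes(digest[:8], "big") % len(pool)]


old_pool = ["backend-0", "backend-1", "backend-2"]
new_pool = old_pool + ["backend-3"]  # the pool after resharding

cid = bytes.fromhex("1a2b3c4d5e6f7081")
# Original and replay share the connection ID, so they reach the same backend.
same_backend = pick_backend(cid, old_pool) == pick_backend(cid, old_pool)

cids = [i.to_bytes(8, "big") for i in range(1000)]
moved = sum(pick_backend(c, old_pool) != pick_backend(c, new_pool) for c in cids)
```

With this naive scheme, going from three to four backends moves roughly three
quarters of connection IDs, so a replay sent after the reshard will usually
land on a backend that has never seen the original.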

Of course, this all relies on resumption arriving at the same server as the
original request.  This is obviously not a property you want in large-scale
reliable systems, so as soon as you introduce anycast IP addresses or load
balancing based on parameters that are not bound cryptographically, you’ll
need a strike register with a scope larger than a single machine.  You can
either build one (since we are only promising nebulous bounds, it can be
eventually consistent), or you can just settle for server-local strike
registers -- since we’ve already decided that the bounds are nebulous, I see
no point in haggling over how many replays is too many.
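The resulting bound is easy to state for the server-local option: if each
server keeps its own strike register and shares nothing, a given ClientHello
can be accepted at most once per server, so the number of accepted copies is
bounded by the number of servers that can decrypt the ticket, however many
replays the attacker sends.  The simulation below is a toy model under that
assumption (random server selection stands in for anycast or
non-cryptographic load balancing; all names are hypothetical).

```python
import random
from typing import List, Set


def accepted_copies(num_servers: int, num_replays: int, seed: int = 0) -> int:
    """Count acceptances of one 0-RTT ClientHello sprayed across servers
    that each keep only a server-local strike register (toy model)."""
    rng = random.Random(seed)
    registers: List[Set[bytes]] = [set() for _ in range(num_servers)]
    client_random = b"same-clienthello-random"
    accepted = 0
    for _ in range(num_replays):
        server = rng.randrange(num_servers)  # anycast / non-crypto LB picks a server
        if client_random not in registers[server]:
            registers[server].add(client_random)
            accepted += 1
    return accepted


bounded = accepted_copies(num_servers=8, num_replays=10_000)  # at most 8
```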

I feel like the topic of “how do I achieve the lowest nebulous replay bound
with the least amount of effort” is very large on its own, and is honestly not
that important for the protocol design.  The client’s ability to specify the
presumed server’s IP address is nice, and it would require a new extension in
the specification itself, but that is as far as I would go.  What’s important
is that we can require implementations to provide nebulous bounds.  We should,
of course, on top of that, require clients and encourage servers to enforce
the application-level requirements on idempotence.
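On the client side, enforcing application-level idempotence requirements could
be as simple as a policy that only lets requests with safe semantics ride in
early data.  The sketch assumes an HTTP client and uses GET, HEAD and OPTIONS
(methods RFC 7231 defines as safe) as the allowlist; the allowlist, the
`Request` type and the function name are illustrative assumptions, not
something the draft specifies.

```python
from dataclasses import dataclass

# Illustrative allowlist: methods defined as safe by RFC 7231.
EARLY_DATA_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})


@dataclass
class Request:
    method: str
    path: str


def may_send_as_early_data(req: Request, have_ticket: bool) -> bool:
    """Return True if a hypothetical client may send `req` in 0-RTT;
    otherwise it waits for the handshake to complete (1-RTT)."""
    return have_ticket and req.method.upper() in EARLY_DATA_METHODS


get_ok = may_send_as_early_data(Request("GET", "/index.html"), have_ticket=True)
post_ok = may_send_as_early_data(Request("POST", "/transfer"), have_ticket=True)
```

Such a policy narrows what can be replayed; it does not remove the need for
the server side to tolerate replays of whatever the client does send.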

------------

Specific proposals I would like to make for the TLS 1.3 draft based on the
discussion above:
 1. Emphasize that 0-RTT data is fundamentally replayable, and can be replayed
    by the attacker at a convenient point.
 2. Replace the current suggestion to use specific mechanisms with a generic
    requirement to provide the “nebulous bound” for replays (terminology up to
    your preference).
 3. Leave the exact details of how said bound is achieved up to the
    implementations, but point out that time-based replay protection is not
    sufficient.  It might be worth keeping the current discussion of
    particular mechanisms as an appendix.
 4. Add a “server IP” extension to the 0-RTT ClientHello, while we are at it.
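To illustrate point 3: a check that relies only on time accepts every copy of
a ClientHello that arrives inside the window; it limits *when* replays can
happen, not *how many*.  In this minimal sketch `client_time` stands in for
the send time the server infers from the ticket age, and the function name
and 10-second window are arbitrary assumptions.

```python
def fresh_enough(client_time: float, now: float, window: float = 10.0) -> bool:
    """Time-based check only: accept 0-RTT if the ClientHello looks recent."""
    return abs(now - client_time) <= window


hello_time = 1000.0
original = fresh_enough(hello_time, now=1000.5)
replay_1 = fresh_enough(hello_time, now=1003.0)  # same ClientHello, replayed
replay_2 = fresh_enough(hello_time, now=1009.0)  # replayed again, still accepted
expired = fresh_enough(hello_time, now=1011.0)
```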

  -- Victor.