[TLS] TLS 1.3 and TCP interactions

David Benjamin <davidben@chromium.org> Fri, 29 May 2020 21:00 UTC

Message-ID: <CAF8qwaBBKvcGMFRxxuVvfBo2Z96mqiEwLfG7H2ZQw0m5+TMnVg@mail.gmail.com>
To: "<tls@ietf.org>" <tls@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tls/hymweZ66b2C8nnYyXF8cwj7qopc>

Hi all,

As we’ve been using TLS 1.3 in more scenarios, we’ve encountered some
interesting interactions with TCP. We thought we’d document these and send
a note here. In general, we've found that TLS implementations need to be
wary of post-handshake messages and “unexpected” transport writes. This
unfortunately also includes some server handshake alerts.

TLS APIs

First, some background on APIs for TLS libraries. TLS is often deployed
“transparently” underneath a TCP-based protocol. HTTPS sandwiches TLS
between HTTP and TCP, etc. By and large, reads and writes over TLS are
one-to-one with reads and writes over TCP.

TLS APIs and callers can subtly rely on this. Some libraries expose an
interface like the POSIX sockets API, including non-blocking behavior. If
the transport is blocked on I/O, this is surfaced as an error for the
caller to retry later. Importantly, the library cannot drive transport I/O
on its own. The caller must drive the operation to completion. Other APIs
transform bytes and leave I/O to the application. Any TCP writes triggered
by TLS reads and vice versa are even more directly part of the API surface.
In contrast, sometimes the TLS library can drive I/O itself. For example, a
Go TLS implementation can do background work in a goroutine.

Also note that libraries may predate TLS 1.3, but now enable TLS 1.3 by
default. Those libraries must ensure callers written against TLS 1.2 work
in TLS 1.3.

Post-handshake messages and flow control

TLS 1.2 and TLS 1.3 both have post-handshake messages, but TLS 1.2 only
uses them for renegotiation, which is rare and often disabled. TLS 1.3 has
post-handshake NewSessionTickets. A server will typically send tickets
immediately after the handshake.

We initially treated NewSessionTicket as an extra flight in the server
handshake. After receiving the client Finished, the server would write
NewSessionTicket and then signal handshake completion. This kept tickets
working in unmodified server callers.

However, this can lead to a deadlock in some cases. A typical HTTP/1.1
client will first write its request and, only when this is complete, read
the response. If the write exceeds the transport buffer, it will not
complete, and thus the client will not read, until after the server starts
reading. An HTTP/1.1 server caller knows to read first, but only after the
handshake completes. If NewSessionTicket also exceeds the transport buffer,
this strategy means the server won’t complete the handshake until the
client starts reading. Thus the connection deadlocks.
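
To make the cliff concrete, here is a minimal sketch in Python of two
blocking peers joined by bounded one-way pipes. It is a toy model, not any
real TLS or TCP stack: the function run(), the op scripts, and all sizes are
made up for illustration, and a "read" here simply consumes bytes (the TLS
layer would process a ticket as part of the caller's read).

```python
from collections import deque

def run(client_ops, server_ops, buffer_size):
    """Simulate two blocking peers joined by bounded one-way pipes.

    Each op is ("write", n) or ("read", n). Returns True if both scripts
    finish, and False if neither peer can make progress (deadlock).
    """
    ops = {"client": deque(client_ops), "server": deque(server_ops)}
    inbox = {"client": 0, "server": 0}  # buffered bytes, keyed by reader
    peer = {"client": "server", "server": "client"}

    progressed = True
    while progressed and (ops["client"] or ops["server"]):
        progressed = False
        for me in ("client", "server"):
            if not ops[me]:
                continue
            kind, remaining = ops[me][0]
            if kind == "write":
                # A write can only move as much as the peer's buffer holds.
                moved = min(remaining, buffer_size - inbox[peer[me]])
                inbox[peer[me]] += moved
            else:
                moved = min(remaining, inbox[me])
                inbox[me] -= moved
            if moved:
                progressed = True
                remaining -= moved
                if remaining:
                    ops[me][0] = (kind, remaining)
                else:
                    ops[me].popleft()
    return not (ops["client"] or ops["server"])

# Hypothetical sizes, chosen so that request and ticket both exceed the buffer.
REQUEST, TICKET, RESPONSE, BUF = 100, 100, 10, 16
client = [("write", REQUEST), ("read", TICKET + RESPONSE)]
ticket_in_handshake = [("write", TICKET), ("read", REQUEST), ("write", RESPONSE)]
ticket_deferred = [("read", REQUEST), ("write", TICKET + RESPONSE)]

print(run(client, ticket_in_handshake, BUF))  # False: both peers stuck writing
print(run(client, ticket_deferred, BUF))      # True
```

In this model the first server script stalls as soon as both the request
and the ticket exceed the buffer, while the second never blocks the read on
a write.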

While these messages usually fit in transport buffers, we don’t like
systems with invisible cliffs, particularly deadlocks. Some TLS
implementations embed client certificates in tickets, which can make them
large. Additionally, mock transports in tests sometimes use artificially
small buffers.

Recommendation: We switched to deferring NewSessionTicket to the first
application write by default. Server callers which wish to flush them
earlier may, but they should not block normal I/O on it. TLS
implementations which can drive transport I/O themselves may be able to
instead write them in the background after the handshake. Note, however,
the discussion on “Client-write-only protocols” below.
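
As a rough illustration of the deferral, the sketch below queues the ticket
at handshake completion and flushes it ahead of the first application
write. This is a toy model under stated assumptions, not BoringSSL's
implementation: DeferredTicketServer, finish_handshake(), and the
transport_write callable are hypothetical names, and plain bytes stand in
for encrypted records.

```python
class DeferredTicketServer:
    """Toy model of a server connection that queues post-handshake
    messages and flushes them with the next application write."""

    def __init__(self, transport_write):
        self._transport_write = transport_write  # callable taking bytes
        self._pending = []  # post-handshake messages not yet written

    def finish_handshake(self, ticket):
        # Queue the ticket instead of writing it, so that handshake
        # completion never depends on the client draining its buffer.
        self._pending.append(ticket)

    def write(self, app_data):
        # Flush any queued messages ahead of the application data.
        out = b"".join(self._pending) + app_data
        self._pending.clear()
        self._transport_write(out)
        return len(app_data)


sent = []
conn = DeferredTicketServer(sent.append)
conn.finish_handshake(b"<ticket>")
assert sent == []                      # nothing written at handshake time
conn.write(b"HTTP/1.1 200 OK\r\n")     # ticket goes out with the response
```

A server that never writes application data would never send the ticket in
this design, which is the trade-off discussed under "Client-write-only
protocols".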

Separately, in case the server does not do this, TLS 1.3 client
implementations should eagerly read from the socket after the handshake,
even if the caller isn’t expecting application data. However, this is only
possible at a layer which can drive I/O itself. We implement this in
Chromium’s abstractions over BoringSSL, but cannot do so in BoringSSL
itself.
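
For a layer that can drive I/O itself, eager reading might look roughly like
the following Python sketch. It is an assumption-laden stand-in, not
Chromium's code: EagerReader is a hypothetical name, it works on a raw
socket rather than a TLS connection, and a real implementation would feed
the bytes to the TLS library (and bound the buffer) instead of storing them
unprocessed.

```python
import socket
import threading

class EagerReader:
    """Drains a socket on a background thread into an in-memory buffer,
    so the peer's writes are consumed even if the caller never reads."""

    def __init__(self, sock, chunk_size=65536):
        self._sock = sock
        self._chunk_size = chunk_size
        self._buf = bytearray()
        self._eof = False
        self._cond = threading.Condition()
        self._thread = threading.Thread(target=self._pump, daemon=True)
        self._thread.start()

    def _pump(self):
        while True:
            try:
                data = self._sock.recv(self._chunk_size)
            except OSError:
                data = b""  # this sketch treats transport errors as EOF
            with self._cond:
                if data:
                    self._buf += data
                else:
                    self._eof = True
                self._cond.notify_all()
            if not data:
                return

    def read(self, n):
        """Block until data or EOF; return up to n bytes, b"" at EOF."""
        with self._cond:
            self._cond.wait_for(lambda: self._buf or self._eof)
            out = bytes(self._buf[:n])
            del self._buf[:n]
            return out
```

The buffer here is unbounded, which trades the deadlock for memory growth;
a production version would need some limit.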

Likewise, while they are unlikely to exceed the transport buffer, TLS
libraries should defer KeyUpdate acknowledgements to the next application
write. This is possible thanks to the KeyUpdate tweaks
<https://mailarchive.ietf.org/arch/msg/tls/cfw4paCGxI7Fj8QNmj6k1I66VII/>
made early on.

0-RTT and flow control

There is a similar effect in 0-RTT. The client writes the ClientHello and
early data. The server responds with ServerHello..Finished. Depending on
I/O strategy, implementations may hit a similar deadlock if the client
won’t read the ServerHello flight until it has written its early data, but
the server won’t read early data until it has written the ServerHello
flight.

Some factors make this deadlock less of a concern than NewSessionTicket:


   - 0-RTT is new as of TLS 1.3 and should not be enabled by default. 0-RTT
     clients are already expected to handle extra cases such as 0-RTT rejects
     and replayability. That means libraries can impose extra requirements or
     introduce APIs for these I/O patterns.
   - The ServerHello flight does not contain a certificate. It is more likely
     to fit in transport buffers and has more-or-less fixed size.
   - For HTTP, RFC8470 only sends GETs over early data by default, which are
     smaller than POSTs and more likely to fit in transport buffers. But this
     is another invisible cliff in the system.


Recommendation: 0-RTT clients should eagerly read from the connection, even
if the application isn’t expecting data yet. This avoids this deadlock and
opportunistically confirms the handshake sooner, so more data is sent over
1-RTT. Note this must be done at a layer which can drive transport I/O
itself.

Client certificate errors and TCP resets

If a server rejects a client certificate, it should send an alert, so the
client can react accordingly. The client may display an error to the user,
clear a cache of certificate decisions, or prompt the user to select a
different certificate.

TLS 1.2 has a two round-trip handshake:

      ClientHello        -------->
                                            ServerHello
                                           Certificate*
                                     ServerKeyExchange*
                                    CertificateRequest*
                         <--------      ServerHelloDone
      Certificate*
      ClientKeyExchange
      CertificateVerify*
      [ChangeCipherSpec]
      Finished           -------->
                                     [ChangeCipherSpec]
                         <--------             Finished

The server has a handshake flight after the client certificate. If it
rejects the certificate, the TLS implementation will write an alert and
then report failure, at which point the caller will close the socket and
discard the connection. From the client’s perspective, this alert comes
instead of ChangeCipherSpec/Finished, during the handshake. It processes
the alert and cleanly fails the handshake, before any application data
flows.

TLS 1.3 reduces the handshake to one round-trip:

       ClientHello       -------->
                                            ServerHello
                                  {EncryptedExtensions}
                                  {CertificateRequest*}
                                         {Certificate*}
                                   {CertificateVerify*}
                                             {Finished}
                         <--------  [Application Data*]
       {Certificate*}
       {CertificateVerify*}
       {Finished}        -------->

There is no server flight after the client certificate. If the server
rejects it, it will again write an alert and report failure. However, now
the client receives it instead of the first application data record. This
is a behavior change to callers, who now must handle client certificate
errors out of read as well as connect.

Moreover, in a client-speaks-first protocol, the error now comes after the
client has already sent its request. This is not only a behavior change but
makes it unreliable over TCP. TCP sees:


   1. Client: write(ClientHello);
   2. Server: read(ClientHello); write(ServerHello..Finished);
   3. Client: read(ServerHello..Finished); write(Certificate..Finished);
   4. Server: read(Certificate..Finished); write(bad_certificate); close();
   5. Client: write(“GET / ...”); read(???);


Note (4) and (5) happen in parallel. Ideally ??? would be a bad_certificate
alert, but it is sometimes a TCP reset. I’m not a TCP expert, but I believe
this is because the client writes data (“GET / ...”) the server never
consumes. If it arrives at the server TCP stack before close(), the socket
is closed with unread data. If it arrives after close(), the socket
receives data after close(). TCP appears to consider either condition an
application protocol error and triggers a reset shortly after sending the
alert. If the client consumes the alert before its TCP stack sees the
reset, the alert gets through. Otherwise, TCP will not reliably deliver the
alert. Receive buffers may be cleared, data isn’t retransmitted, etc. This
is particularly pronounced on loopback.

Note, if TCP did not reset, we’d deadlock other scenarios. Suppose the
client request did not fit in transport buffers. The client would not read
until that is flushed, but the server will never ACK it. The client would
then never progress to the alert and get stuck. By resetting, TCP
interrupts large client writes with some error, albeit the wrong one.

Recommendation: We do not have a good answer here. The deadlock scenario
means we cannot hope to reliably deliver alerts unless the client eagerly
reads as above. But servers cannot rely on clients to do this, and this is
not sufficient because of the TCP reset.

It seems the only fix is for the server to keep the connection alive for
some time after the failure, maybe draining some bytes from the
application, with some limit before giving up and resetting if the client
seems to be writing a lot of data without ever reading. This would need to
be done quite far up the stack. We have not implemented this.
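
For illustration only, a bounded drain of this kind could be sketched in
Python as below. As noted, we have not implemented this, so everything here
is an assumption: drain_then_close() is a hypothetical helper, the limits
are arbitrary, and it operates on a plain socket after the alert has
already been written.

```python
import socket
import time

def drain_then_close(sock, max_bytes=65536, max_seconds=2.0):
    """Stop sending, discard peer data within limits, then close.

    Returns "eof" if the peer finished first, "limit" if it sent more
    than max_bytes, "timeout" if max_seconds elapsed, or "error" on a
    transport error.
    """
    deadline = time.monotonic() + max_seconds
    outcome = "timeout"
    try:
        sock.shutdown(socket.SHUT_WR)  # signal end of our data after the alert
        drained = 0
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            sock.settimeout(remaining)
            try:
                chunk = sock.recv(4096)
            except socket.timeout:
                break
            if not chunk:
                outcome = "eof"  # peer closed its side; nothing left unread
                break
            drained += len(chunk)
            if drained > max_bytes:
                outcome = "limit"  # peer keeps writing without reading
                break
    except OSError:
        outcome = "error"
    finally:
        sock.close()
    return outcome
```

Only the "eof" outcome closes with nothing unread. In the "limit" and
"timeout" cases the close may still produce a reset, which matches the
"giving up" behavior described above.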


TLS False Start (RFC7918) exposes many of the same issues, but, in TLS 1.3,
this flow is not optional. Clients cannot even choose to pay a round-trip
to restore the TLS 1.2 flow because, in the successful case, there is
nothing to wait for. One could imagine an extension that adds an optional
server flight, but a round-trip to fix an error condition is an
unsatisfying trade-off.

Clients could also consider TCP resets to be potential client certificate
errors. This is also unsatisfying as TCP resets are unauthenticated and may
have other causes.
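
On the client side, handling these errors out of read might look like the
following Python sketch, using the standard ssl module. The helper name
read_first_response() and its return labels are hypothetical, and it
assumes a blocking, already-connected TLS socket. It only classifies the
outcome; what to do with a reset remains the caller's judgment call.

```python
import ssl

def read_first_response(conn, bufsize=16384):
    """Read once from a connected TLS socket and classify the outcome.

    Returns a (kind, detail) pair where kind is "data", "closed",
    "tls_error", or "tcp_reset".
    """
    try:
        data = conn.recv(bufsize)
    except ssl.SSLError as e:
        # In TLS 1.3 a rejected client certificate can surface here, on
        # the first read, rather than from the handshake call.
        return ("tls_error", getattr(e, "reason", None) or str(e))
    except ConnectionResetError:
        # Unauthenticated and ambiguous: possibly a lost alert, possibly
        # something unrelated.
        return ("tcp_reset", None)
    return ("data", data) if data else ("closed", None)
```

A caller might treat "tls_error" as grounds to clear a cached certificate
decision, and "tcp_reset" as at most a hint, given the caveats above.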

Client-write-only protocols

Edge cases may have other unexpected writes. Consider a protocol where the
server never writes, and thus the client never reads. TLS 1.3 introduces
server NewSessionTicket messages, so we again trigger the deadlock above.
If the client further shuts down the read half of the connection, the
NewSessionTicket message will also trigger the TCP reset behavior above.

Recommendation: Don’t do this. If you must, either the client must read
anyway to pick up the ticket, or the server must not send tickets. A TLS
server library probably should default to deferring tickets to application
write, which would do the latter. Note this means such protocols don’t get
resumption.

Half-RTT data

We haven’t done much with half-RTT data outside of 0-RTT connections, but
half-RTT may risk similar issues. Half-RTT data in a client certificate
connection is sent before the server learns the client identity. That means
the connection is writable before all its properties are established, which
is an awkward API.

One might think to avoid this state by configuring half-RTT data ahead of
time for the library to write during the handshake, immediately after
ServerHello..Finished. A half-RTT HTTP/2 SETTINGS frame doesn’t need a
streaming API, and this avoids exposing the incomplete state to the caller.

However, this also risks flow control issues, depending on sizes and I/O
patterns. If both half-RTT data and client Certificate..Finished are too
large, this design has another flow control deadlock: the server will not
read client Certificate..Finished until writing half-RTT data, and the
client will not read half-RTT data until it has written its flight.

Recommendation: Sadly, it seems half-RTT APIs need to be more complicated
than this.


Hopefully this is helpful to folks.

David
