[tcpinc] Cutting tcpcrypt latency and state complexity

Bob Briscoe <bob.briscoe@bt.com> Mon, 27 October 2014 23:43 UTC

To: Andrea Bittau <bittau@cs.stanford.edu>, David Mazieres expires 2014-12-28 PST <mazieres-3uy2t3qwjnnucer4x2aruwdufi@temporary-address.scs.stanford.edu>, dm@uun.org, Mark Handley <m.handley@cs.ucl.ac.uk>, dabo@cs.stanford.edu, mike@shiftleft.org, sqs@cs.stanford.edu, draft-bittau-tcpinc@tools.ietf.org
From: Bob Briscoe <bob.briscoe@bt.com>
Cc: tcpinc@ietf.org

tcpcrypt coauthors (Andrea, Dan, Mike, Mark, David, Quinn)

Thank you very much for draft-bittau-tcpinc-tcpcrypt-00. Very clearly 
and comprehensively specified.

Currently the opportunistic encryption defined in tcpinc-tcpcrypt-00 
adds handshaking latency to each connection. This is likely to make 
opportunistic encryption unacceptable to everyone except the most 
paranoid individuals - who want privacy whatever the performance cost.

The proposals below reduce the tcpcrypt latency (before it can send 
encrypted user-data) from two round trips to one for a new session 
and from one round to zero for a resumed session.

They also remove the HELLO-SENT and S-MODE states and a number of 
state transitions, which I think considerably simplifies the 
otherwise complex state machine of tcpcrypt. Indeed, by separating 
out the grunt work of framing options within the payload, traversing 
middleboxes, and delivering options reliably and in order, it is 
possible that tcpcrypt's dependency on the TCP state machine could 
be greatly reduced or even eliminated.

However, this separation makes tcpcrypt depend on other work 
(draft-briscoe-tcpm-inner-space-01). I am trying to get it adopted 
onto the agenda of the IETF tcpm WG as quickly as possible. But this 
breaks one of my own rules, "Avoid one research item depending on 
another". For now, let's just break my rules and see where we get to...

1) Root Causes of Tcpcrypt's Latency Problems

1.1) Options in Payload

The first underlying problem is that the keying material in the 
tcpcrypt INIT1 and INIT2 options makes them too big to fit in the TCP 
header. So during the tcpcrypt setup phase, tcpcrypt places these two 
options in the payload.
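
As a rough back-of-the-envelope check (the sizes below are illustrative 
assumptions, not figures from the draft), even a single INIT1-like 
option overflows TCP's option space:

     # TCP's 4-bit Data Offset caps the header at 60 bytes, so at most
     # 60 - 20 = 40 bytes remain for *all* options combined.
     MAX_TCP_OPTION_SPACE = 60 - 20

     kind_and_length = 2    # option kind + length octets
     nonce_n_c       = 32   # e.g. a 256-bit client nonce N_C (assumed size)
     pub_key_pk_c    = 32   # e.g. a compressed elliptic-curve PK_C (assumed size)
     cipher_lists    = 8    # pub-cipher-list + sym-cipher-list (assumed size)

     init1_size = kind_and_length + nonce_n_c + pub_key_pk_c + cipher_lists
     print(init1_size, MAX_TCP_OPTION_SPACE)   # 74 vs 40

And those 40 bytes also have to carry MSS, window scale, 
SACK-permitted, timestamps and so on.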

But a client can't put control messages in the payload until it knows 
it's talking to a server that understands tcpcrypt. Otherwise, on 
completion of the 3WHS, a legacy server would send the tcpcrypt 
options to the app.

This forces a tcpcrypt client to consume a round to check that the 
server supports tcpcrypt before it can use INIT1 & INIT2. This means 
tcpcrypt still needs another round after the 3WHS before it can start 
sending data - it can't finish the tcpcrypt handshake within the 3WHS.

1.2) No way to signal a boundary between set-up and user-data

The second change that tcpcrypt needs is a boundary within the TCP 
payload so it can transition from tcpcrypt setup options to encrypted 
data within the same segment. Then we will not need the rule that 
says "Implementations MUST NOT include application data in TCP 
segments during setup", which adds an extra round.
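
To make the idea concrete, here is a minimal sketch of one possible 
framing - purely hypothetical, with field sizes and names that are not 
taken from either draft - in which a length prefix marks the boundary 
between setup options and ciphered user-data within one segment:

     import struct

     def frame_payload(setup_options: bytes, ciphertext: bytes) -> bytes:
         # A 2-byte length prefix marks where the tcpcrypt setup options
         # end and the encrypted user-data begins, within one segment.
         return struct.pack("!H", len(setup_options)) + setup_options + ciphertext

     def parse_payload(payload: bytes):
         (setup_len,) = struct.unpack_from("!H", payload, 0)
         return payload[2:2 + setup_len], payload[2 + setup_len:]

With any such boundary, INIT2 and the first block of encrypted data 
can share a segment, so the rule above (and the round it costs) 
becomes unnecessary.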

TLS has always been placed within the TCP payload, and it's always 
had its record structure and the 'Finished' message to signal this 
boundary - as the end of setup and start of ciphered user-data. So in 
theory TLS is already structured to make it easy to send the same 
messages but in fewer rounds.

This is why /in theory/ it was possible to cut latency in the three 
low-latency variants of TLS that I know of: False Start, Snap Start 
and MinimaLT. We've written a summary of each and drawn timing 
diagrams of TLS with and without False Start in Section III of this 
survey paper:
<http://riteproject.eu/?attachment_id=735>

However, as we explain in the survey, the theory didn't work out in 
practice. For instance, the different message timings of False Start 
interacted non-deterministically with a number of SSL termination 
boxes used at the edge of data centres, which prevented widespread 
deployment of False Start. But in the paper we explain an exception 
that allows it to be deployed in certain scenarios (e.g. for SPDY).

Tcpcrypt is fortunate in not having to deal with such legacy 
middleboxes. So it's worth restructuring it now, while its design is 
still fluid.

1.3) Unnecessary round to resume a tcpcrypt session

Tcpcrypt consumes a round trip to resume a session. I can't see any 
particular reason why it needs this round, other than complacency 
because the traditional TCP 3WHS consumes a round anyway, so why try 
to start any faster?

I believe it's easy to cut this round of latency out. And now that 
TCP Fast Open is available, tcpcrypt could then resume sending 
ciphered user-data in zero rounds.
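
To illustrate what that could look like (this flow is my own 
extrapolation, combining TCP Fast Open with the session resume 
exchange in 2.2.2 below; it is not in the draft):

            A -> B:  SYN + TFO cookie, NEXTK1, SID[i]; MAC<m>; data<...>
            B -> A:  SYN/ACK, NEXTK2; MAC<m>; data<...>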

2) Solution

2.1) How to stop a legacy TCP passing TCP Options within the Payload 
of a SYN to the app

The approach in draft-briscoe-tcpm-inner-space (or 
draft-touch-tcpm-tcp-syn-ext-opt)
* makes space for extra TCP options beyond the Data Offset of a SYN 
and sets a boundary between these extra options and any payload data.
* ensures legacy TCP servers don't pass these extra TCP options to the app

Similarly, the approach in draft-briscoe-tcpm-inner-space (or 
draft-ietf-tcpm-tcp-edo) makes space for extra TCP options in 
segments after the first SYN.
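
Purely to illustrate the effect (this is not the wire format of either 
draft), a receiver supporting such an extension would assemble one 
ordered option list from two places: the normal option area bounded by 
Data Offset, plus an extra option block carved out of the front of the 
payload:

     def collect_options(header_options: bytes, payload: bytes, extra_len: int):
         # Hypothetical receive path: the first extra_len octets of the
         # payload carry additional options; only the remainder is user-data.
         extra_options = payload[:extra_len]
         user_data = payload[extra_len:]
         # Large options such as INIT1 live in extra_options, where the
         # 40-byte header limit no longer applies.
         return header_options + extra_options, user_data

How a legacy receiver is stopped from passing extra_options up to the 
app is exactly what the drafts specify (in different ways), and is not 
shown here.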

In the rest of this email, I'll use only Inner Space, for all the 
following reasons:
- to be concrete;
- because it should traverse most middleboxes, including connection 
splitters, resegmentation, and option strippers;
- because it gives you a nice reliable, ordered delivery property for 
TCP Options, which tcpcrypt would otherwise have to provide for itself;
- because I've been using tcpcrypt as one of the main use-cases for 
Inner Space, although it's also general enough for other TCP options;
- because the alternative combination of tcp-syn-ext-opt and tcp-edo 
creates extra space on the SYN and on data segments, but neither 
addresses the SYN/ACK in a way that's robust to middleboxes.


2.2) How to Reduce Tcpcrypt Latency and Complexity

The design below assumes a tcpcrypt+ spec that either REQUIRES Inner 
Space or uses a similar approach in a tcpcrypt-specific 
implementation. Then the SYN can include large options like INIT1, 
and encrypted user-data can be included in the same segment as 
options like INIT2, NEXTK1 or NEXTK2. The proposed changes are in two 
main parts:
- New Session
- Resumed Session

2.2.1 New session

Recap from draft-bittau-tcpinc-tcpcrypt-00

            C -> S:  HELLO
            S -> C:  PKCONF, pub-cipher-list
            C -> S:  INIT1, sym-cipher-list, N_C, pub-cipher, PK_C
            S -> C:  INIT2, sym-cipher, KX_S

With Inner Space, there's no need for the HELLO or PKCONF - the whole 
round is redundant. HELLO can be implicit in INIT1. And INIT1 can be 
extended (I'll call it INIT1+) to offer a choice of pub-cipher-list 
as well as a public key for the first, preferred pub-cipher in that 
list, optimistically hoping that the server supports it.

Optimistic choice of cipher suite is not a new idea - for instance 
False Start uses the same approach.

If the server accepts the client's choice of pub-cipher, it can 
complete the INIT2 within the first round.

                  C -> S:  INIT1+, pub-cipher-list, sym-cipher-list, N_C, PK_C
                  S -> C:  INIT2+, sym-cipher, KX_S; MAC<m>; data<...>

In this case, tcpcrypt can be ready to encrypt data after one round 
or even after half a round. Obviously, the server won't usually have 
any data to send until the client has sent an encrypted request 
(after this first round).

If the server does not accept the first pub-cipher in the client's 
pub-cipher-list, the server can take over the task of sending INIT1. 
Then the client can respond with INIT2, followed directly by 
encrypted data in the same packet.

                  C -> S:  INIT1+, pub-cipher-list, sym-cipher-list, N_C, PK_C
                  S -> C:  INIT1+, pub-cipher, sym-cipher, N2_C, PK2_C
                  C -> S:  INIT2+, KX_S; MAC<m>; data<...>
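
The server's side of this then reduces to a single branch. A rough 
sketch (the message and field names follow the exchanges above; the 
structure and return values are placeholders for illustration, not an 
API from any draft):

     def server_handle_init1_plus(init1, acceptable):
         # acceptable: the set of pub-ciphers this server supports.
         preferred = init1["pub_cipher_list"][0]   # the cipher PK_C was made for
         if preferred in acceptable:
             # Accept the client's optimistic choice: complete the key
             # exchange now and piggyback any encrypted data on the same
             # segment. (Taking the first sym-cipher is a simplification.)
             return ("INIT2+", init1["sym_cipher_list"][0])
         # Otherwise take over the INIT1 role, choosing a pub-cipher from
         # the client's list that the server does support; the client then
         # answers with INIT2+ plus encrypted data, costing one extra round.
         common = [c for c in init1["pub_cipher_list"] if c in acceptable]
         return ("INIT1+", common[0]) if common else ("ABORT",)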

INIT1+ will have to be qualified by any one of the following:
- regular
- app-support
- app-mandatory
This is because these modifiers could previously be applied to HELLO, 
which is now implicit in INIT1+.

Similarly, because INIT2+ subsumes PKCONF, it will have to be 
qualified with either of:
- regular
- app-support


2.2.1.1 New DDoS Attack and a Defence

An army of clients could flood S with INIT1+'s containing an 
unacceptable pub-cipher at the head of the list, trying to make S 
take the heavier role of verifier rather than encrypter for the 
asymmetric key setup crypto.

If under stress, S can strictly prioritise those requests with an 
acceptable pub-cipher at the head of the list, and queue up requests 
with unacceptable pub-cipher-lists. Then, assuming a client will not 
want to risk being mistaken for attack traffic, it will be worth its 
while picking a pub-cipher that the server will accept.
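
As a sketch of that defence (the queue structure below is my own 
illustration, not from the draft), the server keeps two queues and 
only drains the lower-priority one when the higher-priority one is 
empty:

     from collections import deque

     fast, slow = deque(), deque()   # acceptable vs unacceptable first pub-cipher

     def enqueue(init1, acceptable):
         q = fast if init1["pub_cipher_list"][0] in acceptable else slow
         q.append(init1)

     def next_request():
         # Strict priority: requests that lead with an unacceptable
         # pub-cipher only starve themselves; well-behaved clients are
         # unaffected.
         if fast:
             return fast.popleft()
         return slow.popleft() if slow else None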

One could reintroduce PKCONF, so that S could return its acceptable 
pub-ciphers. But why add that complexity back into the protocol, when 
the above approach is sufficient?

2.2.2 Session Resume

Recap from draft-bittau-tcpinc-tcpcrypt-00

                           A -> B:  NEXTK1, SID[i]
                           B -> A:  NEXTK2

With Inner Space, there's now no need to wait for one round before A 
or B can encrypt user-data. They can send encrypted data straight away:

                           A -> B:  NEXTK1, SID[i]; MAC<m>; data<...>
                           B -> A:  NEXTK2; MAC<m>; data<...>

If B declines state re-use (equivalent to example 5 in 
draft-bittau-tcpinc-tcpcrypt-00), it can discard the encrypted data 
and return an INIT1+. Then A can re-transmit the data with the newly 
negotiated keys, as follows:

                           A -> B:  NEXTK1, SID[i]; MAC<m>; data<...>
                           B -> A:  INIT1+, pub-cipher, sym-cipher, N2_C, PK2_C
                           A -> B:  INIT2+, KX_S; MAC<m>; data<...>
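
A client-side sketch of this resume logic (the conn and session 
objects and their methods are placeholders for illustration, not an 
API from the draft):

     def resume(conn, session, data):
         keys = session.resumed_keys()
         conn.send(("NEXTK1", session.next_sid()), keys.encrypt(data))
         kind, body = conn.recv()
         if kind == "NEXTK2":
             return                   # zero extra rounds; data already delivered
         if kind == "INIT1+":
             # B declined state re-use and discarded the data, so finish a
             # fresh key exchange and retransmit the data under the new keys.
             new_keys = session.complete_init2_plus(body)
             conn.send(("INIT2+",), new_keys.encrypt(data))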

3) Next Steps

Either
* tcpcrypt could mandate use of Inner Space as it stands
* or tcpcrypt could repurpose Inner Space, but as a tcpcrypt-specific design.

IMO, the first alternative makes sense while the second would be 
silly - requiring implementation of the same capability in two 
different ways in the same stack, with two lots of debugging to do, 
etc. However, I should add that the ideas behind Inner Space have 
only been around since the end of Jul'14, and tcp-edo is still young 
(born Apr'14) even though it's the oldest. So I doubt there will be an 
implementation until the designs settle into some degree of stability.

Focusing on Inner Space in particular, I'm trying to be ambitious. I 
believe it will be able to encrypt not just the payload of a SYN, but 
also the tcpcrypt options in the same payload that control that very 
encryption. That needs:
a) zero-latency key agreement (as above for a session resume),
b) TCP Option processing rules that trigger a second pass to look for 
more TCP Options once the payload is decrypted.

Even if I have to give up on a) for an initial connection, the 
inner-space draft already defines b) and it's also still usable when 
a connection resumes with TCP Fast Open.
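
A sketch of what b) amounts to on the receive side (the 'stack' object 
and its method names here are hypothetical, standing in for the TCP 
implementation; the actual rules are those defined in the inner-space 
draft):

     def process_segment(stack, cleartext_options, payload, keys):
         stack.handle_options(cleartext_options)   # first pass: options in the clear
         if keys is None:
             stack.deliver_to_app(payload)
             return
         plaintext = keys.decrypt(payload)
         inner_options, user_data = stack.split_inner_options(plaintext)
         stack.handle_options(inner_options)       # second pass: options that were
         stack.deliver_to_app(user_data)           # themselves under encryption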


Cheers



Bob




________________________________________________________________
Bob Briscoe,                                                  BT