[Plus] Supporting connection tracking and basic diagnostics: a minimal PLUS

Brian Trammell <ietf@trammell.ch> Fri, 09 December 2016 12:09 UTC

Greetings, all,

We were sitting around a whiteboard yesterday, thinking about (1) how to implement the signals required to drive the state machine outlined in the -plus-statefulness-01 draft, and (2) how to provide on-path diagnosability as we're discussing in the thread on Juho's blog post. Indeed, we think these two use cases are by far the most important for PLUS, in that they provide equivalent-to-TCP support for basic operations and troubleshooting practices for encrypted, UDP-encapsulated transports like QUIC, and one could implement either of them in QUIC directly without much disruption.

We came up with two implementation sketches, both of which use the same header fields to drive the association and confirmation signals as well as basic measurability.

The simpler one is pretty much the design Jana alluded to in his previous message. It looks kind of like TCP on the wire, and has basically the same properties. It uses three header fields: a connection ID which appears in packets of both directions and is chosen by the connection initiator; a packet serial number (PSN) whose initial value in each direction is chosen randomly by the sender (like the TCP sequence number, and as under discussion for QUIC) and is incremented by one for each packet sent regardless of content or lack thereof (like the QUIC packet number); and a maximum packet serial echo, which is the highest PSN the sender had received from the other side at the time it sent the packet.

This provides an association signal on the initial echo of the initiator's PSN, and a confirmation signal on the initiator's initial echo of the responder's PSN -- just like the ack numbers on the SYN/ACK and ACK legs of the TCP handshake. The connection ID here simply provides additional bits of protection against completely off-path attempts to force the state machine to tick over. Note there's no need for SYN or ACK flags -- the association and confirmation signals continually demonstrate that each side has seen packets from the other side.
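A minimal sketch of the packet-number-and-echo scheme described above, from the point of view of two endpoints and one on-path observer. All names here (Header, Endpoint, PathObserver, the associated/confirmed flags) are hypothetical labels for illustration, not taken from any draft, and PSN wraparound is ignored to keep it short.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Header:
    conn_id: int         # chosen by the connection initiator
    psn: int             # packet serial number, +1 for every packet sent
    echo: Optional[int]  # highest PSN received so far (None if none yet)


class Endpoint:
    def __init__(self, conn_id: int, rng: random.Random):
        self.conn_id = conn_id
        self.next_psn = rng.getrandbits(31)  # random initial PSN
        self.max_seen: Optional[int] = None

    def send(self) -> Header:
        hdr = Header(self.conn_id, self.next_psn, self.max_seen)
        self.next_psn += 1  # wraparound ignored in this sketch
        return hdr

    def receive(self, hdr: Header) -> None:
        if hdr.conn_id != self.conn_id:
            return
        if self.max_seen is None or hdr.psn > self.max_seen:
            self.max_seen = hdr.psn


class PathObserver:
    """On-path device that watches both directions of one flow."""

    def __init__(self):
        self.conn_id: Optional[int] = None
        self.initiator_psns = set()
        self.responder_psns = set()
        self.associated = False  # responder echoed an initiator PSN
        self.confirmed = False   # initiator echoed a responder PSN

    def observe(self, hdr: Header, from_initiator: bool) -> None:
        if self.conn_id is None and from_initiator:
            self.conn_id = hdr.conn_id
        if hdr.conn_id != self.conn_id:
            return  # wrong connection ID: ignore
        if from_initiator:
            self.initiator_psns.add(hdr.psn)
            if self.associated and hdr.echo in self.responder_psns:
                self.confirmed = True
        else:
            self.responder_psns.add(hdr.psn)
            if hdr.echo in self.initiator_psns:
                self.associated = True
```

Because every later packet also carries an echo, the observer keeps seeing fresh evidence that each side hears the other, which is why no SYN or ACK flags are needed.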

A stop signal is considered authentic if it has a correct connection ID and a plausible PSN. This is path-verifiable, but provides no protection against on-path or path-side injection attacks against state on middleboxes (though, note that since even unencrypted headers are authenticated, the endpoint can always detect an attempt to inject a stop). A variation of the mechanism described in statefulness-01 would use a two-way stop signal: a stop is only considered valid along the path if one endpoint sends a stop signal in reply to the other endpoint's stop signal. This would make the path-side injection much harder to perform: in order to remove state on a given middlebox (presuming said middlebox isn't stupid about accepting packets from anywhere), the attacker would need to be able to inject packets on the interfaces facing both endpoints.
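A rough sketch of how a middlebox might apply the two-way stop rule. The class name StopTracker, the side labels "a" and "b", and in particular the definition of "plausible" (a PSN within a fixed window above the highest one seen on that interface) are assumptions made for illustration; the text above leaves plausibility undefined.

```python
from typing import Dict, Optional


class StopTracker:
    """Per-flow state on a middlebox with one interface facing each endpoint."""

    def __init__(self, conn_id: int, window: int = 64):
        self.conn_id = conn_id
        self.window = window  # assumed plausibility window, not from the draft
        self.max_psn: Dict[str, Optional[int]] = {"a": None, "b": None}
        self.stop_seen = {"a": False, "b": False}

    def torn_down(self) -> bool:
        # State may be dropped only once BOTH sides have sent a valid stop.
        return self.stop_seen["a"] and self.stop_seen["b"]

    def on_packet(self, side: str, conn_id: int, psn: int,
                  stop: bool = False) -> bool:
        """side is the interface ("a" or "b") the packet arrived on."""
        if conn_id != self.conn_id:
            return self.torn_down()  # wrong connection ID: ignore
        last = self.max_psn[side]
        if last is None:
            self.max_psn[side] = psn  # first packet only sets a baseline
        elif last < psn <= last + self.window:
            self.max_psn[side] = psn
            if stop:
                self.stop_seen[side] = True
        # Old or implausible PSNs are ignored entirely in this sketch.
        return self.torn_down()
```

An attacker who can inject only on the interface facing one endpoint can set at most one of the two flags, so the state survives.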

One-point measurement of the PSN and echo streams gives you two-sided RTT and upstream loss and reordering. Coordinated two-point analysis gives you a lot more. As noted, though, two-point analysis is far more complex.
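As an illustration of the one-point case, here is a small sketch of an observer that derives RTT components from the PSN and echo streams, plus a crude upstream loss estimate. The names (RttObserver, missing_psns) and the use of integer millisecond timestamps are assumptions for the example. Note that each sample includes whatever time the endpoint waited before sending its next packet.

```python
from typing import Dict, Iterable, List, Optional


class RttObserver:
    """Single measurement point between endpoints "a" and "b"."""

    def __init__(self):
        # PSN -> time it was first seen, per sending side
        self.seen_at: Dict[str, Dict[int, int]] = {"a": {}, "b": {}}
        # samples[x]: delay observer -> endpoint x -> observer, in ms
        self.samples: Dict[str, List[int]] = {"a": [], "b": []}

    def observe(self, side: str, psn: int, echo: Optional[int],
                now_ms: int) -> None:
        other = "b" if side == "a" else "a"
        self.seen_at[side].setdefault(psn, now_ms)
        if echo is not None:
            # Only the first packet echoing a given PSN yields a sample.
            t0 = self.seen_at[other].pop(echo, None)
            if t0 is not None:
                self.samples[side].append(now_ms - t0)


def missing_psns(psns: Iterable[int]) -> int:
    """PSNs absent from the observed range: lost upstream of the observer
    (or still in flight and reordered)."""
    seen = set(psns)
    return (max(seen) - min(seen) + 1) - len(seen)
```

Adding a sample from samples["a"] to one from samples["b"] gives an estimate of the full two-sided RTT.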

This approach has the advantage of being extremely simple (it meets the "someone with wireshark could reverse-engineer this in an hour or so" requirement), and very close to what's there in QUIC right now. If implemented with a small number of possible header layouts (preferably one) the wire image could be trivially offloaded to hardware.

It has all of the disadvantages of SEQ/ACK tracking in TCP, and offers the same resistance to off-path meddling that TCP sequence numbers do, though it does give better RTT indication on lossy links (since the echo is always the max packet number, not the highest continuous ack), and two-way stop is better than RST at resisting path-side attacks. The definition of "plausible" next PSN or "plausible" echo when seen at a midpoint device is fuzzy, which could lead to difficult-to-debug problems with middleboxes that try to drop packets with implausible values (as some state-tracking TCP firewalls do now).

The second one replaces the packet number and the echo with a token and a nonce. The connection initiator chooses an initial random token and a nonce. The connection responder applies a function to the token and nonce to generate its own token, and chooses a random nonce. Each side generates a new token from the token and nonce it receives each time the token it receives changes. Like the simpler implementation, this one provides continual association and confirmation signals. It also provides one-point measurement of RTT, since the token change is an RTT-clocked signal. The RTT clock would, however, behave in odd ways under high-reordering situations, and additional complexity (which involves remembering a few past tokens, but which we didn't work through) would be needed to fix that.
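A very rough sketch of the token-and-nonce exchange, to make the clocking concrete. The function applied to the token and nonce is not specified above; truncated SHA-256 is used here purely as a stand-in. The 8-byte sizes, the names derive and TokenEndpoint, and the choice to pick a fresh nonce on every token change are likewise assumptions. Reordering is not handled.

```python
import hashlib
import os
from typing import Optional, Tuple


def derive(token: bytes, nonce: bytes) -> bytes:
    # Stand-in for the unspecified function; anyone on path can recompute it.
    return hashlib.sha256(token + nonce).digest()[:8]


class TokenEndpoint:
    def __init__(self, initiator: bool):
        self.nonce = os.urandom(8)
        # Only the initiator starts with a random token.
        self.token: Optional[bytes] = os.urandom(8) if initiator else None
        self.last_rx_token: Optional[bytes] = None

    def send(self) -> Tuple[Optional[bytes], bytes]:
        """(token, nonce) carried on every outgoing packet."""
        return self.token, self.nonce

    def receive(self, token: bytes, nonce: bytes) -> None:
        if token == self.last_rx_token:
            return  # peer's token unchanged: keep our current token
        self.last_rx_token = token
        self.token = derive(token, nonce)
        self.nonce = os.urandom(8)  # assumption: fresh nonce per change
```

Each side's token therefore changes about once per round trip, and an observer can check every change by recomputing derive() over the token and nonce it saw going the other way.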

The token and nonce could be separate from an additional connection ID, or the connection ID and the token could be the same -- though this would require much more state to be kept everywhere in order to allow the connection ID to be useful for NAT rebinding and injection defense purposes.

The main advantage over the simpler approach is that the fuzziness around plausible PSNs and echoes goes away, as does the predictability of association and confirmation values after the initial connection establishment. However, it does not provide loss measurement without additional information, and it places more state and processing requirements on endpoints.

Either of these mechanisms could be used together with a path-and-endpoint verifiable, on-path and side-path attack resistant stop signal: during connection setup each endpoint generates a random value, and exposes the result of the application of a hash function to that random value as its stop signal proof. To send a stop signal, it reveals the random value as a stop signal verification (this is the essence of PR#20 on QUIC). Any endpoint or on-path device can verify that the hash of the verification is the proof. Of course, devices that don't keep the proof value (or never saw it) can't verify it. The tradeoff here is additional complexity versus additional resistance against path-side injection meddling with state on middleboxes.
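The proof-and-verification idea can be sketched in a few lines. SHA-256 and the 16-byte random value are assumptions chosen for the example (no hash function is named above), and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def new_stop_secret() -> Tuple[bytes, bytes]:
    """Return (verification, proof). The proof is exposed during connection
    setup; the verification stays secret until the endpoint wants to stop."""
    verification = secrets.token_bytes(16)
    proof = hashlib.sha256(verification).digest()
    return verification, proof


def verify_stop(proof: bytes, verification: bytes) -> bool:
    """Check run by the peer or by any on-path device that stored the proof."""
    candidate = hashlib.sha256(verification).digest()
    return hmac.compare_digest(candidate, proof)
```

A device that never stored the proof has nothing to compare against, which is the limitation noted above.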

Simple packet number and echo signaling for association and confirmation signaling with two-way stop seems to us like a reasonable "minimal functionality set" at the moment.



Brian and Mirja