Re: [Plus] Supporting connection tracking and basic diagnostics: a minimal PLUS

I have been discussing the first option with people for a while, and
everyone has indicated it's relatively simple to implement, potentially
even in hardware, and provides quite a bit of extra useful information (RTT,
approximate bandwidth) over QUIC's status quo.

I believe in-order packet numbers make the first case easier to implement
and provide some valuable extra information (it's easy to infer upstream
loss and reordering), so I think it's definitely the better approach.

Also, given an equal number of bytes spent on the echo, I think they both
have about the same fuzziness properties.  For example, if 2 bytes were
used in the first case, assuming this echo must be monotonically
increasing, an ack or packet gap of over 65,000 packets would have to occur
for the signal to be misinterpreted.  If we're getting 65,000 packet gaps,
we might be outside the realm of QUIC's wire format to help you diagnose
your network issues, to put it mildly.
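
As a rough sketch of that arithmetic (Python; the helper names are made up
for illustration and are not from any draft), an observer that assumes the
2-byte echo only ever increases can take the forward difference modulo
65,536 and rebuild the full counter from it. The result is only wrong when
the true gap reaches 65,536 or more:

```python
MOD = 1 << 16  # size of the value space for a 2-byte echo field


def forward_gap(prev: int, cur: int) -> int:
    """Gap from prev to cur, assuming the echo never goes backwards."""
    return (cur - prev) % MOD


def unwrap(prev_full: int, cur_truncated: int) -> int:
    """Rebuild the full counter from the last full value and a 16-bit observation."""
    return prev_full + forward_gap(prev_full % MOD, cur_truncated)


if __name__ == "__main__":
    # A wrap from 65530 to 4 is read as a gap of 10, not a jump backwards.
    print(forward_gap(65530, 4), unwrap(65530, 4))
    # A true jump of 65536 + 5 is indistinguishable from a jump of 5.
    print(unwrap(100, (100 + MOD + 5) % MOD))
```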

On Fri, Dec 9, 2016 at 6:55 AM, Brian Trammell <ietf@trammell.ch> wrote:

> Greetings, all,
>
> We were sitting around a whiteboard yesterday, thinking about (1) how to
> implement the signals required to drive the state machine outlined in the
> -plus-statefulness-01 draft, and (2) how to provide on-path diagnosability
> as we're discussing in the thread on Juho's blog post. Indeed, we think
> these two use cases are by far the most important for PLUS, in that they
> provide equivalent-to-TCP support for basic operations and troubleshooting
> practices for encrypted, UDP-encapsulated transports like QUIC, and one
> could implement either of them in QUIC directly without much disruption.
>
> We came up with two implementation sketches, both of which use the same
> header fields to drive the association and confirmation signals as well as
> basic measurability.
>
>
> The simpler one is pretty much the design Jana alluded to in his previous
> message. It looks kind of like TCP on the wire, and has basically the same
> properties. It uses three header fields: a connection ID which appears in
> packets of both directions and is chosen by the connection initiator; a
> packet serial number (PSN) whose initial value in each direction is chosen
> randomly by the sender (like the TCP sequence number, and as under
> discussion for QUIC) and is incremented by one for each packet sent
> regardless of content or lack thereof (like the QUIC packet number); and a
> maximum packet serial echo, which is the highest packet number received by
> the sender before a packet was sent.
>
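
A minimal sketch of the three fields and the endpoint bookkeeping described
above, in Python. The 32-bit PSN width, the use of None for "nothing
received yet", and all class and method names are assumptions made for
illustration; the message does not specify any of them.

```python
import secrets
from dataclasses import dataclass
from typing import Optional

PSN_MOD = 1 << 32  # assumed field width, not specified in the message


@dataclass(frozen=True)
class Header:
    conn_id: int         # chosen by the connection initiator
    psn: int             # packet serial number, +1 for every packet sent
    echo: Optional[int]  # highest PSN received so far (None: nothing seen yet)


class Endpoint:
    def __init__(self, conn_id: int, initial_psn: Optional[int] = None):
        self.conn_id = conn_id
        # The initial PSN is chosen randomly unless one is supplied (for tests).
        self.next_psn = (secrets.randbelow(PSN_MOD)
                         if initial_psn is None else initial_psn)
        self.max_received: Optional[int] = None

    def send(self) -> Header:
        hdr = Header(self.conn_id, self.next_psn, self.max_received)
        self.next_psn = (self.next_psn + 1) % PSN_MOD
        return hdr

    def receive(self, hdr: Header) -> None:
        if hdr.conn_id != self.conn_id:
            return  # not part of this connection
        # Wrap-aware "is newer" test: ahead by less than half the space.
        if (self.max_received is None
                or 0 < (hdr.psn - self.max_received) % PSN_MOD < PSN_MOD // 2):
            self.max_received = hdr.psn
```

With an initiator starting at PSN 100 and a responder starting at 5000, the
responder's first packet carries echo 100 and the initiator's next packet
carries echo 5000.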
> This provides an association signal on the initial echo of the initiator's
> PSN, and a confirmation signal on the initiator's initial echo of the
> responder's PSN -- just like the ack numbers on the SYN/ACK and ACK legs of
> the TCP handshake. The connection ID here simply provides additional bits
> of protection against completely off-path attempts to force the state
> machine to tick over. Note there's no need for SYN or ACK flags -- the
> association and confirmation signals continually demonstrate that each side
> has seen packets from the other side.
>
> A stop signal is considered authentic if it has a correct connection ID
> and a plausible PSN. This is path-verifiable, but provides no protection
> against on-path or path-side injection attacks against state on middleboxes
> (though, note that since even unencrypted headers are authenticated, the
> endpoint can always detect an attempt to inject a stop). A variation of the
> mechanism described in statefulness-01 would use a two-way stop signal: a
> stop is only considered valid along the path if one endpoint sends a stop
> signal in reply to the other endpoint's stop signal. This would make the
> path-side injection much harder to perform: in order to remove state on a
> given middlebox (presuming said middlebox isn't stupid about accepting
> packets from anywhere), the attacker would need to be able to inject
> packets on the interfaces facing both endpoints.
>
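
One way a middlebox might track the two-way stop variation is sketched
below in Python. The class name, the "A"/"B" endpoint labels, and the
assumption that the caller has already checked the connection ID and PSN
plausibility are all illustrative, not taken from statefulness-01.

```python
class TwoWayStopTracker:
    """Per-flow record of which endpoints have sent an authentic-looking stop."""

    def __init__(self) -> None:
        self.stops_seen = set()

    def on_stop(self, sender: str) -> bool:
        """Record a stop from endpoint "A" or "B".

        Returns True only once stops have been seen from both endpoints,
        i.e. when the flow state may be dropped.
        """
        if sender not in ("A", "B"):
            raise ValueError("sender must be 'A' or 'B'")
        self.stops_seen.add(sender)
        return self.stops_seen == {"A", "B"}
```

An attacker who can only inject on the interface facing one endpoint can
repeat on_stop("A") indefinitely without the tracker ever returning True.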
> One-point measurement of the PSN and echo streams gives you two-sided RTT
> and upstream loss and reordering. Coordinated two-point analysis gives you
> a lot more. As noted, though, two-point analysis is far more complex.
>
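
A rough Python sketch of the one-point RTT measurement: the observer
remembers when it first saw each PSN pass in one direction, and takes a
sample when that PSN first comes back as an echo in the other direction.
The "A"/"B" labels and all names are hypothetical, and the sketch never
expires PSNs that are not echoed, which a real observer would need to do.

```python
from typing import Dict, Optional


class RttObserver:
    def __init__(self) -> None:
        # For each sending endpoint: PSN -> time the observer first saw it.
        self.first_seen: Dict[str, Dict[int, float]] = {"A": {}, "B": {}}

    def observe(self, sender: str, psn: int, echo: Optional[int],
                now: float) -> Optional[float]:
        """Process one packet; return an RTT sample for the observer-to-sender
        half of the path (including the sender's own delay), or None."""
        other = "B" if sender == "A" else "A"
        self.first_seen[sender].setdefault(psn, now)
        if echo is None:
            return None
        # Only the first packet carrying a given echo yields a sample.
        t_sent = self.first_seen[other].pop(echo, None)
        return None if t_sent is None else now - t_sent
```

Adding a sample from each side gives an estimate of the end-to-end RTT.
Gaps in the PSNs observed from one sender indicate loss or reordering
upstream of the observation point.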
> This approach has the advantage of being extremely simple (it meets the
> "someone with wireshark could reverse-engineer this in an hour or so"
> requirement), and very close to what's there in QUIC right now. If
> implemented with a small number of possible header layouts (preferably one)
> the wire image could be trivially offloaded to hardware.
>
> It has all of the disadvantages of SEQ/ACK tracking in TCP, and the same
> resistance to off-path meddling as TCP sequence numbers, though it does give
> better RTT indication on lossy links (since the echo is always the max
> packet number, not the highest continuous ack), and two-side stop is better
> than RST at resisting path-side attacks. The definition of "plausible" next
> PSN or "plausible" echo when seen at a midpoint device is fuzzy, which
> could lead to difficult-to-debug problems with middleboxes that try to drop
> packets with implausible values (as some state-tracking TCP firewalls do
> now).
>
>
> The second one replaces the packet number and the echo with a token and a
> nonce. The connection initiator chooses an initial random token and a
> nonce. The connection responder applies a function to the token and nonce
> to generate its own token, and chooses a random nonce. Each side generates
> a new token from the token and nonce it receives each time the token it
> receives changes. Like the simpler implementation, this one provides
> continual association and confirmation signals. It also provides one-point
> measurement of RTT, since the token change is an RTT-clocked signal. The
> RTT clock would also behave in odd ways under high-reordering situations, and
> additional complexity (which involves remembering a few past tokens, but
> which we didn't work through) would be needed to fix that.
>
> The token and nonce could be separate from an additional connection ID, or
> the connection ID and the token could be the same -- though this would
> require much more state to be kept everywhere in order to allow the
> connection ID to be useful for NAT rebinding and injection defense purposes.
>
> The main advantage over the simpler approach is that the fuzziness around
> plausible PSNs and echoes goes away, as does the predictability of
> association and confirmation values after the initial connection
> establishment. However, it does not provide loss measurement without
> additional information, and it places more state and processing
> requirements on endpoints.
>
>
> Either of these mechanisms could be used together with a path-and-endpoint
> verifiable, on-path and side-path attack resistant stop signal: during
> connection setup each endpoint generates a random value, and exposes the
> result of the application of a hash function to that random value as its
> stop signal proof. To send a stop signal, it reveals the random value as a
> stop signal verification (this is the essence of PR#20 on QUIC). Any
> endpoint or on-path device can verify that the hash of the verification is
> the proof. Of course, devices that don't keep the proof value (or never saw
> it) can't verify it. The tradeoff here is additional complexity versus
> additional resistance against path-side injection meddling with state on
> middleboxes.
>
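
The proof/verification scheme can be sketched in a few lines of Python.
SHA-256 and the 16-byte random value are assumptions chosen for
illustration; the message does not name a hash function or a length, and
the function names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def make_stop_values() -> Tuple[bytes, bytes]:
    """Return (verification, proof) for one endpoint.

    The proof is exposed during connection setup; the verification is
    kept secret until the endpoint wants to send a stop signal.
    """
    verification = secrets.token_bytes(16)
    proof = hashlib.sha256(verification).digest()
    return verification, proof


def verify_stop(stored_proof: bytes, verification: bytes) -> bool:
    """Check that the revealed value hashes to the proof seen at setup."""
    candidate = hashlib.sha256(verification).digest()
    return hmac.compare_digest(candidate, stored_proof)
```

A device that stored the proof at setup calls verify_stop() on the value
carried in the stop signal; a device that never saw the proof has nothing
to compare against.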
>
> Simple packet number and echo signaling for association and confirmation
> signaling with two-way stop seems to us like a reasonable "minimal
> functionality set" at the moment.
>
> Thoughts?
>
> Cheers,
>
> Brian and Mirja
>
>
> _______________________________________________
> Plus mailing list
> Plus@ietf.org
> https://www.ietf.org/mailman/listinfo/plus
>
>