Re: [Plus] Supporting connection tracking and basic diagnostics: a minimal PLUS

I agree with Ian that the first option is probably easier and more viable as a solution. The second option is attractive in how it makes the association and confirmation values less predictable, and thus more trustworthy. However, the extra state required to debug and reconstruct the flow does seem too high if we want wide adoption.

When we are looking at the first option, you mentioned strategies to avoid on-path injection attacks of stops. Is there any attack that could be made regarding the PSN? If the attacker can modify PSN values or inject packets with higher PSN values in either direction, what side effects can we see? I'm assuming that the endpoints themselves will be able to validate the authenticity of the fields, but could a middlebox that is trying to monitor the plausibility of the PSN get thrown off? Or does it only ever adjust its plausibility window when it sees a PSN acknowledged? (To that end, if an acknowledgment value is faked, what does that do?)

—Tommy

> On Dec 9, 2016, at 7:40 AM, Ian Swett <ianswett@google.com> wrote:
> 
> I have been discussing the first option with people for a while, and everyone has indicated it's relatively simple to implement, potentially even in hardware, and provides quite a bit of extra useful information (RTT, approximate bandwidth) over QUIC's status quo.
> 
> I believe packet numbers being in order makes the first case both easier to implement and more informative (it's easy to infer upstream loss and reordering), so I think it's definitely a better approach.
> 
> Also, given an equal number of bytes spent on the echo, I think they both have about the same fuzziness properties. For example, if 2 bytes were used in the first case, assuming this echo must be monotonically increasing, an ack or packet gap of over 65,000 packets (2^16 = 65,536) would have to occur for the signal to be misinterpreted. If we're getting 65,000 packet gaps, we might be outside the realm of QUIC's wire format to help you diagnose your network issues, to put it mildly.
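
To make the wrap-around arithmetic concrete, here is a minimal Python sketch of how an observer might expand a 2-byte echo back into a full counter. It assumes the echo field carries the low 16 bits of a monotonically increasing value; the name expand_echo and the "smallest value not below the last one" rule are illustrative assumptions, not something taken from a draft.

```python
ECHO_BITS = 16              # assumption: a 2-byte echo field
ECHO_MOD = 1 << ECHO_BITS   # 65536 distinct values before wrap-around


def expand_echo(last_full: int, truncated: int) -> int:
    """Return the smallest value >= last_full whose low 16 bits equal truncated.

    This is only correct while the true value has advanced by fewer than
    65536 since last_full; a larger gap aliases to a smaller value.
    """
    if not 0 <= truncated < ECHO_MOD:
        raise ValueError("truncated echo must fit in 16 bits")
    delta = (truncated - last_full) % ECHO_MOD
    return last_full + delta
```

With this rule, a jump from 65530 to a wire value of 4 is read as 65540, while a true advance of exactly 65536 packets is indistinguishable from no advance at all, which is the misinterpretation case described above.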
> 
> 
> 
> On Fri, Dec 9, 2016 at 6:55 AM, Brian Trammell <ietf@trammell.ch> wrote:
> Greetings, all,
> 
> We were sitting around a whiteboard yesterday, thinking about (1) how to implement the signals required to drive the state machine outlined in the -plus-statefulness-01 draft, and (2) how to provide on-path diagnosability as we're discussing in the thread on Juho's blog post. Indeed, we think these two use cases are by far the most important for PLUS, in that they provide equivalent-to-TCP support for basic operations and troubleshooting practices for encrypted, UDP-encapsulated transports like QUIC, and one could implement either of them in QUIC directly without much disruption.
> 
> We came up with two implementation sketches, both of which use the same header fields to drive the association and confirmation signals as well as basic measurability.
> 
> 
> The simpler one is pretty much the design Jana alluded to in his previous message. It looks kind of like TCP on the wire, and has basically the same properties. It uses three header fields: a connection ID which appears in packets of both directions and is chosen by the connection initiator; a packet serial number (PSN) whose initial value in each direction is chosen randomly by the sender (like the TCP sequence number, and as under discussion for QUIC) and is incremented by one for each packet sent regardless of content or lack thereof (like the QUIC packet number); and a maximum packet serial echo, which is the highest PSN the sender had received at the time it sent the packet.
> 
> This provides an association signal on the initial echo of the initiator's PSN, and a confirmation signal on the initiator's initial echo of the responder's PSN -- just like the ack numbers on the SYN/ACK and ACK legs of the TCP handshake. The connection ID here simply provides additional bits of protection against completely off-path attempts to force the state machine to tick over. Note there's no need for SYN or ACK flags -- the association and confirmation signals continually demonstrate that each side has seen packets from the other side.
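
As a rough illustration of how a middlebox could derive the two signals from these fields, here is a minimal Python sketch. The Packet layout, the state names ("uniflow", "associating", "associated") and the use of None for "no echo yet" are assumptions made for the example; a real observer would also bound its memory and apply a plausibility check rather than keep every PSN it has seen.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Packet:
    from_initiator: bool
    conn_id: int
    psn: int
    echo: Optional[int]  # None: sender had not received anything yet (assumption)


@dataclass
class FlowObserver:
    conn_id: int
    state: str = "uniflow"  # only initiator-to-responder traffic seen so far
    seen_initiator: Set[int] = field(default_factory=set)
    seen_responder: Set[int] = field(default_factory=set)

    def observe(self, pkt: Packet) -> str:
        """Update and return the flow state after seeing one packet."""
        if pkt.conn_id != self.conn_id:
            return self.state  # wrong connection ID: ignore entirely
        if pkt.from_initiator:
            self.seen_initiator.add(pkt.psn)
            # Confirmation: the initiator echoes a PSN the responder really sent.
            if self.state == "associating" and pkt.echo in self.seen_responder:
                self.state = "associated"
        else:
            self.seen_responder.add(pkt.psn)
            # Association: the responder echoes a PSN the initiator really sent.
            if self.state == "uniflow" and pkt.echo in self.seen_initiator:
                self.state = "associating"
        return self.state
```

A responder packet whose echo matches no observed initiator PSN leaves the state unchanged, which is the extra protection against off-path guesses that the random initial PSN is meant to provide.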
> 
> A stop signal is considered authentic if it has a correct connection ID and a plausible PSN. This is path-verifiable, but provides no protection against on-path or path-side injection attacks against state on middleboxes (though, note that since even unencrypted headers are authenticated, the endpoint can always detect an attempt to inject a stop). A variation of the mechanism described in statefulness-01 would use a two-way stop signal: a stop is only considered valid along the path if one endpoint sends a stop signal in reply to the other endpoint's stop signal. This would make the path-side injection much harder to perform: in order to remove state on a given middlebox (presuming said middlebox isn't stupid about accepting packets from anywhere), the attacker would need to be able to inject packets on the interfaces facing both endpoints.
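
The two-way stop rule can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the class name, the interface labels, and the precomputed psn_plausible flag are all hypothetical, and the sketch only requires a valid stop from each side rather than modelling the exact "stop in reply to stop" ordering.

```python
class TwoWayStop:
    """Tracks stop signals for one flow at a middlebox with two interfaces."""

    def __init__(self, conn_id: int):
        self.conn_id = conn_id
        # One flag per interface the middlebox faces (labels are assumptions).
        self.stop_seen = {"initiator_side": False, "responder_side": False}

    def observe_stop(self, interface: str, conn_id: int, psn_plausible: bool) -> bool:
        """Record a stop arriving on `interface`.

        Returns True once an acceptable stop has been seen on both interfaces,
        i.e. when the middlebox may drop its state for this flow.
        """
        if interface not in self.stop_seen:
            raise ValueError(f"unknown interface: {interface}")
        if conn_id == self.conn_id and psn_plausible:
            self.stop_seen[interface] = True
        return all(self.stop_seen.values())
```

An attacker who can inject on only one interface can set at most one of the two flags, so repeated injected stops on that side never remove the state.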
> 
> One-point measurement of the PSN and echo streams gives you two-sided RTT and upstream loss and reordering. Coordinated two-point analysis gives you a lot more. As noted, though, two-point analysis is far more complex.
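
For the one-point RTT case, a minimal Python sketch of the bookkeeping might look like the following. It assumes the observer sees both directions, timestamps packets in integer milliseconds, and ignores PSN wrap-around; the class and method names are illustrative. Note that each component also includes whatever delay the endpoint adds between receiving a packet and sending the next one.

```python
from typing import Dict, Optional


class RttObserver:
    """Estimates two-sided RTT at a single on-path point from PSN and echo.

    Direction 0 is initiator to responder, direction 1 is the reverse.
    """

    def __init__(self) -> None:
        # direction -> {psn: time the observer first saw that PSN}
        self.first_seen: Dict[int, Dict[int, int]] = {0: {}, 1: {}}
        # component[d]: observer -> receiver of direction d -> observer
        self.component: Dict[int, Optional[int]] = {0: None, 1: None}

    def observe(self, direction: int, psn: int, echo: Optional[int], now_ms: int) -> None:
        self.first_seen[direction].setdefault(psn, now_ms)
        pending = self.first_seen[1 - direction]
        t0 = pending.get(echo) if echo is not None else None
        if t0 is not None:
            self.component[1 - direction] = now_ms - t0
            # Forget this PSN and older ones so repeated echoes are not re-measured.
            for p in [p for p in pending if p <= echo]:
                del pending[p]

    def rtt_ms(self) -> Optional[int]:
        """Sum of both components, or None until both sides have been measured."""
        if None in self.component.values():
            return None
        return self.component[0] + self.component[1]
```

For example, if PSN 100 passes the observer at 0 ms, the reverse packet echoing 100 passes at 30 ms, and the next forward packet echoing the reverse PSN passes at 40 ms, the two components are 30 ms and 10 ms and the estimate is 40 ms.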
> 
> This approach has the advantage of being extremely simple (it meets the "someone with wireshark could reverse-engineer this in an hour or so" requirement), and very close to what's there in QUIC right now. If implemented with a small number of possible header layouts (preferably one) the wire image could be trivially offloaded to hardware.
> 
> It has all of the same disadvantages as SEQ/ACK tracking in TCP, and the same level of resistance to off-path meddling that TCP sequence numbers have, though it does give better RTT indication on lossy links (since the echo is always the max packet number, not the highest continuous ack), and two-way stop is better than RST at resisting path-side attacks. The definition of "plausible" next PSN or "plausible" echo when seen at a midpoint device is fuzzy, which could lead to difficult-to-debug problems with middleboxes that try to drop packets with implausible values (as some state-tracking TCP firewalls do now).
> 
> 
> The second one replaces the packet number and the echo with a token and a nonce. The connection initiator chooses an initial random token and a nonce. The connection responder applies a function to the token and nonce to generate its own token, and chooses a random nonce. Each side generates a new token from the token and nonce it receives each time the token it receives changes. Like the simpler implementation, this one provides continual association and confirmation signals. It also provides one-point measurement of RTT, since the token change is an RTT-clocked signal. The RTT clock would also behave in odd ways under high-reordering situations, and additional complexity (which involves remembering a few past tokens, but which we didn't work through) would be needed to fix that.
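
A minimal Python sketch of the token chain follows, to show why the token changes once per round trip. The message does not specify the function, the field sizes, or how often the nonce is refreshed, so everything concrete here is an assumption: truncated SHA-256 as the function, 8-byte tokens and nonces, and one fixed nonce per endpoint.

```python
import hashlib
import os
from typing import Optional, Tuple

TOKEN_LEN = 8  # assumption: field size is not specified in the thread


def next_token(token: bytes, nonce: bytes) -> bytes:
    """Hypothetical derivation function: truncated SHA-256 over token || nonce."""
    return hashlib.sha256(token + nonce).digest()[:TOKEN_LEN]


class TokenEndpoint:
    def __init__(self, initiator: bool) -> None:
        self.nonce = os.urandom(8)  # assumption: one nonce per endpoint
        # Only the initiator starts with a token; the responder derives its first one.
        self.token: Optional[bytes] = os.urandom(TOKEN_LEN) if initiator else None
        self.last_peer_token: Optional[bytes] = None

    def receive(self, peer_token: bytes, peer_nonce: bytes) -> None:
        """Derive a new token only when the peer's token has changed."""
        if peer_token != self.last_peer_token:
            self.last_peer_token = peer_token
            self.token = next_token(peer_token, peer_nonce)

    def header(self) -> Tuple[Optional[bytes], bytes]:
        """Token and nonce this endpoint would place in its next packet."""
        return self.token, self.nonce
```

Because each side only steps its token after seeing a new token from the other side, an on-path observer that knows the function can check each step and time the interval between changes, which is the RTT-clocked signal described above.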
> 
> The token and nonce could be separate from an additional connection ID, or the connection ID and the token could be the same -- though this would require much more state to be kept everywhere in order to allow the connection ID to be useful for NAT rebinding and injection defense purposes.
> 
> The main advantage over the simpler approach is that the fuzziness around plausible PSNs and echoes goes away, as does the predictability of association and confirmation values after the initial connection establishment. However, it does not provide loss measurement without additional information, and it places more state and processing requirements on endpoints.
> 
> 
> Either of these mechanisms could be used together with a path-and-endpoint verifiable, on-path and side-path attack resistant stop signal: during connection setup each endpoint generates a random value, and exposes the result of the application of a hash function to that random value as its stop signal proof. To send a stop signal, it reveals the random value as a stop signal verification (this is the essence of PR#20 on QUIC). Any endpoint or on-path device can verify that the hash of the verification is the proof. Of course, devices that don't keep the proof value (or never saw it) can't verify it. The tradeoff here is additional complexity versus additional resistance against path-side injection meddling with state on middleboxes.
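
The proof and verification step is small enough to sketch directly in Python. The choice of SHA-256 and a 16-byte random value are assumptions for illustration, as are the function names; the message only says "a hash function" and "a random value".

```python
import hashlib
import hmac
import os


def make_stop_secret() -> bytes:
    """Random value an endpoint keeps private until it wants to stop."""
    return os.urandom(16)


def stop_proof(secret: bytes) -> bytes:
    """Value exposed during connection setup (hash choice is an assumption)."""
    return hashlib.sha256(secret).digest()


def verify_stop(proof: bytes, verification: bytes) -> bool:
    """Check that a revealed verification value hashes to the stored proof."""
    candidate = hashlib.sha256(verification).digest()
    return hmac.compare_digest(candidate, proof)
```

A device that stored the proof at setup accepts the stop only if verify_stop returns True; a device that never saw the proof has nothing to compare against and cannot verify the stop at all.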
> 
> 
> Simple packet number and echo signaling for association and confirmation signaling with two-way stop seems to us like a reasonable "minimal functionality set" at the moment.
> 
> Thoughts?
> 
> Cheers,
> 
> Brian and Mirja
> 
> 
> _______________________________________________
> Plus mailing list
> Plus@ietf.org
> https://www.ietf.org/mailman/listinfo/plus
> 
> 