Re: [Masque] QUIC proxying: feedback on forwarding design and documentation

Hi Martin,

Thanks for the email. A quick set of responses inline.

> On Apr 11, 2023, at 11:44 PM, Martin Thomson <mt@lowentropy.net> wrote:
> 
> Hey all, I'm going through draft-pauly-masque-quic-proxy and trying to reconcile what that says with what I have in my head and I'm coming up with a pretty big mismatch, plus some new ideas.
> 
> # Expectations
> 
> After our discussion at the last meeting, I think that we generally have a fairly good shared understanding here of what the requirements look like, but I'll recap just in case.
> 
> The basic idea is that the client tunnels a QUIC connection through a proxy toward a target server.  The initial exchange is completely tunneled, but once the connection is established, the flow of packets (all of which currently have a QUIC short header) are moved outside of the tunnel onto a rapid forwarding path.
> 
> The server can be completely ignorant of anything that happens between the client and proxy.  It is able to run an unmodified QUIC stack.  (Big disclaimer here: I think that we probably do want to talk about a new extension for servers that would make this process much easier to manage, but more on that below.)
> 
> In theory at least, the proxy could just learn about connection IDs that the client and server use and just forward packets, rewriting only the IP and port on the way through, but this isn't great for many reasons.  Aside from the obvious privacy downsides (for which we agree we would use some sort of encryption), the proxy wouldn't be able to scale out without having some control over connection ID allocation.  If the proxy forwards to multiple servers, using connection IDs would not work in the case that different servers chose the same connection ID.  The same applies to the clients, which (today) tend not to use connection IDs at all, which makes forwarding on the server to client direction very difficult; of course, clients will be aware of their participation in proxying and can ensure that they use amply-sized connection IDs.

This summary seems correct!

> 
> # My Understanding of the Intended Design
> 
> We can break this down into the two directions that packets flow.  In both cases, the additional work always happens between client and proxy; the server remains unmodified.
> 
> ## From the Client to the Server
> 
> In this direction, the server supplies the client with connection IDs (and stateless reset tokens).  But the proxy would prefer if the client use connection IDs that it chose.  So, we have the following flow:
> 
> Server -> Client: a QUIC NEW_CONNECTION_ID frame
> Client -> Proxy (as a capsule): The server wants to use this connection ID.
> Proxy -> Client (as a capsule): For that connection ID, please encapsulate using this connection ID instead.
> 
> I'm not going to list stateless reset tokens in these exchanges for now; in general, you should assume that if you are providing a connection ID, you would also provide a stateless reset token along with it, in case you lose state associated with that connection ID and you want to signal that fact.

This is correct, yes.
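
As a rough illustration of the proxy-side bookkeeping this implies, here is a minimal sketch. The class and method names are hypothetical and not taken from the draft, the 8-byte virtual CID length is an assumption, and stateless reset tokens are left out, as in your description.

```python
import secrets


class TargetCidTable:
    """Hypothetical proxy-side table for the client-to-server direction."""

    def __init__(self, virtual_cid_len: int = 8) -> None:
        # Assumed fixed length for proxy-chosen (virtual) connection IDs.
        self.virtual_cid_len = virtual_cid_len
        self.virtual_by_target: dict[bytes, bytes] = {}
        self.target_by_virtual: dict[bytes, bytes] = {}

    def register_target_cid(self, target_cid: bytes) -> bytes:
        """Handle REGISTER_TARGET_CID.

        Returns the connection ID the proxy wants the client to use,
        which is what ACK_TARGET_CID would carry back.
        """
        existing = self.virtual_by_target.get(target_cid)
        if existing is not None:
            return existing
        virtual_cid = secrets.token_bytes(self.virtual_cid_len)
        while virtual_cid in self.target_by_virtual:
            # Retry on the (unlikely) collision with an existing entry.
            virtual_cid = secrets.token_bytes(self.virtual_cid_len)
        self.virtual_by_target[target_cid] = virtual_cid
        self.target_by_virtual[virtual_cid] = target_cid
        return virtual_cid

    def close_target_cid(self, target_cid: bytes) -> None:
        """Handle CLOSE_TARGET_CID, keyed by the server-provided CID."""
        virtual_cid = self.virtual_by_target.pop(target_cid, None)
        if virtual_cid is not None:
            del self.target_by_virtual[virtual_cid]
```

The reverse map is what the forwarding path would consult when a short-header packet arrives from the client.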

> 
> ### Capsules
> 
> What I see in the protocol is a REGISTER_TARGET_CID capsule, which contains approximately what you would expect here.  The client can send that toward a proxy with a copy of the information that the server provided.  It doesn't include the QUIC sequence number, which is totally fine.
> 
> The response to that seems fine.  The connection ID that the client registered is used as a transaction identifier in the ACK_TARGET_CID capsule (the order of these would be fine).  This includes a connection ID for the client to use and a stateless reset token.  The client 
> 
> ### Cleanup
> 
> For cleanup, if the client wants to retire one of these connection IDs, it can send a CLOSE_TARGET_CID capsule.  This includes the server-provided connection ID as a key.
> 
> Use of the server-provided connection ID as a means of correlating items is a bit questionable, but it should work.  I might prefer the use of sequence numbers or similar here.
> 
> Note however that these CLOSE_XX_CID capsules are not acknowledged, so the resource management stuff is a bit loose (as connection ID usage already is in QUIC).
> 
> There is no way for the proxy to inform the client of connection ID limits.  The client might forward these on to the server so that the server doesn't over-commit, or the client could at least limit its usage of connection IDs to fit within the proxy's constraints.  This applies to both what the client uses and in terms of what it advertises; presumably the proxy will want to limit how many entries it needs to track for each connection.
> 

Agreed that the closing mechanisms could be expanded and made more robust, yes.

> ### Connection ID Lengths
> 
> In the client to server direction, there is a catch here that might be worth exploring further.  Forwarding is more efficient if you can ensure that the connection ID length doesn't change.  A proxy might require some minimum number of bytes for its connection IDs, which might be longer than the length chosen by servers.  In that case, it might be nice if the client could negotiate the use of longer connection IDs with the server.  Otherwise, the proxy has a set of choices, none of which are particularly nice: disable this forwarding and tunnel everything, deal with a shorter connection ID for forwarding, or move the payload when forwarding packets.
> 
> An extension to QUIC that requests a longer connection ID might be helpful here.  The proxy could inform the client of its requirements and the client could include this request.  Servers that support that feature might pad their connection IDs out in response to that.
> 
> This protocol would need a way for the proxy to signal its requirements, before the client constructs its QUIC Initial.  That potentially cuts into the low latency connection establishment, but it seems like it might be necessary.  It's something that clients can probably remember though, which means that it might be possible to avoid a round trip.  (Or it could be configured, along with the URL template for the proxy, but that seems fragile to me.)

Yes, having connection ID lengths that match makes the virtual CID swapping much cleaner. For coordinated deployments, like chains of MASQUE proxies, this can be implicit. We currently move the payload if the sizes don’t match, but that never happens in practice for the chaining case. When talking to unmodified servers, though, this could be an issue.
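
To show why matching lengths matter, here is a small sketch of the CID swap on a short-header packet. The function name is hypothetical and this is not how any particular implementation does it; it assumes the proxy already knows the old CID (and therefore its length) from its mapping table.

```python
def swap_dcid(packet: bytearray, old_cid: bytes, new_cid: bytes) -> bytearray:
    """Replace the destination connection ID of a QUIC short-header packet."""
    if packet[0] & 0x80:
        # Header Form bit set: long header, so not eligible for forwarding.
        raise ValueError("long-header packet")
    end = 1 + len(old_cid)
    if bytes(packet[1:end]) != old_cid:
        raise ValueError("packet does not carry the expected connection ID")
    if len(new_cid) == len(old_cid):
        # Same length: overwrite in place, the payload does not move.
        packet[1:end] = new_cid
        return packet
    # Different length: rebuild the packet, which copies the whole payload.
    return bytearray(packet[:1]) + new_cid + packet[end:]
```

In the equal-length case the same buffer comes back untouched apart from the CID bytes; in the other case every forwarded packet pays for a copy.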

I’m fine if there is a way to try to negotiate this, but I also want to make sure that in cases where we do have prior coordination (or get lucky) we don’t need to take the hit of a round trip. 

> 
> ## From the Server to the Client
> 
> Here, the client is the source of connection IDs, so the flow would look like this:
> 
> Client -> Proxy: I want to use this connection ID.
> Proxy -> Client: Please tell the server to use this connection ID instead.
> Client -> Server: NEW_CONNECTION_ID (with the choice from the proxy)
> 
> This is where my understanding and the documented designs diverge.
> 
> In the other direction, the ACK_TARGET_CID frame included a connection ID that was already decided.  But here the client is supposed to send REGISTER_CLIENT_CID with two connection IDs in it.  That doesn't match the above flow.  It looks like the client is expected to put in the two connection ID fields, which means that the client chooses connection IDs for the proxy?  That interpretation is consistent with the shape of the ACK_CLIENT_CID frame, which is effectively just an acknowledgment.
> 
> I think that this aspect of the design is not right.  The proxy should be responsible for choosing the connection IDs that the server uses in packets directed toward the proxy.
> 
> Also, while the client informs the proxy of a stateless reset token, if the proxy loses state, it appears as though there is no way to tell the server about this.  That's probably fine, because the server is ignorant of the special status of the proxy and so it would give the proxy the ability to destroy the entire connection state at the server, which we might not want.  However, this case is not explained in the document.
> 

This indeed is where our approaches diverge.

A few points:
- In order to do “fast open” of the tunnel, the QUIC initial is sent alongside the CONNECT(connect-udp) requests to the proxy, and alongside the capsules used to set up the client CID. If the client needed to wait for the proxy to decide on a CID, it would lose a round trip, or else it would need to tell the end server to switch client CIDs after the initial handshake and retire the original. The latter case is doable, but adds complexity and also makes these proxied connections work differently than most QUIC connections.
- The flow from server to proxy is not really a QUIC connection, but a tunneled UDP socket. I don’t think of that side as having anything like a QUIC load balancer, etc, such that the proxy would need to be controlling the CIDs.
- The client can’t guarantee that anything will be forwarded by the proxy vs tunneled. Since the server can’t know if a packet is going to be tunneled or forwarded, it’s not in a good position to know which CIDs to use. The packets from the server to the proxy might get forwarded, or they might get tunneled, etc. So when the client chooses two CIDs, it is choosing one for the packets that the server sends to the client, and one for the proxy to send to the client in case of forwarding.
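
A minimal sketch of the last point, with hypothetical names that are not from the draft: the server always addresses packets with the first CID, and the proxy only substitutes the second one when it actually forwards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientCidRegistration:
    """The two CIDs the client supplies (field names are illustrative)."""
    client_cid: bytes          # used by the server in packets it sends
    virtual_client_cid: bytes  # used by the proxy on forwarded packets


def deliver_to_client(
    packet: bytes, reg: ClientCidRegistration, forwarding_enabled: bool
) -> tuple[str, bytes]:
    """Decide how the proxy hands a packet from the target to the client."""
    is_short_header = not (packet[0] & 0x80)
    cid_end = 1 + len(reg.client_cid)
    if (
        forwarding_enabled
        and is_short_header
        and packet[1:cid_end] == reg.client_cid
    ):
        # Forwarded: swap in the CID the client chose for this path.
        rewritten = packet[:1] + reg.virtual_client_cid + packet[cid_end:]
        return ("forward", rewritten)
    # Tunneled: the packet is carried unchanged inside the HTTP stream.
    return ("tunnel", packet)
```

The server sends the same bytes either way, so it never needs to know which branch the proxy takes.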

> ### Connection ID Length
> 
> In the server to client direction, again the proxy might have minimum length constraints for connection IDs.  At least in this case it can just tell the client how big connection IDs need to be.  In a lot of cases, the client is able to bind a unique port for the forwarded connection, so it can just make connection IDs as big as necessary.
> 
> However, I see no place where this item might be communicated from proxy to client, just like the above.

Indeed, in the current design these CIDs are both determined by the client, and can have an easily coordinated size.

> 
> ## Path Migration (new idea)
> 
> The proxy should also be in a position to choose a source address.  We should allow the proxy to change the address (and port) it uses for a flow once it moves from tunneling to forwarding.  While CONNECT-UDP might have stricter constraints on operation, a QUIC connection can use connection IDs, which might allow for greater density than strictly CONNECT-UDP.  In that case, it might be useful to treat a switch to forwarding mode as a strict, one-way operation.
> 
> That is, after forwarding starts, tunneling could cease.  The HTTP stream might remain active for control purposes, like these capsules, but it might be completely unusable for tunneling.  That would allow the proxy to reclaim resources.  If the proxy is permitted to forward on a new IP and port, this will look like connection migration to the proxy.  It might also allow for better separation of functions, with dedicated endpoints that manage forwarding.
> 
> That suggests a few things:
> 
> 1. In the client to server direction, a switch to forwarding might entail a connection migration.  This might look like a NAT rebinding in the best case, at least to the server, but it can be a little bit better than that for the client.  The client can follow logic similar to the server preferred address and do a clean path migration to a new IP and port at the proxy.
> 
> 2. In the server to client direction, the server can just switch over based on the apparent address of the client changing.  Nothing special is needed there, though it might trigger path validation (which, again, should just work).  Having the client receive packets from a new proxy address is part of the path migration, which (once again) should work fine.
> 
> 3. In order for this to work, the proxy might like to tell the client where to send packets for forwarding.  Maybe it can't just rely on them arriving at the same IP and port as the tunnel itself (though that might be a reasonable default).

If we have these explicit switches, then I agree that the CID negotiation could work like you mention above.

I’m a bit concerned about the client switching to a different socket for forwarding to the proxy, especially if the H3 “control” session still exists over the original socket to the proxy. Having it share fate with the tunnel socket itself seems preferable.
> 
> ## Document Feedback
> 
> The presentation of this flow in the draft is incredibly hard to follow.  The way that the draft is structured around formats and not exchanges makes it very hard to understand.  I found the mappings section to be completely unhelpful here.  Perhaps that is just how my brain works, but the entirety of Sections 2.1 through 2.3 make no sense to me.
> 
> For instance: "Each client <-> proxy HTTP stream MUST be mapped to a single target-facing socket" -- I think that this assumes a great deal about what a socket is.  If this is a connected UDP socket, which binds local address and port plus remote (target) address and port, then I can maybe see how that might play out.  But these are implementation details and not explained adequately.

The terminology is defined in 1.2:

Socket: a UDP 4-tuple (local IP address, local UDP port, remote IP address, remote UDP port). In some implementations, this is referred to as a "connected" socket.
Client-facing socket: the socket used to communicate between the client and the proxy.
Target-facing socket: the socket used to communicate between the proxy and the target.

Essentially, the document uses “socket” as a shorthand for “4-tuple”, since that’s awkward to use over and over. If there’s a better term, happy to switch to it.
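
In code terms, the intent is roughly the following. This is only a sketch: the type and helper names are made up for illustration, and the addresses are documentation-range examples.

```python
from typing import NamedTuple


class Socket(NamedTuple):
    """A "socket" in the draft's sense: a UDP 4-tuple."""
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int


# Each client <-> proxy HTTP stream maps to a single target-facing socket.
target_socket_by_stream: dict[int, Socket] = {}


def bind_stream(stream_id: int, sock: Socket) -> None:
    """Record the target-facing socket for a stream, rejecting a second one."""
    current = target_socket_by_stream.setdefault(stream_id, sock)
    if current != sock:
        raise ValueError("stream is already mapped to a different socket")


bind_stream(0, Socket("192.0.2.1", 50000, "198.51.100.7", 443))
```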

Thanks,
Tommy