Benjamin Kaduk's Discuss on draft-ietf-quic-transport-33: (with DISCUSS and COMMENT)
Benjamin Kaduk via Datatracker <noreply@ietf.org> Wed, 06 January 2021 21:45 UTC
Return-Path: <noreply@ietf.org>
X-Original-To: quic@ietf.org
Delivered-To: quic@ietfa.amsl.com
Received: from ietfa.amsl.com (localhost [IPv6:::1]) by ietfa.amsl.com (Postfix) with ESMTP id 89A2F3A12BA; Wed, 6 Jan 2021 13:45:09 -0800 (PST)
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
From: Benjamin Kaduk via Datatracker <noreply@ietf.org>
To: The IESG <iesg@ietf.org>
Cc: draft-ietf-quic-transport@ietf.org, quic-chairs@ietf.org, quic@ietf.org, quic-chairs@ietf.org, lars@eggert.org
Subject: Benjamin Kaduk's Discuss on draft-ietf-quic-transport-33: (with DISCUSS and COMMENT)
X-Test-IDTracker: no
X-IETF-IDTracker: 7.24.0
Auto-Submitted: auto-generated
Precedence: bulk
Reply-To: Benjamin Kaduk <kaduk@mit.edu>
Message-ID: <160996950953.25754.14270013028683006869@ietfa.amsl.com>
Date: Wed, 06 Jan 2021 13:45:09 -0800
Archived-At: <https://mailarchive.ietf.org/arch/msg/quic/Chlsaiwv0w_qQ_rk7nbUzm_GU7k>
X-BeenThere: quic@ietf.org
X-Mailman-Version: 2.1.29
List-Id: Main mailing list of the IETF QUIC working group <quic.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/quic>, <mailto:quic-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/quic/>
List-Post: <mailto:quic@ietf.org>
List-Help: <mailto:quic-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/quic>, <mailto:quic-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 06 Jan 2021 21:45:10 -0000
Benjamin Kaduk has entered the following ballot position for draft-ietf-quic-transport-33: Discuss When responding, please keep the subject line intact and reply to all email addresses included in the To and CC lines. (Feel free to cut this introductory paragraph, however.) Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html for more information about IESG DISCUSS and COMMENT positions. The document, along with other ballot positions, can be found here: https://datatracker.ietf.org/doc/draft-ietf-quic-transport/ ---------------------------------------------------------------------- DISCUSS: ---------------------------------------------------------------------- This is very much a "discuss discuss", and I am not strongly convinced that there is a problem here, but if there is a problem it is a fairly big one. This document defers creation of a downgrade protection mechanism for version negotiation; after all, if there is only one version in existence, there is nothing to negotiate. However, an effective downgrade protection mechanism requires support from all potentially affected parties in order to be reliable, so some careful thought is in order. If we limit ourselves to a mindset where QUIC versions are infrequent undertakings brought about by standards action (i.e., we don't have to worry until a "v2" exists), then deferring seems to be okay (but part of the Discuss is to confirm that my reasoning is valid). The main goal of downgrade protection is to be able to distinguish a node that only supports v1 (or in general, any single version, or set of versions that only has one overlapping version with the peer) from one that supports a different shared version but was tricked by an attacker into using v1 when it otherwise would have used a different version. I'll call that different version v2 for clarity. However, if the peer only supports v1, there's nothing to distinguish and nothing to negotiate; it suffices to ensure that all nodes that are capable of v2 support the downgrade protection scheme. That is, an attacker can only change the negotiated protocol version (as opposed to just causing connection failure, which can be done in many other ways) if there is some shared version other than v1 that would have been negotiated in the absence of the attacker. So, if v2 is definitly going to be defined+implemented before other versions, and all nodes that support v2 support downgrade protection, we are guaranteed that in any case where two peers would negotiate v2 in the absence of an attack, both peers support the downgrade protection mechanism and thus that mechanism will be effective in the face of an attack. Peers that don't support the mechanism only do v1 and so there is no downgrade possible when they are participating in the connection. (We would, of course, still need to be confident that we could define such a downgrade protection scheme in a backwards-compatible manner, though this seems like a fairly low bar given the extensibility provided by transport parameters and frame types.) However, it's not clear to me that this assumption holds that v2 is going to be the next version and that every node that implements v1 and some other version will definitely implement v2. In particular, we currently have a very open registration policy for new versions, and there may be a desire to have some custom version of QUIC, perhaps that only has a small difference from v1, and furthermore a desire to use that custom version when available but be able to use v1 when not available. There might be multiple such new versions in development in parallel, with no clear "first new version" tasked with the responsibility to develop a downgrade protection mechanism for global use. The interaction between multiple competing downgrade-protection mechanisms seems likely to become quite messy quite quickly, so I am inclined to see "make each non-standards-track version specify their own downgrade protection" as a non-starter. I think that the lack of a secure downgrade protection mechanism is fundamentally incompatible with an open procedure for creating new versions while claiming that the protocol is a secure protocol. While it would not be a pleasant choice, I think we might be forced to require standards action for new QUIC versions until we have a single global downgrade protection mechanism defined. Or perhaps I misunderstand the ecosystem that we are trying to produce, or am making erroneous assumptions. I'd love to hear more about how the WG decided to proceed with the current formulation, especially with regard to what consideration was given to non-standards-track new versions. The above notwithstanding, I support this protocol and I expect to change my position to Yes once this point is resolved in some manner. ---------------------------------------------------------------------- COMMENT: ---------------------------------------------------------------------- This protocol is nicely done and the document well-written and well-organized. It's really exciting to see what we can do with a nearly clear slate and the accumulated modern knowledge about protocol design. Congratulations and thanks to all who contributed! Special thanks (in advance) to the chairs for helping translate my comments into github issues. I have some editorial suggestions that I put up at https://github.com/quicwg/base-drafts/pull/4597 . Abstract, Section 1 My understanding is that "exemplary" typically is used to refer to not just any example of the topic in question, but to one that is the pinnacle of achievement or the best of its kind. Do we intend to make such a claim about the congestion control algorithm in [QUIC-RECOVERY]? Section 1.3 x (E) ...: Indicates that x is repeated zero or more times (and that each instance is length E) Does "repeated zero or more times" mean that there is zero or more instances in total, or one or more instances in total? (I assume the latter, and would suggest additional wording if the former is what was intended.) I am specifically curious about the application to the Ack Range field of the ACK Frame, since it seems like early on in a connection it is not always possible to construct a valid Gap field, but will not duplicate the question there. Section 2.4 * write data, understanding when stream flow control credit (Section 4.1) has successfully been reserved to send the written data; (side note) I note that "flow control credit has been reserved" is different than "has been ACKd by the peer". I guess the expectation is that if such information is needed, an application-layer ACK should be used, since QUIC ACKs can be sent before the peer application has consumed the received data? Section 3.1 I'm not sure why "Peer Creates Bidirectional Stream" appears on two transitions in the figure. The prose suggests that the copy on the Ready->Send transition is erroneous. Section 4.1 QUIC employs a limit-based flow-control scheme where a receiver advertises the limit of total bytes it is prepared to receive on a given stream or for the entire connection. [...] Should I be reading some distinction into "limit-based" (here) vs "credit-based" (earlier) flow control? Section 4.2 When a sender receives credit after being blocked, it might be able to send a large amount of data in response, resulting in short-term congestion; see Section 6.9 in [QUIC-RECOVERY] for a discussion of how a sender can avoid this congestion. No such section exists in the -33; perhaps §7.7 is intended? Section 5.1.1 An endpoint SHOULD ensure that its peer has a sufficient number of available and unused connection IDs. Endpoints advertise the number of active connection IDs they are willing to maintain using the active_connection_id_limit transport parameter. An endpoint MUST NOT provide more connection IDs than the peer's limit. [...] IIUC, the reason that a negotiated limit is needed here is not exactly for the storage of the connection ID values themselves (though the requirement to explicitly use RETIRE_CONNECTION_ID makes it not as simple as it might be), but rather due to the requirement to recognize the associated stateless reset tokens. Is that correct? If so, perhaps it could be mentioned as a factor, here. Section 5.2.2 Servers SHOULD respond with a Version Negotiation packet, provided that the datagram is sufficiently long. This SHOULD seems redundant with the first sentence of the section? Packets with a supported version, or no version field, are matched to a connection using the connection ID or - for packets with zero- length connection IDs - the local address and port. These packets are processed using the selected connection; otherwise, the server continues below. (side note) Packets with a short header do not indicate the connection ID length (or version). I think we admit the possibility that a server could advertise a zero-length connection ID to some but not all clients; does that imply that a lookup by address/port in the local connection database would be needed to determine whether such a short-header packet would have a zero-length connection ID or not? Section 6.2 A client MUST discard any Version Negotiation packet if it has received and successfully processed any other packet, including an earlier Version Negotiation packet. [...] It seems like this requirement might be too strict, in that it could prevent otherwise successful communication when a limited on-path attacker injects a Version Negotiation packet. Section 7.2 When an Initial packet is sent by a client that has not previously received an Initial or Retry packet from the server, the client populates the Destination Connection ID field with an unpredictable value. This Destination Connection ID MUST be at least 8 bytes in length. [...] My usual policy on seeing a random nonce shorter than 128 bits is to ask for explanation of why the shorter value is safe for the use it is being put to. (How bad would it be to bump this to 16?) Once a client has received a valid Initial packet from the server, it MUST discard any subsequent packet it receives with a different Source Connection ID. This seems over-zealous as written, since it would seem to prevent the server from ever using a new Source Connection ID on that connection, even in response to a client migration event. Is it intended to be scoped to just Initial (and Handshake?) packets as is done for the server in the following paragraph? Section 7.4 Application protocols can recommend values for transport parameters, such as the initial flow control limits. However, application protocols that set constraints on values for transport parameters could make it impossible for a client to offer multiple application protocols if these constraints conflict. Should we go a step further and forbid application protocols from making such requirements? Section 7.5 Once the handshake completes, if an endpoint is unable to buffer all data in a CRYPTO frame, it MAY discard that CRYPTO frame and all CRYPTO frames received in the future, or it MAY close the connection with a CRYPTO_BUFFER_EXCEEDED error code. [...] How future-proof is this? NewSessionTicket is "safe" to discard in that doing so has no negative consequences for the current connection (it may indicate that previously received tickets have been invalidated, but that is also fairly benign and can happen for other reasons), but in general post-handshake CRYPTO frames are not constrained in what TLS messages they can carry, including any TLS messages defined in the future. I am not sure that we can effectively require all future TLS post-handshake messages to be safe to discard. Section 8.1.3 Clients that want to break continuity of identity with a server MAY discard tokens provided using the NEW_TOKEN frame. [...] This seems more like a SHOULD than a MAY, given the precondition. A server SHOULD encode tokens provided with NEW_TOKEN frames and Retry packets differently, and validate the latter more strictly. [...] Didn't we already have a MUST-level requirement for being able to tell them apart, in §8.1.1? Section 8.1.4 An address validation token MUST be difficult to guess. Including a large enough random value in the token would be sufficient, but this depends on the server remembering the value it sends to clients. (How large is "large enough?") Section 8.2 Path validation is used by both peers during connection migration (see Section 9) to verify reachability after a change of address. In path validation, endpoints test reachability between a specific local address and a specific peer address, where an address is the two- tuple of IP address and port. In light of the toplevel definition of "address" in this document, the last clause seems unnecessary. Section 9 The design of QUIC relies on endpoints retaining a stable address for the duration of the handshake. [...] Why do we allow PATH_CHALLENGE (and the impossible PATH_RESPONSE) to be sent in 0-RTT frames, then? (This might dupe a comment later...) Section 9.5 Similarly, an endpoint MUST NOT reuse a connection ID when sending to more than one destination address. Due to network changes outside the control of its peer, an endpoint might receive packets from a new source address with the same destination connection ID, in which case it MAY continue to use the current connection ID with the new remote address while still sending from the same local address. The MAY seems like it may be in conflict with the MUST. Section 10.2.1 An endpoint's selected connection ID and the QUIC version are sufficient information to identify packets for a closing connection; the endpoint MAY discard all other connection state. [...] Are the packet protection keys for outgoing traffic (or the literal CONNECTION_CLOSE packet contents) not considered part of "connection state"? Section 10.3 Just to check my understanding: it is "safe" for a server to decide that it will never use stateless reset and send random values in the stateless_reset_token fields that it immediately forgets? (It would then not be able to ever send stateless reset, of course, but AFAICT that is the only consequence. The client still has to track the relevant state, and the server has to track per-connection-ID-sequence-number state.) I don't know if that's worth mentioning, since the field is otherwise essentially mandatory when server connection IDs are in use. An endpoint that sends a stateless reset in response to a packet that is 43 bytes or shorter SHOULD send a stateless reset that is one byte shorter than the packet it responds to. (side note) We haven't gotten to the anti-looping stuff in §10.3.3 yet, so the "one byte shorter" is a bit out of the blue until we get there. Section 10.3.2 I'm not sure whether it should be here or earlier, but somewhere we should motivate this construction as being something that can be generated statelessly, e.g., by a server restarting after a crash, based only on attributes (e.g., Connection ID) of a received packet from a previous connection. The value is computed in advance and given to the peer so that the peer will know what to expect if the stateless reset functionality needs to be used, and then generated at runtime by an endpoint that has lost state. The scheme described in the first paragraph requires state, and thus is hard to justify describing as a stateless reset... A single static key can be used across all connections to the same endpoint by generating the proof using a second iteration of a preimage-resistant function that takes a static key and the connection ID chosen by the endpoint (see Section 5.1) as input. [...] The phrase "second iteration" doesn't seem to be explained at all. This design relies on the peer always sending a connection ID in its packets so that the endpoint can use the connection ID from a packet to reset the connection. An endpoint that uses this design MUST either use the same connection ID length for all connections or encode the length of the connection ID such that it can be recovered without state. In addition, it cannot provide a zero-length connection ID. (There is a corollary that the length of the connection ID has to be enough so that multiple connection IDs can be issued to all potential peers cumulatively over the lifetime of the static key, though perhaps it's sufficiently obvious so as to go without saying.) Section 12.4 A forward-reference to §19.21 for how extension frames can be used might be useful. Why are PATH_CHALLENGE/PATH_RESPONSE allowed in the 0-RTT packet number space? (Hmm, the prose and table may be inconsistent here, too. I touch some things in my PR but I think not all of them.) The server isn't allowed to send at that level, so it seems that even if a client did sent PATH_CHALLENGE in 0-RTT, the PATH_RESPONSE would have to come back in 1-RTT. (And with no challenge to reply to, the client can't very well send a PATH_RESPONSE in 0-RTT.) Also, IIRC, we state elsewhere that we require a stable path for the duration of the handshake. Even preferred-address probing can't occur until the client has the server's transport parameters, which aren't usable until 1-RTT keys are available (even if they may technically be available prior to then). Section 13.2.3 ACK frames SHOULD always acknowledge the most recently received packets, and the more out-of-order the packets are, the more important it is to send an updated ACK frame quickly, [...] Do we have a strict definition of "out of order" that we adhere to? I didn't see one in [QUIC-RECOVERY] on a quick search, but haven't read it in depth yet. (TCP RACK used roughly "X is out of order when X has lower sequence number than Y and X is received after Y is received", which is what I'll use as a mental model until I read otherwise.) Section 13.2.5 When the measured acknowledgement delay is larger than its max_ack_delay, an endpoint SHOULD report the measured delay. This information is especially useful during the handshake when delays might be large; see Section 13.2.1. I'm not sure I understand when this delay would not be reported even in the absence of this guidance. Is the idea that you should specifically send an ACK with Largest Acknowledged corresponding to the highly delayed packet, even if you could send an ACK with a larger Largest Acknowledged (but lower delay)? Section 17 (side note) I don't know that it would be of particular benefit to have it explained in the document, but I do find myself curious why there is a Fixed Bit. Section 17.2.5 Should we give some guidance on the (minimum) length of the Retry Token (possibly in a subsection)? I am not sure that the "MUST be non-zero-length" from §17.2.5.2 is the only guidance we want to give... Section 17.2.5.1 A server MAY send Retry packets in response to Initial and 0-RTT packets. A server can either discard or buffer 0-RTT packets that it receives. A server can send multiple Retry packets as it receives Initial or 0-RTT packets. A server MUST NOT send more than one Retry packet in response to a single UDP datagram. When the server sends multiple Retry packets in such a scenario, do the Source Connection ID fields need to be different (or the same) across all of them? (Or does it not matter?) Section 17.2.5.2 Token field to the token provided in the Retry. The client MUST NOT change the Source Connection ID because the server could include the connection ID as part of its token validation logic; see Section 8.1.4. It looks like Section 8.1.4 doesn't actually talk specifically about connection IDs being bound to the token, just "additional information about clients". Perhaps it should? Section 19.16 Retiring a connection ID invalidates the stateless reset token associated with that connection ID. Is it the sending or the ACKing of the frame that effectuates the invalidation? Section 19.17 As above, please confirm that the 64-bit nonce is wide enough to support the purpose for which it is being used. Section 19.19 Reason Phrase: A human-readable explanation for why the connection was closed. This can be zero length if the sender chooses not to give details beyond the Error Code. This SHOULD be a UTF-8 encoded string [RFC3629]. Does "human-readable" engage BCP 18 and the SHOULD-level requirement for carrying information about language? Section 20.1 TRANSPORT_PARAMETER_ERROR (0x8): An endpoint received transport parameters that were badly formatted, included an invalid value, was absent even though it is mandatory, was present though it is forbidden, or is otherwise in error. (nit) maybe some singluar/plural mismatch here, though the RFC Editor would probably have a better suggested fix than I could come up with. Section 21 Should we reiterate somewhere that 1-RTT data sent by the server prior to handshake completion is being sent to an unauthenticated client? Section 21.1 As was noted by others, the terminology used for the taxonomy of attackers is a bit unusual (i.e., divergent from [SEC-CONS]) and surprising. It does seem to be fairly clearly specified, though, so is probably not worth trying to change at this point. Section 21.1.1 [Just noting here for visibility that my editorial PR adds some text here to emphasize that the peer authentication provided by the TLS handshake is a vital property as well. Clients that do not check TLS server certificates against the identity of the endpoint they're attempting to contact are not guaranteed secure operation.] Section 21.1.3.1 * Split and merge datagrams along packet boundaries Those packet boundaries are obfuscated by header protection, though, right? Section 21.1.3.3 3. A limited on-path attacker can cause an idle connection to be deemed lost if the server is the first to resume activity. Pedantically, if the client resumes activity after the server does, but within, say, 2*PTO, I don't think the connection would actually be lost. But perhaps I am missing something. Section 21.2 The Source and Destination Connection ID fields are the primary means of protection against off-path attack during the handshake. These Presumably we should then give some guidance on how to generate the Connection IDs so that they can fulfill this role effectively. Though it is likely to change significantly as a result of IETF Review, draft-gont-numeric-ids-sec-considerations may have some useful advice here. In particular, it seems like they actually do need to be unpredictable in order to fulfill this role, and very short Connection IDs will not be very effective either. Section 21.3 Servers SHOULD provide mitigations for this attack by limiting the usage and lifetime of address validation tokens; see Section 8.1.3. I recognize that it is probably an advanced implementation feature to do so, but implementations can also mitigate by detecting elevated rates across all connections or some identifiable cateogry of connections, of sending responses to 0-RTT data that are declared lost, treating that as indication of an ongoing attack, and racheting down the congestion window it uses for such data for the duration of the attack. This in theory could improve the stability of the Internet as a whole, which would be the motivation for mentioning it in this document as opposed to leaving it at the discretion of implementors. Section 21.5 For example, cross-site request forgery [CSRF] exploits on the Web cause a client to issue requests that include authorization cookies [COOKIE], allowing one site access to information and actions that are intended to be restricted to a different site. Server-side request forgery is also a powerful example of this type of attack. Though it's not really clear that we need more than one example here, so "no change" is okay with me. Section 21.5.2 Clients however are not obligated to use the NEW_TOKEN frame. Request forgery attacks that rely on the Token field can be avoided if clients send an empty Token field when the server address has changed from when the NEW_TOKEN frame was received. Clients could avoid using NEW_TOKEN if the server address changes. However, not including a Token field could adversely affect performance. Servers could rely on NEW_TOKEN to enable sending of data in excess of the three times limit on sending data; see Section 8.1. In particular, this affects cases where clients use 0-RTT to request data from servers. This seems to be setting us up to specifically recommend trading away security in favor of performance. Is that the right tradeoff? (Also, in the vein of my earlier comment, advanced implementation strategies involving detecting the presence of an attack might provide global benefit to the Internet, though the case is a bit weaker here than it was in the other situation.) Section 21.10 An on-the-side attacker can duplicate and send packets with modified This is the only instance of "on-the-side" (or "on the side") in the document; I suggest rephrasing to conform to the prevailing terminology. Section 21.11 This would be a good place (or §10.3) to discuss the procedures for rotating the static key used by stateless reset, per BCP 107. Section 21.13 Ideally, routing decisions are made independently of client-selected values; a Source Connection ID can be selected to route later packets to the same server. Is the client's address considered to be "client-selected"? It's not clear to me that ignoring IP address locality and routing efficiency is going to be the ideal behavior. Section 22.1.1 Are there going to be PII concerns with requiring contact information in the (public) registry? Section 22.1.2 Applications to register codepoints in QUIC registries MAY include a codepoint as part of the registration. IANA MUST allocate the selected codepoint if the codepoint is unassigned and the requirements of the registration policy are met. Is this intended to preclude the experts from requiring an allocation to be made from a range with a different-length encoding? Section 22.1.4 Should we require an expectation that the publicly available specification is going to be stable and/or continue to be accessible, as a condition of a permanent registration? Section 23.1 I think RFC 4086 is typically listed only as an Informative reference. It is amusing that IPv4 is a normative reference but IPv6 only informative. (Given how they are referenced, I don't fault this categorization.) Section 23.2 Should [QUIC-INVARIANTS] also be required reading?
- Benjamin Kaduk's Discuss on draft-ietf-quic-trans… Benjamin Kaduk via Datatracker
- Re: Benjamin Kaduk's Discuss on draft-ietf-quic-t… Lucas Pardue
- Re: Benjamin Kaduk's Discuss on draft-ietf-quic-t… Martin Thomson
- Re: Benjamin Kaduk's Discuss on draft-ietf-quic-t… Benjamin Kaduk
- Re: Benjamin Kaduk's Discuss on draft-ietf-quic-t… Martin Thomson
- Re: Benjamin Kaduk's Discuss on draft-ietf-quic-t… Jana Iyengar
- Re: Benjamin Kaduk's Discuss on draft-ietf-quic-t… Mikkel Fahnøe Jørgensen
- RE: Benjamin Kaduk's Discuss on draft-ietf-quic-t… Mike Bishop
- Re: Benjamin Kaduk's Discuss on draft-ietf-quic-t… Eric Rescorla
- Re: Benjamin Kaduk's Discuss on draft-ietf-quic-t… Benjamin Kaduk
- Re: Benjamin Kaduk's Discuss on draft-ietf-quic-t… Eric Rescorla
- Re: Benjamin Kaduk's Discuss on draft-ietf-quic-t… Spencer Dawkins at IETF
- Re: Benjamin Kaduk's Discuss on draft-ietf-quic-t… Martin Duke