[tsvwg] Benjamin Kaduk's No Objection on draft-ietf-tsvwg-datagram-plpmtud-19: (with COMMENT)
Benjamin Kaduk via Datatracker <firstname.lastname@example.org> Wed, 08 April 2020 21:26 UTC
Received: from ietfa.amsl.com (localhost [IPv6:::1]) by ietfa.amsl.com (Postfix) with ESMTP id A72363A1A02; Wed, 8 Apr 2020 14:26:00 -0700 (PDT)
Content-Type: text/plain; charset="utf-8"
From: Benjamin Kaduk via Datatracker <email@example.com>
To: "The IESG" <firstname.lastname@example.org>
Cc: email@example.com, firstname.lastname@example.org, email@example.com, Wesley Eddy <firstname.lastname@example.org>, email@example.com
Reply-To: Benjamin Kaduk <firstname.lastname@example.org>
Date: Wed, 08 Apr 2020 14:26:00 -0700
Subject: [tsvwg] Benjamin Kaduk's No Objection on draft-ietf-tsvwg-datagram-plpmtud-19: (with COMMENT)
List-Id: Transport Area Working Group <tsvwg.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tsvwg>, <mailto:email@example.com?subject=unsubscribe>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tsvwg>, <mailto:firstname.lastname@example.org?subject=subscribe>
X-List-Received-Date: Wed, 08 Apr 2020 21:26:01 -0000
Benjamin Kaduk has entered the following ballot position for draft-ietf-tsvwg-datagram-plpmtud-19: No Objection When responding, please keep the subject line intact and reply to all email addresses included in the To and CC lines. (Feel free to cut this introductory paragraph, however.) Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html for more information about IESG DISCUSS and COMMENT positions. The document, along with other ballot positions, can be found here: https://datatracker.ietf.org/doc/draft-ietf-tsvwg-datagram-plpmtud/ ---------------------------------------------------------------------- COMMENT: ---------------------------------------------------------------------- [I'm reviewing the native HTML, so line breaks in quoted text are locally generated] We could perhaps benefit from more clarity in several places about whether a given size is measured at the IP layer or the PL (some locations mentioned in the per-section comments). Should RFC 8085 be cited as BCP 145? It would be nice to have a bit more detail about how to effectuate things like robustness to inconsistent paths, but I don't think the lack of such concrete guidance should hold up publication of this document. I mention in the per-section comments a few instances of apparent internal inconsistency; they don't quite rise to DISCUSS-level, but please take a look. Abstract This document updates RFC 4821 to specify the method for datagram PLs, and updates RFC 8085 as the method to use in place of RFC 4821 with UDP datagrams. [...] nit(?): the wording here is a bit odd; I think we want to say that it "updates RFC 8085 to refer to the method specified in this document instead of the method in RFC 4821 for use with UDP datagrams". Section 1.1 Classical PMTUD is subject to protocol failures. One failure arises when traffic using a packet size larger than the actual PMTU is black-holed (all datagrams sent with this size, or larger, are discarded). nit: if "this size" is "the actual PMTU", shouldn't this just be "strictly larger than this size", since packets of exactly that size will pass? When the router issuing the ICMP message drops a tunneled packet, the resulting ICMP message will be directed to the tunnel ingress. This tunnel endpoint is responsible for forwarding the ICMP message and also processing the quoted packet within the payload field to remove the effect of the tunnel, and return a correctly formatted ICMP message to the sender [I-D.ietf-intarea-tunnels]. Failure to do this prevents the PTB message reaching the original sender. Is it supposed to be implie that this is a little bit complicated and many existing tunnels don't do it properly? In this case, when a packet sent by the server encounters a problem after the ECMP router, then any resulting ICMP message also needs to be directed by the ECMP router towards the original sender. Similarly, is it implicit that it's complicated for the ECMP router to do this, it's not currently implemented well, etc.? When an ICMP message is generated by a router in a network segment that has inserted a header into a packet, the quoted packet could contain additional protocol header information that was not included in the original sent packet, and which the PL sender does not process or may not know how to process. This could disrupt the ability of the sender to validate this PTB message. Should/can the entity that added the extra header be inspecting ICMP to remove the header from the quoted packet in the reverse direction? (We could perhaps try to normalize the language between this point and the following on on NAT behavior.) Section 10.2 of [RFC4821] recommended a PLPMTUD probing method for the Stream Control Transport Protocol (SCTP). SCTP utilizes probe packets consisting of a minimal sized HEARTBEAT chunk bundled with a PAD chunk as defined in [RFC4820]. However, RFC 4821 did not provide a complete specification. The present document replaces this by providing a complete specification. nit(?): s/replaces this/replaces that description/? Section 2 Packetization Layer (PL): The PL is a layer of the network stack that places data into packets and performs transport protocol functions. Examples of a PL include: TCP, SCTP, SCTP over DTLS or QUIC. nit: Serial comma, please! PTB_SIZE: The PTB_SIZE is a value reported in a validated PTB message that indicates next hop link MTU of a router along the path nit: missing article ("the next hop link MTU"?) Section 3 A PTB message MUST NOT be used to increase the PLPMTU [RFC8201], but could trigger a probe to test for a larger PLPMTU. A PL_PTB_SIZE that is greater than that currently probed MUST be ignored. A valid PTB_SIZE is converted to a PL_PTB_SIZE before it is to be used in the DPLPMTUD state machine. Perhaps these last two sentences should be swapped, so we talk about how the PL_PTB_SIZE is computed before we talk about ignoring invalid values? The decision about when to send a probe packet does not need to be limited by the congestion controller. Does the congestion controller need to account for the size/bandwidth of the probe packets in its pacing mechanism, though? Path validation: It is RECOMMENDED that methods are robust to path changes that could have occurred since the path characteristics were last confirmed, and to the possibility of inconsistent path information being received. What would such robustness look like? (Why is it not REQUIRED?) Section 4.1 A PL that uses a probe packet carrying application data and needs protection from the loss of this probe packet could perform transport-layer retransmission/repair of the data block (e.g., by retransmission after loss is detected or by duplicating the data block in a datagram without the padding data). This retransmitted data block might possibly need to be sent using a smaller PLPMTU, which could need the PL to to use a smaller packet size to traverse the end-to-end path. (This could utilize endpoint network-layer or a PL that can re-segment the data block into multiple datagrams). nit: check the grammar around "utilize endpoint network-layer"? Section 4.2 Transport protocols can include end-to-end methods that detect and report reception of specific datagrams that they send (e.g., DCCP and SCTP provide keep-alive/heartbeat features). Does a packet-level ACK mechanism (e.g., like QUIC but not TCP) suffice? Section 4.3 When the method detects the current PLPMTU is not supported, DPLPMTUD sets a lower PLPMTU, and sets a lower MPS. The PL then confirms that the new PLPMTU can be successfully used across the path. A probe packet could need to have a size less than the size of the data block generated by the application. Why is "less than half the size" noteworthy? Section 4.4 Operational experience reveals that IP fragmentation can reduce the reliability of Internet communication [I-D.ietf-intarea-frag-fragile], which may reduce the success of retransmission. nit: if "success" is a binary yes/no, then this could be "reduce the probability of success of the retransmission". (I add "the" in "the retransmission" as well, since we refer to the specific retransmission after the initial loss and MPS change, not retransmission in general.) Section 4.6.1 A PL that receives a PTB message from a router or middlebox performs ICMP validation as specified in Section 5.2 of [RFC8085][RFC8201]. If we're going to have a section number, we need only one RFC reference! PTB messages that have been validated MAY be utilized by the DPLPMTUD algorithm, but MUST NOT be used directly to set the PLPMTU. The PL_PTB_SIZE is smaller than the PTB_SIZE because it is reduced by headers below the PL including any IP options or extensions added to the PL packet. nit: this transition is pretty abrupt. Section 4.6.2 Before using the size reported in the PTB message it must first be converted to a PL_PTB_SIZE. A set of checks are intended to provide protection from a router that reports an unexpected PTB_SIZE. The PL also needs to check that the indicated PL_PTB_SIZE is less than the size used by probe packets and at least the minimum size accepted. How do these "checks" relate to the "validation" that was discussed in the previous section? Section 5 DPLPMTUD SHOULD NOT be used by an upper PL or application if it is already used in a lower layer DPLPMTUD SHOULD only be performed once between a pair of endpoints. This seems to imply an expectation of a communications channel between PLs on the same endpoint. I don't see anything in this document discussing such a channel; is there anything (more) that's useful that we can say? Section 5.1.1 DPLPMTUD MAY inhibit sending probe packets when no application data has been sent since the previous probe packet. A PL preferring to use an up-to-date PMTU once user data is sent again, can choose to continue PMTU discovery for each path. However, this could result in sending additional packets. How is this "could" and not "will"/"would"? (Twice) The various timers could be implemented using a single timer Is this an incomplete thought? (There's no full stop, and it doesn't give much clarity for me.) Section 5.1.2 MIN_PLPMTU: The MIN_PLPMTU is the smallest allowed probe packet size. For IPv6, this value is 1280 bytes, as specified in [RFC8200]. For IPv4, the minimum value is 68 bytes. This feels like slightly curious naming, as the "PL" infix suggests that it is the size usable at the packetization layer, but the quoted constants include the IP headers. BASE_PLPMTU: The BASE_PLPMTU is a configured size expected to work for most paths. The size is equal to or larger than the MIN_PLPMTU and smaller than the MAX_PLPMTU. In the case of IPv6, this value is derived from the IPv6 minimum link MTU of 1280 bytes "Derived from" but using what derivation mechanism and/or arriving at what value? Section 5.1.3 PROBED_SIZE: The PROBED_SIZE is the size of the current probe packet. This is a tentative value for the PLPMTU, which is awaiting confirmation by an acknowledgment. Size at which layer? Section 5.1.4 The Base Phase confirms connectivity to the remote peer using packets of the BASE_PLPMTU. This phase is implicit for a connection-oriented PL (where it can be performed in a PL connection handshake). This seems slightly internally inconsistent, in that it seems to say that BASE_PLPMTU size packets are always used, but the implicit connection-handshake method seems unlikely to actually do so. (Also, if we always did so, then the following paragraph about confirming BASE_PLPMTU support would be redundant.) Search Complete: The Search Complete Phase is entered when the PLPMTU is supported across the network path It seems like this would be trivially true for (e.g.) the initial PLPMTU set to BASE_PLPMTU on the Base->Search transition, yet I don't expect we terminate Search immediately in that case! Presumably this is intended to say also that the PLPMTU corresponds to the actual PMTU as well? Section 5.2 nit(?): the diagram shows PTB: PLPTB_SIZE < BASE_PLPMTU as cause for a transition from BASE to ERROR, though of course this would only occur after the PTB is validated. Perhaps that doesn't need to be in the figure, though. Section 5.3.1 The PROBE_COUNT is initialized to zero when the first probe with a size greater than or equal to PLPMTUD is sent. A timer is used to trigger the sending of probe packets of size PROBED_SIZE, larger than the PLPMTU. This seems inconsistent about whether we use PROBE_COUNT when probing with the current PLPMTU size -- we initialize the count to zero when we first send one, but the timer only triggers sending of larger packets. Section 5.3.2 I confess I was expecting a more detailed specification of how to pick probe sizes, here. Section 6.1 To use common method for managing the PLPMTU has benefits, both in the ability to share state between different processes and opportunities to coordinate probing. Nit: singular/plural mismatch "common"/"method" Section 6.2 Section 10.2 of [RFC4821] specified a recommended PLPMTUD probing method for SCTP and Section 7.3 of [RFC4960] and recommended an endpoint apply the techniques in RFC4821 on a per-destination-address basis. nit: spurious "and" after "[RFC4960]". Section 6.9 of [RFC4960] describes dividing the user messages into data chunks sent by the PL when using SCTP. This notes that once an SCTP message has been sent, it cannot be re-segmented. [RFC4960] describes the method to retransmit data chunks when the MPS has reduced, and the use of IP fragmentation for this case. Should we say something about "such behavior is unchanged by this document"? Section 188.8.131.52 The HEARTBEAT chunk carries a Heartbeat Information parameter which includes, besides the information suggested in [RFC4960], the probe size, which is the size of the complete datagram. IIUC the Heartbeat Information layout is entirely at the sender's discretion, so the implementation will have to pick a way to convey the probe size in it; should we be more explicit about this? Probing starts directly after the PL handshake, before data is sent. Assuming this behavior (i.e., the PMTU is smaller than or equal to the interface MTU), this process will take several round trip time periods, dependent on the number of DPLPMTUD probes sent. The Heartbeat timer can be used to implement the PROBE_TIMER. (But sending data doesn't have to wait for DPLPMTUD to complete, right?) Section 184.108.40.206 Packet probing can be performed as specified in Section 220.127.116.11. The maximum payload is reduced by 8 bytes, which has to be considered when filling the PAD chunk. (8 bytes of UDP header?) Section 18.104.22.168 (Interesting that we don't have a note about payload reduction to be considered when filling the PAD chunk, as we did in Section 22.214.171.124.) Section 126.96.36.199 [RFC4960] does not specify a way to validate SCTP/DTLS ICMP message payload. ("and neither does this document"?) Section 6.3.2 QUIC provides an acknowledged PL, a sender can therefore enter the BASE state as soon as connectivity has been confirmed. [This seems entirely redundant with the content of Section 6.3.1.] Section 9 An on-path attacker able to create a PTB message could forge PTB messages that include a valid quoted IP packet. Such an attack could be used to drive down the PLPMTU. Are there cases where such an on-path attacker would not also be able to actually drop traffic larger than the forged PTB PMTU value? It is possible that the information about a path is not stable. This could be a result of forwarding across more than one path that has a different actual PMTU or a single path presents a varying PMTU. The design of a PLPMTUD implementation SHOULD consider how to mitigate the effects of varying path information. One possible mitigation is to provide robustness (see Section 5.4) in the method that avoids oscillation in the MPS. This document specifies some PLPMTUD mechanics; should we say what we do (and don't) say about this topic for those protocols we do cover? A node performing DPLPMTUD could experience conflicting information about the size of supported probe packets. This could occur when multiple paths are concurrently in use and these exhibit a different PMTU. If not considered, this could result in packets not being delivered (black holed) when the PLPMTU results in a packet larger than the smallest actual PMTU. It feels like this content has high overlap with the previous paragraph; is it reasonable to try consolidating them?
- [tsvwg] Benjamin Kaduk's No Objection on draft-ie… Benjamin Kaduk via Datatracker