[tsvwg] Benjamin Kaduk's No Objection on draft-ietf-tsvwg-datagram-plpmtud-19: (with COMMENT)

Benjamin Kaduk via Datatracker <noreply@ietf.org> Wed, 08 April 2020 21:26 UTC

Return-Path: <noreply@ietf.org>
X-Original-To: tsvwg@ietf.org
Delivered-To: tsvwg@ietfa.amsl.com
Received: from ietfa.amsl.com (localhost [IPv6:::1]) by ietfa.amsl.com (Postfix) with ESMTP id A72363A1A02; Wed, 8 Apr 2020 14:26:00 -0700 (PDT)
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
From: Benjamin Kaduk via Datatracker <noreply@ietf.org>
To: "The IESG" <iesg@ietf.org>
Cc: draft-ietf-tsvwg-datagram-plpmtud@ietf.org, tsvwg-chairs@ietf.org, tsvwg@ietf.org, Wesley Eddy <wes@mti-systems.com>, wes@mti-systems.com
X-Test-IDTracker: no
X-IETF-IDTracker: 6.125.0
Auto-Submitted: auto-generated
Precedence: bulk
Reply-To: Benjamin Kaduk <kaduk@mit.edu>
Message-ID: <158638116064.15219.7033842912815959942@ietfa.amsl.com>
Date: Wed, 08 Apr 2020 14:26:00 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/b8rFbU7jWITSyQtRjva5WY5pW_I>
Subject: [tsvwg] Benjamin Kaduk's No Objection on draft-ietf-tsvwg-datagram-plpmtud-19: (with COMMENT)
X-BeenThere: tsvwg@ietf.org
X-Mailman-Version: 2.1.29
List-Id: Transport Area Working Group <tsvwg.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tsvwg>, <mailto:tsvwg-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tsvwg/>
List-Post: <mailto:tsvwg@ietf.org>
List-Help: <mailto:tsvwg-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tsvwg>, <mailto:tsvwg-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 08 Apr 2020 21:26:01 -0000

Benjamin Kaduk has entered the following ballot position for
draft-ietf-tsvwg-datagram-plpmtud-19: No Objection

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)

Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about IESG DISCUSS and COMMENT positions.

The document, along with other ballot positions, can be found here:


[I'm reviewing the native HTML, so line breaks in quoted text are
locally generated]

We could perhaps benefit from more clarity in several places about
whether a given size is measured at the IP layer or the PL (some
locations mentioned in the per-section comments).

Should RFC 8085 be cited as BCP 145?

It would be nice to have a bit more detail about how to effectuate
things like robustness to inconsistent paths, but I don't think the lack
of such concrete guidance should hold up publication of this document.

I mention in the per-section comments a few instances of apparent
internal inconsistency; they don't quite rise to DISCUSS-level, but
please take a look.


    This document updates RFC 4821 to specify the method for datagram PLs,
    and updates RFC 8085 as the method to use in place of RFC 4821 with UDP
    datagrams. [...]

nit(?): the wording here is a bit odd; I think we want to say that it
"updates RFC 8085 to refer to the method specified in this document
instead of the method in RFC 4821 for use with UDP datagrams".

Section 1.1

    Classical PMTUD is subject to protocol failures. One failure arises when
    traffic using a packet size larger than the actual PMTU is black-holed
    (all datagrams sent with this size, or larger, are discarded).

nit: if "this size" is "the actual PMTU", shouldn't this just be
"strictly larger than this size", since packets of exactly that size
will pass?

    When the router issuing the ICMP message drops a tunneled packet, the
    resulting ICMP message will be directed to the tunnel ingress. This
    tunnel endpoint is responsible for forwarding the ICMP message and also
    processing the quoted packet within the payload field to remove the
    effect of the tunnel, and return a correctly formatted ICMP message to
    the sender [I-D.ietf-intarea-tunnels]. Failure to do this prevents the
    PTB message reaching the original sender.

Is it supposed to be implie that this is a little bit complicated and
many existing tunnels don't do it properly?

    In this case, when a packet sent by the server encounters a problem
    after the ECMP router, then any resulting ICMP message also needs to be
    directed by the ECMP router towards the original sender.

Similarly, is it implicit that it's complicated for the ECMP router to
do this, it's not currently implemented well, etc.?

    When an ICMP message is generated by a router in a network segment that
    has inserted a header into a packet, the quoted packet could contain
    additional protocol header information that was not included in the
    original sent packet, and which the PL sender does not process or may
    not know how to process. This could disrupt the ability of the sender to
    validate this PTB message.

Should/can the entity that added the extra header be inspecting ICMP to
remove the header from the quoted packet in the reverse direction?  (We
could perhaps try to normalize the language between this point and the
following on on NAT behavior.)

    Section 10.2 of [RFC4821] recommended a PLPMTUD probing method for
    the Stream Control Transport Protocol (SCTP). SCTP utilizes probe
    packets consisting of a minimal sized HEARTBEAT chunk bundled with a
    PAD chunk as defined in [RFC4820]. However, RFC 4821 did not provide
    a complete specification. The present document replaces this by
    providing a complete specification.

nit(?): s/replaces this/replaces that description/?

Section 2

    Packetization Layer (PL):
    The PL is a layer of the network stack that places data into packets
    and performs transport protocol functions. Examples of a PL include:
    TCP, SCTP, SCTP over DTLS or QUIC.

nit: Serial comma, please!

    The PTB_SIZE is a value reported in a validated PTB message that
    indicates next hop link MTU of a router along the path

nit: missing article ("the next hop link MTU"?)

Section 3

    A PTB message MUST NOT be used to increase the PLPMTU [RFC8201], but
    could trigger a probe to test for a larger PLPMTU. A PL_PTB_SIZE
    that is greater than that currently probed MUST be ignored. A valid
    PTB_SIZE is converted to a PL_PTB_SIZE before it is to be used in
    the DPLPMTUD state machine.

Perhaps these last two sentences should be swapped, so we talk about how
the PL_PTB_SIZE is computed before we talk about ignoring invalid

    The decision about when to send a probe packet does not need to be
    limited by the congestion controller.

Does the congestion controller need to account for the size/bandwidth of
the probe packets in its pacing mechanism, though?

    Path validation: It is RECOMMENDED that methods are robust to path
    changes that could have occurred since the path characteristics were
    last confirmed, and to the possibility of inconsistent path
    information being received.

What would such robustness look like?  (Why is it not REQUIRED?)

Section 4.1

    A PL that uses a probe packet carrying application data and needs
    protection from the loss of this probe packet could perform
    transport-layer retransmission/repair of the data block (e.g., by
    retransmission after loss is detected or by duplicating the data
    block in a datagram without the padding data). This retransmitted
    data block might possibly need to be sent using a smaller PLPMTU,
    which could need the PL to to use a smaller packet size to traverse
    the end-to-end path. (This could utilize endpoint network-layer or a
    PL that can re-segment the data block into multiple datagrams).

nit: check the grammar around "utilize endpoint network-layer"?

Section 4.2

    Transport protocols can include end-to-end methods that detect and
    report reception of specific datagrams that they send (e.g., DCCP
    and SCTP provide keep-alive/heartbeat features).

Does a packet-level ACK mechanism (e.g., like QUIC but not TCP) suffice?

Section 4.3

    When the method detects the current PLPMTU is not supported,
    DPLPMTUD sets a lower PLPMTU, and sets a lower MPS. The PL then
    confirms that the new PLPMTU can be successfully used across the
    path. A probe packet could need to have a size less than the size of
    the data block generated by the application.

Why is "less than half the size" noteworthy?

Section 4.4

    Operational experience reveals that IP fragmentation can reduce the
    reliability of Internet communication
    [I-D.ietf-intarea-frag-fragile], which may reduce the success of

nit: if "success" is a binary yes/no, then this could be "reduce the
probability of success of the retransmission".  (I add "the" in "the
retransmission" as well, since we refer to the specific retransmission
after the initial loss and MPS change, not retransmission in general.)

Section 4.6.1

    A PL that receives a PTB message from a router or middlebox performs
    ICMP validation as specified in Section 5.2 of [RFC8085][RFC8201].

If we're going to have a section number, we need only one RFC reference!

    PTB messages that have been validated MAY be utilized by the
    DPLPMTUD algorithm, but MUST NOT be used directly to set the PLPMTU.
    The PL_PTB_SIZE is smaller than the PTB_SIZE because it is reduced
    by headers below the PL including any IP options or extensions added
    to the PL packet.

nit: this transition is pretty abrupt.

Section 4.6.2

    Before using the size reported in the PTB message it must first be
    converted to a PL_PTB_SIZE. A set of checks are intended to provide
    protection from a router that reports an unexpected PTB_SIZE. The PL
    also needs to check that the indicated PL_PTB_SIZE is less than the
    size used by probe packets and at least the minimum size accepted.

How do these "checks" relate to the "validation" that was discussed in
the previous section?

Section 5

    DPLPMTUD SHOULD NOT be used by an upper PL or application if it is
    already used in a lower layer DPLPMTUD SHOULD only be performed once
    between a pair of endpoints.

This seems to imply an expectation of a communications channel between
PLs on the same endpoint.  I don't see anything in this document
discussing such a channel; is there anything (more) that's useful that
we can say?

Section 5.1.1

    DPLPMTUD MAY inhibit sending probe packets when no application data
    has been sent since the previous probe packet. A PL preferring to
    use an up-to-date PMTU once user data is sent again, can choose to
    continue PMTU discovery for each path. However, this could result in
    sending additional packets.

How is this "could" and not "will"/"would"?  (Twice)

    The various timers could be implemented using a single timer

Is this an incomplete thought?  (There's no full stop, and it doesn't
give much clarity for me.)

Section 5.1.2

    The MIN_PLPMTU is the smallest allowed probe packet size. For IPv6,
    this value is 1280 bytes, as specified in [RFC8200]. For IPv4, the
    minimum value is 68 bytes.

This feels like slightly curious naming, as the "PL" infix suggests that
it is the size usable at the packetization layer, but the quoted
constants include the IP headers.

    The BASE_PLPMTU is a configured size expected to work for most
    paths. The size is equal to or larger than the MIN_PLPMTU and
    smaller than the MAX_PLPMTU. In the case of IPv6, this value is
    derived from the IPv6 minimum link MTU of 1280 bytes

"Derived from" but using what derivation mechanism and/or arriving at
what value?

Section 5.1.3

    The PROBED_SIZE is the size of the current probe packet. This is a
    tentative value for the PLPMTU, which is awaiting confirmation by an

Size at which layer?

Section 5.1.4

    The Base Phase confirms connectivity to the remote peer using
    packets of the BASE_PLPMTU. This phase is implicit for a
    connection-oriented PL (where it can be performed in a PL connection

This seems slightly internally inconsistent, in that it seems to say
that BASE_PLPMTU size packets are always used, but the implicit
connection-handshake method seems unlikely to actually do so.
(Also, if we always did so, then the following paragraph about
confirming BASE_PLPMTU support would be redundant.)

    Search Complete:
    The Search Complete Phase is entered when the PLPMTU is supported
    across the network path

It seems like this would be trivially true for (e.g.) the initial PLPMTU
set to BASE_PLPMTU on the Base->Search transition, yet I don't expect we
terminate Search immediately in that case!  Presumably this is intended
to say also that the PLPMTU corresponds to the actual PMTU as well?

Section 5.2

nit(?): the diagram shows PTB: PLPTB_SIZE < BASE_PLPMTU as cause for a
transition from BASE to ERROR, though of course this would only occur
after the PTB is validated.  Perhaps that doesn't need to be in the
figure, though.

Section 5.3.1

    The PROBE_COUNT is initialized to zero when the first probe with a
    size greater than or equal to PLPMTUD is sent. A timer is used to
    trigger the sending of probe packets of size PROBED_SIZE, larger
    than the PLPMTU.

This seems inconsistent about whether we use PROBE_COUNT when probing
with the current PLPMTU size -- we initialize the count to zero when we
first send one, but the timer only triggers sending of larger packets.

Section 5.3.2

I confess I was expecting a more detailed specification of how to pick
probe sizes, here.

Section 6.1

    To use common method for managing the PLPMTU has benefits, both in
    the ability to share state between different processes and
    opportunities to coordinate probing.

Nit: singular/plural mismatch "common"/"method"

Section 6.2

    Section 10.2 of [RFC4821] specified a recommended PLPMTUD probing
    method for SCTP and Section 7.3 of [RFC4960] and recommended an
    endpoint apply the techniques in RFC4821 on a
    per-destination-address basis.

nit: spurious "and" after "[RFC4960]".

    Section 6.9 of [RFC4960] describes dividing the user messages into
    data chunks sent by the PL when using SCTP. This notes that once an
    SCTP message has been sent, it cannot be re-segmented. [RFC4960]
    describes the method to retransmit data chunks when the MPS has
    reduced, and the use of IP fragmentation for this case.

Should we say something about "such behavior is unchanged by this


    The HEARTBEAT chunk carries a Heartbeat Information parameter which
    includes, besides the information suggested in [RFC4960], the probe
    size, which is the size of the complete datagram.

IIUC the Heartbeat Information layout is entirely at the sender's
discretion, so the implementation will have to pick a way to convey the
probe size in it; should we be more explicit about this?

    Probing starts directly after the PL handshake, before data is sent.
    Assuming this behavior (i.e., the PMTU is smaller than or equal to
    the interface MTU), this process will take several round trip time
    periods, dependent on the number of DPLPMTUD probes sent. The
    Heartbeat timer can be used to implement the PROBE_TIMER.

(But sending data doesn't have to wait for DPLPMTUD to complete, right?)


    Packet probing can be performed as specified in Section The
    maximum payload is reduced by 8 bytes, which has to be considered
    when filling the PAD chunk.

(8 bytes of UDP header?)


(Interesting that we don't have a note about payload reduction to be
considered when filling the PAD chunk, as we did in Section


    [RFC4960] does not specify a way to validate SCTP/DTLS ICMP message

("and neither does this document"?)

Section 6.3.2

    QUIC provides an acknowledged PL, a sender can therefore enter the
    BASE state as soon as connectivity has been confirmed.

[This seems entirely redundant with the content of Section 6.3.1.]

Section 9

    An on-path attacker able to create a PTB message could forge PTB
    messages that include a valid quoted IP packet. Such an attack could
    be used to drive down the PLPMTU.

Are there cases where such an on-path attacker would not also be able to
actually drop traffic larger than the forged PTB PMTU value?

    It is possible that the information about a path is not stable. This
    could be a result of forwarding across more than one path that has a
    different actual PMTU or a single path presents a varying PMTU. The
    design of a PLPMTUD implementation SHOULD consider how to mitigate
    the effects of varying path information. One possible mitigation is
    to provide robustness (see Section 5.4) in the method that avoids
    oscillation in the MPS.

This document specifies some PLPMTUD mechanics; should we say what we do
(and don't) say about this topic for those protocols we do cover?

    A node performing DPLPMTUD could experience conflicting information
    about the size of supported probe packets. This could occur when
    multiple paths are concurrently in use and these exhibit a different
    PMTU. If not considered, this could result in packets not being
    delivered (black holed) when the PLPMTU results in a packet larger
    than the smallest actual PMTU.

It feels like this content has high overlap with the previous paragraph;
is it reasonable to try consolidating them?