< draft-ietf-bier-path-mtu-discovery-11.txt | draft-ietf-bier-path-mtu-discovery-12.txt > | |||
---|---|---|---|---|
BIER Working Group G. Mirsky | BIER Working Group G. Mirsky | |||
Internet-Draft Ericsson | Internet-Draft Ericsson | |||
Intended status: Standards Track T. Przygienda | Intended status: Standards Track T. Przygienda | |||
Expires: 7 April 2022 Juniper Networks | Expires: 12 April 2022 Juniper Networks | |||
A. Dolganow | A. Dolganow | |||
Individual contributor | Individual contributor | |||
4 October 2021 | 9 October 2021 | |||
Path Maximum Transmission Unit Discovery (PMTUD) for Bit Index Explicit | Path Maximum Transmission Unit Discovery (PMTUD) for Bit Index Explicit | |||
Replication (BIER) Layer | Replication (BIER) Layer | |||
draft-ietf-bier-path-mtu-discovery-11 | draft-ietf-bier-path-mtu-discovery-12 | |||
Abstract | Abstract | |||
This document describes Path Maximum Transmission Unit Discovery | This document describes Path Maximum Transmission Unit Discovery | |||
(PMTUD) in the Bit Index Explicit Replication (BIER) layer. | (PMTUD) in the Bit Index Explicit Replication (BIER) layer. | |||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
skipping to change at page 1, line 35 ¶ | skipping to change at page 1, line 35 ¶ | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on 7 April 2022. | This Internet-Draft will expire on 12 April 2022. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2021 IETF Trust and the persons identified as the | Copyright (c) 2021 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents (https://trustee.ietf.org/ | |||
license-info) in effect on the date of publication of this document. | license-info) in effect on the date of publication of this document. | |||
Please review these documents carefully, as they describe your rights | Please review these documents carefully, as they describe your rights | |||
and restrictions with respect to this document. Code Components | and restrictions with respect to this document. Code Components | |||
extracted from this document must include Simplified BSD License text | extracted from this document must include Simplified BSD License text | |||
as described in Section 4.e of the Trust Legal Provisions and are | as described in Section 4.e of the Trust Legal Provisions and are | |||
provided without warranty as described in the Simplified BSD License. | provided without warranty as described in the Simplified BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | |||
1.1. Conventions used in this document . . . . . . . . . . . . 3 | 1.1. Conventions used in this document . . . . . . . . . . . . 2 | |||
1.1.1. Acronyms . . . . . . . . . . . . . . . . . . . . . . 3 | 1.1.1. Terminology . . . . . . . . . . . . . . . . . . . . . 2 | |||
1.1.2. Requirements Language . . . . . . . . . . . . . . . . 3 | 1.1.2. Requirements Language . . . . . . . . . . . . . . . . 3 | |||
2. Problem Statement . . . . . . . . . . . . . . . . . . . . . . 3 | 2. Problem Statement . . . . . . . . . . . . . . . . . . . . . . 3 | |||
3. PMTUD Mechanism for BIER . . . . . . . . . . . . . . . . . . 4 | 3. PMTUD Mechanism for BIER . . . . . . . . . . . . . . . . . . 4 | |||
3.1. Data TLV for BIER Ping . . . . . . . . . . . . . . . . . 6 | 3.1. Data TLV for BIER Ping . . . . . . . . . . . . . . . . . 6 | |||
4. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 7 | 4. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 6 | |||
5. Security Considerations . . . . . . . . . . . . . . . . . . . 7 | 5. Security Considerations . . . . . . . . . . . . . . . . . . . 7 | |||
6. Acknowledgment . . . . . . . . . . . . . . . . . . . . . . . 7 | 6. Acknowledgment . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
7. References . . . . . . . . . . . . . . . . . . . . . . . . . 7 | 7. References . . . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
7.1. Normative References . . . . . . . . . . . . . . . . . . 7 | 7.1. Normative References . . . . . . . . . . . . . . . . . . 7 | |||
7.2. Informative References . . . . . . . . . . . . . . . . . 8 | 7.2. Informative References . . . . . . . . . . . . . . . . . 7 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 8 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 8 | |||
1. Introduction | 1. Introduction | |||
In packet switched networks, when a host seeks to transmit data to a | In packet switched networks, when a host seeks to transmit data to a | |||
target destination, the data is transmitted as a set of packets. In | target destination, the data is transmitted as a set of packets. In | |||
many cases, it is more efficient to use the largest size packets that | many cases, it is more efficient to use the largest size packets that | |||
are less than or equal to the least Maximum Transmission Unit (MTU) | are less than or equal to the least Maximum Transmission Unit (MTU) | |||
for any forwarding device along the routed path to the IP destination | for any forwarding device along the routed path to the IP destination | |||
for these packets. Such "least MTU" is known as Path MTU (PMTU). | for these packets. Such "least MTU" is known as Path MTU (PMTU). | |||
Fragmentation or packet drop, silent or not, may occur on hops along | Fragmentation or packet drop, silent or not, may occur on hops along | |||
the route where an MTU is smaller than the size of the datagram. To | the route where an MTU is smaller than the size of the datagram. To | |||
avoid any of the listed above behaviors, the packet source must find | avoid any of the listed above behaviors, the packet source must find | |||
the value of the least MTU, i.e., PMTU, that will be encountered | the value of the least MTU, i.e., PMTU, that will be encountered | |||
along the route that a set of packets will follow to reach the given | along the route that a set of packets will follow to reach the given | |||
set of destinations. Such MTU determination along a specific path is | set of destinations. Such MTU determination along a specific path is | |||
referred to as path MTU discovery (PMTUD). | referred to as path MTU discovery (PMTUD). | |||
[RFC8279] introduces and explains Bit Index Explicit Replication | [RFC8279] introduces and explains Bit Index Explicit Replication | |||
(BIER) architecture and how it supports the forwarding of multicast | (BIER) architecture and how it supports the forwarding of multicast | |||
data packets. A BIER domain consists of Bit-Forwarding Routers | data packets. [I-D.ietf-bier-ping] introduced BIER Ping as a | |||
(BFRs) that are uniquely identified by their respective BFR-ids. An | transport-independent OAM mechanism to detect and localize failures | |||
ingress border router (acting as a Bit Forwarding Ingress Router | in the BIER data plane. This document specifies how BIER Ping can be | |||
(BFIR)) inserts a Forwarding Bit Mask (F-BM) into a packet. Each | used to perform efficient PMTUD in the BIER domain. | |||
targeted egress node (referred to as a Bit Forwarding Egress Router | ||||
(BFER)) is represented by Bit Mask Position (BMP) in the BMS. A | ||||
transit or intermediate BIER node, referred to as BFR, forwards BIER | ||||
encapsulated packets to BFERs, identified by respective BMPs, | ||||
according to a Bit Index Forwarding Table (BIFT). | ||||
1.1. Conventions used in this document | 1.1. Conventions used in this document | |||
1.1.1. Acronyms | 1.1.1. Terminology | |||
BFR: Bit-Forwarding Router | ||||
BFER: Bit-Forwarding Egress Router | ||||
BFIR: Bit-Forwarding Ingress Router | ||||
BIER: Bit Index Explicit Replication | ||||
BIFT: Bit Index Forwarding Table | ||||
F-BM: Forwarding Bit Mask | ||||
MTU: Maximum Transmission Unit | ||||
OAM: Operations, Administration and Maintenance | ||||
PMTUD: Path MTU Discovery | This document uses terminology defined in [RFC8279]. Familiarity | |||
with this specification and the terminology used is expected. | ||||
1.1.2. Requirements Language | 1.1.2. Requirements Language | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
"OPTIONAL" in this document are to be interpreted as described in BCP | "OPTIONAL" in this document are to be interpreted as described in BCP | |||
14 [RFC2119] [RFC8174] when, and only when, they appear in all | 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
capitals, as shown here. | capitals, as shown here. | |||
2. Problem Statement | 2. Problem Statement | |||
skipping to change at page 4, line 5 ¶ | skipping to change at page 3, line 30 ¶ | |||
primarily targeted to work on point-to-point, i.e. unicast paths. | primarily targeted to work on point-to-point, i.e. unicast paths. | |||
These mechanisms use packet fragmentation control by disabling | These mechanisms use packet fragmentation control by disabling | |||
fragmentation of the probe packet. As a result, a transient node | fragmentation of the probe packet. As a result, a transient node | |||
that cannot forward a probe packet that is bigger than its link MTU | that cannot forward a probe packet that is bigger than its link MTU | |||
sends to the packet source an error notification, otherwise the | sends to the packet source an error notification, otherwise the | |||
packet destination may respond with a positive acknowledgment. Thus, | packet destination may respond with a positive acknowledgment. Thus, | |||
possibly through a series of iterations, varying the size of the | possibly through a series of iterations, varying the size of the | |||
probe packet, the packet source discovers the PMTU of the particular | probe packet, the packet source discovers the PMTU of the particular | |||
path. | path. | |||
Thus applied such existing PMTUD solutions are inefficient for point- | Applying such existing PMTUD solutions is inefficient for point-to- | |||
to-multipoint paths constructed for multicast traffic. Probe packets | multipoint paths constructed for multicast traffic. Probe packets | |||
must be flooded through the whole set of multicast distribution paths | must be flooded through the whole set of multicast distribution paths | |||
over and over again until the very last egress responds with a | over and over again until the very last egress responds with a | |||
positive acknowledgment. Consider without loss of generality an | positive acknowledgment. Consider the multicast network presented in | |||
example multicast network presented in Figure 1, where MTU on all | Figure 1, where MTU on all links but one (B, D) is the same. If MTU | |||
links but one (B, D) is the same. If MTU on the link (B, D) is | on the link (B, D) is smaller than the MTU on the other links, using | |||
smaller than the MTU on the other links, using existing PMTUD | existing PMTUD mechanism probes will unnecessarily flood to leaf | |||
mechanism probes will unnecessary flood to leaf nodes E, F, and G for | nodes E, F, and G for the second and consecutive times and positive | |||
the second and consecutive times and positive responses will be | responses will be generated and received by root A repeatedly. | |||
generated and received by root A repeatedly. | ||||
----- | ----- | |||
--| D | | --| D | | |||
----- / ----- | ----- / ----- | |||
--| B |-- | --| B |-- | |||
/ ----- \ ----- | / ----- \ ----- | |||
/ --| E | | / --| E | | |||
----- / ----- | ----- / ----- | |||
| A |--- ----- | | A |--- ----- | |||
----- \ --| F | | ----- \ --| F | | |||
skipping to change at page 5, line 5 ¶ | skipping to change at page 4, line 40 ¶ | |||
to forward towards the subset of targeted downstream BFERs, the BFR | to forward towards the subset of targeted downstream BFERs, the BFR | |||
responds with a partial (compared to the one it received in the | responds with a partial (compared to the one it received in the | |||
request) bitmask towards the originating BFIR in error notification. | request) bitmask towards the originating BFIR in error notification. | |||
That allows for retransmission of the next probe with a smaller MTU | That allows for retransmission of the next probe with a smaller MTU | |||
addressed only towards the failed downstream BFERs instead of all BFERs | addressed only towards the failed downstream BFERs instead of all BFERs | |||
addressed in the previous probe. In the scenario discussed in | addressed in the previous probe. In the scenario discussed in | |||
Section 2 the second and all following (if needed) probes will be | Section 2 the second and all following (if needed) probes will be | |||
sent only to the node D since MTU discovery of E, F, and G has been | sent only to the node D since MTU discovery of E, F, and G has been | |||
completed already by the first probe successfully. | completed already by the first probe successfully. | |||
[I-D.ietf-bier-ping] introduced BIER Ping as a transport-independent | ||||
OAM mechanism to detect and localize failures in the BIER data plane. | ||||
This document specifies how BIER Ping can be used to perform | ||||
efficient PMTUD in the BIER domain. | ||||
Consider the network displayed in Figure 1 to be a presentation of a | Consider the network displayed in Figure 1 to be a presentation of a | |||
BIER domain and all nodes to be BFRs. To discover the MTU over the | BIER domain and all nodes to be BFRs. To discover the MTU over the | |||
BIER domain to BFERs D, F, E, and G, BFIR A will use BIER Ping with | BIER domain to BFERs D, F, E, and G, BFIR A will use BIER Ping with | |||
the Data TLV, defined in Section 3.1. The size of the first probe is | the Data TLV, defined in Section 3.1. The size of the first probe is | |||
set to M_max, determined as the minimal MTU value of the BFIR's links | set to M_max, determined as the minimal MTU value of the BFIR's links | |||
to the BIER domain. As | to the BIER domain. As | |||
has been assumed in Section 2, MTUs of all links but the link (B, D) | has been assumed in Section 2, MTUs of all links but the link (B, D) | |||
are the same. Thus BFERs E, F, and G would receive BIER Echo Request | are the same. Thus BFERs E, F, and G would receive BIER Echo Request | |||
and will send their respective replies to BFIR A. BFR B may pass the | and will send their respective replies to BFIR A. BFR B may pass the | |||
packet which is too large to forward over egress link (B, D) to the | packet which is too large to forward over egress link (B, D) to the | |||
appropriate network layer for error processing where it would be | appropriate network layer for error processing where it would be | |||
recognized as a BIER Echo Request packet. BFR B MUST send BIER Echo | recognized as a BIER Echo Request packet. BFR B MUST send BIER Echo | |||
Reply to BFIR A and MUST include Downstream Mapping TLV, defined in | Reply to BFIR A and MUST include Downstream Mapping TLV, defined in | |||
[I-D.ietf-bier-ping] setting its fields in the following fashion: | [I-D.ietf-bier-ping] setting its fields in the following fashion: | |||
* MTU SHOULD be set to the minimal MTU value among all egress BIER | * MTU SHOULD be set to the minimal MTU value among all egress BIER | |||
links, logical links between this and downstream BFRs, that could | links, logical links between this and downstream BFRs, that could | |||
be used to reach B's downstream BFERs; | be used to reach B's downstream BFERs; | |||
* Address Type MUST be set to 0 [Ed.note: we need to define 0 as | * Address Type MAY be set to any value defined in Section 3.3.4 | |||
valid value for the Address Type field with the specific semantics | [I-D.ietf-bier-ping]. | |||
to "Ignore" it.] | ||||
* I flag MUST be cleared; | * I flag MUST be cleared to direct the responding BFR not to include | |||
the Incoming SI-BitString TLV in the BIER Echo Response. | ||||
* Downstream Interface Address field (4 octets) MUST be zeroed and | * Downstream Interface Address field MUST be zeroed. | |||
MUST include in the Egress Bitstring sub-TLV the list of all BFERs | ||||
that cannot be reached because the attempted MTU turned out to be | * List of Sub-TLVs MUST include the Egress Bitstring sub-TLV with | |||
too small. | the list of all BFERs that cannot be reached because the egress | |||
MTU turned out to be too small. | ||||
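
The field settings listed above can be illustrated with a short Python sketch. It is not part of either draft version and does not reproduce the wire format from [I-D.ietf-bier-ping]. The names EgressLink, DownstreamMapping, and build_error_reply are hypothetical, bit strings are modeled as plain integers, and the reported MTU is taken over the egress links that serve BFERs named in the request, which is one reading of the first bullet. The Address Type value is left to the caller, because the two versions differ on it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EgressLink:
    mtu: int    # MTU of this egress BIER link, in octets
    bfers: int  # bitmask of BFERs reached through this link


@dataclass
class DownstreamMapping:
    # Hypothetical in-memory view of the fields discussed above,
    # not the on-the-wire TLV layout.
    mtu: int
    address_type: int
    i_flag: bool
    downstream_interface_address: int
    egress_bitstring: int


def build_error_reply(links: List[EgressLink], request_bitstring: int,
                      probe_size: int,
                      address_type: int) -> Optional[DownstreamMapping]:
    """Return the mapping a transit BFR would report, or None if the
    probe fits on every egress link it has to use."""
    # Only links that lead to BFERs named in the request matter.
    used = [l for l in links if l.bfers & request_bitstring]
    failed = 0
    for link in used:
        if probe_size > link.mtu:
            failed |= link.bfers & request_bitstring
    if failed == 0:
        return None
    return DownstreamMapping(
        mtu=min(l.mtu for l in used),    # minimal MTU among usable links
        address_type=address_type,       # caller-supplied, see lead-in
        i_flag=False,                    # I flag cleared
        downstream_interface_address=0,  # zeroed
        egress_bitstring=failed,         # BFERs that could not be reached
    )


# BFR B from Figure 1, with assumed bit positions D=0b0001 and E=0b0010
# and assumed MTU values of 1400 on (B, D) and 1500 on (B, E).
links_b = [EgressLink(mtu=1400, bfers=0b0001),
           EgressLink(mtu=1500, bfers=0b0010)]
# address_type=0 is a placeholder here, not a claim about valid values.
reply = build_error_reply(links_b, 0b0011, 1500, address_type=0)
no_error = build_error_reply(links_b, 0b0010, 1500, address_type=0)
```

In this example only D appears in the Egress Bitstring of the error reply, so the BFIR has what it needs to narrow the next probe to D.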
The BFIR will receive either of the two types of packets: | The BFIR will receive either of the two types of packets: | |||
* a positive Echo Reply from one of BFERs to which the probe has | * a positive Echo Reply from one of BFERs to which the probe has | |||
been sent. In this case, the bit corresponding to the BFER MUST | been sent. In this case, the bit corresponding to the BFER MUST | |||
be cleared from the BMS; | be cleared from the bitmask string (BMS); | |||
* a negative Echo Reply with bit string listing unreached BFERs and | * a negative Echo Reply with bit string listing unreached BFERs and | |||
recommended MTU value MTU'. The BFIR MUST add the bit string to | recommended MTU value MTU'. The BFIR MUST add the bit string to | |||
its BMS and set the size of the next probe as min(MTU, MTU') | its BMS and set the size of the next probe as min(MTU, MTU') | |||
If upon expiration of the Echo Request timer BFIR didn't receive any | If a negative Echo Reply is received, the BFIR MUST wait for the | |||
Echo Replies, then the size of the probe SHOULD be decreased. There | expiration of the Echo Request timer before transmitting the updated Echo | |||
are scenarios when an implementation of the PMTUD would not decrease | Request. If upon expiration of the Echo Request timer BFIR didn't | |||
the size of the probe. For example, suppose upon expiration of the | receive any Echo Replies, then the size of the probe SHOULD be | |||
Echo Request timer BFIR didn't receive any Echo Reply. In that case, | decreased. There are scenarios when an implementation of the PMTUD | |||
BFIR MAY continue to retransmit the probe using the initial size and | would not decrease the size of the probe. For example, suppose upon | |||
MAY apply probe delay retransmission procedures. The algorithm used | expiration of the Echo Request timer BFIR didn't receive any Echo | |||
to delay retransmission procedures on BFIR is outside the scope of | Reply. In that case, BFIR MAY continue to retransmit the probe using | |||
this specification. The BFIR sends probes using BMS and locally | the initial size and MAY apply probe delay retransmission procedures. | |||
defined retransmission procedures until either the bit string is | The algorithm used to delay retransmission procedures on BFIR is | |||
clear, i.e., contains no set bits, or until the BFIR retransmission | outside the scope of this specification. The BFIR sends probes using | |||
procedure terminates and PMTU discovery is declared unsuccessful. In | BMS and locally defined retransmission procedures, but not more | |||
the case of convergence of the procedure, the size of the last probe | frequently than after the Echo Request timer expired, until either | |||
indicates the PMTU size that can be used for all BFERs in the initial | the bit string is clear, i.e., contains no set bits, or until the | |||
BMS without incurring fragmentation. | BFIR retransmission procedure terminates and PMTU discovery is | |||
declared unsuccessful. In the case of convergence of the procedure, | ||||
the size of the last probe indicates the PMTU size that can be used | ||||
for all BFERs in the initial BMS without incurring fragmentation. | ||||
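
The BFIR procedure described above can be sketched as a small Python simulation. This is an illustration under stated assumptions, not the drafts' algorithm verbatim. EchoReply, discover_pmtu, make_network, backoff, and max_rounds are hypothetical names, and the bit positions and MTU values are made up. send_probe is assumed to return only after the Echo Request timer has expired. The amount by which the probe shrinks when no reply arrives is a local choice, since the text leaves retransmission procedures out of scope.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class EchoReply:
    positive: bool
    bitstring: int  # bit(s) of the BFER(s) this reply refers to
    mtu: int = 0    # recommended MTU' (negative replies only)


def discover_pmtu(initial_bms: int, m_max: int,
                  send_probe: Callable[[int, int], List[EchoReply]],
                  backoff: int = 100,
                  max_rounds: int = 8) -> Optional[int]:
    """Return the size of the last successful probe, or None if the
    procedure does not converge within max_rounds."""
    bms, size = initial_bms, m_max
    for _ in range(max_rounds):
        replies = send_probe(bms, size)
        if not replies:
            # No reply before the timer expired: shrink the probe.
            size -= backoff
            if size <= 0:
                return None
            continue
        next_size = size
        for r in replies:
            if r.positive:
                bms &= ~r.bitstring   # clear the bit of a reached BFER
            else:
                bms |= r.bitstring    # keep unreached BFERs in the BMS
                next_size = min(next_size, r.mtu)
        if bms == 0:
            return size               # every BFER answered positively
        size = next_size
    return None


def make_network(path_mtu: Dict[int, int]):
    """Toy stand-in for the BIER domain: one reply per targeted BFER."""
    log: List[Tuple[int, int]] = []

    def send_probe(bms: int, size: int) -> List[EchoReply]:
        log.append((bms, size))
        replies = []
        for bit, mtu in path_mtu.items():
            if bms & bit:
                if size <= mtu:
                    replies.append(EchoReply(True, bit))
                else:
                    replies.append(EchoReply(False, bit, mtu))
        return replies

    return send_probe, log


# Figure 1 scenario with assumed bits D=1, E=2, F=4, G=8; only the path
# to D has a smaller MTU.
send_probe, log = make_network({0b0001: 1400, 0b0010: 1500,
                                0b0100: 1500, 0b1000: 1500})
pmtu = discover_pmtu(0b1111, 1500, send_probe)
```

In this run the first probe goes to all four BFERs and the second only to D, which matches the behavior described for the Section 2 scenario.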
Thus we conclude that in order to comply with the requirement in | Thus we conclude that in order to comply with the requirement in | |||
[I-D.ietf-bier-oam-requirements]: | [I-D.ietf-bier-oam-requirements]: | |||
* a BFR SHOULD support PMTUD; | * a BFR SHOULD support PMTUD; | |||
* a BFR MAY use defined per BIER sub-domain MTU value as initial MTU | * a BFR MAY use defined per BIER sub-domain MTU value as initial MTU | |||
value for discovery or use it as MTU for this BIER sub-domain to | value for discovery or use it as MTU for this BIER sub-domain to | |||
reach BFERs; | reach BFERs; | |||
End of changes. 18 change blocks. | ||||
73 lines changed or deleted | 51 lines changed or added | |||