[RTG-DIR]Rtgdir early review of draft-ietf-mpls-ri-rsvp-frr-18
Ketan Talaulikar via Datatracker <noreply@ietf.org> Mon, 27 May 2024 15:39 UTC
From: Ketan Talaulikar via Datatracker <noreply@ietf.org>
To: rtg-dir@ietf.org
CC: draft-ietf-mpls-ri-rsvp-frr.all@ietf.org, mpls@ietf.org
Reply-To: Ketan Talaulikar <ketant.ietf@gmail.com>
List-Id: Routing Area Directorate <rtg-dir.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtg-dir/PRZJa7LH9b3J1aRFo3BvYZ3DQUQ>
Reviewer: Ketan Talaulikar
Review result: Has Issues

Hello,

I've been asked to review this document for the RTGDIR.

Thanks to the authors and the WG members for a very well written and thorough document.

My concern is with the overloading of the I-flag with additional semantics instead of defining a new one and most of the major comments are related to this. Other than that, I believe the other comments are minor and should be easy to address/clarify.

Please find below inline comments in the ID nits output (with portions snipped out).

Thanks,
Ketan

15  Abstract

17     The RSVP-TE Fast Reroute extensions specified in RFC 4090 defines two
18     local repair techniques to reroute Label Switched Path (LSP) traffic
19     over pre-established backup tunnel. Facility backup method allows
20     one or more LSPs traversing a connected link or node to be protected
21     using a bypass tunnel. The many-to-one nature of local repair
22     technique is attractive from scalability point of view. This
23     document enumerates facility backup procedures in RFC 4090 that rely
24     on refresh timeout and hence make facility backup method refresh-
25     interval dependent. The RSVP-TE extensions defined in this document
26     will enhance the facility backup protection mechanism by making the
27     corresponding procedures refresh-interval independent and hence
28     compatible with Refresh-interval Independent RSVP (RI-RSVP) specified
29     in RFC 8370. Hence, this document updates RFC 4090 in order to
30     support RI-RSVP capability specified in RFC 8370.

< major > Since RFC8370 does not formally update the base RSVP-TE specs, is it appropriate for this document to formally update RFC4090. I would think that it enhances RFC4090? I'll let the responsible AD determine if this is appropriate.

360  4.1. Requirement on RFC 4090 Capable Node to advertise RI-RSVP
361       Capability

363     A node supporting facility backup protection [RFC4090] MUST set the
364     RI-RSVP flag (I bit) that is defined in Section 3.1 of RSVP-TE
365     Scaling Techniques [RFC8370] only if it supports all the extensions
366     specified in the rest of this document. Hence, this document updates

< major > Was the use of another different capability considered instead of overloading the existing I bit with additional requirements? This would have taken care of some of the backward compatibility issues described further. There is no discussion why this isn't being done - assuming there is/was a good reason for it.

367     [RFC4090] by defining extensions and additional procedures over
368     facility backup protection [RFC4090] in order to advertise RI-RSVP
369     capability [RFC8370]. However, if a node supporting facility backup
370     protection [RFC4090] does set the RI-RSVP capability (I bit) but does
371     not support all the extensions specified in the rest of this
372     document, then it leaves room for stale state to linger around for an
373     inordinate period of time given the long refresh intervals
374     recommended by [RFC8370] or disruption of normal FRR operation.
375     Procedures for backward compatibility (see Section 4.6.2.3 of this
376     document) delves on this in detail.

378  4.2. Signaling Handshake between PLR and MP

380  4.2.1. PLR Behavior

382     As per the facility backup procedures [RFC4090], when an LSP becomes
383     operational on a node and the "local protection desired" flag has
384     been set in the SESSION_ATTRIBUTE object carried in the Path message
385     corresponding to the LSP, then the node attempts to make local
386     protection available for the LSP.

388     - If the "node protection desired" flag is set, then the node tries
389       to become a PLR by attempting to create a NP-bypass LSP to the
390       NNhop node avoiding the Nhop node on protected LSP path. In case
391       node protection could not be made available, the node attempts to
392       create an LP-bypass LSP to the Nhop node avoiding only the link
393       that the protected LSP takes to reach the Nhop

395     - If the "node protection desired" flag is not set, then the PLR
396       attempts to create an LP-bypass LSP to the Nhop node avoiding the
397       link that the protected LSP takes to reach the Nhop

399     With regard to the PLR procedures described above and that are
400     specified in [RFC4090], this document specifies the following
401     additional procedures to support RI-RSVP [RFC8370].

403     - While selecting the destination address of the bypass LSP, the PLR
404       MUST select the router ID of the NNhop or Nhop node from the Node-
405       ID sub-object included in the RRO object carried in the most
406       recent Resv message corresponding to the LSP. If the MP has not
407       included a Node-ID sub-object in the Resv RRO and if the PLR and
408       the MP are in the same area, then the PLR may utilize the TED to

< minor > Shouldn't this be MAY ?

409       determine the router ID corresponding to the interface address
410       included by the MP in the RRO object. If the NP-MP in a different
411       IGP area has not included a Node-ID sub-object in RRO object, then
412       the PLR MUST execute backward compatibility procedures as if the
413       downstream nodes along the LSP do not support the extensions
414       defined in the document (see Section 4.6.2.1).

416     - The PLR MUST also include its router ID in a Node-ID sub-object in
417       RRO object carried in any subsequent Path message corresponding to
418       the LSP. While including its router ID in the Node-ID sub-object
419       carried in the outgoing Path message, the PLR MUST include the
420       Node-ID sub-object after including its IPv4/IPv6 address or
421       unnumbered interface ID sub-object.
423     - In parallel to the attempt made to create NP-bypass or LP-bypass,
424       the PLR MUST initiate a Node-ID based Hello session to the NNhop
425       or Nhop node respectively along the LSP to establish the RSVP-TE
426       signaling adjacency. This Hello session is used to detect MP node
427       failure as well as determine the capability of the MP node. If
428       the MP has set the I-bit in the CAPABILITY object [RFC8370]
429       carried in Hello message corresponding to the Node-ID based Hello
430       session, then the PLR MUST conclude that the MP supports refresh-
431       interval independent FRR procedures defined in this document. If
432       the MP has not sent Node-ID based Hello messages or has not set
433       the I-bit in CAPABILITY object [RFC8370], then the PLR MUST
434       execute backward compatibility procedures defined in
435       Section 4.6.2.1 of this document.

437     - When the PLR associates a bypass to a protected LSP, it MUST
438       include a B-SFRR-Ready Extended Association object [RFC8796] and
439       trigger a Path message to be sent for the LSP. If a B-SFRR-Ready
440       Extended Association object is included in the Path message
441       corresponding to the LSP, the encoding and object ordering rules
442       specified in RSVP-TE Summary FRR [RFC8796] MUST be followed. In
443       addition to those rules, the PLR MUST set the Association Source
444       in the object to its Node-ID address.

446  4.2.2. Remote Signaling Adjacency

448     A Node-ID based RSVP-TE Hello session is one in which Node-ID is used
449     in the source and the destination address fields of RSVP Hello
450     messages [RFC4558]. This document extends Node-ID based RSVP Hello
451     session to track the state of any RSVP-TE neighbor that is not
452     directly connected by at least one interface. In order to apply
453     Node-ID based RSVP-TE Hello session between any two routers that are
454     not immediate neighbors, the router that supports the extensions
455     defined in the document MUST set TTL to 255 in all outgoing Node-ID
456     based Hello messages exchanged between the PLR and the MP. The
457     default hello interval for this Node-ID hello session MUST be set to
458     the default specified in RSVP-TE Scaling Techniques [RFC8370].

< minor > Is it possible that there already exists a RSVP Hello session between the PLR and MP (for some reasons other than FRR)?

< major > Is it not necessary to add some text to indicate when these RSVP hello session states need to be cleaned-up/removed?

460     In the rest of the document the term "signaling adjacency", or
461     "remote signaling adjacency" refers specifically to the RSVP-TE
462     signaling adjacency.

511  4.2.4. "Remote" State on MP

513     Once a router concludes it is the MP for a PLR running refresh-
514     interval independent FRR procedures as described in the preceding
515     section, it MUST create a remote path state for the LSP. The only
516     difference between the "remote" path state and the LSP state is the
517     RSVP_HOP object. The RSVP_HOP object in a "remote" path state
518     contains the address that the PLR uses to send Node-ID hello messages
519     to the MP.

521     The MP MUST consider the "remote" path state corresponding to the LSP
522     automatically deleted if:

524     - The MP later receives a Path message for the LSP with no matching
525       B-SFRR-Ready Extended Association object corresponding to the
526       PLR's IP address contained in the Path RRO, or

528     - The Node-ID signaling adjacency with the PLR goes down, or

< minor > I assume the above also includes "down" due to something like BFD?

530     - The MP receives backup LSP signaling for the LSP from the PLR or

< minor > Does the above assume that the PLR is signaling a different backup LSP?
532     - The MP receives a PathTear for the LSP, or

534     - The MP deletes the LSP state on a local policy or an exception
535       event

537     The purpose of "remote" path state is to enable the PLR to explicitly
538     tear down the path and reservation states corresponding to the LSP by
539     sending a tear message for the "remote" path state. Such a message
540     tearing down "remote" path state is called "Remote" PathTear.

542     The scenarios in which a "Remote" PathTear is applied are described
543     in Section 4.5 of this document.

869  4.6. Backward Compatibility Procedures

871     "Refresh interval Independent FRR" or RI-RSVP-FRR refers to the set
872     of procedures defined in this document to eliminate the reliance of
873     periodic refreshes. The extensions proposed in RSVP-TE Summary FRR
874     [RFC8796] may apply to implementations that do not support RI-RSVP-
875     FRR. On the other hand, RI-RSVP-FRR extensions relating to LSP state
876     cleanup namely Conditional and "Remote" PathTear require support from
877     one-hop and two-hop neighboring nodes along the LSP path. So
878     procedures that fall under LSP state cleanup category MUST NOT be
879     turned on if any of the nodes involved in the node protection FRR
880     i.e. the PLR, the MP and the intermediate node in the case of NP,
881     DOES NOT support RI-RSVP-FRR extensions. Note that for LSPs
882     requesting link protection, only the PLR and the LP-MP MUST support
883     the extensions.

885  4.6.1. Detecting Support for Refresh interval Independent FRR

887     An implementation supporting RI-RSVP-FRR extensions SHOULD set the

< minor > I believe this is a MUST per RFC8370 sec 3.1, or am I missing something?

888     flag "Refresh interval Independent RSVP" or RI-RSVP flag in the
889     CAPABILITY object carried in Hello messages as specified in RSVP-TE
890     Scaling Techniques [RFC8370]. If an implementation does not set the
891     flag even if it supports RI-RSVP-FRR extensions, then its neighbors
892     will view the node as any node that does not support the extensions.

< major > The above seems to be conflicting with RFC8370 in that it changes the meaning of the I - flag. See previous comments as well.

894     - As nodes supporting the RI-RSVP-FRR extensions initiate Node-ID
895       based signaling adjacency with all immediate neighbors, such a
896       node on the path of a protected LSP can determine whether its Phop
897       and Nhop neighbors support RI-RSVP-FRR enhancements.

899     - As nodes supporting the RI-RSVP-FRR extensions also initiate Node-
900       ID based signaling adjacency with the NNhop along the path of the
901       LSP requested node protection (see Section 4.2.1 of this
902       document), each node along the LSP path can determine whether its
903       NNhop node supports RI-RSVP-FRR enhancements. If the NNhop (a)
904       does not reply to remote Node-ID Hello messages or (b) does not
905       set the RI-RSVP flag in the CAPABILITY object carried in its Node-
906       ID Hello messages, then the node acting as the PLR can conclude
907       that NNhop does not support RI-RSVP-FRR extensions.

909     - If node protection is requested for an LSP and if (a) the PPhop
910       node has not included a matching B-SFRR-Ready Extended Association
911       object in its Path messages or (b) the PPhop node has not
912       initiated remote Node-ID Hello messages or (c) the PPhop node does
913       not set the RI-RSVP flag in the CAPABILITY object carried in its
914       Node-ID Hello messages, then the node MUST conclude that the PLR
915       does not support RI-RSVP-FRR extensions.

980  4.6.2.3. Advertising RI-RSVP without RI-RSVP-FRR

982     If a node supporting facility backup protection [RFC4090] sets the
983     RI-RSVP capability (I bit) but does not support the RI-RSVP-FRR
984     extensions, then it leaves room for stale state to linger around for
985     an inordinate period of time or disruption of normal FRR operation
986     (see Section 3 of this document). Consider the example topology
987     Figure 1 provided in this document.

989     - Assume node B does set RI-RSVP capability in its Node-ID based
990       Hello messages even though it does not support RI-RSVP-FRR
991       extensions. When B detects the failure of its Phop link along an
992       LSP, it will not send Conditional PathTear to C as required by the
993       RI-RSVP-FRR procedures. If B simply leaves the LSP state without
994       deleting, then B may end up holding on to the stale state until
995       the (long) refresh timeout.

997     - Instead of node B, assume node C does set RI-RSVP capability in
998       its Node-id based Hello messages even though it does not support
999       RI-RSVP-FRR extensions. When B details the failure of its Phop
1000      link along an LSP, it will send Conditional PathTear to C as
1001      required by the RI-RSVP-FRR procedures. But, C would not
1002      recognize the condition encoded in the PathTear and end up tearing
1003      down the LSP.

1005    - Assume node B does set RI-RSVP capability in its Node-ID based
1006      Hello messages even though it does not support RI-RSVP-FRR
1007      extensions. Also assume local repair is about to commence on node
1008      B for an LSP that has only requested link protection. That is, B
1009      has not initiated the backup LSP signaling for the LSP. If node B
1010      receives a normal PathTear at this time from ingress A because of
1011      a management event initiated on A, then B simply deletes the LSP
1012      state without sending a Remote PathTear to the LP-MP C. So, C may
1013      end up holding on to the stale state until the (long) refresh
1014      timeout.
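
The scenarios above all follow from the support-detection rule quoted from Sections 4.2.1 and 4.6.1: a neighbor is assumed to implement RI-RSVP-FRR when a Node-ID based Hello has been seen from it and the I bit is set. A minimal Python sketch of that rule, not taken from the draft: the `Neighbor` type and both function names are hypothetical, and `implements_ri_rsvp_frr` models ground truth that is not visible on the wire.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Neighbor:
    """Hypothetical model of what a node knows about a remote neighbor."""
    node_id_hello_seen: bool      # a Node-ID based Hello was received from it
    i_bit_set: bool               # RI-RSVP flag (I bit) in its CAPABILITY object
    implements_ri_rsvp_frr: bool  # ground truth, NOT carried in any message


def plr_assumes_ri_rsvp_frr(n: Neighbor) -> bool:
    """Rule as quoted from the draft: Hello seen and I bit set."""
    return n.node_id_hello_seen and n.i_bit_set


def is_misclassified(n: Neighbor) -> bool:
    """True when the conclusion drawn differs from what the node implements."""
    return plr_assumes_ri_rsvp_frr(n) != n.implements_ri_rsvp_frr


# Node B or C from the scenarios: advertises the I bit without RI-RSVP-FRR.
rfc8370_only = Neighbor(node_id_hello_seen=True, i_bit_set=True,
                        implements_ri_rsvp_frr=False)
print(is_misclassified(rfc8370_only))
```

With only the I bit to go on, the rule cannot tell such a node apart from one that implements the full set of extensions, which is the situation Section 4.6.2.3 describes.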
< major > As mentioned in a previous comment, all these backward compatibility issues would have been mitigated with the use of a new flag than the existing I flag?

1089  6.2. CONDITIONS Flags

1091    Apart from allocating Class-Number for the CONDITIONS object, the
1092    allocation of the Merge-point condition bit or M-bit (see Section 4.4
1093    of this document) will also be done by IANA.

1095    Flag: 0x1 Name: Merge-point condition bit or M-bit

< major > This seems like a new registry. What is the allocation policy for it? OTOH from the picture, this seems like a single flag since the rest is marked as a reserved and not a flags field - in that case, we don't need anything from IANA, right?
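
For illustration of the "single flag plus reserved bits" reading, a minimal Python sketch of testing the M-bit. Only the value 0x1 comes from the quoted Section 6.2; the 32-bit width of the word and all names below are assumptions made for the example.

```python
M_BIT = 0x1  # "Merge-point condition bit", value quoted from Section 6.2

# Assumption for illustration only: a 32-bit word in which every bit
# other than M is reserved.
FLAGS_WIDTH_BITS = 32
RESERVED_MASK = ((1 << FLAGS_WIDTH_BITS) - 1) & ~M_BIT


def m_bit_set(flags: int) -> bool:
    """Return True when the Merge-point condition bit is set."""
    return bool(flags & M_BIT)


def reserved_bits_clear(flags: int) -> bool:
    """Return True when no bit other than M is set."""
    return (flags & RESERVED_MASK) == 0
```

Under this reading a receiver only ever tests one bit, so there is nothing further to register until a second flag is defined.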