[mpls] John Scudder's Discuss on draft-ietf-mpls-ri-rsvp-frr-18: (with DISCUSS and COMMENT)

John Scudder via Datatracker <noreply@ietf.org> Wed, 29 May 2024 00:35 UTC

Return-Path: <noreply@ietf.org>
X-Original-To: mpls@ietf.org
Delivered-To: mpls@ietfa.amsl.com
Received: from ietfa.amsl.com (localhost [IPv6:::1]) by ietfa.amsl.com (Postfix) with ESMTP id BB31DC14F700; Tue, 28 May 2024 17:35:45 -0700 (PDT)
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
From: John Scudder via Datatracker <noreply@ietf.org>
To: The IESG <iesg@ietf.org>
X-Test-IDTracker: no
X-IETF-IDTracker: 12.13.0
Auto-Submitted: auto-generated
Precedence: bulk
Message-ID: <171694294575.2989.12240535542000524349@ietfa.amsl.com>
Date: Tue, 28 May 2024 17:35:45 -0700
Message-ID-Hash: SKT6MCPWGTEH4WZIH6L6Q4LNHHBV563L
X-Message-ID-Hash: SKT6MCPWGTEH4WZIH6L6Q4LNHHBV563L
X-MailFrom: noreply@ietf.org
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; header-match-mpls.ietf.org-0; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header
CC: draft-ietf-mpls-ri-rsvp-frr@ietf.org, mpls-chairs@ietf.org, mpls@ietf.org
X-Mailman-Version: 3.3.9rc4
Reply-To: John Scudder <jgs@juniper.net>
Subject: [mpls] John Scudder's Discuss on draft-ietf-mpls-ri-rsvp-frr-18: (with DISCUSS and COMMENT)
List-Id: Multi-Protocol Label Switching WG <mpls.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/mpls/LxtfVT9SalfmHYl6Msx40Cy-baA>
List-Archive: <https://mailarchive.ietf.org/arch/browse/mpls>
List-Help: <mailto:mpls-request@ietf.org?subject=help>
List-Owner: <mailto:mpls-owner@ietf.org>
List-Post: <mailto:mpls@ietf.org>
List-Subscribe: <mailto:mpls-join@ietf.org>
List-Unsubscribe: <mailto:mpls-leave@ietf.org>

John Scudder has entered the following ballot position for
draft-ietf-mpls-ri-rsvp-frr-18: Discuss

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to https://www.ietf.org/about/groups/iesg/statements/handling-ballot-positions/ 
for more information about how to handle DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-mpls-ri-rsvp-frr/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

# John Scudder, RTG AD, comments for draft-ietf-mpls-ri-rsvp-frr-18
CC @jgscudder

Thanks for this document. It's a dense read for someone like me who is not an
expert in RSVP, but it seems important, useful, and carefully done. I
appreciate the precise and detailed work.

I have one concern I've flagged as a DISCUSS point, which may turn out not to
be a big deal if I've just misunderstood some nuance, but in any case, we
should talk through it. I also have some other comments I hope might be helpful.

## DISCUSS

### Section 4.1

   A node supporting facility backup protection [RFC4090] MUST set the
   RI-RSVP flag (I bit) that is defined in Section 3.1 of RSVP-TE
   Scaling Techniques [RFC8370] only if it supports all the extensions
   specified in the rest of this document.

I have several concerns about this. At least some of them probably relate to my
limited expertise in the subject area, but maybe I can at least expose some
areas where additional explanation might be helpful in the document.

First and foremost, as written this sentence doesn't seem like it's achievable
in practice. Couldn't there be a legacy router in the field that supports both
RFC 4090 and RFC 8370? Wouldn't that router then advertise the I bit, at least
in some circumstances?

Second, this MUST seems to conflict with a SHOULD in section 4.6.1, which says,

   An implementation supporting RI-RSVP-FRR extensions SHOULD set the
   flag "Refresh interval Independent RSVP" or RI-RSVP flag in the
   CAPABILITY object carried in Hello messages as specified in RSVP-TE
   Scaling Techniques [RFC8370].

It makes me wonder if the section 4.1 paragraph should be more like this.

NEW:
   A node supporting facility backup protection [RFC4090] MUST NOT set the
   RI-RSVP flag (I bit) that is defined in Section 3.1 of RSVP-TE
   Scaling Techniques [RFC8370] unless it supports all the extensions
   specified in the rest of this document.

(Although even though that might be clearer, the first point stands, that it
might not be achievable.)

Third, moving on to the end of the paragraph,

   Procedures for backward compatibility (see Section 4.6.2.3 of this
   document) delves on this in detail.

As far as I can tell, despite the title of Section 4.6.2, Subsection 4.6.2.3
isn't a procedure for backward compatibility, it's a warning about the terrible
things that can happen if some node in the network is incompatible.

Is all of this right? That there could be a legacy node in the network, with no
way to detect that it doesn't comply with the present specification, and that
terrible things such as 4.6.2.3 describes, could happen as a result?

(I see Ketan Talaulikar raised a related concern in his RTGDIR review.)


----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

## COMMENT

### RTGDIR review

Please take a look at Ketan Talaulikar's RTGDIR review that was posted
yesterday,
(https://mailarchive.ietf.org/arch/msg/rtg-dir/PRZJa7LH9b3J1aRFo3BvYZ3DQUQ/) I
won't repeat his points here, other than the one that appeared in my DISCUSS,
but please consider all of them.

### Section 4.3.3, use of RFC 2119 keyword in example

You have a number of sections that include examples. The examples are very
helpful for understanding the specification, thank you! However, you use RFC
2119/BCP 14 keywords in the examples, and I think you shouldn't. Those keywords
are reserved for specifying procedures, and an example isn't specifying
procedure, it's demonstrating it.

I'll call out each example separately but only provide the explanation here.

In this section, what I noticed was,

   1.  When C receives a Conditional PathTear from B, it decides to
      retain LSP state as it is the NP-MP of the PLR A.  C also MUST
      check whether Phop B had previously signaled availability of node
      protection.  As B had previously signaled NP availability by
      including B-SFRR-Ready Extended Association object, C MUST remove
      the B-SFRR-Ready Extended Association object containing
      Association Source set to B from the Path message and trigger a
      Path to D.

There are various ways this could be rewritten, the absolute easiest would be
to simply substitute "must" for "MUST", although it would be cleaner to fully
rewrite in terms of actions instead of expectations, as in,

NEW:
   1.  When C receives a Conditional PathTear from B, it decides to
      retain LSP state as it is the NP-MP of the PLR A.  It also
      checks whether Phop B had previously signaled availability of node
      protection.  As B had previously signaled NP availability by
      including B-SFRR-Ready Extended Association object, C removes
      the B-SFRR-Ready Extended Association object containing
      Association Source set to B from the Path message and triggers a
      Path to D.

### Section 4.4.3, RFC 2119 keyword misuse

   As any implementation that does not support Conditional PathTear MUST
   ignore the new object but process the message as a normal PathTear
   without generating any error, the Class-Num of the new object MUST be
   10bbbbbb where 'b' represents a bit (from Section 3.10 of [RFC2205]).

I think the first MUST is unnecessary since you're stating a need, not a
requirement. The need is fulfilled as a natural consequence of your choice of
code point and the underlying requirement from RFC 2205. The second MUST is
also unnecessary, you are not telling the implementer anything, you're just
explaining why you chose the type code you did.

One possible fix would be to delete the paragraph, but it's probably nicer to
the reader if you leave the explanation in place. Perhaps something like,

NEW:
   Any implementation that does not support Conditional PathTear needs
   to ignore the new object but process the message as a normal PathTear
   without generating any error. For this reason, the Class-Num of the
   new object follows the pattern 10bbbbbb where 'b' represents a bit.
   (The behavior for objects of this type is specified in Section 3.10
   of [RFC2205]).

### Section 4.5, clarification of requirement

   If the ingress wants to tear down the LSP because of a management
   event while the LSP is being locally repaired at a transit PLR, it
   would not be desirable to wait till the completion of backup LSP
   signaling to perform state cleanup.  To enable LSP state cleanup when
   the LSP is being locally repaired, the PLR MUST send a "Remote"
   PathTear message instructing the MP to delete the PSB and RSB states
   corresponding to the LSP.  The TTL in the "Remote" PathTear message
   MUST be set to 255.

In this paragraph, I am unclear exactly what the sentence "To enable LSP state
cleanup when the LSP is being locally repaired, the PLR MUST send a "Remote"
PathTear message instructing the MP to delete the PSB and RSB states
corresponding to the LSP" is telling me. My best guess is,

NEW:
   If the ingress wants to tear down the LSP because of a management
   event while the LSP is being locally repaired at a transit PLR, it
   would not be desirable to wait till the completion of backup LSP
   signaling to perform state cleanup.  In this case, the PLR MUST send
   a "Remote" PathTear message instructing the MP to delete the PSB and
   RSB states corresponding to the LSP.  The TTL in the "Remote"
   PathTear message MUST be set to 255. Doing this enables LSP state
   cleanup when the LSP is being locally repaired,

Is that rewrite faithful to what you intended? If so, I suggest using it, or
any other rewrite of your choosing that clarifies matters. In particular, my
intent in the rewrite is to make it clear that the PLR is unequivocally
required to do this, which IMO wasn't clear before.

### Section 4.5.1, clarification of requirement

   If local repair fails on the PLR after a failure, then this MUST be
   considered as a case for cleaning up LSP state from the PLR to the
   Egress.  The PLR achieves state cleanup by sending "Remote" PathTear
   to the MP.  The MP MUST delete the states corresponding to the LSP
   also propagate the PathTear downstream thereby achieving state
   cleanup from all downstream nodes up to the LSP egress.  Note that in
   the case of link protection, the PathTear MUST be directed to the LP-
   MP's Node-ID IP address rather than the Nhop interface address.

Similar to the previous case, the combination of the RFC 2119 keyword with the
casual writing style leaves the intent of the requirement unclear to me. Here's
an attempt at a rewrite, in the same spirit.

NEW:
   If local repair fails on the PLR after a failure, the PLR MUST send a
   "Remote" PathTear to the MP.  The purpose of doing this is to clean
   up LSP state from the PLR to the Egress. Upon receiving the PathTear,
   the MP will delete the states corresponding to the LSP and also
   propagate the PathTear downstream thereby achieving state cleanup
   from all downstream nodes up to the LSP egress.  Note that in the
   case of link protection, the PathTear MUST be directed to the LP-
   MP's Node-ID IP address rather than the Nhop interface address.

Note that I removed the second MUST, on the assumption that you aren't
specifying a new requirement for the MP, but stating the natural consequence of
a requirement you've already specified elsewhere. Please double-check that I
haven't broken something!

### Section 4.5.2, use of RFC 2119 keyword in example

As with my 4.3.3 comment. In this case my suggestion is,

OLD:
   RRO of the LSP carried in the Resv will not contain C.  When A
   processes the Resv message with a new RRO not containing C - its
   former NP-MP, A MUST send a "Remote" PathTear to C.  When C receives

NEW:
   RRO of the LSP carried in the Resv will not contain C.  When A
   processes the Resv message with a new RRO not containing C - its
   former NP-MP, A sends a "Remote" PathTear to C.  When C receives

### Section 4.5.3.1, I'm confused

   If an LSP is preempted on an LP-MP after its Phop or the incoming
   link has already failed

Why "Phop or the incoming link" and not just "incoming link"? It's a
link-protecting merge point, so why does it care about the failure of an
upstream *node*? Also, this seems as though it conflicts with the title of the
subsection, which is "Preemption on LP-MP after Phop Link Failure" and not "...
after Phop or Phop Link Failure".

Could it be rewritten as follows?

   If an LSP is preempted on an LP-MP after its Phop
   link has already failed

### Section 4.5.3.2, I'm still confused but in the other direction

   If an LSP is preempted on an NP-MP after its Phop link has already
   failed

If it's a node-protecting merge point, shouldn't this one care about the Phop
as well as the Phop link? (Also in the title of the subsection.)

### Section 4.5.3.2, use of RFC 2119 keyword in example

As with my 4.3.3 comment. In this case my suggestion is,

OLD:
   3.  As the only reason for C having retained state after Phop node
      failure was that it was an NP-MP, C MUST send a normal PathTear to
      D and delete its PSB state also.  D would also delete the PSB and
      RSB states on receiving a PathTear from C.

NEW:
   3.  As the only reason for C having retained state after Phop node
      failure was that it was an NP-MP, C sends a normal PathTear to
      D and deletes its PSB state also.  D would also delete the PSB and
      RSB states on receiving a PathTear from C.

### Section 4.6, ALL CAPS

I probably wouldn't even flag this if I weren't doing a full review already, but

   DOES NOT support RI-RSVP-FRR extensions.  Note that for LSPs

Although RFC 2119 doesn't forbid the use of all caps for non-reserved keywords,
it's often considered to be inadvisable. I would suggest,

NEW:
   does not support RI-RSVP-FRR extensions.  Note that for LSPs

### Section 4.6.1, SHOULD set

   An implementation supporting RI-RSVP-FRR extensions SHOULD set the
   flag "Refresh interval Independent RSVP" or RI-RSVP flag in the
   CAPABILITY object carried in Hello messages as specified in RSVP-TE
   Scaling Techniques [RFC8370].

Depending on the resolution of my DISCUSS point, I guess you might want to
revisit this SHOULD.

### Section 4.6.2.1, use of RFC 2119 keyword in example

As with my 4.3.3 comment. In this case my suggestion is,

OLD:
   -  A and B MUST reduce the refresh time to the default short refresh
      interval of 30 seconds and trigger a Path message

   -  If B is not an MP and if Phop link of B fails, B cannot send
      Conditional PathTear to C but MUST time out the PSB state from A
      normally.  Note that B can time out the PSB state A normally only
      if A did not set long refresh in the TIME_VALUES object carried in
      the Path messages sent earlier.

NEW:
   -  A and B reduce the refresh time to the default short refresh
      interval of 30 seconds and trigger a Path message

   -  If B is not an MP and if Phop link of B fails, B cannot send
      Conditional PathTear to C but times out the PSB state from A
      normally.  Note that B can time out the PSB state A normally only
      if A did not set long refresh in the TIME_VALUES object carried in
      the Path messages sent earlier.

## Notes

This review is in the ["IETF Comments" Markdown format][ICMF], You can use the
[`ietf-comments` tool][ICT] to automatically convert this review into
individual GitHub issues.

[ICMF]: https://github.com/mnot/ietf-comments/blob/main/format.md
[ICT]: https://github.com/mnot/ietf-comments