Re: [mpls] Comments on draft-ietf-teas-rsvp-ingress-protection

Gregory Mirsky <gregory.mirsky@ericsson.com> Tue, 01 December 2015 00:50 UTC

From: Gregory Mirsky <gregory.mirsky@ericsson.com>
To: Huaimo Chen <huaimo.chen@huawei.com>, "rtorvi@juniper.net" <rtorvi@juniper.net>, "draft-ietf-teas-rsvp-ingress-protection@tools.ietf.org" <draft-ietf-teas-rsvp-ingress-protection@tools.ietf.org>, "teas-chairs@ietf.org" <teas-chairs@ietf.org>, "teas@ietf.org" <teas@ietf.org>
Date: Tue, 01 Dec 2015 00:49:57 +0000
Message-ID: <7347100B5761DC41A166AC17F22DF1122194BD8B@eusaamb103.ericsson.se>
References: <7347100B5761DC41A166AC17F22DF1121B948347@eusaamb103.ericsson.se> <5316A0AB3C851246A7CA5758973207D44E37EE98@SJCEML701-CHM.china.huawei.com> <7347100B5761DC41A166AC17F22DF1121B94CD49@eusaamb103.ericsson.se> <5316A0AB3C851246A7CA5758973207D44E3804B5@SJCEML701-CHM.china.huawei.com>
In-Reply-To: <5316A0AB3C851246A7CA5758973207D44E3804B5@SJCEML701-CHM.china.huawei.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/mpls/45m_WHPbU3aZbUMMIFnCu2gfhKI>
Cc: "mpls@ietf.org" <mpls@ietf.org>, "rtg-bfd@ietf.org" <rtg-bfd@ietf.org>
Subject: Re: [mpls] Comments on draft-ietf-teas-rsvp-ingress-protection

Hi Huaimo and authors,
apologies for the extended period of silence, and thank you for your answers. I'll use this thread to review and comment on the latest version of the document.
I believe that we were discussing version -02, and the most current is -04. Thus I've reviewed the changes from -02 through -04. Please find my comments and questions below:

*         Source Detects Failure
I cannot agree with several assertions and statements made in this section:

o   the source node (S in Fig. 1) detects failure not just of R1 but of the S-R1 path. The text "the backup ingress MUST use a method to reliably detect the failure of the primary ingress" concentrates only on the failure of R1, thus excluding failure of the S-R1 path;

o   A similar omission appears in the last paragraph, which concentrates only on the scenario where R1 fails and excludes the scenario where the S-R1 path fails, even though from the perspective of S both are equally defects;

o   And it is not obvious why the discussion of the monitoring role of the backup ingress is located in this section.

*         Backup and Source Detect Failure

o   as noted earlier, the backup ingress has no means to reliably detect failure of the S-R1 path. Thus the state of the LSP as seen by S and by Ra may not be the same without an explicit protocol similar to Protection State Coordination [RFC6378].

*         Failure Detection and Refresh PATH Messages

o   as noted earlier, the backup ingress cannot "accurately detect that the ingress node has failed" for all the cases where the Source detects a failure.


*         several new places now use normative SHOULD, e.g., the section "Revert to Primary Ingress":
   If "Revert to Primary Ingress" is desired for a protected LSP, the
   (primary) ingress of the LSP SHOULD re-signal the LSP that starts
   from the primary ingress after the primary ingress restores.
Is there an alternative to re-signaling the LSP?
   After the LSP is re-signaled successfully, the traffic SHOULD be switched
   back to the primary ingress from the backup ingress on the source
   node and redirected into the LSP starting from the primary ingress.
Similarly, is there an alternative to switching traffic back to the primary after the LSP has been re-signaled successfully?

*         Security Considerations
I believe that the Source-Detect approach introduces several security concerns that were not in the scope of RFC 4090. For example, the Source is required to monitor the S-R1 path, thus increasing, if not opening, the possibility of a DDoS attack.


                Regards,
                                Greg


From: Huaimo Chen [mailto:huaimo.chen@huawei.com]
Sent: Tuesday, April 21, 2015 8:27 AM
To: Gregory Mirsky; draft-ietf-teas-rsvp-ingress-protection@tools.ietf.org; teas-chairs@ietf.org; teas@ietf.org
Cc: mpls@ietf.org; rtg-bfd@ietf.org
Subject: RE: [mpls] Comments on draft-ietf-teas-rsvp-ingress-protection

Hi Greg,

Thanks for your comments.
My answers/explanations are inline below with [Huaimo 2].

Best Regards,
Huaimo

From: Gregory Mirsky [mailto:gregory.mirsky@ericsson.com]
Sent: Thursday, April 16, 2015 7:58 PM
To: Huaimo Chen; draft-ietf-teas-rsvp-ingress-protection@tools.ietf.org; teas-chairs@ietf.org; teas@ietf.org
Cc: mpls@ietf.org; rtg-bfd@ietf.org
Subject: RE: [mpls] Comments on draft-ietf-teas-rsvp-ingress-protection

Hi Huaimo,
thank you for your kind consideration of my comments. Please find more notes inline, tagged GIM>>.

                Regards,
                                Greg

From: Huaimo Chen [mailto:huaimo.chen@huawei.com]
Sent: Sunday, April 12, 2015 11:04 AM
To: Gregory Mirsky; draft-ietf-teas-rsvp-ingress-protection@tools.ietf.org; teas-chairs@ietf.org; teas@ietf.org
Cc: mpls@ietf.org; rtg-bfd@ietf.org
Subject: RE: [mpls] Comments on draft-ietf-teas-rsvp-ingress-protection

Hi Greg,

Thanks for your comments.
My answers/explanations are inline below.

Best Regards,
Huaimo

From: mpls [mailto:mpls-bounces@ietf.org] On Behalf Of Gregory Mirsky
Sent: Sunday, April 12, 2015 2:04 AM
To: draft-ietf-teas-rsvp-ingress-protection@tools.ietf.org; teas-chairs@ietf.org; teas@ietf.org
Cc: mpls@ietf.org; rtg-bfd@ietf.org
Subject: [mpls] Comments on draft-ietf-teas-rsvp-ingress-protection

Dear Editors, chairs, WG community,
please find my comments on the current version of your work below:

*         Introduction

o   The first paragraph may leave the impression that local protection of transit LSRs is not already addressed by either RFC 4090 or RFC 4875;
[Huaimo] Will revise it accordingly.

o   I think that "global protection" is not a commonly used term; "end-to-end protection" seems to be commonly used instead.
[Huaimo] It seems that "global protection" is better here, since we mention "local protection" in the same place. It seems that Global Protection is also used often.

*         Section 3.1

o   Third paragraph contains the following requirement:
"For a P2P LSP, after the primary ingress fails, the backup ingress must use a method to reliably detect the failure of the primary ingress before the PATH message for the LSP expires at the next hop of the primary ingress."
But it is not obvious that such a requirement is really needed. Since this is an RSVP-TE LSP, why not use an MP2MP construct and let the Source node control the switchover? Especially since, as noted in the last paragraph of Section 2.1, the primary and backup ingress nodes must be connected by a logical link, which in the general case will be a tunnel. Thus this solution implicitly requires instantiating a tunnel per protection group, a tunnel that would not be used to carry traffic.
[Huaimo] The requirement above seems necessary. If the backup ingress does not detect the failure of the primary ingress before the timer for the PATH message for the LSP at the next hop of the primary ingress expires, the LSP will go down after the primary ingress fails. If, after the primary ingress fails, the backup ingress detects the failure and sends/refreshes the PATH message to the next hop before the timer expires, the LSP will stay up and carry the traffic from the backup ingress via the backup LSP.
For a P2P LSP, it seems that an MP2MP construct is not used in RFC 4090 to protect a transit node of a P2P LSP. The logical link between the primary ingress and the backup ingress can be a direct link or a tunnel. It seems that a direct link is common.
GIM>> I think it is strange to cite a requirement on the scale of seconds, if not tens of seconds, when discussing a method of local protection that is supposed to perform protection switchover in sub-second, if not sub-50 msec, time.
[Huaimo 2] The requirement is for the control plane. More specifically, it is about keeping the PATH state for the LSP from being cleaned up at the next hop of the primary ingress when the primary ingress fails. After the primary ingress fails, the next hop will not receive any PATH message from the primary ingress. To prevent the PATH state from being cleaned up at the next hop, the backup ingress seems required to detect the failure of the primary ingress and send/refresh the PATH message to the next hop before the cleanup happens. Thus it seems reasonable for the requirement to allow the time for detecting the failure of the primary ingress to be seconds or even tens of seconds rather than sub-second or within 50 ms.
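
For a sense of the control-plane time scale under discussion, here is a minimal sketch, assuming the RSVP soft-state defaults from RFC 2205 (refresh period R = 30 s, K = 3). The thread does not say which values a deployment would use, and the function names are hypothetical.

```python
# Illustrative only: RSVP soft-state lifetime per RFC 2205, section 3.7.
# R = 30 s and K = 3 are the RFC defaults, assumed here; the thread does
# not state the values actually configured on the next hop.

def rsvp_state_lifetime(refresh_period_s: float = 30.0, k: int = 3) -> float:
    """Minimum state lifetime L = (K + 0.5) * 1.5 * R, in seconds."""
    return (k + 0.5) * 1.5 * refresh_period_s


def refresh_in_time(detect_time_s: float, resend_time_s: float,
                    remaining_lifetime_s: float) -> bool:
    """True if the backup ingress can detect the failure and refresh the
    PATH state at the next hop before that state expires."""
    return detect_time_s + resend_time_s < remaining_lifetime_s


lifetime = rsvp_state_lifetime()
print(lifetime)                               # 157.5
print(refresh_in_time(10.0, 1.0, lifetime))   # True
print(refresh_in_time(200.0, 1.0, lifetime))  # False
```

The lifetime is an upper bound on the time remaining: the last refresh may have arrived some time before the failure, so the real budget can be shorter.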


o   In addition, what is the importance of the requirement quoted above:
"... before the PATH message for the LSP expires at the next hop of the primary ingress"
[Huaimo] This seems very important. If the timer for the PATH message for the LSP at the next hop of the primary ingress expires, then the LSP will go down. So the PATH message must be refreshed before the timer for the PATH message for the LSP expires at the next hop of the primary ingress.
GIM>> As noted above, these seem to be requirements of different scales.
[Huaimo 2] See the explanation above.


o   The fourth paragraph makes a very questionable assumption in:
"After the primary ingress fails, it will not be reachable after routing convergence."
I believe that if an OAM session runs between two nodes, there is no reliable way to differentiate between node and link failure. Thus, to declare a node unreachable there must be N tunnels for N OAM sessions that monitor all possible paths between the two nodes. (Note that if there were no requirement to use a tunnel between the primary and backup ingress, multi-hop BFD could be used, though its detection time would be limited by IGP convergence, which may be too slow compared with your requirement of tens of milliseconds.)
[Huaimo] It is true that "After the primary ingress fails, it will not be reachable after routing convergence."  From routing's point of view, there is no need for us to have any OAM session between two nodes. The timer for a PATH message seems to be in the tens of seconds. Routing convergence is not limited to tens of milliseconds.
GIM>> Routing convergence may take seconds. Is that acceptable as the failure detection time for local protection? Protection switchover is expected to be fast, perhaps on a sub-50 msec scale. From the TDM world we carry 10 msec failure detection, and BFD implementations can support that. But here, it appears, you describe a failure detection mechanism with a detection time on the scale of seconds, if not tens of seconds.
[Huaimo 2] The routing convergence is for the control plane. Refer to the explanation above.
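
To put the data-plane figures mentioned above next to each other, here is a minimal sketch of how BFD asynchronous-mode detection time is derived per RFC 5880, section 6.8.4. The interval and multiplier values are hypothetical examples, not taken from the draft or from this thread.

```python
# Illustrative only: BFD asynchronous-mode detection time (RFC 5880, 6.8.4).
# Detection time = remote Detect Mult * agreed remote transmit interval,
# where the agreed interval is the greater of the local Required Min RX
# Interval and the remote Desired Min TX Interval. Units: microseconds.
# All numeric values below are assumed for illustration.

def bfd_detection_time_us(remote_detect_mult: int,
                          local_required_min_rx_us: int,
                          remote_desired_min_tx_us: int) -> int:
    agreed_tx_us = max(local_required_min_rx_us, remote_desired_min_tx_us)
    return remote_detect_mult * agreed_tx_us


fast = bfd_detection_time_us(3, 3_300, 3_300)             # roughly 10 ms
relaxed = bfd_detection_time_us(3, 1_000_000, 1_000_000)  # 3 s
print(fast, relaxed)  # 9900 3000000
```

With aggressive timers the detection time lands near the 10 msec figure cited for fast protection, while relaxed timers put it on the scale of seconds, which is the gap the two positions above are arguing over.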


*         Section 5.1

o   Regarding "Ingress local protection in use" flag
As demonstrated earlier, the backup ingress node has no reliable way to detect that the primary ingress node is not reachable from the Source and thus that protection must be activated.
[Huaimo] It seems that there is no need for the backup ingress to detect whether the primary ingress is reachable to the Source and the focus is on the failure of the primary ingress.
GIM>> In that case, the text is not needed either.
[Huaimo 2] Can you give more details regarding "the text is not needed either"? Which part of the text in section 5.1 do you think is not needed?

Considering that the backup ingress may initiate the actions described in the document even when the primary ingress has not become unavailable to the Source, I believe that the cases that may produce false positives must be removed, along with the extensions intended to support these cases. In my opinion, the only viable case of ingress protection is Source-centric, where the Source monitors the availability of both the primary and backup ingress nodes and controls traffic switchover. I'd ask the WG to discuss these comments and, if agreed, ask the Editors to make appropriate changes to the document.
[Huaimo] It seems that the current version already indicates that source-detect (i.e., the Source detects the failure of the primary ingress and switches traffic to the backup ingress when the primary ingress fails) is used.  A few modes for detecting the failure of the primary ingress were proposed in previous versions of the document. A different mode may have a different control on the traffic switchover and/or forwarding.  After discussions, the current version selects source-detect.
GIM>> If this is a historical part, then it may be moved to an Appendix or removed from the document altogether.
[Huaimo 2] A couple of detection modes were removed from the document. One more will be smoothed out. Thus there will be only one mode in the document.

Can you give more details about the cases in which false positives may be produced?
GIM>> If the current proposal is limited to the Source-detect case only, then the possibility of false positives/negatives depends on the Source-to-Ingress connection and the OAM mechanism used. But that is a deployment issue and is outside the scope of this document.

                Regards,
                                Greg
