Re: [mpls] Comments on draft-ietf-teas-rsvp-ingress-protection

Gregory Mirsky <gregory.mirsky@ericsson.com> Thu, 16 April 2015 23:58 UTC

From: Gregory Mirsky <gregory.mirsky@ericsson.com>
To: Huaimo Chen <huaimo.chen@huawei.com>, "draft-ietf-teas-rsvp-ingress-protection@tools.ietf.org" <draft-ietf-teas-rsvp-ingress-protection@tools.ietf.org>, "teas-chairs@ietf.org" <teas-chairs@ietf.org>, "teas@ietf.org" <teas@ietf.org>
Thread-Topic: [mpls] Comments on draft-ietf-teas-rsvp-ingress-protection
Thread-Index: AQHQdUsbhXhJtQ5lC02wWl3AD+NymZ1QUc6g
Date: Thu, 16 Apr 2015 23:57:58 +0000
Message-ID: <7347100B5761DC41A166AC17F22DF1121B94CD49@eusaamb103.ericsson.se>
References: <7347100B5761DC41A166AC17F22DF1121B948347@eusaamb103.ericsson.se> <5316A0AB3C851246A7CA5758973207D44E37EE98@SJCEML701-CHM.china.huawei.com>
In-Reply-To: <5316A0AB3C851246A7CA5758973207D44E37EE98@SJCEML701-CHM.china.huawei.com>
Accept-Language: en-US
Content-Language: en-US
Content-Type: multipart/alternative; boundary="_000_7347100B5761DC41A166AC17F22DF1121B94CD49eusaamb103erics_"
MIME-Version: 1.0
Archived-At: <http://mailarchive.ietf.org/arch/msg/mpls/JR6IVPz7ErKua4xdhF8mBoO1FXc>
Cc: "mpls@ietf.org" <mpls@ietf.org>, "rtg-bfd@ietf.org" <rtg-bfd@ietf.org>
Subject: Re: [mpls] Comments on draft-ietf-teas-rsvp-ingress-protection
Precedence: list

Hi Huaimo,
thank you for your kind consideration of my comments. Please find further notes in-line, tagged GIM>>.

                Regards,
                                Greg

From: Huaimo Chen [mailto:huaimo.chen@huawei.com]
Sent: Sunday, April 12, 2015 11:04 AM
To: Gregory Mirsky; draft-ietf-teas-rsvp-ingress-protection@tools.ietf.org; teas-chairs@ietf.org; teas@ietf.org
Cc: mpls@ietf.org; rtg-bfd@ietf.org
Subject: RE: [mpls] Comments on draft-ietf-teas-rsvp-ingress-protection

Hi Greg,

Thanks for your comments.
My answers/explanations are inline below.

Best Regards,
Huaimo
From: mpls [mailto:mpls-bounces@ietf.org] On Behalf Of Gregory Mirsky
Sent: Sunday, April 12, 2015 2:04 AM
To: draft-ietf-teas-rsvp-ingress-protection@tools.ietf.org; teas-chairs@ietf.org; teas@ietf.org
Cc: mpls@ietf.org; rtg-bfd@ietf.org
Subject: [mpls] Comments on draft-ietf-teas-rsvp-ingress-protection

Dear Editors, chairs, WG community,
please find my comments to the current version of your work below:

*         Introduction

o   The first paragraph may leave the impression that local protection of transit LSRs is not already addressed by either RFC 4090 or RFC 4875;
[Huaimo] Will revise it accordingly.

o   I think that "global protection" is not a commonly used term; "end-to-end protection" seems to be commonly used instead.
[Huaimo] It seems that "global protection" is better here since we mentioned "local protection" here. It seems that Global Protection is used often.

*         Section 3.1

o   Third paragraph contains the following requirement:
"For a P2P LSP, after the primary ingress fails, the backup ingress must use a method to reliably detect the failure of the primary ingress before the PATH message for the LSP expires at the next hop of the primary ingress."
But it is not obvious that such a requirement is really needed. Since this is an RSVP-TE LSP, why not use an MP2MP construct and let the Source node control the switchover? Especially since, as noted in the last paragraph of Section 2.1, the primary and backup ingress nodes must be connected by a logical link, which in the general case will be a tunnel. Thus this solution implicitly requires instantiating a tunnel per protection group, a tunnel that would not be used to carry traffic.
[Huaimo] The requirement above seems necessary. If the backup ingress does not detect the failure of the primary ingress before the timer for the PATH message for the LSP expires at the next hop of the primary ingress, the LSP will go down after the primary ingress fails. If the backup ingress detects the failure and sends/refreshes the PATH message to the next hop before the timer expires after the primary ingress fails, the LSP will stay up and carry the traffic from the backup ingress via the backup LSP.
It seems that RFC 4090 does not use an MP2MP construct to protect a transit node of a P2P LSP. The logical link between the primary ingress and the backup ingress can be a direct link or a tunnel. It seems that a direct link is common.
GIM>> I think it is strange to cite a requirement on the scale of seconds, if not tens of seconds, in a discussion of a local protection method that is supposed to perform protection switchover in sub-second, if not sub-50 msec, time.

o   In addition, what is the importance of the requirement quoted above:
"... before the PATH message for the LSP expires at the next hop of the primary ingress"
[Huaimo] This seems very important. If the timer for the PATH message for the LSP expires at the next hop of the primary ingress, then the LSP will go down. So the PATH message must be refreshed before that timer expires.
GIM>> As noted above, these seem to be requirements on different time scales.
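
As a rough illustration of the difference in scale discussed above, the sketch below computes the RSVP state lifetime from the RFC 2205 rule L >= (K + 0.5) * 1.5 * R and compares it with a 50 msec switchover target. The defaults K = 3 and R = 30 s are the RFC 2205 suggested values and are an assumption here, as is the 50 msec target constant; neither the thread nor the quoted draft text gives concrete timer values, and deployments may configure a shorter refresh period.

```python
# Minimal sketch: RSVP PATH state lifetime vs. a fast-protection target.
# Assumptions (not from the thread): RFC 2205 defaults K = 3, R = 30 s,
# and a 50 ms switchover target used only for comparison.

PROTECTION_TARGET_S = 0.050  # assumed sub-50 msec switchover target


def rsvp_state_lifetime_s(refresh_period_s: float = 30.0, k: int = 3) -> float:
    """Minimum state lifetime per RFC 2205: L >= (K + 0.5) * 1.5 * R."""
    return (k + 0.5) * 1.5 * refresh_period_s


if __name__ == "__main__":
    lifetime = rsvp_state_lifetime_s()
    ratio = lifetime / PROTECTION_TARGET_S
    print(f"PATH state lifetime with defaults: {lifetime:.1f} s")
    print(f"Roughly {ratio:.0f} times the {PROTECTION_TARGET_S * 1000:.0f} ms target")
```

With the assumed defaults the lifetime comes to 157.5 s, so the deadline "before the PATH message expires" is several orders of magnitude looser than a sub-50 msec detection requirement.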

o   Fourth paragraph makes a very questionable assumption:
"After the primary ingress fails, it will not be reachable after routing convergence."
I believe that if an OAM session runs between two nodes, there is no reliable way to differentiate between a node failure and a link failure. Thus, to declare a node unreachable there must be N tunnels for N OAM sessions that monitor all possible paths between the two nodes. (Note that if there were no requirement to use a tunnel between the primary and backup ingress, multi-hop BFD could be used, though its detection time would be limited by IGP convergence, which may be too slow compared with your requirement of tens of milliseconds.)
[Huaimo] It is true that "After the primary ingress fails, it will not be reachable after routing convergence."  From the routing point of view, there is no need for us to have any OAM session between two nodes. The timer for a PATH message seems to be in the tens of seconds. Routing convergence is not limited to tens of milliseconds.
GIM>> Routing convergence may take seconds. Is that acceptable as the failure detection time for local protection? Protection switchover is expected to be fast, perhaps on a sub-50 msec scale. From the TDM world we carry over 10 msec failure detection, and BFD implementations can support that. But here, it appears, you describe a failure detection mechanism with a detection time on the scale of seconds, if not tens of seconds.
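
For reference, the 10 msec figure can be related to BFD timers with a small sketch. In asynchronous mode (RFC 5880), the local detection time is the remote Detect Mult multiplied by the agreed interval, which is the greater of the local Required Min RX Interval and the remote Desired Min TX Interval. The function name and the 3333 microsecond interval with a multiplier of 3 are illustrative assumptions, not values taken from the thread or the draft.

```python
# Minimal sketch: BFD asynchronous-mode detection time (RFC 5880).
# The function name and the example timer values are illustrative assumptions.


def bfd_detection_time_us(remote_detect_mult: int,
                          local_required_min_rx_us: int,
                          remote_desired_min_tx_us: int) -> int:
    """Detection time = remote Detect Mult * agreed remote transmit interval."""
    agreed_interval_us = max(local_required_min_rx_us, remote_desired_min_tx_us)
    return remote_detect_mult * agreed_interval_us


if __name__ == "__main__":
    # Assumed aggressive timers: about 3.33 ms interval, multiplier of 3.
    fast = bfd_detection_time_us(3, 3333, 3333)
    # Assumed relaxed timers: 1 s interval, multiplier of 3.
    slow = bfd_detection_time_us(3, 1_000_000, 1_000_000)
    print(f"fast session detection time: {fast / 1000:.3f} ms")
    print(f"slow session detection time: {slow / 1000:.3f} ms")
```

With the assumed aggressive timers the detection time is 9999 microseconds, i.e. about 10 msec, whereas detection that waits on routing convergence or PATH state expiry is on the scale of seconds or more.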

*         Section 5.1

o   Regarding "Ingress local protection in use" flag
As demonstrated earlier, the backup ingress node has no reliable way to detect that the primary ingress node is not reachable from the Source and thus that protection must be activated.
[Huaimo] It seems that there is no need for the backup ingress to detect whether the primary ingress is reachable to the Source and the focus is on the failure of the primary ingress.
GIM>> In that case, the text is not needed either.

Considering that the backup ingress may initiate the actions described in the document at times other than when the primary ingress has become unavailable to the Source, I believe that the cases that may produce false positives must be removed, along with the extensions intended to support them. In my opinion, the only viable case of ingress protection is the Source-centric one, where the Source monitors the availability of both the primary and backup ingress nodes and controls traffic switchover. I'd ask the WG to discuss these comments and, if agreed, ask the Editors to make appropriate changes to the document.
[Huaimo] It seems that the current version already indicates that source-detect (i.e., the Source detects the failure of the primary ingress and switches traffic to the backup ingress when the primary ingress fails) is used.  A few modes for detecting the failure of the primary ingress were proposed in previous versions of the document. Different modes may have different control over the traffic switchover and/or forwarding.  After discussions, the current version selects source-detect.
GIM>> If this is a historical part, then it may be moved to an Appendix or removed from the document altogether.

Can you give more details about the cases in which false positives may be produced?
GIM>> If the current proposal is limited to the Source-detect case only, then the possibility of a false positive/negative depends on the Source-to-ingress connection and the OAM mechanism used. But that is a deployment issue and is outside the scope of this document.

                Regards,
                                Greg
