Re: [Teas] [mpls] Comments on draft-ietf-teas-rsvp-ingress-protection

Huaimo Chen <huaimo.chen@huawei.com> Tue, 21 April 2015 15:27 UTC

From: Huaimo Chen <huaimo.chen@huawei.com>
To: Gregory Mirsky <gregory.mirsky@ericsson.com>, "draft-ietf-teas-rsvp-ingress-protection@tools.ietf.org" <draft-ietf-teas-rsvp-ingress-protection@tools.ietf.org>, "teas-chairs@ietf.org" <teas-chairs@ietf.org>, "teas@ietf.org" <teas@ietf.org>
Date: Tue, 21 Apr 2015 15:27:22 +0000
Archived-At: <http://mailarchive.ietf.org/arch/msg/teas/3Erd4wBwq8WcdV57jKB4E0BH_ZM>
Cc: "mpls@ietf.org" <mpls@ietf.org>, "rtg-bfd@ietf.org" <rtg-bfd@ietf.org>
Subject: Re: [Teas] [mpls] Comments on draft-ietf-teas-rsvp-ingress-protection

Hi Greg,

Thanks for your comments.
My answers/explanations are inline below with [Huaimo 2].

Best Regards,
Huaimo

From: Gregory Mirsky [mailto:gregory.mirsky@ericsson.com]
Sent: Thursday, April 16, 2015 7:58 PM
To: Huaimo Chen; draft-ietf-teas-rsvp-ingress-protection@tools.ietf.org; teas-chairs@ietf.org; teas@ietf.org
Cc: mpls@ietf.org; rtg-bfd@ietf.org
Subject: RE: [mpls] Comments on draft-ietf-teas-rsvp-ingress-protection

Hi Huaimo,
thank you for your kind consideration of my comments. Please find further notes inline, tagged GIM>>.

                Regards,
                                Greg

From: Huaimo Chen [mailto:huaimo.chen@huawei.com]
Sent: Sunday, April 12, 2015 11:04 AM
To: Gregory Mirsky; draft-ietf-teas-rsvp-ingress-protection@tools.ietf.org; teas-chairs@ietf.org; teas@ietf.org
Cc: mpls@ietf.org; rtg-bfd@ietf.org
Subject: RE: [mpls] Comments on draft-ietf-teas-rsvp-ingress-protection

Hi Greg,

Thanks for your comments.
My answers/explanations are inline below.

Best Regards,
Huaimo

From: mpls [mailto:mpls-bounces@ietf.org] On Behalf Of Gregory Mirsky
Sent: Sunday, April 12, 2015 2:04 AM
To: draft-ietf-teas-rsvp-ingress-protection@tools.ietf.org; teas-chairs@ietf.org; teas@ietf.org
Cc: mpls@ietf.org; rtg-bfd@ietf.org
Subject: [mpls] Comments on draft-ietf-teas-rsvp-ingress-protection

Dear Editors, chairs, WG community,
please find my comments on the current version of your work below:

*        Introduction

o   The first paragraph may leave the impression that local protection of transit LSRs is not already addressed by either RFC 4090 or RFC 4875;
[Huaimo] Will revise it accordingly.

o   I think that "global protection" is not a commonly used term; "end-to-end protection" seems to be used more commonly instead.
[Huaimo] It seems that "global protection" fits better here, since it is contrasted with "local protection" in the same paragraph. "Global protection" also seems to be used fairly often.

*        Section 3.1

o   The third paragraph contains the following requirement:
"For a P2P LSP, after the primary ingress fails, the backup ingress must use a method to reliably detect the failure of the primary ingress before the PATH message for the LSP expires at the next hop of the primary ingress."
But it is not obvious that such a requirement is really needed. Since this is an RSVP-TE LSP, why not use an MP2MP construct and let the Source node control the switchover? This matters especially because, as noted in the last paragraph of Section 2.1, the primary and backup ingress nodes must be connected by a logical link, which in the general case will be a tunnel. Thus this solution implicitly requires instantiating a tunnel per protection group, a tunnel that would not be used to carry traffic.
[Huaimo] The requirement above seems necessary. If the backup ingress does not detect the failure of the primary ingress before the timer for the PATH message for the LSP expires at the next hop of the primary ingress, the LSP will go down after the primary ingress fails. If the backup ingress detects the failure and sends/refreshes the PATH message to the next hop before the timer expires after the primary ingress fails, the LSP will stay up and carry the traffic from the backup ingress via the backup LSP.
For a P2P LSP, it seems that RFC 4090 does not use an MP2MP construct to protect a transit node. The logical link between the primary ingress and the backup ingress can be a direct link or a tunnel. It seems that a direct link is common.
GIM>> I think it is strange to cite a requirement on the scale of seconds, if not tens of seconds, in a discussion of a local protection method that is supposed to perform protection switchover in sub-second, if not sub-50 msec, time.
[Huaimo 2] The requirement is for the control plane. More specifically, it is about keeping the PATH message (state) for the LSP from being cleaned up at the next hop of the primary ingress when the primary ingress fails. After the primary ingress fails, the next hop will not receive any PATH message from the primary ingress. To prevent the PATH message from being cleaned up at the next hop, the backup ingress seems required to detect the failure of the primary ingress and send/refresh the PATH message to the next hop before the cleanup happens. Thus it seems reasonable for the requirement to allow the time for detecting the failure of the primary ingress to be in seconds or even tens of seconds, rather than sub-second or within 50 ms.
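
To put rough numbers on the control-plane deadline described above, here is a minimal sketch. It assumes the RFC 2205 state-lifetime rule L >= (K + 0.5) * 1.5 * R with the suggested K = 3 and the default refresh period R = 30 s. The function names and the sample detection and signaling times are hypothetical and are not taken from the draft.

```python
def path_state_lifetime_s(refresh_period_s: float = 30.0, k: int = 3) -> float:
    """Minimum RSVP state lifetime per RFC 2205: (K + 0.5) * 1.5 * R."""
    return (k + 0.5) * 1.5 * refresh_period_s


def backup_refresh_in_time(
    detect_s: float,
    signal_s: float,
    since_last_refresh_s: float = 0.0,
    refresh_period_s: float = 30.0,
    k: int = 3,
) -> bool:
    """True if the backup ingress can refresh the PATH state at the next hop
    before it times out.

    detect_s:             assumed time for the backup ingress to detect the failure
    signal_s:             assumed time to send the PATH refresh to the next hop
    since_last_refresh_s: assumed age of the last refresh when the failure occurs
    """
    remaining = path_state_lifetime_s(refresh_period_s, k) - since_last_refresh_s
    return detect_s + signal_s < remaining


if __name__ == "__main__":
    print(path_state_lifetime_s())  # 157.5 seconds with the defaults
    # Detection within tens of seconds still fits the control-plane budget.
    print(backup_refresh_in_time(detect_s=40.0, signal_s=1.0))
    # It no longer fits if the last refresh was already 120 s old at failure time.
    print(backup_refresh_in_time(detect_s=40.0, signal_s=1.0, since_last_refresh_s=120.0))
```

Under these assumptions the deadline is on the order of minutes, which is consistent with a detection time of seconds being sufficient for the control plane, independent of how fast the data-plane switchover is.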


o   In addition, what is the importance of the requirement quoted above:
"... before the PATH message for the LSP expires at the next hop of the primary ingress"
[Huaimo] This seems very important. If the timer for the PATH message for the LSP expires at the next hop of the primary ingress, then the LSP will go down. So the PATH message must be refreshed before that timer expires at the next hop of the primary ingress.
GIM>> As noted above, these seem to be requirements on different time scales.
[Huaimo 2] See the explanation above.


o   The fourth paragraph makes a very questionable assumption in:
"After the primary ingress fails, it will not be reachable after routing convergence."
I believe that if an OAM session runs between two nodes, there is no reliable way to differentiate between node and link failure. Thus, to declare a node unreachable there must be N tunnels for N OAM sessions that monitor all possible paths between the two nodes. (Note that if there were no requirement to use a tunnel between the primary and backup ingress, multi-hop BFD could be used, though its detection time would be limited by IGP convergence, which may be too slow compared with your requirement of tens of milliseconds.)
[Huaimo] It is true that "After the primary ingress fails, it will not be reachable after routing convergence."  From routing's point of view, there is no need for us to have any OAM session between the two nodes. The timer for a PATH message seems to be in the tens of seconds. Routing convergence is not limited to tens of milliseconds.
GIM>> Routing convergence may take seconds. Is that acceptable as the failure detection time for local protection? Protection switchover is expected to be fast, perhaps on a sub-50 msec scale. From the TDM world we carry over 10 msec failure detection, and BFD implementations can support that. But here, it appears, you describe a failure detection mechanism with a detection time on the scale of seconds, if not tens of seconds.
[Huaimo 2] The routing convergence is for the control plane. Refer to the explanation above.
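
For comparison with the data-plane figures mentioned above, the BFD detection time in asynchronous mode can be sketched as below. This follows the RFC 5880 rule (remote Detect Mult times the larger of the local Required Min RX Interval and the remote Desired Min TX Interval). The function name and the sample interval values are illustrative assumptions, not values from the draft.

```python
def bfd_detection_time_us(
    remote_detect_mult: int,
    local_required_min_rx_us: int,
    remote_desired_min_tx_us: int,
) -> int:
    """BFD asynchronous-mode detection time in microseconds (RFC 5880).

    The negotiated receive interval is the larger of what the local system
    is willing to receive and what the remote system wants to transmit.
    """
    negotiated_us = max(local_required_min_rx_us, remote_desired_min_tx_us)
    return remote_detect_mult * negotiated_us


if __name__ == "__main__":
    # Assumed 3.3 ms intervals with a multiplier of 3: roughly 10 ms detection.
    print(bfd_detection_time_us(3, 3_300, 3_300))
    # Assumed 50 ms transmit interval: 150 ms detection.
    print(bfd_detection_time_us(3, 10_000, 50_000))
```

These values are several orders of magnitude below the PATH state timeout, which is the scale difference the two sides of this exchange are pointing at.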


*        Section 5.1

o   Regarding "Ingress local protection in use" flag
As demonstrated earlier, the backup ingress node has no reliable way to detect that the primary ingress node is not reachable from the Source, and thus that protection must be activated.
[Huaimo] It seems that there is no need for the backup ingress to detect whether the primary ingress is reachable from the Source; the focus is on the failure of the primary ingress.
GIM>> In that case, the text is not needed either.
[Huaimo 2] Can you give more details regarding "the text is not needed either"? Which part of the text (do you think) is not needed in Section 5.1?

Considering that the backup ingress may initiate the actions described in the document even when the primary ingress has not become unavailable to the Source, I believe that the cases that may produce false positives must be removed, along with the extensions intended to support these cases. In my opinion, the only viable case of ingress protection is Source-centric, where the Source monitors the availability of both the primary and backup ingress nodes and controls traffic switchover. I'd ask the WG to discuss these comments and, if agreed, ask the Editors to make appropriate changes to the document.
[Huaimo] It seems that the current version already indicates that source-detect (i.e., the Source detects the failure of the primary ingress and switches traffic to the backup ingress when the primary ingress fails) is used.  A few modes for detecting the failure of the primary ingress were proposed in previous versions of the document. Different modes may control traffic switchover and/or forwarding differently.  After discussions, the current version selects source-detect.
GIM>> If this is a historical part, then it may be moved to an Appendix or removed from the document altogether.
[Huaimo 2] A couple of detection modes were removed from the document. One more will be smoothed out. Thus there will be only one mode in the document.
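
The source-detect behavior discussed in this exchange can be sketched roughly as follows. This is an illustration only: the function name, the return values, and the idea of feeding it one liveness flag per monitored path are assumptions made for the example, and how the Source actually monitors the ingress nodes is not specified here.

```python
from typing import Optional, Sequence


def select_ingress(primary_paths_up: Sequence[bool], backup_up: bool) -> Optional[str]:
    """Pick the ingress the Source should send traffic to.

    primary_paths_up: one flag per monitored path (e.g. per OAM session)
                      from the Source to the primary ingress.
    backup_up:        whether the backup ingress is currently reachable.

    The primary ingress is treated as failed only when every monitored
    path to it is down, so a single link failure does not trigger a switch.
    """
    if any(primary_paths_up):
        return "primary"
    if backup_up:
        return "backup"
    return None  # neither ingress is usable


if __name__ == "__main__":
    print(select_ingress([False, True], backup_up=True))   # one path still up
    print(select_ingress([False, False], backup_up=True))  # all paths down
```

With a single monitored path the sketch cannot tell a link failure from a node failure, which is the false-positive concern raised in the comments.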

Can you give more details about the cases in which false positives may be produced?
GIM>> If the current proposal is limited to the Source-detect case only, then the possibility of false positives/negatives depends on the Source-to-Ingress connection and the OAM mechanism used. But that is a deployment issue and is outside the scope of this document.

                Regards,
                                Greg