Re: [mpls] MPLS-RT review of draft-chandra-mpls-ri-rsvp-frr

Chandrasekar Ramachandran <csekar@juniper.net> Wed, 02 December 2015 19:28 UTC

From: Chandrasekar Ramachandran <csekar@juniper.net>
To: "Aissaoui, Mustapha (Mustapha)" <mustapha.aissaoui@alcatel-lucent.com>, Sriganesh Kini <sriganesh.kini@ericsson.com>, Loa Andersson <loa@pi.nu>, Lucy yong <lucy.yong@huawei.com>
Thread-Topic: MPLS-RT review of draft-chandra-mpls-ri-rsvp-frr
Date: Wed, 02 Dec 2015 19:27:47 +0000
Message-ID: <BN3PR0501MB1377A3471B2E76D0827F3700D90E0@BN3PR0501MB1377.namprd05.prod.outlook.com>
References: <5628D430.4070602@pi.nu> <4A79394211F1AF4EB57D998426C9340DD479A17D@US70UWXCHMBA01.zam.alcatel-lucent.com> <56514290.60200@pi.nu> <95453A37E413464E93B5ABC0F8164C4D14C9BB4D@eusaamb101.ericsson.se> <4A79394211F1AF4EB57D998426C9340DD47C1317@US70UWXCHMBA01.zam.alcatel-lucent.com>
In-Reply-To: <4A79394211F1AF4EB57D998426C9340DD47C1317@US70UWXCHMBA01.zam.alcatel-lucent.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/mpls/1tQBiBaj9qQcjHOgaKLDNemLeIw>
Cc: "mpls-chairs@ietf.org" <mpls-chairs@ietf.org>, "mpls@ietf.org" <mpls@ietf.org>, "draft-chandra-mpls-ri-rsvp-frr@ietf.org" <draft-chandra-mpls-ri-rsvp-frr@ietf.org>, Guijuan Wang 3 <guijuan.wang@ericsson.com>, Lizhong Jin <lizho.jin@gmail.com>
Subject: Re: [mpls] MPLS-RT review of draft-chandra-mpls-ri-rsvp-frr

Mustapha,

> -----Original Message-----
> From: Aissaoui, Mustapha (Mustapha) [mailto:mustapha.aissaoui@alcatel-
> lucent.com]
> Sent: Thursday, November 26, 2015 8:48 AM
> To: Sriganesh Kini <sriganesh.kini@ericsson.com>; Loa Andersson
> <loa@pi.nu>; Lucy yong <lucy.yong@huawei.com>
> Cc: Lizhong Jin <lizho.jin@gmail.com>; Guijuan Wang 3
> <guijuan.wang@ericsson.com>; draft-chandra-mpls-ri-rsvp-frr@ietf.org;
> mpls-chairs@ietf.org; Aissaoui, Mustapha (Mustapha)
> <mustapha.aissaoui@alcatel-lucent.com>; mpls@ietf.org
> Subject: RE: MPLS-RT review of draft-chandra-mpls-ri-rsvp-frr
> 
> Dear authors,
> I read this draft and I believe the intent is to provide a tighter control plane
> synchronization of the PLR and MP roles on a per RSVP session basis such
> that a LSR will know if it is a MP for a given RSVP session which requested
> protection. This is done such that the decision to retain or delete the state by
> the LSR detecting the failure of the link to the previous hop (or the failure of
> the previous hop itself) is made with the prior knowledge that the LSR is a MP
> or not on a per RSVP session basis.

[Chandra] The primary motivation of the draft is to enable LSRs to support FRR when the LSP scale on the LSR is of the order of hundreds of thousands. RFC 5439 already documents the bottlenecks that arise from LSP scale and that can disrupt the LSR. As analyzed in RFC 5439, while there is a correlation between the percentage increase in refresh time and the improvement in LSR performance, a degree of functionality is lost owing to the soft-state nature of the protocol. TE-SCALE-REC (https://tools.ietf.org/html/draft-ietf-teas-rsvp-te-scaling-rec-00) outlines the motivation for refresh-independent RSVP (RI-RSVP) for all types of LSPs (packet or non-packet), and makes recommendations that enable RSVP implementations to eliminate the reliance on short refresh times. This draft addresses the refresh-interval-dependent behavior of RFC 4090 in order to support RI-RSVP, as facility backup FRR is a widely deployed feature in production networks.

> However,  the procedures presented in this document are fairly complex and
> go at odds with the original intent of making state synchronization based on
> a soft-state mechanism.

[Chandra] The tradeoffs between using short and long refresh intervals are well understood. Short refresh intervals aid fast synchronization of state along the path of the LSP, but are problematic because of the control message traffic that a router has to handle at high LSP scale. Routers not only must synchronize new state as promptly as possible, but also must keep the rate of periodic Srefresh messages high enough to refresh all existing state before it times out. When the number of LSPs that routers carry approaches half a million, there will be two problems with the control message rate.
(1) As analyzed in RFC 5439, even with RFC 2961 refresh reduction the size of the Srefresh message may become very large, and the processing required may cause disruption of the LSR.
(2) Apart from the problem of RSVP message processing overhead, there is also the problem of RSVP-TE becoming a bottleneck that prevents the router from scaling other protocols or services.
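
To make the control message rate concrete, here is a rough back-of-the-envelope sketch in Python. It is illustrative only and not taken from the draft: the function name is hypothetical, and it assumes each LSP contributes two states to refresh on a transit LSR (one Path, one Resv) with refreshes spread evenly over the interval.

```python
def refreshes_per_second(num_lsps: int,
                         refresh_interval_s: float,
                         states_per_lsp: int = 2) -> float:
    """Average number of states a router must refresh per second.

    states_per_lsp=2 is an assumption (Path state sent downstream plus
    Resv state sent upstream on a transit LSR). With RFC 2961 refresh
    reduction this is the rate of Message IDs carried in Srefresh,
    not the number of packets.
    """
    if refresh_interval_s <= 0:
        raise ValueError("refresh interval must be positive")
    return num_lsps * states_per_lsp / refresh_interval_s


if __name__ == "__main__":
    lsps = 500_000  # "half a million" LSPs, as in the text above
    for interval_s in (30, 20 * 60):  # 30 seconds vs. 20 minutes
        rate = refreshes_per_second(lsps, interval_s)
        print(f"refresh interval {interval_s:>5} s: {rate:,.0f} state refreshes/s")
```

Under these assumptions the steady-state refresh load drops by the same factor as the refresh interval grows (40x between 30 seconds and 20 minutes), which is why long refresh intervals are attractive at this scale.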

> In fact, what I fail to find is a compelling argument
> or data from the field which shows the issue is not resolved via much simpler
> methods which are used in production networks today. I describe this in the
> detailed comments below.

[Chandra] Some of the mechanisms already deployed in multi-vendor production networks involve configuring implementation-specific timers or delays on LSRs. Multiple vendors support various timers or delays on Ingress and Transit LSRs. However, in practice the values configured for these timers or delays are very scale specific, and the values that work at one LSP scale usually do not work at a higher LSP scale. In practice, the "scale" that affects the behavior at a specific timer or delay value is not only the number of LSPs carried on the router, but also the amount of other protocol state that resides on the router. It should be noted that the reliance on such implementation-specific timers or delays has been a major contributor to operational complexity in running RSVP-TE FRR. Any solution based on such timers or delays, while operationally simple at one scale, ceases to be so at higher scale. It has been found in practice (with existing implementations from multiple vendors) that it is operationally hard to determine a new, better value as a running production network grows over time. In short, the use of such timers or delays has been found to involve guesswork that seldom remains simple in the long run.

> Regards,
> Mustapha.
> ---------------------
> 1. Section 3.1 - Problem Description
> "
> - If the protected LSP on C times out before D receives signaling
> for the backup LSP, then D would receive PathTear from C prior to
> receiving signaling for the backup LSP, thus resulting in deleting
> the LSP state. This would be possible at scale even with default
> refresh time.
> "
> MA> Since each LSR in the path of a RSVP session which requested protection
> has to assume it can be a MP without prior knowledge, a simpler method is
> to reset the refresh timeout for each session as soon as the link to the
> previous hop failed. In fact, a user configurable MP timeout upon failure,
> independent of the refresh timeout, can be provided to tune it to the desired
> value to give enough time to the Path message to be received via the bypass
> LSP.

[Chandra] The option of resetting the refresh timeout may not be viable if a long refresh interval (of the order of tens of minutes) is applied on the LSPs. That leaves the other option of providing a "wait timer" (independent of refresh time) that is configurable on the MP. However, it is fairly clear that a "wait timer" configured by the operator on the MP will be problematic for two reasons. The operator would have to carefully re-analyze the performance impact of an existing timer/delay value if and when (a) the LSP scale on the same router increases, and (b) the LSP scale increases on other routers around it (that may potentially become upstream PLRs)!
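
The scale dependence of such a "wait timer" can be sketched with simple arithmetic. The Python below is a hypothetical illustration, not a model of any implementation: the function name and the backup signaling rate are assumptions made up for the example, and it assumes the PLR signals backup LSPs at a constant rate.

```python
def min_wait_timer_s(protected_lsps_on_plr: int,
                     backup_paths_per_second: float) -> float:
    """Smallest MP wait timer that still covers the last backup Path.

    Assumes (hypothetically) that the PLR signals backup LSPs one after
    another at a constant rate, so the last protected LSP is signaled
    after protected_lsps_on_plr / backup_paths_per_second seconds.
    """
    if backup_paths_per_second <= 0:
        raise ValueError("signaling rate must be positive")
    return protected_lsps_on_plr / backup_paths_per_second


if __name__ == "__main__":
    rate = 2_000.0  # hypothetical backup Path messages per second
    for lsps in (10_000, 300_000):
        print(f"{lsps:>7} protected LSPs: wait timer >= "
              f"{min_wait_timer_s(lsps, rate):.0f} s")
```

Under these assumptions a value tuned for the smaller network is far too short once the upstream PLR carries the larger LSP count, and the MP has no local way of knowing that the PLR's scale has changed.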

> 2. Section 3.1 - Problem Description:
> "
> - If upon the link failure C is to keep state until its timeout,
> then with long refresh interval this may result in a large amount
> of stale state on C. Alternatively, if upon the link failure C is
> to delete the state and send PathTear to D, this would result in
> deleting the state on D, thus deleting the LSP. D needs a reliable
> mechanism to determine whether it is MP or not to overcome this
> problem.
> "
> MA> What is exactly the issue with state timeout being retained until the
> refresh timeout? You refer to this as "stale" state but in fact it is desirable to
> keep the state until node D created the backup PSB. Also, remember that
> head-end node will perform global revertive MBB and may tear down the LSP
> before the state timeout.

[Chandra] As described in the first response in this mail, the motivation driving the draft is to eliminate RSVP-TE's reliance on short refresh intervals. If router C were to retain the LSP state until timeout, without any additional procedures that give router C an explicit indication of when the LSP state is no longer required, then router C would retain the LSP state potentially for hours if the refresh interval is long. So, router C will not only store the state of the LSP but also periodically send the Path message (in Srefresh) downstream - thereby unnecessarily consuming resources that could potentially be utilized for other LSPs.
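
As a worked example of how long such state can linger, the sketch below applies the state lifetime bound from RFC 2205, L >= (K + 0.5) * 1.5 * R, with the suggested default K = 3. The function name is hypothetical, and real implementations may use a different K or additional timers.

```python
RFC2205_DEFAULT_K = 3  # suggested default number of missed refreshes tolerated


def state_lifetime_s(refresh_interval_s: float,
                     k: int = RFC2205_DEFAULT_K) -> float:
    """Minimum soft-state lifetime per RFC 2205: (K + 0.5) * 1.5 * R."""
    return (k + 0.5) * 1.5 * refresh_interval_s


if __name__ == "__main__":
    # 30 s is the RFC 2205 default refresh interval; 20 minutes is the
    # long refresh interval discussed in this thread.
    for interval_s in (30, 20 * 60):
        lifetime = state_lifetime_s(interval_s)
        print(f"R = {interval_s:>4} s -> state held for at least "
              f"{lifetime:.1f} s ({lifetime / 60:.1f} min)")
```

With R = 30 seconds the state times out after 157.5 seconds; with R = 20 minutes the same formula gives 6300 seconds, i.e. 105 minutes, during which router C keeps and refreshes state that nobody needs.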

It should also be noted that the transit router cannot assume that the Ingress LSR will be able to complete global repair within a particular time frame. The transit LSR procedures should be able to handle cases where global repair does not complete for an extended period of time for some valid reason.

> 3. Section 3.1 - Problem Statement:
> "
> - If head-end A attempts to tear down LSP after step 1 but before
> step 2 of the above sequence, then B may receive the tear down
> message before step 2 and delete the LSP state from its state
> database. If B deletes its state without informing D, with long
> refresh interval this could cause (large) buildup of stale state
> on D.
> "
> MA> I am not sure I understand the issue here. If B acting as a PLR receives a
> PathTear from A, all it needs to do is to check that the primary path neighbor
> (C in this case) is in down or in cleanup state and send the PathTear over the
> bypass before deleting its own local state.
> Note here a key assumption is that PLR node B must send triggered Path
> refresh messages to the MP over the bypass. If node B has to wait to the
> next refresh interval to send the Path refresh over the bypass, then you can
> run into the issue described here.

[Chandra] The problem described is different. Router B detects the failure and undertakes local actions for re-routing traffic for all protected LSPs traversing the failed link. In parallel, router B also initiates backup LSP signaling for all those protected LSPs. At scale (i.e. if the number of protected LSPs undergoing repair is of the order of hundreds of thousands), the initiation of backup LSP signaling for all those protected LSPs is not expected to complete within a short time period. If the Ingress LSR initiates LSP tear down during this interim period, the existing procedures do not offer any mechanism for router B to indicate to the MP that it need not hold on to that state any longer. The new procedure described in the document does not introduce any major extension but simply uses the PathTear message from PLR to MP even though the PLR has not yet reliably signaled the backup LSP Path.

Here again, one may suggest the use of some "wait timer" on MP independent of refresh time out - say if the MP does not receive backup LSP Path message within the "wait timer", the MP can delete the LSP state. But depending on a timer or delay suffers from the same drawbacks pointed out in the response to comment #1.

> 4. Section 3.1 - Problem Statement:
> "
> - If B fails to perform local repair in step 1, then B will delete
> the LSP state from its state database without informing D. As B
> deletes its state without informing D, with long refresh interval
> this could cause (large) buildup of stale state on D.
> "
> MA> Well this is the normal RSVP soft-state behavior and in fact this
> behavior can occur for defects of link B-C which are not detected and which
> will not trigger the activation of the bypass LSP. There is nothing abnormal
> about this and this behavior is not specific to the case when the bypass could
> not be activated.

[Chandra] The problem is not about the activation of the bypass LSP but that there is no indication to the MP router that it does not have to hold on to the LSP state any more. The problem has been explained in the response to comments #3 and #4. On the soft-state behavior, the responses to the first three paragraphs of your mail contain the reasoning for removing the limitations of the soft-state nature of the protocol.
 
> 5. Section 4.5.2 - Procedures for backward compatibility:
> MA> What is the need to set the refresh interval to the default value when a
> LSR detects that a downstream or upstream neighbour does not support the
> refresh independent procedures? I assume you meant to change it back to
> the configured value which may not be the default value of 30 seconds.

[Chandra] If the refresh time is not explicitly configured on the LSR supporting RI-RSVP-FRR, then the refresh interval should be reduced to the RFC 2205 default value of 30 seconds for messages sent to a neighboring router that is not RI-RSVP capable. The TE-SCALE-REC draft recommends that a default refresh interval of 20 minutes be used in Path and Resv messages if RI-RSVP is supported by the LSR. The statement in the draft referred to here is not applicable if the operator has overridden the RI-RSVP default value.

I think the text in Section 4.5.2 could have stated explicitly that if a downstream or upstream neighbor does not advertise RI-RSVP capability, then the router should reduce the refresh interval to 30 seconds, provided the refresh interval has not been configured on the router. We will update this section accordingly to clarify the proposed behavior.
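
The intended selection logic can be sketched as follows. This is a minimal Python illustration of the behavior described above, not text from the draft; the function and constant names are hypothetical.

```python
from typing import Optional

RFC2205_DEFAULT_REFRESH_S = 30        # RFC 2205 default refresh interval
RI_RSVP_DEFAULT_REFRESH_S = 20 * 60   # default recommended by TE-SCALE-REC


def refresh_interval_for_neighbor(neighbor_supports_ri_rsvp: bool,
                                  configured_refresh_s: Optional[int] = None) -> int:
    """Refresh interval to use in messages sent to one neighbor."""
    # An operator-configured value always wins; the fallback below
    # applies only when nothing has been configured.
    if configured_refresh_s is not None:
        return configured_refresh_s
    # Neighbor advertised RI-RSVP capability: keep the long default.
    if neighbor_supports_ri_rsvp:
        return RI_RSVP_DEFAULT_REFRESH_S
    # Neighbor is not RI-RSVP capable: fall back to the RFC 2205 default.
    return RFC2205_DEFAULT_REFRESH_S


if __name__ == "__main__":
    print(refresh_interval_for_neighbor(True))        # RI-RSVP capable neighbor
    print(refresh_interval_for_neighbor(False))       # legacy neighbor
    print(refresh_interval_for_neighbor(False, 300))  # operator override
```

The decision is made per neighbor, so an LSR can use the long interval toward RI-RSVP capable neighbors while still interoperating with neighbors that depend on periodic refreshes.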

Regards,
Chandra.
