Re: [mpls] MPLS-RT review of draft-chandra-mpls-ri-rsvp-frr

Loa Andersson <> Thu, 26 November 2015 05:53 UTC

To: "Aissaoui, Mustapha (Mustapha)" <>, Sriganesh Kini <>, Lucy yong <>
Cc: "" <>, "" <>, "" <>, Guijuan Wang 3 <>, Lizhong Jin <>


Is this a recommendation not to adopt the document as a working group document?


On 2015-11-26 11:17, Aissaoui, Mustapha (Mustapha) wrote:
> Dear authors,
> I read this draft and I believe the intent is to provide tighter control plane synchronization of the PLR and MP roles on a per RSVP session basis, such that an LSR will know whether it is an MP for a given RSVP session which requested protection. This way, the LSR that detects the failure of the link to the previous hop (or the failure of the previous hop itself) decides whether to retain or delete the state with prior knowledge of whether it is an MP for that RSVP session.
> However, the procedures presented in this document are fairly complex and are at odds with the original intent of basing state synchronization on a soft-state mechanism. In fact, what I fail to find is a compelling argument, or data from the field, showing that the issue is not resolved by the much simpler methods used in production networks today. I describe this in the detailed comments below.
> Regards,
> Mustapha.
> ---------------------
> 1. Section 3.1 - Problem Description
> "
> - If the protected LSP on C times out before D receives signaling
> for the backup LSP, then D would receive PathTear from C prior to
> receiving signaling for the backup LSP, thus resulting in deleting
> the LSP state. This would be possible at scale even with default
> refresh time.
> "
> MA> Since each LSR in the path of an RSVP session which requested protection has to assume it can be an MP without prior knowledge, a simpler method is to reset the refresh timeout for each session as soon as the link to the previous hop fails. In fact, a user-configurable MP timeout upon failure, independent of the refresh timeout, can be provided so that it can be tuned to give the Path message enough time to be received via the bypass LSP.
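The simpler method suggested in comment 1 could be sketched roughly as follows. This is a minimal illustration, not taken from the draft or from any implementation: `PathState`, `on_phop_link_down`, `expired`, and the `mp_timeout` parameter are all hypothetical names, and time is modelled as plain seconds.

```python
from dataclasses import dataclass


@dataclass
class PathState:
    """Hypothetical per-session Path state held by a potential MP."""
    session: str
    protection_requested: bool
    expires_at: float  # absolute time (seconds) at which the state times out


def on_phop_link_down(states, now, mp_timeout):
    """Restart the timeout of every session that requested protection.

    Called when the link to the previous hop fails. mp_timeout is an
    assumed user-configurable value, independent of the refresh timeout.
    """
    for st in states:
        if st.protection_requested:
            st.expires_at = now + mp_timeout


def expired(states, now):
    """Return the sessions whose state has timed out at time `now`."""
    return [st.session for st in states if st.expires_at <= now]
```

With this approach, a session that requested protection survives long enough for the Path message to arrive over the bypass, while unprotected sessions still time out on their normal schedule.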
> 2. Section 3.1 - Problem Description:
> "
> - If upon the link failure C is to keep state until its timeout,
> then with long refresh interval this may result in a large amount
> of stale state on C. Alternatively, if upon the link failure C is
> to delete the state and send PathTear to D, this would result in
> deleting the state on D, thus deleting the LSP. D needs a reliable
> mechanism to determine whether it is MP or not to overcome this
> problem.
> "
> MA> What exactly is the issue with the state being retained until the refresh timeout? You refer to this as "stale" state, but in fact it is desirable to keep the state until node D has created the backup PSB. Also, remember that the head-end node will perform global revertive MBB and may tear down the LSP before the state times out.
> 3. Section 3.1 - Problem Statement:
> "
> - If head-end A attempts to tear down LSP after step 1 but before
> step 2 of the above sequence, then B may receive the tear down
> message before step 2 and delete the LSP state from its state
> database. If B deletes its state without informing D, with long
> refresh interval this could cause (large) buildup of stale state
> on D.
> "
> MA> I am not sure I understand the issue here. If B, acting as a PLR, receives a PathTear from A, all it needs to do is check whether the primary path neighbor (C in this case) is in the down or cleanup state and, if so, send the PathTear over the bypass before deleting its own local state.
> Note that a key assumption here is that PLR node B must send triggered Path refresh messages to the MP over the bypass. If node B has to wait until the next refresh interval to send the Path refresh over the bypass, then you can run into the issue described here.
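The PLR behavior described in comment 3 might look something like the sketch below. It is only an illustration under stated assumptions: `handle_path_tear`, the status strings `"down"` and `"cleanup"`, and the `send` callback are hypothetical and do not come from the draft.

```python
def handle_path_tear(session, lsp_state, primary_nhop_status, send):
    """Handle a PathTear received by a PLR (node B) from upstream.

    lsp_state is a dict of local per-session state. send(path, msg, session)
    is an assumed callback that transmits a message on the "primary" path
    or over the "bypass".
    """
    if session not in lsp_state:
        return  # nothing to tear down

    if primary_nhop_status in ("down", "cleanup"):
        # Primary neighbor (C) is unreachable: inform the MP over the bypass.
        send("bypass", "PathTear", session)
    else:
        send("primary", "PathTear", session)

    # Only delete local state after the tear has been propagated.
    del lsp_state[session]
```

The ordering is the point of the sketch: the PathTear goes out before the local state is removed, so the MP is not left holding state that nobody will tear down.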
> 4. Section 3.1 - Problem Statement:
> "
> - If B fails to perform local repair in step 1, then B will delete
> the LSP state from its state database without informing D. As B
> deletes its state without informing D, with long refresh interval
> this could cause (large) buildup of stale state on D.
> "
> MA> Well, this is the normal RSVP soft-state behavior, and in fact it can also occur for defects of link B-C which are not detected and which therefore do not trigger the activation of the bypass LSP. There is nothing abnormal about this, and the behavior is not specific to the case where the bypass could not be activated.
> 5. Section 4.5.2 - Procedures for backward compatibility:
> MA> What is the need to set the refresh interval to the default value when an LSR detects that a downstream or upstream neighbour does not support the refresh independent procedures? I assume you meant to change it back to the configured value, which may not be the default value of 30 seconds.
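The fallback that comment 5 appears to ask for can be sketched as below. The function and parameter names are hypothetical; the only value taken from the text is the 30 second default.

```python
DEFAULT_REFRESH_S = 30  # default refresh interval mentioned in the comment


def effective_refresh_interval(neighbor_supports_ri, ri_interval_s,
                               configured_interval_s=None):
    """Pick the refresh interval to use towards a given neighbor.

    ri_interval_s is an assumed (long) interval used when the neighbor
    supports the refresh independent procedures. Otherwise fall back to
    the operator-configured value, and only to the default if nothing
    was configured.
    """
    if neighbor_supports_ri:
        return ri_interval_s
    if configured_interval_s is not None:
        return configured_interval_s
    return DEFAULT_REFRESH_S
```

For example, with a configured interval of 45 seconds, a neighbor that lacks support would be refreshed every 45 seconds rather than every 30.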