Re: [mpls] MPLS-RT review of draft-chandra-mpls-ri-rsvp-frr

Chandrasekar Ramachandran <csekar@juniper.net> Sat, 16 July 2016 18:45 UTC

From: Chandrasekar Ramachandran <csekar@juniper.net>
To: "Aissaoui, Mustapha (Nokia - CA)" <mustapha.aissaoui@nokia.com>, Loa Andersson <loa@pi.nu>
Date: Sat, 16 Jul 2016 18:45:10 +0000
Message-ID: <BN3PR0501MB1377E773CC762FAC1E206822D9340@BN3PR0501MB1377.namprd05.prod.outlook.com>
References: <5628D430.4070602@pi.nu> <4A79394211F1AF4EB57D998426C9340DD479A17D@US70UWXCHMBA01.zam.alcatel-lucent.com> <56514290.60200@pi.nu> <95453A37E413464E93B5ABC0F8164C4D14C9BB4D@eusaamb101.ericsson.se> <4A79394211F1AF4EB57D998426C9340DD47C1317@US70UWXCHMBA01.zam.alcatel-lucent.com> <BN3PR0501MB1377A3471B2E76D0827F3700D90E0@BN3PR0501MB1377.namprd05.prod.outlook.com> <BN3PR0501MB13778D90595C293D6EB47165D9E10@BN3PR0501MB1377.namprd05.prod.outlook.com> <4A79394211F1AF4EB57D998426C9340DD47F8527@US70UWXCHMBA01.zam.alcatel-lucent.com> <BN3PR0501MB1377E952BD155E2B0A49F68BD9B20@BN3PR0501MB1377.namprd05.prod.outlook.com> <4A79394211F1AF4EB57D998426C9340DD4935C45@US70UWXCHMBA01.zam.alcatel-lucent.com>
In-Reply-To: <4A79394211F1AF4EB57D998426C9340DD4935C45@US70UWXCHMBA01.zam.alcatel-lucent.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/mpls/ILfOJB5QCDCPNWi2qM6HZxmZVIU>
Cc: "mpls@ietf.org" <mpls@ietf.org>, "mpls-chairs@ietf.org" <mpls-chairs@ietf.org>, "draft-chandra-mpls-ri-rsvp-frr@ietf.org" <draft-chandra-mpls-ri-rsvp-frr@ietf.org>, Vishnu Pavan Beeram <vbeeram@juniper.net>
Subject: Re: [mpls] MPLS-RT review of draft-chandra-mpls-ri-rsvp-frr

Hi Mustapha,
It is precisely because the clearing of RSVP-TE session state on the Merge Point (MP) is "short refresh interval dependent" in RFC 4090 (under the conditions explained in Section 3, "Problem Description", of the draft) that the extensions defined in the draft make RSVP-TE FRR "refresh interval independent". Hence, it is indeed accurate to term the capability "refresh interval independent". With this capability, it would be possible to increase the refresh interval to an arbitrarily long value (refer to Section 2.2 of draft-ietf-teas-rsvp-te-scaling-rec-01).

2.2. Refresh Interval Independent RSVP
 
    The RSVP protocol relies on periodic refreshes for state
    synchronization between RSVP neighbors and for recovery from lost
    RSVP messages. It relies on refresh timeout for stale state cleanup.
    The primary motivation behind introducing the notion of "Refresh
    Interval Independent RSVP" (RI-RSVP) is to completely eliminate
    RSVP's reliance on refreshes and refresh timeouts. This is done by
    simply increasing the refresh interval to a fairly large value.
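
To give a feel for what timeout-based cleanup means at long refresh intervals, here is a minimal sketch (not taken from either draft) that applies the state lifetime formula from RFC 2205, Section 3.7: L >= (K + 0.5) * 1.5 * R. K = 3 and R = 30 seconds are the RFC 2205 defaults; the function name and the longer refresh intervals are illustrative assumptions only.

```python
# Illustrative only: RFC 2205 (Section 3.7) soft-state lifetime,
#   L >= (K + 0.5) * 1.5 * R
# K = 3 and R = 30 s are the RFC 2205 defaults. The longer R values below
# are hypothetical examples, not values taken from the drafts.

def state_lifetime_seconds(refresh_interval_s: float, k: int = 3) -> float:
    """Return the minimum soft-state lifetime for a refresh period R."""
    return (k + 0.5) * 1.5 * refresh_interval_s


if __name__ == "__main__":
    examples = (("30 s (default)", 30), ("20 min", 20 * 60), ("1 hour", 3600))
    for label, r in examples:
        lifetime = state_lifetime_seconds(r)
        print(f"R = {label}: unrefreshed state may persist {lifetime:.1f} s")
```

With the default R of 30 seconds, state that is never explicitly torn down lingers for about 157.5 seconds. With a hypothetical R of 20 minutes it lingers for 6300 seconds (1 hour 45 minutes), which is why explicit and reliable teardown matters once refreshes are stretched.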

It is also because of the reasons explained above that the RI-RSVP capability cannot be an optional procedure in draft-mtaillon-mpls-summary-frr-rsvpte.

I guess you would agree that the goal in Section 2.2 of draft-ietf-teas-rsvp-te-scaling-rec-01 cannot be subsumed in draft-mtaillon-mpls-summary-frr-rsvpte. In the mail sent on March 8, I also clarified that the extensions proposed in the draft are not complex, which was your earlier concern. So, I do not think the proposal of merging the drafts is workable.

Thanks,
Chandra.

> -----Original Message-----
> From: Aissaoui, Mustapha (Nokia - CA)
> [mailto:mustapha.aissaoui@nokia.com]
> Sent: Tuesday, May 17, 2016 3:17 AM
> To: Chandrasekar Ramachandran <csekar@juniper.net>; Loa Andersson
> <loa@pi.nu>
> Cc: mpls-chairs@ietf.org; mpls@ietf.org; draft-chandra-mpls-ri-rsvp-
> frr@ietf.org; Vishnu Pavan Beeram <vbeeram@juniper.net>
> Subject: RE: MPLS-RT review of draft-chandra-mpls-ri-rsvp-frr
> 
> Hi Chandra,
> I am sorry for the late follow-up.
> 
> My proposal to move forward with merging the conditional Path Tear
> extension in this draft as an optional capability in draft-mtaillon-mpls-
> summary-frr-rsvpte is based on the following analysis:
> 
> 1. It is very misleading to refer to the capability in this draft as "Refresh
> Interval Independent FRR Facility Protection". This draft describes a method
> for clearing the RSVP-TE session state at a Merge-Point (MP) on an ad-hoc
> basis complementing the state timeout. It is not replacing the state refresh
> method and the state timeout.
> 
> 2. draft-mtaillon-mpls-summary-frr-rsvpte defines a handshake procedure
> for the PLR and MP to signal and engage in the use of the Bypass Summary
> FRR Association object. So, there is no need to add a new capability. The
> conditional Path Tear can be an optional procedure of PLR and MP nodes
> which support the Bypass Summary FRR Association object.
> 
> Let me know if this works.
> 
> Regards,
> Mustapha.
> 
> > -----Original Message-----
> > From: EXT Chandrasekar Ramachandran [mailto:csekar@juniper.net]
> > Sent: Tuesday, March 08, 2016 11:32 AM
> > To: Aissaoui, Mustapha (Nokia - CA); Loa Andersson
> > Cc: mpls-chairs@ietf.org; mpls@ietf.org; draft-chandra-mpls-ri-rsvp-
> frr@ietf.org;
> > Vishnu Pavan Beeram
> > Subject: RE: MPLS-RT review of draft-chandra-mpls-ri-rsvp-frr
> >
> > Hi Mustapha,
> > Apologies for the delay. I will be prompt for subsequent mails on this
> > thread. In my previous mail, I categorized your concerns into two.
> > (A) The need for introducing refresh-interval independent behavior into
> > RSVP-TE
> > (B) Why some kind of local implementation based timers will not be
> > sufficient to support long refresh intervals
> >
> > Your mail sent in January seems to indicate you are either not convinced
> > on the need for introducing refresh-interval independent behavior, or you
> > are not convinced that the extensions proposed in this draft (or the ones
> > proposed in [draft-ietf-teas-rsvp-te-scaling-rec]) actually make RSVP-TE
> > FRR refresh-interval independent.
> >
> > Here is my summary response to your points. I have also responded to
> > your comments separately inline.
> > - The goal of the draft is not to reduce the number of triggered messages
> > between PLR and MP when the protected LSP is being locally repaired. The
> > draft addresses a specific gap to make RSVP-TE refresh-interval
> > independent as documented in [draft-ietf-teas-rsvp-te-scaling-rec]. The
> > key point is that if RSVP-TE routers are not constrained by short refresh
> > timeouts, then the existing RFC 2961 mechanisms, plus some minor
> > additional mechanisms as recommended in
> > [draft-ietf-teas-rsvp-te-scaling-rec], should be sufficient to handle
> > large LSP scale in all cases (except FRR, which
> > [draft-chandra-mpls-ri-rsvp-frr] addresses). It should be noted that while
> > reducing the number of triggered backup LSP messages would be an
> > additional performance improvement, the goal of RI-RSVP is more generic
> > and also encompasses make-before-break. Hence, the scope of this draft is
> > not limited to reducing triggered messages as specified in
> > draft-mtaillon-mpls-summary-frr-rsvpte.
> > - It should be noted that [draft-ietf-teas-rsvp-te-scaling-rec] does make
> > a specific recommendation that would address the limitation you have
> > pointed out with the Message-ID-ACK mechanism. The problem you have
> > brought up is related to the lack of flow control between the RSVP-TE
> > neighbors, and Section 2.3 of [draft-ietf-teas-rsvp-te-scaling-rec] does
> > address this limitation. If a router sends some messages that are
> > unacknowledged, then the router can use that as an indication to reduce
> > message rate to that neighbor. In fact, the RFC 2961 Message-ID-ACK
> > mechanism is the key tool in achieving flow control between RSVP-TE
> > neighbors. Hence, the mechanism proposed in this draft, if considered
> > along with [draft-ietf-teas-rsvp-te-scaling-rec], does make RSVP-TE
> > refresh-interval independent. Do you have specific cases in mind that are
> > not covered by these drafts?
> > - The use of implementation specific timers for scaled networks has been
> > problematic in production networks because these timers are ad hoc and do
> > not provide predictability to the behavior when the deployed scale
> > increases over time. Moreover, your point assumes the network uses a short
> > refresh interval so that the lack of reliable PathTear does not result in
> > an unacceptable level of stale state buildup on the routers. In fact, the
> > only two new aspects introduced in this draft leverage PathTear messages
> > (with some minor extension) to reliably remove state from downstream
> > router(s). Could you specifically point out the mechanism(s) in the draft
> > that is/are complex?
> >
> > Please refer inline for more detailed and specific responses.
> >
> > Thanks,
> > Chandra.
> >
> > > -----Original Message-----
> > > From: AISSAOUI, Mustapha (Mustapha)
> > > [mailto:mustapha.aissaoui@nokia.com]
> > > Sent: Saturday, January 16, 2016 10:00 AM
> > > To: Chandrasekar Ramachandran <csekar@juniper.net>; Sriganesh Kini
> > > <sriganesh.kini@ericsson.com>; Loa Andersson <loa@pi.nu>; Lucy yong
> > > <lucy.yong@huawei.com>
> > > Cc: mpls-chairs@ietf.org; mpls@ietf.org; draft-chandra-mpls-ri-rsvp-
> > > frr@ietf.org; Guijuan Wang 3 <guijuan.wang@ericsson.com>; Lizhong Jin
> > > <lizho.jin@gmail.com>
> > > Subject: RE: MPLS-RT review of draft-chandra-mpls-ri-rsvp-frr
> > >
> > > Hi Chandra,
> > > Sorry for the late reply. See inline for some follow-up.
> > >
> > > Regards,
> > > Mustapha.
> > >
> > > > -----Original Message-----
> > > > From: Chandrasekar Ramachandran [mailto:csekar@juniper.net]
> > > > Sent: Friday, December 18, 2015 12:41 AM
> > > > To: Chandrasekar Ramachandran; Aissaoui, Mustapha (Mustapha);
> > > > Sriganesh
> > > Kini;
> > > > Loa Andersson; Lucy yong
> > > > Cc: mpls-chairs@ietf.org; mpls@ietf.org; draft-chandra-mpls-ri-rsvp-
> > > frr@ietf.org;
> > > > Guijuan Wang 3; Lizhong Jin
> > > > Subject: RE: MPLS-RT review of draft-chandra-mpls-ri-rsvp-frr
> > > >
> > > > Mustapha,
> > > > Did my response address your concerns? If I have understood you
> > > > correctly, the concerns fall in two categories and I have tried to
> > > > address these concerns.
> > > > (A) The need for introducing refresh-interval independent behavior
> > > > into RSVP-TE
> > > > (B) Why some kind of local implementation based timers will not be
> > > > sufficient to support long refresh intervals
> > > >
> > > > Do you have any specific comment on my responses? Specifically, do
> > > > you still think the responses did not address (A), or (B) or both?
> > > >
> > > > Regards,
> > > > Chandra.
> > > >
> > > > > -----Original Message-----
> > > > > From: mpls [mailto:mpls-bounces@ietf.org] On Behalf Of
> > > > > Chandrasekar Ramachandran
> > > > > Sent: Thursday, December 03, 2015 12:58 AM
> > > > > To: Aissaoui, Mustapha (Mustapha)
> > > > > <mustapha.aissaoui@alcatel-lucent.com>;
> > > > > Sriganesh Kini <sriganesh.kini@ericsson.com>; Loa Andersson
> > > > > <loa@pi.nu>; Lucy yong <lucy.yong@huawei.com>
> > > > > Cc: mpls-chairs@ietf.org; mpls@ietf.org;
> > > > > draft-chandra-mpls-ri-rsvp- frr@ietf.org; Guijuan Wang 3
> > > > > <guijuan.wang@ericsson.com>; Lizhong Jin <lizho.jin@gmail.com>
> > > > > Subject: Re: [mpls] MPLS-RT review of
> > > > > draft-chandra-mpls-ri-rsvp-frr
> > > > >
> > > > > Mustapha,
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Aissaoui, Mustapha (Mustapha)
> > > > > > [mailto:mustapha.aissaoui@alcatel-
> > > > > > lucent.com]
> > > > > > Sent: Thursday, November 26, 2015 8:48 AM
> > > > > > To: Sriganesh Kini <sriganesh.kini@ericsson.com>; Loa Andersson
> > > > > > <loa@pi.nu>; Lucy yong <lucy.yong@huawei.com>
> > > > > > Cc: Lizhong Jin <lizho.jin@gmail.com>; Guijuan Wang 3
> > > > > > <guijuan.wang@ericsson.com>;
> > > > > > draft-chandra-mpls-ri-rsvp-frr@ietf.org;
> > > > > > mpls-chairs@ietf.org; Aissaoui, Mustapha (Mustapha)
> > > > > > <mustapha.aissaoui@alcatel-lucent.com>; mpls@ietf.org
> > > > > > Subject: RE: MPLS-RT review of draft-chandra-mpls-ri-rsvp-frr
> > > > > >
> > > > > > Dear authors,
> > > > > > I read this draft and I believe the intent is to provide a
> > > > > > tighter control plane synchronization of the PLR and MP roles on
> > > > > > a per RSVP session basis such that a LSR will know if it is a MP
> > > > > > for a given RSVP session which requested protection. This is
> > > > > > done such that the decision to retain or delete the state by the
> > > > > > LSR detecting the failure of the link to the previous hop (or
> > > > > > the failure of the previous hop itself) is made with the prior
> > > > > > knowledge that the LSR is a MP or not on a per RSVP session
> > > > > > basis.
> > > > >
> > > > > [Chandra] The primary motivation of the draft is to enable LSRs to
> > > > > support FRR when the LSP scale on the LSR is of the order of
> > > > > hundreds of thousands. The analysis of the bottlenecks that arise
> > > > > as a consequence of LSP scale, and that can cause disruption of the
> > > > > LSR, has already been documented in RFC 5439. As analyzed in RFC
> > > > > 5439, while there is a correlation between the percentage increase
> > > > > in refresh time and the improvement in LSR performance, there is a
> > > > > degree of functionality that is lost owing to the soft-state
> > > > > nature of the protocol. TE-SCALE-REC
> > > > > (https://tools.ietf.org/html/draft-ietf-teas-rsvp-te-scaling-rec-00)
> > > > > outlines the motivation for refresh independent RSVP (RI-RSVP)
> > > > > for all types of LSPs (packet or non-packet), and makes
> > > > > recommendations to enable RSVP implementations to eliminate the
> > > > > reliance on short refresh time. This draft addresses the
> > > > > refresh-interval dependent behavior of RFC 4090 in order to
> > > > > support RI-RSVP, as facility backup FRR is a widely deployed
> > > > > feature in production networks.
> > >
> > > MA> It seems to me the main issue with bypass protection is to deal
> > > with the generation (PLR) and processing (MP) of a large number of
> > > *triggered* Path messages for the individual protected LSPs at the
> > > time the PLR activates the bypass LSP.
> > >
> > > I do not see anything new in this draft to address the bottleneck of
> > > the state refresh mechanism itself. In fact, the draft suggests that
> > > after the bypass LSP is activated, the state refresh continues with
> > > summary refreshes which is the solution already existent as per RFC
> > > 2961. Based on that, it is misleading to call this a
> > > Refresh-Independent RSVP FRR and this draft is not changing the fact
> > > that a PLR can generate and the MP may have to process a Srefresh
> > > message with potentially a large number of message IDs once the bypass
> > > is activated.
> >
> > [Chandra] The need to generate and process a large number of message IDs
> > will not be a problem as long as the refresh interval is long. If the
> > concern is that transmitting and processing a large number of message IDs
> > would overwhelm the routers, then that is an orthogonal problem that
> > Section 2.3 of [draft-ietf-teas-rsvp-te-scaling-rec] proposes to solve.
> > Again, [draft-ietf-teas-rsvp-te-scaling-rec] does indeed rely on the RFC
> > 2961 Message-ID-ACK extension to accomplish flow control. Even if an
> > implementation supports extensions to reduce the messages between PLR and
> > MP during local repair, we still require a long refresh interval in order
> > to relieve the routers of the burden of having to refresh unaffected LSP
> > states and the make-before-break LSP states that occur subsequent to
> > local repair.
> >
> > Once the refresh interval can be set to an arbitrarily large value, it
> > makes RSVP-TE "refresh interval independent". Is the use of "refresh
> > independent" instead of the longer phrase "refresh-interval independent"
> > a matter of concern here?
> >
> > > What this mechanism is providing is a bulk notification of bypass
> > > activation from the PLR to the MP. This can speed up the creation in
> > > the control plane of the new remote neighbor (PLR) at the MP and
> > > associate the existing PSB/RSBs of the protected LSPs with this new
> > > neighbor. The savings in the creation of the new remote neighbor is
> > > due to not relying on the receipt from the PLR and the processing of a
> > > triggered refresh message for each protected LSP. Note here there is
> > > no issue with the data plane and packets from all LSPs will be
> > > forwarded properly at the MP since the ILM for the protected LSP is
> > > the same whether the packet is received from the primary interface or
> > > from the bypass LSP interface.
> >
> > [Chandra] Yes, precisely. The data plane can continue to forward traffic
> > but the control plane imposes an artificial restriction that the LSP
> > states should be refreshed within a short refresh timeout. The goal is to
> > remove this restriction and align with the goal of
> > [draft-ietf-teas-rsvp-te-scaling-rec].
> >
> > > Based on the above, I am having trouble reconciling the limited
> > > benefits and the complexity of the proposed mechanism.
> > > I am not opposed to this part of the draft moving forward as an
> > > optional mechanism but I propose that it gets merged with
> > > draft-mtaillon-mpls-summary-frr-rsvpte and that the reference to
> > > "Refresh-Independent RSVP FRR" be removed. Also, the objective of the
> > > mechanism should be more clearly explained as a bulk notification
> > > mechanism when bypass is activated.
> >
> > [Chandra] The goals of draft-mtaillon-mpls-summary-frr-rsvpte and
> > draft-chandra-mpls-ri-rsvp-frr are different. While
> > draft-mtaillon-mpls-summary-frr-rsvpte could be used by an RSVP-TE
> > implementation using a short refresh interval to reduce the message rate
> > between PLR and MP during local repair, draft-chandra-mpls-ri-rsvp-frr
> > could be used by an RSVP-TE implementation to set an arbitrarily long
> > refresh interval while avoiding an unacceptable buildup of stale state
> > across FRR local repairs. That's why the only two extensions introduced in
> > draft-chandra-mpls-ri-rsvp-frr involve sending PathTear messages to ensure
> > explicit cleanup of LSP states rather than relying on timeout (which would
> > be long with a long refresh interval). An implementation that sets an
> > arbitrarily long refresh interval could also use
> > draft-mtaillon-mpls-summary-frr-rsvpte to reduce the message rate during
> > local repair.
> >
> > > > > > However,  the procedures presented in this document are fairly
> > > > > > complex and go at odds with the original intent of making state
> > > > > > synchronization based on a soft-state mechanism.
> > > > >
> > > > > [Chandra] The tradeoffs between using short or long refresh
> > > > > intervals have been well understood. Short refresh intervals aid
> > > > > fast synchronization of states along the path of the LSP, but are
> > > > > problematic because of the control message traffic that a router
> > > > > has to handle at high LSP scale. Routers not only must synchronize
> > > > > new states as promptly as possible, but also must maintain the
> > > > > rate of periodic Srefresh messages at a level sufficient to
> > > > > refresh all existing states before they time out. When the
> > > > > number of LSPs that routers carry approaches half a million, there
> > > > > will be two problems with the control message rate.
> > > > > (1) As analyzed in RFC 5439, even with RFC 2961 refresh reduction
> > > > > the size of Srefresh message may become very large, and the
> > > > > processing required may cause disruption of the LSR.
> > > > > (2) Apart from the problem of RSVP message processing overhead,
> > > > > there is also the problem of RSVP-TE becoming a bottleneck
> > > > > preventing the router from scaling other protocols or services.
> > >
> > > MA> This draft does not solve the issue of large Srefresh message.
> > > After the bulk triggering of the bypass, Srefresh message will still
> > > be large because implementations will need to make efficient use of
> > > the message by sending as many Message IDs as possible.
> >
> > [Chandra] I think this is a problem that is orthogonal to removing the
> > reliance on short refresh interval. As mentioned before, one should also
> > take into account the recommendations in
> > [draft-ietf-teas-rsvp-te-scaling-rec] in order to support LSP scales much
> > higher than what routers run or support today. The point you have raised
> > here has been addressed in Section 2.3 of
> > [draft-ietf-teas-rsvp-te-scaling-rec].
> >
> > > > > > In fact, what I fail to find is a compelling argument or data
> > > > > > from the field which shows the issue is not resolved via much
> > > > > > simpler methods which are used in production networks today. I
> > > > > > describe this in the detailed comments below.
> > > > >
> > > > > [Chandra] Some of the mechanisms already deployed in multi-vendor
> > > > > production networks involve configuring implementation specific
> > > > > timers or delays on LSRs. Multiple vendors support various timers
> > > > > or delays on Ingress and Transit LSRs. However, in practice the
> > > > > values configured for these timers or delays are very scale
> > > > > specific, and the values that work at one LSP scale usually do not
> > > > > work at higher LSP scale. In practice, the "scale" that impacts
> > > > > the behavior at specific timer or delay value is not only the
> > > > > number of LSPs carried on the router, but also the number of other
> > > > > protocol states that reside on the router. It should be noted that
> > > > > the reliance on such implementation specific timers or delays has
> > > > > been a major contributor of operational complexity in running
> > > > > RSVP-TE FRR. Any solution based on such timers or delays while
> > > > > being operationally simple at one scale ceases to be so at higher
> > > > > scale. It has been practically found (with existing
> > > > > implementations from multiple vendors) that it is operationally
> > > > > hard to find out the new better value as a running production
> > > > > network grows over time. In short, the use of such timers or
> > > > > delays has been found to involve guess work that seldom remains
> > > > > simple in the long run.
> > >
> > >
> > > MA> The timer is for a different issue than the main objective of the
> > > draft. Let us discuss it in the points below.
> > >
> > > > > > Regards,
> > > > > > Mustapha.
> > > > > > ---------------------
> > > > > > 1. Section 3.1 - Problem Description "
> > > > > > - If the protected LSP on C times out before D receives
> > > > > > signaling for the backup LSP, then D would receive PathTear from
> > > > > > C prior to receiving signaling for the backup LSP, thus
> > > > > > resulting in deleting the LSP state. This would be possible at
> > > > > > scale even with default refresh time.
> > > > > > "
> > > > > > MA> Since each LSR in the path of a RSVP session which requested
> > > > > > protection has to assume it can be a MP without prior knowledge,
> > > > > > a simpler method is to reset the refresh timeout for each session
> > > > > > as soon as the link to the previous hop failed. In fact, a user
> > > > > > configurable MP timeout upon failure, independent of the refresh
> > > > > > timeout, can be provided to tune it to the desired value to give
> > > > > > enough time to the Path message to be received via the bypass
> > > > > > LSP.
> > > > >
> > > > > [Chandra] The option of resetting the refresh timeout may not be
> > > > > viable if long refresh interval (of the order of tens of minutes)
> > > > > is applied on the LSPs. That leaves the other option of providing
> > > > > a "wait timer" (independent of refresh time) that is configurable
> > > > > on the MP. However, it is fairly clear that the "wait timer" that
> > > > > the operator should configure on MP will be problematic for two
> > > > > reasons. The operator should carefully analyze the performance
> > > > > impact of an existing timer/delay value if and when (a) the LSP
> > > > > scale on the same router increases, and (b) the LSP scale increases
> > > > > on other routers around it (that may potentially become upstream
> > > > > PLRs)!
> > >
> > > MA> The use of a refresh-timeout independent timer has been shown to
> > > work in production networks. Most of the time, the value is
> > > conservative enough to cover a large scale network.
> >
> > [Chandra] Relying on implementation specific timers does not provide
> > predictability of behavior as the deployed network scale increases. There
> > are cases in deployed networks where the use of such timers has increased
> > the complexity of network operations (a) when network scale increases with
> > the same software image, and (b) when network scale increase is also
> > accompanied by software upgrades. The general point is that increasing
> > refresh interval to arbitrarily long values without any accompanying
> > mechanism to introduce predictable behavior in the protocol may increase
> > operational complexity. I have provided detailed discussion on this in
> > previous mails.
> >
> > > > > > 2. Section 3.1 - Problem Description:
> > > > > > "
> > > > > > - If upon the link failure C is to keep state until its timeout,
> > > > > > then with long refresh interval this may result in a large
> > > > > > amount of stale state on C. Alternatively, if upon the link
> > > > > > failure C is to delete the state and send PathTear to D, this
> > > > > > would result in deleting the state on D, thus deleting the LSP.
> > > > > > D needs a reliable mechanism to determine whether it is MP or
> > > > > > not to overcome this problem.
> > > > > > "
> > > > > > MA> What exactly is the issue with state being retained until the
> > > > > > refresh timeout? You refer to this as "stale" state but in fact
> > > > > > it is desirable to keep the state until node D created the backup
> > > > > > PSB.
> > > > > > Also, remember that head-end node will perform global revertive
> > > > > > MBB and may tear down the LSP before the state timeout.
> > > > >
> > > > > [Chandra] As described in the first response in this mail, the
> > > > > motivation driving the draft is to eliminate RSVP-TE's reliance on
> > > > > short refresh intervals. If router C were to retain the LSP state
> > > > > until time out, without any additional procedures that provide an
> > > > > explicit indication to router C on when the LSP state is no longer
> > > > > required, then router C would retain the LSP state potentially for
> > > > > hours if the refresh interval is long. So, router C will not only
> > > > > store the state of the LSP but also periodically send the Path
> > > > > message (in Srefresh) downstream - thereby unnecessarily consuming
> > > > > resources that could potentially be utilized for other LSPs.
> > > > > It should also be noted that the transit router cannot assume that
> > > > > Ingress LSR will be able to complete global repair within a
> > > > > particular time frame. The transit LSR procedures should be able
> > > > > to handle cases when global repair does not complete for some
> > > > > valid reason for an extended period of time.
> > >
> > > MA> Clearly, you are not taking away the reliance on refresh messages.
> > > Any ad-hoc mechanism which is not reliable such as a PathTear may still
> > > keep the state until it times out. Furthermore, relying exclusively on
> > > Message ID Ack as proposed in draft-ietf-teas-rsvp-te-scaling-rec in a
> > > scaled network is not good.
> > > Data from deployment networks I am familiar with has shown that the
> > > retransmission of messages which are not acknowledged when the control
> > > plane churn is happening is causing much more churn. Our advice for
> > > our customers was to disable the requirement to receive a Message ID
> > > Ack.
> >
> > [Chandra] I disagree with both the points above.
> > - The extensions introduced in the draft only involve leveraging the
> > PathTear mechanism in two slightly modified ways. If PathTear messages are
> > reliably sent by RSVP routers (i.e. they implement RFC 2961 for tear down
> > messages as well, and also implement the flow control described in
> > draft-ietf-teas-rsvp-te-scaling-rec), then the operator could set
> > arbitrarily long refresh intervals. If the concern is the use of "refresh
> > independent" instead of "refresh-interval independent", then I can add
> > clarification to the draft.
> > - The cause of churn in RSVP-TE deployments running at high scale in terms
> > of LSPs is the lack of flow control in RSVP-TE implementations. It is
> > possible to implement flow control if implementations can leverage
> > Message-ID-ACKs suitably as recommended in
> > [draft-ietf-teas-rsvp-te-scaling-rec].
> >
> > > > > > 3. Section 3.1 - Problem Statement:
> > > > > > "
> > > > > > - If head-end A attempts to tear down LSP after step 1 but
> > > > > > before step 2 of the above sequence, then B may receive the tear
> > > > > > down message before step 2 and delete the LSP state from its
> > > > > > state database. If B deletes its state without informing D, with
> > > > > > long refresh interval this could cause (large) buildup of stale
> > > > > > state on D.
> > > > > > "
> > > > > > MA> I am not sure I understand the issue here. If B acting as a
> > > > > > PLR receives a PathTear from A, all it needs to do is to check
> > > > > > that the primary path neighbor (C in this case) is down or in
> > > > > > cleanup state and send the PathTear over the bypass before
> > > > > > deleting its own local state