Re: Questions on RSVP-TE Graceful Restart and the new Extensions

Dan Li <danli@huawei.com> Mon, 08 October 2007 03:10 UTC

Date: Mon, 08 Oct 2007 10:57:22 +0800
From: Dan Li <danli@huawei.com>
Subject: Re: Questions on RSVP-TE Graceful Restart and the new Extensions
To: Arun Satyanarayana <asatyana@cisco.com>, "Bardalai, Snigdho" <Snigdho.Bardalai@us.fujitsu.com>, "Ccamp (E-mail)" <ccamp@ops.ietf.org>, "Gao, Jianhua" <gjhhit@huawei.com>

Hi Snigdho,

It would be better for N2 to ignore refresh messages during the recovery process. That is, until N2 receives N1's ACK for the Hello message that indicates N2 has restarted, N2 should ignore any refresh messages it receives from N1. Once N1 receives that Hello message from N2, it should suppress Srefresh messages, according to RFC3473.
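
A minimal sketch of this rule, in Python, may make the proposal easier to discuss. It is only an illustration of the behaviour described above, not text from RFC3473 or from any implementation; the class names, method names and instance values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RestartingNode:
    """N2's side (hypothetical model): refreshes are ignored until the
    neighbor has ACKed the Hello that announces the restart."""
    hello_acked: bool = False
    ignored: list = field(default_factory=list)
    processed: list = field(default_factory=list)

    def on_hello_ack(self) -> None:
        # N1 has acknowledged the Hello, so it now knows N2 restarted.
        self.hello_acked = True

    def on_refresh(self, msg: str) -> str:
        # Before the ACK, drop the refresh silently instead of NACKing it.
        if not self.hello_acked:
            self.ignored.append(msg)
            return "ignored"
        self.processed.append(msg)
        return "processed"


@dataclass
class Neighbor:
    """N1's side (hypothetical model): a changed source instance in a
    Hello from N2 means N2 restarted, so Srefresh is suppressed."""
    known_instance: int
    srefresh_suppressed: bool = False

    def on_hello(self, src_instance: int) -> None:
        if src_instance != self.known_instance:
            self.known_instance = src_instance
            self.srefresh_suppressed = True


n2 = RestartingNode()
first = n2.on_refresh("Srefresh")   # arrives before the Hello ACK
n2.on_hello_ack()
second = n2.on_refresh("Path")      # arrives after the Hello ACK

n1 = Neighbor(known_instance=1)
n1.on_hello(2)                      # N2 comes back with a new instance
```

With this gating, the Srefresh that arrives right after the restart completes is dropped rather than NACKed, so N1 is not pushed into sending a Path message without a recovery label.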

As the editors of the GR-description I-D, we would like to add text addressing this important issue if the group reaches a decision.

For the second issue, RFC3473 can be applied. If the upstream node of node C and the downstream node of node E both restart, nodes C and E should wait until their LSP states are recovered. If node D restarts, its LSP states can be recovered from nodes C and E according to the GR-EXT draft. In real deployments, the LSP in the data plane is required to remain untouched even if the control plane fails; this is a local policy.

Regards,

Dan


  ----- Original Message ----- 
  From: Bardalai, Snigdho 
  To: Ccamp (E-mail) 
  Sent: Saturday, October 06, 2007 12:18 AM
  Subject: Questions on RSVP-TE Graceful Restart and the new Extensions


  Hi, 

  I have a couple of questions on RSVP-TE Graceful Restart and the new extensions being proposed in draft-ietf-ccamp-rsvp-restart-ext-09.

  Has anybody come across issues when the hello interval times the failure multiple (typically 3) is much larger than the neighboring node's restart duration? For example, if the RSVP-TE hello interval is 10 seconds, the multiple is 3, and the neighboring node restarts within 10 seconds, then it is possible that RSVP-TE hello will never detect a hello failure.

  RFC3473 does describe detection of a node restart in this case based on a new source instance in the hello message, but we have come across an issue with NACKs being generated for an Srefresh message in this scenario.
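
The timing in this example can be checked with a short Python sketch. It is a simplified model under stated assumptions: a hello failure is declared only after the neighbor has been silent for the full interval times the multiple, and a restart shows up as a changed source instance. The function names and instance values are hypothetical and not taken from any RFC or implementation.

```python
def hello_timeout(interval_s: float, multiple: float) -> float:
    """Silence needed before a hello failure is declared (assumed model)."""
    return interval_s * multiple


def detected_by_timeout(interval_s: float, multiple: float, restart_s: float) -> bool:
    """True if the neighbor stays down long enough for the timeout to fire."""
    return restart_s >= hello_timeout(interval_s, multiple)


def detected_by_instance(last_seen_instance: int, received_instance: int) -> bool:
    """True if a received Hello carries a different source instance."""
    return received_instance != last_seen_instance


timeout = hello_timeout(10, 3)                 # 30 seconds
by_timeout = detected_by_timeout(10, 3, 10)    # restart finishes in 10 seconds
by_instance = detected_by_instance(0x1001, 0x2002)
```

With the numbers above, the timeout never fires, so the restart is only noticed once a Hello with the new source instance arrives.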

  Please look at the sequence diagram below:

    N1                                N2 
    |                                 | 
    |                                 X (Restart start) 
    |  HELLO                          | 
    |-------------------------------->| 
    |                                 | 
    |  SRefresh                       | 
    |-------------------------------->| 
    |                                 | 
    |  HELLO                          | 
    |-------------------------------->| 
    |                                 | 
    |                                 X (Restart complete) 
    |  SRefresh                       | 
    |-------------------------------->| 
    |  NACK                           | 
    |<--------------------------------| 
    |  Path (without recovery label)  | 
    |-------------------------------->| 
    |                                 X (resource allocation failed because the resources are in use) 
    |  PathErr                        | 
    |<--------------------------------| 
    |  PathTear                       | 
    |-------------------------------->| 
    X (CON deletion)                  X (XCON deletion) 
    |                                 | 

  The issue is that, because N1 did not detect a hello failure, it continues sending SRefreshes. Once the restart completes, N2 may NACK them because it has no Path state corresponding to the SRefresh message. The NACK causes N1 to generate a Path refresh message, but without a RECOVERY_LABEL, because N1 has not yet detected that N2 has restarted: hello exchanges have not yet started. PLEASE NOTE: This is based on an actual implementation and a real test.

  What is the solution to this issue? I don't see either N1 or N2 doing anything that is not compliant with the current RFCs. Or is there something I have missed?

  The other issue I wanted to understand is with respect to the graceful restart extension. Will the RecoveryPath message handle issues when communication fails and a node restarts? There may be issues when some nodes in the LSP path get isolated from both upstream and downstream ends.

  Example, 

               A---B-x...x-C---D---E-x...x-F---G 

  Nodes C, D and E are isolated. If this condition persists and nodes C, D and E restart, will the LSP get deleted after the recovery timer expires in node D? Can this be prevented?

  Would appreciate your response. 

  Regards, 
  Snigdho