Re: draft-aruns-ccamp-rsvp-restart-ext-00

Reshad Rahman <rrahman@cisco.com> Fri, 12 March 2004 14:16 UTC

Message-ID: <4051C21E.61753004@cisco.com>
Date: Fri, 12 Mar 2004 08:58:54 -0500
From: Reshad Rahman <rrahman@cisco.com>
Organization: Cisco Systems
X-Mailer: Mozilla 4.7 [en] (X11; U; SunOS 5.6 sun4u)
X-Accept-Language: en
MIME-Version: 1.0
To: Lou Berger <lberger@movaz.com>
CC: Nic Neate <Nic.Neate@dataconnection.com>, Adrian Farrel <adrian@olddog.co.uk>, "Satyanarayana, Arun" <aruns@movaz.com>, dimitri.papadimitriou@alcatel.be, ccamp@ops.ietf.org, Anca Zamfir <ancaz@cisco.com>, Junaid Israr <jisrar@cisco.com>, Zafar Ali <zali@cisco.com>
Subject: Re: draft-aruns-ccamp-rsvp-restart-ext-00
References: <53F74F5A7B94D511841C00B0D0AB16F8028708DE@baker.datcon.co.uk> <6.0.3.0.2.20040309105433.04e3fcb8@mo-ex1>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: owner-ccamp@ops.ietf.org
Precedence: bulk

We'll have the part for simultaneous adjacent restarts in ~2 weeks.

Regards,
Reshad.

Lou Berger wrote:
> 
> Nic,
>          In one-on-one discussions at the IETF the authors agreed to do
> just these two things!  I know we're hoping to get the first part done late
> this week/early next week.  I can't speak for the other authors (of the
> other half of the to-be-merged draft) on the second part.
> 
> Lou
> 
> At 07:41 AM 3/9/2004 -0500, Nic Neate wrote:
> 
> >Hi Adrian (and draft-aruns authors),
> >
> >Responses below.  In summary, I agree
> >  - with the suggestion of being able to request RecoveryPath messages
> >  - that it would be very helpful if the procedures for recovering from
> >simultaneous adjacent restarts could be clarified.
> >
> >Thanks,
> >
> >Nic
> >
> > > -----Original Message-----
> > > From: Adrian Farrel [mailto:adrian@olddog.co.uk]
> > > Sent: Saturday, March 06, 2004 12:47 PM
> > > To: Nic Neate; aruns@movaz.com; Movaz Networks - Louis Berger;
> > > dimitri.papadimitriou@alcatel.be
> > > Cc: ccamp@ops.ietf.org
> > > Subject: Re: draft-aruns-ccamp-rsvp-restart-ext-00
> > >
> > >
> > > Hi Nic,
> > >
> > > > I've just read your draft-aruns-ccamp-rsvp-restart-ext-00 and it looks good.
> > > > In particular, we've been looking at using Restart for Fast Reroute LSPs for
> > > > some time and this draft provides everything that is needed (like recovering
> > > > the FAST_REROUTE, DETOUR, SENDER_TEMPLATE and ERO
> > > > objects from the downstream node when they are not available from upstream).
> > >
> > > Good. This concern was also raised in Seoul, and I am pleased
> > > to hear that the draft
> > > addresses these requirements.
> > >
> > > > However, I have a couple of concerns (not related to Fast Reroute).
> > > >
> > > >  - Your draft doesn't tackle, and won't work for, simultaneous restart of
> > > > adjacent nodes.  This is a problem that is tackled by
> > > > draft-rahman-ccamp-rsvp-restart-extensions, so merging the two drafts in
> > > > some way may be the best way to resolve that.  I realize that the Aruns
> > > > draft aims to make Restart possible for nodes which cannot retrieve state
> > > > from the data plane, and in that case recovering from simultaneous restart
> > > > of adjacent nodes isn't easy.  I think including some further extensions for
> > > > nodes which can retrieve some state from the data plane would be
> > > > appropriate.
> > >
> > > Retrieving state from the data plane only answers half of the
> > > problem. However, it is
> > > certainly important to audit the recovered control plane
> > > information against the known
> > > data plane state.
> > >
> >
> >Indeed.  My point was that if you can't retrieve even the outgoing signaling
> >interface from your data plane following a "nodal fault", you haven't got
> >much hope of reconstructing protocol state in between two nodes which
> >restarted at the same time (without some serious protocol enhancement
> >anyway).  Hence the suggestion of additional extensions to recover from
> >adjacent restarts for nodes which can retrieve the outgoing signaling
> >interface.
> >
> > > With regard to adjacent node failures and restarts, I believe
> > > there are actually
> > > sufficient capabilities here. Perhaps the authors would like
> > > to include text to clarify
> > > the procedures.
> > >
> >
> >If this is the case, then no problem.  I agree that some text clarifying
> >that in the draft would be very helpful.
> >
> > > >  - The back compatibility with RFC 3473 restart looks risky.  Draft Aruns
> > > > mandates that restarted nodes don't send Path Refreshes until either the
> > > > recovery period expires or a RecoveryPath is received from downstream.  In
> > > > the case that the downstream node only supports RFC 3473 restart (and so
> > > > doesn't send RecoveryPaths), it may well time out Path state at the same time
> > > > as or very soon after the recovery period expires.  Hence a dangerous timing
> > > > window is created.
> > >
> > > You have something here.
> > > However, section 9.5.3 of RFC3473 does not say that the
> > > neighbor MUST discard state that
> > > is not restored in the recovery time interval. Presumably it
> > > would simply recommence
> > > waiting for state refresh and so would time out after 3.5
> > > refresh intervals from the end
> > > of the recovery interval.
> > >
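
The timing argument above can be made concrete with a small sketch. It assumes the figure quoted in the thread (3.5 refresh intervals) and models two kinds of neighbor: one that discards unrestored state the moment the recovery period ends, and one that resumes normal refresh timeout from that point. The function name and parameters are illustrative only and come from neither draft nor RFC 3473.

```python
def neighbor_timeout(recovery_end: float,
                     refresh_interval: float,
                     discards_at_recovery_end: bool) -> float:
    """Return the time (seconds) at which the downstream neighbor would
    drop Path state that has not been refreshed.

    Assumption: a lenient neighbor restarts its normal state timeout at
    the end of the recovery period, using the 3.5 refresh-interval
    multiplier mentioned in the thread.
    """
    if discards_at_recovery_end:
        # Strict neighbor: no window at all after the recovery period.
        return recovery_end
    # Lenient neighbor: window of 3.5 refresh intervals.
    return recovery_end + 3.5 * refresh_interval


# Example: recovery period ends at t=60 s, refresh interval is 30 s.
lenient = neighbor_timeout(60.0, 30.0, discards_at_recovery_end=False)
strict = neighbor_timeout(60.0, 30.0, discards_at_recovery_end=True)
print(lenient - 60.0, strict - 60.0)  # window left for the first Path refresh
```

With a strict neighbor the restarting node has no margin if it waits for the full recovery period before sending Path refreshes, which is the dangerous window described above.
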
> >
> >That would be sensible behavior, yes.  My concern (as I'm sure you realize)
> >is that it won't happen like that in all cases in the real world.
> >
> > > Some compromise may be introduced here by noting that 3473
> > > says that Path state SHOULD be
> > > restored within 1/2 of the recovery time. So we could follow
> > > this logic and use the first
> > > half of the time interval for the RecoveryPath message and
> > > the second half for backwards
> > > compatible recovery.
> > >
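
A minimal sketch of the half-and-half compromise suggested above: the restarting node waits for a RecoveryPath during the first half of the recovery time and falls back to RFC 3473-style recovery in the second half. The phase names and the function are hypothetical, chosen only for illustration.

```python
def recovery_phase(elapsed: float, recovery_time: float) -> str:
    """Classify how a restarting node should behave at `elapsed` seconds
    into a recovery period of `recovery_time` seconds.

    Assumption: the period is split exactly in half, as suggested in the
    thread; neither draft defines these phase names.
    """
    if elapsed < 0 or recovery_time <= 0:
        raise ValueError("elapsed must be >= 0 and recovery_time > 0")
    if elapsed < recovery_time / 2:
        # First half: hold Path refreshes and wait for RecoveryPath.
        return "await_recovery_path"
    if elapsed < recovery_time:
        # Second half: recover as a plain RFC 3473 node would.
        return "rfc3473_fallback"
    return "expired"


for t in (10, 30, 60):
    print(t, recovery_phase(t, 60))
```
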
> > > On the other hand, I would prefer that this new capability
> > > (support for RecoveryPath
> > > message) was signaled in the Restart_Capabilities object so
> > > that the restarting node can
> > > know whether to expect to receive a RecoveryPath or not.
> > >
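
The capability-signaling alternative could look roughly like the sketch below. The thread only proposes that RecoveryPath support be signaled in the Restart_Capabilities object; it assigns no bit, so the flag value and all names here are assumptions made up for the example.

```python
# Hypothetical flag bit: no value is assigned anywhere in this thread.
RECOVERY_PATH_SUPPORTED = 0x01


def restart_strategy(neighbor_restart_flags: int) -> str:
    """Pick a recovery strategy from the flags a neighbor advertised.

    Assumption: the neighbor's capability flags are available as an
    integer bit field taken from its Hello messages.
    """
    if neighbor_restart_flags & RECOVERY_PATH_SUPPORTED:
        # Neighbor will send RecoveryPath, so it is safe to wait for it.
        return "wait_for_recovery_path"
    # Neighbor only speaks RFC 3473 restart: do not wait.
    return "rfc3473_recovery"


print(restart_strategy(0x01))
print(restart_strategy(0x00))
```

Knowing the neighbor's capability up front removes the need to guess how long to wait, so the timing split is no longer required.
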
> > > > As a potential solution to both problems I'd suggest that a restarting node
> > > > receiving a Path message with a recovery label should always forward it
> > > > immediately as well as it can, and include both a recovery label and (for
> > > > back compatibility) a suggested label.  Similarly, it should forward
> > > > RecoveryPath messages immediately as well as it can.  I'd be happy to
> > > > discuss any of this further.
> > >
> > > This sounds very dangerous.
> > > "As well as it can" may include path computation which may
> > > pick a path other than the one
> > > previously in use. Hence the new Path message will be sent to
> > > a new neighbor. This
> > > disaster is no better than the problem we are trying to solve.
> > >
> >
> >Fine.  I had in mind that a node should only forward a Path message before
> >receiving a RecoveryPath if it was sure that it could send it (as per
> >RFC3473) to the right place and without a dangerous ERO.  In any case, I
> >prefer the idea of being able to request RecoveryPath messages and it sounds
> >like that will make recovery possible in more situations.
> >
> > > Cheers,
> > > Adrian
> > >