RE: draft-aruns-ccamp-rsvp-restart-ext-00

Nic Neate <Nic.Neate@dataconnection.com> Tue, 09 March 2004 13:06 UTC

Message-ID: <53F74F5A7B94D511841C00B0D0AB16F8028708DE@baker.datcon.co.uk>
From: Nic Neate <Nic.Neate@dataconnection.com>
To: 'Adrian Farrel' <adrian@olddog.co.uk>, aruns@movaz.com, Movaz Networks - Louis Berger <lberger@movaz.com>, dimitri.papadimitriou@alcatel.be
Cc: ccamp@ops.ietf.org
Subject: RE: draft-aruns-ccamp-rsvp-restart-ext-00
Date: Tue, 09 Mar 2004 12:41:10 -0000

Hi Adrian (and draft-aruns authors),

Responses below.  In summary, I agree
 - with the suggestion of being able to request RecoveryPath messages
 - that it would be very helpful if the procedures for recovering from
simultaneous adjacent restarts could be clarified.

Thanks,

Nic


> -----Original Message-----
> From: Adrian Farrel [mailto:adrian@olddog.co.uk]
> Sent: Saturday, March 06, 2004 12:47 PM
> To: Nic Neate; aruns@movaz.com; Movaz Networks - Louis Berger;
> dimitri.papadimitriou@alcatel.be
> Cc: ccamp@ops.ietf.org
> Subject: Re: draft-aruns-ccamp-rsvp-restart-ext-00
> 
> 
> Hi Nic,
> 
> > I've just read your draft-aruns-ccamp-rsvp-restart-ext-00 and it looks good.
> > In particular, we've been looking at using Restart for Fast Reroute LSPs for
> > some time and this draft provides everything that is needed (like recovering
> > the FAST_REROUTE, DETOUR, SENDER_TEMPLATE and ERO
> > objects from the downstream node when they are not available from upstream).
> 
> Good. This concern was also raised in Seoul, and I am pleased to hear that the draft
> addresses these requirements.
> 
> > However, I have a couple of concerns (not related to Fast Reroute).
> >
> >  - Your draft doesn't tackle, and won't work for, simultaneous restart of
> > adjacent nodes.  This is a problem that is tackled by
> > draft-rahman-ccamp-rsvp-restart-extensions, so merging the two drafts in
> > some way may be the best way to resolve that.  I realize that the Aruns
> > draft aims to make Restart possible for nodes which cannot retrieve state
> > from the data plane, and in that case recovering from simultaneous restart
> > of adjacent nodes isn't easy.  I think including some further extensions for
> > nodes which can retrieve some state from the data plane would be
> > appropriate.
> 
> Retrieving state from the data plane only answers half of the problem. However, it is
> certainly important to audit the recovered control plane information against the known
> data plane state.
> 

Indeed.  My point was that if you can't retrieve even the outgoing signaling
interface from your data plane following a "nodal fault", you haven't got
much hope of reconstructing protocol state in between two nodes which
restarted at the same time (without some serious protocol enhancement
anyway).  Hence the suggestion of additional extensions to recover from
adjacent restarts for nodes which can retrieve the outgoing signaling
interface.

> With regard to adjacent node failures and restarts, I believe there are actually
> sufficient capabilities here. Perhaps the authors would like to include text to clarify
> the procedures.
> 

If this is the case, then no problem.  I agree that some text clarifying
that in the draft would be very helpful.

> >  - The back compatibility with RFC 3473 restart looks risky.  Draft Aruns
> > mandates that restarted nodes don't send Path Refreshes until either the
> > recovery period expires or a RecoveryPath is received from downstream.  In
> > the case that the downstream node only supports RFC 3473 restart (and so
> > doesn't send RecoveryPaths), it may well time out Path state at the same time
> > as or very soon after the recovery period expires.  Hence a dangerous timing
> > window is created.
> 
> You have something here.
> However, section 9.5.3 of RFC3473 does not say that the neighbor MUST discard state that
> is not restored in the recovery time interval. Presumably it would simply recommence
> waiting for state refresh and so would time out after 3.5 refresh intervals from the end
> of the recovery interval.
> 

That would be sensible behavior, yes.  My concern (as I'm sure you realize)
is that it won't happen like that in all cases in the real world.
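
To make the window concrete, here is a rough sketch of the arithmetic in Python.
The function names, the "lenient"/"strict" labels and the example numbers are
mine, purely for illustration; none of them come from RFC 3473 or either draft.
"Lenient" is the behavior Adrian describes (restart the refresh timeout at the
end of the recovery interval), "strict" is the case I am worried about (state
discarded as soon as the recovery interval ends).

```python
def downstream_timeout(recovery_time, refresh_interval, lenient):
    """Time (seconds after restart) at which the downstream neighbor
    discards unrestored Path state. Assumption: a lenient neighbor waits a
    further 3.5 refresh intervals; a strict one discards at expiry."""
    if lenient:
        return recovery_time + 3.5 * refresh_interval
    return recovery_time


def safety_margin(recovery_time, refresh_interval, lenient, wait_fraction=1.0):
    """Seconds between the restarting node's first Path refresh and the
    neighbor's timeout. wait_fraction is the share of the recovery period
    spent waiting for a RecoveryPath that never arrives."""
    first_path_sent = wait_fraction * recovery_time
    timeout = downstream_timeout(recovery_time, refresh_interval, lenient)
    return timeout - first_path_sent


# Illustrative values only: 60 s recovery time, 30 s refresh interval.
strict_margin = safety_margin(60, 30, lenient=False)
lenient_margin = safety_margin(60, 30, lenient=True)
half_wait_margin = safety_margin(60, 30, lenient=False, wait_fraction=0.5)
```

With a strict neighbor and a full-period wait the margin is zero, which is the
race I mean.  Waiting only half of the recovery period leaves half of it as
margin even in the strict case.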

> Some compromise may be introduced here by noting that 3473 says that Path state SHOULD be
> restored within 1/2 of the recovery time. So we could follow this logic and use the first
> half of the time interval for the RecoveryPath message and the second half for backwards
> compatible recovery.
> 
> On the other hand, I would prefer that this new capability (support for RecoveryPath
> message) was signaled in the Restart_Capabilities object so that the restarting node can
> know whether to expect to receive a RecoveryPath or not.
> 
> > As a potential solution to both problems I'd suggest that a restarting node
> > receiving a Path message with a recovery label should always forward it
> > immediately as well as it can, and include both a recovery label and (for
> > back compatibility) a suggested label.  Similarly, it should forward
> > RecoveryPath messages immediately as well as it can.  I'd be happy to
> > discuss any of this further.
> 
> This sounds very dangerous.
> "As well as it can" may include path computation which may pick a path other than the one
> previously in use. Hence the new Path message will be sent to a new neighbor. This
> disaster is no better than the problem we are trying to solve.
> 

Fine.  I had in mind that a node should only forward a Path message before
receiving a RecoveryPath if it was sure that it could send it (as per
RFC3473) to the right place and without a dangerous ERO.  In any case, I
prefer the idea of being able to request RecoveryPath messages and it sounds
like that will make recovery possible in more situations.
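
For what it's worth, this is how I read the two suggestions above fitting
together, as a small Python sketch.  It is only my interpretation: the function
name and the three-way flag are hypothetical, and the draft does not currently
define any such capability indication.

```python
from typing import Optional


def recovery_path_wait(recovery_time: float,
                       neighbor_supports: Optional[bool]) -> float:
    """How long (seconds) a restarting node waits for a RecoveryPath from
    downstream before falling back to plain RFC 3473 recovery.

    neighbor_supports is a hypothetical indication learned from the
    neighbor's restart capabilities:
      True  -> RecoveryPath support was advertised
      False -> the neighbor is known to be RFC 3473 only
      None  -> nothing was signaled either way
    """
    if neighbor_supports is True:
        # A RecoveryPath is expected, so the whole period can be used.
        return recovery_time
    if neighbor_supports is False:
        # No RecoveryPath will arrive; do not open a timing window at all.
        return 0.0
    # Unknown: use the first half for RecoveryPath, leaving the second
    # half for backwards compatible recovery.
    return recovery_time / 2.0
```

The point is simply that with an explicit indication the restarting node never
has to guess, and the half-and-half split is only needed as a fallback.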

> Cheers,
> Adrian
>