RE: draft-aruns-ccamp-rsvp-restart-ext-00

Lou Berger <lberger@movaz.com> Tue, 09 March 2004 16:15 UTC

Message-Id: <6.0.3.0.2.20040309105433.04e3fcb8@mo-ex1>
Date: Tue, 09 Mar 2004 10:57:27 -0500
To: Nic Neate <Nic.Neate@dataconnection.com>
From: Lou Berger <lberger@movaz.com>
Subject: RE: draft-aruns-ccamp-rsvp-restart-ext-00
Cc: Adrian Farrel <adrian@olddog.co.uk>, "Satyanarayana, Arun" <aruns@movaz.com>, "Berger, Lou" <lberger@movaz.com>, dimitri.papadimitriou@alcatel.be, ccamp@ops.ietf.org
In-Reply-To: <53F74F5A7B94D511841C00B0D0AB16F8028708DE@baker.datcon.co.uk>
References: <53F74F5A7B94D511841C00B0D0AB16F8028708DE@baker.datcon.co.uk>

Nic,
         In one-on-one discussions at the IETF the authors agreed to do 
just these two things!  I know we're hoping to get the first part done late 
this week/early next week.  I can't speak for the other authors (of the 
other half of the to-be-merged draft) on the second part.

Lou

At 07:41 AM 3/9/2004 -0500, Nic Neate wrote:

>Hi Adrian (and draft-aruns authors),
>
>Responses below.  In summary, I agree
>  - with the suggestion of being able to request RecoveryPath messages
>  - that it would be very helpful if the procedures for recovering from
>simultaneous adjacent restarts could be clarified.
>
>Thanks,
>
>Nic
>
> > -----Original Message-----
> > From: Adrian Farrel [mailto:adrian@olddog.co.uk]
> > Sent: Saturday, March 06, 2004 12:47 PM
> > To: Nic Neate; aruns@movaz.com; Movaz Networks - Louis Berger;
> > dimitri.papadimitriou@alcatel.be
> > Cc: ccamp@ops.ietf.org
> > Subject: Re: draft-aruns-ccamp-rsvp-restart-ext-00
> >
> >
> > Hi Nic,
> >
> > > I've just read your draft-aruns-ccamp-rsvp-restart-ext-00 and it looks
> > > good.  In particular, we've been looking at using Restart for Fast
> > > Reroute LSPs for some time and this draft provides everything that is
> > > needed (like recovering the FAST_REROUTE, DETOUR, SENDER_TEMPLATE and
> > > ERO objects from the downstream node when they are not available from
> > > upstream).
> >
> > Good. This concern was also raised in Seoul, and I am pleased to hear
> > that the draft addresses these requirements.
> >
> > > However, I have a couple of concerns (not related to Fast Reroute).
> > >
> > >  - Your draft doesn't tackle, and won't work for, simultaneous restart
> > >    of adjacent nodes.  This is a problem that is tackled by
> > >    draft-rahman-ccamp-rsvp-restart-extensions, so merging the two
> > >    drafts in some way may be the best way to resolve that.  I realize
> > >    that the Aruns draft aims to make Restart possible for nodes which
> > >    cannot retrieve state from the data plane, and in that case
> > >    recovering from simultaneous restart of adjacent nodes isn't easy.
> > >    I think including some further extensions for nodes which can
> > >    retrieve some state from the data plane would be appropriate.
> >
> > Retrieving state from the data plane only answers half of the problem.
> > However, it is certainly important to audit the recovered control plane
> > information against the known data plane state.
> >
>
>Indeed.  My point was that if you can't retrieve even the outgoing signaling
>interface from your data plane following a "nodal fault", you haven't got
>much hope of reconstructing protocol state in between two nodes which
>restarted at the same time (without some serious protocol enhancement
>anyway).  Hence the suggestion of additional extensions to recover from
>adjacent restarts for nodes which can retrieve the outgoing signaling
>interface.
>
> > With regard to adjacent node failures and restarts, I believe there are
> > actually sufficient capabilities here. Perhaps the authors would like
> > to include text to clarify the procedures.
> >
>
>If this is the case, then no problem.  I agree that some text clarifying
>that in the draft would be very helpful.
>
> > >  - The back compatibility with RFC 3473 restart looks risky.  Draft
> > >    Aruns mandates that restarted nodes don't send Path Refreshes until
> > >    either the recovery period expires or a RecoveryPath is received
> > >    from downstream.  In the case that the downstream node only supports
> > >    RFC 3473 restart (and so doesn't send RecoveryPaths), it may well
> > >    time out Path state at the same time as or very soon after the
> > >    recovery period expires.  Hence a dangerous timing window is
> > >    created.
> >
> > You have something here.
> > However, section 9.5.3 of RFC3473 does not say that the neighbor MUST
> > discard state that is not restored in the recovery time interval.
> > Presumably it would simply recommence waiting for state refresh and so
> > would time out after 3.5 refresh intervals from the end of the recovery
> > interval.
> >
>
>That would be sensible behavior, yes.  My concern (as I'm sure you realize)
>is that it won't happen like that in all cases in the real world.
>
> > Some compromise may be introduced here by noting that 3473 says that
> > Path state SHOULD be restored within 1/2 of the recovery time. So we
> > could follow this logic and use the first half of the time interval for
> > the RecoveryPath message and the second half for backwards compatible
> > recovery.
> >
> > On the other hand, I would prefer that this new capability (support for
> > RecoveryPath message) was signaled in the Restart_Capabilities object
> > so that the restarting node can know whether to expect to receive a
> > RecoveryPath or not.
> >
> > > As a potential solution to both problems I'd suggest that a restarting
> > > node receiving a Path message with a recovery label should always
> > > forward it immediately as well as it can, and include both a recovery
> > > label and (for back compatibility) a suggested label.  Similarly, it
> > > should forward RecoveryPath messages immediately as well as it can.
> > > I'd be happy to discuss any of this further.
> >
> > This sounds very dangerous.
> > "As well as it can" may include path computation which may pick a path
> > other than the one previously in use. Hence the new Path message will
> > be sent to a new neighbor. This disaster is no better than the problem
> > we are trying to solve.
> >
>
>Fine.  I had in mind that a node should only forward a Path message before
>receiving a RecoveryPath if it was sure that it could send it (as per
>RFC3473) to the right place and without a dangerous ERO.  In any case, I
>prefer the idea of being able to request RecoveryPath messages and it sounds
>like that will make recovery possible in more situations.
>
> > Cheers,
> > Adrian
> >
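The timing concern in this thread (when a restarting node may send its first Path refresh to a downstream neighbor that only supports RFC 3473 restart) can be illustrated with a small model. This is a hypothetical sketch, not taken from the draft or from RFC 3473: the names `Restart`, `send_time_draft`, `send_time_half_split`, `send_time_capability` and `slack` are invented for illustration. The capability flag stands in for the Restart_Capabilities signaling Adrian proposes. The two neighbor models are assumptions as well: a "strict" neighbor that discards unrestored state as soon as the recovery period ends (Nic's worst case), and a "lenient" one that times out 3.5 refresh intervals after the end of the recovery period (Adrian's presumption).

```python
"""Hypothetical timing sketch of the Path-refresh policies discussed in this thread.

All names are illustrative; none come from the draft or from RFC 3473.
Times are in seconds, measured from the moment the restarting node comes back up.
"""
from dataclasses import dataclass
from typing import Optional


@dataclass
class Restart:
    recovery_time: float                      # recovery period of the restarting node
    refresh_interval: float                   # RSVP refresh interval R
    recovery_path_at: Optional[float] = None  # RecoveryPath arrival time, None if never sent
    neighbor_advertises_recovery_path: bool = False  # assumed capability flag (Adrian's proposal)


def _wait_for_recovery_path(r: Restart, deadline: float) -> float:
    """Send Path when RecoveryPath arrives, or at the deadline if it has not arrived."""
    if r.recovery_path_at is not None and r.recovery_path_at <= deadline:
        return r.recovery_path_at
    return deadline


def send_time_draft(r: Restart) -> float:
    # The draft as Nic reads it: hold Path until RecoveryPath or end of recovery period.
    return _wait_for_recovery_path(r, r.recovery_time)


def send_time_half_split(r: Restart) -> float:
    # Adrian's compromise: first half of the period for RecoveryPath,
    # second half for RFC 3473-style (backwards compatible) recovery.
    return _wait_for_recovery_path(r, r.recovery_time / 2)


def send_time_capability(r: Restart) -> float:
    # Adrian's preference: only wait if the neighbor said it will send RecoveryPath.
    if not r.neighbor_advertises_recovery_path:
        return 0.0
    return _wait_for_recovery_path(r, r.recovery_time)


def slack(r: Restart, send_time: float, strict: bool,
          lifetime_refreshes: float = 3.5) -> float:
    """Seconds between our Path refresh and the neighbor's state timeout.

    strict=True: neighbor discards unrestored state when the recovery period ends.
    strict=False: neighbor times out lifetime_refreshes * R after the recovery period.
    A result <= 0 corresponds to the "dangerous timing window".
    """
    deadline = r.recovery_time
    if not strict:
        deadline += lifetime_refreshes * r.refresh_interval
    return deadline - send_time


if __name__ == "__main__":
    # RFC 3473-only downstream neighbor: no RecoveryPath, no capability advertised.
    legacy = Restart(recovery_time=60.0, refresh_interval=30.0)
    for name, policy in [("draft", send_time_draft),
                         ("half-split", send_time_half_split),
                         ("capability", send_time_capability)]:
        t = policy(legacy)
        print(f"{name:10s} send at {t:5.1f}s  "
              f"strict slack {slack(legacy, t, True):6.1f}s  "
              f"lenient slack {slack(legacy, t, False):6.1f}s")
```

With an assumed 60 s recovery period and 30 s refresh interval, the draft policy leaves zero slack against a strict legacy neighbor. The half-split compromise leaves 30 s, and capability signaling lets the restarting node send immediately, leaving the full 60 s. These numbers are arbitrary and only show the shape of the trade-off.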