Re: ACK of register and purge packets

shur@arch4.ho.att.com Thu, 16 November 1995 00:15 UTC

From: shur@arch4.ho.att.com
Date: Wed, 15 Nov 95 10:17:45 EST
Message-Id: <9511151517.AA01273@dahlia.ho.att.com>
To: luciani@nexen.com
Subject: Re: ACK of register and purge packets
Cc: bcole@cisco.com, rolc@nexen.com, gray@ctron.com, yakov@cisco.com

Jim, Eric:

One reason why we might be coming out differently on this one is that
we have different scenarios in mind. In some scenarios, the loss of a
purge may not be that serious. As you both point out, if the loss of a
purge message is not made known to the sender, the consequence may
simply be loss of connectivity to a particular destination. Given that
the entry will time out after (say) some tens of minutes (off the top
of my head), that is certainly not serious enough in and of itself to
warrant the additional Ack/Retx mechanism. It is reasonable to refer to
the purge mechanism as an optimization for this case.

In other situations, e.g., R2R, the purge may be generated in order to
avoid a forwarding loop.  If my router was in a forwarding loop I would
like this to be corrected within seconds. I think that relying on a
cache timeout mechanism, which may take tens of minutes, is not
adequate.  I also would not refer to the purge mechanism as an
"optimization" in this case. Rather it is a critical part of the
protocol, and it is important that the purge sender knows whether or
not the purge was received successfully. A Purge ack timeout should be
set at several seconds so that the router can recover from a lost purge
and break the forwarding loop within seconds rather than minutes. I
realize that this case (i.e., unsafe R2R) is not formally part of the
latest spec and is being analyzed separately.  I think that if the NHRP
protocol is "fine-tuned" (i.e., purge mechanism is weakened) without
consideration of the R2R cases under discussion, it is likely that we
will end up with diverging protocols, with one version where the
destination is a host or a "safe" router, and another for the unsafe
router case. This is undesirable, in my opinion.
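
To make the timing argument concrete, here is a minimal sketch in Python
of a purge sender that retransmits on an ack timeout. It is only an
illustration, not anything taken from the NHRP spec: the function name,
the send_and_wait transport hook, the 3 second timeout, the 5 attempt
limit and the 20 minute holding time are all assumptions on my part.

```python
from typing import Callable, Tuple

# Assumed cache holding time ("tens of minutes"); illustrative only.
CACHE_HOLDING_TIME_S = 20 * 60


def send_purge_with_retx(
    send_and_wait: Callable[[float], bool],
    ack_timeout_s: float = 3.0,
    max_attempts: int = 5,
) -> Tuple[bool, float]:
    """Send a purge, retransmitting until it is acked or attempts run out.

    send_and_wait(timeout) is a hypothetical transport hook: it sends one
    purge and returns True if an ack arrived within `timeout` seconds.
    Returns (acked, seconds spent waiting on expired ack timers).
    """
    waited = 0.0
    for _ in range(max_attempts):
        if send_and_wait(ack_timeout_s):
            return True, waited
        waited += ack_timeout_s  # ack timer expired with no ack: retransmit
    return False, waited


# Example: the first two purges are lost, the third one is acked.
outcomes = iter([False, False, True])
acked, waited = send_purge_with_retx(lambda timeout: next(outcomes))
print(acked, waited, CACHE_HOLDING_TIME_S)
```

With these assumed numbers the sender has spent 6 seconds on expired
timers when the third purge is acked, against 1200 seconds if it had to
wait for the stale entry to time out on its own.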

> > > > > It seems to me that the purge can/will be used to remove cache entries that
> > > > > could cause routing loops (in router-router case), or an invalid path (in
> > > > > the host-router case), when routing outside the cloud changes. A reliable
> > > > > purge is very important for these cases. Therefore I recommend keeping
> > > > > the Purge ack.
> > > > >
> > > > > David.
> > > >
> > > >       I think that you missed the point (maybe?); because of the scenario in
> > > > which a purge is not sent as a result of a component failure (crash), a
> > > > "reliable purge" is an unattainable goal.
> > > > This reduces the purge protocol to the status of "optimization" and it no longer
> > > > makes sense to require purge originators to repeat purge messages until they are
> > > > acknowledged.
> > > 
> > > This is a general argument that also applies to any other protocol that
> > > uses Ack/Retx. You are not suggesting that if it is possible that a network
> > > component that has to wait for an acknowledgment ever goes down, then Acks/Retx
> > > as a protocol mechanism should never be used?
> > >
> > 
> > Hmmm...  Looks to me like a bear trap but, what the heck, guess I'll stick my foot in it
> > anyways.  Certainly there are scenarios where a steady stream of messages looking for an
> > acknowledgement can pose a significant threat to a component's ability to recover from a
> > trauma which - coincidentally - initially precipitated the need for the messages.
> SSSSSSSSSNNNNNNNNNNNNNAAAAAAAAAAAAAAPPPPPPPPPPPPP!
> 
> > > > As one member pointed out, that makes
> > > > the case where a component is being taken off-line much simpler to handle as it is
> > > > not necessary for the component to wait for acknowledgement before continuing the
> > > > power-down sequence (this could be critical for devices equipped with short-term
> > > > back-up power provided for the express purpose of allowing for a graceful power-
> > > > down). If it is not necessary to repeat purge messages until acknowledged, then an
> > > > acknowledgement becomes superfluous.
> > > 
> > > I disagree. Acks are useful to speed up convergence.
> > 
> > I disagree (as well). Purge protocol speeds up convergence; acknowledgements MAY help.
> 
> Eric's comment is true IMHO as well.
> 
> > >                                       However this does not make Acks
> > > superfluous.
> > >
> > 
> > If you send a purge to a system component, it will either process it or it will not.  If
> > it does, and it sends you an acknowledgement, then you should not repeat the purge.  If
> > it does and it doesn't send you an acknowledgement, then it will probably not send one
> > for any new purge(s) sent to it (so repeating the purge is pointless). If it does not
> > process the purge, then either it did not receive the purge or it is too busy to process
> > it in a timely way.  Considering the potential consequences of the change that made a
> > purge necessary, this latter case may be more likely than might otherwise be the case.  
> > 
> > You may argue that the component should never be too busy to handle purge protocol - but
> > remember, purge protocol is only an optimization.  Consequently, it may come after
> > forwarding related functions in component priorities (forwarding being the component's
> > main purpose in life).  This could be particularly true if the component is obliged to
> > acknowledge every purge it processes (even those that no longer apply) - as opposed to
> > simply scanning it for applicability.  A bogged down component might need to acknowledge
> > many purge messages before receiving the one that fixes the cause of its busy-ness.
> > 
> > Faced with this dilemma, an implementation may "decide" to postpone (i.e. temporarily
> > discontinue) acknowledging purges while it scans each one coming in looking for specific
> > applicability.  In this case, re-sending all of the unacknowledged purges contributes to
> > the problem rather than to its solution.

I think people interested in buying NHRP products would feel safer
with acknowledged purges. I know I would :-)

> > 
> > How do YOU spell superfluous?
> 
> Given when a purge is likely to be sent,  e.g., during a topological change,
> it seems ill advised to me to dedicate the resources to acknowledge a protocol
> mechanism which is simply an optimization.  Even if I were to have an acknowledgement
> on a purge, it would be the very last thing to which I would attend during
> such an occurrence as topological change and I certainly would not keep a station
> up to attend to it in such an event. 

Requiring acks to purges does not mean that a station is not allowed to
continue to power down if it needs to. If the station that originated
the purge has to go down before an ack is received so be it. Things may
get unpleasant if this happens, but presumably the number of times a
station does not retx an un-acked purge because it is powering down, is
a small fraction of the number of times it does retx un-acked purges.
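
As a rough sketch of what I mean (Python again, purely illustrative): the
per-purge retransmission state below simply abandons an un-acked purge
when the station is powering down, so the ack requirement never holds up
the power-down sequence. The PendingPurge class, its fields and the
returned action strings are hypothetical names of mine, not part of any
spec or implementation.

```python
from dataclasses import dataclass


@dataclass
class PendingPurge:
    """Retransmission state for one un-acked purge (hypothetical)."""

    attempts_left: int = 4
    acked: bool = False

    def on_ack(self) -> None:
        """Record that the purge ack arrived."""
        self.acked = True

    def on_timer_pop(self, powering_down: bool) -> str:
        """Decide what to do when the ack timer expires."""
        if self.acked:
            return "done"
        if powering_down:
            return "abandon"  # do not hold up the power-down sequence
        if self.attempts_left == 0:
            return "give_up"  # fall back on the cache entry timing out
        self.attempts_left -= 1
        return "retransmit"


# Example: one retransmission, then the station starts powering down.
purge = PendingPurge(attempts_left=2)
print(purge.on_timer_pop(powering_down=False))
print(purge.on_timer_pop(powering_down=True))
```

In the normal case the timer pop leads to a retransmission; only in the
(presumably rare) power-down case is the purge dropped un-acked.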

> Remember that you have a timer
> waiting on that acknowledgement and timer pops are usually (not always) high
> priority events.  When that timer pops and grabs the cpu, I am very likely
> to just drop the event rather than dedicate resources to it during such a major
> network occurrence.
> 
> Regards,
> -- Jim Luciani
> __________________________________________________________________________
> James V. Luciani    Ascom Nexion                    voice: +1 508 266-3450
> luciani@nexen.com   289 Great Rd., Acton MA 01720   FAX: +1 508 266-2300
> 

David.