Re: [OSPF] Comments / questions on RFC3623 and Update GR part 1 : BMA env

Sujay Gupta,

	Let me give you some insight on part of this thinking.
	IMO, GR has opened a few holes in our converged-routing
	assumptions, and your update might be able to close
	a couple.

	There is a lot here!

	The summary: think multiple times before you exit
	helper mode prematurely based on a topology
	change (LSDB), and assume that you need router "X" no matter
	what.

	So...

		Why don't you tell me how an LSDB change is
		going to be communicated iff the env is
		BMA, GR router "X" is the DR, and no
		BDR exists?

		I consider this a huge hole..
		I don't really see a DR-other communicating
		  LSAs to other DR-others to allow them to
		  understand about a topology change.

	Second, let's assume that router "X" is an extremely
	important router within our area, with the
	connection to the internet, etc. If it weren't, we
	would / COULD just take it down.

	Thus, router "X" is a DR and is the ONLY path
	to a set of routes and end-systems within this area.

	Then let's conservatively assume that the topology change is
	the removal of a single router-LSA that has no effect
	after an SPF calculation.

	If router "X" has successfully entered its grace
	period / GR interval, aren't we ...

	   .... black-holing the routers, routes, end-systems
	   that reside on the other end of router "X"?

	   .... since router "X" CANNOT probe to verify that
	  a topology change has occurred. It is in non-stop
	  forwarding mode. It will still forward pkts to
	  the removed LSA's routes until...

	So what...
	  The pkts that are forwarded by router "X" to,
	  say, an unreachable Z will cause minimal harm. They will be
	  dropped and most likely an ICMP message will
	  be sent on the reverse path... The ICMP
	  message might be interpreted by X, but if it
	  is just forwarded,

	  So, what we have here is the inability to allow
	  router "X" to state that its routes are of a level
	  of importance such that most topology changes should
	  be balanced against a delay in making the change.

		New LSAs or changed LSAs that don't show up as
		primary LSAs after the SPF calculation COULD,
		in theory IMO, also be ignored. The originator
		could in theory check whether the LSA
		needs to be communicated.

		(This asks: just because a minor topology change
		has occurred, do we need to go thru all the
		work and possibly isolate router "X"'s
		dependencies?)
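
A minimal sketch of the check suggested above, assuming the helper can run SPF twice (with and without the removed router-LSA) and compare the results. The graph layout, the router names and the function names `first_hops`, `without_router` and `change_affects_forwarding` are all hypothetical, and equal-cost paths are not handled:

```python
import heapq

def first_hops(graph, source):
    """Dijkstra from source; returns {dest: (cost, first_hop)}."""
    best = {source: (0, None)}
    heap = [(0, source, None)]
    while heap:
        cost, node, hop = heapq.heappop(heap)
        if best[node][0] < cost:
            continue  # stale heap entry
        for nbr, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            new_hop = nbr if node == source else hop
            if nbr not in best or new_cost < best[nbr][0]:
                best[nbr] = (new_cost, new_hop)
                heapq.heappush(heap, (new_cost, nbr, new_hop))
    best.pop(source)
    return best

def without_router(graph, removed):
    """Copy of the graph with one router and all links to it removed."""
    return {n: {m: w for m, w in nbrs.items() if m != removed}
            for n, nbrs in graph.items() if n != removed}

def change_affects_forwarding(graph, source, removed):
    """True if removing `removed` changes cost or first hop to any other dest."""
    before = first_hops(graph, source)
    after = first_hops(without_router(graph, removed), source)
    before.pop(removed, None)  # the removed router itself is not compared
    return before != after

# Hypothetical area: Y is a helper, X is the restarting router.
graph = {
    "Y": {"X": 1, "A": 1},
    "X": {"Y": 1, "Z": 1},
    "Z": {"X": 1},
    "A": {"Y": 1, "C": 1},
    "C": {"A": 1},
}
print(change_affects_forwarding(graph, "Y", "C"))  # leaf router removed
print(change_affects_forwarding(graph, "Y", "A"))  # transit router removed
```

In this toy area, removing the leaf C leaves Y's paths to X, Z and A untouched, while removing A cuts off C; only the second case would justify disturbing router "X".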

	  If we feel that the new / changed / removed LSA
	 should be communicated ...
	   router X still isn't going to be converged
	   until it resyncs its LSDB.

	   If the LSA change results in a better route, is
	   it worth isolating router X's entities?

      Third, most large OSPF router vendors support the ability
      to set the LSA refresh time on non-DNA LSAs. In
      a normal environment the admin does this to decrease
      the number of LSA refreshes, rather than taking the drastic
      step of using DNA LSAs. Assuming a 45 min LSA refresh
      would then not be unreasonable.

	Going with this logic we have a few simple steps to
	take. 
		One is that all HELPERS that originate LSAs
		re-originate their LSAs upon first seeing a
		grace-LSA, in addition to router "X" doing the same.

		This allows us to take almost the full hour of the
		grace-time.

		If we could communicate a desired grace-time in
		excess of 1 hour, we could in theory send out
		DNA LSAs, which would allow a static environment
		to support grace times in excess of 1 hour.
		:: Part of this logic follows the demand circuit
		   support.
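
The timing argument above can be sketched as simple arithmetic. This assumes the worst case, where an LSA is due for refresh just as the grace period begins and no refresh reaches the LSDBs until it ends. MaxAge (3600 s) is the RFC 2328 architectural constant; the function names are hypothetical:

```python
MAX_AGE_S = 3600  # MaxAge, 1 hour (RFC 2328 architectural constant)

def seconds_until_max_age(ls_age_s):
    """Headroom left before an LSA of the given age reaches MaxAge."""
    return max(0, MAX_AGE_S - ls_age_s)

def ages_out_during_grace(refresh_interval_s, grace_period_s):
    """Worst case: the LSA is as old as the refresh interval when grace
    starts and is not refreshed again until grace ends."""
    return refresh_interval_s + grace_period_s >= MAX_AGE_S

# 45 min refresh, 30 min grace: the LSA can reach MaxAge before grace ends.
print(ages_out_during_grace(45 * 60, 30 * 60))   # True
# Default 30 min refresh, 30 min grace: exactly MaxAge, a border case.
print(ages_out_during_grace(30 * 60, 30 * 60))   # True
# Re-originated on first grace-LSA (age 0): almost the full hour is available.
print(seconds_until_max_age(0))                  # 3600
print(seconds_until_max_age(45 * 60))            # 900
```

Under these assumptions, re-originating on the first grace-LSA resets the age to 0, which is what leaves nearly the full hour of headroom.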

	Part of this is: can we, or should we, figure out a way to stop
	router "X" in its grace-period/interval from forwarding pkts
	based on obsolete OSPF control information?

	The other part says: why don't we just run in a degraded mode
	if we realize that router "X" is so important that exiting would
	isolate ourselves or so many routes/routers that we
	REALLY need to keep router "X" alive as long as feasibly
	possible?

	A third, minor item: I didn't see any suggestion of removing
	entities from the HELPERS' forwarding tables when the
	GR router is removed because of an LSDB change.

	Mitchell Erblich
	------------------

Hi Mitchell,
I have some comments inline;

On 11/14/06, Erblichs < erblichs@earthlink.net> wrote:

     Group,

             First, I have a few questions on the base and the update.
             These comments / questions pertain ONLY to BMA envs
             but are also relevant to other envs.

             FYI) IMO, helpers are defined by two major requirements:
                a) the ability to receive a grace-LSA before a hello
                   "new hello" and not restart the adj
                b) the ability of a helper "Y" to (re)send a router-LSA,
                   etc., that is in its retransmit-list upon
                   router "X" exiting graceful restart.

             If this "helper" split functionality is supported, then only
             a limited number of routers within an area need to be in
             Full-helper mode.

             Within BMA envs, the helper "b" functionality above
             is not really necessary on a DR-other for retransmit. We
             use the DR/BDR for retransmit support.

             IFF a GR router that is in its grace-period is the DR,
             a DR-other stops sending hellos, and there are no
             alternate paths thru another router for any forwarding
             data, then what benefit is there in prematurely ending helper
             mode? Basically, most algs can determine ahead of time whether
             the exiting router has contributed a primary link/path. If
             it hasn't, then its exit has no consequence. Yes, data pkts
             that have an intermediate dest will be forwarded one or two
             extra hops before they are dropped.

             (Update: 2.0 Action on route calculation)
             If a topology change occurs and the GR restarting router
             is identified as the next hop, by definition it is in non-stop
             forwarding mode and should be able to forward packets. If this
             is identified as a topology change and the ONLY route is thru
             this GR router, shouldn't packets be forwarded thru it anyway?
             Not doing so blackholes those routes, IMO.

Suj>> Here topology change also includes routes which are no longer valid.
If the LSA seeks a route to be removed and that route involves the GR
router, the helper exits, thus black-holing and circumventing the GR router
as for a router under plain restart. Note here that previously (RFC 3623) the
helper was exiting for all reasons, which has been curbed now. So, as you have
pointed out, only if the LSA affects a route which is ONLY through the
restarting router will the system exit, which makes sense.

	Yes, and no. Router X, once he has entered GR, is not in a mode
	to accept topology changes. He will forward no matter what, based
	on his old information. Thus, any routing inconsistencies on his part
	until his GR time period ends CAN result in him routing pkts
	improperly. This is basic IP forwarding.

	The originator of the change will be the one who can make the
	correct IP forwarding decisions when receiving or forwarding
	IP pkts based on the LSA change. However, because router X is no
	longer in sync, should he inform others not to send data pkts to,
	or accept data pkts from, router X?

	At the end of the grace-period or when router X re-enables
	his OSPF control code, he can be resync'ed. Until then, yes
	we SHOULD have some level of AI that determines the benefit
	of keeping router X in GR.

	Second, in most environments, most LSA changes do not
	result in any primary-path alterations.

             Thus,
             1) Helpers should be able to support a, without b.
                If this is the case, router "X", if it is a DR, should allow
                all the DR-others to support a, while only the BDR need be
                a "Full-Helper".

Suj>> I like your analysis in breaking up the helper job into partial and full.
IMO "Helper" is a behaviour of a router w.r.t. the GR operation, whereas
whether the helper is a "Full-helper" or a "Partial" one depends upon the
result of the election. In brief, a "Helper" is always expected to perform the
"Full-helper" features as you describe them; if it does less, I guess it is
plain lucky! (Also note that Helper support may be a feature switched on /
off by the admin, not a partial/full helper.)

             2) If router "X" is a DR-other, only one of the "DR or BDR"
                needs to be a Full-helper. All LSAs can be placed on the
                retransmit list for router "X" by the DR or BDR.
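
Points 1) and 2) can be sketched as a small role table. This is only an illustration of the proposed split, not anything defined in RFC 3623; the function name, the role strings, the arbitrary choice of the DR as the single Full-helper in case 2), and the conservative fallback for cases the thread does not cover are all assumptions:

```python
def helper_mode(restarting_role, helper_role):
    """Return "full" (requirements a and b) or "partial" (a only) for a
    neighbor of the restarting router "X" on a BMA segment."""
    if restarting_role == "DR":
        # Point 1: only the BDR needs to be a Full-helper.
        return "full" if helper_role == "BDR" else "partial"
    if restarting_role == "DROther":
        # Point 2: one of DR or BDR suffices; this sketch picks the DR.
        return "full" if helper_role == "DR" else "partial"
    # Not covered in the thread (e.g. a restarting BDR): stay conservative.
    return "full"

print(helper_mode("DR", "BDR"))          # full
print(helper_mode("DR", "DROther"))      # partial
print(helper_mode("DROther", "DR"))      # full
print(helper_mode("DROther", "DROther")) # partial
```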

             3) Topology change. An LSA aging out of the LSDB will not
                necessarily remove an element in the forwarding table?
                If this can be detected, should it still force the exit
                of a helper wrt graceful restart?

Suj>> As per the base OSPF operation, a MaxAge LSA in the LSDB should trigger
the router to flood the LSA (--> "which in turn should instigate the originator
of the LSA to resend a fresh copy; if that fails, it ends in removal of the
corresponding route everywhere!").
Thus, if the case happens as you have described, a GR with "strict LSA checking
enabled", i.e. catch all topology changes, should force the helper to exit,
as it has to flood this LSA to the GR router. Here the update draft would tell
you to exit only if the LSA somehow affects a route through the GR router;
else let peace prevail, no need to exit.
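
The two exit rules contrasted above can be sketched as follows. This is a simplified reading of the thread, not text from RFC 3623 or the update draft; the `Route` type, the function name and the `use_update_rule` flag are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str  # router ID of the next hop

def helper_should_exit(affected_routes, restarting_router,
                       strict_lsa_checking, use_update_rule):
    """Decide whether a helper leaves helper mode after a topology change.

    affected_routes: routes touched by the changed LSA (assumed known).
    """
    if not strict_lsa_checking:
        return False  # topology changes do not terminate helper mode
    if not use_update_rule:
        return True   # base behaviour as described: any change forces exit
    # Update behaviour as described: exit only if a touched route
    # goes through the restarting router.
    return any(r.next_hop == restarting_router for r in affected_routes)

changed = [Route("10.1.0.0/16", "A"), Route("10.2.0.0/16", "B")]
print(helper_should_exit(changed, "X", True, use_update_rule=False))  # True
print(helper_should_exit(changed, "X", True, use_update_rule=True))   # False
```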

             4) If router "X" is a DR and has entered graceful restart
                with no BDR present, should the DR-other routers allow the
                LSAs to go unrefreshed and age out because the
                grace period has not expired?
                How can one or more "HELPERS" force a new DR election?

Suj>> IMO the DR-other allows normal aging to continue. If the LSA ages out
before grace period expiry, the helpers would exit GR (see above), again
applying the same logic.

A new election would disturb all previous adjacencies, do we need it?

             5) Even with a 30 min max for the grace period, some routers
                may only retransmit once every 45 mins or so. How can we
                guarantee that those retransmitted LSAs are not aged out if
                router "X" is the DR and is down for a longer time?

                 Shouldn't receiving the first grace-LSA from a restarting
                 router initiate a re-send of the LSAs?

Suj>> With LSRefreshTime at 30 min, MaxAge double that (60 min), and the
max grace period the same as LSRefreshTime (30 min), I don't see how we reach
a stage where the problem will come up. You might want to consider some border
cases; I guess the previous arguments (see above) hold good there.

             Mitchell Erblich

Best Regards,
Sujay 

     _______________________________________________ 
     OSPF mailing list
     OSPF@ietf.org
     https://www1.ietf.org/mailman/listinfo/ospf
