Re: [Idr] draft-uttaro-idr-bgp-persistence-00

<> Wed, 02 November 2011 14:45 UTC




Please find some comments inline, marked [BD]


From: [] On Behalf Of Enke Chen
Sent: Friday, October 28, 2011 8:19 AM
Cc: List;
Subject: Re: [Idr] draft-uttaro-idr-bgp-persistence-00



My comments are inlined.

On 10/27/11 1:17 PM, UTTARO, JAMES wrote: 

GR is a solution that is essentially local in scope; it does not have
the ability to inform downstream speakers of the viability of routing
state from the point of a possible control plane failure. OTOH,
Persistence does propagate the condition of state. This provides
distinct advantages in terms of the customer's awareness of the SP's
control plane. One could imagine a customer receiving a STALE path and
responding by selecting a backup. Some of the extensions to this draft
that I have considered include colouring of STALE to indicate whether
the condition arises from a local (PE) or internal iBGP (RR) failure.
GR makes no distinction between STALE state and ACTIVE state. This can
lead to the STALE path still being preferred throughout the topology.
IMO this is incorrect behavior regardless of the comparison.
PERSISTENCE allows a customer to indicate which paths should be
candidates. Customers may want to immediately fail over to the backup
for some paths and not for others. GR is not capable of doing this; it
is all or nothing. The granularity is not sufficient; it needs to be at
the path level. There may even be a case for having even more
granularity, i.e. a per-path timer. GR is not capable of being extended
for either of these cases.
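Jim's per-path granularity point could be sketched roughly like this (a hypothetical illustration; the community values and data structures are assumptions, not taken from the draft):

```python
# Hypothetical sketch of per-path persistence marking. The community
# values and data structures are illustrative assumptions only.
from dataclasses import dataclass, field

PERSIST = "65535:1"         # assumed community value, for illustration
DO_NOT_PERSIST = "65535:2"  # assumed community value, for illustration

@dataclass
class Path:
    prefix: str
    communities: set = field(default_factory=set)
    stale: bool = False

def retain_on_failure(paths):
    """On a control plane failure, keep only the customer-marked candidates."""
    kept = []
    for p in paths:
        if PERSIST in p.communities and DO_NOT_PERSIST not in p.communities:
            p.stale = True   # retained state is marked STALE
            kept.append(p)
    return kept
```

The point is simply that retention is decided per path from the marking, rather than all-or-nothing per session as with GR.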

I am not sure how this path-level persistence would work operationally.
Without detailed information about a provider's network, how would a
customer know what kinds of failures and recovery they might
experience?   Consider the example of the simultaneous RR failures in
the draft: why wouldn't any customer want to protect against such
failures?   The end result could be that the PERSISTENCE flag is
always set, thus losing its significance.

Regarding the use of the STALE state vs ACTIVE state, clearly there is a

[BD] Clearly. Cf. my previous email.


   GR uses the stale routes in order to avoid forwarding churn, which
has been a critical requirement for a long time.   If there is a real
need for favoring an ACTIVE path over a STALE one in GR, it can be done
by a simple knob.

[BD] That "simple knob" would contradict the GR spec: "The router MUST
NOT differentiate between stale and other routing information during
forwarding."

As you know, BGP is full of knobs that adjust behaviors for different
needs :-)

[BD] Sure. But I fail to see why a set of knobs independently defined by
each implementation would be better than a draft describing the expected
behavior.
GR does not provide protection through successive restarts of the
session. I believe that if this occurs the state will be invalidated. So
for a session that is bouncing due to an overload condition, GR will not
provide the required protection.

This can be addressed by a simple knob to set the min stale timer for

[BD] I'm not sure I understand how changing the min stale timer would
address this. Again, this "simple knob" would contradict the GR spec:
"To deal with possible consecutive restarts, a route (from the peer)
previously marked as stale MUST be deleted."
GR does not employ a make-before-break strategy. All state is
invalidated first, then the newly learned state is processed. This leads
to routing churn, especially if the majority of the state is the same,
which I am pretty sure is the case.

Such behavior would be an implementation bug that needs to be fixed.
But it is not an issue with the protocol itself.

This is what we have in Section 4.2, Procedures for the Receiving
Speaker, of RFC 4724:


   The Receiving Speaker MUST replace the stale routes by the routing
   updates received from the peer.  Once the End-of-RIB marker for an
   address family is received from the peer, it MUST immediately remove
   any routes from the peer that are still marked as stale for that
   address family.
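The quoted procedure can be sketched in a few lines (a minimal illustration using a plain dict as the RIB, not an actual BGP implementation):

```python
# Minimal sketch of the Receiving Speaker procedure quoted above
# (RFC 4724, Section 4.2). The RIB here is a plain dict, for
# illustration only.

def mark_all_stale(rib):
    """On session loss with GR, retain routes but mark them stale."""
    for route in rib.values():
        route["stale"] = True

def on_update(rib, prefix, attrs):
    """An update on the re-established session replaces the stale route."""
    rib[prefix] = {"attrs": attrs, "stale": False}

def on_end_of_rib(rib):
    """On End-of-RIB, immediately remove routes still marked stale."""
    for prefix in [p for p, r in rib.items() if r["stale"]]:
        del rib[prefix]
```

So a route the peer re-advertises before End-of-RIB survives; anything not re-advertised is purged at that point.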


There are several possibilities for the premature purge of the stale
routes. For example, the "Forwarding State" flag was somehow not set
after the session was re-established, or the EOR was sent
prematurely.   Further investigation will be needed in order to identify
any possible implementation or config issues involved in your setup.

GR invalidates state in the case of a protocol error, i.e. a malformed
update will invalidate all of the state. This is not the desired
behavior.

It has been addressed by the following extension:

[BD] I would rather have said that this should be addressed by
draft-ietf-idr-optional-transitive-04 (as in the case of a malformed
update, draft-keyupate-idr-bgp-gr-notification would mostly have GR
restart the BGP session again and again indefinitely). However, as hard
as we try, I would not bet that draft-ietf-idr-optional-transitive will
cover all possible future cases. This is mostly a (good, IMHO) reaction
to past issues, rather than a solution to all future issues BGP could
have to face. OTOH, persistence could be the safety net to address the
unexpected issues.

GR is not specific as to which events invoke it. From my read of the
draft it is not clear whether holdtime expiration invokes GR. The draft
is unclear on this point.

I think that it is covered by the above extension.  If not, it should be
clarified.

[BD] Indeed. I think this should be clarified in the GR spec. Could you
please clarify this in draft-chen-idr-rfc4724bis-00.txt?

It is not clear to me how RRs and PEs differ in using GR. 

I think that there is a main difference when a RR is not in the
forwarding path.  In that case, the RR should always set the F bit in
the GR Capability so that its clients will continue forwarding after
they lose the sessions with RR.  It is a deployment issue, though.

The time that state can persist is limited to about 1 hour max. 

I think that you are talking about the "Restart Time" field, which has
12 bits and amounts to about 68 minutes.  The "Restart Time" is for the
session re-establishment.  It does not impact the duration for holding
stale routes after the session is re-established.

If the session does not get re-established in 68 minutes, the stale
routes would be purged.  That is a long time, isn't it?   However, if
one really wants to extend the session re-establishment time and
continue to hold stale routes, it can be done by a simple knob.

[BD] It's advertised in the GR capability and bounded by the 12-bit
field. Possibly you could have a local knob to override every GR
behavior, but I would not call this GR anymore.
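A quick check of the arithmetic behind the 12-bit bound (illustrative only):

```python
# The "Restart Time" field in the GR capability is 12 bits wide and
# expressed in seconds, which is where the ~68 minute bound comes from.
max_restart_time = 2 ** 12 - 1    # 4095 seconds
minutes = max_restart_time / 60   # about 68 minutes
```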

GR does detail the behavior where convergence is not achieved between
restarts. Similar to above.

The min stale timer knob can cover it (see above).

But did you mean "does not"?  We can certainly clarify this in 4724bis
if that is the case.

I do not believe that the current GR paradigm can be extended to cover
the majority of the cases above.

Except for the path-level persistence you mentioned, I believe GR will
be able to address all the other persistence requirements you listed,
with some simple knobs and some implementation enhancements.
[BD] What about calling this set of "simple knobs and implementation
enhancements" "persistence"?

       Jim Uttaro

Thanks.   -- Enke

-----Original Message-----
From: Enke Chen [] 
Sent: Wednesday, October 26, 2011 8:43 PM
Cc:; List; Enke Chen
Subject: Re: [Idr] draft-uttaro-idr-bgp-persistence-00
Hi, folks:
I have a hard time understanding what new problems (beyond GR) the
draft tries to solve :-(
If the concern is about the simultaneous RR failure as shown in the 
examples in Sect. 6 Application, that can be addressed easily using GR.

As the RRs are not in the forwarding path, the forwarding is not
impacted (and thus is preserved) during the restart of a RR.   The
Forwarding State bit (F) in the GR capability should always be set by
the RR when it is not in the forwarding path.

Also, in the case of simultaneous RR failure, I do not see why one would
want to retain some routes, but not others, using the communities
specified in the draft.  As the RRs are not in the forwarding path,
wouldn't it be better to retain all the routes on a PE/client?
As you might be aware, efforts have been underway to address issues with
GR found during implementation and deployment. They include the spec
respin, notification handling, and implementations.  If there are issues
in the GR area that are not adequately addressed, I suggest that we try
to address them in the GR respin if possible, instead of creating
another variation unnecessarily.
Thanks.   -- Enke
On 10/26/11 10:24 AM, Robert Raszuk wrote:

	When, during the design phase of a routing protocol, or of an
	extension or modification to it, one already realizes that
	enabling a feature may cause real network issues if not done
	carefully, that should trigger an alarm to rethink the solution
	and explore alternative approaches to the problem space.
	We as operators already have a hard time relating the enabling
	of a feature within our intradomain boundaries to make sure such
	a rollout is domain wide. Here you are asking for the same level
	of awareness across domain boundaries. This is practically
	unrealistic IMHO.
	Back to the proposal ... I think that if anything needs to be
	done, it is to employ per-prefix GR with longer and locally
	configurable timers. That would address information persistence
	across direct IBGP sessions. On the RRs use case of this draft
	we may perhaps agree, but I do not see a large enough
	probability for a correctly engineered RR plane to experience
	simultaneous multiple IBGP session drops. If that happens, the
	RR placement, platforms, or deployment model should be
	revisited.
	Summary .. I do not think that the IDR WG should adopt this
	document. Just adding a warning to the deployment section is not
	sufficient.
	Best regards,

		The introduction of this technology needs to be
		carefully evaluated when being deployed into the
		network. Your example clearly calls out how a series of
		independent designs can culminate in undesired
		behavior. Certainly the deployment of persistence on a
		router that has interaction with a router that does not
		needs to be understood by the network designer. The goal
		of this draft is to provide a fairly sophisticated tool
		that will protect the majority of customers in the event
		of a catastrophic failure. The premise being that the
		perfect is not the enemy of the good. I will add text to
		the deployment considerations section to better
		articulate this.
		Thanks, Jim Uttaro
		-----Original Message----- From: [] On Behalf Of Robert
		Raszuk Sent: Sunday, October 23, 2011 5:32 PM To: List
		Subject: [Idr] draft-uttaro-idr-bgp-persistence-00
		Actually when discussing this draft a new concern
		surfaced which I would like to get your answer on.
		The draft in section 4.2 says, as one of the forwarding
		rules:
		o  Forwarding to a "stale" route is only used if there
		are no other paths available to that route.  In other
		words an active path always wins regardless of path
		selection.  "Stale" state is considered to be less
		preferred when compared with an active path.
		In the light of the above rule let's consider a very
		simple case of a dual-PE attached site of an L3VPN
		service. Two CEs would inject into their IBGP mesh routes
		to the remote destination: one marked as STALE and one
		not marked at all. (Each CE is connected to a different
		PE and each PE RT imports only a single route to a remote
		hub headquarters to support geographic load balancing.)
		Let me illustrate:

		VPN Customer HUB
		   PE3      PE4
		       SP
		   PE1      PE2
		    |        |
		   CE1      CE2
		    |        |
		   1|        |10
		    |        |
		   R1 ------ R2
		         1

		CE1, CE2, R1, R2 are in an IBGP mesh. The IGP metrics of
		CE1-R1 and R1-R2 are 1 and of R2-CE2 is 10.
		Prefix X is advertised by the remote hub in the given
		VPN such that PE1's vrf towards CE1 only has X via PE3
		and PE2's vrf towards CE2 only has X via PE4.
		Let's assume the EBGP session from PE3 to the HUB went
		down, but the ethernet link is up; the next hop is in the
		RIB while the data plane is gone. Assume no real data
		plane validation either. /* That is why in my former
		message I suggested that data plane validation would be
		necessary */.
		R1 has X via PE1/S (stale) and X via PE2/A (active) - it
		understands STALE so it selects in its forwarding table
		the path via CE2.
		R2 has X via PE1/S (stale) and X via PE2/A (active) - it
		does not understand STALE, was never upgraded to support
		the forwarding rule stated above in the draft, and
		chooses X via CE1 (NH metric 2 vs 10).
		R1--R2 produce a data plane loop for as long as STALE
		paths are present in the system. Quite fun to
		troubleshoot too, as the PE3 injecting such STALE paths
		may be on the opposite side of the world.
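The R1/R2 interaction can be sketched as follows (a minimal illustration; router names and metrics follow the example above, the data structures are assumptions):

```python
# Sketch of the R1/R2 loop described above. R1 understands the STALE
# community and demotes the stale path; R2 selects purely on IGP metric.

paths = [
    {"via": "CE1", "stale": True,  "metric": {"R1": 1, "R2": 2}},
    {"via": "CE2", "stale": False, "metric": {"R1": 11, "R2": 10}},
]

def best_path(router, understands_stale):
    candidates = paths
    if understands_stale and any(not p["stale"] for p in paths):
        candidates = [p for p in paths if not p["stale"]]  # active wins
    return min(candidates, key=lambda p: p["metric"][router])["via"]
```

R1 (understands STALE) demotes the stale path and picks CE2; R2 (does not) picks CE1 on metric (2 vs 10). Each forwards toward the other: a data plane loop.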
		The issue occurs when some routers within the customer
		site are able to recognize the STALE transitive community
		and prefer non-stale paths in their forwarding planes (or
		BGP planes for that matter) while others are not, as well
		as when both stale and non-stale paths are present.
		Question 1: How do you prevent a forwarding loop in such
		a case?
		Question 2: How do you prevent a forwarding loop in the
		case when the customer has backup connectivity to his
		sites via a different VPN provider, yet routers in his
		site only partially understand the STALE community and
		only partially follow the forwarding rules?
		In general, as the rule is about mandating some
		particular order of path forwarding selection, what is
		the mechanism in distributed systems like today's routing
		to be able to achieve any assurance that such a rule is
		active and enforced across _all_ routers behind EBGP
		PE-CE L3VPN boundaries in customer sites?
		Best regards, R.
		-------- Original Message -------- Subject: [Idr]
		draft-uttaro-idr-bgp-persistence-00 Date: Sat, 22 Oct
		2011 00:23:55 +0200 From: Robert Raszuk <>
		Reply-To: To: List <>
		I have read the draft and have one question and one
		comment.
		What is the point of defining the DO_NOT_PERSIST
		community? In other words, why would not having the
		PERSIST community set not mean the same as having the
		path marked with DO_NOT_PERSIST?
		I found the below statement in section 4.2:
		o  Forwarding must ensure that the Next Hop to a "stale"
		route is
		Of course I agree. But since we are stating the obvious
		in that section, I think it would be good to also
		explicitly state in the best path selection that the next
		hop to a STALE best path must be
		However, sessions, especially those between loopbacks, do
		not go down for no reason. Most likely there is a network
		problem which may have caused those sessions to go down.
		It is therefore likely that the LDP session also went
		down between some of the LSRs in the data path, and that
		in spite of having the paths in BGP and next hops in the
		IGP, the LSP required for both quoted L2/L3VPN
		applications is broken. That may particularly happen when
		the network chooses to use independent control mode for
		label allocation.
		I would suggest to at least add a recommendation
		statement to the document that during best path
		selection, especially for stale paths, the validity of
		the required forwarding paradigm to the next hop of the
		stale paths should be verified.
		For example using techniques as described in:
		Best regards, R.
		_______________________________________________ Idr
mailing list