Re: [Idr] draft-uttaro-idr-bgp-persistence-00

Hi Jim,

You are correct that GR does not, by default, propagate information
about which paths should be considered "suspicious". However, as
mentioned at the mic during the session, we need a way to modify best
path selection criteria universally, as opposed to continuing to invent
new tools that solve point problems and punch holes in the various
implementations of BGP best path selection.

The chairs in fact responded that we will work on this when the right
time comes.

Back to the persistence draft ... GR can address 90% of the persistence
draft requirements. Propagating information about "suspicious" paths, if
we all agree that this is a good idea at all, can be done today by
setting the lowest local pref, using the cost community, or marking the
path with a community so that your ebgp peers can interpret it correctly.
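
To make the point concrete, here is a minimal sketch of how existing
attributes could carry the signal. It is an illustration only: the Path
class, the SUSPICIOUS community value 65000:999 and the two-key selection
are assumptions of mine, not anything defined in the persistence draft or
taken from an implementation, and real best path selection has many more
steps.

```python
from dataclasses import dataclass

# Hypothetical community value used to tag "suspicious" paths (an assumption,
# not a value assigned by any draft or registry).
SUSPICIOUS = "65000:999"


@dataclass(frozen=True)
class Path:
    next_hop: str
    local_pref: int = 100
    communities: frozenset = frozenset()


def mark_suspicious(path):
    """Return a copy with the lowest LOCAL_PREF and the marker community."""
    return Path(path.next_hop, 0, path.communities | {SUSPICIOUS})


def best_path(paths):
    """Simplified selection: highest LOCAL_PREF, then lowest next hop name."""
    return min(paths, key=lambda p: (-p.local_pref, p.next_hop))
```

With one marked and one unmarked path the unmarked one wins, and the
marked path is still usable when it is the only one left.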

Why do we need a new way to signal this?

Also, I think the current -00 version has serious issues, as already
pointed out on the list. We need to wait until -01 is published to see
whether those issues have been fixed.

Best regards,
R.

> All,
>
> First let me apologize for not being able to attend the IETF..
>
> After watching jabber it seems that the presentation of persistence
> degraded to something in re how we can apply multiple Band-Aids,
> patches, knobs etc... to extend GR to cover the Persistence draft..
> IMO GR is a subset of the Persistence functionality and may be
> therefore subsumed into the Persistence draft. Philosophically GR
> makes the session the invariant and all behavior that follows is
> based upon this premise.. As stated in the slides, many of the new
> services have much looser association between the control and
> forwarding planes, e.g. L2VPN, L3VPN, 3107 etc...
>
> Again let me clarify, when I looked at GR earlier this year it became
> clear that the basic philosophy, and rules stating when to persist,
> how long, triggers to stop would not meet the requirements I have..
> From a philosophical point GR assumes that the paths that are
> persisting in forwarding should maintain their preference throughout
> the routing topology. Why? This seems beyond prescriptive as it does
> not provide the network operators with any flexibility as to how
> state learned over a compromised control plane may be viewed/used by the
> downstream routing topology including customers. From my reading of
> the draft there is no way to allow operators to persist, not-persist,
> or de pref the state (stale). Do I have this wrong?  I do not think
> so.. So if we want to incorporate this ability into GR it would
> require a basic philosophical change.. of course even if this is done
> there is no way for a peer to pre-program  paths it has advertised to
> be treated if there is a failure.. So, even if we perform major
> surgery there is still no way to accomplish this beyond unique
> filters and maybe CVs.. We could call them STALE, PERSIST and
> DO_NOT_PERSIST and then build routing policy across all of our BGP
> speakers.. Then we can spend lots of time making sure that this
> bottom approach is consistent across the entire topology, and
> coordinate with customers also..
>
> As I mentioned in a previous note ( see below ) GR does not meet
> requirements in terms of triggers that terminate GR persistence,
> timers etc...  personally I am not that interested in  a solution
> that fundamentally does not meet my requirements.. Most of the
> responses I have received are that GR can be extended. I do not think
> it can ever meet the requirements above and will require major
> surgery to fix the triggers, timers etc.. So the approach should be
> to fold the subset of GR functionality into the larger Persistence
> draft..
>
> Jim Uttaro
>
>
> Enke,
>
> GR is a solution that is essentially local in scope; it does not have
> the ability to inform downstream speakers of the viability of routing
> state from the point of possible control plane failure. OTOH
> Persistence does propagate the condition of state. This provides
> distinct advantages in terms of customers' awareness of the SP's
> control plane. One could imagine a customer receiving a STALE path
> and responding by selecting a backup. Some of the extensions to this
> draft that I have considered include colouring of STALE to indicate
> whether the condition arises from a local ( PE ) or internal iBGP
> ( RR ) failure..
>
> GR makes no distinction between STALE state and ACTIVE state.. This can
> lead to the STALE path still being preferred throughout the topology.
> IMO this is incorrect behavior regardless of the comparison.
>
> PERSISTENCE allows for a customer to indicate which paths should be
> candidates. Customers may want to immediately failover to the backup
> for some paths and not for others. GR is not capable of doing this; it
> is all or nothing. The granularity is not sufficient. It needs to be
> at the path level. There may even be a case for having even more
> granularity, i.e. a per-path timer.. GR is not capable of being
> extended for either of these cases.
>
> GR does not provide protection through successive restarts of the
> session. I believe that if this occurs the state will be invalidated.
> So for a session that is bouncing due to overload condition GR will
> not provide the required protection
>
> GR does not employ a make before break strategy. All state is
> invalidated first then the newly learned state is processed. This
> leads to routing churn especially if the majority of the state is the
> same which I am pretty sure is the case
>
> GR invalidates state due to the case of protocol error i.e A
> malformed update will invalidate all of the state. This is not the
> desired behavior.
>
> GR is not specific as to which events invoke it or not. From my read
> on the draft it is not clear if holdtime expiration invokes GR or
> not.. The draft is unclear.
>
> It is not clear to me how RRs and PEs differ in using GR.
>
> The time that state can persist is limited to about 1 hour max.
>
> GR does detail the behavior where convergence is not achieved between
> restarts.. Similar to above..
>
> I do not believe that the current GR paradigm can be extended to
> cover the majority of the cases above.
>
> Thanks, Jim Uttaro
>
>
> From: UTTARO, JAMES
> Sent: Tuesday, November 01, 2011 11:20 AM
> To: 'Enke Chen'
> Cc: robert@raszuk.net; idr@ietf.org List
> Subject: RE: [Idr] draft-uttaro-idr-bgp-persistence-00
>
> Enke,
>
> Comments in-Line..
>
> Thanks, Jim Uttaro
>
> From: Enke Chen [mailto:enkechen@cisco.com]
> Sent: Friday, October 28, 2011 2:19 AM
> To: UTTARO, JAMES
> Cc: robert@raszuk.net; idr@ietf.org List; Enke Chen
> Subject: Re: [Idr] draft-uttaro-idr-bgp-persistence-00
>
> Jim,
>
> My comments are inlined.
>
> On 10/27/11 1:17 PM, UTTARO, JAMES wrote:
>
> Enke,
>
>
>
> GR is a solution that is essentially local in scope; it does not have
> the ability to inform downstream speakers of the viability of routing
> state from the point of possible control plane failure. OTOH
> Persistence does propagate the condition of state. This provides
> distinct advantages in terms of customers' awareness of the SP's
> control plane. One could imagine a customer receiving a STALE path
> and responding by selecting a backup. Some of the extensions to this
> draft that I have considered include colouring of STALE to indicate
> whether the condition arises from a local ( PE ) or internal iBGP
> ( RR ) failure..
>
>
>
> GR makes no distinction between STALE state and ACTIVE state.. This can
> lead to the STALE path still being preferred throughout the topology.
> IMO this is incorrect behavior regardless of the comparison.
>
>
>
> PERSISTENCE allows for a customer to indicate which paths should be
> candidates. Customers may want to immediately failover to the backup
> for some paths and not for others. GR is not capable of doing this; it
> is all or nothing. The granularity is not sufficient. It needs to be
> at the path level. There may even be a case for having even more
> granularity, i.e. a per-path timer.. GR is not capable of being
> extended for either of these cases.
>
> I am not sure how this path level persistence would work
> operationally.   Without the detailed information of a provider's
> network, how would a customer know what kind of failures and recovery
> that they might experience? Consider the example of the
> simultaneous RR failures in the draft: why wouldn't any customer
> want to protect against such failures? The end result could
> be that the PERSISTENCE flag is always set, thus losing its
> significance. [Jim U>] One ex would be customers who create multiple
> VPNs over different SPs.. A customer may want to take advantage of
> the knowledge that a control plane failure has occurred and migrate
> the traffic to the backup. This could be done at a path granularity
> by use of the DO_NOT_PERSIST CV. . We as SPs want to provide our
> customers with the tools needed to manage their VPNs and not
> prescribe a one size fits all solution.
>
>
> Regarding the use of the STALE state vs ACTIVE state, clearly there
> is a tradeoff.   GR uses the stale routes in order to avoid
> forwarding churns, which has been a critical requirement for a long
> time.   If there is a real need for favoring an ACTIVE one over a
> STALE one in GR, it can be done by a simple knob. [Jim U>] The
> current draft has no ability to inform downstream speakers of whether
> or not a path is STALE or ACTIVE. The knob may be simple but a lot of
> machinery would have to be built. This is one of the big reasons for
> the PERSIST draft. I do not understand the routing churn part in the
> context of vpnv4, vpnv2, 3107 etc... maybe the GR solution was
> constructed as a solution that primarily speaks to eBGP IPV4
> connections for the IPV4 AF ( Internet ).. I could understand that..
>
>
> As you know, BGP is full of knobs that adjust behaviors for different
> needs :-) [Jim U>] More Knobs..
>
>
>
>
>
>
>
> GR does not provide protection through successive restarts of the
> session. I believe that if this occurs the state will be invalidated.
> So for a session that is bouncing due to overload condition GR will
> not provide the required protection
>
> This can be addressed by a simple knob to set the min stale timer for
> GR. [Jim U>] And yet more knobs
>
>
>
>
>
>
>
> GR does not employ a make before break strategy. All state is
> invalidated first then the newly learned state is processed. This
> leads to routing churn especially if the majority of the state is the
> same which I am pretty sure is the case
>
> Such behavior would be an implementation bug that needs to be fixed.
> But it is not an issue with the protocol itself.
>
> This is what we have in 4.2. Procedures for the Receiving Speaker,
> RFC 4724:
>
> ---
>
> The Receiving Speaker MUST replace the stale routes by the routing
>
> updates received from the peer.  Once the End-of-RIB marker for an
>
> address family is received from the peer, it MUST immediately remove
>
> any routes from the peer that are still marked as stale for that
>
> address family. [Jim U>] This does not address the lack of clarity
> about make before break.. it only states that the speaker must immediately remove
> routes marked as stale. It should state that any paths that are
> learned which are the same as the STALE paths should not force the
> forwarding plane to be re-programmed for those paths.. This should be
> made clear and in general is good practice to avoid churn..
>
>
> There are several possibilities for the premature purge of the stale
> routes. For example, the "Forwarding state" flag was somehow not set
> after the session was re-established, or the EOR was sent
> prematurely.   Further investigation will be needed in order to
> identify any possible implementation or config issues involved in
> your setup. [Jim U>] More moving parts to worry about..
>
>
>
>
>
>
>
> GR invalidates state due to the case of protocol error i.e A
> malformed update will invalidate all of the state. This is not the
> desired behavior.
>
> It has been addressed by the following extension:
>
> http://datatracker.ietf.org/doc/draft-keyupate-idr-bgp-gr-notification/
>
>  [Jim U>] A few comments here.. I do not understand, the draft does
> not clarify that the only thing that will force a tear down is the
> cease subcode and a hard reset error code.. is the intention that
> this is the only thing that will tear it down? I guess I would like
> to see which things will and will not force a session termination in
> the original draft.. Like
>
>
> -          Holdtime Expiration
>
> -          Malformed Update
>
> -          Consecutive Restarts.. So what does this exactly mean
> "As part of this extension, possible consecutive restarts SHOULD NOT
> delete a route (from the peer) previously marked as stale, until
> required by rules mentioned in [RFC4724]." Possible consecutive
> restarts means what? I really need clarity on this whole notion of
> when is a session truly invalidated.
>
> What is the purpose of the following text?
>
> Once the session is re-established, both BGP speakers MUST set their
> "Forwarding State" bit to 1 if they want to apply planned graceful
> restart.  The handling of the "Forwarding State" bit should be done
> as specified by the procedures of the Receiving speaker in [RFC4724]
> are applied.
>
>
>
>
>
>
>
> GR is not specific as to which events invoke it or not. From my read
> on the draft it is not clear if holdtime expiration invokes GR or
> not.. The draft is unclear.
>
> I think that it is covered by the above extension.  If not, it should
> be clarified. [Jim U>] I did not see it..
>
>
>
>
>
> It is not clear to me how RRs and PEs differ in using GR.
>
> I think that there is a main difference when a RR is not in the
> forwarding path.  In that case, the RR should always set the F bit in
> the GR Capability so that its clients will continue forwarding after
> they lose the sessions with RR.  It is a deployment issue, though.
> [Jim U>] Yes.. Again from an operations perspective I have to deploy
> technology differently in different parts of the network across
> multiple vendors. This is generally not a desired starting point for
> the successful deployment of new technology.. I want solutions that
> are generic and simple to deploy.
>
>
>
>
>
>
>
> The time that state can persist is limited to about 1 hour max.
>
> I think that you are talking about the "Restart time" field which has
> 12 bits and amount to about 68 minutes.  The "Restart time" is for
> the session re-establishment.  It does not impact the duration for
> holding stale routes after the session is re-established. [Jim U>]
> But if the session does not become re-established then the state is
> invalidated as the session terminates with an error code that GR will
> not persist through..
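
As a quick check of the 68-minute figure quoted above: the Restart Time
field in the GR capability is 12 bits wide and expressed in seconds, so
its ceiling follows from simple arithmetic. The variable names below are
mine and purely illustrative.

```python
# Width of the Restart Time field in the Graceful Restart capability (RFC 4724).
RESTART_TIME_BITS = 12

# Largest value that fits in the field, in seconds.
max_restart_seconds = (1 << RESTART_TIME_BITS) - 1

# The same ceiling expressed in minutes.
max_restart_minutes = max_restart_seconds / 60

print(max_restart_seconds, max_restart_minutes)
```

This bounds only the wait for session re-establishment, which is the
distinction being drawn in the exchange above.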
>
>
> If the session does not get re-established in 68 minutes, the stale
> routes would be purged.  That is a long time, isn't it?   However, if
> one really wants to extend the session re-establishment time and
> continue to hold stale routes, it can be done by a simple knob. [Jim
> U>] And yet even more knobs
>
>
>
>
>
>
>
> GR does detail the behavior where convergence is not achieved between
> restarts.. Similar to above..
>
> The min stale timer knob can cover it (see above).
>
> But did you mean "does not"?  We can certainly clarify in 4724bis if
> that is the case. [Jim U>] If convergence is not achieved what is the
> behavior. I could not determine from the draft..
>
>
>
>
>
>
>
> I do not believe that the current GR paradigm can be extended to
> cover the majority of the cases above.
>
> Except for the path level persistence you mentioned, I believe the GR
> will be able to address all other persistence requirements you
> listed, with some simple knobs and some implementation enhancements.
> [Jim U>] IMO GR was originally designed to prevent churn due to
> intermittent failure on an eBGP session for the IpV4 AF.. I do not
> want to have different knobs and implementation enhancements to solve
> the basics of persistence.. Regardless of that it does not inform the
> topology of the state of a path in re the control plane it was
> learned over so there can be no independent decisions about the value
> of a given path by different customers/providers.. This is required
> for my applications..
>
>
>
>
>
>
>
> Thanks,
>
> Jim Uttaro
>
> Thanks.   -- Enke
>
>
>
>
>
>
> -----Original Message-----
> From: Enke Chen [mailto:enkechen@cisco.com]
> Sent: Wednesday, October 26, 2011 8:43 PM
> To: UTTARO, JAMES
> Cc: robert@raszuk.net; idr@ietf.org List; Enke Chen
> Subject: Re: [Idr] draft-uttaro-idr-bgp-persistence-00
>
>
>
> Hi, folks:
>
>
>
> I have a hard time understanding what new problems (beyond GR) the
> draft tries to solve :-(
>
> If the concern is about the simultaneous RR failure as shown in the
> examples in Sect. 6 Application, that can be addressed easily using
> GR. As the RRs are not in the forwarding path, forwarding is not
> impacted (and thus is preserved) during the restart of an RR. The
> Forwarding State bit (F) in the GR capability should always be set by
> the RR when it is not in the forwarding path.
>
> Also, in the case of simultaneous RR failure, I do not see why one
> would want to retain some routes, but not others, using the
> communities specified in the draft. As the RRs are not in the
> forwarding path, wouldn't it be better to retain all the routes on a
> PE/client?
>
> As you might be aware, efforts have been underway to address issues
> with GR found during implementation and deployment. They include the
> spec respin, notification handling, and implementations. If there are
> issues in the GR area that are not adequately addressed, I suggest
> that we try to address them in the GR respin if possible, instead of
> creating another variation unnecessarily.
>
>
>
> Thanks.   -- Enke
>
>
>
>
>
> On 10/26/11 10:24 AM, Robert Raszuk wrote:
>
> Jim,
>
>
>
> When, during the design phase of a routing protocol (or of an
> extension or modification to it), one already realizes that enabling
> the feature may cause real network issues if not done carefully,
> that should trigger the alarm to rethink the solution and explore
> alternative approaches to the problem space.
>
> We as operators already have a hard time making sure that a feature
> enabled within our intradomain boundaries is rolled out network
> wide. Here you are asking for the same level of awareness across
> ebgp boundaries. This is practically unrealistic IMHO.
>
> Back to the proposal ... I think that if anything needs to be done,
> it is to employ per prefix GR with a longer and locally configurable
> timer. That would address information persistence across direct IBGP
> sessions.
>
> On the RR use case of this draft we may perhaps agree to disagree,
> but I do not see a large enough probability of a correctly engineered
> RR plane experiencing simultaneous multiple ibgp session drops. If
> that happens, the RR placement, platforms or deployment model should
> be re-engineered.
>
> Summary .. I do not think that IDR WG should adopt this document.
> Just adding a warning to the deployment section is not sufficient.
>
>
>
> Best regards,
>
> R.
>
>
>
>
>
> Robert,
>
>
>
> The introduction of this technology needs to be carefully evaluated
> when being deployed into the network. Your example clearly calls out
> how a series of independent designs can culminate in incorrect
> behavior. Certainly the deployment of persistence on a router that
> interacts with a router that does not support it needs to be clearly
> understood by the network designer. The goal of this draft is to
> provide a fairly sophisticated tool that will protect the majority of
> customers in the event of a catastrophic failure.. The premise being
> the perfect is not the enemy of the good.. I will add text in the
> deployment considerations section to better articulate that..
>
>
>
> Thanks, Jim Uttaro
>
>
>
> -----Original Message-----
> From: idr-bounces@ietf.org [mailto:idr-bounces@ietf.org] On Behalf Of Robert Raszuk
> Sent: Sunday, October 23, 2011 5:32 PM
> To: idr@ietf.org List
> Subject: [Idr] draft-uttaro-idr-bgp-persistence-00
>
>
>
> Authors,
>
>
>
> Actually when discussing this draft a new concern surfaced which I
> would like to get your answer on.
>
> The draft in section 4.2 says as one of the forwarding rules:
>
> o  Forwarding to a "stale" route is only used if there are no other
>    paths available to that route.  In other words an active path always
>    wins regardless of path selection.  "Stale" state is always
>    considered to be less preferred when compared with an active path.
>
> In light of the above rule let's consider a very simple case of a
> dual PE attached site of an L3VPN service. Two CEs would inject into
> their IBGP mesh routes to the remote destination: one marked as STALE
> and one not marked at all. (Each CE is connected to a different PE,
> and each PE RT imports only a single route to a remote hub
> headquarters to support geographic load balancing).
>
>
>
> Let me illustrate:
>
>
>
> VPN Customer HUB
>
>   PE3      PE4
>        SP
>   PE1      PE2
>    |        |
>    |        |
>   CE1      CE2
>    |        |
>   1|        |10
>    |        |
>   R1 ------ R2
>        1
>
>
>
> CE1, CE2, R1 and R2 are in an IBGP mesh. The IGP metrics of CE1-R1
> and R1-R2 are 1, and that of R2-CE2 is 10.
>
> Prefix X is advertised by the remote hub in the given VPN such that
> PE1's vrf towards CE1 only has X via PE3 and PE2's vrf towards CE2
> only has X via PE4.
>
> Let's assume the EBGP session from PE3 to HUB went down, but the
> ethernet link is up and the next hop is in the RIB while the data
> plane is gone. Assume no real data plane validation either. /* That
> is why in my former message I suggested that data plane validation
> would be necessary */.
>
> R1 has X via PE1/S (stale) and X via PE2/A (active) - it understands
> STALE so selects in its forwarding table the path via CE2.
>
> R2 has X via PE1/S (stale) and X via PE2/A (active) - it does not
> understand STALE, was never upgraded to support the forwarding rule
> stated above in the draft, and chooses X via CE1 (NH metric 2 vs 10).
>
> R1--R2 produce a data plane loop as long as STALE paths are present
> in the system. Quite fun to troubleshoot too, as the PE3 injecting
> such STALE paths may be on the opposite side of the world.
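
The loop described above can be reproduced with a small model. This is
a sketch under stated assumptions: plain hop-by-hop forwarding inside the
customer site, IGP costs derived from the metrics in the diagram, and
hypothetical helper names (choose_exit, trace) that come from me and not
from the draft.

```python
# IGP next hop and cost from each internal router toward each exit CE,
# derived from the metrics CE1-R1 = 1, R1-R2 = 1, R2-CE2 = 10.
IGP_NEXT_HOP = {
    "R1": {"CE1": "CE1", "CE2": "R2"},
    "R2": {"CE1": "R1", "CE2": "CE2"},
}
IGP_COST = {
    "R1": {"CE1": 1, "CE2": 11},
    "R2": {"CE1": 2, "CE2": 10},
}
# The path to X learned via CE1 (PE1) is the one marked STALE.
STALE_EXITS = {"CE1"}


def choose_exit(router, understands_stale):
    """Pick the exit CE for prefix X as seen by one router."""
    exits = ["CE1", "CE2"]
    if understands_stale:
        # A stale path is used only if no active path exists.
        active = [e for e in exits if e not in STALE_EXITS]
        exits = active or exits
    return min(exits, key=lambda e: IGP_COST[router][e])


def trace(start, stale_aware):
    """Follow a packet hop by hop; return (hops, loop_detected)."""
    hops = [start]
    node = start
    while node in IGP_NEXT_HOP:
        exit_ce = choose_exit(node, node in stale_aware)
        node = IGP_NEXT_HOP[node][exit_ce]
        if node in hops:
            return hops + [node], True
        hops.append(node)
    return hops, False
```

Calling trace("R1", {"R1"}) models the mixed deployment, with only R1
honouring STALE, and reports a loop between R1 and R2. Passing
{"R1", "R2"} or an empty set models a consistent deployment, and the
packet leaves via a CE in both cases.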
>
>
>
> The issue occurs when some routers within the customer site are able
> to recognize the STALE transitive community and prefer non-stale
> paths in their forwarding planes (or BGP planes for that matter)
> while others are not, and when both stale and non-stale paths are
> present.
>
> Question 1: How do you prevent a forwarding loop in such a case?
>
> Question 2: How do you prevent a forwarding loop in the case when the
> customer has backup connectivity to his sites, or connectivity via a
> different VPN provider, yet the routers in his site only partially
> understand the STALE community and only partially follow the
> forwarding rules?
>
> In general, as the rule is about mandating a particular order of
> path forwarding selection, what is the mechanism in distributed
> systems like today's routing to achieve any assurance that such a
> rule is active and enforced across _all_ routers behind EBGP PE-CE
> L3VPN boundaries in customer sites?
>
>
>
> Best regards, R.
>
>
>
>
>
> -------- Original Message --------
> Subject: [Idr] draft-uttaro-idr-bgp-persistence-00
> Date: Sat, 22 Oct 2011 00:23:55 +0200
> From: Robert Raszuk <robert@raszuk.net>
> Reply-To: robert@raszuk.net
> To: idr@ietf.org List <idr@ietf.org>
>
>
>
> Hi,
>
>
>
> I have read the draft and have one question and one observation.
>
> Question:
>
> What is the point of defining the DO_NOT_PERSIST community? In other
> words, why would not having the PERSIST community set not mean the
> same as having the path marked with DO_NOT_PERSIST?
>
> Observation:
>
> I found the below statement in section 4.2:
>
> o  Forwarding must ensure that the Next Hop to a "stale" route is
>    viable.
>
> Of course I agree. But since we are stating the obvious in the
> forwarding section, I think it would be good to also state explicitly
> in the best path selection that the next hop to a STALE best path
> must be valid.
>
> However, sessions, especially those between loopbacks, do not go
> down for no reason. Most likely there is a network problem which may
> have caused those sessions to go down. It is therefore likely that an
> LDP session also went down between some of the LSRs in the data path
> and that, in spite of having the paths in BGP and next hops in IGP,
> the LSP required for both quoted L2/L3VPN applications is broken.
> That may particularly happen when the network chooses to use
> independent control mode for label allocation.
>
> I would suggest at least adding a recommendation to the document
> that during best path selection, especially for stale paths, the
> validity of the required forwarding paradigm to the next hop of stale
> paths should be verified.
>
> For example using techniques as described in:
> draft-ietf-idr-bgp-bestpath-selection-criteria
>
>
>
> Best regards, R.
>
>
>
>
>
> _______________________________________________ Idr mailing list
>
> Idr@ietf.org<mailto:Idr@ietf.org>
> https://www.ietf.org/mailman/listinfo/idr