Re: [Idr] draft-uttaro-idr-bgp-persistence-00

<bruno.decraene@orange.com> Tue, 15 November 2011 08:01 UTC

Date: Tue, 15 Nov 2011 09:01:24 +0100
Message-ID: <FE8F6A65A433A744964C65B6EDFDC240029DD7D4@ftrdmel0.rd.francetelecom.fr>
In-Reply-To: <4EC21062.5020504@raszuk.net>
From: <bruno.decraene@orange.com>
To: <robert@raszuk.net>, <ju1738@att.com>
Cc: idr@ietf.org
Subject: Re: [Idr] draft-uttaro-idr-bgp-persistence-00

Hi Robert,

>
>Hi Jim,
>
>You are correct that GR does not allow to propagate information by
>default on what paths should be called as "suspicious".

Yes, that's part of its original DNA: to locally conceal a BGP session
failure which is expected to restart soon (or has already restarted)

> However as
>mentioned at the mic during the session we need to have a way to modify
>best path selection criteria universally as opposed to continue to
>invent new tools to solve point problems and punch holes in the various
>implementations of BGP best path selection.

Ah! If your concern is about the vehicle used to convey the information
(to de-preference the route) I think we are pretty open to this. As a
matter of fact, as per a previous email sent to you and IDR, I said that
the use of local_pref could be envisaged within the AS. No new tools /
holes punched. And on eBGP, I proposed to use the STALE community: no
new attribute. If you're concerned about the community value being new,
I guess we could re-use the g-shut (graceful shutdown) community.

http://www.ietf.org/mail-archive/web/idr/current/msg05646.html
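
A minimal sketch of the two vehicles mentioned above, assuming a toy path
model: inside the AS a stale path gets the lowest local_pref, and across
eBGP it is tagged with a community instead. The Path class, depreference(),
best() and the STALE value are illustrative names only; they are not taken
from the draft or from any implementation.

```python
from dataclasses import dataclass, replace

# Hypothetical placeholder: the thread does not give the real community value.
STALE = "STALE"
LOWEST_LOCAL_PREF = 0


@dataclass(frozen=True)
class Path:
    prefix: str
    next_hop: str
    local_pref: int = 100
    communities: frozenset = frozenset()


def depreference(path: Path, ebgp: bool) -> Path:
    """Return a copy of a path learned over a failed session, de-preferenced."""
    if ebgp:
        # local_pref is not carried across AS boundaries, so tag the path
        # and let the eBGP peer's own policy decide what to do with it.
        return replace(path, communities=path.communities | {STALE})
    # Within the AS, the lowest local_pref loses to any fresh path.
    return replace(path, local_pref=LOWEST_LOCAL_PREF)


def best(paths):
    """Toy selection: highest local_pref wins (an early decision-process step)."""
    return max(paths, key=lambda p: p.local_pref)


stale = depreference(Path("X", "PE1"), ebgp=False)
fresh = Path("X", "PE2")
print(best([stale, fresh]).next_hop)
```

With this model no new attribute is needed: the iBGP case reuses local_pref
and the eBGP case reuses an ordinary community.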


>Chairs in fact responded that we will work on this when the right time
>comes.
>
>Back to the persistence draft ... GR can address 90% of the persistence
>draft requirements. Propagating information about "suspicious" paths if
>at all we all agree if this is a good idea can be done today with either
>setting lowest local pref, using cost community or marking with
>community so your ebgp peers can interpret it correctly.
>
>Why do we need new way to signal this ?

Cf my above point: no new way required. Point closed (again).


>Also I think current -00 version has serious issues as already pointed
>on the list. We need to wait till -01 is published in order to see if
>those issues have been fixed.

I somewhat disagree with the level of seriousness. As you did not
express comments, I take it that the list of points to be addressed in
-01, as listed in the slides this morning, is fine for you.

Bruno

>Best regards,
>R.
>
>
>> All,
>>
>> First let me apologize for not being able to attend the ietf..
>>
>> After watching jabber it seems that the presentation of persistence
>> degraded to something in re how we can apply multiple Band-Aids,
>> patches, knobs etc... to extend GR to cover the Persistence draft..
>> IMO GR is a subset of the Persistence functionality and may be
>> therefore subsumed into the Persistence draft. Philosophically GR
>> makes the session the invariant and all behavior that follows is
>> based upon this premise.. As stated in the slides, many of the new
>> services have much looser association between the control and
>> forwarding planes i.e. L2VPN, L3VPN, 3107 etc...
>>
>> Again let me clarify, when I looked at GR earlier this year it became
>> clear that the basic philosophy, and rules stating when to persist,
>> how long, triggers to stop would not meet the requirements I have..
>> From a philosophical point GR assumes that the paths that are
>> persisting in forwarding should maintain their preference throughout
>> the routing topology. Why? This seems beyond prescriptive as it does
>> not provide the network operators with any flexibility as to how
>> state learned  over a compromised control may be viewed/used by the
>> downstream routing topology including customers. From my reading of
>> the draft there is no way to allow operators to persist, not-persist,
>> or de pref the state (stale). Do I have this wrong?  I do not think
>> so.. So if we want to incorporate this ability into GR it would
>> require a basic philosophical change.. of course even if this is done
>> there is no way for a peer to pre-program  paths it has advertised to
>> be treated if there is a failure.. So, even if we perform major
>> surgery there is still no way to accomplish this beyond unique
>> filters and maybe CVs.. WE could call them STALE, PERSIST and
>> DO_NOT_PERSIST and then build routing policy across all of our BGP
>> speakers.. Then we can spends lots of time making sure that this
>> bottom approach is consistent across the entire topology, and
>> coordinate with customers also..
>>
>> As I mentioned in a previous note ( see below ) GR does not meet
>> requirements in terms of triggers that terminate GR persistence,
>> timers etc...  personally I am not that interested in  a solution
>> that fundamentally does not meet my requirements.. Most of the
>> responses I have received is that GR can be extended. I do not think
>> it can ever meet the requirements above and will require major
>> surgery to fix the triggers, timers etc.. So the approach should be
>> to fold the subset of GR functionality into the larger Persistence
>> draft..
>>
>> Jim Uttaro
>>
>>
>> Enke,
>>
>> GR is a solution that is essentially local in scope it does not have
>> the ability to inform downstream speakers of the viability of routing
>> state from the point of possible control plane failure. OTOH
>> Persistence does propagate the condition of state. This provides
>> distinct advantages in terms of customers awareness of the SPs
>> control plane. One could imagine a customer receiving a STALE path
>> and responding by selecting a backup. Some of the extensions to this
>> draft that I have considered in colouring of STALE to inform if the
>> condition arises from a local ( PE ) or internal iBGP ( RR )
>> failures..
>>
>> GR makes no distinction from STALE state and ACTIVE state.. This can
>> lead to the STALE path still being preferred throughout the topology.
>> IMO this is incorrect behavior regardless of the comparison.
>>
>> PERSISTENCE allows for a customer to indicate which paths should be
>> candidates. Customers may want to immediately failover to the backup
>> for some paths and not for others. GR is not capable of doing this it
>> is all or nothing. The granularity is not sufficient. It needs to be
>> at the path level. There may even be a case for having even more
>> granularity i.e a per path timer.. GR is not capable of being
>> extended for either of these cases.
>>
>> GR does not provide protection through successive restarts of the
>> session. I believe that if this occurs the state will be invalidated.
>> So for a session that is bouncing due to overload condition GR will
>> not provide the required protection
>>
>> GR does not employ a make before break strategy. All state is
>> invalidated first then the newly learned state is processed. This
>> leads to routing churn especially if the majority of the state is the
>> same which I am pretty sure is the case
>>
>> GR invalidates state due to the case of protocol error i.e A
>> malformed update will invalidate all of the state. This is not the
>> desired behavior.
>>
>> GR is not specific as to which events invoke it or not. From my read
>> on the draft it is not clear if holdtime expiration invokes GR or
>> not.. The draft is unclear.
>>
>> It is not clear to me how RRs and PEs differ in using GR.
>>
>> The time that state can persist is limited to about 1 hour max.
>>
>> GR does detail the behavior where convergence is not achieved between
>> restarts.. Similar to above..
>>
>> I do not believe that the current GR paradigm can be extended to
>> cover the majority of the cases above.
>>
>> Thanks, Jim Uttaro
>>
>>
>> From: UTTARO, JAMES
>> Sent: Tuesday, November 01, 2011 11:20 AM
>> To: 'Enke Chen'
>> Cc: robert@raszuk.net; idr@ietf.org List
>> Subject: RE: [Idr] draft-uttaro-idr-bgp-persistence-00
>>
>> Enke,
>>
>> Comments in-Line..
>>
>> Thanks, Jim Uttaro
>>
>> From: Enke Chen [mailto:enkechen@cisco.com]
>> Sent: Friday, October 28, 2011 2:19 AM
>> To: UTTARO, JAMES
>> Cc: robert@raszuk.net; idr@ietf.org List; Enke Chen
>> Subject: Re: [Idr] draft-uttaro-idr-bgp-persistence-00
>>
>> Jim,
>>
>> My comments are inlined.
>>
>> On 10/27/11 1:17 PM, UTTARO, JAMES wrote:
>>
>> Enke,
>>
>>
>>
>> GR is a solution that is essentially local in scope it does not have
>> the ability to inform downstream speakers of the viability of routing
>> state from the point of possible control plane failure. OTOH
>> Persistence does propagate the condition of state. This provides
>> distinct advantages in terms of customers awareness of the SPs
>> control plane. One could imagine a customer receiving a STALE path
>> and responding by selecting a backup. Some of the extensions to this
>> draft that I have considered in colouring of STALE to inform if the
>> condition arises from a local ( PE ) or internal iBGP ( RR )
>> failures..
>>
>>
>>
>> GR makes no distinction from STALE state and ACTIVE state.. This can
>> lead to the STALE path still being preferred throughout the topology.
>> IMO this is incorrect behavior regardless of the comparison.
>>
>>
>>
>> PERSISTENCE allows for a customer to indicate which paths should be
>> candidates. Customers may want to immediately failover to the backup
>> for some paths and not for others. GR is not capable of doing this it
>> is all or nothing. The granularity is not sufficient. It needs to be
>> at the path level. There may even be a case for having even more
>> granularity i.e a per path timer.. GR is not capable of being
>> extended for either of these cases.
>>
>> I am not sure how this path level persistence would work
>> operationally.   Without the detailed information of a provider's
>> network, how would a customer know what kind of failures and recovery
>> that they might experience?   Consider the example of the
>> simultaneous RR failures in the draft: why wouldn't any customer
>> want to protect against such failures?   The end result could
>> be that the PERSISTENCE flag is always set, thus losing its
>> significance. [Jim U>] One ex would be customers who create multiple
>> VPNs over different SPs.. A customer may want to take advantage of
>> the knowledge that a control plane failure has occurred and migrate
>> the traffic to the backup. This could be done at a path granularity
>> by use of the DO_NOT_PERSIST CV. . We as SPs want to provide our
>> customers with the tools needed to manage their VPNs and not
>> prescribe a one size fits all solution.
>>
>>
>> Regarding the use of the STALE state vs ACTIVE state, clearly there
>> is a tradeoff.   GR uses the stale routes in order to avoid
>> forwarding churns, which has been a critical requirement for a long
>> time.   If there is a real need for favoring an ACTIVE one over a
>> STALE one in GR, it can be done by a simple knob. [Jim U>] The
>> current draft has no ability to inform downstream speakers of whether
>> or not a path is STALE or ACTIVE. The knob may be simple but a lot of
>> machinery would have to be built. This is one of the big reasons for
>> the PERSIST draft. I do not understand the routing churn part in the
>> context of vpnv4, vpnv2, 3107 etc... maybe the GR solution was
>> constructed as a solution that primarily speaks to eBGP IPV4
>> connections for the IPV4 AF ( Internet ).. I could understand that..
>>
>>
>> As you know, BGP is full of knobs that adjust behaviors for different
>> needs :-) [Jim U>] More Knobs..
>>
>>
>>
>>
>>
>>
>>
>> GR does not provide protection through successive restarts of the
>> session. I believe that if this occurs the state will be invalidated.
>> So for a session that is bouncing due to overload condition GR will
>> not provide the required protection
>>
>> This can be addressed by a simple knob to set the min stale timer for
>> GR. [Jim U>] And yet more knobs
>>
>>
>>
>>
>>
>>
>>
>> GR does not employ a make before break strategy. All state is
>> invalidated first then the newly learned state is processed. This
>> leads to routing churn especially if the majority of the state is the
>> same which I am pretty sure is the case
>>
>> Such behavior would be an implementation bug that needs to be fixed.
>> But it is not an issue with the protocol itself.
>>
>> This is what we have in 4.2. Procedures for the Receiving Speaker,
>> RFC 4724:
>>
>> ---
>>
>> The Receiving Speaker MUST replace the stale routes by the routing
>> updates received from the peer.  Once the End-of-RIB marker for an
>> address family is received from the peer, it MUST immediately remove
>> any routes from the peer that are still marked as stale for that
>> address family.
>>
>> [Jim U>] This does not address the lack of clarity
>> about make before break.. it only states that must immediately remove
>> routes marked as stale. It should state that any paths that are
>> learned which are the same as the STALE paths should not force the
>> forwarding plane to be re-programmed for those paths.. This should be
>> made clear and in general is good practice to avoid churn..
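
A minimal sketch of the behavior asked for above, assuming a toy per-peer
route table: routes are marked stale when the session drops, a re-learned
route replaces the stale one, and whatever is still stale at End-of-RIB is
removed. The make-before-break point is modeled by not counting a
forwarding-plane write when the re-learned path is identical to the stale
one. PeerRib and fib_writes are illustrative names, not from RFC 4724.

```python
class PeerRib:
    """Toy model of routes learned from one peer across a graceful restart."""

    def __init__(self):
        self.routes = {}      # prefix -> (attributes, stale flag)
        self.fib_writes = 0   # how often forwarding had to be reprogrammed

    def learn(self, prefix, attrs):
        old = self.routes.get(prefix)
        if old is None or old[0] != attrs:
            # Only a new or changed path touches the forwarding plane.
            self.fib_writes += 1
        # An identical path simply loses its stale flag.
        self.routes[prefix] = (attrs, False)

    def session_lost(self):
        # Keep forwarding, but remember that every route is now stale.
        self.routes = {p: (a, True) for p, (a, _) in self.routes.items()}

    def end_of_rib(self):
        # Remove whatever the peer did not re-advertise.
        removed = [p for p, (_, stale) in self.routes.items() if stale]
        for prefix in removed:
            del self.routes[prefix]
            self.fib_writes += 1
        return removed


rib = PeerRib()
rib.learn("10.0.0.0/8", "nh1")
rib.learn("192.0.2.0/24", "nh2")
rib.session_lost()
rib.learn("10.0.0.0/8", "nh1")   # same path again: no forwarding churn
print(rib.end_of_rib(), rib.fib_writes)
```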
>>
>>
>> There are several possibilities for the premature purge of the stale
>> routes. For example, the "Forwarding state" flag was somehow not set
>> after the session was re-established, or the EOR was sent
>> prematurely.   Further investigation will be needed in order to
>> identify any possible implementation or config issues involved in
>> your setup. [Jim U>] More moving parts to worry about..
>>
>>
>>
>>
>>
>>
>>
>> GR invalidates state due to the case of protocol error i.e A
>> malformed update will invalidate all of the state. This is not the
>> desired behavior.
>>
>> It has been addressed by the following extension:
>>
>>
>> http://datatracker.ietf.org/doc/draft-keyupate-idr-bgp-gr-notification/
>>
>>  [Jim U>] A few comments here.. I do not understand, the draft does
>> not clarify that the only thing that will force a tear down is the
>> cease subcode and a hard reset error code.. is the intention that
>> this is the only thing that will tear it down? I guess I would like
>> to see which things will and will not force a session termination in
>> the original draft.. Like
>>
>>
>> -          Holdtime Expiration
>>
>> -          Malformed Update
>>
>> -          Consecutive Restarts.. So what does this exactly mean
>> "As part of this extension, possible consecutive restarts SHOULD NOT
>> delete a route (from the peer) previously marked as stale, until
>> required by rules mentioned in [RFC4724]." Possible consecutive
>> restarts means what? I really need clarity on this whole notion of
>> when is a session truly invalidated.
>>
>> What is the purpose of the following text?
>>
>> Once the session is re-established, both BGP speakers MUST set their
>> "Forwarding State" bit to 1 if they want to apply planned graceful
>> restart.  The handling of the "Forwarding State" bit should be done
>> as specified by the procedures of the Receiving speaker in [RFC4724]
>> are applied.
>>
>>
>>
>>
>>
>>
>>
>> GR is not specific as to which events invoke it or not. From my read
>> on the draft it is not clear if holdtime expiration invokes GR or
>> not.. The draft is unclear.
>>
>> I think that it is covered by the above extension.  If not, it should
>> be clarified. [Jim U>] I did not see it..
>>
>>
>>
>>
>>
>> It is not clear to me how RRs and PEs differ in using GR.
>>
>> I think that there is a main difference when a RR is not in the
>> forwarding path.  In that case, the RR should always set the F bit in
>> the GR Capability so that its clients will continue forwarding after
>> they lose the sessions with RR.  It is a deployment issue, though.
>> [Jim U>] Yes.. Again from an operations perspective I have to deploy
>> technology differently in different parts of the network across
>> multiple vendors. This is generally not a desired starting point for
>> the successful deployment of new technology.. I want solutions that
>> are generic and simple to deploy.
>>
>>
>>
>>
>>
>>
>>
>> The time that state can persist is limited to about 1 hour max.
>>
>> I think that you are talking about the "Restart time" field which has
>> 12 bits and amount to about 68 minutes.  The "Restart time" is for
>> the session re-establishment.  It does not impact the duration for
>> holding stale routes after the session is re-established. [Jim U>]
>> But if the session does not become re-established then the state is
>> invalidated as the session terminates with an error code that GR will
>> not persist through..
>>
>>
>> If the session does not get re-established in 68 minutes, the stale
>> routes would be purged.  That is a long time, isn't it?   However, if
>> one really wants to extend the session re-establishment time and
>> continue to hold stale routes, it can be done by a simple knob. [Jim
>> U>] And yet even more knobs
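
The "about 68 minutes" figure follows from the field width. A quick check,
assuming the 12-bit Restart Time field counts seconds as in RFC 4724:

```python
RESTART_TIME_BITS = 12

# Largest value a 12-bit unsigned field can carry, in seconds.
max_seconds = (1 << RESTART_TIME_BITS) - 1
max_minutes = max_seconds / 60

print(max_seconds, max_minutes)
```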
>>
>>
>>
>>
>>
>>
>>
>> GR does detail the behavior where convergence is not achieved between
>> restarts.. Similar to above..
>>
>> The min stale timer knob can cover it (see above).
>>
>> But do you mean "does not"?  We can certainly clarify in 4724bis if
>> that is the case. [Jim U>] If convergence is not achieved what is the
>> behavior. I could not determine from the draft..
>>
>>
>>
>>
>>
>>
>>
>> I do not believe that the current GR paradigm can be extended to
>> cover the majority of the cases above.
>>
>> Except for the path level persistence you mentioned, I believe the GR
>> will be able to address all other persistence requirements you
>> listed, with some simple knobs and some implementation enhancements.
>> [Jim U>] IMO GR was originally designed to prevent churn due to
>> intermittent failure on an eBGP session for the IpV4 AF.. I do not
>> want to have different knobs and implementation enhancements to solve
>> the basics of persistence.. Regardless of that it does not inform the
>> topology of the state of a path in re the control plane it was
>> learned over so there can be no independent decisions about the value
>> of a given path by different customers/providers.. This is required
>> for my applications..
>>
>>
>>
>>
>>
>>
>>
>> Thanks,
>>
>> Jim Uttaro
>>
>> Thanks.   -- Enke
>>
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Enke Chen [mailto:enkechen@cisco.com]
>> Sent: Wednesday, October 26, 2011 8:43 PM
>> To: UTTARO, JAMES
>> Cc: robert@raszuk.net; idr@ietf.org List; Enke Chen
>> Subject: Re: [Idr] draft-uttaro-idr-bgp-persistence-00
>>
>> Hi, folks:
>>
>> I have a hard time in understanding what new problems (beyond the GR)
>> the draft tries to solve :-(
>>
>> If the concern is about the simultaneous RR failure as shown in the
>> examples in Sect. 6 Application, that can be addressed easily using GR.
>> As the RRs are not in the forwarding path, it means that the forwarding
>> is not impacted (thus is preserved) during the restart of a RR.  The
>> Forwarding State bit (F) in the GR capability should always be set by
>> the RR when it is not in the forwarding path.
>>
>> Also in the case of simultaneous RR failure, I do not see why one would
>> want to retain some routes, but not others, using the communities
>> specified in the draft.  As the RRs are not in the forwarding path,
>> wouldn't it be better to retain all the routes on a PE/client?
>>
>> As you might be aware, efforts have been underway to address issues with
>> GR found during implementation and deployment. They include the spec
>> respin, notification handling, and implementations.  If there are issues
>> in the GR area that are not adequately addressed,  I suggest that we try
>> to address them in the GR respin if possible, instead of creating
>> another variation unnecessarily.
>>
>> Thanks.   -- Enke
>>
>>
>>
>>
>>
>> On 10/26/11 10:24 AM, Robert Raszuk wrote:
>>
>> Jim,
>>
>> When one during design phase of a routing protocol or routing protocol
>> extension or modification to it already realizes that enabling such
>> feature may cause real network issue if not done carefully - that
>> should trigger the alarm to rethink the solution and explore
>> alternative approaches to the problem space.
>>
>> We as operators have already hard time to relate enabling a feature
>> within our intradomain boundaries to make sure such rollout is network
>> wide. Here you are asking for the same level of awareness across ebgp
>> boundaries. This is practically unrealistic IMHO.
>>
>> Back to the proposal ... I think that if anything needs to be done is
>> to employ per prefix GR with longer and locally configurable timer.
>> That would address information persistence across direct IBGP sessions.
>>
>> On the RRs use case of this draft we may perhaps agree to disagree,
>> but I do not see large enough probability of correctly engineered RR
>> plane to experience simultaneous multiple ibgp session drops. If that
>> happens the RR placement, platforms or deployment model should be
>> re-engineered.
>>
>> Summary .. I do not think that IDR WG should adopt this document. Just
>> adding a warning to the deployment section is not sufficient.
>>
>> Best regards,
>> R.
>>
>>
>>
>>
>>
>> Robert,
>>
>> The introduction of this technology needs to be carefully evaluated
>> when being deployed into the network. Your example clearly calls out
>> how a series of independent design can culminate in incorrect
>> behavior. Certainly the deployment of persistence on a router that
>> has interaction with a router that does not needs to be clearly
>> understood by the network designer. The goal of this draft is to
>> provide a fairly sophisticated tool that will protect the majority of
>> customers in the event of a catastrophic failure.. The premise being
>> the perfect is not the enemy of the good.. I will add text in the
>> deployment considerations section to better articulate that..
>>
>> Thanks, Jim Uttaro
>>
>>
>>
>> -----Original Message-----
>> From: idr-bounces@ietf.org [mailto:idr-bounces@ietf.org] On Behalf Of
>> Robert Raszuk
>> Sent: Sunday, October 23, 2011 5:32 PM
>> To: idr@ietf.org List
>> Subject: [Idr] draft-uttaro-idr-bgp-persistence-00
>>
>> Authors,
>>
>> Actually when discussing this draft a new concern surfaced which I
>> would like to get your answer on.
>>
>> The draft in section 4.2 says as one of the forwarding rules:
>>
>> o  Forwarding to a "stale" route is only used if there are no other
>> paths available to that route.  In other words an active path always
>> wins regardless of path selection.  "Stale" state is always
>> considered to be less preferred when compared with an active path.
>>
>> In the light of the above rule let's consider a very simple case of
>> dual PE attached site of L3VPN service. Two CEs would inject into
>> their IBGP mesh routes to the remote destination: one marked as STALE
>> and  one not marked at all. (Each CE is connected to different PE and
>> each PE RT imports only a single route to a remote hub headquarter to
>> support geographic load balancing).
>>
>> Let me illustrate:
>>
>>          VPN Customer HUB
>>
>>          PE3      PE4
>>              SP
>>          PE1      PE2
>>           |        |
>>           |        |
>>          CE1      CE2
>>           |        |
>>          1|        |10
>>           |        |
>>           R1 ------ R2
>>                1
>>
>>
>>
>> CE1,CE2,R1,R2 are in IBGP mesh. IGP metric of CE1-R1 and R1-R2 are 1
>> and R2-CE2 is 10.
>>
>> Prefix X is advertised by remote hub in the given VPN such that PE1
>> vrf towards CE1 only has X via PE3 and PE2's vrf towards CE2 only has
>> X via PE4.
>>
>> Let's assume EBGP sessions PE3 to HUB went down, but ethernet link
>> is up, next hop is in the RIB while data plane is gone. Assume no
>> data plane real validation too. /* That is why in my former message
>> I suggested that data plane validation would be necessary */.
>>
>> R1 has X via PE1/S (stale) and X via PE2/A (active) - it understands
>> STALE so selects in his forwarding table path via CE2.
>>
>> R2 has X via PE1/S (stale) and X via PE2/A (active) - it does not
>> understand STALE, never was upgraded to support the forwarding rule
>> stated above in the draft and chooses X via CE1 (NH metric 2 vs 10).
>>
>> R1--R2 produce data plane loop as long as STALE paths are present in
>> the system. Quite fun to troubleshoot too as the issue of PE3
>> injecting such STALE paths may be on the opposite site of the world.
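
The loop described above can be reproduced with a small simulation. This is
only a sketch under stated assumptions: the exits for prefix X are modeled
as CE1 (stale) and CE2 (active), the IGP metrics are the ones given in the
example, a router that understands STALE ranks active paths first, and a
router that does not simply picks the exit with the lowest IGP metric. The
names LINKS, PATHS, shortest() and next_hop() are illustrative.

```python
import heapq

# IGP metrics from the example: CE1-R1 = 1, R1-R2 = 1, R2-CE2 = 10.
LINKS = {("CE1", "R1"): 1, ("R1", "R2"): 1, ("R2", "CE2"): 10}

# Exit router for prefix X -> whether the path through it is stale.
PATHS = {"CE1": True, "CE2": False}


def neighbors(node):
    for (a, b), cost in LINKS.items():
        if a == node:
            yield b, cost
        elif b == node:
            yield a, cost


def shortest(src, dst):
    """Dijkstra over LINKS; returns (cost, first hop) from src to dst."""
    heap = [(0, src, "")]
    seen = set()
    while heap:
        cost, node, first = heapq.heappop(heap)
        if node == dst:
            return cost, first
        if node in seen:
            continue
        seen.add(node)
        for nxt, link_cost in neighbors(node):
            heapq.heappush(heap, (cost + link_cost, nxt, first or nxt))
    raise ValueError(f"{dst} unreachable from {src}")


def next_hop(router, understands_stale):
    """IGP next hop the router uses to forward traffic for prefix X."""
    def rank(exit_router):
        cost, _ = shortest(router, exit_router)
        # A STALE-aware router prefers any active path over a stale one;
        # an unaware router compares IGP metric to the next hop only.
        stale = PATHS[exit_router] if understands_stale else False
        return (stale, cost)

    chosen_exit = min(PATHS, key=rank)
    return shortest(router, chosen_exit)[1]


print("R1 forwards to", next_hop("R1", understands_stale=True))
print("R2 forwards to", next_hop("R2", understands_stale=False))
```

In this model R1 hands traffic to R2 and R2 hands it straight back to R1.
If R2 is also made STALE-aware, it forwards to CE2 and the loop disappears,
which is the consistency requirement the questions below are asking about.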
>>
>>
>>
>> The issue occurs when some routers within the customer site will be
>> able to recognize STALE transitive community and prefer non stale
>> paths in their forwarding planes (or BGP planes for that matter)
>> while others will not as well as when both stale and non stale paths
>> will be present.
>>
>> Question 1: How do you prevent forwarding loop in such case ?
>>
>> Question 2: How do you prevent forwarding loop in the case when
>> customer would have backup connectivity to his sites or connectivity
>> via different VPN provider yet routers in his site only partially
>> understand the STALE community and only partially follow the
>> forwarding rules ?
>>
>> In general as the rule is about mandating some particular order of
>> path forwarding selection what is the mechanism in distributed
>> systems like today's routing to be able to achieve any assurance that
>> such rule is active and enforced across _all_ routers behind EBGP
>> PE-CE L3VPN boundaries in customer sites ?
>>
>> Best regards, R.
>>
>>
>>
>>
>>
>> -------- Original Message --------
>> Subject: [Idr] draft-uttaro-idr-bgp-persistence-00
>> Date: Sat, 22 Oct 2011 00:23:55 +0200
>> From: Robert Raszuk <robert@raszuk.net>
>> Reply-To: robert@raszuk.net
>> To: idr@ietf.org List <idr@ietf.org>
>>
>> Hi,
>>
>> I have read the draft and have one question and one observation.
>>
>> Question:
>>
>> What is the point of defining DO_NOT_PERSIST community ? In other
>> words why not having PERSIST community set would not mean the same as
>> having path marked with DO_NOT_PERSIST.
>>
>> Observation:
>>
>> I found the below statement in section 4.2:
>>
>> o  Forwarding must ensure that the Next Hop to a "stale" route is
>> viable.
>>
>> Of course I agree. But since we are stating the obvious in the forwarding
>> section I think it would be good to explicitly also state this in
>> the best path selection that next hop to STALE best path must be
>> valid.
>>
>> However sessions especially those between loopbacks do not go down
>> for no reason. Most likely there is network problem which may have
>> caused those sessions to go down. It is therefore likely that LDP
>> session went also down between any of the LSRs in the data path and
>> that in spite of having the paths in BGP and next hops in IGP the LSP
>> required for both quoted L2/L3VPN applications is broken. That may
>> particularly happen when network chooses to use independent control
>> mode for label allocation.
>>
>> I would suggest to at least add the recommendation statement to the
>> document that during best path selection especially for stale paths
>> a validity of required forwarding paradigm to next hop of stale
>> paths should be verified.
>>
>> For example using techniques as described in:
>> draft-ietf-idr-bgp-bestpath-selection-criteria
>>
>> Best regards, R.
>>
>>
>>
>>
>>
>> _______________________________________________ Idr mailing list
>>
>> Idr@ietf.org<mailto:Idr@ietf.org>
>> https://www.ietf.org/mailman/listinfo/idr
>>
>
>_______________________________________________
>Idr mailing list
>Idr@ietf.org
>https://www.ietf.org/mailman/listinfo/idr