Re: [Idr] draft-uttaro-idr-bgp-persistence-00

"UTTARO, JAMES" <ju1738@att.com> Thu, 27 October 2011 20:31 UTC

From: "UTTARO, JAMES" <ju1738@att.com>
To: "'robert@raszuk.net'" <robert@raszuk.net>
Date: Thu, 27 Oct 2011 20:31:24 +0000
Message-ID: <B17A6910EEDD1F45980687268941550FA20F79@MISOUT7MSGUSR9I.ITServices.sbc.com>
References: <4EA1F0FB.3090100@raszuk.net> <4EA487E4.2040201@raszuk.net> <B17A6910EEDD1F45980687268941550FA20750@MISOUT7MSGUSR9I.ITServices.sbc.com> <4EA84254.9000400@raszuk.net>
In-Reply-To: <4EA84254.9000400@raszuk.net>
Cc: "idr@ietf.org List" <idr@ietf.org>
Subject: Re: [Idr] draft-uttaro-idr-bgp-persistence-00

Robert,

	To be honest, I do not agree, and I am confused by your comments. Many specifications demand correct deployment in a network to prevent unwanted behavior. Some immediate examples come to mind:

L3VPN. If the operator misconfigures RTs or stitches incorrectly, VPN pollution can easily occur. There is no protection that I know of in the draft to prevent this.

RT Constrain. If it is deployed incorrectly in a network that mixes devices that support it with devices that do not, the VPN distribution graph will be built incorrectly. This will result in customers' VPNs being compromised. The draft does not prevent this.

It is the responsibility of the network designers to determine how to deploy powerful technology into their networks. Part of this is to ensure that we do not create a "real network issue if not done carefully". We do not have the luxury of not being careful. Generally speaking, when introducing technology with new capability, the expectation is that the network architects/designers understand the technology and deploy it correctly.

Thanks,
	Jim Uttaro

-----Original Message-----
From: Robert Raszuk [mailto:robert@raszuk.net] 
Sent: Wednesday, October 26, 2011 1:25 PM
To: UTTARO, JAMES
Cc: idr@ietf.org List
Subject: Re: [Idr] draft-uttaro-idr-bgp-persistence-00

Jim,

When one, during the design phase of a routing protocol or of an
extension or modification to it, already realizes that enabling such a
feature may cause a real network issue if not done carefully - that should
trigger the alarm to rethink the solution and explore alternative
approaches to the problem space.

We as operators already have a hard time making sure that a feature
enabled within our intradomain boundaries is rolled out network
wide. Here you are asking for the same level of awareness across EBGP
boundaries. This is practically unrealistic IMHO.

Back to the proposal ... I think that if anything needs to be done, it is
to employ per-prefix GR with a longer and locally configurable timer. That
would address information persistence across direct IBGP sessions.

On the RR use case of this draft we may perhaps agree to disagree, but
I do not see a large enough probability of a correctly engineered RR plane
experiencing simultaneous multiple IBGP session drops. If that happens,
the RR placement, platforms or deployment model should be re-engineered.

Summary: I do not think that the IDR WG should adopt this document. Just
adding a warning to the deployment section is not sufficient.

Best regards,
R.


> Robert,
>
> The introduction of this technology needs to be carefully evaluated
> when being deployed into the network. Your example clearly calls out
> how a series of independent designs can culminate in incorrect
> behavior. Certainly the deployment of persistence on a router that
> interacts with a router that does not support it needs to be clearly
> understood by the network designer. The goal of this draft is to
> provide a fairly sophisticated tool that will protect the majority of
> customers in the event of a catastrophic failure. The premise being
> that the perfect is not the enemy of the good. I will add text in the
> deployment considerations section to better articulate that.
>
> Thanks, Jim Uttaro
>
> -----Original Message-----
> From: idr-bounces@ietf.org [mailto:idr-bounces@ietf.org] On Behalf Of
> Robert Raszuk
> Sent: Sunday, October 23, 2011 5:32 PM
> To: idr@ietf.org List
> Subject: [Idr] draft-uttaro-idr-bgp-persistence-00
>
> Authors,
>
> Actually when discussing this draft a new concern surfaced which I
> would like to get your answer on.
>
> The draft in section 4.2 says as one of the forwarding rules:
>
> o  Forwarding to a "stale" route is only used if there are no other
> paths available to that route.  In other words an active path always
> wins regardless of path selection.  "Stale" state is always
> considered to be less preferred when compared with an active path.
>
> In the light of the above rule let's consider a very simple case of
> dual PE attached site of L3VPN service. Two CEs would inject into
> their IBGP mesh routes to the remote destination: one marked as STALE
> and  one not marked at all. (Each CE is connected to different PE and
> each PE RT imports only a single route to a remote hub headquarter to
> support geographic load balancing).
>
> Let me illustrate:
>
> VPN Customer HUB
>
>  PE3      PE4
>      SP
>  PE1      PE2
>   |        |
>   |        |
>  CE1      CE2
>   |        |
>  1|        |10
>   |        |
>  R1 ------ R2
>       1
>
> CE1,CE2,R1,R2 are in IBGP mesh. IGP metric of CE1-R1 and R1-R2 are 1
> and R2-CE2 is 10.
>
> Prefix X is advertised by remote hub in the given VPN such that PE1
> vrf towards CE1 only has X via PE3 and PE2's vrf towards CE2 only has
> X via PE4.
>
> Let's assume the EBGP session from PE3 to HUB went down, but the
> ethernet link is up and the next hop is in the RIB while the data
> plane is gone. Assume also that there is no real data plane
> validation. /* That is why in my former message I suggested that
> data plane validation would be necessary */.
>
> R1 has X via PE1/S (stale) and X via PE2/A (active) - it understands
> STALE so selects in its forwarding table the path via CE2.
>
> R2 has X via PE1/S (stale) and X via PE2/A (active) - it does not
> understand STALE, was never upgraded to support the forwarding rule
> stated above in the draft and chooses X via CE1 (NH metric 2 vs 10).
>
> R1--R2 produce a data plane loop as long as STALE paths are present in
> the system. Quite fun to troubleshoot too, as the issue of PE3
> injecting such STALE paths may be on the opposite side of the world.
>
> The issue occurs when both stale and non-stale paths are present and
> only some routers within the customer site recognize the STALE
> transitive community and prefer non-stale paths in their forwarding
> planes (or BGP planes for that matter), while others do not.
>
> Question 1: How do you prevent forwarding loop in such case ?
>
> Question 2: How do you prevent forwarding loop in the case when
> customer would have backup connectivity to his sites or connectivity
> via different VPN provider yet routers in his site only partially
> understand the STALE community and only partially follow the
> forwarding rules ?
>
> In general as the rule is about mandating some particular order of
> path forwarding selection what is the mechanism in distributed
> systems like today's routing to be able to achieve any assurance that
> such rule is active and enforced across _all_ routers behind EBGP
> PE-CE L3VPN boundaries in customer sites ?
>
> Best regards, R.
>
>
> -------- Original Message --------
> Subject: [Idr] draft-uttaro-idr-bgp-persistence-00
> Date: Sat, 22 Oct 2011 00:23:55 +0200
> From: Robert Raszuk <robert@raszuk.net>
> Reply-To: robert@raszuk.net
> To: idr@ietf.org List <idr@ietf.org>
>
> Hi,
>
> I have read the draft and have one question and one observation.
>
> Question:
>
> What is the point of defining the DO_NOT_PERSIST community? In other
> words, why would not having the PERSIST community set not mean the
> same as having the path marked with DO_NOT_PERSIST?
>
> Observation:
>
> I found the below statement in section 4.2:
>
> o  Forwarding must ensure that the Next Hop to a "stale" route is
> viable.
>
> Of course I agree. But since we are stating the obvious in the
> forwarding section, I think it would be good to also state explicitly
> in the best path selection that the next hop to a STALE best path must
> be valid.
>
> However, sessions, especially those between loopbacks, do not go down
> for no reason. Most likely there is a network problem which may have
> caused those sessions to go down. It is therefore likely that an LDP
> session also went down between some of the LSRs in the data path and
> that, in spite of having the paths in BGP and next hops in IGP, the LSP
> required for both quoted L2/L3VPN applications is broken. That may
> particularly happen when the network chooses to use independent
> control mode for label allocation.
>
> I would suggest at least adding a recommendation to the document
> that during best path selection, especially for stale paths, the
> validity of the required forwarding paradigm to the next hop of stale
> paths should be verified.
>
> For example using techniques as described in:
> draft-ietf-idr-bgp-bestpath-selection-criteria
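
As a rough illustration of the recommendation above, the following Python sketch filters candidate paths before best path selection. Everything in it is an assumption made for the example: the `Path` type, the `igp_reachable` and `lsp_up` sets and the choice to apply the LSP check only to stale paths are hypothetical, and none of it is taken from either draft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    next_hop: str
    stale: bool


def is_usable(path, igp_reachable, lsp_up):
    """A path needs an IGP route to its next hop; a stale path (in this
    simplified model) additionally needs a working LSP to that next hop."""
    if path.next_hop not in igp_reachable:
        return False
    if path.stale and path.next_hop not in lsp_up:
        return False
    return True


def best_path(paths, igp_reachable, lsp_up):
    """Return the preferred usable path, or None if nothing is usable.
    False sorts before True, so active paths win over stale ones."""
    usable = [p for p in paths if is_usable(p, igp_reachable, lsp_up)]
    return min(usable, key=lambda p: p.stale, default=None)


stale = Path(next_hop="PE1", stale=True)
active = Path(next_hop="PE2", stale=False)

# Next hop PE1 is still in the IGP but its LSP is broken: the stale path is dropped.
print(best_path([stale], igp_reachable={"PE1"}, lsp_up=set()))
# With the LSP intact the stale path may be used as a last resort.
print(best_path([stale], igp_reachable={"PE1"}, lsp_up={"PE1"}))
```

A real implementation would take the reachability and LSP state from the RIB and the label forwarding table rather than from hand-built sets.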
>
> Best regards, R.
>
>
> _______________________________________________ Idr mailing list
> Idr@ietf.org https://www.ietf.org/mailman/listinfo/idr
>