Re: [GROW] [Idr] Fwd: draft-ietf-grow-ops-reqs-for-bgp-error-handling-04

"UTTARO, JAMES" <ju1738@att.com> Fri, 22 June 2012 18:22 UTC

From: "UTTARO, JAMES" <ju1738@att.com>
To: 'Enke Chen' <enkechen@cisco.com>, "robert@raszuk.net" <robert@raszuk.net>
Date: Fri, 22 Jun 2012 18:22:10 +0000
Message-ID: <B17A6910EEDD1F45980687268941550FB1F97F@MISOUT7MSGUSR9I.ITServices.sbc.com>
References: <B17A6910EEDD1F45980687268941550FB1F5C0@MISOUT7MSGUSR9I.ITServices.sbc.com> <4FE495CD.7080604@raszuk.net> <4FE4B2AD.9050704@cisco.com>
In-Reply-To: <4FE4B2AD.9050704@cisco.com>
Cc: "idr@ietf.org List" <idr@ietf.org>, "grow@ietf.org" <grow@ietf.org>
Subject: Re: [GROW] [Idr] Fwd: draft-ietf-grow-ops-reqs-for-bgp-error-handling-04

+1..

Jim Uttaro

From: Enke Chen [mailto:enkechen@cisco.com]
Sent: Friday, June 22, 2012 2:00 PM
To: robert@raszuk.net
Cc: idr@ietf.org List; grow@ietf.org; UTTARO, JAMES; Enke Chen
Subject: Re: [Idr] Fwd: [GROW] draft-ietf-grow-ops-reqs-for-bgp-error-handling-04

Hi, folks:

It might help the discussion to remind ourselves of several large outages in the last few years that prompted the work on the error handling requirements and solutions:

   o issue with AS4_PATH that resulted in session resets multiple hops away (two separate incidents)
   o session reset triggered by a single route with a new attribute

I remember that Rob gave a presentation at NANOG on the topic.

-- Enke

On 6/22/12 8:57 AM, Robert Raszuk wrote:
Jim,


We could as easily without any change to BGP use BGP Persistence to
maintain the paths except for the ones that have the invalid
attribute.. This is the simpler method, has the benefit of not
changing BGP, or educating the world on the nuances of the changes
etc...

Why wouldn't we simply let the session fail and then use BGP Persistence
or GR ;)

Please observe that when the session is down you are not receiving withdrawals or new best paths for those "good" prefixes (maybe 99% of them) which did not have any errors in their respective update messages.

Equating it with the persistence proposal is therefore highly incorrect.


I also do not fully understand "treat as withdraw". Does this mean that
the peer who has received an update with P1-PN with a malformed attr then
initiates a withdrawal to all of its peers?  Or simply assume that the
paths have been received as a message?  Some sample topologies as to how
this works would be a good addition to this section..

A speaker that hits an error which "treat-as-withdraw" can handle locally invalidates the prefixes received in that update message and reruns local best-path selection. If no other path is found, it withdraws those prefixes from all peers to which it previously sent them.
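
As a rough illustration of the behavior described above, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Speaker` class, its method names, and the "shortest AS path wins" rule are hypothetical stand-ins, not taken from any draft or implementation, and the real BGP decision process is far richer.

```python
from collections import defaultdict


class Speaker:
    """Toy BGP speaker (hypothetical model, for illustration only)."""

    def __init__(self):
        # prefix -> {peer: AS path}; a stand-in for the Adj-RIB-In
        self.adj_rib_in = defaultdict(dict)
        # prefixes this speaker has advertised to its own peers
        self.advertised = set()

    def best_path(self, prefix):
        # Assumption: the shortest AS path wins. Real best-path
        # selection has many more steps.
        paths = self.adj_rib_in.get(prefix, {})
        if not paths:
            return None
        return min(paths.values(), key=len)

    def receive_update(self, peer, prefixes, path):
        # Well-formed update: store the path and (re)advertise the best one.
        actions = []
        for prefix in prefixes:
            self.adj_rib_in[prefix][peer] = path
            self.advertised.add(prefix)
            actions.append(("advertise", prefix, self.best_path(prefix)))
        return actions

    def treat_as_withdraw(self, peer, prefixes):
        # Malformed update: invalidate only the prefixes it carried,
        # rerun best path, and withdraw downstream only when no
        # alternative path remains.
        actions = []
        for prefix in prefixes:
            self.adj_rib_in[prefix].pop(peer, None)
            best = self.best_path(prefix)
            if best is not None:
                actions.append(("advertise", prefix, best))
            elif prefix in self.advertised:
                self.advertised.discard(prefix)
                actions.append(("withdraw", prefix, None))
        return actions
```

In this sketch, a prefix that still has a path from another peer is re-advertised with that path, while a prefix learned only from the peer that sent the malformed update is withdrawn. Prefixes not listed in the malformed update are untouched.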


I am not in support of solutions which create a scenario where BGP
cannot recover without human intervention.

I think no one is. But I think we are not yet at the point where routers can automatically fix their own bugs; all we can do is automatically signal the requested action to them ;(.

> Nothing is going to get people's attention like a failed BGP
> Session..

True statement. But the entire assumption behind treat-as-withdraw is that your ops scripts parse the syslog messages and flag the issue to the NOC with the same red color and buzz as a BGP session down. Of course, you need to rework your ops scripts/alarms for that to happen.
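
A minimal sketch of what such an ops-script rework might look like, in Python. The log-line patterns are purely hypothetical: real syslog wording is vendor-specific, and the `ALARM_PATTERNS` table and `classify` function are names invented for this illustration. The only point is that both event types are raised at the same severity.

```python
import re

# Hypothetical patterns; actual syslog text differs per vendor and release.
ALARM_PATTERNS = {
    "session_down": re.compile(r"BGP.*neighbor (\S+) .*Down", re.I),
    "treat_as_withdraw": re.compile(
        r"BGP.*neighbor (\S+) .*treat-as-withdraw", re.I
    ),
}


def classify(line):
    """Return an alarm dict for a matching syslog line, else None."""
    for kind, pattern in ALARM_PATTERNS.items():
        match = pattern.search(line)
        if match:
            # Both event types get the same severity, so a
            # treat-as-withdraw event is as visible as a session reset.
            return {"kind": kind, "peer": match.group(1), "severity": "critical"}
    return None
```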

Rgs,
R.

PS.

Note that if the main BGP session is down (as in the persistence case), BGP Operational Messages can no longer be exchanged between peers, because the TCP connection may have been reset (if multisession is not used and we are talking about a single SAFI). That just makes the issue worse, especially when you want to avoid human intervention.





