[Idr] Fwd: [GROW] draft-ietf-grow-ops-reqs-for-bgp-error-handling-04

Robert Raszuk <robert@raszuk.net> Fri, 22 June 2012 15:57 UTC

Message-ID: <4FE495CD.7080604@raszuk.net>
Date: Fri, 22 Jun 2012 17:57:01 +0200
From: Robert Raszuk <robert@raszuk.net>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0) Gecko/20120614 Thunderbird/13.0.1
MIME-Version: 1.0
To: "idr@ietf.org List" <idr@ietf.org>, "grow@ietf.org" <grow@ietf.org>
References: <B17A6910EEDD1F45980687268941550FB1F5C0@MISOUT7MSGUSR9I.ITServices.sbc.com>
In-Reply-To: <B17A6910EEDD1F45980687268941550FB1F5C0@MISOUT7MSGUSR9I.ITServices.sbc.com>
X-Forwarded-Message-Id: <B17A6910EEDD1F45980687268941550FB1F5C0@MISOUT7MSGUSR9I.ITServices.sbc.com>
Content-Type: multipart/mixed; boundary="------------020302020806020607080304"
Cc: "UTTARO, JAMES (ATTLABS)" <ju1738@att.com>
Subject: [Idr] Fwd: [GROW] draft-ietf-grow-ops-reqs-for-bgp-error-handling-04

Jim,

> We could as easily without any change to BGP use BGP Persistence to
> maintain the paths except for the ones that have the invalid
> attribute.. This is the simpler method, has the benefit of not
> changing BGP, or educating the world on the nuances of the changes
> etc…
+
> Why wouldn’t we simply let the session fail and then use BGP Persistence
> or GR ;)

Please observe that when the session is down you are not receiving
withdrawals or new best paths for those "good" prefixes (maybe 99% of
them) which did not have any errors in their respective update messages.

Equating it with the persistence proposal is therefore highly incorrect.

> I also do not fully understand “treat as withdraw” does this meant that
> the peer who has received an update with P1-PN with malformed attr then
> initiate a withdrawal to all of its peers?  Or simply assume that the
> paths have been received as a message?  Some sample topologies as to how
> this works would be a good addition to this section..

A speaker reacting to an error that can be handled by
"treat-as-withdraw" locally invalidates the prefixes received in that
update message and reruns local best-path selection. If no other path
is found for a prefix, it then withdraws that prefix from all peers to
which it previously advertised it.
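
To make the sequence concrete, here is a minimal sketch of that
behaviour. It is an illustration under simplifying assumptions, not any
implementation's actual code: the names `best_path`, `treat_as_withdraw`,
`adj_rib_in` and `advertised_to` are hypothetical, and best-path
selection is reduced to "highest preference wins". When another path
survives, a real speaker would also advertise the new best path; the
sketch omits that step. The session itself stays up throughout.

```python
def best_path(adj_rib_in, prefix):
    """Return the peer holding the highest-preference path, or None."""
    candidates = [
        (rib[prefix], peer)
        for peer, rib in adj_rib_in.items()
        if prefix in rib
    ]
    return max(candidates)[1] if candidates else None


def treat_as_withdraw(adj_rib_in, advertised_to, from_peer, malformed_prefixes):
    """Handle a malformed UPDATE from from_peer without resetting the session.

    adj_rib_in:    {peer: {prefix: preference}}  (hypothetical simplification)
    advertised_to: {prefix: set of peers we advertised the prefix to}
    Returns the list of (peer, prefix) withdrawals to send.
    """
    withdrawals = []
    for prefix in malformed_prefixes:
        # 1. Locally invalidate the path learned in the malformed update.
        adj_rib_in.get(from_peer, {}).pop(prefix, None)
        # 2. Rerun best path; 3. withdraw only if no other path remains.
        if best_path(adj_rib_in, prefix) is None:
            for peer in sorted(advertised_to.pop(prefix, set())):
                withdrawals.append((peer, prefix))
    return withdrawals
```

With peer A sending a malformed update for two prefixes, and peer B
still holding a path for one of them, only the prefix with no
alternative path is withdrawn downstream.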

> I am not in support of solutions which create a scenario where BGP
> cannot recover without human intervention.

I think no one is. But we are, I think, not yet at the point where
routers can automatically fix their own bugs; we can only automatically
signal the requested action to them ;(.

 > Nothing is going to get people’s attention like a failed BGP
 > Session..

True statement. But the entire assumption behind treat-as-withdraw is
that your ops scripts parse the syslog messages and flag the issue to
the NOC with the same red color and buzz as a BGP session down. Of
course, you need to rework your ops scripts/alarms for that to happen.
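
As a rough illustration of such a rework, the sketch below raises the
same alarm level for a treat-as-withdraw event as for a session-down
event. The log line formats and the `severity` helper are assumptions
made up for this example; real syslog messages are vendor-specific and
the patterns would need to be adapted.

```python
import re

# Hypothetical patterns: actual vendor syslog formats differ.
CRITICAL_PATTERNS = [
    re.compile(r"BGP.*session.*down", re.IGNORECASE),
    re.compile(r"BGP.*treat-as-withdraw", re.IGNORECASE),
]


def severity(line):
    """Map a syslog line to an alarm level for the NOC dashboard."""
    if any(p.search(line) for p in CRITICAL_PATTERNS):
        return "CRITICAL"
    return "INFO"
```

Both event types then land in the same alarm bucket, so a
treat-as-withdraw gets the same attention as a failed session.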

Rgs,
R.

PS.

Note that if the main BGP session is down (as in the persistence case),
BGP Operational Messages can no longer be exchanged between peers, as
the TCP connection could have been reset (if no multisession is used
and we are talking about a single SAFI). That just makes the issue
worse, especially when you do not want human intervention.