Re: [GROW] [Idr] Fwd: draft-ietf-grow-ops-reqs-for-bgp-error-handling-04

Enke Chen <enkechen@cisco.com> Fri, 22 June 2012 17:58 UTC

Message-ID: <4FE4B2AD.9050704@cisco.com>
Date: Fri, 22 Jun 2012 11:00:13 -0700
From: Enke Chen <enkechen@cisco.com>
To: robert@raszuk.net
References: <B17A6910EEDD1F45980687268941550FB1F5C0@MISOUT7MSGUSR9I.ITServices.sbc.com> <4FE495CD.7080604@raszuk.net>
In-Reply-To: <4FE495CD.7080604@raszuk.net>
Cc: "idr@ietf.org List" <idr@ietf.org>, "grow@ietf.org" <grow@ietf.org>, "UTTARO, JAMES (ATTLABS)" <ju1738@att.com>
Subject: Re: [GROW] [Idr] Fwd: draft-ietf-grow-ops-reqs-for-bgp-error-handling-04

Hi, folks:

It might help the discussion to refresh our memory of several large
outages in the last few years that prompted the work on the error
handling requirements and solutions:

   o issue with AS4_PATH that resulted in session resets multiple hops
     away (two separate incidents)
    o session reset triggered by a single route with a new attribute

I remember that Rob gave a presentation at NANOG on the topic.

-- Enke

On 6/22/12 8:57 AM, Robert Raszuk wrote:
> Jim,
>
>> We could just as easily, without any change to BGP, use BGP Persistence
>> to maintain the paths except for the ones that have the invalid
>> attribute. This is the simpler method, and it has the benefit of not
>> changing BGP or educating the world on the nuances of the changes,
>> etc.
> +
>> Why wouldn't we simply let the session fail and then use BGP Persistence
>> or GR ;)
>
> Please observe that when the session is down you are not receiving 
> withdraws or new best paths for those "good" prefixes (maybe 99% of 
> them) which did not have any errors in their respective update messages.
>
> Equating it with the persistence proposal is therefore highly incorrect.
>
>> I also do not fully understand "treat as withdraw". Does this mean that
>> the peer who has received an update with P1-PN with a malformed
>> attribute then initiates a withdrawal to all of its peers? Or does it
>> simply assume that the paths have been received as a message? Some
>> sample topologies showing how this works would be a good addition to
>> this section.
>
> The speaker reacting to an error that can be addressed by
> "treat-as-withdraw" locally invalidates the prefixes received in that
> update message and reruns local best-path selection. If, as a result,
> no other path is found, it withdraws those prefixes from all peers it
> has previously sent them to.
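The treat-as-withdraw reaction described above can be sketched as a minimal, hypothetical Python model. It is not any vendor's implementation: the `Path` and `Speaker` classes, the shortest-AS_PATH selection rule, and the tuple-shaped output messages are simplifying assumptions made only for illustration. What it shows is that only the prefixes carried in the malformed UPDATE are invalidated, while the session and every other prefix learned from that peer stay in place.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Path:
    peer: str                   # neighbor the path was learned from
    as_path: Tuple[int, ...]    # simplified AS_PATH


@dataclass
class Speaker:
    # Hypothetical Adj-RIB-In: peer -> {prefix: Path}
    adj_rib_in: Dict[str, Dict[str, Path]] = field(default_factory=dict)
    # Hypothetical record of what we advertised: downstream peer -> prefixes
    advertised: Dict[str, Set[str]] = field(default_factory=dict)

    def best_path(self, prefix: str) -> Optional[Path]:
        candidates = [rib[prefix] for rib in self.adj_rib_in.values() if prefix in rib]
        # Assumed selection rule: shortest AS_PATH, ties broken by peer name.
        return min(candidates, key=lambda p: (len(p.as_path), p.peer), default=None)

    def treat_as_withdraw(self, from_peer: str, prefixes: List[str]) -> List[Tuple[str, str, str]]:
        """Handle an UPDATE whose attributes are malformed but whose NLRI parsed."""
        out = []
        rib = self.adj_rib_in.get(from_peer, {})
        for prefix in prefixes:
            old_best = self.best_path(prefix)
            rib.pop(prefix, None)             # invalidate locally; session stays up
            new_best = self.best_path(prefix)
            if new_best == old_best:
                continue                      # best path unchanged: nothing to send
            for peer, sent in self.advertised.items():
                if prefix not in sent:
                    continue
                if new_best is None:          # no alternative: withdraw downstream
                    sent.discard(prefix)
                    out.append(("WITHDRAW", peer, prefix))
                else:                         # alternative exists: re-advertise it
                    out.append(("UPDATE", peer, prefix))
        return out
```

In contrast to a session reset, the prefixes from the same peer that were not in the malformed UPDATE keep their paths and continue to receive withdrawals and new best paths.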
>
>> I am not in support of solutions which create a scenario where BGP
>> cannot recover without human intervention.
>
> I think no one is. But I think we are not yet at the point where
> routers automatically fix their bugs; we can only automatically
> signal the requested action to them ;(.
>
> > Nothing is going to get people's attention like a failed BGP
> > session.
>
> True statement. But the entire assumption behind treat-as-withdraw is
> that your ops scripts parse the syslog messages indicating the issue
> and raise them to the NOC with the same red color and buzz as a BGP
> session down. Of course, you need to rework your ops scripts/alarms
> for that to happen.
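A minimal sketch of that alarm rework, assuming hypothetical log text: the two message patterns below are placeholders, since real vendor syslog formats differ and would have to be substituted. The only point it illustrates is that a treat-as-withdraw event is classified with the same severity as a session-down event.

```python
import re
from typing import Optional, Tuple

# HYPOTHETICAL message formats used only for illustration; substitute the
# actual syslog text your routers emit.
ALARM_PATTERNS = {
    "bgp-session-down": re.compile(r"BGP neighbor (?P<peer>\S+) session down"),
    "bgp-treat-as-withdraw": re.compile(
        r"BGP neighbor (?P<peer>\S+) malformed attribute, treat-as-withdraw"
    ),
}


def classify(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (alarm name, severity, peer) for a relevant syslog line, else None."""
    for alarm, pattern in ALARM_PATTERNS.items():
        match = pattern.search(line)
        if match:
            # Both events are raised as "critical" so that a treat-as-withdraw
            # event is as visible to the NOC as a session reset.
            return alarm, "critical", match.group("peer")
    return None
```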
>
> Rgs,
> R.
>
> PS.
>
> Note that if the main BGP session is down (as in the persistence
> case), BGP Operational Messages can no longer be exchanged between
> peers, as the TCP connection may have been reset (if no multisession
> is used and we are talking about a single SAFI). That just makes the
> issue worse, especially when you want to avoid human intervention.