Re: [OSPF] FW: Solicit feedbacks on draft-dong-ospf-maxage-flush-problem-statement

Karsten Thomann <karsten_thomann@linfre.de> Wed, 12 October 2016 18:44 UTC

From: Karsten Thomann <karsten_thomann@linfre.de>
To: ospf@ietf.org
Message-ID: <5060078.WTTeaZs0yX@linne>
User-Agent: KMail/4.13.0.22 (Windows/6.1; KDE/4.14.3; i686; git-c97962a; 2016-07-14)
In-Reply-To: <57FDEEC6.6010400@cisco.com>
References: <D422ABC2.8241A%acee@cisco.com> <57FDEEC6.6010400@cisco.com>
Date: Wed, 12 Oct 2016 11:44:45 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/ospf/xKPo80vK2fhP1rvHMXS4Q2YP3NQ>
Subject: Re: [OSPF] FW: Solicit feedbacks on draft-dong-ospf-maxage-flush-problem-statement

I fully agree. Even if there is a solution to the mentioned problem, who knows if the 
implementation has usable code to let other routers know what it is doing...


On Wednesday, 12 October 2016, 10:05:26, Peter Psenak wrote:
> No support. We should not modify the protocol to address possible bugs in
> the implementation.
> 
> thanks,
> Peter
> 
> On 11/10/16 20:51 , Acee Lindem (acee) wrote:
> > Speaking as WG Co-Chair:
> > 
> > > We had quite a lengthy discussion on this problem and whether or not it is
> > something the WG should adopt. Please indicate whether or not you would
> > support WG adoption before Oct 26th, 2016.
> > 
> > Thanks,
> > Acee
> > 
> > > From: "lizhenqiang@chinamobile.com" <lizhenqiang@chinamobile.com>
> > > Date: Thursday, August 25, 2016 at 9:29 PM
> > > To: Acee Lindem <acee@cisco.com>, Jie Dong <jie.dong@huawei.com>,
> > > "Les Ginsberg (ginsberg)" <ginsberg@cisco.com>, OSPF WG List
> > > <ospf@ietf.org>
> > > Cc: "Zhangxudong (zhangxudong, VRP)" <zhangxudong@huawei.com>
> > Subject: Re: Re: [OSPF] Solicit feedbacks on
> > draft-dong-ospf-maxage-flush-problem-statement
> > 
> >     Hi Acee,
> >     
> >     Totally agree with you that we have to avoid significant
> >     modification to OSPF.
> >     
> > >     The common point after the mail discussion is that production
> > >     networks running OSPF DO have some problems due to software
> > >     implementation bugs or hardware defects. Those production network
> > >     problems deserve some proposals both to identify the router with
> > >     bugs and to mitigate the problem, for example to reduce the impact
> > >     of OSPF route flapping.
> >     
> > >     Your suggestion is one option for identifying the defective router.
> >     Thank you very much.
> >     
> >     Best Regards,
> >     ----------------------------------------------------------------------
> >     --
> >     lizhenqiang@chinamobile.com <mailto:lizhenqiang@chinamobile.com>
> >     
> >         *From:* Acee Lindem (acee) <mailto:acee@cisco.com>
> >         *Date:* 2016-08-25 03:04
> >         *To:* lizhenqiang@chinamobile.com
> >         <mailto:lizhenqiang@chinamobile.com>; Dongjie (Jimmy)
> >         <mailto:jie.dong@huawei.com>; Les Ginsberg (ginsberg)
> >         <mailto:ginsberg@cisco.com>; ospf@ietf.org <mailto:ospf@ietf.org>
> >         *CC:* Zhangxudong (zhangxudong, VRP)
> >         <mailto:zhangxudong@huawei.com>
> >         *Subject:* Re: [OSPF] Solicit feedbacks on
> >         draft-dong-ospf-maxage-flush-problem-statement
> >         Speaking as WG member:
> >         
> >         Hi Zhenjiang,
> >         
> >         I don’t doubt that this was a very disquieting experience.
> >         However, I still don’t think we should attempt to change the
> >         protocol to compensate for routers that do not adhere to the
> >         protocol. To make an analogy, in my years of OSPF experience
> >         I’ve been subject to a number of bugs related to OSPF’s usage of
> >         local wire multicast (some triggered by obscure conditions such
> >         as routing and bridging on the same port). However, I’ve never
> >         proposed to not use local wire multicast. Also, after 25 years
> >         of OSPFv2, it doesn’t make sense to try and change the protocol
> >         to avoid bugs in this area. As for identifying the nefarious
> >         router, I think adding a counter and possibly a separate
> >         notification to the YANG model might be warranted since purging
> >         a non-self-originated LSA should not be a common occurrence in
> >         most networks.
> >         
> >         Thanks,
> >         Acee
> >         P.S. Since this is an OSPF standards list, I’ve purposely
> >         avoided the questions as to how this catastrophic bug made it
> >         into a production network.
> >         
> >         
> > >         From: "lizhenqiang@chinamobile.com"
> > >         <lizhenqiang@chinamobile.com>
> > >         Date: Wednesday, August 24, 2016 at 2:11 PM
> > >         To: Jie Dong <jie.dong@huawei.com>, Acee Lindem
> > >         <acee@cisco.com>, "Les Ginsberg (ginsberg)"
> > >         <ginsberg@cisco.com>, OSPF WG List <ospf@ietf.org>
> > >         Cc: "Zhangxudong (zhangxudong, VRP)" <zhangxudong@huawei.com>
> >         Subject: Re: RE: [OSPF] Solicit feedbacks on
> >         draft-dong-ospf-maxage-flush-problem-statement
> >         
> >             Hello Jie, Acee and Les,
> >             
> >             I am a coauthor of this draft from operator China Mobile.
> >             Thank you all for your discussion and suggestion in the
> > >             previous mails. As you all discussed, a misbehaving OSPF
> > >             router (due to a software or hardware problem) can cause
> > >             severe problems across the whole OSPF domain.
> >             
> > >             Here I want to point out that OSPF route flapping DID occur
> > >             in my field network, caused by a misbehaving OSPF router
> > >             that had been installed. The procedure to analyze and look
> > >             for the cause was very complicated because we did not know
> > >             the source of the flushing. After two hours, we still could
> > >             not identify the real cause and restore our network. The CPU
> > >             utilization of OSPF routers was high, the network traffic
> > >             decreased significantly, and lots of tunnel-down warnings
> > >             were raised. When we tried shutting down one OSPF router,
> > >             route flapping stopped. This router was a newly deployed
> > >             one. Through communication with our vendor, they admitted
> > >             that this product had some defects in dealing with the OSPF
> > >             protocol. This kind of defect is difficult for us to test
> > >             for when products apply for entrance into our network. Once
> > >             defective products are deployed in the field network,
> > >             locating the problem is very hard and time consuming.
> >             
> > >             So, I think it is necessary for us to solve the problem and
> > >             improve the robustness of the protocol. At the least, it
> > >             should provide the means to help us locate the source of an
> > >             OSPF route flapping problem.
> >             
> >             --------------------------------------------------------------
> >             ----------
> >             lizhenqiang@chinamobile.com
> >             <mailto:lizhenqiang@chinamobile.com>
> >             
> >                 *From:* Dongjie (Jimmy) <mailto:jie.dong@huawei.com>
> >                 *Date:* 2016-08-18 17:09
> >                 *To:* Acee Lindem (acee) <mailto:acee@cisco.com>; Les
> >                 Ginsberg (ginsberg) <mailto:ginsberg@cisco.com>;
> >                 ospf@ietf.org <mailto:ospf@ietf.org>
> >                 *CC:* Zhangxudong (zhangxudong, VRP)
> >                 <mailto:zhangxudong@huawei.com>;
> >                 lizhenqiang@chinamobile.com
> >                 <mailto:lizhenqiang@chinamobile.com>
> >                 *Subject:* RE: [OSPF] Solicit feedbacks on
> >                 draft-dong-ospf-maxage-flush-problem-statement
> >                 
> >                 Hi Acee,
> >                 
> >                 Please see my replies inline:
> >                 
> >                 *From:*Acee Lindem (acee) [mailto:acee@cisco.com]