Re: [OSPF] FW: Solicit feedbacks on draft-dong-ospf-maxage-flush-problem-statement

Peter Psenak <ppsenak@cisco.com> Wed, 12 October 2016 08:05 UTC

Message-ID: <57FDEEC6.6010400@cisco.com>
Date: Wed, 12 Oct 2016 10:05:26 +0200
From: Peter Psenak <ppsenak@cisco.com>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:24.0) Gecko/20100101 Thunderbird/24.4.0
MIME-Version: 1.0
To: "Acee Lindem (acee)" <acee@cisco.com>, OSPF WG List <ospf@ietf.org>
References: <D422ABC2.8241A%acee@cisco.com>
In-Reply-To: <D422ABC2.8241A%acee@cisco.com>
Content-Type: text/plain; charset="windows-1252"; format="flowed"
Content-Transfer-Encoding: 8bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/ospf/H1Ex_BNeCPJacHobdm7JjGhlAiM>
Subject: Re: [OSPF] FW: Solicit feedbacks on draft-dong-ospf-maxage-flush-problem-statement

No support. We should not modify the protocol to address possible bugs in
implementations.

thanks,
Peter

On 11/10/16 20:51 , Acee Lindem (acee) wrote:
> Speaking as WG Co-Chair:
>
> We had quite a lengthy discussion on this problem and whether or not it is
> something the WG should adopt. Please indicate whether or not you would
> support WG adoption before Oct 26th, 2016.
>
> Thanks,
> Acee
>
> From: "lizhenqiang@chinamobile.com <mailto:lizhenqiang@chinamobile.com>"
> <lizhenqiang@chinamobile.com <mailto:lizhenqiang@chinamobile.com>>
> Date: Thursday, August 25, 2016 at 9:29 PM
> To: Acee Lindem <acee@cisco.com <mailto:acee@cisco.com>>, Jie Dong
> <jie.dong@huawei.com <mailto:jie.dong@huawei.com>>, "Les Ginsberg
> (ginsberg)" <ginsberg@cisco.com <mailto:ginsberg@cisco.com>>, OSPF WG
> List <ospf@ietf.org <mailto:ospf@ietf.org>>
> Cc: "Zhangxudong (zhangxudong, VRP)" <zhangxudong@huawei.com
> <mailto:zhangxudong@huawei.com>>
> Subject: Re: Re: [OSPF] Solicit feedbacks on
> draft-dong-ospf-maxage-flush-problem-statement
>
>     Hi Acee,
>
>     Totally agree with you that we have to avoid significant
>     modification to OSPF.
>
>     The common point after the mail discussion is that production networks
>     running OSPF DO have some problems due to software implementation
>     bugs or hardware defects. Those production network problems deserve
>     some proposals both to identify the router with bugs and to mitigate
>     the problem, for example to reduce the impact of OSPF route flapping.
>
>     Your suggestion is one option for identifying the defective router.
>     Thank you very much.
>
>     Best Regards,
>     ------------------------------------------------------------------------
>     lizhenqiang@chinamobile.com <mailto:lizhenqiang@chinamobile.com>
>
>         *From:* Acee Lindem (acee) <mailto:acee@cisco.com>
>         *Date:* 2016-08-25 03:04
>         *To:* lizhenqiang@chinamobile.com
>         <mailto:lizhenqiang@chinamobile.com>; Dongjie (Jimmy)
>         <mailto:jie.dong@huawei.com>; Les Ginsberg (ginsberg)
>         <mailto:ginsberg@cisco.com>; ospf@ietf.org <mailto:ospf@ietf.org>
>         *CC:* Zhangxudong (zhangxudong, VRP) <mailto:zhangxudong@huawei.com>
>         *Subject:* Re: [OSPF] Solicit feedbacks on
>         draft-dong-ospf-maxage-flush-problem-statement
>         Speaking as WG member:
>
>         Hi Zhenqiang,
>
>         I don’t doubt that this was a very disquieting experience.
>         However, I still don’t think we should attempt to change the
>         protocol to compensate for routers that do not adhere to the
>         protocol. To make an analogy, in my years of OSPF experience
>         I’ve been subject to a number of bugs related to OSPF’s usage of
>         local wire multicast (some triggered by obscure conditions such
>         as routing and bridging on the same port). However, I’ve never
>         proposed to not use local wire multicast. Also, after 25 years
>         of OSPFv2, it doesn’t make sense to try and change the protocol
>         to avoid bugs in this area. As for identifying the nefarious
>         router, I think adding a counter and possibly a separate
>         notification to the YANG model might be warranted since purging
>         a non-self-originated LSA should not be a common occurrence in
>         most networks.
>
>         Thanks,
>         Acee
>         P.S. Since this is an OSPF standards list, I’ve purposely
>         avoided the questions as to how this catastrophic bug made it
>         into a production network.
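
A minimal sketch of the counter idea mentioned above, assuming a router counts (and logs a notification for) each time its own aging logic purges an LSA that it did not originate. All names here (`LsaHeader`, `PurgeCounter`, `on_local_age_out`) are hypothetical and are not taken from the OSPF YANG model or from any implementation; only MaxAge = 3600 seconds (0xE10) comes from OSPF itself.

```python
from dataclasses import dataclass, field

MAX_AGE = 3600  # OSPF MaxAge in seconds (0xE10)


@dataclass(frozen=True)
class LsaHeader:
    """Hypothetical subset of an LSA header, for illustration only."""
    ls_type: int
    link_state_id: str
    advertising_router: str
    ls_age: int


@dataclass
class PurgeCounter:
    """Counts local purges of LSAs originated by other routers."""
    router_id: str
    non_self_purges: int = 0
    notifications: list = field(default_factory=list)

    def on_local_age_out(self, lsa: LsaHeader) -> bool:
        """Call when local aging is about to flush `lsa`.

        Returns True if the event was counted, i.e. the LSA has reached
        MaxAge and was originated by a different router.
        """
        if lsa.ls_age < MAX_AGE:
            return False  # not a purge
        if lsa.advertising_router == self.router_id:
            return False  # flushing our own LSA is normal behavior
        self.non_self_purges += 1
        self.notifications.append(
            f"purged non-self-originated LSA type={lsa.ls_type} "
            f"id={lsa.link_state_id} adv-router={lsa.advertising_router}"
        )
        return True
```

Under these assumptions, a router whose counter climbs steadily while its peers stay near zero would be the first place to look.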
>
>
>         From: "lizhenqiang@chinamobile.com
>         <mailto:lizhenqiang@chinamobile.com>"
>         <lizhenqiang@chinamobile.com <mailto:lizhenqiang@chinamobile.com>>
>         Date: Wednesday, August 24, 2016 at 2:11 PM
>         To: Jie Dong <jie.dong@huawei.com <mailto:jie.dong@huawei.com>>,
>         Acee Lindem <acee@cisco.com <mailto:acee@cisco.com>>, "Les
>         Ginsberg (ginsberg)" <ginsberg@cisco.com
>         <mailto:ginsberg@cisco.com>>, OSPF WG List <ospf@ietf.org
>         <mailto:ospf@ietf.org>>
>         Cc: "Zhangxudong (zhangxudong, VRP)" <zhangxudong@huawei.com
>         <mailto:zhangxudong@huawei.com>>
>         Subject: Re: RE: [OSPF] Solicit feedbacks on
>         draft-dong-ospf-maxage-flush-problem-statement
>
>             Hello Jie, Acee and Les,
>
>             I am a coauthor of this draft, from the operator China Mobile.
>             Thank you all for your discussion and suggestions in the
>             previous mails. As you all discussed, a misbehaving OSPF
>             router (due to a software or hardware problem) can cause
>             severe problems in the whole OSPF domain.
>
>             Here I want to point out that OSPF route flapping DID occur
>             in my field network, caused by a misbehaving OSPF router
>             installed there. The procedure to analyze and look for the
>             cause was very complicated because we did not know the source
>             of the flushing. After two hours, we still could not identify
>             the real cause and restore our network. The CPU utilization of
>             OSPF routers was high, the network traffic decreased
>             significantly, and lots of tunnel-down warnings were raised.
>             When we tried shutting down one OSPF router, route flapping
>             stopped. This router was a newly deployed one. Through
>             communication with our vendor, they admitted that this product
>             had some defects in dealing with the OSPF protocol. Defects of
>             this kind are difficult for us to test for when products apply
>             for entrance to our network. Once defective products are
>             deployed in the field network, locating the problem is very
>             hard and time consuming.
>
>             So, I think it is necessary for us to solve the problem and
>             improve the robustness of the protocol. At least it should
>             provide the means to help us locate the OSPF route flapping
>             problem.
>
>             ------------------------------------------------------------------------
>             lizhenqiang@chinamobile.com <mailto:lizhenqiang@chinamobile.com>
>
>                 *From:* Dongjie (Jimmy) <mailto:jie.dong@huawei.com>
>                 *Date:* 2016-08-18 17:09
>                 *To:* Acee Lindem (acee) <mailto:acee@cisco.com>; Les
>                 Ginsberg (ginsberg) <mailto:ginsberg@cisco.com>;
>                 ospf@ietf.org <mailto:ospf@ietf.org>
>                 *CC:* Zhangxudong (zhangxudong, VRP)
>                 <mailto:zhangxudong@huawei.com>;
>                 lizhenqiang@chinamobile.com
>                 <mailto:lizhenqiang@chinamobile.com>
>                 *Subject:* RE: [OSPF] Solicit feedbacks on
>                 draft-dong-ospf-maxage-flush-problem-statement
>
>                 Hi Acee,
>
>                 Please see my replies inline:
>
>                 *From:*Acee Lindem (acee) [mailto:acee@cisco.com]
>                 *Sent:* Thursday, August 18, 2016 2:23 AM
>                 *To:* Dongjie (Jimmy); Les Ginsberg (ginsberg);
>                 ospf@ietf.org <mailto:ospf@ietf.org>
>                 *Cc:* Zhangxudong (zhangxudong, VRP);
>                 lizhenqiang@chinamobile.com
>                 <mailto:lizhenqiang@chinamobile.com>
>                 *Subject:* Re: [OSPF] Solicit feedbacks on
>                 draft-dong-ospf-maxage-flush-problem-statement
>
>                 Speaking as a WG member who has some experience with
>                 OSPF implementations:
>
>                 Hi Jie,
>
>                 Along with Les, I’m also against progressing this draft.
>
>                 *From: *Jie Dong <jie.dong@huawei.com
>                 <mailto:jie.dong@huawei.com>>
>                 *Date: *Tuesday, August 16, 2016 at 9:56 AM
>                 *To: *Acee Lindem <acee@cisco.com
>                 <mailto:acee@cisco.com>>, "Les Ginsberg (ginsberg)"
>                 <ginsberg@cisco.com <mailto:ginsberg@cisco.com>>, OSPF
>                 WG List <ospf@ietf.org <mailto:ospf@ietf.org>>
>                 *Cc: *"Zhangxudong (zhangxudong, VRP)"
>                 <zhangxudong@huawei.com
>                 <mailto:zhangxudong@huawei.com>>,
>                 "lizhenqiang@chinamobile.com
>                 <mailto:lizhenqiang@chinamobile.com>"
>                 <lizhenqiang@chinamobile.com
>                 <mailto:lizhenqiang@chinamobile.com>>
>                 *Subject: *RE: [OSPF] Solicit feedbacks on
>                 draft-dong-ospf-maxage-flush-problem-statement
>
>                     Hi Acee,
>
>                     Thanks a lot for your feedback.
>
>                     For packet corruption which impacts the LS age
>                     before the LSAs are packed into an LSU packet, I agree
>                     it is less likely to happen than the other cases.
>                     However, I think we agree that OSPF authentication
>                     only protects against packet-level corruption, and
>                     cannot help to detect corruption at the LSA level.
>
>                 So, you are suggesting that LSAs are corrupted in the
>                 database in such a way that the LSA Age is set exactly
>                 to 0xE10? How would the implementation know that this
>                 had happened and prematurely age the packet? Database
>                 aging just doesn’t work this way (unless the
>                 implementation is particularly naïve).
>
>                 [Jie] Actually, the case is that when the LSA is about
>                 to be exchanged with a neighbor, the LS age is corrupted
>                 during message packing to either MaxAge or a large number
>                 close to MaxAge. The sending router does not intend to
>                 do a MaxAge flush; however, the neighbor routers which
>                 receive the message would treat this as a flush. This is
>                 a possible case, although less likely to happen than the
>                 other cases.
>
>                     In my understanding, robustness is an important
>                     feature of network protocols, which includes
>                     robustness to errors and failures that happen in the
>                     network. If there is a bug in a particular router in
>                     the network, the operator would not accept the whole
>                     network being impacted, which means the other routers
>                     in the network need to work properly in this
>                     situation. For example, in BGP the error handling
>                     mechanism has been optimized to avoid unnecessary
>                     session teardown.
>
>                 So you agree your problem statement is confined to a
>                 software bug resulting in LSAs being aged too quickly? I
>                 think this is the third time I’ve raised this question.
>
>                 [Jie] As I said before, the problems that happened in
>                 the production network were caused by a software bug in
>                 LSA aging, so I think this is the major case.
>
>                 If it has such a problem (whether it be due to a system
>                 timer bug or some more specific aging problem), it
>                 seems the router would also be refreshing its LSAs all
>                 too frequently (at least at twice the rate) and it would
>                 be readily identifiable. For a system time problem, the
>                 router would likely have many other problems. For
>                 example, it would not maintain OSPF adjacencies if the
>                 dead timer advances fast enough. It would retransmit at
>                 a very fast rate as well. Are you going to write problem
>                 statements and suggest solutions for these situations as
>                 well?
>
>                 [Jie] This depends on the implementation. The software
>                 bug may only impact the aging of LSAs received from
>                 other routers. And frequent LSA refreshing may be caused
>                 by other things, such as link oscillation. For a system
>                 timer problem, OSPF adjacencies may oscillate, but if the
>                 management connection is impacted, such oscillation is
>                 difficult to identify.
>
>                 What about other bugs? What if the router erroneously
>                 specifies a neighbor’s router-id as its own in a
>                 Router-LSA? Is this a problem the protocol should handle?
>
>                 [Jie] That depends on the significance to the network;
>                 case-by-case analysis may be needed.
>
>                     I agree that an OSPF YANG notification for LSA timeout
>                     is a nice thing to have and could be useful to
>                     identify the misbehaving router. My concern is that
>                     sometimes the network may be so severely impacted that
>                     the connectivity of NETCONF/RESTCONF is also
>                     impacted. To avoid this, some mechanism to mitigate
>                     the impact of this problem could help.
>
>                 I believe a router having such an impact would be easy to
>                 identify…
>
>                 [Jie] According to the feedback from on-site engineers,
>                 when IGP routing is oscillating so severely that the
>                 management connection becomes unavailable, troubleshooting
>                 usually takes much longer, as logging in to any
>                 router cannot be done via the management network. So
>                 maybe it would be better to have some automatic
>                 mechanism to reduce the impact before it becomes a big
>                 problem to troubleshoot.
>
>                 Best regards,
>
>                 Jie
>
>                 Thanks,
>
>                 Acee
>
>                     Best regards,
>
>                     Jie
>
>                     *From:*Acee Lindem (acee) [mailto:acee@cisco.com]
>                     *Sent:* Saturday, August 13, 2016 3:27 AM
>                     *To:* Les Ginsberg (ginsberg); Dongjie (Jimmy);
>                     ospf@ietf.org <mailto:ospf@ietf.org>
>                     *Cc:* Zhangxudong (zhangxudong, VRP);
>                     lizhenqiang@chinamobile.com
>                     <mailto:lizhenqiang@chinamobile.com>
>                     *Subject:* Re: [OSPF] Solicit feedbacks on
>                     draft-dong-ospf-maxage-flush-problem-statement
>
>                     Speaking as a WG member:
>
>                     Hi Jie,
>
>                     I believe we agree that the problem is confined to
>                     OSPF bugs, system timer bugs, and packet
>                     corruption. I’d assert that corruption can be
>                     detected via OSPF authentication. In fact, there is
>                     a well-known anecdote where IS-IS authentication was
>                     enabled solely for the purpose of filtering
>                     corrupted protocol packets in an environment with
>                     line cards that were prone to such corruption.
>                     Hence, we are left with problems based on OSPF or
>                     system timer bugs. If there were a system timer bug,
>                     I’d doubt that a networking device with such a bug
>                     would be functional to the point of being able to
>                     establish and maintain OSPF adjacencies. Do we
>                     really want to enhance the protocol to deal with bugs?
>
>                     I’ve thought about this and one potential action I
>                     could envision would be to add a separate OSPF YANG
>                     notification where an LSA times out and a router
>                     other than the originator purges it. This way, the
>                     misbehaving OSPF router could be readily identified.
>
>                     Thanks,
>
>                     Acee
>
>                     *From: *OSPF <ospf-bounces@ietf.org
>                     <mailto:ospf-bounces@ietf.org>> on behalf of "Les
>                     Ginsberg (ginsberg)" <ginsberg@cisco.com
>                     <mailto:ginsberg@cisco.com>>
>                     *Date: *Thursday, August 11, 2016 at 1:29 PM
>                     *To: *Jie Dong <jie.dong@huawei.com
>                     <mailto:jie.dong@huawei.com>>, OSPF WG List
>                     <ospf@ietf.org <mailto:ospf@ietf.org>>
>                     *Cc: *"Zhangxudong (zhangxudong, VRP)"
>                     <zhangxudong@huawei.com
>                     <mailto:zhangxudong@huawei.com>>,
>                     "lizhenqiang@chinamobile.com
>                     <mailto:lizhenqiang@chinamobile.com>"
>                     <lizhenqiang@chinamobile.com
>                     <mailto:lizhenqiang@chinamobile.com>>
>                     *Subject: *Re: [OSPF] Solicit feedbacks on
>                     draft-dong-ospf-maxage-flush-problem-statement
>
>                         Jie –
>
>                         Having the discussion has certainly been a good
>                         thing, but if the consensus of the WG is that
>                         there is no protocol change required then there
>                         is no need for any draft – which is my current
>                         position.
>
>                         The other point is that you seem to be confusing
>                         the IS-IS Purge origination TLV (RFC 6232) with
>                         detecting invalid purges/remaining lifetime
>                         corruption. This is not the case. RFC 6232
>                         simply allows us to detect which router
>                         originated a purge – it is not able to detect
>                         whether a purge is valid/invalid – and was not
>                         motivated by concerns about remaining lifetime
>                         corruption.
>
>                             Les
>
>                         *From:*Dongjie (Jimmy) [mailto:jie.dong@huawei.com]
>                         *Sent:* Wednesday, August 10, 2016 9:24 PM
>                         *To:* Les Ginsberg (ginsberg); ospf@ietf.org
>                         <mailto:ospf@ietf.org>
>                         *Cc:* Zhangxudong (zhangxudong, VRP);
>                         lizhenqiang@chinamobile.com
>                         <mailto:lizhenqiang@chinamobile.com>
>                         *Subject:* RE: [OSPF] Solicit feedbacks on
>                         draft-dong-ospf-maxage-flush-problem-statement
>
>                         Hi Les,
>
>                         The current draft is a problem statement, so
>                         IMO what the WG needs to consider is whether
>                         this is a vulnerability of the OSPF protocol, and
>                         whether it can have a negative impact on the
>                         network. If the problem is acknowledged, IMO it
>                         is worth documenting.
>
>                         The “ROI” you mentioned is for the evaluation
>                         of the proposed solutions. I totally agree that
>                         for the timer bug case, recognizing and ignoring
>                         the received abnormal MaxAge LSAs cannot stop
>                         the misbehaving router from generating further
>                         MaxAge LSAs, as it is a systematic problem which
>                         can only be fixed after the operator identifies
>                         that router. This is also similar to the
>                         systematic corruption of IS-IS remaining lifetime.
>                         And this is why this draft mentions two kinds of
>                         potential solutions: the mitigation mechanism
>                         can avoid the network being severely impacted by
>                         the problem, while for systematic problems,
>                         problem localization is needed to identify the
>                         misbehaving router and then solve the problem.
>
>                         Best regards,
>
>                         Jie
>
>                         *From:*OSPF [mailto:ospf-bounces@ietf.org] *On
>                         Behalf Of *Les Ginsberg (ginsberg)
>                         *Sent:* Monday, August 08, 2016 2:14 AM
>                         *To:* Dongjie (Jimmy) <jie.dong@huawei.com
>                         <mailto:jie.dong@huawei.com>>; ospf@ietf.org
>                         <mailto:ospf@ietf.org>
>                         *Cc:* Zhangxudong (zhangxudong, VRP)
>                         <zhangxudong@huawei.com
>                         <mailto:zhangxudong@huawei.com>>;
>                         lizhenqiang@chinamobile.com
>                         <mailto:lizhenqiang@chinamobile.com>
>                         *Subject:* Re: [OSPF] Solicit feedbacks on
>                         draft-dong-ospf-maxage-flush-problem-statement
>
>                         Jie –
>
>                         Thinking about the following some more:
>
>                         /<snip>/
>
>                         /What remains is the possibility that an
>                         implementation has some bug and unintentionally
>                         modifies the age to something other than what it
>                         should be due to the actual elapsed time since
>                         LSA generation. I suppose a mechanism equivalent
>                         to what the IS-IS draft defined i.e. setting the
>                         age to “new” (0 in OSPF case) when first
>                         receiving a non-self-generated LSA could be
>                         useful to prevent negative impacts of such an
>                         implementation bug. Is this what you intend?/
>
>                         /[Jie]: More specifically, the problem could be
>                         caused by either “setting the LS age field
>                         incorrectly due to implementation bug” or
>                         “system timer runs so fast that the LS age
>                         reaches MaxAge much earlier than other routers”.
>                         Another less likely case is that the LS age
>                         field is corrupted before the LSA is assembled
>                         into OSPF packet./
>
>                         /<end snip>/
>
>                         The benefits are extremely limited. If a router
>                         prematurely ages an LSA due to a timer bug,
>                         ignoring the received LSA age on reception isn’t
>                         going to prevent premature purging by the router
>                         which has the bug. So the effect of ignoring the
>                         received LSA age prior to reaching MAXAGE will
>                         be short lived. You are then left with the
>                         possibility that an implementation corrupts the
>                         LSA age BEFORE calculating checksum/crypto
>                         authentication – but its local timeout logic is
>                         unaffected. This has very limited value. Whether
>                         the WG considers this worth pursuing is
>                         something you need to ask. For myself, I don’t
>                         see much ROI here.
>
>                            Les
>
>                         *From:*Dongjie (Jimmy) [mailto:jie.dong@huawei.com]
>                         *Sent:* Monday, August 01, 2016 9:43 PM
>                         *To:* Les Ginsberg (ginsberg); ospf@ietf.org
>                         <mailto:ospf@ietf.org>
>                         *Cc:* Zhangxudong (zhangxudong, VRP);
>                         lizhenqiang@chinamobile.com
>                         <mailto:lizhenqiang@chinamobile.com>
>                         *Subject:* RE: [OSPF] Solicit feedbacks on
>                         draft-dong-ospf-maxage-flush-problem-statement
>
>                         Hi Les,
>
>                         Please see my replies with [Jie2]:
>
>                         *From:*Les Ginsberg (ginsberg)
>                         [mailto:ginsberg@cisco.com]
>                         *Sent:* Monday, August 01, 2016 9:57 PM
>                         *To:* Dongjie (Jimmy); ospf@ietf.org
>                         <mailto:ospf@ietf.org>
>                         *Cc:* Zhangxudong (zhangxudong, VRP);
>                         lizhenqiang@chinamobile.com
>                         <mailto:lizhenqiang@chinamobile.com>
>                         *Subject:* RE: [OSPF] Solicit feedbacks on
>                         draft-dong-ospf-maxage-flush-problem-statement
>
>                         Jie -
>
>                         *From:*Dongjie (Jimmy) [mailto:jie.dong@huawei.com]
>                         *Sent:* Monday, August 01, 2016 1:44 AM
>                         *To:* Les Ginsberg (ginsberg); ospf@ietf.org
>                         <mailto:ospf@ietf.org>
>                         *Cc:* Zhangxudong (zhangxudong, VRP);
>                         lizhenqiang@chinamobile.com
>                         <mailto:lizhenqiang@chinamobile.com>
>                         *Subject:* RE: [OSPF] Solicit feedbacks on
>                         draft-dong-ospf-maxage-flush-problem-statement
>
>                         Hi Les,
>
>                         Please see inline with [Jie]:
>
>                         *From:*Les Ginsberg (ginsberg)
>                         [mailto:ginsberg@cisco.com]
>                         *Sent:* Monday, August 01, 2016 3:09 PM
>                         *To:* Dongjie (Jimmy); ospf@ietf.org
>                         <mailto:ospf@ietf.org>
>                         *Cc:* Zhangxudong (zhangxudong, VRP);
>                         lizhenqiang@chinamobile.com
>                         <mailto:lizhenqiang@chinamobile.com>
>                         *Subject:* RE: [OSPF] Solicit feedbacks on
>                         draft-dong-ospf-maxage-flush-problem-statement
>
>                         Jie –
>
>                         Fully agree that IS-IS and OSPF differ in this
>                         regard.
>
>                         https://www.ietf.org/id/draft-ietf-isis-remaining-lifetime-01.txt
>                         addresses problems where corruption of the
>                         remaining lifetime occurs either during
>                         transmission/reception or due to some DOS
>                         attack. This isn’t a concern with OSPF (hope you
>                         agree).
>
>                         [Jie]: Yes, for OSPF the corruption during
>                         packet transmission can be detected.
>
>                         What remains is the possibility that an
>                         implementation has some bug and unintentionally
>                         modifies the age to something other than what it
>                         should be due to the actual elapsed time since
>                         LSA generation. I suppose a mechanism equivalent
>                         to what the IS-IS draft defined i.e. setting the
>                         age to “new” (0 in OSPF case) when first
>                         receiving a non-self-generated LSA could be
>                         useful to prevent negative impacts of such an
>                         implementation bug. Is this what you intend?
>
>                         [Jie]: More specifically, the problem could be
>                         caused by either “setting the LS age field
>                         incorrectly due to implementation bug” or
>                         “system timer runs so fast that the LS age
>                         reaches MaxAge much earlier than other routers”.
>                         Another less likely case is that the LS age
>                         field is corrupted before the LSA is assembled
>                         into OSPF packet.
>
>                         [Jie]: Regarding the solution space, IMO we
>                         need to consider both cases: “LS age reaches
>                         MaxAge” and “LS age close to MaxAge”. For IS-IS,
>                         RFC 6232 and RFC 6233 provide solutions for the
>                         detection and identification of corrupted IS-IS
>                         purges, while OSPF does not have similar mechanisms.
>
>                         */[Les:] It is incorrect to say that RFC 6232
>                         makes it possible to detect a corrupt purge.
>                         What it does do is to provide an indication as
>                         to which IS initiated a purge. I don’t know how
>                         OSPF would address this issue, but for OSPFv2 at
>                         least any solution would likely not be backwards
>                         compatible. For this reason I suggest that you
>                         not try to address this issue in the same draft./*
>
>                         [Jie2]: Agreed, RFC 6232 provides a mechanism
>                         to track the misbehaving routers so that the
>                         operator can fix the problem; the detection can
>                         be based on the rules in RFC 6233 or on other
>                         anomalies. Indeed, for OSPFv2 legacy LSAs it is
>                         difficult to introduce a mechanism similar to
>                         RFC 6232, while it can be easier for the
>                         OSPFv2/v3 Extended LSAs. So it depends on how
>                         backward compatible the solution should be. I
>                         agree with you that the solution for Problem
>                         Localization in OSPF needs to be provided in a
>                         separate document.
>
>                         */Solutions to LS age corruption can be done in
>                         a backwards compatible way, but they MUST NOT
>                         result in discarding purges which pass
>                         authentication; doing so places you at risk of
>                         having inconsistent LSDBs in the network./*
>
>                         [Jie2]: Exactly. The received MaxAge LSAs cannot
>                         simply be discarded; the decision must be made
>                         carefully, probably based on some additional
>                         information. The authors have discussed some
>                         possible solutions internally, and will prepare
>                         some material for further open discussion.
>
>                         As written, the draft makes claims that are at
>                         least misleading – and I believe actually
>                         incorrect. In Section 6 you say:
>
>                         “The LS age field may be altered as a result of
>                         packet corruption, such modification cannot be
>                         detected by LSA checksum nor OSPF packet
>                         cryptographic authentication.”
>
>                         This isn’t correct.
>
>                         [Jie] Thanks for pointing this out. This
>                         sentence needs to be revised to mention “LSA
>                         corruption” rather than “packet corruption”.
>
>                         What would be helpful – at least to me – is to
>                         move from a generic problem statement to the
>                         specific problem you want to solve and the
>                         proposed solution. This also requires you to
>                         more clearly state the cases where there is an
>                         actual vulnerability. It would be a lot easier
>                         to support the draft if this were done.
>
>                         [Jie] Thanks for your suggestion. Yes, we can
>                         update this draft with more specific problem
>                         statements, as I mentioned above.
>
>                         [Jie] As for the proposed solutions, the current
>                         draft specifies the requirements on the
>                         potential solutions, from which we envision that
>                         different solutions may be needed for “Impact
>                         Mitigation” and “Problem Localization”. The
>                         solution for “Impact Mitigation” can be the
>                         easier one, and we can start to discuss the
>                         potential solutions for it now. The solution
>                         for “Problem Localization” may need more
>                         consideration.
>
>                         */[Les:] A discussion of the requirements is
>                         useful and necessary, but IMO until you propose
>                         a solution there isn’t enough substance for the
>                         document to become a WG document./*
>
>                         [Jie2] Yes, the current draft focuses on the
>                         problem statement and the requirements. The goal
>                         is first to get the MaxAge flush problem
>                         acknowledged and to reach consensus on the
>                         requirements. Then the plan is to specify the
>                         solutions in separate documents. Your valuable
>                         suggestions will be considered, and further
>                         contributions are welcome.
>
>                         Best regards,
>
>                         Jie
>
>                         */    Les/*
>
>                         Best regards,
>
>                         Jie
>
>                             Les
>
>                         *From:*Dongjie (Jimmy) [mailto:jie.dong@huawei.com]
>                         *Sent:* Sunday, July 31, 2016 11:48 PM
>                         *To:* Les Ginsberg (ginsberg); ospf@ietf.org
>                         <mailto:ospf@ietf.org>
>                         *Cc:* Zhangxudong (zhangxudong, VRP);
>                         lizhenqiang@chinamobile.com
>                         <mailto:lizhenqiang@chinamobile.com>
>                         *Subject:* RE: [OSPF] Solicit feedbacks on
>                         draft-dong-ospf-maxage-flush-problem-statement
>
>                         Hi Les,
>
>                         Thanks for your comments.
>
>                         The OSPF packet-level checksum and authentication
>                         can only protect the assembled LSU packet for
>                         one hop on the wire; they cannot detect any
>                         change to an LSA made by the routers themselves.
>                         This is because the OSPF packets are re-assembled
>                         on each hop, which is slightly different from
>                         IS-IS. So the problem for OSPF is mainly due to
>                         problems inside the router, for example in the
>                         protocol implementation, the system timers, or
>                         the hardware. Actually, this problem has been
>                         seen in several production networks.
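
The per-hop point above can be illustrated with a minimal sketch. It is not the real OSPF encoding: a plain Fletcher-style sum stands in for the LSA checksum, HMAC-SHA-256 stands in for whatever packet authentication is configured, and `lsa_checksum`, `packet_digest`, `neighbor_accepts`, and the 20-byte placeholder LSA are hypothetical names for illustration. The only properties it relies on are that the LS age occupies the first two bytes of the LSA header and is excluded from the LSA checksum (RFC 2328), and that the packet digest is recomputed by each router that re-assembles a packet.

```python
import hashlib
import hmac
import struct

MAX_AGE = 3600  # OSPF MaxAge in seconds (RFC 2328)


def fletcher16(data: bytes) -> int:
    """Plain Fletcher-16 sums; a simplified stand-in for the LSA checksum."""
    c0 = c1 = 0
    for byte in data:
        c0 = (c0 + byte) % 255
        c1 = (c1 + c0) % 255
    return (c1 << 8) | c0


def lsa_checksum(lsa: bytes) -> int:
    # The first two bytes hold the LS age and are deliberately skipped.
    return fletcher16(lsa[2:])


def packet_digest(key: bytes, packet: bytes) -> bytes:
    # Hypothetical per-link keyed digest over the whole assembled packet.
    return hmac.new(key, packet, hashlib.sha256).digest()


def neighbor_accepts(key: bytes, packet: bytes, digest: bytes) -> bool:
    # The receiving neighbor recomputes the digest with the shared link key.
    return hmac.compare_digest(packet_digest(key, packet), digest)


body = bytes(range(18))                      # placeholder for the rest of the LSA
fresh = struct.pack("!H", 10) + body         # LS age = 10 seconds
aged = struct.pack("!H", MAX_AGE) + body     # same LSA, age changed inside a router

key = b"link-key"

# Corruption on the wire: the old digest no longer matches, so it is detected.
wire_corruption_detected = not neighbor_accepts(key, aged, packet_digest(key, fresh))

# Change inside the router: the LSA checksum is unchanged, and the router
# signs the re-assembled packet itself, so the neighbor accepts it.
checksum_unchanged = lsa_checksum(fresh) == lsa_checksum(aged)
internal_change_accepted = neighbor_accepts(key, aged, packet_digest(key, aged))
```

In this sketch both checks behave as described in the thread: a change to the age in transit is caught by packet authentication, while a change made before the packet is assembled passes both the LSA checksum and the per-hop authentication.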
>
>                         We can improve the description in the draft to
>                         make this clear.
>
>                         Best regards,
>
>                         Jie
>
>                         *From:*Les Ginsberg (ginsberg)
>                         [mailto:ginsberg@cisco.com]
>                         *Sent:* Monday, August 01, 2016 1:30 PM
>                         *To:* Dongjie (Jimmy); ospf@ietf.org
>                         <mailto:ospf@ietf.org>
>                         *Cc:* Zhangxudong (zhangxudong, VRP);
>                         lizhenqiang@chinamobile.com
>                         <mailto:lizhenqiang@chinamobile.com>
>                         *Subject:* RE: [OSPF] Solicit feedbacks on
>                         draft-dong-ospf-maxage-flush-problem-statement
>
>                         Jie –
>
>                         The draft says (Section 2):
>
>                         “Since cryptographic authentication is executed
>                         at the OSPF packet level, it can only protect
>                         the assembled LSU packet for one hop and does
>                         not provide any additional protection for the
>                         corruption of LS age field.”
>
>                         But as authentication is calculated at the OSPF
>                         packet level, any change to the LS age field for
>                         an individual LSA contained within the OSPF
>                         packet (e.g. by some packet corruption in
>                         transmission) would cause authentication to fail
>                         when the packet is received. So the statement
>                         you make is not correct. I therefore am
>                         struggling to understand what problem you
>                         believe is not addressed by existing
>                         authentication techniques.
>
>                             Les
>
>                         *From:*OSPF [mailto:ospf-bounces@ietf.org] *On
>                         Behalf Of *Dongjie (Jimmy)
>                         *Sent:* Sunday, July 31, 2016 8:15 PM
>                         *To:* ospf@ietf.org <mailto:ospf@ietf.org>
>                         *Cc:* Zhangxudong (zhangxudong, VRP);
>                         lizhenqiang@chinamobile.com
>                         <mailto:lizhenqiang@chinamobile.com>
>                         *Subject:* [OSPF] Solicit feedbacks on
>                         draft-dong-ospf-maxage-flush-problem-statement
>
>                         Hi all,
>
>                         draft-dong-ospf-maxage-flush-problem-statement
>                         describes the problems caused by the corruption
>                         of the LS Age field, and summarizes the
>                         requirements on potential solutions. This draft
>                         received good comments during the presentation
>                         at the IETF meeting in B.A.
>
>                         The authors would like to solicit further
>                         feedback from the mailing list, on both the
>                         problem statement and the solution requirements.
>                         Based on the feedback, we will update the
>                         problem statement draft, and work together to
>                         build suitable solutions.
>
>                         The URL of the draft is:
>
>                         https://tools.ietf.org/html/draft-dong-ospf-maxage-flush-problem-statement-00
>
>                         Comments and feedback are welcome.
>
>                         Best regards,
>
>                         Jie
>
>
>
> _______________________________________________
> OSPF mailing list
> OSPF@ietf.org
> https://www.ietf.org/mailman/listinfo/ospf
>