Re: [OSPF] Solicit feedbacks on draft-dong-ospf-maxage-flush-problem-statement

"Dongjie (Jimmy)" <jie.dong@huawei.com> Tue, 16 August 2016 13:56 UTC

From: "Dongjie (Jimmy)" <jie.dong@huawei.com>
To: "Acee Lindem (acee)" <acee@cisco.com>, "Les Ginsberg (ginsberg)" <ginsberg@cisco.com>, "ospf@ietf.org" <ospf@ietf.org>
Thread-Topic: [OSPF] Solicit feedbacks on draft-dong-ospf-maxage-flush-problem-statement
Thread-Index: AQHR8NeBqKAJ44d2wE2+UqwZwZ55IaBDH0IAgABktoCAAbNJAIAGZ93Q
Date: Tue, 16 Aug 2016 13:56:22 +0000
Message-ID: <76CD132C3ADEF848BD84D028D243C92774F084E6@NKGEML515-MBX.china.huawei.com>
References: <76CD132C3ADEF848BD84D028D243C92774EFB09A@NKGEML515-MBX.china.huawei.com> <90433b8486184c9cb4b947e7ffb9fc73@XCH-ALN-001.cisco.com> <76CD132C3ADEF848BD84D028D243C92774EFB143@NKGEML515-MBX.china.huawei.com> <0369fc017f8d47568594d3eb9f684649@XCH-ALN-001.cisco.com> <76CD132C3ADEF848BD84D028D243C92774EFB1BF@NKGEML515-MBX.china.huawei.com> <3a424b8025ca42a5a64bf88af69ea108@XCH-ALN-001.cisco.com> <76CD132C3ADEF848BD84D028D243C92774EFBC05@NKGEML515-MBX.china.huawei.com> <37a4a1ba0da84b76a4d5962f59441a17@XCH-ALN-001.cisco.com> <76CD132C3ADEF848BD84D028D243C92774F05C49@NKGEML515-MBX.china.huawei.com> <36c4636b09bf4464b912080806d917e3@XCH-ALN-001.cisco.com> <D3D39927.78E35%acee@cisco.com>
In-Reply-To: <D3D39927.78E35%acee@cisco.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ospf/su64rlJtK_47hCSRtdvtR5Bae0A>
Cc: "Zhangxudong (zhangxudong, VRP)" <zhangxudong@huawei.com>, "lizhenqiang@chinamobile.com" <lizhenqiang@chinamobile.com>
Subject: Re: [OSPF] Solicit feedbacks on draft-dong-ospf-maxage-flush-problem-statement

Hi Acee,

Thanks a lot for your feedback.

For corruption which affects the LS age before the LSAs are packed into an LSU packet, I agree it is less likely to happen than the other cases. However, I think we agree that OSPF authentication only protects against packet-level corruption and cannot help to detect corruption at the LSA level.

In my understanding, robustness is an important feature of network protocols, and this includes robustness to errors and failures that happen in the network. If there is a bug in a particular router, the operator would not accept the whole network being impacted, which means the other routers in the network need to keep working properly in this situation. For example, in BGP the error handling mechanism has been optimized to avoid unnecessary session teardown.

I agree that an OSPF YANG notification for LSA timeout is a nice thing to have and could be useful to identify the misbehaving router. My concern is that sometimes the network may be so severely impacted that NETCONF/RESTCONF connectivity is also lost. To avoid this, some mechanism to mitigate the impact of this problem could help.

Best regards,
Jie

From: Acee Lindem (acee) [mailto:acee@cisco.com]
Sent: Saturday, August 13, 2016 3:27 AM
To: Les Ginsberg (ginsberg); Dongjie (Jimmy); ospf@ietf.org
Cc: Zhangxudong (zhangxudong, VRP); lizhenqiang@chinamobile.com
Subject: Re: [OSPF] Solicit feedbacks on draft-dong-ospf-maxage-flush-problem-statement

Speaking as a WG member:

Hi Jie,

I believe we agree that the problem is confined to OSPF bugs, system timer bugs, and packet corruption. I’d assert that corruption can be detected via OSPF authentication. In fact, there is a well-known anecdote where IS-IS authentication was enabled solely for the purpose of filtering corrupted protocol packets in an environment with line cards that were prone to such corruption. Hence, we are left with problems based on OSPF or system timer bugs. If there were a system timer bug, I’d doubt that a networking device with such a bug would be functional to the point of being able to establish and maintain OSPF adjacencies. Do we really want to enhance the protocol to deal with bugs?

I’ve thought about this and one potential action I could envision would be to add a separate OSPF YANG notification where an LSA times out and a router other than the originator purges it. This way, the misbehaving OSPF router could be readily identified.
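
A minimal sketch of the condition such a notification would report, assuming a simplified LSA record. The `Lsa` class, the `purge_notification` helper, and the event and field names are hypothetical illustrations and are not taken from any OSPF YANG model.

```python
from dataclasses import dataclass
from typing import Optional

MAX_AGE = 3600  # OSPF MaxAge in seconds (RFC 2328)


@dataclass(frozen=True)
class Lsa:
    """Simplified LSA identity plus age; not the full RFC 2328 header."""
    ls_type: int
    ls_id: str
    adv_router: str
    age: int


def purge_notification(local_router_id: str, lsa: Lsa) -> Optional[dict]:
    """Return a notification when the local router ages out an LSA it did not originate."""
    if lsa.age < MAX_AGE:
        return None  # the LSA has not timed out locally
    if lsa.adv_router == local_router_id:
        return None  # the originator flushing its own LSA is normal behaviour
    return {
        "event": "lsa-purged-by-non-originator",  # hypothetical event name
        "purging-router": local_router_id,
        "originator": lsa.adv_router,
        "ls-type": lsa.ls_type,
        "ls-id": lsa.ls_id,
    }


lsa = Lsa(ls_type=1, ls_id="192.0.2.1", adv_router="192.0.2.1", age=MAX_AGE)
own = purge_notification("192.0.2.1", lsa)    # originator purging: no event
other = purge_notification("192.0.2.7", lsa)  # another router purging: event
```

The event carries both the purging router and the originator, which is the information an operator would need to locate the misbehaving router.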

Thanks,
Acee


From: OSPF <ospf-bounces@ietf.org> on behalf of "Les Ginsberg (ginsberg)" <ginsberg@cisco.com>
Date: Thursday, August 11, 2016 at 1:29 PM
To: Jie Dong <jie.dong@huawei.com>, OSPF WG List <ospf@ietf.org>
Cc: "Zhangxudong (zhangxudong, VRP)" <zhangxudong@huawei.com>, "lizhenqiang@chinamobile.com" <lizhenqiang@chinamobile.com>
Subject: Re: [OSPF] Solicit feedbacks on draft-dong-ospf-maxage-flush-problem-statement

Jie –

Having the discussion has certainly been a good thing, but if the consensus of the WG is that there is no protocol change required then there is no need for any draft – which is my current position.

The other point is that you seem to be confusing the IS-IS Purge Originator Identification TLV (RFC 6232) with detecting invalid purges/remaining lifetime corruption. This is not the case. RFC 6232 simply allows us to detect which router originated a purge – it is not able to detect whether a purge is valid/invalid – and was not motivated by concerns about remaining lifetime corruption.

   Les


From: Dongjie (Jimmy) [mailto:jie.dong@huawei.com]
Sent: Wednesday, August 10, 2016 9:24 PM
To: Les Ginsberg (ginsberg); ospf@ietf.org
Cc: Zhangxudong (zhangxudong, VRP); lizhenqiang@chinamobile.com
Subject: RE: [OSPF] Solicit feedbacks on draft-dong-ospf-maxage-flush-problem-statement

Hi Les,

The current draft is a problem statement, so IMO what the WG needs to consider is whether this is a vulnerability of the OSPF protocol, and whether it can have a negative impact on the network. If the problem is acknowledged, IMO it is worth documenting.

The “ROI” you mentioned applies to the evaluation of the proposed solutions. I totally agree that for the timer bug case, recognizing and ignoring the received abnormal MaxAge LSAs cannot stop the misbehaving router from generating further MaxAge LSAs, as it is a systematic problem which can only be fixed after the operator identifies that router. This is similar to systematic corruption of the IS-IS remaining lifetime. This is why the draft mentions two kinds of potential solutions: a mitigation mechanism can keep the network from being severely impacted by the problem, while for systematic problems, problem localization is needed to identify the misbehaving router and then solve the problem.

Best regards,
Jie

From: OSPF [mailto:ospf-bounces@ietf.org] On Behalf Of Les Ginsberg (ginsberg)
Sent: Monday, August 08, 2016 2:14 AM
To: Dongjie (Jimmy) <jie.dong@huawei.com>; ospf@ietf.org
Cc: Zhangxudong (zhangxudong, VRP) <zhangxudong@huawei.com>; lizhenqiang@chinamobile.com
Subject: Re: [OSPF] Solicit feedbacks on draft-dong-ospf-maxage-flush-problem-statement

Jie –

Thinking about the following some more:

<snip>
What remains is the possibility that an implementation has some bug and unintentionally modifies the age to something other than what it should be due to the actual elapsed time since LSA generation. I suppose a mechanism equivalent to what the IS-IS draft defined i.e. setting the age to “new” (0 in OSPF case) when first receiving a non-self-generated LSA could be useful to prevent negative impacts of such an implementation bug. Is this what you intend?

[Jie]: More specifically, the problem could be caused by either “setting the LS age field incorrectly due to an implementation bug” or “the system timer running so fast that the LS age reaches MaxAge much earlier than on other routers”. Another less likely case is that the LS age field is corrupted before the LSA is assembled into an OSPF packet.
<end snip>

The benefits are extremely limited. If a router prematurely ages an LSA due to a timer bug, ignoring the received LSA age on reception isn’t going to prevent premature purging by the router which has the bug. So the effect of ignoring the received LSA age prior to reaching MAXAGE will be short lived. You are then left with the possibility that an implementation corrupts the LSA age BEFORE calculating checksum/crypto authentication – but its local timeout logic is unaffected. This has very limited value. Whether the WG considers this worth pursuing is something you need to ask. For myself, I don’t see much ROI here.
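
A minimal sketch of this argument, assuming a receiver that resets the age of a non-self-generated LSA to 0 unless it is a MaxAge purge. The function names and the `timer_speedup` parameter are hypothetical; only MaxAge = 3600 seconds comes from RFC 2328.

```python
MAX_AGE = 3600  # OSPF MaxAge in seconds (RFC 2328)


def age_on_install(received_age: int, self_originated: bool) -> int:
    """Age stored in the LSDB under the 'ignore received age' rule.

    Purges (age >= MaxAge) are always honoured, since discarding them
    would risk inconsistent LSDBs. Self-originated LSAs keep their age.
    """
    if self_originated or received_age >= MAX_AGE:
        return received_age
    return 0  # treat any other received LSA as new


def seconds_until_local_purge(timer_speedup: float) -> float:
    """Real time until a router whose clock runs timer_speedup times
    too fast ages an LSA from 0 to MaxAge and floods a purge."""
    return MAX_AGE / timer_speedup


# An LSA received with a suspiciously high age is installed as new ...
near_max = age_on_install(3590, self_originated=False)
# ... but a purge from the buggy router still has to be accepted.
purge = age_on_install(MAX_AGE, self_originated=False)
# With a clock running 10x fast, that purge arrives after 360 real seconds.
delay = seconds_until_local_purge(10)
```

Under these assumptions the reset only delays the premature flush; it does not stop the router with the faulty timer from purging the LSA once its own copy reaches MaxAge.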

  Les



From: Dongjie (Jimmy) [mailto:jie.dong@huawei.com]
Sent: Monday, August 01, 2016 9:43 PM
To: Les Ginsberg (ginsberg); ospf@ietf.org
Cc: Zhangxudong (zhangxudong, VRP); lizhenqiang@chinamobile.com
Subject: RE: [OSPF] Solicit feedbacks on draft-dong-ospf-maxage-flush-problem-statement

Hi Les,

Please see my replies with [Jie2]:

From: Les Ginsberg (ginsberg) [mailto:ginsberg@cisco.com]
Sent: Monday, August 01, 2016 9:57 PM
To: Dongjie (Jimmy); ospf@ietf.org
Cc: Zhangxudong (zhangxudong, VRP); lizhenqiang@chinamobile.com
Subject: RE: [OSPF] Solicit feedbacks on draft-dong-ospf-maxage-flush-problem-statement

Jie -

From: Dongjie (Jimmy) [mailto:jie.dong@huawei.com]
Sent: Monday, August 01, 2016 1:44 AM
To: Les Ginsberg (ginsberg); ospf@ietf.org
Cc: Zhangxudong (zhangxudong, VRP); lizhenqiang@chinamobile.com
Subject: RE: [OSPF] Solicit feedbacks on draft-dong-ospf-maxage-flush-problem-statement

Hi Les,

Please see inline with [Jie]:

From: Les Ginsberg (ginsberg) [mailto:ginsberg@cisco.com]
Sent: Monday, August 01, 2016 3:09 PM
To: Dongjie (Jimmy); ospf@ietf.org
Cc: Zhangxudong (zhangxudong, VRP); lizhenqiang@chinamobile.com
Subject: RE: [OSPF] Solicit feedbacks on draft-dong-ospf-maxage-flush-problem-statement

Jie –

Fully agree that IS-IS and OSPF differ in this regard.

https://www.ietf.org/id/draft-ietf-isis-remaining-lifetime-01.txt addresses problems where corruption of the remaining lifetime occurs either during transmission/reception or due to some DoS attack. This isn’t a concern with OSPF (hope you agree).

[Jie]: Yes, for OSPF the corruption during packet transmission can be detected.

What remains is the possibility that an implementation has some bug and unintentionally modifies the age to something other than what it should be due to the actual elapsed time since LSA generation. I suppose a mechanism equivalent to what the IS-IS draft defined i.e. setting the age to “new” (0 in OSPF case) when first receiving a non-self-generated LSA could be useful to prevent negative impacts of such an implementation bug. Is this what you intend?

[Jie]: More specifically, the problem could be caused by either “setting the LS age field incorrectly due to an implementation bug” or “the system timer running so fast that the LS age reaches MaxAge much earlier than on other routers”. Another less likely case is that the LS age field is corrupted before the LSA is assembled into an OSPF packet.

[Jie]: Regarding the solution space, IMO we need to consider both cases: “LS age reaches MaxAge” and “LS age close to MaxAge”. For IS-IS, RFC 6232 and RFC 6233 provide solutions for the detection and identification of corrupted IS-IS purges, while OSPF does not have similar mechanisms.

[Les:] It is incorrect to say that RFC 6232 makes it possible to detect a corrupt purge. What it does do is to provide an indication as to which IS initiated a purge. I don’t know how OSPF would address this issue, but for OSPFv2 at least any solution would likely not be backwards compatible. For this reason I suggest that you not try to address this issue in the same draft.

[Jie2]: Agreed. RFC 6232 provides the mechanism to track the misbehaving routers so that the operator can fix the problem; the detection can be based on the rules in RFC 6233 or on some other anomalies. Indeed, for OSPFv2 legacy LSAs it is difficult to introduce a mechanism similar to RFC 6232, while it can be easier for the OSPFv2/v3 Extended LSAs. So it depends on how backward compatible the solution should be. I agree with you that the solution for Problem Localization in OSPF needs to be provided in a separate document.

Solutions to LS age corruption can be done in a backwards compatible way, but they MUST NOT result in discarding purges which pass authentication; doing so places you at risk of having inconsistent LSDBs in the network.

[Jie2]: Exactly. The received MaxAge LSAs cannot simply be discarded; the decision must be made carefully, probably based on some additional information. The authors have discussed some possible solutions internally, and will prepare some material for further open discussion.

As written, the draft makes claims that are at least misleading – and I believe actually incorrect. In Section 6 you say:

“The LS age field may be altered as a result of
   packet corruption, such modification cannot be detected by LSA
   checksum nor OSPF packet cryptographic authentication.”

This isn’t correct.

[Jie] Thanks for pointing this out. This sentence needs to be revised to mention “LSA corruption” rather than “packet corruption”.

What would be helpful – at least to me – is to move from a generic problem statement to the specific problem you want to solve and the proposed solution. This also requires you to more clearly state the cases where there is an actual vulnerability. It would be a lot easier to support the draft if this were done.

[Jie] Thanks for your suggestion. Yes we can update this draft with more specific problem statements as I mentioned above.

[Jie] As for the proposed solutions, the current draft specifies the requirements on the potential solutions, from which we envision that different solutions may be needed for “Impact Mitigation” and “Problem Localization”. The solution for “Impact Mitigation” can be the easier one, for which we can start to discuss the potential solutions now, while the solution for “Problem Localization” may need more consideration.

[Les:] A discussion of the requirements is useful and necessary, but IMO until you propose a solution there isn’t enough substance for the document to become a WG document.

[Jie2] Yes, the current draft focuses on the problem statement and the requirements; the goal is first to get the MaxAge flush problem acknowledged and to reach consensus on the requirements. Then the plan is to specify the solutions in separate documents. Your valuable suggestions will be considered, and further contributions are welcome.

Best regards,
Jie

    Les

Best regards,
Jie

   Les


From: Dongjie (Jimmy) [mailto:jie.dong@huawei.com]
Sent: Sunday, July 31, 2016 11:48 PM
To: Les Ginsberg (ginsberg); ospf@ietf.org
Cc: Zhangxudong (zhangxudong, VRP); lizhenqiang@chinamobile.com
Subject: RE: [OSPF] Solicit feedbacks on draft-dong-ospf-maxage-flush-problem-statement

Hi Les,

Thanks for your comments.

OSPF packet-level checksum and authentication can only protect the assembled LSU packet for one hop on the wire; they cannot detect any change made to an LSA by the routers themselves. This is because OSPF packets are re-assembled at each hop, which is slightly different from IS-IS. So the problem for OSPF is mainly due to problems inside the router, for example protocol implementations, system timers, or some hardware problem. Actually this problem has been seen in several production networks.
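
A minimal sketch of why a change made inside a router goes unnoticed, assuming a toy LSA layout with a 2-byte LS age followed by the remaining contents. RFC 2328 computes the LSA checksum over the LSA excluding the LS age field; the plain Fletcher-16 below is a simplified stand-in for the real checksum encoding, and the helper names are hypothetical.

```python
import struct


def fletcher16(data: bytes) -> int:
    """Plain Fletcher-16 (mod 255); a stand-in for the OSPF LSA checksum."""
    c0 = c1 = 0
    for byte in data:
        c0 = (c0 + byte) % 255
        c1 = (c1 + c0) % 255
    return (c1 << 8) | c0


def lsa_checksum(lsa: bytes) -> int:
    """Checksum over the LSA with the leading 2-byte LS age skipped."""
    return fletcher16(lsa[2:])


def set_age(lsa: bytes, age: int) -> bytes:
    """Return a copy of the LSA with its LS age field overwritten."""
    return struct.pack("!H", age) + lsa[2:]


original = struct.pack("!H", 10) + b"router-lsa-contents"
stored_checksum = lsa_checksum(original)

# A router with a timer or implementation bug rewrites the age to MaxAge
# before the LSA is re-assembled into its next LSU packet.
aged = set_age(original, 3600)
checksum_still_valid = lsa_checksum(aged) == stored_checksum
```

Because the age is outside the checksum and the outgoing packet is authenticated only after re-assembly, the next hop receives a packet that verifies correctly and carries the wrong age.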

We can improve the description in the draft to make this clear.

Best regards,
Jie

From: Les Ginsberg (ginsberg) [mailto:ginsberg@cisco.com]
Sent: Monday, August 01, 2016 1:30 PM
To: Dongjie (Jimmy); ospf@ietf.org
Cc: Zhangxudong (zhangxudong, VRP); lizhenqiang@chinamobile.com
Subject: RE: [OSPF] Solicit feedbacks on draft-dong-ospf-maxage-flush-problem-statement

Jie –

The draft says (Section 2):

“Since cryptographic authentication is executed at the OSPF packet
   level, it can only protect the assembled LSU packet for one hop and
   does not provide any additional protection for the corruption of LS
   age field.”

But as authentication is calculated at the OSPF packet level, any change to the LS age field for an individual LSA contained within the OSPF packet (e.g. by some packet corruption in transmission) would cause authentication to fail when the packet is received. So the statement you make is not correct. I therefore am struggling to understand what problem you believe is not addressed by existing authentication techniques.
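
A minimal sketch of this point, assuming a toy packet that is just one LSA followed by a keyed digest. HMAC-SHA-256 from the Python standard library stands in for OSPF cryptographic authentication; the key, the layout, and the helper names are hypothetical and do not follow the real OSPF packet format.

```python
import hashlib
import hmac
import struct

KEY = b"shared-secret"  # hypothetical authentication key
DIGEST_LEN = hashlib.sha256().digest_size  # 32 bytes


def sign_packet(payload: bytes, key: bytes = KEY) -> bytes:
    """Append a keyed digest computed over the whole packet payload."""
    return payload + hmac.new(key, payload, hashlib.sha256).digest()


def verify_packet(packet: bytes, key: bytes = KEY) -> bool:
    """Recompute the digest on reception and compare it with the trailer."""
    payload, digest = packet[:-DIGEST_LEN], packet[-DIGEST_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return hmac.compare_digest(expected, digest)


# Toy LSA: 2-byte LS age followed by the rest of the LSA.
lsa = struct.pack("!H", 10) + b"router-lsa-contents"
packet = sign_packet(lsa)

# Corruption in transit rewrites the LS age inside the signed packet.
corrupted = struct.pack("!H", 3600) + packet[2:]

intact_ok = verify_packet(packet)
corrupted_ok = verify_packet(corrupted)
```

The digest covers every byte of the payload, including the LS age, so the altered packet fails verification at the receiver.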

   Les



From: OSPF [mailto:ospf-bounces@ietf.org] On Behalf Of Dongjie (Jimmy)
Sent: Sunday, July 31, 2016 8:15 PM
To: ospf@ietf.org
Cc: Zhangxudong (zhangxudong, VRP); lizhenqiang@chinamobile.com
Subject: [OSPF] Solicit feedbacks on draft-dong-ospf-maxage-flush-problem-statement

Hi all,

draft-dong-ospf-maxage-flush-problem-statement describes the problems caused by corruption of the LS Age field, and summarizes the requirements on potential solutions. This draft received good comments during the presentation at the IETF meeting in B.A.

The authors would like to solicit further feedback from the mailing list, on both the problem statement and the solution requirements. Based on the feedback, we will update the problem statement draft, and work together to build suitable solutions.

The URL of the draft is:
https://tools.ietf.org/html/draft-dong-ospf-maxage-flush-problem-statement-00

Comments and feedback are welcome.

Best regards,
Jie