Re: [OSPF] Solicit feedbacks on draft-dong-ospf-maxage-flush-problem-statement

"Les Ginsberg (ginsberg)" <ginsberg@cisco.com> Tue, 16 August 2016 23:04 UTC

From: "Les Ginsberg (ginsberg)" <ginsberg@cisco.com>
To: "Dongjie (Jimmy)" <jie.dong@huawei.com>, "ospf@ietf.org" <ospf@ietf.org>
Date: Tue, 16 Aug 2016 23:04:38 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/ospf/LZrF1hDdc9WNAjrYaQTyTlDJde4>
Cc: "Zhangxudong (zhangxudong, VRP)" <zhangxudong@huawei.com>, "lizhenqiang@chinamobile.com" <lizhenqiang@chinamobile.com>
Subject: Re: [OSPF] Solicit feedbacks on draft-dong-ospf-maxage-flush-problem-statement

Jie -

I agree that the discussion has been useful. But my position - considering all the points made during this discussion - is that no protocol changes are advisable.
I therefore think the draft should not be moved forward.

   Les


From: Dongjie (Jimmy) [mailto:jie.dong@huawei.com]
Sent: Tuesday, August 16, 2016 6:16 AM
To: Les Ginsberg (ginsberg); ospf@ietf.org
Cc: Zhangxudong (zhangxudong, VRP); lizhenqiang@chinamobile.com
Subject: RE: [OSPF] Solicit feedbacks on draft-dong-ospf-maxage-flush-problem-statement

Hi Les,

It seems that after these days of discussion we are now clear about the problem, which is good progress for the problem statement draft. As a next step, I think we can focus on discussing the solutions.

If the MaxAge flush problem happens in a production network, then without a protocol change it will have a severe impact on the network, which I believe is not acceptable. At least some mechanism to mitigate the impact is needed. The mechanism you proposed can be part of the mitigation solution, but some optimization needs to be considered to avoid unexpected behaviors, e.g. in some cases the LSA may stay for quite a long time and not be aged out properly.

As for RFC 6232, sorry for not making it clear in the beginning. RFC 6232 can be useful to track the originator of the purge message when an invalid purge (e.g. remaining lifetime corruption) is detected by some other means.

Best regards,
Jie

From: Les Ginsberg (ginsberg) [mailto:ginsberg@cisco.com]
Sent: Friday, August 12, 2016 1:29 AM
To: Dongjie (Jimmy) <jie.dong@huawei.com>; ospf@ietf.org
Cc: Zhangxudong (zhangxudong, VRP) <zhangxudong@huawei.com>; lizhenqiang@chinamobile.com
Subject: RE: [OSPF] Solicit feedbacks on draft-dong-ospf-maxage-flush-problem-statement

Jie -

Having the discussion has certainly been a good thing, but if the consensus of the WG is that there is no protocol change required then there is no need for any draft - which is my current position.

The other point is that you seem to be confusing the IS-IS Purge origination TLV (RFC 6232) with detecting invalid purges/remaining lifetime corruption. This is not the case. RFC 6232 simply allows us to detect which router originated a purge - it is not able to detect whether a purge is valid/invalid - and was not motivated by concerns about remaining lifetime corruption.

   Les


From: Dongjie (Jimmy) [mailto:jie.dong@huawei.com]
Sent: Wednesday, August 10, 2016 9:24 PM
To: Les Ginsberg (ginsberg); ospf@ietf.org
Cc: Zhangxudong (zhangxudong, VRP); lizhenqiang@chinamobile.com
Subject: RE: [OSPF] Solicit feedbacks on draft-dong-ospf-maxage-flush-problem-statement

Hi Les,

The current draft is a problem statement, so IMO what the WG needs to consider is whether this is a vulnerability of the OSPF protocol, and whether it can have a negative impact on the network. If the problem is acknowledged, IMO it is worth documenting.

The "ROI" you mentioned applies to the evaluation of the proposed solutions. I totally agree that for the timer bug case, recognizing and ignoring the received abnormal MaxAge LSAs cannot stop the misbehaving router from generating further MaxAge LSAs, as it is a systematic problem, which can only be fixed after the operator identifies that router. This is similar to the systematic corruption of the IS-IS remaining lifetime. This is why the draft mentions two kinds of potential solutions: the mitigation mechanism can keep the network from being severely impacted by the problem, while for systematic problems, problem localization is needed to identify the misbehaving router and then solve the problem.

Best regards,
Jie

From: OSPF [mailto:ospf-bounces@ietf.org] On Behalf Of Les Ginsberg (ginsberg)
Sent: Monday, August 08, 2016 2:14 AM
To: Dongjie (Jimmy) <jie.dong@huawei.com>; ospf@ietf.org
Cc: Zhangxudong (zhangxudong, VRP) <zhangxudong@huawei.com>; lizhenqiang@chinamobile.com
Subject: Re: [OSPF] Solicit feedbacks on draft-dong-ospf-maxage-flush-problem-statement

Jie -

Thinking about the following some more:

<snip>
What remains is the possibility that an implementation has some bug and unintentionally modifies the age to something other than what it should be due to the actual elapsed time since LSA generation. I suppose a mechanism equivalent to what the IS-IS draft defined i.e. setting the age to "new" (0 in OSPF case) when first receiving a non-self-generated LSA could be useful to prevent negative impacts of such an implementation bug. Is this what you intend?

[Jie]: More specifically, the problem could be caused by either "setting the LS age field incorrectly due to an implementation bug" or "the system timer running so fast that the LS age reaches MaxAge much earlier than on other routers". Another less likely case is that the LS age field is corrupted before the LSA is assembled into an OSPF packet.
<end snip>

The benefits are extremely limited. If a router prematurely ages an LSA due to a timer bug, ignoring the received LSA age on reception isn't going to prevent premature purging by the router which has the bug. So the effect of ignoring the received LSA age prior to reaching MAXAGE will be short lived. You are then left with the possibility that an implementation corrupts the LSA age BEFORE calculating checksum/crypto authentication - but its local timeout logic is unaffected. This has very limited value. Whether the WG considers this worth pursuing is something you need to ask. For myself, I don't see much ROI here.
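To put rough numbers on this argument, here is a minimal sketch (not part of the original message). It assumes the standard OSPF constants MaxAge = 3600 s and LSRefreshTime = 1800 s, and a hypothetical router whose aging clock runs `clock_speedup` times too fast. Even if that router installs the LSA with age 0, its own timer reaches MaxAge early and it refloods the LSA as a purge; how neighbors treat the age they receive does not change that moment. The function names are illustrative only.

```python
MAX_AGE = 3600          # seconds, OSPF MaxAge
LS_REFRESH_TIME = 1800  # seconds, interval at which the originator refreshes

def real_time_until_local_maxage(clock_speedup: float) -> float:
    """Real seconds until a router with a fast clock ages an LSA
    installed at age 0 all the way to MaxAge (simplified model)."""
    return MAX_AGE / clock_speedup

def purges_before_refresh(clock_speedup: float) -> bool:
    """True if the buggy router reaches MaxAge before the originator's
    next refresh could arrive (assumes the refresh is on time)."""
    return real_time_until_local_maxage(clock_speedup) < LS_REFRESH_TIME

# A clock running 10x fast purges after 360 real seconds,
# long before the 1800 s refresh; a correct clock does not.
fast = real_time_until_local_maxage(10.0)
premature_fast = purges_before_refresh(10.0)
premature_normal = purges_before_refresh(1.0)
```

In this simplified model any speedup above 2x leads to a premature purge on every refresh cycle, which is consistent with the view that the fix has to happen on the misbehaving router itself.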

  Les



From: Dongjie (Jimmy) [mailto:jie.dong@huawei.com]
Sent: Monday, August 01, 2016 9:43 PM
To: Les Ginsberg (ginsberg); ospf@ietf.org
Cc: Zhangxudong (zhangxudong, VRP); lizhenqiang@chinamobile.com
Subject: RE: [OSPF] Solicit feedbacks on draft-dong-ospf-maxage-flush-problem-statement

Hi Les,

Please see my replies with [Jie2]:

From: Les Ginsberg (ginsberg) [mailto:ginsberg@cisco.com]
Sent: Monday, August 01, 2016 9:57 PM
To: Dongjie (Jimmy); ospf@ietf.org
Cc: Zhangxudong (zhangxudong, VRP); lizhenqiang@chinamobile.com
Subject: RE: [OSPF] Solicit feedbacks on draft-dong-ospf-maxage-flush-problem-statement

Jie -

From: Dongjie (Jimmy) [mailto:jie.dong@huawei.com]
Sent: Monday, August 01, 2016 1:44 AM
To: Les Ginsberg (ginsberg); ospf@ietf.org
Cc: Zhangxudong (zhangxudong, VRP); lizhenqiang@chinamobile.com
Subject: RE: [OSPF] Solicit feedbacks on draft-dong-ospf-maxage-flush-problem-statement

Hi Les,

Please see inline with [Jie]:

From: Les Ginsberg (ginsberg) [mailto:ginsberg@cisco.com]
Sent: Monday, August 01, 2016 3:09 PM
To: Dongjie (Jimmy); ospf@ietf.org
Cc: Zhangxudong (zhangxudong, VRP); lizhenqiang@chinamobile.com
Subject: RE: [OSPF] Solicit feedbacks on draft-dong-ospf-maxage-flush-problem-statement

Jie -

Fully agree that IS-IS and OSPF differ in this regard.

https://www.ietf.org/id/draft-ietf-isis-remaining-lifetime-01.txt addresses problems where corruption of the remaining lifetime occurs either during transmission/reception or due to some DoS attack. This isn't a concern with OSPF (hope you agree).

[Jie]: Yes, for OSPF the corruption during packet transmission can be detected.

What remains is the possibility that an implementation has some bug and unintentionally modifies the age to something other than what it should be due to the actual elapsed time since LSA generation. I suppose a mechanism equivalent to what the IS-IS draft defined i.e. setting the age to "new" (0 in OSPF case) when first receiving a non-self-generated LSA could be useful to prevent negative impacts of such an implementation bug. Is this what you intend?

[Jie]: More specifically, the problem could be caused by either "setting the LS age field incorrectly due to an implementation bug" or "the system timer running so fast that the LS age reaches MaxAge much earlier than on other routers". Another less likely case is that the LS age field is corrupted before the LSA is assembled into an OSPF packet.

[Jie]: Regarding the solution space, IMO we need to consider both cases: "LS age reaches MaxAge" and "LS age close to MaxAge". For IS-IS, RFC 6232 and RFC 6233 provide solutions for the detection and identification of corrupted IS-IS purges, while OSPF does not have similar mechanisms.

[Les:] It is incorrect to say that RFC 6232 makes it possible to detect a corrupt purge. What it does do is to provide an indication as to which IS initiated a purge. I don't know how OSPF would address this issue, but for OSPFv2 at least any solution would likely not be backwards compatible. For this reason I suggest that you not try to address this issue in the same draft.

[Jie2]: Agreed, RFC 6232 provides the mechanism to track the misbehaving routers so that the operator can fix the problem; the detection can be based on the rules in RFC 6233 or on some other anomalies. Indeed, for OSPFv2 legacy LSAs it is difficult to introduce a mechanism similar to RFC 6232, while it can be easier for the OSPFv2/v3 Extended LSAs. So it depends on how backward compatible the solution should be. I agree with you that the solution for Problem Localization in OSPF needs to be provided in a separate document.

Solutions to LS age corruption can be done in a backwards compatible way, but they MUST NOT result in discarding purges which pass authentication - doing so places you at risk for having inconsistent LSDBs in the network.

[Jie2]: Exactly. The received MaxAge LSAs cannot simply be discarded; the decision must be made carefully, probably based on some additional information. The authors have discussed some possible solutions internally, and will prepare some material for further open discussion.

As written, the draft makes claims that are at least misleading - and I believe actually incorrect. In Section 6 you say:

"The LS age field may be altered as a result of
   packet corruption, such modification cannot be detected by LSA
   checksum nor OSPF packet cryptographic authentication."

This isn't correct.

[Jie] Thanks for pointing this out. This sentence needs to be revised to mention "LSA corruption" rather than "packet corruption".

What would be helpful - at least to me - is to move from a generic problem statement to the specific problem you want to solve and the proposed solution. This also requires you to more clearly state the cases where there is an actual vulnerability. It would be a lot easier to support the draft if this were done.

[Jie] Thanks for your suggestion. Yes we can update this draft with more specific problem statements as I mentioned above.

[Jie] As for the proposed solutions, the current draft specifies the requirements on the potential solutions, from which we envision that different solutions may be needed for "Impact Mitigation" and "Problem Localization". The solution for "Impact Mitigation" can be the easier one, and we can start to discuss potential solutions for it now, while the solution for "Problem Localization" may need more consideration.

[Les:] A discussion of the requirements is useful and necessary, but IMO until you propose a solution there isn't enough substance for the document to become a WG document.

[Jie2] Yes, the current draft focuses on the problem statement and the requirements; the goal is first to get the MaxAge flush problem acknowledged and to reach consensus on the requirements. Then the plan is to specify the solutions in separate documents. Your valuable suggestions will be considered, and further contributions are welcome.

Best regards,
Jie

    Les

Best regards,
Jie

   Les


From: Dongjie (Jimmy) [mailto:jie.dong@huawei.com]
Sent: Sunday, July 31, 2016 11:48 PM
To: Les Ginsberg (ginsberg); ospf@ietf.org
Cc: Zhangxudong (zhangxudong, VRP); lizhenqiang@chinamobile.com
Subject: RE: [OSPF] Solicit feedbacks on draft-dong-ospf-maxage-flush-problem-statement

Hi Les,

Thanks for your comments.

OSPF packet-level checksum and authentication can only protect the assembled LSU packet for one hop on the wire; they cannot detect any change to an LSA made by the routers themselves. This is because the OSPF packets are re-assembled at each hop, which is slightly different from IS-IS. So the problem for OSPF is mainly due to problems inside the router, for example in protocol implementations, system timers, or hardware. Actually this problem has been seen in several production networks.
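The hop-by-hop point can be illustrated with a small sketch (not part of the original message). It uses a toy packet format, a 2-byte LS age followed by an opaque body, and HMAC-SHA256 as a stand-in for OSPF cryptographic authentication; the keys `KEY_AB` and `KEY_BC`, the helper names, and the "bug" on router B are all hypothetical. Router B validates the packet from A, mangles the LS age internally, and then builds and signs a fresh packet toward C, so C's check passes even though the age is wrong.

```python
import hashlib
import hmac
import struct

MAX_AGE = 3600  # seconds, OSPF MaxAge

KEY_AB = b"key-on-link-A-B"  # hypothetical per-link keys
KEY_BC = b"key-on-link-B-C"

def sign(key: bytes, packet: bytes) -> bytes:
    return hmac.new(key, packet, hashlib.sha256).digest()

def make_update(key: bytes, ls_age: int, body: bytes):
    """Assemble a toy LS Update and its authentication digest."""
    packet = struct.pack("!H", ls_age) + body
    return packet, sign(key, packet)

def receive(key: bytes, packet: bytes, digest: bytes):
    """Verify the digest, then return (ls_age, body)."""
    if not hmac.compare_digest(sign(key, packet), digest):
        raise ValueError("authentication failed")
    (ls_age,) = struct.unpack("!H", packet[:2])
    return ls_age, packet[2:]

# Hop 1: A floods the LSA to B; authentication succeeds.
pkt, dig = make_update(KEY_AB, 10, b"router-lsa")
age_at_b, body = receive(KEY_AB, pkt, dig)

# Hypothetical internal bug on B: the age jumps to MaxAge.
buggy_age = MAX_AGE

# Hop 2: B re-assembles and signs a new packet; C accepts it.
pkt2, dig2 = make_update(KEY_BC, buggy_age, body)
age_at_c, _ = receive(KEY_BC, pkt2, dig2)
```

Here `age_at_b` is 10 and `age_at_c` is 3600: neither hop sees an authentication failure, because each digest covers only the packet as assembled by the sending router.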

We can improve the description in the draft to make this clear.

Best regards,
Jie

From: Les Ginsberg (ginsberg) [mailto:ginsberg@cisco.com]
Sent: Monday, August 01, 2016 1:30 PM
To: Dongjie (Jimmy); ospf@ietf.org
Cc: Zhangxudong (zhangxudong, VRP); lizhenqiang@chinamobile.com
Subject: RE: [OSPF] Solicit feedbacks on draft-dong-ospf-maxage-flush-problem-statement

Jie -

The draft says (Section 2):

"Since cryptographic authentication is executed at the OSPF packet
   level, it can only protect the assembled LSU packet for one hop and
   does not provide any additional protection for the corruption of LS
   age field."

But as authentication is calculated at the OSPF packet level, any change to the LS age field for an individual LSA contained within the OSPF packet (e.g. by some packet corruption in transmission) would cause authentication to fail when the packet is received. So the statement you make is not correct. I therefore am struggling to understand what problem you believe is not addressed by existing authentication techniques.
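A minimal sketch of this point (not part of the original message): it assumes a toy packet made of a 2-byte LS age plus an opaque body, with HMAC-SHA256 standing in for OSPF packet-level cryptographic authentication. The key and helper names are illustrative. Because the digest covers the whole packet, changing the LS age in transit makes verification fail at the receiver.

```python
import hashlib
import hmac
import struct

MAX_AGE = 3600  # seconds, OSPF MaxAge

def build_packet(ls_age: int, body: bytes) -> bytes:
    """Toy LS Update: 2-byte LS age followed by an opaque LSA body."""
    return struct.pack("!H", ls_age) + body

def sign(key: bytes, packet: bytes) -> bytes:
    """Digest computed over the entire packet, LS age included."""
    return hmac.new(key, packet, hashlib.sha256).digest()

def verify(key: bytes, packet: bytes, digest: bytes) -> bool:
    return hmac.compare_digest(sign(key, packet), digest)

key = b"shared-link-key"  # hypothetical shared key
packet = build_packet(10, b"router-lsa")
digest = sign(key, packet)

# Corruption on the wire: the LS age bytes are overwritten with MaxAge.
corrupted = struct.pack("!H", MAX_AGE) + packet[2:]

intact_ok = verify(key, packet, digest)        # untouched packet verifies
corrupted_ok = verify(key, corrupted, digest)  # altered packet is rejected
```

This only covers changes made after the sender computed the digest; it says nothing about an age that was already wrong when the packet was assembled.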

   Les



From: OSPF [mailto:ospf-bounces@ietf.org] On Behalf Of Dongjie (Jimmy)
Sent: Sunday, July 31, 2016 8:15 PM
To: ospf@ietf.org
Cc: Zhangxudong (zhangxudong, VRP); lizhenqiang@chinamobile.com
Subject: [OSPF] Solicit feedbacks on draft-dong-ospf-maxage-flush-problem-statement

Hi all,

draft-dong-ospf-maxage-flush-problem-statement describes the problems caused by corruption of the LS Age field, and summarizes the requirements on potential solutions. This draft received good comments during the presentation at the IETF meeting in B.A.

The authors would like to solicit further feedback from the mailing list, on both the problem statement and the solution requirements. Based on the feedback, we will update the problem statement draft, and work together to build suitable solutions.

The URL of the draft is:
https://tools.ietf.org/html/draft-dong-ospf-maxage-flush-problem-statement-00

Comments and feedback are welcome.

Best regards,
Jie