Re: [manet] RFC7181 (OLSRv2) trouble with ANSN and router restart

"Rogge, Henning" <henning.rogge@fkie.fraunhofer.de> Fri, 23 July 2021 06:11 UTC

From: "Rogge, Henning" <henning.rogge@fkie.fraunhofer.de>
To: "manet@ietf.org" <manet@ietf.org>
Date: Fri, 23 Jul 2021 06:11:43 +0000
Message-ID: <1627020703962.66100@fkie.fraunhofer.de>
References: <1626937943164.99401@fkie.fraunhofer.de>, <CA+-pDCdGjVPyVxnfMt2trN_Rk5J_btZrt2teFg43JSEbo0sn6g@mail.gmail.com>
In-Reply-To: <CA+-pDCdGjVPyVxnfMt2trN_Rk5J_btZrt2teFg43JSEbo0sn6g@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/manet/eXn90Lf_TLehILCi-jonErUPL_M>

The problem is easy to solve on the "sender" side, but if we can manage it, I would like to solve it on the receiver side (of TCs).


This way we could prevent an incompatible change to the protocol.


Could the problem be solved if we track (per prefix) when we got the last TC containing it? This way, even when a transmitter only sends "incomplete TCs", we would eventually remove all database entries that have never been repeated despite being part of the (currently unchanging) ANSN.
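
A minimal sketch of the per-prefix tracking idea, assuming a receiver keeps one such store per TC originator. The class name `TcEntries`, the `hold_time` parameter, and the string prefixes are hypothetical and only illustrate the bookkeeping; they are not taken from RFC 7181.

```python
class TcEntries:
    """Hypothetical per-originator store: remembers when each prefix was last advertised."""

    def __init__(self, hold_time):
        self.hold_time = hold_time  # assumed: how long an unrepeated prefix may stay
        self.last_seen = {}         # prefix -> time of the last TC that contained it

    def on_tc(self, prefixes, now):
        """Record that a (possibly incomplete) TC advertised these prefixes at time `now`."""
        for prefix in prefixes:
            self.last_seen[prefix] = now

    def expire(self, now):
        """Drop and return prefixes that no TC has repeated within hold_time."""
        stale = sorted(p for p, t in self.last_seen.items() if now - t > self.hold_time)
        for prefix in stale:
            del self.last_seen[prefix]
        return stale


db = TcEntries(hold_time=15.0)
db.on_tc(["10.0.0.0/24", "10.0.1.0/24"], now=0.0)  # state from before the restart
db.on_tc(["10.0.0.0/24"], now=10.0)                # after the restart: only one prefix repeated
db.on_tc(["10.0.0.0/24"], now=20.0)
removed = db.expire(now=20.0)
```

Here the prefix that is never repeated ages out even though the ANSN never changes, which is the effect the question above is asking about.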


Henning


________________________________
From: Justin Dean <bebemaster@gmail.com>
Sent: Thursday, 22 July 2021 17:32
To: Rogge, Henning
Cc: manet@ietf.org
Subject: Re: [manet] RFC7181 (OLSRv2) trouble with ANSN and router restart

Interesting issue.  I think the solution you proposed for new ANSN < old ANSN and new MSN > old MSN works, and it wouldn't be hard to provide an erratum on how to deal with this.  I would suggest that when this state occurs, all existing entries with that ANSN be removed as if they were old, all the entries in the message with the lower ANSN be added, and the stored ANSN be reset to the lower value.
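
The suggested receiver rule could be sketched roughly as follows. This is an illustration only: `OriginatorState` and `apply_restart_rule` are hypothetical names, the cyclic comparison assumes 16-bit sequence numbers, and all other TC processing is left out.

```python
from dataclasses import dataclass, field

MAXVALUE = 1 << 16  # assumption: 16-bit sequence numbers


def seq_greater(s1, s2):
    """Cyclic 'greater than' for sequence numbers in [0, MAXVALUE)."""
    return (s1 > s2 and s1 - s2 < MAXVALUE // 2) or (s2 > s1 and s2 - s1 > MAXVALUE // 2)


@dataclass
class OriginatorState:
    """Hypothetical per-originator state held by the receiving router."""
    ansn: int
    msn: int
    entries: set = field(default_factory=set)


def apply_restart_rule(state, new_ansn, new_msn, advertised):
    """Handle only the 'newer MSN, older ANSN' pattern; return True if it matched."""
    if not (seq_greater(new_msn, state.msn) and seq_greater(state.ansn, new_ansn)):
        return False                  # not the restart pattern; normal processing (not shown)
    state.entries = set(advertised)   # drop the old entries, take those from the message
    state.ansn = new_ansn             # reset the stored ANSN to the lower value
    state.msn = new_msn
    return True


state = OriginatorState(ansn=500, msn=1000, entries={"10.0.0.0/24", "10.0.1.0/24"})
matched = apply_restart_rule(state, new_ansn=3, new_msn=1200, advertised={"10.0.2.0/24"})
```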

As for the second case (the same ANSN after the restart), it will happen less often but can still happen.  One way we could solve it is to have a router periodically update its ANSN after a restart (even when the state has not changed) and slowly stop updating it over time.  An actual specification of the update rate need not be created, as any progression of the ANSN near the restart time should suffice; more updates provide more/quicker resolution at the cost of efficiency throughout the network.  We would still need the first solution.
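
One possible shape of such a fading update schedule, purely as an illustration: the function name, the doubling factor, and the interval values below are assumptions, since the text above deliberately leaves the rate unspecified.

```python
def ansn_bump_times(first_interval, factor, count):
    """Hypothetical schedule: seconds after restart at which the ANSN is bumped.

    Each gap is `factor` times longer than the previous one, so the
    forced updates become rarer and stop after `count` bumps.
    """
    times = []
    t = 0.0
    interval = first_interval
    for _ in range(count):
        t += interval
        times.append(t)
        interval *= factor
    return times


schedule = ansn_bump_times(first_interval=5.0, factor=2.0, count=4)
# schedule == [5.0, 15.0, 35.0, 75.0]
```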

A completely different approach would be to use a lollipop numbering system: restarts would use, say, a value greater than 65500 and progress regardless of neighbor state for some small number of steps, then jump to 0, with the number system looping at 65500 instead of 65535.  Any ANSN with a value on the "stick" would reset the ANSN if the currently stored ANSN was on the "pop"; otherwise normal operation applies.  This is a bit more complicated and IMO unnecessary for fixing this, but I just wanted to put it out there.
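
A rough sketch of the lollipop idea, using the example boundary of 65500 from the paragraph above. The constant and function names are hypothetical, and starting a restarted router at the first stick value is an assumption.

```python
STICK_START = 65501   # assumption: values 65501..65535 form the "stick"
POP_SIZE = 65501      # values 0..65500 form the circular "pop"
STICK_END = 65535

RESTART_ANSN = STICK_START  # assumption: a restarted router begins here


def on_stick(ansn):
    return ansn >= STICK_START


def next_ansn(ansn):
    """Advance the ANSN: walk up the stick once, then circle on the pop."""
    if on_stick(ansn):
        return 0 if ansn == STICK_END else ansn + 1
    return (ansn + 1) % POP_SIZE   # loops at 65500 instead of 65535


def must_reset(stored_ansn, received_ansn):
    """A stick value received while a pop value is stored signals a restart."""
    return on_stick(received_ansn) and not on_stick(stored_ansn)
```

For example, `must_reset(1234, RESTART_ANSN)` is true, while two pop values or two stick values fall through to normal ANSN comparison.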

Justin Dean

On Thu, Jul 22, 2021 at 3:12 AM Rogge, Henning <henning.rogge@fkie.fraunhofer.de> wrote:
Hi,

I think I have potentially identified an issue with OLSRv2 that can lead to a stable desynchronization of the OLSRv2 TC database after a router restart.

The trouble happens because the ANSN (Advertised Neighbor Set Number) can (and should) become stable when the locally reachable neighbors and their metrics don't change anymore.

The sequence that leads to the issue is:

1) router A restarts
2) router A (randomly?) selects a new Message Sequence Number which is HIGHER (in terms of cyclical comparison) than the last one it used
3) router A selects a new ANSN which is LOWER than (or the same as) the last one it used
4) router B sees the new message sequence number/ANSN in TCs from router A
   => router B does not allow the old TC data to time out (message sequence number is higher!)
   => router B does NOT overwrite the old TC data (ANSN is lower)

This situation will continue as long as the ANSN of router A (which can be stable for an arbitrary time) stays below the ANSN used by the router before its restart.
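
The sequence above can be replayed with a small model. It is a simplified illustration of the behaviour as described in this mail, not the exact RFC 7181 processing rules; `receiver_action` and its return labels are hypothetical, and 16-bit sequence numbers are assumed.

```python
MAXVALUE = 1 << 16  # assumption: 16-bit sequence numbers


def seq_greater(s1, s2):
    """Cyclic 'greater than' for sequence numbers in [0, MAXVALUE)."""
    return (s1 > s2 and s1 - s2 < MAXVALUE // 2) or (s2 > s1 and s2 - s1 > MAXVALUE // 2)


def receiver_action(stored_ansn, stored_msn, new_ansn, new_msn):
    """Hypothetical decision of router B for one TC from router A."""
    if not seq_greater(new_msn, stored_msn):
        return "ignore"     # message is not newer than what was already seen
    if seq_greater(stored_ansn, new_ansn):
        return "keep-old"   # newer message, older ANSN: stale data stays alive
    if new_ansn == stored_ansn:
        return "refresh"    # same ANSN: stored data is simply kept valid
    return "replace"        # newer ANSN: stored data is overwritten


# Before the restart router A used ANSN 500 and message sequence number 1000.
# After the restart it picks a higher message sequence number but a lower ANSN.
after_restart = receiver_action(stored_ansn=500, stored_msn=1000, new_ansn=3, new_msn=1200)
```

`after_restart` is `"keep-old"`, and it stays that way for every later TC until router A's ANSN passes 500.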

There are two parts to this problem, one of them easy to fix.

a) the ANSN after the restart is lower than the ANSN before. We could just demand that a router does NOT increase the validity time of the TC entry in this case... or that it overwrites the TC entry (the combination of "new message seqno" and "old ANSN" should only happen after a restart)
b) the ANSN after the restart is the SAME as before... this is tricky, I have no idea how to resolve this at the receiver without comparing the TC data with the database, which is not reliable when we deal with incomplete TCs.

What do you think?

Henning Rogge
_______________________________________________
manet mailing list
manet@ietf.org
https://www.ietf.org/mailman/listinfo/manet