Re: [manet] RFC7181 (OLSRv2) trouble with ANSN and router restart

Justin Dean <bebemaster@gmail.com> Thu, 22 July 2021 15:32 UTC

References: <1626937943164.99401@fkie.fraunhofer.de>
In-Reply-To: <1626937943164.99401@fkie.fraunhofer.de>
From: Justin Dean <bebemaster@gmail.com>
Date: Thu, 22 Jul 2021 11:32:26 -0400
Message-ID: <CA+-pDCdGjVPyVxnfMt2trN_Rk5J_btZrt2teFg43JSEbo0sn6g@mail.gmail.com>
To: "Rogge, Henning" <henning.rogge@fkie.fraunhofer.de>
Cc: "manet@ietf.org" <manet@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/manet/o3CaTP5WVVkFZsT9ucSRMVzAQlM>

Interesting issue.  I think the solution you proposed for the case where
new ANSN < old ANSN and new MSN > old MSN works, and it wouldn't be hard to
provide an erratum on how to deal with it.  I would suggest that when this
state occurs, all existing entries stored with the old ANSN be removed as if
they were outdated, all the entries in the message with the lower ANSN be
added, and the stored ANSN be reset to the lower value.
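
To make the receiver-side rule concrete, here is a minimal Python sketch.
It assumes 16-bit sequence numbers compared cyclically in the style of
RFC 7181. The names TcState, process_tc and their fields are hypothetical,
and keeping the last MSN per originator is an assumption of the sketch, not
something taken from the RFC's information bases.

```python
from dataclasses import dataclass, field

MAXVALUE = 1 << 16  # one more than the largest 16-bit sequence number


def seq_greater(s1: int, s2: int) -> bool:
    """Cyclic 'greater than' for 16-bit sequence numbers."""
    return (s1 > s2 and s1 - s2 < MAXVALUE // 2) or (
        s2 > s1 and s2 - s1 > MAXVALUE // 2
    )


@dataclass
class TcState:
    """Hypothetical per-originator state kept by a receiving router."""
    msn: int
    ansn: int
    entries: set = field(default_factory=set)


def process_tc(state: TcState, msn: int, ansn: int, entries: set) -> str:
    """Apply one received TC and return the name of the branch taken."""
    if not seq_greater(msn, state.msn):
        return "ignored"  # old or duplicate message
    state.msn = msn
    if seq_greater(state.ansn, ansn):
        # Newer message but older ANSN: assume the originator restarted,
        # drop the stored entries and adopt the lower ANSN.
        state.entries = set(entries)
        state.ansn = ansn
        return "reset"
    if seq_greater(ansn, state.ansn):
        # Normal case: newer ANSN replaces the stored data.
        state.entries = set(entries)
        state.ansn = ansn
        return "replaced"
    # Same ANSN: merge, since the TC may be incomplete.
    state.entries |= set(entries)
    return "merged"
```

Note that an equal ANSN still lands in the merge branch, so this rule alone
does not cover a restart that happens to reuse the old ANSN.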

As for the second case (same ANSN), it will happen less often but can still
happen.  One way we could solve it is to have a router periodically
increment its ANSN after a restart (even when its state has not changed) and
gradually stop doing so over time.  No exact rate needs to be specified,
since any progression of the ANSN near the restart time should suffice;
more updates give quicker resolution at the cost of efficiency throughout
the network.  We would still need the first solution.
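
One possible shape for such a decaying schedule, purely as an illustration:
the helper names, the starting interval and the growth factor below are all
assumptions of the sketch, not proposed values.

```python
from typing import List


def restart_bump_times(first_interval: float, factor: float, count: int) -> List[float]:
    """Offsets (seconds since restart) at which to bump the ANSN.

    Each gap is `factor` times the previous one, so the bumps thin out
    and stop after `count` of them.
    """
    times = []
    elapsed = 0.0
    interval = first_interval
    for _ in range(count):
        elapsed += interval
        times.append(elapsed)
        interval *= factor
    return times


def bump_ansn(ansn: int) -> int:
    """Advance a 16-bit ANSN by one, wrapping at 65536."""
    return (ansn + 1) % 65536
```

For example, restart_bump_times(5.0, 2.0, 4) yields bumps at 5, 15, 35 and
75 seconds after the restart, after which the ANSN only changes when the
advertised state does.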

A completely different approach would be to use a lollipop numbering
system: a restarted router starts its ANSN at, say, some value greater than
65500, advances it regardless of neighbor state for a small number of
steps, then jumps to 0, after which the number space wraps at 65500 instead
of 65535.  Any received ANSN with a value on the "stick" would reset the
stored ANSN if the stored ANSN was on the "pop"; otherwise normal operation
applies.  This is a bit more complicated and IMO unnecessary for fixing
this, but I just wanted to put it out there.
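
A rough Python sketch of the lollipop idea, using the example boundary of
65500 from above.  The function names are hypothetical, and two points are
assumptions of the sketch rather than part of the proposal: the stick is
walked all the way to 65535 before jumping to 0, and two stick values are
compared linearly.

```python
LOOP_SIZE = 65501    # loop ("pop") values are 0..65500
STICK_START = 65501  # stick values are 65501..65535
STICK_END = 65535


def on_stick(ansn: int) -> bool:
    return ansn >= STICK_START


def next_ansn(ansn: int) -> int:
    """Advance the ANSN: walk up the stick, then circulate in the loop."""
    if on_stick(ansn):
        return 0 if ansn == STICK_END else ansn + 1
    return (ansn + 1) % LOOP_SIZE


def is_newer(received: int, stored: int) -> bool:
    """Should the received ANSN replace the stored one?"""
    if on_stick(received):
        if on_stick(stored):
            return received > stored  # assumption: linear order on the stick
        return True  # stick value while we hold a pop value: restart, reset
    if on_stick(stored):
        return True  # originator has left the stick and entered the loop
    # Both in the loop: cyclic comparison modulo LOOP_SIZE.
    diff = (received - stored) % LOOP_SIZE
    return 0 < diff <= LOOP_SIZE // 2
```

With this, a router holding 40000 accepts 65510 as a restart, and ordinary
wraparound in the loop (for example 65495 followed by 10) still compares as
newer.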

Justin Dean

On Thu, Jul 22, 2021 at 3:12 AM Rogge, Henning <
henning.rogge@fkie.fraunhofer.de> wrote:

> Hi,
>
> I think I have potentially identified an issue with OLSRv2 that can lead
> to a stable desynchronization of the OLSRv2 TC database after a router
> restart.
>
> The trouble happens because the ANSN (Advertised Neighbor Set Number) can
> (and should) become stable when the locally reachable neighbors and their
> metrics don't change anymore.
>
> The sequence that leads to the issue is:
>
> 1) router A restarts
> 2) router A (randomly?) selects a new Message Sequence Number which is
> HIGHER (in terms of cyclical comparison) than the last one it used
> 3) router A selects a new ANSN which is LOWER (or the same) than the last
> one it used
> 4) router B sees the new message sequence number/ANSN in TCs from router A
>    => router B does not allow the old TC data to timeout (message sequence
> number is higher!)
>    => router B does NOT overwrite the old TC data (ANSN is lower)
>
> this situation will continue as long as the ANSN of router A (which can be
> stable for an arbitrary time) stays below the ANSN used by the router
> before its restart
>
> There are two parts to this problem, one of them easy to fix.
>
> a) the ANSN after the restart is lower than the ANSN before. We could just
> demand that a router does NOT increase the validity time of the TC entry in
> this case... or that it overwrites the TC entry (the combination of "new
> message seqno" and "old ANSN" should only happen after a restart)
> b) the ANSN after the restart is the SAME as before... this is tricky, I
> have no idea how to resolve this at the receiver without comparing the TC
> data with the database, which is not reliable when we deal with incomplete
> TCs.
>
> What do you think?
>
> Henning Rogge