[manet] OLSRv2 router restart

Christopher Dearlove <christopher.dearlove@gmail.com> Fri, 23 July 2021 13:36 UTC

From: Christopher Dearlove <christopher.dearlove@gmail.com>
In-Reply-To: <1627020703962.66100@fkie.fraunhofer.de>
Date: Fri, 23 Jul 2021 14:36:33 +0100
Message-Id: <23565714-003C-43B4-B367-16AA3EC35FA5@gmail.com>
References: <1626937943164.99401@fkie.fraunhofer.de> <CA+-pDCdGjVPyVxnfMt2trN_Rk5J_btZrt2teFg43JSEbo0sn6g@mail.gmail.com> <1627020703962.66100@fkie.fraunhofer.de>
To: MANET IETF <manet@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/manet/-1eA1jyJ3Z38gokgtzpwSBsOXKs>
Subject: [manet] OLSRv2 router restart

I would start with the principle that we don’t want to change OLSRv2, create incompatibilities etc. unless we actually have to. And I don’t think we need to.

So what we should start with is advice to restarting routers. Can we solve all our problems that way?

Before addressing that, you might say, what about the case of routers that restart but don’t follow advice, can they mess us up? Possibly (though I think it needs a matching ANSN, not just an expired one). But that’s the nature of OLSRv2 and similar protocols, a poorly behaving router can always mess you up. Worse if it is malicious, but that’s a different problem.

And let’s suppose we managed to come up with a set of really good rules for restarting routers. What to do with them? Could be an informational RFC, advice on restarting a router. Or a standards track RFC, what you should and what you must do on restarting a router. I could see either. Or a non-RFC approach.

But that’s getting ahead. What advice could we give?

First, if the router remembers its old MSN and ANSN (and if necessary, PSN) there’s no problem. (Obviously you’d include this in an RFC/whatever but it’s a trivial case.)

The second trivial case is that you just wait longer than your data will take to time out. That is often the easiest solution. But if you might have been using long timeouts, that might be a problem. The same goes if you want rapid readmission, though that needs other routers to also act rapidly, and might not be an option.

So let’s assume you want back in, have no idea of your old MSN/ANSN, and there might be data out there that needs replacing, and you can’t just wait.

Handling two sequence numbers (MSN/ANSN) together would be tricky. But fortunately we can do these one at a time. Because to get back in, first we need a HELLO exchange (NHDP). That only uses MSN. So first we fix MSN, then we move on to ANSN. (Until we’ve exchanged HELLOs, no one will forward our TC messages.)

So pick a random MSN. If we pick badly, we will be ignored. We could wait, see if we are ignored, if we are, move on. (We might be ignored because messages are lost. I’ll ignore that for now. You might want to send messages more than once before moving on to handle that.)

But maybe we would rather not wait that long. So here’s an approach I think works. Haven’t been able to test it - my former employer has my OLSRv2 code unfortunately and won’t let me have it - but that’s the point of discussion.

So we pick three or four equispaced (or roughly so) numbers. Two won’t work: the case of separation by exactly N/2 is unreliable, where N is 2^16. (The version in OLSRv1 is specified but has strange consequences, so we left it unspecified in OLSRv2.) Four is easiest (0, N/4, N/2, 3N/4).
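
A minimal sketch of why exactly N/2 is the awkward case, assuming the usual wraparound comparison for 16-bit sequence numbers (a is greater than b when the forward distance from b to a is non-zero and less than N/2). The function names is_newer and candidates are made up for illustration and are not taken from any OLSRv2 implementation.

```python
N = 1 << 16  # size of the 16-bit sequence number space


def is_newer(a: int, b: int) -> bool:
    """True if a counts as greater than b under wraparound comparison.

    Assumption: a is greater than b when the forward distance from b to a
    is non-zero and strictly less than N/2. A distance of exactly N/2 is
    treated as "not greater" in either direction.
    """
    d = (a - b) % N
    return 0 < d < N // 2


def candidates(k: int) -> list[int]:
    """k roughly equispaced sequence numbers, starting at 0."""
    return [(i * N) // k for i in range(k)]


print(candidates(4))  # [0, 16384, 32768, 49152]
# Separation of exactly N/2: neither number counts as newer than the other.
print(is_newer(N // 2, 0), is_newer(0, N // 2))  # False False
```

With four candidates each step is N/4, comfortably inside the "newer" half of the number space, which is what makes the in-order sequence work.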

So we send a HELLO (or more than one) with one of those MSNs. Then, without needing to wait, do the same for the rest of the numbers in order. At first we might be ignored, but at some point we will get past the last one we used. Then we will start overriding ourselves. And the last one we send will be accepted. (We can stop early if we get a response, that tells us we have an MSN we can use.)
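
The sweep described above can be checked with a small simulation. This is a sketch under a simplified model, not a description of real OLSRv2 message processing: the receiver is assumed to remember one sequence number for us, to accept a message only if its number is newer under wraparound comparison, and to ignore it otherwise. Message loss and the early stop on getting a response are left out. The names sweep and is_newer are hypothetical.

```python
N = 1 << 16  # size of the 16-bit sequence number space


def is_newer(a: int, b: int) -> bool:
    """Assumed wraparound comparison: forward distance in (0, N/2)."""
    d = (a - b) % N
    return 0 < d < N // 2


def sweep(old: int, seqs: list[int]) -> tuple[int, bool]:
    """Send seqs in order to a receiver that last stored `old`.

    Returns the value the receiver ends up storing and whether the
    last message sent was accepted.
    """
    stored = old
    accepted_last = False
    for s in seqs:
        accepted_last = is_newer(s, stored)
        if accepted_last:
            stored = s  # receiver moves on to the newer number
    return stored, accepted_last


four = [0, N // 4, N // 2, 3 * N // 4]
two = [0, N // 2]

# Try every possible forgotten sequence number.
ok_four = all(sweep(old, four)[1] for old in range(N))
ok_two = all(sweep(old, two)[1] for old in range(N))
print(ok_four, ok_two)  # True False
```

Under this model the last of the four numbers is accepted whatever the forgotten value was, while with two numbers there are old values for which the last one is ignored.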

Is that ugly? Yes. But we are trying to solve the problem of having no information and not being prepared to wait. There are more efficient solutions if we modify OLSRv2 (e.g. reserve 0 or N-1 as a “forced reset” number) but I don’t want to do that as noted.

Now we have established an MSN, once we have data to send in TC messages we could do the same with the ANSN. Or actually we don’t need to wait, we can do that with empty TC messages. Complete ones. (Note that, unlike OLSRv1, empty messages are recorded.)

Why might this not work? (Apart of course from something I’ve overlooked.) I think there is implicit permission for a router to ignore messages with a big gap in sequence number from the last one seen, as a security measure. We need that not to be done. (That would be firming up behaviour.) But - and, as ever, there will be a security implications section - we really want to be running with authenticated messages anyway.

Is this fast enough? That depends on how long your data would take to expire anyway. This is really for cases where long duration data can be assumed.

If anyone does want to take this further, I’m around.

Christopher