Re: [babel] Restarting nodes and seqno requests

Toke Høiland-Jørgensen <toke@toke.dk> Mon, 30 April 2018 13:08 UTC

From: Toke Høiland-Jørgensen <toke@toke.dk>
To: Juliusz Chroboczek <jch@irif.fr>
Cc: babel@ietf.org
In-Reply-To: <874ljszz54.wl-jch@irif.fr>
References: <87po2h2b31.fsf@toke.dk> <874ljszz54.wl-jch@irif.fr>
Date: Mon, 30 Apr 2018 15:08:07 +0200
Message-ID: <87k1so3kqw.fsf@toke.dk>
MIME-Version: 1.0
Content-Type: text/plain
Archived-At: <https://mailarchive.ietf.org/arch/msg/babel/bURcu76w1JE3hlYQGOj7oA5WvNU>
Subject: Re: [babel] Restarting nodes and seqno requests
Precedence: list

Juliusz Chroboczek <jch@irif.fr> writes:

>> - Node B restarts (i.e. shuts down, loses its transient state such as
>>   seqnos and comes back up either immediately or after a relatively
>>   short time). It will then start announcing prefix P again, but now
>>   with seqno S' < S [which is unfeasible for A].
>
> Yes.  The loop avoidance mechanism in Babel is stateful (the source table
> contains the state), and if you lose your state, you're in trouble.  That
> is why babeld saves its seqno into persistent storage at shutdown and
> restores it at startup.
>
> If the seqno is lost (either because a node has crashed or because you
> have no persistent storage), then you need to time out the source table entry:
>
>   - first, the route times out, so its metric becomes infinite;
>   - you start sending retractions, and retractions don't update the source
>     table;
>   - at some point, the source table GC timer triggers, and the source
>     entry gets updated.
>
> Note that the total time you wait for the route to become feasible again
> is the sum of the route hold time and the source GC time, so it's on the
> order of a few minutes -- Babel is optimised for the case where links go
> up and down, but routers do not reboot often.  If that's not acceptable,
> you can work around the issue by changing router ids at every boot, so
> that the old and new state don't interact.  Babeld implements this with
> the "random-id" option, and it is useful in environments where routers
> reboot often without saving their seqno.

Right, that's what I thought. I don't think there's a good way to store
persistent state in Bird, but it may be possible to try harder to do
graceful restarts... I'll look into the options.
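
For reference, the feasibility check behind this scenario can be sketched
as below. This is a minimal illustration only, not code from babeld or
Bird: the names seqno_less and is_feasible, the (seqno, metric) tuple used
for a source table entry, and the example numbers are all assumptions made
for the sketch. Seqnos are compared modulo 2^16.

```python
INFINITY = 0xFFFF  # metric value that marks a retraction


def seqno_less(a, b):
    """True if seqno a is older than seqno b, compared modulo 2^16."""
    return 0 < ((b - a) & 0xFFFF) < 0x8000


def is_feasible(seqno, metric, source_entry):
    """Check an update against a source table entry.

    source_entry is a hypothetical (seqno, metric) tuple, or None when
    the entry is absent (for example after it has been garbage-collected).
    """
    if metric >= INFINITY:
        return True  # retractions are always feasible
    if source_entry is None:
        return True  # no recorded state to compare against
    stored_seqno, stored_metric = source_entry
    if seqno_less(stored_seqno, seqno):
        return True  # strictly newer seqno
    return seqno == stored_seqno and metric < stored_metric


# Node A remembers (S=100, metric 96) for prefix P; B restarts with S'=1.
entry_at_a = (100, 96)
after_restart = is_feasible(1, 96, entry_at_a)  # rejected: S' < S
after_gc = is_feasible(1, 96, None)             # accepted once the entry is gone
```

The second call shows why waiting for the source GC timer resolves the
situation: once the entry is gone, there is nothing left for the
restarted node's low seqno to be compared against.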

>> The question is, how is this supposed to be resolved?
>
> There's no good solution.  The loop avoidance mechanism is stateful, and
> that's a fact of nature.  (BGP avoids the statefulness by putting the
> whole state into each update, which causes updates to have an unbounded
> size.)
>
> If you have an idea for a good mechanism to avoid the issue, I'm
> listening.

Not off the top of my head...

>> Should A keep resending the seqno requests each time it gets a new
>> unfeasible update - and if so, is there any limit to the frequency?
>
> Only the usual delay on sending out updates (Section 4, "a Babel node
> SHOULD buffer every TLV and delay sending a packet by a small,
> randomly chosen delay"). Is that a problem in practice?

No, don't think so, it's just an implementation issue: Bird currently
won't resend the same request until it expires after two seconds. So
when the triggered update arrives with S'' (that is still too low), it
won't ask for another seqno increase. But that is a straightforward
fix, and with that it converges reasonably quickly (as long as the seqno
is not too high I guess...)
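
To give a rough idea of why convergence time depends on how high the old
seqno was, here is a small sketch. It assumes that the restarted node
bumps its seqno by exactly one for each seqno request it receives, and
that A re-sends its request for every unfeasible update. The function
names are hypothetical and the code is not taken from Bird or babeld.

```python
def seqno_less(a, b):
    """True if seqno a is older than seqno b, compared modulo 2^16."""
    return 0 < ((b - a) & 0xFFFF) < 0x8000


def rounds_to_converge(stored_seqno, restarted_seqno):
    """Count request/update round trips until B's update becomes feasible.

    A asks for stored_seqno + 1; each request makes B (assumed behaviour)
    increment its seqno by one and send a triggered update.
    """
    requested = (stored_seqno + 1) & 0xFFFF
    current = restarted_seqno
    rounds = 0
    while seqno_less(current, requested):
        current = (current + 1) & 0xFFFF  # B handles one seqno request
        rounds += 1
    return rounds


# A remembers S=100, B came back with S'=1.
rounds_needed = rounds_to_converge(100, 1)
```

Under these assumptions the number of round trips grows linearly with
the gap between the remembered seqno and the restarted one, which is
why a large old seqno slows things down.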

>> Or should A immediately consider the update with seqno S' as feasible
>> because its selected route has metric infinity?
>
> As you mentioned, that would be unsafe.  Consider the following topology:
>
>   ::0 --- A --- B
>
> A loses its Internet connection, so it sets the metric of its selected
> route to infinity.  Before it sends a retraction, it receives an
> unfeasible update from B (with router-id A) -- routing loop.

Right, thought so :)

-Toke
