Re: [babel] Restarting nodes and seqno requests

Juliusz Chroboczek <jch@irif.fr> Mon, 30 April 2018 11:56 UTC

Date: Mon, 30 Apr 2018 13:56:07 +0200
Message-ID: <874ljszz54.wl-jch@irif.fr>
From: Juliusz Chroboczek <jch@irif.fr>
To: Toke Høiland-Jørgensen <toke@toke.dk>
Cc: babel@ietf.org
In-Reply-To: <87po2h2b31.fsf@toke.dk>
References: <87po2h2b31.fsf@toke.dk>
Archived-At: <https://mailarchive.ietf.org/arch/msg/babel/lL7Y66biQiH27ZtuGxaLbcDp1Eo>

> - Node B restarts (i.e. shuts down, loses its transient state such as
>   seqnos and comes back up either immediately or after a relatively
>   short time). It will then start announcing prefix P again, but now
>   with seqno S' < S [which is unfeasible for A].

Yes.  The loop avoidance mechanism in Babel is stateful (the source table
contains the state), and if you lose your state, you're in trouble.  That
is why babeld saves its seqno into persistent storage at shutdown and
restores it at startup.
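
As an illustration of the save-and-restore idea, here is a minimal sketch in
Python.  It is not babeld's code: the function names, the one-number file
format, and the choice to bump the seqno by one on restore are all
assumptions made for the example.

```python
import os
import tempfile

SEQNO_MOD = 1 << 16  # Babel seqnos are 16-bit and wrap around


def save_seqno(path, seqno):
    """Write the seqno atomically, so a crash mid-write keeps the old file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(f"{seqno % SEQNO_MOD}\n")
    os.replace(tmp, path)  # atomic rename over the previous state file


def restore_seqno(path):
    """Return the saved seqno plus one, or None if no usable state exists.

    Incrementing on restore is an assumption of this sketch: it makes the
    first announcement after a restart newer than anything sent before.
    """
    try:
        with open(path) as f:
            saved = int(f.read().strip())
    except (OSError, ValueError):
        return None
    return (saved + 1) % SEQNO_MOD
```

A node would call save_seqno() at shutdown and restore_seqno() at startup;
a None result corresponds to the "seqno is lost" case.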

If the seqno is lost (either because a node has crashed or because you
have no persistent storage), then you need to wait for the source table
entry to time out:

  - first, the route times out, so its metric becomes infinite;
  - you start sending retractions, and retractions don't update the source
    table;
  - at some point, the source table GC timer triggers, and the source
    entry gets updated.
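
The effect of the steps above on feasibility can be sketched as follows.
This is a simplified model, not babeld's code: the source table is a plain
dict, the key ("P", "B") is a made-up (prefix, router-id) pair, and garbage
collection is modelled as deleting the entry.

```python
INFINITY = 0xFFFF  # metric value used for retractions


def seqno_less(a, b):
    """True if seqno a is older than seqno b, compared modulo 2**16."""
    return 0 < ((b - a) % 65536) < 32768


def is_feasible(source_table, key, seqno, metric):
    """Feasibility check against the stored (seqno, metric) for key."""
    if metric == INFINITY:
        return True  # retractions are always feasible
    entry = source_table.get(key)
    if entry is None:
        return True  # no state recorded for this source
    old_seqno, old_metric = entry
    return seqno_less(old_seqno, seqno) or (
        seqno == old_seqno and metric < old_metric
    )


key = ("P", "B")                  # hypothetical (prefix, router-id) pair
source_table = {key: (100, 10)}   # state A recorded before B restarted

before = is_feasible(source_table, key, 3, 10)  # B restarted with seqno 3
del source_table[key]                           # source GC timer fires
after = is_feasible(source_table, key, 3, 10)
print(before, after)
```

With the old entry in place, the restarted node's update is rejected; once
the entry is gone, the same update is accepted.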

Note that the total time you wait for the route to become feasible again
is the sum of the route hold time and the source GC time, so it's on the
order of a few minutes -- Babel is optimised for the case where links go
up and down, but routers do not reboot often.  If that's not acceptable,
you can work around the issue by changing router ids at every boot, so
that the old and new state don't interact.  Babeld implements this with
the "random-id" option, and it is useful in environments where routers
reboot often without saving their seqno.
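
For illustration, picking a fresh router id at every boot might look like
the sketch below.  This is not how babeld's "random-id" option is
implemented; the function names are hypothetical, and rejecting the
all-zeros and all-ones values is a precaution taken by this example.

```python
import os


def random_router_id():
    """Return 8 random octets to use as this boot's router id."""
    while True:
        rid = os.urandom(8)
        # Skip the two degenerate values (a precaution of this sketch).
        if rid not in (b"\x00" * 8, b"\xff" * 8):
            return rid


def format_router_id(rid):
    """Render a router id as colon-separated hex octets."""
    return ":".join(f"{octet:02x}" for octet in rid)


print(format_router_id(random_router_id()))
```

Since the source table is keyed by router id, announcements made under the
new id are not compared against state recorded for the old one.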

> The question is, how is this supposed to be resolved?

There's no good solution.  The loop avoidance mechanism is stateful, and
that's a fact of nature.  (BGP avoids the statefulness by putting the
whole state into each update, which causes updates to have an unbounded
size.)

If you have an idea for a good mechanism to avoid the issue, I'm listening.

> Should A keep resending the seqno requests each time it gets a new
> unfeasible update - and if so, is there any limit to the frequency?

Only the usual delay on sending out updates (Section 4, "a Babel node
SHOULD buffer every TLV and delay sending a packet by a small, randomly
chosen delay").  Is that a problem in practice?

> Or should A immediately consider the update with seqno S' as feasible
> because its selected route has metric infinity?

As you mentioned, that would be unsafe.  Consider the following topology:

  ::0 --- A --- B

A loses its Internet connection, so it sets the metric of its selected
route to infinity.  Before it sends a retraction, it receives an
unfeasible update from B (with router-id A) -- routing loop.
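
To make the loop concrete, here is a toy forwarding trace for the default
route in that topology.  The next-hop tables and the trace() helper are
hypothetical and only model which neighbour each node forwards to.

```python
def trace(next_hop, start, max_hops=8):
    """Follow next hops from start; return (path, looped)."""
    path = [start]
    node = start
    while node in next_hop and len(path) <= max_hops:
        node = next_hop[node]
        path.append(node)
        if path.count(node) > 1:
            return path, True  # a node was visited twice: routing loop
    return path, False


# A rejects B's unfeasible update: A has no default route, so packets
# from B reach A and are dropped there.
print(trace({"B": "A"}, "B"))

# A accepts B's unfeasible update: A forwards to B while B still forwards
# to A, so packets bounce between the two.
print(trace({"B": "A", "A": "B"}, "B"))
```

In the first case the path ends at A; in the second it revisits B.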

-- Juliusz