[RTG-DIR] About Babel management [was: RTG-DIR QA review...]

Juliusz Chroboczek <jch@irif.fr> Sun, 07 January 2018 03:32 UTC

Date: Sun, 07 Jan 2018 04:32:02 +0100
Message-ID: <87o9m6mjil.wl-jch@irif.fr>
From: Juliusz Chroboczek <jch@irif.fr>
To: Susan Hares <shares@ndzh.com>
Cc: 'Donald Eastlake' <d3e3e3@gmail.com>, 'David Schinazi' <dschinazi@apple.com>, 'Alia Atlas' <akatlas@gmail.com>, 'Russ White' <russ@riw.us>, rtg-dir@ietf.org, Barbara Stark <bs7652@att.com>, babel@ietf.org
In-Reply-To: <002f01d3874a$6d222400$47666c00$@ndzh.com>
References: <00a801d3850a$e4eb7640$aec262c0$@ndzh.com> <87incg183q.wl-jch@irif.fr> <CAF4+nEEO8WE=SmKT8kXT4Om0PKiKz9t4bCqP72Ys7MvREb3=og@mail.gmail.com> <87y3larkia.wl-jch@irif.fr> <87tvvyrjhq.wl-jch@irif.fr> <002f01d3874a$6d222400$47666c00$@ndzh.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtg-dir/UfAyB9APzxi0xQl0-HlUkJYfqvo>
Subject: [RTG-DIR] About Babel management [was: RTG-DIR QA review...]

I've added Barbara Stark and the babel@ietf mailing list to the CC.

> Are your users only using proprietary management?  Or are they using
> another open source management mechanism?

There are currently four production-quality implementations of Babel:

  (1) the standalone reference implementation, called "babeld" (Babel Daemon);

  (2) the implementation integrated into FRR (formerly Quagga), the leading
      Open-Source routing suite; this is based on an older version of babeld;

  (3) the implementation integrated into BIRD, the other major Open-Source
      routing suite; this is an independent reimplementation;

  (4) a proprietary implementation I know very little about.

Implementations (2) and (3) aim to integrate as well as possible into
their respective routing suites, and therefore they use, respectively,
FRR's and BIRD's native management mechanisms; I will be glad to get you
in touch with FRR and BIRD specialists should you wish to learn more (as
far as I'm aware, neither uses NETCONF).  The rest of this mail only
applies to implementation (1), babeld.

Babeld has a rich management language (including BGP-style route
filtering), which can be used in two ways:

  - statically, in a set of configuration files that are read by babeld at
    startup;
  - dynamically, over a TCP or Unix Domain socket.

The management language is a textual format that is carefully designed to
be both human- and machine-readable, and extensible without breaking
backwards compatibility.  The implementation does not rely on any external
libraries, and the hand-written recursive-descent parser consists of 1347
lines of C code that compile to 11.5kB of text.

It has the following features:

  - global configuration of the daemon;
  - configuration of parameters of specific interfaces;
  - route filtering, inspired by gated but more powerful;
  - one-shot monitoring;
  - event-driven monitoring.

The interface is designed to be as exhaustive as possible.  It therefore
exposes a number of internal implementation details that it would not be
useful to standardise.
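
To make the dynamic mode concrete, here is a minimal Python sketch of a
client that opens the management socket over either transport and sends
one statement.  The helper names are hypothetical, and the socket path or
address is supplied by the caller; only the existence of the two
transports comes from the description above.

```python
import socket


def open_management_socket(target, timeout=5.0):
    """Connect to a management socket.

    `target` is either a filesystem path (Unix domain socket) or a
    (host, port) tuple (TCP).  Both are supplied by the caller; nothing
    here is a babeld default.
    """
    if isinstance(target, str):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        try:
            sock.connect(target)
        except OSError:
            sock.close()
            raise
        return sock
    # create_connection handles both IPv4 and IPv6 hosts.
    return socket.create_connection(target, timeout=timeout)


def send_statement(sock, statement):
    """Send one line of the textual management language."""
    sock.sendall(statement.encode("ascii") + b"\n")
```

A caller would pass a tuple such as ("::1", port) for TCP, or a path
string for a Unix domain socket, and then call send_statement() once per
line of the management language.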

As you know, babeld is deployed in production.  Here are a few examples of
how our users do monitoring and reconfiguration in production.

1. Static configuration

A number of our users use a subset of the configuration language in static
configuration files.  As far as I can tell, the most common applications
are, in roughly decreasing order:

  - defining the set of active interfaces;
  - changing the link-quality estimation algorithm of specific interfaces
    (notably enabling the optional RTT- and diversity-based algorithms);
  - filtering out aggregated routes or routes subsumed by a default route.

As far as I can tell, our users do not much tweak any of the other
tweakables.  I like to think that this is because Babel is able to adapt
to widely varying network conditions without the need to tweak.

2. Dynamic reconfiguration using static configuration files

The reference implementation of the Homenet protocol suite generates a new
configuration file every time the hnetd daemon detects a configuration
change, and restarts the babeld process.  While a better approach would be
to use the dynamic reconfiguration mechanisms in babeld, this approach is
easy to implement, and appears to be good enough for Homenet.

    https://github.com/sbyx/hnetd
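
The regenerate-and-restart pattern can be sketched in Python as follows.
This is not hnetd's code: the function names are hypothetical, the
"interface NAME" line is an assumed minimal subset of the configuration
language, and the restart action is left to a caller-supplied callback.

```python
import os
import tempfile


def render_config(interfaces):
    """Return configuration text with one `interface` line per name."""
    return "".join("interface %s\n" % name for name in sorted(interfaces))


def write_atomically(path, text):
    """Write `text` to `path` so that readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
        os.replace(tmp, path)   # atomic rename within one filesystem
    except BaseException:
        os.unlink(tmp)
        raise


def reconfigure(path, interfaces, restart):
    """Rewrite the file and call `restart` only if the content changed."""
    text = render_config(interfaces)
    try:
        with open(path) as f:
            if f.read() == text:
                return False
    except FileNotFoundError:
        pass
    write_atomically(path, text)
    restart()                   # caller decides how the daemon is restarted
    return True
```

Comparing the rendered text with the file on disk avoids restarting the
routing daemon, and therefore avoids a reconvergence, when nothing has
actually changed.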

3. Pure dynamic reconfiguration

Nexedi are a French company that use Babel for managing a global-scale
overlay network between datacenters over the public Internet.  Nexedi are
interested in avoiding any form of manual intervention: they do not
manually configure metrics, but instead rely on the Babel-RTT extension to
compute metrics automatically:

   https://tools.ietf.org/html/draft-jonglez-babel-rtt-extension

However, Nexedi sometimes need to bring down a link in their overlay, and
wish to avoid waiting for the protocol to reconverge.  For that, they use
a proprietary configuration interface in order to bring a Babel interface
down before they bring the link down, thus achieving graceful failover
with no packet loss whatsoever.

Nexedi currently use a proprietary management interface, but I am working
with them on switching to the normal babeld mechanisms (which is
somewhat involved, since some of their nodes are at customer sites and
therefore a flag day is not possible).

4. One-shot state dumps parsed by a script

The management interface is designed so that its output is trivial to
parse using a script without using any extra libraries.  For example, the
following command line gives a dump of all routes known to the local Babel
daemon:

  (echo dump; echo quit) | nc ::1 33123 | grep '^route add '

People usually use ssh tunnelling in order to dump the state of remote
nodes.
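
The same dump can be taken from a script using only the standard library.
The Python sketch below mirrors the command line above (same address,
port, commands and line prefix); the function names are hypothetical, and
the key/value layout assumed by fields() is an illustration that should
be checked against the actual output of the babeld version in use.

```python
import socket


def dump_lines(host="::1", port=33123, timeout=5.0):
    """Send `dump` then `quit` and return every line the daemon wrote."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"dump\nquit\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:        # daemon closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", "replace").splitlines()


def select(lines, prefix="route add "):
    """Keep lines starting with `prefix` (same filter as the grep above)."""
    return [line for line in lines if line.startswith(prefix)]


def fields(line, skip=2):
    """Read the tokens after the first `skip` words as key/value pairs.

    Assumes the rest of the line alternates keys and values; a line
    that does not follow that layout needs its own handling.
    """
    tokens = line.split()[skip:]
    return dict(zip(tokens[0::2], tokens[1::2]))
```

Calling select(dump_lines()) on a node running the daemon yields the
route lines, and fields() turns each of them into a dictionary that a
monitoring script can store or compare.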

A surprisingly large number of our users perform monitoring using this
kind of approach.  The most ambitious case I am aware of is the Wlan-SI
nodelist, which is a PHP-based website that gives real-time details about
a fairly large hybrid Babel/OLSRv1 network:

    https://nodes.wlan-si.net/

5. Event-driven monitoring

The management interface is designed to support event-driven monitoring.
This is used by the "Babelweb" graphical tool:

    http://babelweb.wifi.pps.univ-paris-diderot.fr:8080/
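
An event-driven consumer differs from the one-shot case in that it keeps
the connection open and handles each line as it arrives.  The Python
sketch below illustrates that loop; it is not how Babelweb is written,
the function name is hypothetical, and the "monitor" command used to
request updates is an assumption that should be checked against the
babeld documentation.

```python
import socket


def monitor(sock, handle):
    """Request updates and pass each received line to `handle`.

    Sending `monitor` is an assumption about the command that switches
    the connection to event mode.  Returns when the peer closes the
    connection.
    """
    sock.sendall(b"monitor\n")
    with sock.makefile("r", encoding="ascii", errors="replace") as stream:
        for line in stream:     # blocks until the next event arrives
            handle(line.rstrip("\n"))
```

A caller would pass a connected socket and a callback, for instance one
that updates a graph or a table each time a line describing a change is
received.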


As you can see, we are actively working with our users on management for
Babel.  There is a lot of exciting and novel experimentation going on, and
it is therefore my opinion that it would be a mistake to mandate a single
style of management in rfc6126bis.

Regards,

-- Juliusz