Re: [babel] Open issues with draft-ietf-babel-rtt-extension

Dave Taht <dave.taht@gmail.com> Thu, 25 July 2019 13:06 UTC

References: <87k1c75cn5.wl-jch@irif.fr>
In-Reply-To: <87k1c75cn5.wl-jch@irif.fr>
From: Dave Taht <dave.taht@gmail.com>
Date: Thu, 25 Jul 2019 06:06:10 -0700
Message-ID: <CAA93jw7YZp6nPgW_DccAkbaSG=MxeFymesKBMJnOwgk++COHTQ@mail.gmail.com>
To: Juliusz Chroboczek <jch@irif.fr>
Cc: Babel at IETF <babel@ietf.org>, Baptiste Jonglez <baptiste.jonglez@imag.fr>
Archived-At: <https://mailarchive.ietf.org/arch/msg/babel/J9TDkQCSngsgFnf86vZ4J_djrE4>
Subject: Re: [babel] Open issues with draft-ietf-babel-rtt-extension

I had a chance to buttonhole Baptiste at Battlemesh, and also
tried to get a test up and running.

A couple of points.

A) Packet loss is not much of a thing on modern wifi networks,
particularly with unicast.
B) As you get larger numbers of routes (1000s), congestion control
becomes a problem.

My "solution" for some congestion control issues is to start
dynamically increasing the route announcement interval for stabler
routes, while still announcing the most important ones (like defaults)
in the hello. The protocol has an underused feature where we can
announce a per-route interval larger than the default.
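
A minimal sketch of the interval back-off described above, not the
actual patch: the function name, the 16 s base interval, and the
"double every four stable rounds" policy are all assumptions made for
illustration. The only protocol fact relied on is that the Update TLV
carries a 16-bit Interval field expressed in centiseconds.

```python
BASE_INTERVAL_CS = 1600   # assumed default update interval: 16 s, in centiseconds
MAX_INTERVAL_CS = 0xFFFF  # the Update TLV Interval field is 16 bits wide


def announce_interval(stable_rounds: int,
                      important: bool = False,
                      base_cs: int = BASE_INTERVAL_CS,
                      rounds_per_doubling: int = 4) -> int:
    """Return the per-route interval (centiseconds) to advertise.

    stable_rounds: hypothetical counter of consecutive announcements in
    which the route's metric and next hop did not change.
    important: routes such as defaults that keep the base interval.
    """
    if important or stable_rounds <= 0:
        return base_cs
    # Double the interval once per `rounds_per_doubling` stable rounds,
    # capping the shift so the intermediate value stays small.
    doublings = min(stable_rounds // rounds_per_doubling, 16)
    return min(base_cs << doublings, MAX_INTERVAL_CS)
```

A route that flaps would have its counter reset to zero, which drops
it straight back to the base interval.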

I long ago patched out CS6 in favor of ECN, as CS6 lands in the VO
queue, which cannot aggregate on wireless-n. I've also had a variant
that sets ECN on just the hello.
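
For reference, a small sketch of how the two markings differ in the
traffic class byte. The helper name is hypothetical; the DSCP and ECN
codepoints are the standard ones (CS6 is DSCP 48, ECT(0) is binary 10).
This only computes the byte; handing it to a socket option such as
IPV6_TCLASS is left out and would be platform dependent.

```python
CS6_DSCP = 48      # class selector 6
DEFAULT_DSCP = 0   # best effort

NOT_ECT = 0b00     # not ECN-capable
ECT_1 = 0b01       # ECN-capable transport, codepoint 1
ECT_0 = 0b10       # ECN-capable transport, codepoint 0
CE = 0b11          # congestion experienced


def traffic_class(dscp: int, ecn: int) -> int:
    """Pack a 6-bit DSCP and a 2-bit ECN field into one byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    if not 0 <= ecn <= 3:
        raise ValueError("ECN must fit in 2 bits")
    return (dscp << 2) | ecn


old_marking = traffic_class(CS6_DSCP, NOT_ECT)    # 0xC0: CS6, no ECN
new_marking = traffic_class(DEFAULT_DSCP, ECT_0)  # 0x02: best effort, ECT(0)
```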

Most of my babel routing failures were congestive in nature before I
did this. (In the end I also dramatically reduced the size of my route
table and of the babel network: at one point it was 100+ nodes with ~10
route announcements each, which became rather unreliable. Now it's < 20
with ~3.) I do wish more folk would do the kind of experiments Teco did
to test larger babel networks, and that I did with the rtod tool on
GitHub. Until more core folk start pushing the protocol (and daemons) to
larger numbers of routes, we're not going to have mutually clear
insights here.

On Wed, Jul 24, 2019 at 2:46 PM Juliusz Chroboczek <jch@irif.fr> wrote:
>
> Here's my personal take on the issues outlined during Baptiste's talk.
>
> 1. Nanosecond granularity
>
> Originally, Baptiste's extension used centiseconds in a 16-bit field,
> which gave a granularity of 10ms.  Under pressure from Dave Taht, who
> was at the time interested in measuring bufferbloat, Baptiste switched
> to microseconds in a 32-bit field.

I have largely been waiting for all the other stuff to land before
pursuing these ideas again. I needed SS routing, and thought unicast
would change the game on wifi entirely.

> We still don't have a good example of why this increased precision is
> needed -- in the networks that use this extension in production, the time
> scales under consideration are on the order of tens or hundreds of
> milliseconds.

It certainly was my hope at the time to be able to extend this to real
networks, on wifi. nsec resolution is not required there either, but
10s of usec precision seemed feasible at the time.

nsec might buy some better detection ability on 1gigE+ networks, when
coupled with the increasingly common hw timestamping and/or hw pacing
techniques.

> Switching to nanoseconds would require switching to 48-bit fields, since
> the timestamps must not wrap between two successfully received Hello or
> IHU TLVs.  Not a big deal, but a slight implementation issue -- I'd like
> to be convinced that this is necessary.
>
> We will be able to switch formats without a flag day.  We first define
> a new sub-TLV with nanosecond granularity, and start sending both.  We
> wait for a year or so, until most deployed implementations understand the
> new format, then we deprecate the old format.  If any old implementations
> remain, they will ignore the new sub-TLV, and simply fail to perform any
> RTT estimation, falling back to the base protocol.

I don't presently see a need for nsec.
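
The wrap-around constraint in the quoted text can be checked with quick
arithmetic. This is just a back-of-the-envelope sketch (the helper name
is made up): a counter of a given width wraps after 2^bits ticks.

```python
def wrap_period_seconds(bits: int, ticks_per_second: int) -> float:
    """Seconds until an unsigned counter of `bits` bits wraps around."""
    return 2 ** bits / ticks_per_second


centisec_16 = wrap_period_seconds(16, 100)            # about 655 s (roughly 11 minutes)
microsec_32 = wrap_period_seconds(32, 1_000_000)      # about 4295 s (roughly 71.6 minutes)
nanosec_32 = wrap_period_seconds(32, 1_000_000_000)   # about 4.3 s
nanosec_48 = wrap_period_seconds(48, 1_000_000_000)   # about 281475 s (roughly 3.26 days)
```

A 32-bit nanosecond field would wrap in about 4.3 seconds, which is on
the order of a typical Hello interval, hence the 48-bit field in the
quoted proposal.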

> 2. Decoupling timestamps from Hello
>
> Currently, the timestamp data is sent in two pieces:
>
>   - Hello contains the transmit timestamp, which is the same for all
>     neighbours;
>   - IHU contains the receive timestamp and the timestamp echo, which are
>     per-neighbour.
>
> This makes a lot of sense -- after all, IHUs contain per-neighbour data,
> while sender data is contained in Hellos.  OTOH, it means that an IHU's
> timestamp can only be interpreted if it is sent in a packet that contains
> a Hello TLV.
>
> Baptiste outlined two solutions:
>
>   (a) require that all IHUs be sent in packets with a Hello;
>   (b) define a new TLV that contains just a timestamp.
>
> In RFC 6126bis, we defined Unicast Hellos and Unscheduled Hellos, which
> should in principle make approach (a) easy to implement.  I'd be very
> grateful if people could look very carefully at the new kinds of Hello,
> and decide whether these mechanisms are enough, or whether we need a new
> kind of TLV so we can avoid sending spurious Hellos.

Generically, I had looked at this as a path forward to getting
congestion controls to start working, also leveraging the must-ack
TLV, whatever it's called. So being able to include a timestamp at the
end of a route burst seemed helpful.
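
To make the Hello/IHU split in the quoted text concrete, here is a
hedged sketch of how the three timestamps plus the local receive time
combine into an RTT sample, assuming 32-bit microsecond clocks that are
not synchronised between nodes. The function and argument names are
illustrative, and clamping a negative result to zero is my assumption,
not something taken from the draft.

```python
MOD = 1 << 32  # timestamps are 32-bit microsecond counters


def rtt_sample_us(origin: int, receive: int,
                  neighbour_tx: int, local_rx: int) -> int:
    """Estimate RTT in microseconds from one Hello + IHU exchange.

    origin:       our own Hello transmit time, echoed back in the IHU
                  (our clock)
    receive:      when the neighbour received that Hello, from the IHU
                  (neighbour's clock)
    neighbour_tx: transmit timestamp in the neighbour's Hello
                  (neighbour's clock)
    local_rx:     when we received the neighbour's packet (our clock)
    """
    # Each difference uses a single clock, so the unknown offset between
    # the two clocks cancels out. Modular arithmetic handles one wrap.
    elapsed_local = (local_rx - origin) % MOD
    held_by_neighbour = (neighbour_tx - receive) % MOD
    return max(elapsed_local - held_by_neighbour, 0)


sample = rtt_sample_us(origin=1_000, receive=5_000_000,
                       neighbour_tx=5_002_000, local_rx=3_500)
```

Note that neighbour_tx comes from the Hello while origin and receive
come from the IHU, which is why an IHU's timestamps cannot be
interpreted without a Hello in the same packet.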

>
> -- Juliusz



--

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740