Re: [babel] Zaheduzzaman Sarker's Discuss on draft-ietf-babel-rtt-extension-05: (with DISCUSS and COMMENT)

Juliusz Chroboczek <jch@irif.fr> Fri, 16 February 2024 12:23 UTC

Date: Fri, 16 Feb 2024 13:23:33 +0100
Message-ID: <87ttm8wqbe.wl-jch@irif.fr>
From: Juliusz Chroboczek <jch@irif.fr>
To: Zaheduzzaman Sarker <zahed.sarker.ietf@gmail.com>
Cc: The IESG <iesg@ietf.org>, draft-ietf-babel-rtt-extension@ietf.org, babel-chairs@ietf.org, babel@ietf.org, Donald Eastlake <d3e3e3@gmail.com>
In-Reply-To: <170787401277.9987.12424865727760301020@ietfa.amsl.com>
References: <170787401277.9987.12424865727760301020@ietfa.amsl.com>
User-Agent: Wanderlust/2.15.9 (Almost Unreal) Emacs/29.1 Mule/6.0
MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue")
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 8bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/babel/eeoWfVHMPnA3zufEvGXU_al1tbA>
Subject: Re: [babel] Zaheduzzaman Sarker's Discuss on draft-ietf-babel-rtt-extension-05: (with DISCUSS and COMMENT)
Precedence: list

>  # I support Rob's discuss that it is not clear why this is published as
>  standard track document. Apart from what Rob pointed out, there is another
>  place where the experimental nature of this specification is obvious. In
>  section 1 it says -
> 
>     "We believe that this protocol may be useful in other situations than the
>     one described above, such as when running Babel in a congested wireless
>     mesh network or over a complex link layer that performs its own routing;
>     the fine granularity of the timestamps used (1µs) should make it possible
>     to experiment with RTT-based metrics on this kind of link layers."

I'm very confused by this argument.

As David explained, this document proposes an algorithm that we know to be
suitable for a specific application, large overlay networks, as described
in Section 1 of the document.  We have a lot of experience running this
algorithm in such environments (extensive experimentation in simulation
followed by 10 years of production deployment).

What we say in the paragraph that you quote is that we do not know whether
the algorithm has other applications, but we believe it might have.  As
David explained, that paragraph does not concern the application for which
we propose the algorithm; it is just a side note.

If you think that this paragraph makes the document harder to understand
for those who have only read it superficially, I'm open to removing the
whole paragraph, even though it will make the document less informative.
Please let me know what to do.

>    This shows lack of confidence on the results

I'm very confused by this sentence. Please explain how you came to this
conclusion, given that the paragraph you quote is just a side comment
about potential future research.

>    RTT-based route selection can end up having negative impacts by
>    overloading and congesting low RTT routes,

I don't see how this is different from hop-count routing, which runs the
risk of overloading low-hop-count routes.  This is why we perform
congestion control at the transport layer: so that the network layer is
not responsible for congestion avoidance.

>  # This specification does not specify the relation to other loss-based metric
>  and hop-count metric based strategies. I can imagine a network where low RTT
>  can be emitted at the cost of packet loss. Will this RTT-based strategy be
>  safe to use?

It will be safe to use as long as the resulting metric satisfies the
properties in Section 3.5.2 of RFC 8966.

>  # How would this RTT-based strategy will co-exists with other strategies those
>  are deployed already as claimed in this specification? This specification need
>  to guide the implementers about what to consider when selecting the routing
>  strategy and how the strategies can co-exits.

That's what Sections 3.5.1 and 3.5.2 of RFC 8966 do: they describe the
general conditions under which a combination of cost and metric
computation strategies is safe in Babel.

>  # The periodicity of HELLO message is not clear to me.

The document says:

   the only change to Babel's message scheduling is the requirement that
   a packet containing an IHU also contains a Hello.

Recommended scheduling of Hellos is described in Appendix B of RFC 8966.

>  This is an important piece of information that should be derived from
>  proper experiments as we don't want the HELLO message to overload the
>  route or path.

Appendix B of RFC 8966 recommends a default of one Hello/IHU exchange
every 4s.  This default is deliberately very conservative, so that the
protocol works well on poor wireless links.  Please let me know if you
need links to publications about Babel's behaviour in hostile environments.

>  The discussion on when to stop sending those HEllO messages is
>  required.

I'm very confused by this sentence.  Please see Section 2.5 of RFC 8966,
which says:

  A Babel node periodically sends Hello messages to all of its neighbours;
  it also periodically sends an IHU ("I Heard You") message to every
  neighbour from which it has recently heard a Hello.

The exact specification is in Sections 3.4.1 and 3.4.2 of RFC 8966.

>  Also the frequency of the HELLO message might help adjusting the clock
>  drift, as it is an important aspect of the accuracy of the algorithm.

The document says:

   However, t2' - t1' is usually on the order of seconds, and significant
   clock drift is unlikely to happen at that time scale.

A typical low-cost crystal oscillator has drift under 30ppm.  30ppm of 4s
is 120 microseconds.
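
To make the arithmetic above concrete, here is a minimal sketch in Python.
It assumes the RTT is computed as (t2 - t1) - (t2' - t1'), with the unprimed
timestamps taken on the local clock and the primed ones on the neighbour's
clock, all in microseconds.  The function names and the sample timestamps
are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: names and sample values are hypothetical.

def drift_error_us(interval_s: float, drift_ppm: float) -> float:
    """Error in microseconds accumulated over interval_s seconds by a
    clock whose rate is off by drift_ppm parts per million."""
    # 1 ppm of 1 second is 1 microsecond, so seconds * ppm is microseconds.
    return interval_s * drift_ppm

def rtt_us(t1: int, t1p: int, t2p: int, t2: int) -> int:
    """RTT sample (assumed formula): local elapsed time minus the time
    the neighbour held the packet, measured on the neighbour's clock."""
    return (t2 - t1) - (t2p - t1p)

# Drift bound for a 4 s Hello/IHU interval and a 30 ppm oscillator.
bound = drift_error_us(4.0, 30.0)

# The neighbour holds the Hello for 4 s; the true RTT is 20 ms.
exact = rtt_us(1_000_000, 77_000_000, 81_000_000, 5_020_000)

# Same exchange, but the neighbour's clock runs 30 ppm fast, so it
# measures the 4 s holding time as 4.000120 s.
skewed = rtt_us(1_000_000, 77_000_000, 81_000_120, 5_020_000)

print(bound, exact, skewed, exact - skewed)
```

In this made-up exchange the drift shifts a 20000 microsecond sample by 120
microseconds, which matches the bound computed from the interval alone.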

-- Juliusz
