Re: [babel] Zaheduzzaman Sarker's Discuss on draft-ietf-babel-rtt-extension-05: (with DISCUSS and COMMENT)

Juliusz Chroboczek <jch@irif.fr> Fri, 16 February 2024 12:23 UTC

Date: Fri, 16 Feb 2024 13:23:33 +0100
Message-ID: <87ttm8wqbe.wl-jch@irif.fr>
From: Juliusz Chroboczek <jch@irif.fr>
To: Zaheduzzaman Sarker <zahed.sarker.ietf@gmail.com>
Cc: The IESG <iesg@ietf.org>, draft-ietf-babel-rtt-extension@ietf.org, babel-chairs@ietf.org, babel@ietf.org, Donald Eastlake <d3e3e3@gmail.com>
In-Reply-To: <170787401277.9987.12424865727760301020@ietfa.amsl.com>
References: <170787401277.9987.12424865727760301020@ietfa.amsl.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/babel/eeoWfVHMPnA3zufEvGXU_al1tbA>

>  # I support Rob's discuss that it is not clear why this is published as
>  standard track document. Apart from what Rob pointed out, there is another
>  place where the experimental nature of this specification is obvious. In
>  section 1 it says -
> 
>     "We believe that this protocol may be useful in other situations than the
>     one described above, such as when running Babel in a congested wireless
>     mesh network or over a complex link layer that performs its own routing;
>     the fine granularity of the timestamps used (1µs) should make it possible
>     to experiment with RTT-based metrics on this kind of link layers."

I'm very confused by this argument.

As David explained, this document proposes an algorithm that we know to be
suitable for a specific application, large overlay networks, as described
in Section 1 of the document.  We have a lot of experience running this
algorithm in such environments (extensive experimentation in simulation
followed by 10 years of production deployment).

What we say in the paragraph that you quote is that we do not know whether
the algorithm has other applications, but we believe it might.  As
David explained, that paragraph does not concern the application for which
we propose the algorithm; it is just a side note.

If you think that this paragraph makes the document harder to understand
for those who have only read it superficially, I'm open to removing the
whole paragraph, even though that will make the document less informative.
Please let me know what to do.

>    This shows lack of confidence on the results

I'm very confused by this sentence. Please explain how you came to this
conclusion, given that the paragraph you quote is just a side comment
about potential future research.

>    RTT-based route selection can end up having negative impacts by
>    overloading and congesting low RTT routes,

I don't see how this is different from hop-count routing, which runs the
risk of overloading low-hop-count routes.  This is why we perform
congestion control at the transport layer: so that the network layer is
not responsible for congestion avoidance.

>  # This specification does not specify the relation to other loss-based metric
>  and hop-count metric based strategies. I can imagine a network where low RTT
>  can be emitted at the cost of packet loss. Will this RTT-based strategy be
>  safe to use?

It will be safe to use as long as the resulting metric satisfies the
properties in Section 3.5.2 of RFC 8966.

>  # How would this RTT-based strategy will co-exists with other strategies those
>  are deployed already as claimed in this specification? This specification need
>  to guide the implementers about what to consider when selecting the routing
>  strategy and how the strategies can co-exits.

That's what Sections 3.5.1 and 3.5.2 of RFC 8966 do: they describe the
general conditions under which a combination of cost and metric
computation strategies is safe in Babel.

>  # The periodicity of HELLO message is not clear to me.

The document says:

   the only change to Babel's message scheduling is the requirement that
   a packet containing an IHU also contains a Hello.

Recommended scheduling of Hellos is described in Appendix B of RFC 8966.

>  This is an important piece of information that should be derived from
>  proper experiments as we don't want the HELLO message to overload the
>  route or path.

Appendix B of RFC 8966 recommends a default of one Hello/IHU exchange
every 4s.  This default is deliberately very conservative, so that the
protocol works well on poor wireless links.  Please let me know if you
need links to publications about Babel's behaviour in hostile environments.

>  The discussion on when to stop sending those HEllO messages is
>  required.

I'm very confused by this sentence.  Please see Section 2.5 of RFC 8966,
which says:

  A Babel node periodically sends Hello messages to all of its neighbours;
  it also periodically sends an IHU ("I Heard You") message to every
  neighbour from which it has recently heard a Hello.

The exact specification is in Sections 3.4.1 and 3.4.2 of RFC 8966.

>  Also the frequency of the HELLO message might help adjusting the clock
>  drift, as it is an important aspect of the accuracy of the algorithm.

The document says:

   However, t2' - t1' is usually on the order of seconds, and significant
   clock drift is unlikely to happen at that time scale.

A typical low-cost crystal oscillator has drift under 30ppm.  30ppm of 4s
is 120 microseconds.
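
For concreteness, here is a small Python sketch of that arithmetic.  It
assumes the RTT is computed as (t2 - t1) - (t2' - t1'), which is my
reading of the draft; the function names, the 20 ms true RTT and the
1 s clock offset are illustrative only and are not taken from the
document.

```python
def drift_error_us(interval_s, drift_ppm):
    """Worst-case error, in microseconds, accumulated over interval_s
    seconds by a clock that is off by drift_ppm parts per million.
    (seconds * 1e-6 * ppm) seconds equals (seconds * ppm) microseconds."""
    return interval_s * drift_ppm


def rtt_us(t1, t1_remote, t2_remote, t2):
    """RTT estimate: local elapsed time minus the time the neighbour
    held the timestamp.  Formula assumed from the draft; all arguments
    are in microseconds."""
    return (t2 - t1) - (t2_remote - t1_remote)


# Hypothetical scenario: true RTT of 20 ms, neighbour replies after 4 s,
# and the neighbour's clock runs fast by 30 ppm.
true_rtt = 20_000                       # microseconds
hold = 4_000_000                        # real time between Hello and IHU
measured_hold = hold + round(drift_error_us(4, 30))

t1 = 0                                  # local clock when the Hello is sent
t2 = t1 + true_rtt + hold               # local clock when the IHU arrives
t1_remote = 1_000_000                   # arbitrary offset of the remote clock
t2_remote = t1_remote + measured_hold   # remote clock when the IHU is sent

estimate = rtt_us(t1, t1_remote, t2_remote, t2)
error = true_rtt - estimate

print(drift_error_us(4, 30))            # drift accumulated over 4 s at 30 ppm
print(estimate, error)                  # RTT estimate and its error, in us
```

Note that the offset between the two clocks cancels out; only the drift
accumulated during the interval t2' - t1' shows up in the estimate.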

-- Juliusz