Re: [iccrg] LEDBAT++, rLEDBAT, and slowdowns

Neal Cardwell <ncardwell@google.com> Mon, 09 September 2019 14:31 UTC

From: Neal Cardwell <ncardwell@google.com>
Date: Mon, 09 Sep 2019 10:30:58 -0400
Message-ID: <CADVnQymt0a+j_HA=FsYtwW8pf_fEWH087gp-8zgVhODcHSziAg@mail.gmail.com>
To: marcelo bagnulo braun <marcelo@it.uc3m.es>
Cc: iccrg IRTF list <iccrg@irtf.org>, Praveen Balasubramanian <pravb@microsoft.com>, Yuchung Cheng <ycheng@google.com>, BBR Development <bbr-dev@googlegroups.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/iccrg/QZmE216Iv7P4-oM7mpVKK0uh7t8>

[+bbr-dev since this iccrg list thread is getting into a discussion of the
detailed dynamics of PROBE_RTT]

On Mon, Sep 9, 2019 at 6:01 AM marcelo bagnulo braun <marcelo@it.uc3m.es>
wrote:

> below...
>
> On 06/09/19 at 17:02, Neal Cardwell wrote:
> >
> > Yes, rather than synchronizing using losses, I had something in mind
> > along the lines I proposed earlier in this thread, something like
> > BBR's PROBE_RTT mechanism:
> >
> >     ... the only approach I can think of to achieve this would be for
> >     the flows to coordinate in some fashion to all do a slowdown during
> >     overlapping time intervals (as in BBR's PROBE_RTT mechanism).
> >
> > The details of BBR's PROBE_RTT mechanism are described in:
> > https://tools.ietf.org/html/draft-cardwell-iccrg-bbr-congestion-control-00#section-4.3.5
>
>
> Thanks for the pointer. So, if I understand this correctly, essentially
> the proposed mechanism is that the flow slows down 10 seconds after the
> last update of its base delay estimate.
>
> Is this it, or am I missing something?
>

Yes, that's it, exactly.
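The trigger described above can be sketched as follows. This is a minimal Python sketch, not BBR's actual code. The class and method names (MinRttTracker, on_rtt_sample, exit_probe_rtt) are hypothetical. The sketch models only the 10-second min_rtt expiry from section 4.3.5 of the draft linked above and ignores the rest of the BBR state machine. A sample refreshes the estimate's timestamp only if it matches or lowers min_rtt. If 10 seconds pass without such a sample, the next sample triggers PROBE_RTT.

```python
from dataclasses import dataclass

MIN_RTT_FILTER_LEN_S = 10.0  # length of the min_rtt window, in seconds


@dataclass
class MinRttTracker:
    """Hypothetical helper: tracks min_rtt and decides when to enter PROBE_RTT."""
    min_rtt: float = float("inf")  # current min_rtt estimate, in seconds
    min_rtt_stamp: float = 0.0     # time the estimate was last matched or lowered
    in_probe_rtt: bool = False

    def on_rtt_sample(self, now: float, rtt: float) -> bool:
        """Feed one RTT sample (seconds) taken at time `now` (seconds).

        Returns True if this sample should make the flow enter PROBE_RTT.
        """
        # Check expiry before updating, so a stale estimate is detected.
        expired = now > self.min_rtt_stamp + MIN_RTT_FILTER_LEN_S
        if rtt <= self.min_rtt or expired:
            self.min_rtt = rtt
            self.min_rtt_stamp = now
        if expired and not self.in_probe_rtt:
            self.in_probe_rtt = True
            return True
        return False

    def exit_probe_rtt(self) -> None:
        """Leave PROBE_RTT (BBR holds its minimum cwnd for >= 200 ms and one round)."""
        self.in_probe_rtt = False
```

For example, suppose a flow measures 50 ms once and then sees only 60 ms samples while a queue persists. It enters PROBE_RTT on the first sample more than 10 seconds after the 50 ms measurement. Samples taken while the queue drains can then lower the estimate again.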


> I understand that the expectation is that if there are several flows,
> the slowdown of one of them will cause several other flows to update
> their base delay estimates. These flows are now synchronized, so when
> they all slow down at the same time, they will cause even more flows to
> update their base delay estimates. Eventually all of them will be
> synchronized in their slowdowns and will then properly assess the base
> delay.
>

Yes, that's the basic flavor of the dynamics.


> If this is so, I guess one question is: how do we know for sure that,
> when the flows are in the uncoordinated state, the slowdown of one of
> them will cause other flows to update their base delay estimates?
>

In the common case, BBR's PROBE_RTT mechanism works in a way that can be
broken down into two parts:

(1) How does the algorithm coordinate the phase of multiple bulk flows so
that they enter PROBE_RTT during overlapping time periods? This
coordination works because multiple bulk flows sharing a bottleneck will
usually see the same pattern of RTT increases and decreases, due to the
variation of queuing delay over time. They will all tend to see the same
time series, merely shifted up or down by their two-way propagation delay.
So the flows will tend to agree on the point in time during the last
10-second window that had the lowest RTT. The flows then use that point in
time as a coordination marker, and each schedules its next PROBE_RTT
relative to it. Over time (usually 2-3 min_rtt windows of 10 seconds each),
this allows the flows to coordinate their cwnd decreases.
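The coordination marker in (1) can be illustrated with an idealized simulation. The simulation makes three hypothetical assumptions:

- Every flow sees exactly the same bottleneck queuing delay, with no per-flow noise.
- The delay is sampled every 100 ms over one 10-second window.
- Delays are integer microseconds, so that shifting by a constant cannot reorder samples.

The function names are made up for illustration. Each flow's RTT series is the shared queuing-delay series plus its own constant propagation delay. So every flow finds its minimum at the same instant and schedules its next PROBE_RTT for the same time.

```python
import random

WINDOW_S = 10.0  # min_rtt window length in seconds


def time_of_min_rtt(samples):
    """Return the timestamp of the lowest-RTT (time, rtt) sample; ties go to the earliest."""
    return min(samples, key=lambda s: s[1])[0]


def schedule_next_probe_rtt(samples, window_s=WINDOW_S):
    """Hypothetical scheduler: next PROBE_RTT is one window after the min-RTT point."""
    return time_of_min_rtt(samples) + window_s


def simulate(prop_delays_us, n_samples=100, seed=1):
    """Return each flow's scheduled PROBE_RTT time, given its two-way propagation delay."""
    rng = random.Random(seed)
    # Shared bottleneck queuing delay (microseconds), one value per 100 ms tick.
    queue_us = [rng.randint(5_000, 50_000) for _ in range(n_samples)]
    times = [i * 0.1 for i in range(n_samples)]
    schedules = []
    for prop_us in prop_delays_us:
        # Same time series for every flow, shifted by that flow's propagation delay.
        samples = [(t, prop_us + q) for t, q in zip(times, queue_us)]
        schedules.append(schedule_next_probe_rtt(samples))
    return schedules


if __name__ == "__main__":
    print(simulate([20_000, 35_000, 80_000]))  # three flows, one shared schedule
```

With real, noisy RTT samples the agreement is only approximate.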

(2) How does a coordinated decrease of cwnd in PROBE_RTT expose the
approximate two-way propagation delay? Suppose the flows coordinate a cut
in cwnd, and that cut reduces the cwnds low enough, and for long enough,
that the sum of the flows' rates is less than the bottleneck bandwidth.
Then the bottleneck queue can drain, and the RTT samples can approximate
each flow's two-way propagation delay. The simplest case to visualize is
one where all flows have the same two-way propagation delay. The ensemble
of flows then merely needs to pull the cwnds down far enough that the sum
of the cwnds is less than the BDP of the path. It must hold that condition
long enough for all flows to measure an approximation of the queue-free
RTT. LEDBAT++ uses a cwnd of 2 packets during slowdowns (as in
https://tools.ietf.org/id/draft-balasubramanian-iccrg-ledbatplusplus-00.html ),
and BBRv1 uses a cwnd of 4 packets in PROBE_RTT. With either, the sum of
the cwnds will tend to be less than the BDP of the path in many common
cases. That holds for up to tens or hundreds of flows at typical low BDPs,
or thousands of flows in high-speed WANs.
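The arithmetic behind the flow counts in (2) can be checked with a short sketch. It assumes all flows share one two-way propagation delay (the simplest case above) and treats every packet as 1500 bytes. The function names and the example link speeds and RTTs are illustrative choices, not measurements.

```python
MSS_BYTES = 1500  # assumed bytes per packet


def bdp_bytes(bottleneck_bps: int, rtt_us: int) -> int:
    """Bandwidth-delay product in bytes, for a bandwidth in bit/s and an RTT in microseconds."""
    return bottleneck_bps * rtt_us // (8 * 1_000_000)


def queue_can_drain(n_flows: int, cwnd_pkts: int, bdp: int, mss: int = MSS_BYTES) -> bool:
    """True if n_flows, each holding cwnd_pkts packets, keep the sum of cwnds below the BDP."""
    return n_flows * cwnd_pkts * mss < bdp


def max_flows(cwnd_pkts: int, bdp: int, mss: int = MSS_BYTES) -> int:
    """Largest N with N * cwnd_pkts * mss < bdp (strict inequality), never negative."""
    return max(0, (bdp - 1) // (cwnd_pkts * mss))


if __name__ == "__main__":
    for label, bps, rtt_us in [("100 Mbit/s, 20 ms", 100_000_000, 20_000),
                               ("10 Gbit/s, 50 ms", 10_000_000_000, 50_000)]:
        bdp = bdp_bytes(bps, rtt_us)
        print(f"{label}: BDP={bdp} bytes, "
              f"BBRv1 (cwnd=4): {max_flows(4, bdp)} flows, "
              f"LEDBAT++ (cwnd=2): {max_flows(2, bdp)} flows")
```

For example, a 100 Mbit/s path with a 20 ms RTT has a BDP of 250,000 bytes. The condition then holds for up to 41 BBRv1 flows or 83 LEDBAT++ flows. A 10 Gbit/s path with a 50 ms RTT allows 10,416 and 20,833 flows, respectively.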

thanks,
neal