Re: [iccrg] LEDBAT++, rLEDBAT, and slowdowns

Praveen Balasubramanian <pravb@microsoft.com> Mon, 04 November 2019 23:29 UTC

Return-Path: <pravb@microsoft.com>
X-Original-To: iccrg@ietfa.amsl.com
Delivered-To: iccrg@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id D4172120B94 for <iccrg@ietfa.amsl.com>; Mon, 4 Nov 2019 15:29:38 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2
X-Spam-Level:
X-Spam-Status: No, score=-2 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (1024-bit key) header.d=microsoft.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id CFoTjY3V4gpM for <iccrg@ietfa.amsl.com>; Mon, 4 Nov 2019 15:29:35 -0800 (PST)
Received: from NAM02-SN1-obe.outbound.protection.outlook.com (mail-eopbgr770099.outbound.protection.outlook.com [40.107.77.99]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 338AE120B8A for <iccrg@irtf.org>; Mon, 4 Nov 2019 15:29:35 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=RyAN8UuxgpNII4YppOU8hVkJKoAcSO2hd8GuajKkuT5IgXGfEw5NPTMlyNHKv6AvFVAyGJ5eOSxoozLTgSwAle9wZ0KLeNMZO3VDbPdORZPlr9JYpuIt3tBOu5Jc3uLiFsJPa/mgSiwe7IPMZgdK2VPAZzck8jKQoeJbJtYpF6fuAS5oZpmTxxe420WlTMq+UvwcL7rZdX+HSRxd4vU5k13X7LtK00D4BZZHvgNkZ9g8whnkjksixfmUSvrgxQyQnMQFE8s7afEnM0EAILMbLETvWwtiN0r+NdjCT1xQCzlrHDXfDsAfxQdV7RhmrODNb94yg8Q8l/4WkWLF+NJ0Jg==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=LoSY+FXexinjPl9v/wubsb8G6MmETw44AzHYYKHP7kQ=; b=cYOSKxtJ/ie8qnyfMhOwG2Z16fUReL16eKkiB9JISScPTDJiwg9CieopV0ZqVsuR/LIMUk9Y/w5SUizKDXQfCJt8iEgJ8wrB8uL5x4YkgqFMfNKPxfpg2qb0SPFUd5m2zprv0WkpQYGJ3Dzql5ZmNuUT2dXcOSzBsfI2n6Q4jnMJwcFMP78yR54GPWSrgv3hB+8tYPHZToh5nixrnqzeB9sMp4WKKbcze9/S+Pf3xVv5725FbDgk7G7csBaFpS7uqQXgw5pDxDoYnxNidTIylk2652m14XADk77Omw4EegS+bn5UpMaph5EixIjSO/pny23uKLeSTMqeKYN80/QzBA==
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=microsoft.com; dmarc=pass action=none header.from=microsoft.com; dkim=pass header.d=microsoft.com; arc=none
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=LoSY+FXexinjPl9v/wubsb8G6MmETw44AzHYYKHP7kQ=; b=Ix0Yj89LH+VNgCpDIHw+Xa6ghy0yCUDT3td34lpZ73NpZCGs/1uq6O9EFVUWLFoCyEV5ro+xiymGQqItyEKKAr7PucF5aWd6wow0er5cUWI8E7g4Ug5gvR3WRV8+agW7K6Oy4HE7XFzZDiAx7ujdzdTvj9/pM2784slX3I0uBNc=
Received: from MW2PR2101MB1049.namprd21.prod.outlook.com (52.132.149.13) by MW2PR2101MB1113.namprd21.prod.outlook.com (52.132.149.30) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.2430.16; Mon, 4 Nov 2019 23:29:32 +0000
Received: from MW2PR2101MB1049.namprd21.prod.outlook.com ([fe80::d90d:509a:e6c8:9c19]) by MW2PR2101MB1049.namprd21.prod.outlook.com ([fe80::d90d:509a:e6c8:9c19%7]) with mapi id 15.20.2430.014; Mon, 4 Nov 2019 23:29:28 +0000
From: Praveen Balasubramanian <pravb@microsoft.com>
To: Matt Mathis <mattmathis@google.com>, marcelo bagnulo braun <marcelo@it.uc3m.es>
CC: iccrg IRTF list <iccrg@irtf.org>, Neal Cardwell <ncardwell=40google.com@dmarc.ietf.org>, Yuchung Cheng <ycheng@google.com>
Thread-Topic: [iccrg] LEDBAT++, rLEDBAT, and slowdowns
Thread-Index: AQHVQbfWQW3B2GKS0Eq5hEzSgM/TDqcZ6fiAgABka4CAAA6sAIABMuYAgAN7OwCAAJGRAIAC0lIg
Date: Mon, 04 Nov 2019 23:29:28 +0000
Message-ID: <MW2PR2101MB1049496B027E956ADF60A6E6B67F0@MW2PR2101MB1049.namprd21.prod.outlook.com>
References: <CADVnQynJY8xkqkghhCWhPpbF+4Ev_3c7OZf_tDEb_J5xr0FV9A@mail.gmail.com> <c4b76af5-abbe-3184-24ce-03a2c0b9544b@it.uc3m.es> <CADVnQy=3FqEjqipX6thgjcjN8YPTOqiduYKU2GccXHwS+a3wVA@mail.gmail.com> <be98e323-506f-bdc8-a128-72c9f4aa5ead@it.uc3m.es> <3862e86e-e588-0cca-38e8-4ea23ef2b4c7@it.uc3m.es> <CADVnQymqHHg4cF94fu81opaBibmShVrsZ3cLyVPcHir0p=Kv3Q@mail.gmail.com> <CAH56bmDKNVAwx=mF7uW6O_vuVdHvPGnw160EAYPaBRth=vE1DA@mail.gmail.com>
In-Reply-To: <CAH56bmDKNVAwx=mF7uW6O_vuVdHvPGnw160EAYPaBRth=vE1DA@mail.gmail.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
msip_labels: MSIP_Label_f42aa342-8706-4288-bd11-ebb85995028c_Enabled=True; MSIP_Label_f42aa342-8706-4288-bd11-ebb85995028c_SiteId=72f988bf-86f1-41af-91ab-2d7cd011db47; MSIP_Label_f42aa342-8706-4288-bd11-ebb85995028c_Owner=pravb@ntdev.microsoft.com; MSIP_Label_f42aa342-8706-4288-bd11-ebb85995028c_SetDate=2019-11-04T23:29:26.8650492Z; MSIP_Label_f42aa342-8706-4288-bd11-ebb85995028c_Name=General; MSIP_Label_f42aa342-8706-4288-bd11-ebb85995028c_Application=Microsoft Azure Information Protection; MSIP_Label_f42aa342-8706-4288-bd11-ebb85995028c_ActionId=b6ca0ff9-ef3a-471a-a5f1-81e444274a30; MSIP_Label_f42aa342-8706-4288-bd11-ebb85995028c_Extended_MSFT_Method=Automatic
authentication-results: spf=none (sender IP is ) smtp.mailfrom=pravb@microsoft.com;
x-originating-ip: [2001:4898:80e8:0:6c7e:cda3:12a0:f9f8]
x-ms-publictraffictype: Email
x-ms-office365-filtering-ht: Tenant
x-ms-office365-filtering-correlation-id: b3172006-550c-408d-bef9-08d7617ed622
x-ms-traffictypediagnostic: MW2PR2101MB1113:
x-microsoft-antispam-prvs: <MW2PR2101MB1113EB075380BFEC08DC4A76B67F0@MW2PR2101MB1113.namprd21.prod.outlook.com>
x-ms-oob-tlc-oobclassifiers: OLM:7691;
x-forefront-prvs: 0211965D06
x-forefront-antispam-report: SFV:NSPM; SFS:(10019020)(396003)(366004)(376002)(39860400002)(346002)(136003)(199004)(51914003)(189003)(8990500004)(86362001)(316002)(6116002)(790700001)(5660300002)(10090500001)(606006)(14444005)(74316002)(229853002)(256004)(33656002)(7736002)(25786009)(102836004)(9686003)(10290500003)(4326008)(46003)(6246003)(2906002)(54896002)(6306002)(55016002)(52536014)(236005)(14454004)(110136005)(486006)(22452003)(76116006)(186003)(478600001)(64756008)(66556008)(66946007)(66446008)(66476007)(81156014)(99286004)(53546011)(8676002)(7696005)(6506007)(81166006)(71200400001)(71190400001)(54906003)(446003)(966005)(11346002)(476003)(8936002)(6436002)(76176011); DIR:OUT; SFP:1102; SCL:1; SRVR:MW2PR2101MB1113; H:MW2PR2101MB1049.namprd21.prod.outlook.com; FPR:; SPF:None; LANG:en; PTR:InfoNoRecords; A:1; MX:1;
received-spf: None (protection.outlook.com: microsoft.com does not designate permitted sender hosts)
x-ms-exchange-senderadcheck: 1
x-microsoft-antispam: BCL:0;
x-microsoft-antispam-message-info: Sh8GI/wLEu8wyxAOnALfhr6AVJWKFhKbJPsZp3geTIprySRDRVta7hZu3WLRNa3OP9Vs/HyN1fIc+CS5HYIMSTIeheOfghflyzCxC2Op1c3fCsCSZcaoAktbumAx5sLuVpHvfluGYajsQecQ2yo8rL/AQFUSTwdc7CGVAnyZ4Mx+Zf7j8rnkOyAN0KUcIX6MOLgMdPR7ycF2zBjzpD3FnaoBRvGVUbcTca4NxrXpecc0q3KjnOJL52ZrX/uKf/BJ9w47cWUHYzKB6FK4WLuBM5ZzOMg7i+gVr/4RGERSo5/KU6E6F0x6Ao29VbL/OMMCn0hXGT8SOqTDcUPmisXdbKIkhwiWWYEHOWV576qZWTKkvoqVAstWTfoFUD0zEClA+y3kmZEno8YOt0NB9YN1mMyy1VPGRsZwq+bMRPQ8ajABCFxb0RWdDlyWAX0/9gJE
x-ms-exchange-transport-forked: True
Content-Type: multipart/alternative; boundary="_000_MW2PR2101MB1049496B027E956ADF60A6E6B67F0MW2PR2101MB1049_"
MIME-Version: 1.0
X-OriginatorOrg: microsoft.com
X-MS-Exchange-CrossTenant-Network-Message-Id: b3172006-550c-408d-bef9-08d7617ed622
X-MS-Exchange-CrossTenant-originalarrivaltime: 04 Nov 2019 23:29:28.1440 (UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: 72f988bf-86f1-41af-91ab-2d7cd011db47
X-MS-Exchange-CrossTenant-mailboxtype: HOSTED
X-MS-Exchange-CrossTenant-userprincipalname: FcvViQwOMkRU4trgRA+5pi3wOxphuWWWMPygWs8I6478Q8prSht0yR+X944XjIvfa2MiMZcxH9Dno32EpsstqSOdaMQ1EXeKMMxne8Rfpkg=
X-MS-Exchange-Transport-CrossTenantHeadersStamped: MW2PR2101MB1113
Archived-At: <https://mailarchive.ietf.org/arch/msg/iccrg/VOYQoJ7-jMkvxtKUdgEYEMNQpoI>
Subject: Re: [iccrg] LEDBAT++, rLEDBAT, and slowdowns
X-BeenThere: iccrg@irtf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Discussions of Internet Congestion Control Research Group \(ICCRG\)" <iccrg.irtf.org>
List-Unsubscribe: <https://www.irtf.org/mailman/options/iccrg>, <mailto:iccrg-request@irtf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/iccrg/>
List-Post: <mailto:iccrg@irtf.org>
List-Help: <mailto:iccrg-request@irtf.org?subject=help>
List-Subscribe: <https://www.irtf.org/mailman/listinfo/iccrg>, <mailto:iccrg-request@irtf.org?subject=subscribe>
X-List-Received-Date: Mon, 04 Nov 2019 23:29:47 -0000

Thanks for the feedback Neal, Matt and Marcelo.



LEDBAT++ does not make any guarantees that it will measure the base delay accurately. The base delay referred to here and in original LEDBAT is as measured by the flow. The original LEDBAT claims that random stalls will create an opportunity to observe the base delay. With LEDBAT++ the flows themselves add explicit effort to create such stalls. It will also be hard to maintain the property of 90% throughput perf target of standard TCP (in absence of contention) if there are too many slowdown periods. Maybe the slowdown entry could be triggered by a new minRTT measurement as well as throttled by the 90% heuristic of LEDBAT++.



In practice we don’t see the ratcheting effect to be pronounced. Initial measurements on large number of flows show that the queuing delay goes past 100 msec only when there are more than 500 LEDBAT++ flows contending. We need to dig into this more to understand why. But PROBE_RTT like mechanism is something certainly worth experimenting with in the future. And if the results look good we’d be happy to incorporate it into LEDBAT++.

From: Matt Mathis <mattmathis@google.com>
Sent: Friday, September 6, 2019 5:06 PM
To: marcelo bagnulo braun <marcelo@it.uc3m.es>; Praveen Balasubramanian <pravb@microsoft.com>
Cc: iccrg IRTF list <iccrg@irtf.org>; Neal Cardwell <ncardwell=40google.com@dmarc.ietf.org>; Yuchung Cheng <ycheng@google.com>
Subject: Re: [iccrg] LEDBAT++, rLEDBAT, and slowdowns

Adding to Neal's points: a robust distributed algorithm for each flow to compute is own minRTT is a known hard problem, going back a long long way.  I believe that it is the essence of the proof in "Flow Control Power is Nondecentralizable" [1] because no flow can tell if its observed min RTT includes other flows' standing queue.   BBR minimizes this problem by explicitly trying to globally synchronize minRTT measurements, which is demonstrably good enough for known cases.  If you can do better, please share and we can consider porting your algorithm to BBR, if you can't do better, please consider adopting the exact algorithm in BBR, so we can all synchronize the minRTT measurements.

BTW To the extent that the proof in [1] applies to BBR, it implies that the min RTT algorithm in BBR is also subject to pathological behavior under some unknown conditions.  Yes I do fret about this from time to time, but no I haven't found an example.  "Global synchronization" is hard, so the most likely failure modes look like strong regional synchronizations in different phases, and flows that traverse multiple regions.  This clearly might fail optimality, but I can't imagine how to cause divergent or latchup behaviors.  Under authentic workloads this is not a worry at all.

[1] JEFFREY M. JAFFE, 1981, Flow Control Power is Nondecentralizable, IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. COM-29, NO. 9,

Thanks,
--MM--
The best way to predict the future is to create it.  - Alan Kay

We must not tolerate intolerance;
       however our response must be carefully measured:
            too strong would be hypocritical and risks spiraling out of control;
            too weak risks being mistaken for tacit approval.


On Fri, Sep 6, 2019 at 8:26 AM Neal Cardwell <ncardwell=40google.com@dmarc.ietf.org<mailto:40google.com@dmarc.ietf.org>> wrote:
On Wed, Sep 4, 2019 at 6:15 AM marcelo bagnulo braun <marcelo@it.uc3m.es<mailto:marcelo@it.uc3m.es>> wrote:
>
> Apologies for replying to my own email, but i would like to take
> something back, see below...
>
> El 03/09/19 a las 17:57, marcelo bagnulo braun escribió:
> > I guess what I'm trying to get at is that:
> >>
> >> (1) AFAICT the LEDBAT++ and rLEDBAT algorithms so far do not seem to
> >> provide a bound on "the additional queueing delay added by the LEDBAT
> >> flow", in the case where there are several  LEDBAT++/rLEDBAT flows and
> >> the fair share of in-flight data is less than the queue depth.
> >> Instead, the group dynamics would be such that the queue would
> >> gradually but steadily ratchet up until the buffer is full. This does
> >> not seem to meet the goals expressed in RFC6817 ("limiting the
> >> consequent increase in queueing delay on that path...avoiding
> >> interference with the network performance of competing flows") or the
> >> LEDBAT++/rLEDBAT drafts.
> >
> >
> > right, the additional queueing delay added by a single flow is
> > bounded, but if you keep on adding LEDBAT flows one after the other,
> > the total added queuing delay is not bounded indeed.
>
> Let me take this back.
>
> So, let's consider how would this situation you describe would happen.
>
> The scenario that I have been able to identify where this would happen
> would be for example the following.
>
> We have two LEDBAT++ flows (let's assume there is no other traffic for now).
>
> They have managed to accurately measure the base delay and thus they are
> fairly sharing the bottleneck link and inducing half of the target delay
> each.
>
> After n minutes, there is one of the flows (f1) that forgets the oldest
> minute of history, the one that contained the accurate measurement of
> the base delay.
>
> This implies that it will now record as base delay the minimum delay
> observed in the AIMD process, which is the one caused by both flows
> having half of the maximum window (I am assuming both flows are
> synchronized).
>
> The first consequence of this is that flow f1 will take a larger share
> of the bottleneck capacity.
>
> In any case, when the queuing delay measured by f1 is larger than the
> target, f1 will reduce its window to half. Depending on the relationship
> between the target delay and the real base delay, it is possible that
> when f1 reduces it window in half, the resulting delay is still larger
> than its current minimum, resulting in the increase of the queueing
> delay that you mentioned. Since this happens every n minutes,this can
> cause the queueing delay to grow unbounded.
>
> Is this the type of scenario you had in mind?

Yes, that is the type of scenario I had in mind.

Basically, if the fair share of inflight data is less than the standing queue,
then having a single flow cut its inflight will not drain the queue, and thus
no single flow can reveal the true two-way propagation delay by itself.

So an algorithm that relies on a windowed min_rtt estimate and individual
flows cutting by themselves will gradually ratchet up the level of queuing,
causing the queuing delay to grow without bounds.

> However, if we apply the slowdown described in LEDBAT++
> (unsynchronized), then f1 would reduce its window and because f1 is the
> one expelling the other flows out, this would allow f1 to have a more
> accurate measure of the base delay, enabling the AIMD fairness to kick
> in and preventing the increase of the queueing delay.

As noted above, if the fair share of inflight data is less than the
standing queue, when f1 reduces its window it will not drain the queue
fully, so its base delay estimate will ratchet up to a value that
includes the queueing delay from all the other flows queued at
the bottleneck, even if all the others are LEDBAT++ flows.


> In summary, because errors in estimating the queuing delay causes one
> flow to grow more than the others, when this particular flow slows down,
> will allow a more accurate measure of the base delay, enabling AIMD
> fairness mechanisms to reconverge to a fair split.
>
> So, I am not convinced that it is not enough with the slow down proposed
> in LEDBAT++.

When you say a slow-down "will allow a more accurate measure of the
base delay," I think a key question is: more accurate than what, exactly?
An uncoordinated slow-down will allow a more accurate measure
of the base delay than the most recent RTT samples, but it will not
allow a measure that is more accurate than (or even matching)
the current base delay estimate.That's because uncoordinated
slow-downs can cause feedback loops, where flows measure
ever-increasing RTTs from the queues created by other flows
that are also unable to "anchor" their base delay estimates, so
are also allowing ever-increasing amounts of data in flight.

> I am not sure this is true for all combinations of parameter values, i
> need to look into this.

Yes, I think further testing of LEDBAT++/rLEDBAT in multi-flow scenarios
would be worthwhile.

A simple scenario that should illustrate this issue:
 o RTT of 100ms
 o bottleneck with 10*BDP of buffer space (allowing 1 sec of queuing)
 o 3 bulk LEDBAT++/rLEDBAT flows

Per the dynamics discussed above, my theory is that in such a scenario
the queuing delay will grow until the buffer is oscillating near full and
there is ~1sec of queuing in the steady state. That is, rather than
"Low Extra Delay" there will be 1sec of extra delay. :-)

best,
neal

_______________________________________________
iccrg mailing list
iccrg@irtf.org<mailto:iccrg@irtf.org>
https://www.irtf.org/mailman/listinfo/iccrg<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.irtf.org%2Fmailman%2Flistinfo%2Ficcrg&data=02%7C01%7Cpravb%40microsoft.com%7Ce67fcb1b6a6147f6287a08d7332742ff%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637034116037542604&sdata=y%2F4tAkhd6sLU58spKZf1RyDFn4NEN32ilLVqm0k8DEk%3D&reserved=0>