[EToSat] Congestion control beyond initial BDP

Christian Huitema <huitema@huitema.net> Wed, 28 July 2021 19:06 UTC

To: "etosat@ietf.org" <etosat@ietf.org>
From: Christian Huitema <huitema@huitema.net>
Message-ID: <c6ea1df5-f7c0-e6c7-decc-39247759fddb@huitema.net>
Date: Wed, 28 Jul 2021 12:06:23 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/etosat/1f3tJE3Zg57Vgx3uRi8J9B-gDZA>
List-Id: "The EToSat list is a non-WG mailing list used to discuss performance implications of running encrypted transports such as QUIC over satellite." <etosat.ietf.org>

I have been reviewing and tuning the implementation of the QUIC/BDP 
draft in picoquic, adding a bunch of tests based on simulations. The 
tests are finding some interesting issues. My first fixes were to look 
at the way the BDP is "plumbed" at the beginning of the connection. I 
ended up doing the following:

1) For New Reno and Cubic, just setting the CWIN was not sufficient. If 
the CWIN is set to the BDP and slow start (or HyStart) continues, the 
window will wildly overshoot after 1 RTT, causing lots of losses and a 
degradation in performance. To fix that, I had to set both CWIN and 
SSTHRESH, and exit slow start immediately.
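For illustration, a minimal sketch of that fix, with made-up names 
rather than the actual picoquic API:

```c
#include <stdint.h>

/* Hypothetical sketch, not picoquic's real structures or names. */
typedef struct {
    uint64_t cwin;       /* congestion window, bytes */
    uint64_t ssthresh;   /* slow-start threshold, bytes */
    int in_slow_start;
} cc_state_t;

/* Setting only cwin leaves slow start active: after one RTT of ACKs
 * the window doubles past the path capacity.  Setting ssthresh too,
 * and leaving slow start, avoids the overshoot. */
static void seed_from_bdp(cc_state_t *cc, uint64_t bdp_bytes)
{
    cc->cwin = bdp_bytes;
    cc->ssthresh = bdp_bytes;  /* cwin >= ssthresh ... */
    cc->in_slow_start = 0;     /* ... so go straight to congestion avoidance */
}
```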

2) For BBR, the same overshoot issue happens; it is solved by setting 
the bottleneck bandwidth from the BDP and then moving to the "probe 
bandwidth" state.
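Again as a sketch, with illustrative names (BBR's states and the 
conversion are real; the struct and function are not picoquic's):

```c
#include <stdint.h>

enum bbr_state { BBR_STARTUP, BBR_DRAIN, BBR_PROBE_BW, BBR_PROBE_RTT };

typedef struct {
    uint64_t btl_bw;      /* bottleneck bandwidth estimate, bytes/sec */
    enum bbr_state state;
} bbr_t;

/* Seed BBR from a remembered BDP: convert BDP back to a bandwidth
 * (BDP / RTT) and jump straight to ProbeBW, skipping Startup's
 * exponential growth and the resulting overshoot. */
static void bbr_seed_from_bdp(bbr_t *bbr, uint64_t bdp_bytes, uint64_t rtt_us)
{
    if (rtt_us > 0) {
        bbr->btl_bw = (bdp_bytes * 1000000) / rtt_us;
        bbr->state = BBR_PROBE_BW;
    }
}
```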

3) In all cases, we have to be careful not to simply copy the previous 
connection's CWIN into the remembered BDP, because the CWIN is 
sometimes way too big, for example at the end of slow start. Instead, 
I took the estimated bandwidth, obtained from the arrival of ACKs, and 
set the BDP to that estimated bandwidth times the RTT.
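That computation is trivial, but worth stating precisely (again a 
sketch, not the actual picoquic code):

```c
#include <stdint.h>

/* Derive the BDP to remember from the ACK-based bandwidth estimate
 * rather than the final CWIN, which can be inflated, e.g. right
 * after a slow-start overshoot. */
static uint64_t bdp_to_remember(uint64_t est_bw_bytes_per_sec, uint64_t rtt_us)
{
    return (est_bw_bytes_per_sec * rtt_us) / 1000000; /* bytes */
}
```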

With that, yes, remembering BDP from previous connections works as 
expected in the initial phase of the connection, but no, this is not 
sufficient to properly handle high BDP links. Consider the following:

1) When using New Reno, the window is set to the BDP value and then 
appears to never change. It does in fact change, but very slowly: it 
increases by one packet every RTT. The tests last just a few seconds, 
so this means an increase of just a few packets, which is hardly 
noticeable. It also means that if the initial BDP is too small, it 
will remain small for the duration of the session.
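The back-of-the-envelope arithmetic, assuming a 600 ms GEO RTT and a 
5 second test (illustrative numbers, not a measurement):

```c
#include <stdint.h>

/* In congestion avoidance New Reno adds one MSS per RTT, so over a
 * short session the window barely moves: 5000 ms / 600 ms = 8 RTTs,
 * i.e. only 8 extra packets on top of the seeded window. */
static uint64_t reno_cwin_after(uint64_t cwin_bytes, uint64_t mss,
                                uint64_t session_ms, uint64_t rtt_ms)
{
    uint64_t rtts = session_ms / rtt_ms;  /* completed RTTs */
    return cwin_bytes + rtts * mss;       /* +1 MSS per RTT */
}
```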

2) When using BBR, the bottleneck bandwidth is set from the BDP value. 
There were a few tests in which that BDP value was too high. It took 
me some time to understand why it remained too high for so long, but 
that can be traced to the "10 RTT filter" implemented in BBR: the 
pacing rate is set to the highest bandwidth observed in the last 10 
RTTs. So, on a GEO link it takes at least 6 seconds for BBR to adjust 
from an over-optimistic initial setting. It will also take a long time 
to adjust from a low initial setting, because BBR uses RTT-length 
epochs and only probes for an additional 25% bandwidth in some of 
those epochs. Still, that would be better than New Reno.
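A sketch of the windowed-max filter behind that behavior (simplified; 
BBR's actual filter is time-indexed, but the effect is the same): an 
over-optimistic seeded sample stays the maximum until 10 RTTs of 
fresh samples push it out.

```c
#include <stdint.h>

#define BW_FILTER_LEN 10  /* BBR keeps the max over the last 10 RTTs */

typedef struct {
    uint64_t samples[BW_FILTER_LEN];
    int next;
} max_filter_t;

/* Record one bandwidth sample per RTT; return the running maximum.
 * A stale high sample dominates until it ages out of the window. */
static uint64_t filter_update(max_filter_t *f, uint64_t sample)
{
    f->samples[f->next] = sample;
    f->next = (f->next + 1) % BW_FILTER_LEN;
    uint64_t max = 0;
    for (int i = 0; i < BW_FILTER_LEN; i++)
        if (f->samples[i] > max)
            max = f->samples[i];
    return max;
}
```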

3) When using Cubic, the initial ssthresh is set from the BDP, and 
Cubic then moves to congestion avoidance. I observe a series of 
epochs, each ending with a high number of losses. That may be due to 
a bug in my implementation, but there is also the epoch effect: Cubic 
epochs are set as a function of the RTT, and thus Cubic reacts less 
well when the RTT is large.

Bottom line: fixing the BDP helps, but it is not sufficient. New Reno 
is probably hopeless -- Sally Floyd's work of years ago still applies. 
Cubic currently does not work well, which means we need specific 
investigations, and then specific recommendations. BBR sort of works, 
but some additional tuning would help. It would probably be a good 
idea to provide specific "implementation recommendations" for the 
popular congestion control algorithms.

-- Christian Huitema