Re: [tcpm] Hystart survey of large server operators

Yuchung Cheng <ycheng@google.com> Wed, 28 July 2021 23:35 UTC

References: <162610476442.30543.4667406094304409800@ietfa.amsl.com> <98289918-67d1-2be1-723d-2df66be46fac@bobbriscoe.net> <PH0PR00MB1030126A3220BC056A406490B6E99@PH0PR00MB1030.namprd00.prod.outlook.com> <PH0PR00MB1030E0697BE8E93074B901D2B6E99@PH0PR00MB1030.namprd00.prod.outlook.com> <84ac0f00-b828-2503-fd7c-0ef7c6465768@bobbriscoe.net> <CAK6E8=f7qKDs-MFr4G6bpz82Swn3iCJEkWLL5yr+vV8z9zMb=Q@mail.gmail.com> <004bfb54-32d6-0758-cd36-df52542c5a9d@bobbriscoe.net> <4c203f66-6429-d717-8213-8bdf9d3a7b2a@huitema.net>
In-Reply-To: <4c203f66-6429-d717-8213-8bdf9d3a7b2a@huitema.net>
From: Yuchung Cheng <ycheng@google.com>
Date: Wed, 28 Jul 2021 16:35:08 -0700
Message-ID: <CAK6E8=dpdFjcEoPkMORw+CjFAmpo+ZtS-Scj6qh9oYSmbtZyWA@mail.gmail.com>
To: huitema@huitema.net
Cc: Bob Briscoe <ietf@bobbriscoe.net>, "tcpm@ietf.org" <tcpm@ietf.org>, "draft-ietf-tcpm-hystartplusplus@ietf.org" <draft-ietf-tcpm-hystartplusplus@ietf.org>, Neal Cardwell <ncardwell@google.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/ntrEWbskev-6QYRviFSBZ4H5Zq0>
Subject: Re: [tcpm] Hystart survey of large server operators

Same experience as Christian.

For more than half a decade, Google traffic used Linux CUBIC with Hystart.
This included google.com and YouTube public Internet traffic, as well as
internal Google TCP traffic. When we deployed pacing around 2013, we
disabled the Hystart ACK-train mechanism (which Hystart++ now prohibits)
because it caused false slow-start (SS) exits. But we continued to use the
stock Hystart delay mechanism. Around 2014-15 we switched to BBR, which uses
a different startup that is more robust to delay jitter. But IMO the simpler
Hystart, which mitigates SS overshoot, is still highly valuable for Cubic and
other congestion controls as Internet bandwidth continues to grow.
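
As a rough illustration of the delay mechanism mentioned above, here is a
minimal Python sketch of a delay-increase check. It follows the rule later
written into Hystart++ (RFC 9406): once a round has at least N_RTT_SAMPLE
RTT samples, signal if the round's minimum RTT exceeds the previous round's
minimum RTT by a threshold of minRTT/8, clamped to [4 ms, 16 ms]. The
constant names mirror Hystart++; the function names are hypothetical, and
the stock Linux Hystart delay check differs in details. Hystart++ enters
Conservative Slow Start at this point rather than leaving slow start
outright; the sketch only reports the signal.

```python
from typing import Sequence

# Constants as named in Hystart++ (RFC 9406); function names are hypothetical.
MIN_RTT_THRESH_MS = 4.0
MAX_RTT_THRESH_MS = 16.0
MIN_RTT_DIVISOR = 8
N_RTT_SAMPLE = 8


def rtt_thresh(last_round_min_rtt_ms: float) -> float:
    """Allowed RTT increase: minRTT / 8, clamped to [4 ms, 16 ms]."""
    return max(MIN_RTT_THRESH_MS,
               min(last_round_min_rtt_ms / MIN_RTT_DIVISOR, MAX_RTT_THRESH_MS))


def delay_increase_detected(last_round_min_rtt_ms: float,
                            current_round_rtts_ms: Sequence[float]) -> bool:
    """True if this round's min RTT rose enough to suggest a queue is building."""
    if len(current_round_rtts_ms) < N_RTT_SAMPLE:
        return False  # too few samples this round to judge
    current_min = min(current_round_rtts_ms)
    return current_min >= last_round_min_rtt_ms + rtt_thresh(last_round_min_rtt_ms)


print(rtt_thresh(40.0))                            # 5.0
print(delay_increase_detected(40.0, [46.0] * 8))   # True
print(delay_increase_detected(40.0, [43.0] * 8))   # False
```

For a 40 ms path the threshold is 5 ms, so a round whose minimum RTT
reaches 45 ms or more would trigger the signal.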

Did those providers disable Hystart because of the poor interaction between
its ACK-train mode and pacing? Maybe it's time to disable the ACK-train
approach in upstream Linux once Hystart++ is standardized.
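
To see why the ACK-train heuristic can misfire under pacing, consider this
simplified Python model of it. In outline, the Linux tcp_cubic heuristic
treats ACKs arriving no more than about 2 ms apart as one train, and ends
slow start once the train, measured from the start of the round, lasts
longer than half the minimum RTT. The model ignores later Linux refinements,
and its names are hypothetical. With pacing, ACKs return spread across the
round rather than in a tight burst, so their spacing roughly reflects the
sender's pacing rather than the bottleneck, and the train can grow past
minRTT/2 while the bottleneck is far from full. (In Linux, the tcp_cubic
module parameter `hystart_detect` selects the mechanisms: 1 for the ACK
train, 2 for delay, 3 for both.)

```python
from typing import Sequence

# Hypothetical model of the ACK-train heuristic. 2 ms mirrors the classic
# Linux default for the largest gap between ACKs within one train.
ACK_DELTA_MS = 2.0


def ack_train_exit(ack_times_ms: Sequence[float], round_start_ms: float,
                   min_rtt_ms: float) -> bool:
    """Return True if a train of closely spaced ACKs outlasts min_rtt / 2."""
    last_ack_ms = round_start_ms
    for t in ack_times_ms:
        if t - last_ack_ms > ACK_DELTA_MS:
            return False  # gap too large: the train is broken for this round
        last_ack_ms = t
        if t - round_start_ms > min_rtt_ms / 2:
            return True  # train longer than half the min RTT: exit slow start
    return False


# Unpaced: a burst of 10 ACKs, 0.1 ms apart, at the start of a 50 ms round.
burst = [i * 0.1 for i in range(1, 11)]
# Paced: one ACK per millisecond spread across the whole round.
paced = [float(i) for i in range(1, 51)]

print(ack_train_exit(burst, 0.0, 50.0))  # False
print(ack_train_exit(paced, 0.0, 50.0))  # True (fires at t = 26 ms)
```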

On Wed, Jul 28, 2021 at 9:47 AM Christian Huitema <huitema@huitema.net>
wrote:

> I certainly do not qualify as a "large operator", but when implementing
> Cubic for QUIC I found that Hystart was critical for performance.
> Specifically, classical slow start is very prone to overshooting the
> capacity of the path, which causes large batches of packet losses. With
> web-like traffic, these batches of losses increase the latency of the first
> transactions in a session, and thus cause a drop in perceived quality.
> Hystart solves that. With Hystart, I observed that a large fraction of
> sessions do not experience any packet loss at all.
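
The overshoot point above can be made concrete with back-of-the-envelope
arithmetic. Classical slow start doubles cwnd every round trip, so the first
window that no longer fits in the path (BDP plus bottleneck buffer) can
exceed it by almost as much again. The Python sketch below is a deliberately
crude model, assuming one doubling per round and counting packets rather
than bytes; it ignores ACK clocking within a round and the extra round that
passes before a loss is detected, so it shows the scale of the overshoot,
not an exact loss count. The function name is hypothetical.

```python
from typing import Tuple


def slow_start_overshoot(init_cwnd: int, bdp_pkts: int,
                         buffer_pkts: int) -> Tuple[int, int]:
    """Crude model: return (first cwnd exceeding capacity, packets in excess)."""
    capacity = bdp_pkts + buffer_pkts  # packets the path can hold in flight
    cwnd = init_cwnd
    while cwnd <= capacity:
        cwnd *= 2  # classical slow start: double once per round trip
    return cwnd, cwnd - capacity


# 100 Mbit/s x 100 ms with 1500-byte packets is about 833 packets of BDP.
print(slow_start_overshoot(10, 833, 100))  # (1280, 347)
```

In this model the excess can be anything up to the path capacity itself, so
the potential overshoot grows with the BDP.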
>
> As an aside, we should strive for this suppression of packet losses.
> That's actually a big reason for moving from Cubic to BBR. Hystart
> suppresses the losses during the initial phase of the connection, but Cubic
> still relies on periodically testing the limit of path capacity, causing
> losses during the subsequent phase. BBR's probing for bottleneck bandwidth
> is much more conservative and does not cause such losses. It might be
> possible to adapt Cubic to also avoid causing losses, for example by ending
> an epoch early if too many CE marks are received or if the RTT increases.
> That would be worth trying.
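
The suggestion above could be prototyped as a per-round check like the
hypothetical Python sketch below. Everything in it is an assumption for
illustration: the names, the 5% CE-mark limit, and the 1.25x RTT-rise limit
are not taken from Cubic, Hystart++, or any specification, and what the
sender should do when the check fires (for example, how the next epoch
starts) is left open.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
CE_FRACTION_LIMIT = 0.05  # end the epoch if more than 5% of ACKed packets were CE-marked
RTT_RISE_LIMIT = 1.25     # ...or if the round's min RTT exceeds 1.25x the path min RTT


@dataclass
class RoundStats:
    acked_pkts: int
    ce_marked_pkts: int
    round_min_rtt_ms: float


def end_epoch_early(stats: RoundStats, path_min_rtt_ms: float) -> bool:
    """Hypothetical: report early congestion signals that could end a Cubic epoch."""
    if stats.acked_pkts == 0:
        return False  # nothing ACKed this round: no evidence either way
    if stats.ce_marked_pkts / stats.acked_pkts > CE_FRACTION_LIMIT:
        return True
    return stats.round_min_rtt_ms > RTT_RISE_LIMIT * path_min_rtt_ms


print(end_epoch_early(RoundStats(100, 10, 40.0), 40.0))  # True (CE marks)
print(end_epoch_early(RoundStats(100, 0, 55.0), 40.0))   # True (RTT rise)
print(end_epoch_early(RoundStats(100, 2, 45.0), 40.0))   # False
```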
>
> -- Christian Huitema
>
> On 7/28/2021 3:19 AM, Bob Briscoe wrote:
>
> Yuchung,
>
> It was during a Mar 2018 ad hoc workshop Jana had organized at the London
> IETF entitled "BBR and the intersection with other work". I can't remember
> why I needed to know at the time, but during the break I approached the
> major server operators individually, established whether they used Cubic,
> and if so asked whether they used Hystart or disabled it. It's hard to
> anonymize the results, because, if my memory serves me correctly, all
> those that used Cubic said they disabled Hystart.
>
> If this isn't correct, then maybe people who replied at the time thought
> they were being asked something else. Or maybe it was correct then but
> isn't now.
>
>
> Bob
>
>
> On 28/07/2021 00:46, Yuchung Cheng wrote:
>
> Wait -- how is Hystart "invariably disabled" in Linux (Cubic)!?
>
> What data indicates that?
>
> On Tue, Jul 27, 2021 at 4:37 PM Bob Briscoe <ietf@bobbriscoe.net> wrote:
>
>     To any large server operators out there who are using Cubic:
>     if you're willing to state whether or not your operations disable
>     Hystart, please do so in reply.
>     Then Praveen can cite this mailing list thread in the Hystart++
>     draft, as requested below.
>
>     If you are willing to reply privately, I would be willing to keep
>     your confidences, and provide an anonymized result to the list.
>
>     Cheers
>
>     Bob
>
>     On 27/07/2021 01:50, Praveen Balasubramanian wrote:
>
>
>     > Although Hystart is enabled by default in Linux, it is invariably
>     > disabled. So, it's misleading to just say Hystart is enabled by
>     > default, which implies it's widely used, when people clearly find
>     > it has problems (which motivates Hystart++). I found this out
>     > through an informal survey I did at the Mar'18 IETF in London by
>     > asking around the implementers of the most prevalent stacks (I
>     > would name names if I could find the note I later sent to someone
>     > or to some list, but I can't find it, sorry).
>
>
>     I'd like some citations on this, rather than anecdata, if possible,
>     before I add that caveat to the text. Do large deployments
>     disable this? I haven't come across this suggestion in any
>     Linux tuning guides to date.
>
>
>     --
>     ________________________________________________________________
>     Bob Briscoe    http://bobbriscoe.net/
>
>     _______________________________________________
>     tcpm mailing list
>     tcpm@ietf.org
>     https://www.ietf.org/mailman/listinfo/tcpm