Re: [tcpm] Hystart and delay jitter

Neal Cardwell <ncardwell@google.com> Fri, 12 March 2021 22:18 UTC

References: <376bdc9f-4774-bfc8-1736-6c94fb24953c@huitema.net>
In-Reply-To: <376bdc9f-4774-bfc8-1736-6c94fb24953c@huitema.net>
From: Neal Cardwell <ncardwell@google.com>
Date: Fri, 12 Mar 2021 17:18:10 -0500
Message-ID: <CADVnQymN6UH+XTgdkwdX16TsDTeeTu+S-=O1nVjQFWYbpDT24Q@mail.gmail.com>
To: Christian Huitema <huitema@huitema.net>
Cc: "tcpm@ietf.org Extensions" <tcpm@ietf.org>
Content-Type: text/plain; charset="UTF-8"
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/IPbF2-LNeX4cbsaQu5Zv3Mi36xY>
Subject: Re: [tcpm] Hystart and delay jitter

On Fri, Mar 12, 2021 at 4:29 PM Christian Huitema <huitema@huitema.net> wrote:
>
> Back in November 2019, when adding Cubic and Hystart to my
> implementation of QUIC, I noticed that Hystart was sensitive to delay
> jitter. Hystart detects the buildup of queues by monitoring the RTT.
> Some links experience delay jitter, caused, for example, by access
> protocols for shared radio links or possibly by link-local ARQ protocols.
> The delay jitter can cause Hystart to make the wrong decision, in two ways:
>
> 1) Delay jitter during a previous period could cause some packets to be
> delivered "faster than usual", causing Hystart to under-estimate the min
> RTT for that period.
>
> 2) Delay jitter during the currently measured period can cause packets
> to be delivered "slower than usual", causing Hystart to over-estimate
> the min RTT for that period.
>
> The combination of these two issues may cause Hystart to make the wrong
> decisions, and exit slow start at levels well below link capacity.

Yes, we have found in both our production experience and controlled
experiments that the Hystart-Delay algorithm is very susceptible to
spurious triggering from jitter, particularly on LTE and Wi-Fi paths.
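
To make the failure mode concrete, here is a minimal sketch of a
Hystart-style delay check. It is an illustration, not code from any
particular stack: the function name delay_exit is hypothetical, and the
constants (N_RTT_SAMPLE = 8, thresholds of 4 ms and 16 ms, threshold of
lastRoundMinRTT / 8) are what I recall from the hystart++ draft, so
treat them as assumptions. RTTs are in milliseconds.

```python
# Sketch of a Hystart-style delay check. Constants are assumed values.
N_RTT_SAMPLE = 8
MIN_RTT_THRESH = 4.0   # ms
MAX_RTT_THRESH = 16.0  # ms


def delay_exit(last_round_rtts, current_round_rtts):
    """Return True if the delay check would end slow start this round."""
    if not last_round_rtts or len(current_round_rtts) < N_RTT_SAMPLE:
        return False
    last_min = min(last_round_rtts)
    # The running min can only go down as samples arrive, so if the check
    # ever fires in this round it already fires on the first N samples.
    cur_min = min(current_round_rtts[:N_RTT_SAMPLE])
    thresh = min(max(last_min / 8, MIN_RTT_THRESH), MAX_RTT_THRESH)
    return cur_min >= last_min + thresh


# No queue is building in either round; the path just jitters between
# roughly 46 and 58 ms. One unusually fast 40 ms sample in the previous
# round is enough to make the current round look "delayed".
jittery_last = [52, 47, 40, 55, 49, 58, 51, 46]
steady_last = [52, 47, 49, 55, 49, 58, 51, 48]
current = [49, 53, 47, 56, 50, 58, 48, 52]

spurious = delay_exit(jittery_last, current)
normal = delay_exit(steady_last, current)
```

With the single 40 ms outlier the check fires (47 >= 40 + 5), while the
same current round against a 47 ms reference does not, which matches
the first of the two issues quoted above.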

We discussed this a bit in Spring 2017 in the comparison of BBR's
bandwidth-based mechanism for exiting startup, vs Hystart-Delay's
delay-based mechanism:
  https://www.ietf.org/proceedings/98/slides/slides-98-iccrg-an-update-on-bbr-congestion-control-00.pdf#page=8

> The draft-ietf-tcpm-hystartplusplus-01 does have some protection against
> the second issue, because currentRoundMinRTT is computed on at least
> N_RTT_SAMPLE. If that number is large enough, computing the min over N
> samples should filter out "slower than usual" anomalies. However, the
> draft does not include a protection against "faster than usual"
> anomalies happening in the previous period. In my implementation, I
> protected against that by computing a "min of max" function: compute a
> rolling "MAX over N_RTT_SAMPLE", then compute the MIN value of that
> during the reference period, and use that to set the reference value
> "lastRoundMinRTT".
>
> I think it would be good to add a discussion of the effect of jitter to
> the hystart++ draft. In addition, we may also want to mention
> timestamps. The jitter on RTT may be caused by jitter on either
> direction of transmission -- data path or ACK path. The effect of jitter
> on the ACK path can be minimized if time stamps can be used to monitor
> the variation of one-way delays. This is not discussed in the current
> draft. Maybe it should be.
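
For readers who want to see the "min of max" filter described in the
quoted text, a rough sketch follows. This is my reading of the
description, not Christian's actual code; the function name
min_of_rolling_max is hypothetical, and the window of 4 is only to keep
the example short (the quoted text uses N_RTT_SAMPLE).

```python
from collections import deque


def min_of_rolling_max(rtts, window):
    """Min over the round of the max of each `window` consecutive RTTs.

    Returns None if the round has fewer than `window` samples.
    """
    recent = deque(maxlen=window)  # holds the last `window` samples
    best = None
    for rtt in rtts:
        recent.append(rtt)
        if len(recent) == window:
            window_max = max(recent)
            best = window_max if best is None else min(best, window_max)
    return best


# RTTs in ms for one round; the 40 ms sample is a "faster than usual"
# outlier caused by jitter, not a better estimate of the path.
round_rtts = [52, 47, 40, 55, 49, 58, 51, 46, 50, 48, 47, 49]

plain_min = min(round_rtts)
filtered = min_of_rolling_max(round_rtts, window=4)
```

Here the plain minimum is 40 ms while the filtered reference is 50 ms,
so a single fast sample no longer drags the reference value down. The
trade-off is that the reference is biased upward, which would tend to
delay the exit from slow start.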

A discussion of the effect of jitter sounds like a great idea.
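
As a small illustration of the timestamp point in the quoted text, the
sketch below splits RTT variation into a data-path part and an ACK-path
part. It is only a sketch under assumptions: the function name and the
sample layout are hypothetical, the peer timestamp is assumed to be
taken when the ACK is sent with negligible ACK delay, both clocks are
assumed to tick in the same units, and clock drift over the window is
ignored. The unknown offset between the two clocks cancels because only
differences from the per-direction minimum are used.

```python
def one_way_delay_variation(samples):
    """Split delay variation into (data_path, ack_path) per sample.

    samples: list of (sent_local, peer_timestamp, ack_arrival_local).
    Values are relative to the smallest delay seen in each direction.
    """
    fwd_raw = [peer - sent for sent, peer, _ in samples]
    rev_raw = [arrived - peer for _, peer, arrived in samples]
    fwd_base, rev_base = min(fwd_raw), min(rev_raw)
    return [(f - fwd_base, r - rev_base) for f, r in zip(fwd_raw, rev_raw)]


# Times in ms. The peer clock is offset by an unknown amount (1000 here).
# Sample 2 has 15 ms of extra ACK-path delay; sample 3 has 12 ms of extra
# data-path delay. Their RTTs (55 and 52 ms) both look inflated.
samples = [
    (0, 1020, 40),
    (10, 1030, 65),
    (20, 1052, 72),
]

variation = one_way_delay_variation(samples)
```

In this example only the third sample shows growth on the data path, so
a delay check driven by the first element of each pair would ignore the
ACK-path jitter in the second sample.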

neal