Re: [tsvwg] L4S status: #17 Interaction w/ FQ AQMs

Jonathan Morton <chromatix99@gmail.com> Tue, 13 August 2019 01:30 UTC

Cc: Wesley Eddy <wes@mti-systems.com>, "tsvwg@ietf.org" <tsvwg@ietf.org>
To: Greg White <g.white@CableLabs.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/3cCk39F1WQNyipmNXSWcyns4_VI>

> On 13 Aug, 2019, at 3:00 am, Greg White <g.white@CableLabs.com> wrote:
> 
> I believe this is one of the scenarios that was slated for experimentation to observe the behavior.   I don't recall hearing that experiment was conducted.  

Right, the direct experiment was not conducted at Montreal, because the L4S testbed wasn't ready for it.

Accordingly, we have had to rely on extrapolations from other experiments conducted using as similar a setup as could conveniently be arranged, and our prior experience with the topology in question.  For instance, we had effectively set up such a topology in our AirBNB while we were in Montreal, simply by installing my IQrouter adjacent to the cable modem.

For the experiment most relevant to this case, we used Pete's three test boxes and my Raspberry Pi, running SCE kernels and a firewall rule to simulate TCP Prague's congestion response.

> But, I'm not sure I fully see what your concern is.  The concern I have in this scenario is that the sluggish response of the RFC3168 AQM (i.e. CoDel) in the fq_codel (or Cake) will cause the L4S sender to experience poor latency and in some cases underutilize the link.   Here is the explanation that I sent Sebastian some time ago.

> [Sebastian]     But let's assume I have two flows one L4S one RFC 3168, and my link is already at saturation with each flow getting 50% of the shaper's configured bandwidth…

Okay, this scenario covers a steady-state case.  I'm more concerned with transient behaviour, specifically ensuring that short-term changes in the traffic conditions result in only short-term effects on other traffic.  But there are a couple of fallacies in the following wall of text that I was able to spot, and which need squashing.

> I don't see how the RFC3168 sender would be triggered to reduce its cwnd below 50% of BDP.  (by an FQ-AQM, competing with one other bulk flow)

This is trivially demonstrable as soon as you have an RTT long enough for a given sampling tool to observe the effect reliably, and in fact you can see it in the SCE slide deck (in which there is one chart showing FQ-AQM behaviour).

RFC-3168 senders respond to single CE marks with a Multiplicative Decrease, and FQ-AQMs will provide these marks very shortly after the fair share is reached.  Assuming the standard 50% decrease is employed with NewReno, that means it will be sent down to about 25% of link capacity, or more precisely, half the fair share, from where it must linearly grow back again.  CUBIC reduces to 70% of its previous window, which means it is sent down to about 35% of link capacity briefly, and recovers more quickly with its polynomial response.
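
As a rough check on that arithmetic, here is a minimal sketch assuming two bulk flows sharing the link equally (fair share of 50%) and treating each flow's window as proportional to its share of capacity.  The function name is purely illustrative; the backoff factors are the 50% and 70% figures given above.

```python
def share_after_mark(fair_share, backoff):
    """Fraction of link capacity a flow occupies just after one
    multiplicative decrease, starting from its fair share."""
    return fair_share * backoff


fair_share = 0.5                              # two bulk flows, equal shares
newreno = share_after_mark(fair_share, 0.5)   # NewReno halves its window
cubic = share_after_mark(fair_share, 0.7)     # CUBIC reduces to 70%

print(f"NewReno: {newreno:.0%} of link, CUBIC: {cubic:.0%} of link")
```

Either way the flow is briefly well below 50% of the link, which is the point at issue.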

> It seems as if your concern is that, when the RFC3168 flow cuts its cwnd to drain standing queue, that the L4S sender will shut out the RFC3168 sender on the access network link, and thus force the RFC3168 flow's FQ queue to drain completely.

No, the *eventual* convergence to fair shares in steady state is not in question.

FQ enforces it to the best of its ability (which, crucially, is determined by the traffic which makes it through the preceding network elements), and CoDel (at least in COBALT form, which is what we tested) does eventually ramp up its marking rate to the point where a DCTCP response function is properly controlled.  But it takes four whole seconds to get there when a new L4S flow starts up, with default settings and the short inherent RTTs that the L4S team appears to favour.  Note that I'm not claiming that four-second queues exist, only that elevated queuing latency (peaking around 125ms) persists for four seconds.
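
One rough way to see why the ramp is on the order of seconds is to step through the CoDel control law from RFC 8289, which spaces successive signals at interval/sqrt(count).  The sketch below is a simplified model, not the COBALT code we tested.  It assumes the default 100ms interval, a 10ms RTT, and (this is my assumption for illustration) that a DCTCP-style response is only held in check once marks arrive about twice per RTT, i.e. every 5ms.  It ignores the delay before the first mark and any resets of the count.

```python
import math


def codel_ramp_time(interval_ms=100.0, target_spacing_ms=5.0):
    """Step the CoDel control law until the gap between marks has
    shrunk to target_spacing_ms.  Returns (count, elapsed_ms)."""
    count = 1
    elapsed_ms = 0.0
    # Gap before the next mark is interval / sqrt(count), per RFC 8289.
    while interval_ms / math.sqrt(count) > target_spacing_ms:
        elapsed_ms += interval_ms / math.sqrt(count)
        count += 1
    return count, elapsed_ms


count, elapsed_ms = codel_ramp_time()
print(f"marks needed: {count}, time to ramp: {elapsed_ms / 1000:.1f} s")
```

Under those assumptions the ramp takes several hundred marks and a few seconds, which is the same order as the delay described above.  A different assumption about the required marking rate would shift the figure.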

A lot can happen in those four seconds, from the user's perspective.  It's long enough to seriously disrupt online gameplay, causing the player to miss crucial shots because his timing is thrown off by an eighth of a second.  VoIP streams will be forced to switch on their jitter buffers and pause the received audio stream to catch up, and will most likely never recover from that condition to the more realtime behaviour they previously enjoyed.  And so on.

Here is a simple experiment that should verify the existence and extent of the problem:

Control:
[Sender] -> [Baseline RTT 10ms] -> [FQ_Codel, 100% BW] -> [Receiver]

Subject:
[Sender] -> [Baseline RTT 10ms] -> [Dumb FIFO, 105% BW] -> [FQ_Codel, 100% BW] -> [Receiver]

Background traffic is a sparse latency-measuring flow, essentially a surrogate for gaming or VoIP.  The instantaneous latency experienced by this flow over time is the primary measurement.

The experiment is simply to start up one L4S flow in parallel with the sparse flow, and let it run in saturation for say 60 seconds.  Repeat with an RFC-3168 flow (NewReno, CUBIC, doesn't matter which) for a further experimental control.  Flent offers a convenient method of doing this.

Correct behaviour would show a brief latency peak caused by the interaction of slow-start with the FIFO in the subject topology, or no peak at all for the control topology; you should see this for whichever RFC-3168 flow is chosen as the control.  Expected results with L4S in the subject topology, however, are a peak extending about 4 seconds before returning to baseline.
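
To compare runs, the length of the latency excursion can be pulled out of the sparse flow's samples.  This is a minimal sketch, assuming the samples are available as (time in seconds, latency in ms) pairs; the function name, the 5ms margin and the trace values are all made up for illustration and are not measured data.

```python
def excursion_duration(samples, baseline_ms, margin_ms=5.0):
    """Seconds between the first and last sample whose latency
    exceeds baseline_ms + margin_ms.  samples must be in time order."""
    above = [t for t, latency_ms in samples
             if latency_ms > baseline_ms + margin_ms]
    if not above:
        return 0.0
    return above[-1] - above[0]


# Synthetic trace, invented only to exercise the function.
trace = [(0.0, 10.0), (0.5, 40.0), (1.0, 80.0), (2.0, 50.0),
         (4.0, 20.0), (4.5, 11.0), (5.0, 10.0)]
duration = excursion_duration(trace, baseline_ms=10.0)
print(f"latency elevated for {duration} s")
```

A brief slow-start peak should give a duration well under a second; the behaviour I am describing would give something close to four.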

Please let me know how you get on with reproducing this result.

 - Jonathan Morton