Re: [tsvwg] L4S status: #17 Interaction w/ FQ AQMs

Jonathan Morton <chromatix99@gmail.com> Thu, 07 November 2019 03:15 UTC

From: Jonathan Morton <chromatix99@gmail.com>
In-Reply-To: <D5D560CB-BC47-45BE-811E-E73E2D4909E3@cablelabs.com>
Date: Thu, 07 Nov 2019 05:14:56 +0200
Cc: Sebastian Moeller <moeller0@gmx.de>, "tsvwg@ietf.org" <tsvwg@ietf.org>, Pete Heist <pete@heistp.net>
To: Greg White <g.white@CableLabs.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/Ptb9uxFc7q2MdLuKRqzmpl6Dg9k>

> On 6 Nov, 2019, at 6:23 pm, Greg White <g.white@CableLabs.com> wrote:
> 
> - the main result of concern was due to a bug in initializing the value of TCP Prague alpha, which has been fixed and demonstrated to resolve the latency impact that was spanning multiple seconds

Assuming this is still the same Alpha variable as in DCTCP, I take it that you are pre-loading the EWMA over the observed CE marking rate so that, upon first exiting slow-start, you're guaranteed to have a sharp reduction of cwnd.
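
For readers following along, the effect of pre-loading can be sketched as below. This is a minimal illustration that assumes the DCTCP update rules from RFC 8257 (EWMA gain g = 1/16, window scaled by 1 - alpha/2); whether TCP Prague uses exactly these rules is the assumption stated above, and the cwnd value is hypothetical.

```python
G = 1.0 / 16.0  # EWMA gain g; 1/16 is the value suggested for DCTCP in RFC 8257


def update_alpha(alpha, marked_fraction, g=G):
    """One per-RTT update of the EWMA over the observed CE marking fraction."""
    return (1.0 - g) * alpha + g * marked_fraction


def reduce_cwnd(cwnd, alpha):
    """DCTCP-style window reduction: cwnd * (1 - alpha / 2)."""
    return cwnd * (1.0 - alpha / 2.0)


cwnd_at_exit = 100.0  # hypothetical cwnd (segments) on leaving slow-start

# Alpha pre-loaded to 1: the first fully marked RTT halves the window.
preloaded = reduce_cwnd(cwnd_at_exit, update_alpha(1.0, 1.0))

# Alpha starting from 0: the same marks barely reduce the window.
from_zero = reduce_cwnd(cwnd_at_exit, update_alpha(0.0, 1.0))

print(preloaded, from_zero)
```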

If I'm understanding that correctly, I would characterise it as treating the symptom rather than the cause.  I do see the Flent graphs you've produced from the new version, and I find it interesting that you apparently now need a period of linear cwnd growth to converge on the actual path capacity after exiting slow-start, indicating a large negative-going overshoot.  I think this sort of artefact is illustrative of a poor match of solution to problem.

SCE's approach to halving the send rate immediately on slow-start exit is to tune the pacing scale factors, as described in the final slide from my Montreal presentation.  It strikes me that this would have the same effect for you, without needing to initialise Alpha to an unnatural value.
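
As a rough sketch of the idea only: a paced sender emits roughly one cwnd per smoothed RTT, multiplied by a scale factor, so a factor that is twice as large in slow-start as in congestion avoidance halves the send rate at the moment of exit without touching cwnd or Alpha. The function name and every number below are hypothetical; they are not the values from the slide.

```python
def pacing_rate(cwnd_segments, mss_bytes, srtt_s, ratio):
    """Send rate in bytes/s: one cwnd per smoothed RTT, scaled by a pacing ratio."""
    return ratio * cwnd_segments * mss_bytes / srtt_s


# All values are hypothetical, chosen only to show the effect.
CWND, MSS, SRTT = 100, 1448, 0.080
SS_RATIO = 2.0  # assumed scale factor while in slow-start
CA_RATIO = 1.0  # assumed scale factor in congestion avoidance

rate_ss = pacing_rate(CWND, MSS, SRTT, SS_RATIO)
rate_ca = pacing_rate(CWND, MSS, SRTT, CA_RATIO)

# Same cwnd, same RTT: only the scale factor changes on slow-start exit.
print(rate_ca / rate_ss)
```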

May I now draw your attention to the two-flow tests from Pete Heist's collection, in which the two flows are started 10 seconds apart?  I don't immediately see how this particular class of workarounds deals with the problem of needing to sharply reduce send rate in the first flow when the second flow starts, as this is not marked by an exit from slow-start, and by then the EWMA will have decayed to a relatively low steady-state value.  In this case the traffic collecting in the dumb FIFO will be a mixture of both flows.  SCE's approach to this problem utilises the original definition of CE, which SCE preserves but L4S does not.
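
To put a rough number on the decayed-EWMA concern, the sketch below counts how many RTTs of 100% CE marking a DCTCP-style EWMA needs before Alpha climbs back to 0.5. It assumes the RFC 8257 update rule with g = 1/16; the starting value of 0.05 is a hypothetical steady-state figure, not a measurement from the tests.

```python
G = 1.0 / 16.0  # EWMA gain, as suggested for DCTCP in RFC 8257


def rtts_until_alpha_reaches(target, alpha, marked_fraction=1.0, g=G):
    """Count per-RTT EWMA updates needed for alpha to reach the target."""
    rtts = 0
    while alpha < target:
        alpha = (1.0 - g) * alpha + g * marked_fraction
        rtts += 1
    return rtts


# Hypothetical low steady-state alpha at the moment the second flow starts.
decayed = rtts_until_alpha_reaches(0.5, alpha=0.05)

# A pre-loaded alpha needs no warm-up, but that only applies at flow start.
preloaded = rtts_until_alpha_reaches(0.5, alpha=1.0)

print(decayed, preloaded)
```

Under these assumptions the first flow spends on the order of ten round trips making less than a 25% reduction per RTT, which is the window in which the FIFO fills.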

Since this workaround was applied to the public TCP Prague repo a few days ago, we should be able to check this behaviour ourselves fairly soon.

> - the remaining short duration latency spike in the FIFO queue is seen in *all* congestion control variants tested, including BBRv1, NewReno, and Cubic, and is not specific to Prague

This is true, and we don't hold that against you.  It's an unfortunate property of this particular network topology.  The aim is to avoid making it worse or introducing additional problems.

> - if the CoDel queue is upgraded to perform Immediate AQM on L4S flows, the latency spike can be largely avoided.

This is not relevant, since the point of this exercise is to establish the extent of L4S' compatibility with existing, unmodified networks, in which CoDel happens to be the most widely deployed AQM.  You cannot reasonably expect the entire Internet to "upgrade" to L4S-compatible AQMs before you can safely begin deployment.

With that said, SCE behaves similarly when the FQ-AQM is SCE-enabled.  Note that the AQM doesn't need to know whether the traffic is SCE-aware in order to safely apply the SCE signal, because the SCE and CE signals are not mutually ambiguous.
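
The non-ambiguity can be illustrated with the ECN field values from RFC 3168, with SCE reusing the ECT(1) codepoint as its mark. The marking function below is a deliberately simplified sketch: the two sojourn-time thresholds are hypothetical and do not represent how CoDel or any deployed SCE AQM decides when to mark.

```python
# ECN field values from RFC 3168.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11
SCE = ECT1  # SCE uses the ECT(1) codepoint as "some congestion experienced"

# Hypothetical thresholds, for illustration only.
SCE_THRESHOLD_MS = 1.0
CE_THRESHOLD_MS = 5.0


def mark(ecn, sojourn_ms):
    """Return the new ECN field for a packet, or None if it should be dropped."""
    if sojourn_ms >= CE_THRESHOLD_MS:
        # Conventional signal: CE for ECN-capable packets, drop otherwise.
        return None if ecn == NOT_ECT else CE
    if sojourn_ms >= SCE_THRESHOLD_MS and ecn == ECT0:
        # Milder signal: applied without knowing whether the flow is SCE-aware.
        return SCE
    return ecn


for codepoint in (NOT_ECT, ECT0, ECT1, CE):
    print(codepoint, mark(codepoint, 2.0), mark(codepoint, 10.0))
```

A sender that does not understand SCE treats ECT(1) like ECT(0) and simply ignores the milder mark, while CE keeps its original meaning for every flow.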

> We invite a thorough review of the work, but believe that this closes issue #17.

I think that remains to be established with more thorough testing, since a theoretical regression still appears to exist, and we need to be sure whether it's real or not.

Of course, if you are able to robustly solve issue #16, then it is likely that issue #17 will also go away.  I've skimmed your paper on the subject in the past couple of days, and I'm honestly surprised that you are only tackling that problem so late in the development cycle.  Again, we stand ready to test your solutions when they are properly implemented and thus testable.

 - Jonathan Morton