Re: [tsvwg] New Version Notification for draft-white-tsvwg-l4sops-02.txt

Sebastian Moeller <moeller0@gmx.de> Mon, 15 March 2021 21:58 UTC

Return-Path: <moeller0@gmx.de>
X-Original-To: tsvwg@ietfa.amsl.com
Delivered-To: tsvwg@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 4686B3A1059 for <tsvwg@ietfa.amsl.com>; Mon, 15 Mar 2021 14:58:45 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: 4.852
X-Spam-Level: ****
X-Spam-Status: No, score=4.852 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, FREEMAIL_ENVFROM_END_DIGIT=0.25, FREEMAIL_FROM=0.001, GB_SUMOF=5, PDS_SHORTFWD_URISHRT_QP=1.499, RCVD_IN_DNSWL_BLOCKED=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=no autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (1024-bit key) header.d=gmx.net
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id JveZCKcL4CfV for <tsvwg@ietfa.amsl.com>; Mon, 15 Mar 2021 14:58:41 -0700 (PDT)
Received: from mout.gmx.net (mout.gmx.net [212.227.17.22]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 6A5FD3A1065 for <tsvwg@ietf.org>; Mon, 15 Mar 2021 14:58:40 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=gmx.net; s=badeba3b8450; t=1615845515; bh=i+O24S74cnSrsELmRbcEv+vml/OWVyo4cQlE/iLHQZE=; h=X-UI-Sender-Class:Subject:From:In-Reply-To:Date:Cc:References:To; b=kHI+rGzND94BYf9LCWz+TlOr4V6TZc5n4T1LFORtUv4Kklxwc3hm/JiZwFNqhK9hN rwAw8eEUznMIIIiK85mBDHUwna9qurKip5YiXCh1dfWruuOrgvEGsk38pr+VLLM8+y bfc0m1C6Yk0o2dduqM+VbPgX1UhzB0B1sfWJX8eA=
X-UI-Sender-Class: 01bb95c1-4bf8-414a-932a-4f6e2808ef9c
Received: from [192.168.42.229] ([77.3.26.131]) by mail.gmx.net (mrgmx104 [212.227.17.168]) with ESMTPSA (Nemesis) id 1N8GMq-1lqWz51ShU-0148U7; Mon, 15 Mar 2021 22:58:35 +0100
Content-Type: text/plain; charset="utf-8"
Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.17\))
From: Sebastian Moeller <moeller0@gmx.de>
In-Reply-To: <694524BC-E3B8-4591-A735-01973766EE8D@cablelabs.com>
Date: Mon, 15 Mar 2021 22:58:34 +0100
Cc: "tsvwg@ietf.org" <tsvwg@ietf.org>
Content-Transfer-Encoding: quoted-printable
Message-Id: <8B6133E9-301F-4877-B260-45F8D9988EA9@gmx.de>
References: <161403643683.2598.12970012048226046349@ietfa.amsl.com> <694524BC-E3B8-4591-A735-01973766EE8D@cablelabs.com>
To: Greg White <g.white@cablelabs.com>
X-Mailer: Apple Mail (2.3445.104.17)
X-Provags-ID: V03:K1:iOvwIcfK8eXmg9b30EBIAveP4hz0+mhcc9W1pPWExWyR0XZ7dT2 9I2hh1dMYR4GNTUFL2WfEbseHhtY2gXA2K/TJi5+CbLMWti7OB61Z+zQL0J1TWR+t/tPAGB RXtvn5cAa6MzlYTRbC27iWBReecCWuoyIFc/0wDzJ+RmHDMfIZ1MSl23wszCUMNhB6MJlfb mUSKETb0UFStcu3wp9mCA==
X-UI-Out-Filterresults: notjunk:1;V03:K0:iyzsSWCKvHs=:UCzJgX890hlYVTwGbEQGm7 syZ0jEnCwI5wVxVVGi8HzMI30KgyYdYRJPGl8Sq0k5SeeEn/EeKOdVTReoRNhusvkFgmzzGiy w3ZumgqBufpfhH+xzYkV2eIDYq/lhkmLCyFLav9XaTrk5v7pFIdzlc+Ud6MKKvnxypxgy63xk f2U6eODLWMlORtsUT6LPGOmmQ6/nI8fTshW++J2Z6KrW8ZDnyuwrb8+L3d1RB4Q9KVtFf1Ck3 xkye3fJFMPxQyDYAZph6Cev24diAdDZYqPZZnrfToFLPrRd8G7q8K7i2fHjF9wvkgV4e92W9+ cu0eaocJdN3NT0jvHMW0hMQob+yso95paZe0ftxNncmOClKBFqPWwBrpa+w+Hch4iGmCn118a Rs2ME1GYtpNAl0Lw0Y1eIUTMA/fMPQGr6omjVzanEi7W/99zEJNseqkWjhEOD0uqUy/Y981GP 5BZy6ii1tmYhkSTzCU0kiEfIGfGzCwQ5TruM+ZUVQRM2woj1NkUcnaDUQdc1KWMQLjAs71W5T VmxJ7OdsluZUTh/WavPIv83dgXp0oBm7WiSD1kACxjZBlphe/iQw6rJOR6P4kvoEYzkqSLDA4 IinZNhomyD67supSBEA+unwjUyhVdGV8RyR/3P/RbX2hIod2fZsef51aUcHQ0TDStSoWQA4Cd oxBehbIN8OHjQMu9tBmHp9pQGhq1ShxX/uingn1ivlepjRrk+Yg/XNHLxn81ZuIRJkt+oHhAA nJ5eXz4GLUwiHA8wul7WtCfmpzLV5ianPfj+pF4ctEnpWf0RtcxJMoP+wZrmnGI7sfwFUATF5 ZKLlF6m9HJdpUShYTWBkKWqfVxsadOiaOJ2zCsZkTY4AmhFxU62WMC+1iP2ztFm5C1qIsXK+g jLB2V7576Swihux3zOyqi65bhQTvlBRdqaUeLQX3g=
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/NXEACkPX1pFriOtqsoifPzoI8Sk>
Subject: Re: [tsvwg] New Version Notification for draft-white-tsvwg-l4sops-02.txt
X-BeenThere: tsvwg@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Transport Area Working Group <tsvwg.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tsvwg>, <mailto:tsvwg-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tsvwg/>
List-Post: <mailto:tsvwg@ietf.org>
List-Help: <mailto:tsvwg-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tsvwg>, <mailto:tsvwg-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 15 Mar 2021 21:58:45 -0000

Dear List,

comments below prefixed with [SM] and quotes text from the draft:

> On Feb 23, 2021, at 00:32, Greg White <g.white@cablelabs.com> wrote:
> 
> All, 
> 
> Please see below.  An update to the L4S Operational Guidance draft has just been posted.
> 
> Best Regards,
> Greg
> 
> 
> 
> 
> On 2/22/21, 4:27 PM, "internet-drafts@ietf.org" <internet-drafts@ietf.org> wrote:
> 
> 
>    A new version of I-D, draft-white-tsvwg-l4sops-02.txt
>    has been successfully submitted by Greg White and posted to the
>    IETF repository.
> 
>    Name:		draft-white-tsvwg-l4sops
>    Revision:	02
>    Title:		Operational Guidance for Deployment of L4S in the Internet
>    Document date:	2021-02-22
>    Group:		Individual Submission
>    Pages:		13
>    URL:            https://www.ietf.org/archive/id/draft-white-tsvwg-l4sops-02.txt 
>    Status:         https://datatracker.ietf.org/doc/draft-white-tsvwg-l4sops/ 
>    Htmlized:       https://datatracker.ietf.org/doc/html/draft-white-tsvwg-l4sops 
>    Htmlized:       https://tools.ietf.org/html/draft-white-tsvwg-l4sops-02 
>    Diff:           https://www.ietf.org/rfcdiff?url2=draft-white-tsvwg-l4sops-02 
> 
>    Abstract:
>       This document is intended to provide additional guidance to operators
>       of end-systems, operators of networks, and researchers beyond that
>       provided in [I-D.ietf-tsvwg-ecn-l4s-id] and
>       [I-D.ietf-tsvwg-aqm-dualq-coupled] in order to ensure successful
>       deployment of L4S [I-D.ietf-tsvwg-l4s-arch] in the Internet.  The
>       focus of this document is on potential interactions between L4S flows
>       and Classic ECN ([RFC3168]) flows in Classic ECN bottleneck links.
>       The document discusses the potential outcomes of these interactions,
>       describes mechanisms to detect the presence of [RFC3168] bottlenecks,
>       and identifies opportunites to prevent and/or detect and resolve
>       fairness problems in such networks.
> 




       Operational Guidance for Deployment of L4S in the Internet
                      draft-white-tsvwg-l4sops-02

Abstract

   This document is intended to provide additional guidance to operators
   of end-systems, operators of networks, and researchers beyond that
   provided in [I-D.ietf-tsvwg-ecn-l4s-id] and
   [I-D.ietf-tsvwg-aqm-dualq-coupled] in order to ensure successful
   deployment of L4S [I-D.ietf-tsvwg-l4s-arch] in the Internet.

[SM] L4S is an EXPERIMENT first, so the guidance needs to address that, deployment is only part of the equation here. IMHO the draft needs to explicitly tackle what to do if the experiment is not deemed successful. So "deployment and potential removal of L4S". 


  The
   focus of this document is on potential interactions between L4S flows
   and Classic ECN ([RFC3168]) flows in Classic ECN bottleneck links.

[SM] Given the prevalence of FIFO bottlenecks normally as well as transiently (e.g. peak-hour transit/peering nodes) and the only existing L4S transport TCP Pragues abysmal performance versus CUBIC and itself at lower RTTs (as shown in https://github.com/heistp/l4s-tests Network bias figures 1, 2 3). This draft should also address the topic how L4S traffic is going to interact with non-L4S infrastructure.



   The document discusses the potential outcomes of these interactions,
   describes mechanisms to detect the presence of [RFC3168] bottlenecks,
   and identifies opportunites to prevent and/or detect and resolve
   fairness problems in such networks.


1.  Introduction

   Low-latency, low-loss, scalable throughput (L4S)
   [I-D.ietf-tsvwg-l4s-arch] traffic is designed to provide lower
   queuing delay than conventional traffic via a new network service
   based on a modified Explicit Congestion Notification (ECN) response
   from the network.  L4S traffic is identified by the ECT(1) codepoint,
   and network bottlenecks that support L4S should congestion-mark
   ECT(1) packets to enable L4S congestion feedback.  However, L4S
   traffic is also expected to coexist well with classic congestion
   controlled traffic even if the bottleneck queue does not support L4S.


[SM] Except it does not, again see (https://github.com/heistp/l4s-tests, Network bias figure 1-3) TCP Prague gets pummeled in a FIFO bottleneck by CUBIC. Not a good sign when sentence 3 of the introduction is in disagreement with published data. Yes, that is a safe failure mode for the rest of the internet, but also a data point indicating that the L4S expectations and predictions might be all too rosy.



   This includes paths where the bottleneck link utilizes packet drops
   in response to congestion (either due to buffer overrun or active
   queue management), as well as paths that implement a 'flow-queuing'
   scheduler such as fq_codel [RFC8290].  A potential area of poor
   interoperability lies in network bottlenecks employing a shared queue
   that implements an Active Queue Management (AQM) algorithm that
   provides Explicit Congestion Notification signaling according to
   [RFC3168].  Although RFC3168 has been updated (via [RFC8311]) to
   reserve ECT(1) for experimental use only (also see [IANA-ECN]), and
   its use for L4S has been specified in [I-D.ietf-tsvwg-ecn-l4s-id],
   not all deployed queues have been updated accordingly.  

[SM] "Not all"? I can guarantee that almost no deployed queue (as measured of fraction of deployed queues) has been updated accordingly and this situation will last for a long time (my prediction is that this will last from before, during, and after the L4S experiment has failed and being terminated). 


It has been
   demonstrated ([Fallback]) that when a set of long-running flows
   comprising both classic congestion controlled flows and L4S-compliant
   congestion controlled flows compete for bandwidth in such a legacy
   shared RFC3168 queue, the classic congestion controlled flows may
   achieve lower throughput than they would have if all of the flows had
   been classic congestion controlled flows. 

[SM] "May" as far as I can tell is quite euphemistic, looking at https://camo.githubusercontent.com/42b2427b18a22c9c2f181942fa9fbd08e5b678606f8b3f9c75262e4dfd6d4f1c/687474703a2f2f7363652e646e736d67722e6e65742f726573756c74732f6c34732d323032302d31312d3131543132303030302d66696e616c2f6c34732d73362d726663333136382f6c34732d73362d726663333136382d6e732d7072616775652d76732d63756269632d6e6f65636e2d66715f636f64656c5f31715f2d35304d6269742d32306d735f7463705f64656c69766572795f776974685f7274742e737667 indicates that TCP Prague traffic affirmatively DOES reduce competing non-L4S TCP throughput considerably.
Given existing data confirming this point, please change the language to reflect this reality.


 This 'unfairness' between
   the two classes is more pronounced on longer RTT paths (e.g. 50ms and
   above) and/or at higher link rates (e.g. 50 Mbps and above).  The
   lower the capacity per flow, the less pronounced the problem becomes.
   Thus the imbalance is most significant when the slowest flow rate is
   still high in absolute terms.

[SM] Not wrong, but hardly relevant, given that consumer links with 1Gbps access rates and more are becoming more and more common, it seems out of date calling 50 Mbps a high link rate, no?


   The root cause of the unfairness is that a legacy RFC3168 queue does
   not differentiate between packets marked ECT(0) (used by classic
   senders) and those marked ECT(1) (used by L4S senders), and provides
   an identical congestion signal (CE marks) to both types, while the
   L4S architecture redefines the CE mark and congestion response in the
   case of ECT(1) marked packets. 

[SM] THis is better than before in that it admits that L4S redefined the meaning of CE, but the same is trie for ECT(1). RFC3168 are fully 3168-compliant in not differentiating between the two ECTs, but this text makes it appear RFC3168 queues would do something wrong.



 The result is that the two classes
   respond differently to the CE congestion signal.  The classic senders
   expect that CE marks are sent very rarely (e.g. approximately 1 CE
   mark every 200 round trips on a 50 Mbps x 50ms path) while the L4S
   senders expect very frequent CE marking (e.g. approximately 2 CE
   marks per round trip).  The result is that the classic senders
   respond to the CE marks provided by the bottleneck by yielding
   capacity to the L4S flows.  The resulting rate imbalance can be
   demonstrated, and could be a cause of concern in some cases.

[SM] "could be a cause of concern in some cases", again I applaud cautious wording, but this is over-doing it, I understand the author belong to team L4S, but let's call a spade a spade and admit, that this is highly undesirable behaviour that DOES cause concern and requires counter-measures.


   This concern primarily relates to single-queue (FIFO) bottleneck
   links that implement legacy RFC3168 ECN, but the situation can also
   potentially occur in fq_codel [RFC8290] bottlenecks when flow
   isolation is imperfect due to hash collisions or VPN tunnels.

[SM OK. except "can also potentially occur" is too weak a term. Also not sure why VPNs are singles out, that applies to all tunnels, as far as I can see.


   While the above mentioned unfairness has been demonstrated in
   laboratory testing, it has not been observed in operational networks,
   in part because members of the Transport Working group are not aware
   of any deployments of single-queue Classic ECN bottlenecks in the
   Internet.  

[SM] IMHO this translates into "because we are bad at running experiments over the existing internet". Is that really the message team L4S wants to convey in an RFC that should help getting permission to run an experiment over the existing internet? How hard can it be to send an TCP Prague and an CUBOC flow concurrently over an rfc3168 AQM, e.g. at a personal internet access link (such that this experiment would not affect other users traffic at all)? 
Ca we please remove this paragraph? Or better actualy run the easurements and rephrase this based on real data?


Additionally, this issue was considered and is discussed
   in Appendix B.1 of [I-D.ietf-tsvwg-ecn-l4s-id].  It was recognized
   that compromises would have to be made because IP header space is
   extremely limited.  A number of alternative codepoint schemes were
   compared for their ability to traverse most Internet paths, to work
   over tunnels, to work at lower layers, to work with TCP, etc.  It was
   decided to progress on the basis that robust performance in presence
   of these single-queue RFC3168 bottlenecks is not the most critical
   issue, since it was believed that they are rare.  Nonetheless, there
   is the possibility that such deployments exist, and hence an interest
   in providing guidance to ensure that measures can be taken to address
   the potential issues, should they arise in practice.

[SM] IMHO this is L4S fan-fiction and does not belong in this draft at all. This is about how to deal with the fall-out of L4S mis-design not about justifying them?



2.  Per-Flow Fairness

   There are a number of factors that influence the relative rates
   achieved by a set of users or a set of applications sharing a queue
   in a bottleneck link.  Notably the response that each application has
   to congestion signals (whether loss or explicit signaling) can play a
   large role in determining whether the applications share the
   bandwidth in an equitable manner.  In the Internet, ISPs typically
   control capacity sharing between their customers using a scheduler at
   the access bottleneck rather than relying on the congestion responses
   of end-systems.  So in that context this question primarily concerns
   capacity sharing between the applications used by one customer.

[SM] This is a very generous interpretation that assumes that only the access links are ever congested. In reality any shared link can and occasionally will be congested because the whole internet is "over-subscribed", that is the sum of contracted maximal leaf access rates exceeds the capacity of the core. 
I could live with that rationally, IF the L4S proposal would be to ONLY employ L4S AQMs at the ISP end-user leaf (as then simply disabling L4S/AccECN would allow each user to opt-out). But with L4S' declared scope being all potential points of congestion, this rationale is not convincing, sorry.



   Nonetheless, there are many networks on the Internet where capacity
   sharing relies, at least to some extent, on congestion control in the
   end-systems.  

[SM] Again this is quite euphemistic as this describes most of the existing internet.



The traditional norm for congestion response has been
   that it is handled on a per-connection basis, and that (all else
   being equal) it results in each connection in the bottleneck
   achieving a data rate inversely proportional to the average RTT of
   the connection.  

[SM] As far as I see it throughput = windowsize/RTT (* a constant). That is given a large enough windowsize there is no RTT bias observed in throughput.
The effect described is an interaction effect in the botlenecks buffers, that causes an RTT dependent throughput. Describing this root cause of RTT-bias correctly is important, as that has direct consequences how/where to try to remedy this!


The end result (in the case of steady-state behavior
   of a set of like connections) is that each user or application
   achieves a data rate proportional to N/RTT, where N is the number of
   simultaneous connections that the user or application creates, and
   RTT is the harmonic mean of the average round-trip-times for those
   connections.  Thus, users or applications that create a larger number
   of connections and/or that have a lower RTT achieve a larger share of
   the bottleneck link rate than others.

[SM] The N effect is correct, but the RTT effect is an unfortunate side-effect of bottlenecks being under-managed!


   While this may not be considered fair by many, it nonetheless has
   been the typical starting point for discussions around fairness.  In
   fact it has been common when evaluating new congestion responses to
   actually set aside N & RTT as variables in the equation, and just
   compare per-flow rates between flows with the same RTT. 

 For example
   [RFC5348] defines the congestion response for a flow to be
   '"reasonably fair" if its sending rate is generally within a factor
   of two of the sending rate of a [Reno] TCP flow under the same
   conditions.'  Given that RTTs can vary by roughly two orders of
   magnitude and flow counts can vary by at least an order of magnitude
   between applications, it seems that the accepted definition of
   reasonable fairness leaves quite a bit of room for different levels
   of performance between users or applications, and so perhaps isn't
   the gold standard, but is rather a metric that is used because of its
   convenience.
   In practice, the effect of this RTT dependence has historically been
   muted by the fact that many networks were deployed with very large
   ("bloated") drop-tail buffers that would introduce queuing delays
   well in excess of the base RTT of the flows utilizing the link, thus
   equalizing (to some degree) the effective RTTs of those flows.
   Recently, as network equipment suppliers and operators have worked to
   improve the latency performance of the network by the use of smaller
   buffers and/or AQM algorithms, this has had the side-effect of
   uncovering the inherent RTT bias in classic congestion control
   algorithms.

[SM] This is quite a long section that purposefully mis-identifies the root cause of RTT-bias from the congrested queues to CC algorithms.
This is simply incorrect and the whole section should be droped from the draft, as this is rather a philosphical stance and not a fact to base operational guidance on.



   The L4S architecture aims to significantly improve this situation, by
   requiring senders to adopt a congestion response that eliminates RTT
   bias as much as possible (see [I-D.ietf-tsvwg-ecn-l4s-id]).  

[SM And this is simply insane, "requiring senders to adopt" is anything but a strictly enforced policy that networks can relay on. As said before "hope" is not a solid engineering method. And the "as much as possible" is simply not much, as the root cause of RTT bias is not in the CCs but in the bottleneck queues. AS CUBIC shows, a protocol can try to paper over this problem (essentially by responding less step to drops depending on a flows RTT), but the only real solution is to equip the bottleneck buffers with enough smarts to selectively drop packets to equalize per-flow throughput independent of a flows RTT. See figure 6 in Høiland-Jørgensen, Toke, Per Hurtig, and Anna Brunstrom. ‘The Good, the Bad and the WiFi: Modern AQMs in a Residential Setting’. Computer Networks 89 (4 October 2015): 90–106. https://doi.org/10.1016/j.comnet.2015.07.014. for examples of how well different queue management approaches achieve that goal.


As a
   result, L4S promotes a level of per-flow fairness beyond what is
   ordinarily considered for classic senders, the legacy RFC3168 issue
   notwithstanding.

[SM] That does not match reported data for an L4S transport (TCP Prague) with an L4S AQM (https://camo.githubusercontent.com/0ca81a2fabe48e8fce0f98f8b8347c79d27340684fe0791a3ee6685cf4cdb02e/687474703a2f2f7363652e646e736d67722e6e65742f726573756c74732f6c34732d323032302d31312d3131543132303030302d66696e616c2f73312d6368617274732f727474666169725f63635f71646973635f31306d735f3136306d732e737667) where at 10ms/160ms two CUBIC flows in an L4S' AQM classic quueue achieve much better rate "fairness" (still worse than CUBIC achieves in a FIFO) than two TCPPrague flows.
This section needs to be dropped, because it is simply not backed by data.



   It is also worth noting that the congestion control algorithms
   deployed currently on the internet tend toward (RTT-weighted)
   fairness only over long timescales.  For example, the cubic algorithm
   can take minutes to converge to fairness when a new flow joins an
   existing flow on a link [Cubic].  Since the vast majority of TCP
   connections don't last for minutes, it is unclear to what degree per-
   flow, same-RTT fairness, even when demonstrated in the lab,
   translates to the real world.

[SM] Again, please drop this. This offers only vague assertions with the goal of making L5S' abysmal performance more palatable, but seems irrelvant for operational deployment oof L4S.


   So, in real networks, where per-application, per-end-host or per-
   customer fairness may be more important than long-term, same-RTT,
   per-flow fairness, it may not be that instructive to focus on the
   latter as being a necessary end goal.

[SM] Then offer another measure of equitable sharing that you are willing to get behind.


   Nonetheless, situations in which the presence of an L4S flow has the
   potential to cause harm [Harm] to classic flows need to be
   understood.  Most importantly, if there are situations in which the
   introduction of L4S traffic would degrade classic traffic performance
   significantly, i.e. to the point that it would be considered
   starvation, these situations need to be understood and either
   remedied or avoided.

[SM] Please numerically define "starvation" or drop the term. 

   Aligned with this context, the guidance provided in this document is
   aimed not at monitoring the relative performance of L4S senders
   compared against classic senders on a per-flow basis, 

[SM] One more argument to drop most of the preceding paragraphs, as they are irrelevant t the scope of this draft.


but rather at
   identifying instances where RFC3168 bottlenecks are deployed so that
   operators of L4S senders can have the opportunity to assess whether
   any actions need to be taken.  

[SM] This is again very cautious, even though the reason why this draft exists in the first place seems to be the fact that most agree that "action needs to be taken". Again, let's call a spade a spade.


Additionally this document provides
   guidance for network operators around configuring any RFC3168
   bottlenecks to minimize the potential for negative interactions
   between L4S and classic senders.


3.  Detection of Classic ECN Bottlenecks

   The IETF encourages researchers, end system deployers and network
   operators to conduct experiments to identify to what degree legacy
   RFC3168 bottlecks exist in networks.  These types of measurement
   campaigns, even if each is conducted over a limited set of paths,
   could be useful to further understand the scope of any potential
   issues, to guide end system deployers on where to examine performance
   more closely (or possibly delay L4S deployment), and to help network
   operators identify nodes where remediation may be necessary to
   provide the best performance.

[SM] this might be a decent place to cite Pet Heist's recent draft of ECN measurements?


   The design of such experiments should consider not only the detection
   of RFC3168 ECN marking, but also the determination whether the
   bottleneck AQM is a single queue (FIFO) or a flow-queuing system. It
   is believed that the vast majority, if not all, of the RFC3168 AQMs
   in use at bottleneck links are flow-queuing systems (e.g. fq_codel
   [RFC8290] or [COBALT]).  When flow isolation is successful, the FQ
   scheduling of such queues isolates classic congestion control traffic
   from L4S traffic, and thus eliminates the potential for unfairness.
   But, these systems are known to sometimes result in imperfect
   isolation, either due to hash collisions (see Section 5.3 of
   [RFC8290]) or because of VPN tunneling (see Section 6.2 of

[SM] Are all tunnels also VPNs?


   [RFC8290]).  It is believed that the majority of fq_codel deployments
   in bottleneck links today (e.g.  [Cake]) employ hashing algorithms
   that virtually eliminate the possibility of collisions, making this a
   non-issue for those deployments.  

[SM] While I personally like fq_codel and cake, with their default 1000 queues hash collisions are considerably more likely than this sentence implies.


But, VPN tunnels remain an issue
   for fq_codel deployments, and the introduction of L4S traffic raises
   the possibility that tunnels containing mixed classic and L4S traffic
   would exist, in which case fq_codel implementations that have not
   been updated to be L4S-aware

[SM] So practically all deployed instances, lets be explicit here, please.


 could exhibit similar unfairness
   properties as single queue AQMs.  

[SM] Will instead of "could".

Until such queues are upgraded to
   support L4S or treat ECT(1) as not-ECT traffic, end-host mitigations
   such as separating L4S and Classic traffic into distinct VPN tunnels
   could be employed.

[SM] This holds merit if a network designer has control over the tunnel creation and has the capabilities to map arbitrary flows to arbitrary tunnels, that is edecidedly outside of the technical expertise of most end users that might want to use a VPN service on the access link. 

   [Fallback] contains recommendations on some of the mechanisms that
   can be used to detect legacy RFC3168 bottlenecks.  TODO: summarize
   the main ones here.

[SM] Please do, and add information about how long such an detection will take, and what the false positive and false negative rates of these mechanisms are.


4.  Operator of an L4S host

   From a host's perspective, support for L4S involves both endpoints:
   ECT(1) marking & L4S-compatible congestion control at the sender, and
   ECN feedback at the receiver.  Between these two entities, it is
   primarily incumbent upon the sender to evaluate the potential for
   presence of legacy RFC3168 FIFO bottlenecks and make decisions
   whether or not to use L4S congestion control.  A general purpose
   receiver is not expected to perform any testing or monitoring for
   RFC3168, and is also not expected to invoke any active response in
   the case that such a bottleneck exists.  That said, it is certainly
   possible for receivers to disable L4S functionality by not
   negotiating ECN support with the sender.

[SM] That sounds wrong, L4S will require both side parycipating, so one can expect that both sides are involved in detection of conditions where L4S is known to fail. And in effect they already do, it is only die to the receivers reverse traffic to the sender that L4S-CC works.


   Prior to deployment of any new technology, it is commonplace for the
   parties involved in the deployment to validate the performance of the
   new technology, via lab testing, limited field testing, large scale
   field testing, etc.  The same is expected for deployers of L4S
   technology.  

[SM] Refreshing position. I wonder why we talk about deploying L4S before the core technologies have seen "limited field testing, large scale
   field testing, etc."?


As part of that validation, it is recommended that
   deployers consider the issue of RFC3168 FIFO bottlenecks and conduct
   experiments as described in the previous section, or otherwise assess
   the impact that the L4S technology will have in the networks in which
   it is to be deployed, and take action as is described further in this
   section.

   If pre-deployment testing raises concerns about issues with RFC3168
   bottlenecks, the actions taken may depend on the server type:

   o  General purpose servers (e.g. web servers)

      *  Active testing could be performed by the server.  For example,
         a javascript application could run simultaneous downloads
         during page reading time in order to survey for presence of
         legacy RFC3168 FIFO bottlenecks on paths to users.

[SM] To generally work this requires the downloads to come from the users end-hosts. IMHO that is not a realistic model (unless we are talking about speedtest applications, that routinely measure upload rates with the end-users consent). So [leae note how approximate this "solution" is.


      *  Passive testing could be built in to the transport protocol
         implementation at the sender in order to perform detection (see
         [Fallback]).

[SM] If that is to fly, this needs to be made a requirement for L4S transports and the behaviour needs to be policed enforced, which is highly unlikely, so this is not a realistic option either.


      *  Taking action based on the detection of RFC3168 FIFO
         bottlenecks is likely not needed for short transactional
         transfers (e.g. sub 10 seconds) since these are unlikely to
         achieve the steady-state conditions where unfairness has been
         observed.

[SM] Where does the 10 seconds number come from?


      *  For longer file transfers, it may be possible to fall-back to
         Classic behavior in real-time, or to simply disable L4S for
         future long file transfers to clients where legacy RFC3168 has
         been detected.

   o  Specialized servers handling long-running sessions (e.g. cloud
      gaming)

      *  Active testing could be performed at each session startup

      *  Active testing could be integrated into a "pre-validation" of
         the service, done when the user signs up, and periodically
         thereafter

      *  In-band detection as described in [Fallback] could be performed
         during the session

[SM] Is there any proof that Fallback actually offers robust and reliably methods with explicit and acceptable false positive and false negative rates?

   In addition, the responsibilities of and actions taken by a sender
   may depend on the environment in which it is deployed.  The following
   sub-sections discuss two scenarios: senders serving a limited known
   target audience and those that serve an unknown target audience.

4.1.  Edge Servers

   Some hosts (such as CDN leaf nodes and servers internal to an ISP)
   are deployed in environments in which they serve content to a
   constrained set of networks or clients.  The operator of such hosts
   may be able to determine whether there is the possibility of
   [RFC3168] FIFO bottlenecks being present, and utilize this
   information to make decisions on selectively deploying L4S and/or
   disabling it (e.g. bleaching ECN).  Furthermore, such an operator may
   be able to determine the likelihood of an L4S bottleneck being
   present, and use this information as well.

   For example, if a particular network is known to have deployed legacy
   [RFC3168] FIFO bottlenecks, deployment of L4S for that network should
   be delayed until those bottlenecks can be upgraded to mitigate any
   potential issues as discussed in the next section.

[SM] Or, upgrade tose to fq-codel, which will give most of the latency advantage that 
L4S promises but none of the side-effects... just saying. 

   Prior to deploying L4S on edge servers a server operator should:

   o  Consult with network operators on presence of legacy [RFC3168]
      FIFO bottlenecks

   o  Consult with network operators on presence of L4S bottlenecks

   o  Perform pre-deployment testing per network


   If a particular network offers connectivity to other networks (e.g.
   in the case of an ISP offering service to their customer's networks),
   the lack of RFC3168 FIFO bottleneck deployment in the ISP network
   can't be taken as evidence that RFC3168 FIFO bottlenecks don't exist
   end-to-end (because one may have been deployed by the end-user
   network).  In these cases, deployment of L4S will need to take
   appropriate steps to detect the presence of such bottlenecks.  At
   present, it is believed that the vast majority of RFC3168 bottlenecks
   in end-user networks are implementations that utilize fq_codel or
   Cake, where the unfairness problem is less likely to be a concern.

[SM] As long as the users are not running VPN traffic...

   While this doesn't completely eliminate the possibility that a legacy
   [RFC3168] FIFO bottleneck could exist, it nonetheless provides useful
   information that can be utilized in the decision making around the
   potential risk for any unfairness to be experienced by end users.

4.2.  Other hosts

   Hosts that are deployed in locations that serve a wide variety of
   networks face a more difficult prospect in terms of handling the
   potential presence of RFC3168 FIFO bottlenecks.  Nonetheless, the
   steps listed in the ealier section (based on server type) can be
   taken to minimize the risk of unfairness.

   Since existing studies have hinted that RFC3168 FIFO bottlenecks are
   rare, detections using these techniques may also prove to be rare.
   Therefore, it may be possible for a host to cache a list of end host
   ip addresses where a RFC3168 bottleneck has been detected.  Entries
   in such a cache would need to age-out after a period of time to
   account for IP address changes, path changes, equipment upgrades,
   etc.

   It has been suggested that a public blacklist of domains that
   implement RFC3168 FIFO bottlenecks or a public whitelist of domains
   that are participating in the L4S experiment could be maintained.
   There are a number of significant issues that would seem to make this
   idea infeasible, not the least of which is the fact that presence of
   RFC3168 FIFO bottlenecks or L4S bottlenecks is not a property of a
   domain, it is the property of a path between two endpoints.

[SM] +1; worth mentioning!


5.  Operator of a Network Employing RFC3168 FIFO Bottlenecks

   While it is, of course, preferred for networks to deploy L4S-capable
   high fidelity congestion signaling,

[SM] Citation needed, it is all but clear that L4S actually improves upon the state of the art for generic internet traffic (of arbitrary RTTs), so please rephrase this more objectively and less based on hopes and promises.

 and while it is more preferable
   for L4S senders to detect problems themselves, 

[SM] But still unclear how feasible that is.

a network operator who
   has deployed equipment in a likely bottleneck link location (i.e. a
   link that is expected to be fully saturated) that is configured with
   a leagcy [RFC3168] FIFO AQM can take certain steps in order to
   improve rate fairness between classic traffic and L4S traffic, and
   thus enable L4S to be deployed in a greater number of paths.

   Some of the options listed in this section may not be feasible in all
   networking equipment.

5.1.  Configure AQM to treat ECT(1) as NotECT

   If equipment is configurable in such as way as to only supply CE
   marks to ECT(0) packets, and treat ECT(1) packets identically to
   NotECT, or is upgradable to support this capability, doing so will
   eliminate the risk of unfairness.

[SM] Maybe mention the cost of doing so (slowing down rfc31568 flows using ETC(1) instead f ECT(0))?



5.2.  ECT(1) Tunnel Bypass

   Using an [RFC6040] compatibility mode tunnel, tunnel ECT(1) traffic
   through the [RFC3168] bottleneck with the outer header indicating
   Not-ECT.

   Two variants exist for this approach

   1.  per-domain: tunnel ECT(1) pkts to domain edge towards dst

   2.  per-dst: tunnel ECT(1) pkts to dst

[SM] Explain why a network operator should do that work (what is in it for them)?


5.3.  Configure Non-Coupled Dual Queue

   Equipment supporting [RFC3168] may be configurable to enable two
   parallel queues for the same traffic class, with classification done
   based on the ECN field.

   Option 1:

   o  Configure 2 queues, both with ECN; 50:50 WRR scheduler

      *  Queue #1: ECT(1) & CE packets - Shallow immediate AQM target

      *  Queue #2: ECT(0) & NotECT packets - Classic AQM target

   o  Outcome in the case of n L4S flows and m long-running Classic
      flows

      *  if m & n are non-zero, flows get 1/2n and 1/2m of the capacity,
         otherwise 1/n or 1/m

      *  never < 1/2 each flow's rate if all had been Classic

   This option would allow L4S flows to achieve low latency, low loss
   and scalable throughput, but would sacrifice the more precise flow
   balance offered by [I-D.ietf-tsvwg-aqm-dualq-coupled].  This option
   would be expected to result in some reordering of previously CE
   marked packets sent by Classic ECN senders, which is a trait shared
   with [I-D.ietf-tsvwg-aqm-dualq-coupled].  As is discussed in
   [I-D.ietf-tsvwg-ecn-l4s-id], this reordering would be either zero
   risk or very low risk.

[SM] IMHO that is actually preferable to using the dual queue coupled AQM, as this solution does not have the nasty ~1:16 priority scheduler issue that dualQ papers over with some heuristics... It would still give L4S initially (until its fraction of traffic approaches 50%) an undeserved rate gain, but at least this is conceptually easy to predict and understand.


   Option 2:

   o  Configure 2 queues, both with AQM; 50:50 WRR scheduler

      *  Queue #1: ECT(1) & NotECT packets - ECN disabled

      *  Queue #2: ECT(0) & CE packets - ECN enabled

   o  Outcome

      *  ECT(1) treated as NotECT

      *  Flow balance for the 2 queues the same as in option 1

   This option would not allow L4S flows to achieve low latency, low
   loss and scalable throughput in this bottleneck link.  As a result it
   is a less prefered option.

5.4.  WRED with ECT(1) Differentation

   This configuration is similar to Option 2 in the previous section,
   but uses a single queue with WRED functionality.

   o  Configure the queue with two WRED classes

   o  Class #1: ECT(1) & NotECT packets - ECN disabled

   o  Class #2: ECT(0) & CE packets - ECN enabled


[SM] I can predict the response I will get, but for all of these it would be nice to get some measurements to confirm that thy perform as excpected.


5.5.  Disable RFC3168 ECN Marking

   Disabling [RFC3168] ECN marking eliminates the unfairness issue.
   Clearly a downside to this approach is that classic senders will no
   longer get the benefits of Explict Congestion Notification.

5.6.  Re-mark ECT(1) to NotECT Prior to AQM

   While not a recommended alternative, remarking ECT(1) packets as
   NotECT (i.e. bleaching ECT(1)) ensures that they are treated
   identically to classic NotECT senders.  However, this also eliminates
   the possibility of downstream L4S bottlenecks providing high fidelity
   congestion signals.

[SM] This is were I miss a section what to do if the L4S experiment is declared a failure. IMHO recommending to remark ECT(1) to NotECT by all nodes formerly participating in the experiment, would allow the fastest most efficient way to contain the fall-out.

6. What L4S-deploying networks need to do at the end of the L4S experiment

This sections give guidance how L4S deploying networks should respond to any of the two likely outcomes of the IETF-supported L4S experiment, so here is a first rough draft:

"6.1 Successful termination of the L4S experiment

If the L4S experiment is deemed a success participating networks are encouraged to continue deploying L4S aware nodes and if possible replace all non L4S-aware rfc3168 AQM already deployed as far as feasible. 

6.2 Unsuccesful termination of the L4S experiment

If the L4S experiment shows that L4S can not be safely deploying in the existing internet or that it does not achieve its goals and promises well enough, the L4S experiment might need to be terminated. In that case participants of the L4S experiment are expected to configure their L4S-aware network nodes to re-map ECT(1) to NotECT prior to AQM as described in section 5.6. That will act as a safety valve that allows a quick termination of all L4S side-effects in the unlikely cases that should be merited."

Please note that, based on the existing data I consider 6.2 the expected outcome, but still tried to keep the general  L4S-positive tone of the rest of the draft (even though I also believe the whole draft could be improved by less pro-L4S bias).


Best Regards
	Sebastian

P.S.: I do not really believe that this document can remedy the safety issues L4S design and implementation bring, but at least it explicitly mentions some of them and hence is worth having. My preferred outcome however would be to drop all L4S drafts instead, until the safety and functionality issues are fixed for good, which realistically will:
a) require a massive redesign and
b) is unlikely to happen.

P.P.S.: Thanks for keeping this short and focussed.