Re: [tsvwg] New Version Notification for draft-white-tsvwg-l4sops-02.txt

Sebastian Moeller <moeller0@gmx.de> Mon, 15 March 2021 21:58 UTC

From: Sebastian Moeller <moeller0@gmx.de>
In-Reply-To: <694524BC-E3B8-4591-A735-01973766EE8D@cablelabs.com>
Date: Mon, 15 Mar 2021 22:58:34 +0100
Cc: "tsvwg@ietf.org" <tsvwg@ietf.org>
To: Greg White <g.white@cablelabs.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/NXEACkPX1pFriOtqsoifPzoI8Sk>

Dear List,

comments below are prefixed with [SM] and quote text from the draft:

> On Feb 23, 2021, at 00:32, Greg White <g.white@cablelabs.com> wrote:
> 
> All, 
> 
> Please see below.  An update to the L4S Operational Guidance draft has just been posted.
> 
> Best Regards,
> Greg
> 
> 
> 
> 
> On 2/22/21, 4:27 PM, "internet-drafts@ietf.org" <internet-drafts@ietf.org> wrote:
> 
> 
>    A new version of I-D, draft-white-tsvwg-l4sops-02.txt
>    has been successfully submitted by Greg White and posted to the
>    IETF repository.
> 
>    Name:		draft-white-tsvwg-l4sops
>    Revision:	02
>    Title:		Operational Guidance for Deployment of L4S in the Internet
>    Document date:	2021-02-22
>    Group:		Individual Submission
>    Pages:		13
>    URL:            https://www.ietf.org/archive/id/draft-white-tsvwg-l4sops-02.txt 
>    Status:         https://datatracker.ietf.org/doc/draft-white-tsvwg-l4sops/ 
>    Htmlized:       https://datatracker.ietf.org/doc/html/draft-white-tsvwg-l4sops 
>    Htmlized:       https://tools.ietf.org/html/draft-white-tsvwg-l4sops-02 
>    Diff:           https://www.ietf.org/rfcdiff?url2=draft-white-tsvwg-l4sops-02 
> 
>    Abstract:
>       This document is intended to provide additional guidance to operators
>       of end-systems, operators of networks, and researchers beyond that
>       provided in [I-D.ietf-tsvwg-ecn-l4s-id] and
>       [I-D.ietf-tsvwg-aqm-dualq-coupled] in order to ensure successful
>       deployment of L4S [I-D.ietf-tsvwg-l4s-arch] in the Internet.  The
>       focus of this document is on potential interactions between L4S flows
>       and Classic ECN ([RFC3168]) flows in Classic ECN bottleneck links.
>       The document discusses the potential outcomes of these interactions,
>       describes mechanisms to detect the presence of [RFC3168] bottlenecks,
>       and identifies opportunities to prevent and/or detect and resolve
>       fairness problems in such networks.
> 




       Operational Guidance for Deployment of L4S in the Internet
                      draft-white-tsvwg-l4sops-02

Abstract

   This document is intended to provide additional guidance to operators
   of end-systems, operators of networks, and researchers beyond that
   provided in [I-D.ietf-tsvwg-ecn-l4s-id] and
   [I-D.ietf-tsvwg-aqm-dualq-coupled] in order to ensure successful
   deployment of L4S [I-D.ietf-tsvwg-l4s-arch] in the Internet.

[SM] L4S is an EXPERIMENT first, so the guidance needs to address that; deployment is only part of the equation here. IMHO the draft needs to explicitly tackle what to do if the experiment is not deemed successful, i.e. "deployment and potential removal of L4S".


  The
   focus of this document is on potential interactions between L4S flows
   and Classic ECN ([RFC3168]) flows in Classic ECN bottleneck links.

[SM] Given the prevalence of FIFO bottlenecks, both permanent and transient (e.g. peak-hour transit/peering nodes), and the abysmal performance of the only existing L4S transport, TCP Prague, against CUBIC and against itself at lower RTTs (as shown in https://github.com/heistp/l4s-tests, Network bias figures 1, 2 and 3), this draft should also address how L4S traffic is going to interact with non-L4S infrastructure.



   The document discusses the potential outcomes of these interactions,
   describes mechanisms to detect the presence of [RFC3168] bottlenecks,
   and identifies opportunities to prevent and/or detect and resolve
   fairness problems in such networks.


1.  Introduction

   Low-latency, low-loss, scalable throughput (L4S)
   [I-D.ietf-tsvwg-l4s-arch] traffic is designed to provide lower
   queuing delay than conventional traffic via a new network service
   based on a modified Explicit Congestion Notification (ECN) response
   from the network.  L4S traffic is identified by the ECT(1) codepoint,
   and network bottlenecks that support L4S should congestion-mark
   ECT(1) packets to enable L4S congestion feedback.  However, L4S
   traffic is also expected to coexist well with classic congestion
   controlled traffic even if the bottleneck queue does not support L4S.


[SM] Except it does not; again see https://github.com/heistp/l4s-tests (Network bias figures 1-3): TCP Prague gets pummeled by CUBIC in a FIFO bottleneck. It is not a good sign when sentence three of the introduction disagrees with published data. Yes, that is a safe failure mode for the rest of the internet, but it is also a data point indicating that the L4S expectations and predictions might be all too rosy.



   This includes paths where the bottleneck link utilizes packet drops
   in response to congestion (either due to buffer overrun or active
   queue management), as well as paths that implement a 'flow-queuing'
   scheduler such as fq_codel [RFC8290].  A potential area of poor
   interoperability lies in network bottlenecks employing a shared queue
   that implements an Active Queue Management (AQM) algorithm that
   provides Explicit Congestion Notification signaling according to
   [RFC3168].  Although RFC3168 has been updated (via [RFC8311]) to
   reserve ECT(1) for experimental use only (also see [IANA-ECN]), and
   its use for L4S has been specified in [I-D.ietf-tsvwg-ecn-l4s-id],
   not all deployed queues have been updated accordingly.  

[SM] "Not all"? I can guarantee that almost no deployed queue (as measured of fraction of deployed queues) has been updated accordingly and this situation will last for a long time (my prediction is that this will last from before, during, and after the L4S experiment has failed and being terminated). 


It has been
   demonstrated ([Fallback]) that when a set of long-running flows
   comprising both classic congestion controlled flows and L4S-compliant
   congestion controlled flows compete for bandwidth in such a legacy
   shared RFC3168 queue, the classic congestion controlled flows may
   achieve lower throughput than they would have if all of the flows had
   been classic congestion controlled flows. 

[SM] "May" as far as I can tell is quite euphemistic, looking at https://camo.githubusercontent.com/42b2427b18a22c9c2f181942fa9fbd08e5b678606f8b3f9c75262e4dfd6d4f1c/687474703a2f2f7363652e646e736d67722e6e65742f726573756c74732f6c34732d323032302d31312d3131543132303030302d66696e616c2f6c34732d73362d726663333136382f6c34732d73362d726663333136382d6e732d7072616775652d76732d63756269632d6e6f65636e2d66715f636f64656c5f31715f2d35304d6269742d32306d735f7463705f64656c69766572795f776974685f7274742e737667 indicates that TCP Prague traffic affirmatively DOES reduce competing non-L4S TCP throughput considerably.
Given existing data confirming this point, please change the language to reflect this reality.


 This 'unfairness' between
   the two classes is more pronounced on longer RTT paths (e.g. 50ms and
   above) and/or at higher link rates (e.g. 50 Mbps and above).  The
   lower the capacity per flow, the less pronounced the problem becomes.
   Thus the imbalance is most significant when the slowest flow rate is
   still high in absolute terms.

[SM] Not wrong, but hardly relevant: given that consumer links with 1 Gbps access rates and more are becoming more and more common, it seems out of date to call 50 Mbps a high link rate, no?


   The root cause of the unfairness is that a legacy RFC3168 queue does
   not differentiate between packets marked ECT(0) (used by classic
   senders) and those marked ECT(1) (used by L4S senders), and provides
   an identical congestion signal (CE marks) to both types, while the
   L4S architecture redefines the CE mark and congestion response in the
   case of ECT(1) marked packets. 

[SM] This is better than before in that it admits that L4S redefined the meaning of CE, but the same is true for ECT(1). RFC3168 queues are fully RFC3168-compliant in not differentiating between the two ECT codepoints, yet this text makes it appear as if RFC3168 queues were doing something wrong.



 The result is that the two classes
   respond differently to the CE congestion signal.  The classic senders
   expect that CE marks are sent very rarely (e.g. approximately 1 CE
   mark every 200 round trips on a 50 Mbps x 50ms path) while the L4S
   senders expect very frequent CE marking (e.g. approximately 2 CE
   marks per round trip).  The result is that the classic senders
   respond to the CE marks provided by the bottleneck by yielding
   capacity to the L4S flows.  The resulting rate imbalance can be
   demonstrated, and could be a cause of concern in some cases.
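
[SM] For readers who want to check where figures like these come from, here is a rough back-of-the-envelope sketch (Python, purely illustrative; it uses the textbook Reno and DCTCP steady-state relations p ~ 1.5/W^2 and p ~ 2/W, which is my assumption and not anything taken from the draft):

    rate_bps = 50e6                                # 50 Mbps bottleneck
    rtt_s    = 0.050                               # 50 ms path
    pkt_bits = 1500 * 8

    bdp_pkts  = rate_bps * rtt_s / pkt_bits        # ~208 packets per round trip
    p_classic = 1.5 / bdp_pkts**2                  # mark probability, Reno-like sender
    p_l4s     = 2.0 / bdp_pkts                     # mark probability, DCTCP-like sender
    print("classic: ~1 CE mark every %.0f round trips" % (1 / (p_classic * bdp_pkts)))
    print("l4s:     ~%.1f CE marks per round trip"     % (p_l4s * bdp_pkts))

This lands in the same order of magnitude as the draft's "1 per 200 round trips" and reproduces the "~2 per round trip" figure, so the two classes really do expect marking rates that differ by roughly two to three orders of magnitude.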

[SM] "could be a cause of concern in some cases", again I applaud cautious wording, but this is over-doing it, I understand the author belong to team L4S, but let's call a spade a spade and admit, that this is highly undesirable behaviour that DOES cause concern and requires counter-measures.


   This concern primarily relates to single-queue (FIFO) bottleneck
   links that implement legacy RFC3168 ECN, but the situation can also
   potentially occur in fq_codel [RFC8290] bottlenecks when flow
   isolation is imperfect due to hash collisions or VPN tunnels.

[SM] OK, except that "can also potentially occur" is too weak a phrasing. Also, I am not sure why VPNs are singled out; as far as I can see, that applies to all tunnels.


   While the above mentioned unfairness has been demonstrated in
   laboratory testing, it has not been observed in operational networks,
   in part because members of the Transport Working group are not aware
   of any deployments of single-queue Classic ECN bottlenecks in the
   Internet.  

[SM] IMHO this translates into "because we are bad at running experiments over the existing internet". Is that really the message team L4S wants to convey in an RFC that is supposed to help getting permission to run an experiment over the existing internet? How hard can it be to send a TCP Prague and a CUBIC flow concurrently over an RFC3168 AQM, e.g. at a personal internet access link (such that the experiment would not affect other users' traffic at all)?
Can we please remove this paragraph? Or, better, actually run the measurements and rephrase this based on real data?


Additionally, this issue was considered and is discussed
   in Appendix B.1 of [I-D.ietf-tsvwg-ecn-l4s-id].  It was recognized
   that compromises would have to be made because IP header space is
   extremely limited.  A number of alternative codepoint schemes were
   compared for their ability to traverse most Internet paths, to work
   over tunnels, to work at lower layers, to work with TCP, etc.  It was
   decided to progress on the basis that robust performance in presence
   of these single-queue RFC3168 bottlenecks is not the most critical
   issue, since it was believed that they are rare.  Nonetheless, there
   is the possibility that such deployments exist, and hence an interest
   in providing guidance to ensure that measures can be taken to address
   the potential issues, should they arise in practice.

[SM] IMHO this is L4S fan-fiction and does not belong in this draft at all. This draft is about how to deal with the fall-out of the L4S mis-design, not about justifying it.



2.  Per-Flow Fairness

   There are a number of factors that influence the relative rates
   achieved by a set of users or a set of applications sharing a queue
   in a bottleneck link.  Notably the response that each application has
   to congestion signals (whether loss or explicit signaling) can play a
   large role in determining whether the applications share the
   bandwidth in an equitable manner.  In the Internet, ISPs typically
   control capacity sharing between their customers using a scheduler at
   the access bottleneck rather than relying on the congestion responses
   of end-systems.  So in that context this question primarily concerns
   capacity sharing between the applications used by one customer.

[SM] This is a very generous interpretation that assumes that only access links are ever congested. In reality, any shared link can and occasionally will be congested, because the whole internet is "over-subscribed": the sum of the contracted maximal leaf access rates exceeds the capacity of the core.
I could live with that rationale IF the L4S proposal were to ONLY employ L4S AQMs at the ISP end-user leaf (as then simply disabling L4S/AccECN would allow each user to opt out). But with L4S' declared scope being all potential points of congestion, this rationale is not convincing, sorry.



   Nonetheless, there are many networks on the Internet where capacity
   sharing relies, at least to some extent, on congestion control in the
   end-systems.  

[SM] Again this is quite euphemistic as this describes most of the existing internet.



The traditional norm for congestion response has been
   that it is handled on a per-connection basis, and that (all else
   being equal) it results in each connection in the bottleneck
   achieving a data rate inversely proportional to the average RTT of
   the connection.  

[SM] As far as I can see, throughput = window size / RTT (times a constant); that is, given a large enough window there is no RTT bias observed in throughput.
The effect described here is an interaction effect in the bottleneck's buffers that causes RTT-dependent throughput. Describing this root cause of the RTT bias correctly is important, as it has direct consequences for how and where to try to remedy it!


The end result (in the case of steady-state behavior
   of a set of like connections) is that each user or application
   achieves a data rate proportional to N/RTT, where N is the number of
   simultaneous connections that the user or application creates, and
   RTT is the harmonic mean of the average round-trip-times for those
   connections.  Thus, users or applications that create a larger number
   of connections and/or that have a lower RTT achieve a larger share of
   the bottleneck link rate than others.

[SM] The N effect is correct, but the RTT effect is an unfortunate side-effect of bottlenecks being under-managed!
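
[SM] To illustrate how large the combined N/RTT effect quoted above can get, here is a tiny, purely hypothetical example (Python; users and numbers invented for clarity, assuming classic per-connection rates proportional to 1/RTT as the draft states):

    from statistics import harmonic_mean

    # Hypothetical users: A opens 4 connections at 20 ms RTT, B opens 1 at 80 ms.
    users = {"A": [0.020, 0.020, 0.020, 0.020],
             "B": [0.080]}

    # Per the draft: each user's aggregate rate ~ N / harmonic_mean(RTTs).
    weight = {u: len(rtts) / harmonic_mean(rtts) for u, rtts in users.items()}
    total  = sum(weight.values())
    for u, w in weight.items():
        print("user %s gets ~%.0f%% of the bottleneck" % (u, 100 * w / total))

That is roughly a 16:1 imbalance between the two users (about 94% versus 6%), which is exactly the "room for different levels of performance" the following paragraphs concede.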


   While this may not be considered fair by many, it nonetheless has
   been the typical starting point for discussions around fairness.  In
   fact it has been common when evaluating new congestion responses to
   actually set aside N & RTT as variables in the equation, and just
   compare per-flow rates between flows with the same RTT. 

 For example
   [RFC5348] defines the congestion response for a flow to be
   '"reasonably fair" if its sending rate is generally within a factor
   of two of the sending rate of a [Reno] TCP flow under the same
   conditions.'  Given that RTTs can vary by roughly two orders of
   magnitude and flow counts can vary by at least an order of magnitude
   between applications, it seems that the accepted definition of
   reasonable fairness leaves quite a bit of room for different levels
   of performance between users or applications, and so perhaps isn't
   the gold standard, but is rather a metric that is used because of its
   convenience.
   In practice, the effect of this RTT dependence has historically been
   muted by the fact that many networks were deployed with very large
   ("bloated") drop-tail buffers that would introduce queuing delays
   well in excess of the base RTT of the flows utilizing the link, thus
   equalizing (to some degree) the effective RTTs of those flows.
   Recently, as network equipment suppliers and operators have worked to
   improve the latency performance of the network by the use of smaller
   buffers and/or AQM algorithms, this has had the side-effect of
   uncovering the inherent RTT bias in classic congestion control
   algorithms.

[SM] This is quite a long section that purposefully mis-identifies the root cause of the RTT bias, shifting it from the congested queues to the CC algorithms.
This is simply incorrect, and the whole section should be dropped from the draft, as it is rather a philosophical stance and not a fact to base operational guidance on.
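
[SM] For scale, the "muting" effect described above is easy to quantify (illustrative numbers only, assuming a shared bottleneck whose queue adds the same delay to every flow and classic per-flow rates proportional to 1/RTT):

    # Two flows, 10 ms and 100 ms base RTT; classic per-flow rate ~ 1/RTT.
    for queue_delay_s in (0.200, 0.005):       # bloated FIFO vs. short-queue AQM
        rtt_short = 0.010 + queue_delay_s
        rtt_long  = 0.100 + queue_delay_s
        print("queue %3.0f ms -> rate ratio %.1f : 1"
              % (queue_delay_s * 1e3, rtt_long / rtt_short))

A 200 ms standing queue compresses the ratio to about 1.4:1, while a short-queue AQM exposes roughly 7:1; the arithmetic says nothing, however, about whether the remedy belongs in the queue or in the congestion control, which is exactly the point of disagreement here.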



   The L4S architecture aims to significantly improve this situation, by
   requiring senders to adopt a congestion response that eliminates RTT
   bias as much as possible (see [I-D.ietf-tsvwg-ecn-l4s-id]).  

[SM] And this is simply insane: "requiring senders to adopt" is anything but a strictly enforced policy that networks can rely on. As said before, "hope" is not a solid engineering method. And the "as much as possible" is simply not much, as the root cause of the RTT bias is not in the CCs but in the bottleneck queues. As CUBIC shows, a protocol can try to paper over this problem (essentially by responding less steeply to drops depending on a flow's RTT), but the only real solution is to equip the bottleneck buffers with enough smarts to selectively drop packets so as to equalize per-flow throughput independent of a flow's RTT. See figure 6 in Høiland-Jørgensen, Toke, Per Hurtig, and Anna Brunstrom, 'The Good, the Bad and the WiFi: Modern AQMs in a Residential Setting', Computer Networks 89 (4 October 2015): 90-106, https://doi.org/10.1016/j.comnet.2015.07.014, for examples of how well different queue management approaches achieve that goal.


As a
   result, L4S promotes a level of per-flow fairness beyond what is
   ordinarily considered for classic senders, the legacy RFC3168 issue
   notwithstanding.

[SM] That does not match the reported data for an L4S transport (TCP Prague) with an L4S AQM (https://camo.githubusercontent.com/0ca81a2fabe48e8fce0f98f8b8347c79d27340684fe0791a3ee6685cf4cdb02e/687474703a2f2f7363652e646e736d67722e6e65742f726573756c74732f6c34732d323032302d31312d3131543132303030302d66696e616c2f73312d6368617274732f727474666169725f63635f71646973635f31306d735f3136306d732e737667), where at 10ms/160ms two CUBIC flows in the L4S AQM's classic queue achieve much better rate "fairness" (still worse than CUBIC achieves in a FIFO) than two TCP Prague flows do.
This section needs to be dropped, because it is simply not backed by data.



   It is also worth noting that the congestion control algorithms
   deployed currently on the internet tend toward (RTT-weighted)
   fairness only over long timescales.  For example, the cubic algorithm
   can take minutes to converge to fairness when a new flow joins an
   existing flow on a link [Cubic].  Since the vast majority of TCP
   connections don't last for minutes, it is unclear to what degree per-
   flow, same-RTT fairness, even when demonstrated in the lab,
   translates to the real world.

[SM] Again, please drop this. It offers only vague assertions with the goal of making L4S' abysmal performance more palatable, but seems irrelevant for operational deployment of L4S.


   So, in real networks, where per-application, per-end-host or per-
   customer fairness may be more important than long-term, same-RTT,
   per-flow fairness, it may not be that instructive to focus on the
   latter as being a necessary end goal.

[SM] Then offer another measure of equitable sharing that you are willing to get behind.


   Nonetheless, situations in which the presence of an L4S flow has the
   potential to cause harm [Harm] to classic flows need to be
   understood.  Most importantly, if there are situations in which the
   introduction of L4S traffic would degrade classic traffic performance
   significantly, i.e. to the point that it would be considered
   starvation, these situations need to be understood and either
   remedied or avoided.

[SM] Please numerically define "starvation" or drop the term. 

   Aligned with this context, the guidance provided in this document is
   aimed not at monitoring the relative performance of L4S senders
   compared against classic senders on a per-flow basis, 

[SM] One more argument for dropping most of the preceding paragraphs, as they are irrelevant to the scope of this draft.


but rather at
   identifying instances where RFC3168 bottlenecks are deployed so that
   operators of L4S senders can have the opportunity to assess whether
   any actions need to be taken.  

[SM] This is again very cautious, even though the reason why this draft exists in the first place seems to be the fact that most agree that "action needs to be taken". Again, let's call a spade a spade.


Additionally this document provides
   guidance for network operators around configuring any RFC3168
   bottlenecks to minimize the potential for negative interactions
   between L4S and classic senders.


3.  Detection of Classic ECN Bottlenecks

   The IETF encourages researchers, end system deployers and network
   operators to conduct experiments to identify to what degree legacy
   RFC3168 bottlenecks exist in networks.  These types of measurement
   campaigns, even if each is conducted over a limited set of paths,
   could be useful to further understand the scope of any potential
   issues, to guide end system deployers on where to examine performance
   more closely (or possibly delay L4S deployment), and to help network
   operators identify nodes where remediation may be necessary to
   provide the best performance.

[SM] This might be a decent place to cite Pete Heist's recent draft on ECN measurements?


   The design of such experiments should consider not only the detection
   of RFC3168 ECN marking, but also the determination whether the
   bottleneck AQM is a single queue (FIFO) or a flow-queuing system. It
   is believed that the vast majority, if not all, of the RFC3168 AQMs
   in use at bottleneck links are flow-queuing systems (e.g. fq_codel
   [RFC8290] or [COBALT]).  When flow isolation is successful, the FQ
   scheduling of such queues isolates classic congestion control traffic
   from L4S traffic, and thus eliminates the potential for unfairness.
   But, these systems are known to sometimes result in imperfect
   isolation, either due to hash collisions (see Section 5.3 of
   [RFC8290]) or because of VPN tunneling (see Section 6.2 of

[SM] Are all tunnels also VPNs?


   [RFC8290]).  It is believed that the majority of fq_codel deployments
   in bottleneck links today (e.g.  [Cake]) employ hashing algorithms
   that virtually eliminate the possibility of collisions, making this a
   non-issue for those deployments.  

[SM] While I personally like fq_codel and cake, with their default of about 1000 queues, hash collisions are considerably more likely than this sentence implies.
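
[SM] A quick birthday-problem estimate backs this up (illustrative sketch, assuming fq_codel's default of 1024 hash buckets and ideal uniform hashing):

    from math import exp

    buckets = 1024                        # fq_codel's default number of queues
    for n_flows in (8, 32, 64):
        p_any_collision = 1 - exp(-n_flows * (n_flows - 1) / (2 * buckets))
        print("%2d flows -> P(at least one collision) ~ %2.0f%%"
              % (n_flows, 100 * p_any_collision))

With a few dozen concurrent flows (not unusual behind a home router) the chance that some pair of flows shares a queue is already in the tens of percent.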


But, VPN tunnels remain an issue
   for fq_codel deployments, and the introduction of L4S traffic raises
   the possibility that tunnels containing mixed classic and L4S traffic
   would exist, in which case fq_codel implementations that have not
   been updated to be L4S-aware

[SM] So practically all deployed instances; let's be explicit here, please.


 could exhibit similar unfairness
   properties as single queue AQMs.  

[SM] "Will" instead of "could".

Until such queues are upgraded to
   support L4S or treat ECT(1) as not-ECT traffic, end-host mitigations
   such as separating L4S and Classic traffic into distinct VPN tunnels
   could be employed.

[SM] This holds merit if a network designer has control over the tunnel creation and has the capability to map arbitrary flows to arbitrary tunnels; that is decidedly outside the technical expertise of most end users who might want to use a VPN service on their access link.

   [Fallback] contains recommendations on some of the mechanisms that
   can be used to detect legacy RFC3168 bottlenecks.  TODO: summarize
   the main ones here.

[SM] Please do, and add information about how long such a detection will take, and what the false positive and false negative rates of these mechanisms are.
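
[SM] For reference, the kind of in-band heuristic [Fallback] explores can be sketched very roughly as follows (my paraphrase, not the draft's actual algorithm; the thresholds are invented for illustration, and the real false positive/negative behaviour is exactly what needs to be documented):

    def suspect_rfc3168(ce_fraction, srtt_s, min_rtt_s,
                        ce_thresh=0.01, queue_thresh_s=0.015):
        """Guess whether observed CE marks come from a legacy, deep-target RFC3168 AQM."""
        queue_delay_s = srtt_s - min_rtt_s        # rough standing-queue estimate
        return ce_fraction > ce_thresh and queue_delay_s > queue_thresh_s

    # Example: 5% of packets CE-marked while ~40 ms of queue has built up.
    print(suspect_rfc3168(ce_fraction=0.05, srtt_s=0.090, min_rtt_s=0.050))   # True

The idea is that an L4S-style shallow-target AQM marks frequently but keeps queueing delay in the low milliseconds, whereas CE marks arriving together with tens of milliseconds of standing queue point at a classic RFC3168 AQM.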


4.  Operator of an L4S host

   From a host's perspective, support for L4S involves both endpoints:
   ECT(1) marking & L4S-compatible congestion control at the sender, and
   ECN feedback at the receiver.  Between these two entities, it is
   primarily incumbent upon the sender to evaluate the potential for
   presence of legacy RFC3168 FIFO bottlenecks and make decisions
   whether or not to use L4S congestion control.  A general purpose
   receiver is not expected to perform any testing or monitoring for
   RFC3168, and is also not expected to invoke any active response in
   the case that such a bottleneck exists.  That said, it is certainly
   possible for receivers to disable L4S functionality by not
   negotiating ECN support with the sender.

[SM] That sounds wrong: L4S requires both sides to participate, so one can expect that both sides are involved in detecting conditions under which L4S is known to fail. And in effect they already are; it is only due to the receiver's reverse traffic to the sender that L4S congestion control works at all.


   Prior to deployment of any new technology, it is commonplace for the
   parties involved in the deployment to validate the performance of the
   new technology, via lab testing, limited field testing, large scale
   field testing, etc.  The same is expected for deployers of L4S
   technology.  

[SM] Refreshing position. I wonder why we talk about deploying L4S before the core technologies have seen "limited field testing, large scale field testing, etc."?


As part of that validation, it is recommended that
   deployers consider the issue of RFC3168 FIFO bottlenecks and conduct
   experiments as described in the previous section, or otherwise assess
   the impact that the L4S technology will have in the networks in which
   it is to be deployed, and take action as is described further in this
   section.

   If pre-deployment testing raises concerns about issues with RFC3168
   bottlenecks, the actions taken may depend on the server type:

   o  General purpose servers (e.g. web servers)

      *  Active testing could be performed by the server.  For example,
         a javascript application could run simultaneous downloads
         during page reading time in order to survey for presence of
         legacy RFC3168 FIFO bottlenecks on paths to users.

[SM] To work in general, this requires the downloads to come from the users' end-hosts. IMHO that is not a realistic model (unless we are talking about speedtest applications, which routinely measure upload rates with the end-user's consent). So please note how approximate this "solution" is.


      *  Passive testing could be built in to the transport protocol
         implementation at the sender in order to perform detection (see
         [Fallback]).

[SM] If that is to fly, this needs to be made a requirement for L4S transports and the behaviour needs to be policed and enforced, which is highly unlikely, so this is not a realistic option either.


      *  Taking action based on the detection of RFC3168 FIFO
         bottlenecks is likely not needed for short transactional
         transfers (e.g. sub 10 seconds) since these are unlikely to
         achieve the steady-state conditions where unfairness has been
         observed.

[SM] Where does the 10 seconds number come from?


      *  For longer file transfers, it may be possible to fall-back to
         Classic behavior in real-time, or to simply disable L4S for
         future long file transfers to clients where legacy RFC3168 has
         been detected.

   o  Specialized servers handling long-running sessions (e.g. cloud
      gaming)

      *  Active testing could be performed at each session startup

      *  Active testing could be integrated into a "pre-validation" of
         the service, done when the user signs up, and periodically
         thereafter

      *  In-band detection as described in [Fallback] could be performed
         during the session

[SM] Is there any proof that [Fallback] actually offers robust and reliable methods with explicit and acceptable false positive and false negative rates?
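
[SM] And for the "active testing" bullets above, the minimal version is an A/B goodput comparison between an L4S (ECT(1)) transfer and a classic transfer to the same client (sketch only; a real test would need far more care about confounders such as cross-traffic, path changes and repeat runs, and the 2x threshold is invented):

    def ab_verdict(goodput_l4s_bps, goodput_classic_bps, ratio_thresh=2.0):
        if goodput_classic_bps * ratio_thresh < goodput_l4s_bps:
            return "suspect a legacy RFC3168 queue disadvantaging the classic flow"
        if goodput_l4s_bps * ratio_thresh < goodput_classic_bps:
            return "suspect a bottleneck disadvantaging the L4S flow"
        return "no strong signal"

    # Example: the classic transfer only reaches 5 Mbps next to a 45 Mbps L4S one.
    print(ab_verdict(goodput_l4s_bps=45e6, goodput_classic_bps=5e6))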

   In addition, the responsibilities of and actions taken by a sender
   may depend on the environment in which it is deployed.  The following
   sub-sections discuss two scenarios: senders serving a limited known
   target audience and those that serve an unknown target audience.

4.1.  Edge Servers

   Some hosts (such as CDN leaf nodes and servers internal to an ISP)
   are deployed in environments in which they serve content to a
   constrained set of networks or clients.  The operator of such hosts
   may be able to determine whether there is the possibility of
   [RFC3168] FIFO bottlenecks being present, and utilize this
   information to make decisions on selectively deploying L4S and/or
   disabling it (e.g. bleaching ECN).  Furthermore, such an operator may
   be able to determine the likelihood of an L4S bottleneck being
   present, and use this information as well.

   For example, if a particular network is known to have deployed legacy
   [RFC3168] FIFO bottlenecks, deployment of L4S for that network should
   be delayed until those bottlenecks can be upgraded to mitigate any
   potential issues as discussed in the next section.

[SM] Or upgrade those to fq_codel, which will give most of the latency advantage that L4S promises but none of the side-effects... just saying.

   Prior to deploying L4S on edge servers a server operator should:

   o  Consult with network operators on presence of legacy [RFC3168]
      FIFO bottlenecks

   o  Consult with network operators on presence of L4S bottlenecks

   o  Perform pre-deployment testing per network


   If a particular network offers connectivity to other networks (e.g.
   in the case of an ISP offering service to their customer's networks),
   the lack of RFC3168 FIFO bottleneck deployment in the ISP network
   can't be taken as evidence that RFC3168 FIFO bottlenecks don't exist
   end-to-end (because one may have been deployed by the end-user
   network).  In these cases, deployment of L4S will need to take
   appropriate steps to detect the presence of such bottlenecks.  At
   present, it is believed that the vast majority of RFC3168 bottlenecks
   in end-user networks are implementations that utilize fq_codel or
   Cake, where the unfairness problem is less likely to be a concern.

[SM] As long as the users are not running VPN traffic...

   While this doesn't completely eliminate the possibility that a legacy
   [RFC3168] FIFO bottleneck could exist, it nonetheless provides useful
   information that can be utilized in the decision making around the
   potential risk for any unfairness to be experienced by end users.

4.2.  Other hosts

   Hosts that are deployed in locations that serve a wide variety of
   networks face a more difficult prospect in terms of handling the
   potential presence of RFC3168 FIFO bottlenecks.  Nonetheless, the
   steps listed in the earlier section (based on server type) can be
   taken to minimize the risk of unfairness.

   Since existing studies have hinted that RFC3168 FIFO bottlenecks are
   rare, detections using these techniques may also prove to be rare.
   Therefore, it may be possible for a host to cache a list of end host
   ip addresses where a RFC3168 bottleneck has been detected.  Entries
   in such a cache would need to age-out after a period of time to
   account for IP address changes, path changes, equipment upgrades,
   etc.
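
[SM] The cache idea is straightforward to sketch (illustrative only; the interesting open questions are precisely the age-out period and how entries get invalidated on path or equipment changes):

    import time

    class Rfc3168Cache:
        """Remember destinations behind a detected RFC3168 FIFO bottleneck."""
        def __init__(self, ttl_s=7 * 24 * 3600):    # age-out after a week (arbitrary)
            self.ttl_s   = ttl_s
            self.entries = {}                        # destination -> detection time

        def mark_detected(self, dst):
            self.entries[dst] = time.monotonic()

        def use_l4s(self, dst):
            t = self.entries.get(dst)
            if t is None or time.monotonic() - t > self.ttl_s:
                self.entries.pop(dst, None)          # unknown or expired entry
                return True                          # default: try L4S
            return False                             # known RFC3168 path: fall back

    cache = Rfc3168Cache()
    cache.mark_detected("192.0.2.1")
    print(cache.use_l4s("192.0.2.1"), cache.use_l4s("198.51.100.7"))   # False True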

   It has been suggested that a public blacklist of domains that
   implement RFC3168 FIFO bottlenecks or a public whitelist of domains
   that are participating in the L4S experiment could be maintained.
   There are a number of significant issues that would seem to make this
   idea infeasible, not the least of which is the fact that presence of
   RFC3168 FIFO bottlenecks or L4S bottlenecks is not a property of a
   domain, it is the property of a path between two endpoints.

[SM] +1; worth mentioning!


5.  Operator of a Network Employing RFC3168 FIFO Bottlenecks

   While it is, of course, preferred for networks to deploy L4S-capable
   high fidelity congestion signaling,

[SM] Citation needed; it is anything but clear that L4S actually improves upon the state of the art for generic internet traffic (of arbitrary RTTs), so please rephrase this more objectively and less based on hopes and promises.

 and while it is more preferable
   for L4S senders to detect problems themselves, 

[SM] But still unclear how feasible that is.

a network operator who
   has deployed equipment in a likely bottleneck link location (i.e. a
   link that is expected to be fully saturated) that is configured with
   a legacy [RFC3168] FIFO AQM can take certain steps in order to
   improve rate fairness between classic traffic and L4S traffic, and
   thus enable L4S to be deployed in a greater number of paths.

   Some of the options listed in this section may not be feasible in all
   networking equipment.

5.1.  Configure AQM to treat ECT(1) as NotECT

   If equipment is configurable in such a way as to only supply CE
   marks to ECT(0) packets, and treat ECT(1) packets identically to
   NotECT, or is upgradable to support this capability, doing so will
   eliminate the risk of unfairness.

[SM] Maybe mention the cost of doing so (slowing down RFC3168 flows that use ECT(1) instead of ECT(0))?
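
[SM] In pseudocode, the configuration described in 5.1 amounts to roughly the following at the point where the AQM decides to signal congestion (sketch only; ECN field values per RFC3168):

    # IP ECN field values per RFC3168: Not-ECT=0, ECT(1)=1, ECT(0)=2, CE=3.
    NOT_ECT, ECT1, ECT0, CE = 0, 1, 2, 3

    def on_congestion_signal(pkt_ecn):
        """AQM action for a packet selected for congestion signalling."""
        if pkt_ecn in (ECT0, CE):
            return "mark CE"    # unchanged RFC3168 behaviour for classic ECN traffic
        return "drop"           # ECT(1) and Not-ECT treated alike: drop instead of mark

This is also where the cost mentioned above shows up: ECT(1) traffic at this hop gets losses rather than marks.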



5.2.  ECT(1) Tunnel Bypass

   Using an [RFC6040] compatibility mode tunnel, tunnel ECT(1) traffic
   through the [RFC3168] bottleneck with the outer header indicating
   Not-ECT.

   Two variants exist for this approach

   1.  per-domain: tunnel ECT(1) pkts to domain edge towards dst

   2.  per-dst: tunnel ECT(1) pkts to dst

[SM] Explain why a network operator should do that work (what is in it for them)?
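
[SM] For clarity, the tunnel-ingress behaviour being proposed here is roughly the following (sketch; the two variants differ only in where the tunnel terminates, not in this mapping):

    NOT_ECT, ECT1, ECT0, CE = 0, 1, 2, 3   # IP ECN field values per RFC3168

    def tunnel_ingress(pkt_ecn):
        if pkt_ecn == ECT1:
            # Outer header set to Not-ECT, so the legacy RFC3168 bottleneck on the
            # path will not CE-mark this traffic (it still queues and can be dropped).
            return ("encapsulate", NOT_ECT)
        return ("forward unencapsulated", pkt_ecn)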


5.3.  Configure Non-Coupled Dual Queue

   Equipment supporting [RFC3168] may be configurable to enable two
   parallel queues for the same traffic class, with classification done
   based on the ECN field.

   Option 1:

   o  Configure 2 queues, both with ECN; 50:50 WRR scheduler

      *  Queue #1: ECT(1) & CE packets - Shallow immediate AQM target

      *  Queue #2: ECT(0) & NotECT packets - Classic AQM target

   o  Outcome in the case of n L4S flows and m long-running Classic
      flows

      *  if m & n are non-zero, flows get 1/2n and 1/2m of the capacity,
         otherwise 1/n or 1/m

      *  never < 1/2 each flow's rate if all had been Classic

   This option would allow L4S flows to achieve low latency, low loss
   and scalable throughput, but would sacrifice the more precise flow
   balance offered by [I-D.ietf-tsvwg-aqm-dualq-coupled].  This option
   would be expected to result in some reordering of previously CE
   marked packets sent by Classic ECN senders, which is a trait shared
   with [I-D.ietf-tsvwg-aqm-dualq-coupled].  As is discussed in
   [I-D.ietf-tsvwg-ecn-l4s-id], this reordering would be either zero
   risk or very low risk.

[SM] IMHO that is actually preferable to using the dual queue coupled AQM, as this solution does not have the nasty ~1:16 priority scheduler issue that dualQ papers over with some heuristics... It would still give L4S initially (until its fraction of traffic approaches 50%) an undeserved rate gain, but at least this is conceptually easy to predict and understand.
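
[SM] The rate claims for Option 1 are at least easy to sanity-check (illustrative arithmetic only, assuming an ideal 50:50 WRR scheduler and equal sharing among the flows within each queue):

    def per_flow_rates(capacity_bps, n_l4s, m_classic):
        if n_l4s and m_classic:
            return capacity_bps / (2 * n_l4s), capacity_bps / (2 * m_classic)
        # an idle queue yields its 50% share to the other one
        return (capacity_bps / n_l4s if n_l4s else None,
                capacity_bps / m_classic if m_classic else None)

    cap, n, m = 100e6, 1, 4
    l4s_rate, classic_rate = per_flow_rates(cap, n, m)
    print(l4s_rate / 1e6, classic_rate / 1e6, cap / (n + m) / 1e6)   # 50.0 12.5 20.0

So the four classic flows drop from 20 Mbps each to 12.5 Mbps (never below half of the all-classic rate, as the draft states), while the single L4S flow jumps to 50 Mbps, i.e. the initial rate gain mentioned above.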


   Option 2:

   o  Configure 2 queues, both with AQM; 50:50 WRR scheduler

      *  Queue #1: ECT(1) & NotECT packets - ECN disabled

      *  Queue #2: ECT(0) & CE packets - ECN enabled

   o  Outcome

      *  ECT(1) treated as NotECT

      *  Flow balance for the 2 queues the same as in option 1

   This option would not allow L4S flows to achieve low latency, low
   loss and scalable throughput in this bottleneck link.  As a result it
   is a less preferred option.

5.4.  WRED with ECT(1) Differentiation

   This configuration is similar to Option 2 in the previous section,
   but uses a single queue with WRED functionality.

   o  Configure the queue with two WRED classes

   o  Class #1: ECT(1) & NotECT packets - ECN disabled

   o  Class #2: ECT(0) & CE packets - ECN enabled


[SM] I can predict the response I will get, but for all of these it would be nice to have some measurements confirming that they perform as expected.


5.5.  Disable RFC3168 ECN Marking

   Disabling [RFC3168] ECN marking eliminates the unfairness issue.
   Clearly a downside to this approach is that classic senders will no
   longer get the benefits of Explicit Congestion Notification.

5.6.  Re-mark ECT(1) to NotECT Prior to AQM

   While not a recommended alternative, remarking ECT(1) packets as
   NotECT (i.e. bleaching ECT(1)) ensures that they are treated
   identically to classic NotECT senders.  However, this also eliminates
   the possibility of downstream L4S bottlenecks providing high fidelity
   congestion signals.

[SM] This is where I miss a section on what to do if the L4S experiment is declared a failure. IMHO recommending that all nodes formerly participating in the experiment re-mark ECT(1) to NotECT would be the fastest, most efficient way to contain the fall-out.

6. What L4S-deploying networks need to do at the end of the L4S experiment

This section gives guidance on how L4S-deploying networks should respond to either of the two likely outcomes of the IETF-supported L4S experiment; here is a first rough draft:

"6.1 Successful termination of the L4S experiment

If the L4S experiment is deemed a success, participating networks are encouraged to continue deploying L4S-aware nodes and, as far as feasible, to replace all non-L4S-aware RFC3168 AQMs already deployed.

6.2 Unsuccessful termination of the L4S experiment

If the L4S experiment shows that L4S cannot be safely deployed in the existing internet, or that it does not achieve its goals and promises well enough, the L4S experiment might need to be terminated. In that case, participants in the L4S experiment are expected to configure their L4S-aware network nodes to re-map ECT(1) to NotECT prior to the AQM, as described in section 5.6. That will act as a safety valve allowing a quick termination of all L4S side-effects in the unlikely case that this should be merited."

Please note that, based on the existing data, I consider 6.2 the expected outcome, but I still tried to keep the generally L4S-positive tone of the rest of the draft (even though I also believe the whole draft could be improved by less pro-L4S bias).


Best Regards
	Sebastian

P.S.: I do not really believe that this document can remedy the safety issues the L4S design and implementation bring, but at least it explicitly mentions some of them and hence is worth having. My preferred outcome, however, would be to drop all L4S drafts instead, until the safety and functionality issues are fixed for good, which realistically:
a) will require a massive redesign and
b) is unlikely to happen.

P.P.S.: Thanks for keeping this short and focussed.