Re: [tsvwg] Artart last call review of draft-ietf-tsvwg-ecn-l4s-id-27

Bob Briscoe <ietf@bobbriscoe.net> Thu, 04 August 2022 23:58 UTC

To: Bernard Aboba <bernard.aboba@gmail.com>
Cc: draft-ietf-tsvwg-ecn-l4s-id.all@ietf.org, last-call@ietf.org, tsvwg@ietf.org, Ingemar Johansson S <ingemar.s.johansson@ericsson.com>, art@ietf.org

Bernard,

Thank you for taking the time to produce this extremely thorough review.
Pls see [BB] inline;
You will need an HTML email reader for the diffs in this email.
Alternatively, I've temporarily uploaded a side-by-side diff here:
https://bobbriscoe.net/tmp/draft-ietf-tsvwg-ecn-l4s-id-28a-DIFF-27.html


On 30/07/2022 00:51, Bernard Aboba via Datatracker wrote:
> Reviewer: Bernard Aboba
> Review result: On the Right Track
>
> Here are my review comments.  I believe this is quite an important document, so
> that making the reasoning as clear as possible is important.  Unfortunately,
> the writing and overall organization makes the document hard to follow. If the
> authors are open to it, I'd be willing to invest more time to help get it into
> shape.

[BB] Thank you. You have already obviously sunk considerable time into 
it. Often I've found that your proposed alternative text didn't quite 
mean what we intended. But I've taken this as a sign that we hadn't 
explained it well and tried to guess what made you stumble.

This draft is in the long tail of many statistics: number of years since 
first draft, number of revisions, number of pages, etc. etc.
So I hope you will understand that this document has been knocked into 
all sorts of different shapes already, during a huge amount of WG review 
and consensus building, which I have tried not to upset, while also 
trying to understand why you felt it needed further changes.

> Overall Comments
>
> Abstract
>
> Since this is an Experimental document, I was expecting the Abstract and
> perhaps the Introduction to refer briefly to the considerations covered in
> Section 7, (such as potential experiments and open issues).

[BB] Good point - I'm surprised no-one has brought this up before - 
thanks. I'll add the following:
Abstract:

                    ...to prevent it degrading the low queuing delay and
    low loss of L4S traffic.  This *experimental track* specification
    defines the rules that L4S transports and network elements need to
    follow with the intention that L4S flows neither harm each other's
    performance nor that of Classic traffic.  *It also suggests open
    questions to be investigated during experimentation.*  Examples of new ...


Intro:
There wasn't really a relevant point to mention the Experiments section 
(§7) until the document roadmap (which you ask for later).
So we added a brief summary of the "L4S Experiments" there (see later 
for the actual text). The only change to the Intro was the first line:

     This *experimental track* specification...

> Organization and inter-relation between Sections
>
> The document has organizational issues which make it more difficult to read.
>
> I think that Section 1 should provide an overview of the specification, helping
> the reader navigate it.

[BB]  Section 3 already  provides the basis of a roadmap to both this 
and other documents. It points to §4 (Transports) & §5 (Network nodes).
It ought to have also referred to §6 (Tunnels and Encapsulations), which 
was added to the draft fairly recently (but without updating this 
roadmap). We can and should add that.

We could even move §3 to be the last subsection of §1 (i.e. §1.4). Then 
it could start the roadmap with §2, which gives the requirements for L4S 
packet identification.
However, a number of other documents already refer to the Prague L4S 
Requirements in §4, particularly §4.3. I mean not just I-Ds (which can 
still be changed), but also papers that have already been published. So 
a pragmatic compromise would be to just switch round sections 2 
(requirements) & 3 (roadmap).

Then we could retitle §3 to "L4S Packet Identification: Document Roadmap"
and add brief mentions of the tail sections (§7 L4S Experiments, and the 
usual IANA and Security Considerations).
The result is below, with manually added diff colouring (given we'd 
moved the whole section as well, so it's not a totally precise diff).

2.  L4S Packet Identification*: Document Roadmap*

    The L4S treatment is an experimental track alternative packet marking
    treatment to the Classic ECN treatment in [RFC3168], which has been
    updated by [RFC8311] to allow experiments such as the one defined in
    the present specification.  [RFC4774] discusses some of the issues
    and evaluation criteria when defining alternative ECN semantics*, which are further discussed in Section 4.3.1*.

*The L4S architecture [I-D.ietf-tsvwg-l4s-arch] describes the three main 
components of L4S: the sending host behaviour, the marking behaviour in 
the network and the L4S ECN protocol that identifies L4S packets as they 
flow between the two. *
*The next section of the present document (Section 3) records the 
requirements that informed the choice of L4S identifier. Then subsequent 
sections specify the* L4S ECN *protocol, which i)* identifies
    packets that have been sent from hosts that are expected to comply
    with a broad type of sending behaviour; and ii) identifies the
    marking treatment that network nodes are expected to apply to L4S
    packets.

    For a packet to receive L4S treatment as it is forwarded, the sender
    sets the ECN field in the IP header to the ECT(1) codepoint.  See
    Section 4 for full transport layer behaviour requirements, including
    feedback and congestion response.

    A network node that implements the L4S service always classifies
    arriving ECT(1) packets for L4S treatment and by default classifies
    CE packets for L4S treatment unless the heuristics described in
    Section 5.3 are employed.  See Section 5 for full network element
    behaviour requirements, including classification, ECN-marking and
    interaction of the L4S identifier with other identifiers and per-hop
    behaviours.

*L4S ECN works with ECN tunnelling and encapsulation behaviour as is, 
except there is one known case where careful attention to configuration 
is required, which is detailed in Section 6.*  *L4S ECN is currently on 
the experimental track. So Section 7 collects together the general 
questions and issues that remain open for investigation during L4S 
experimentation. Open issues or questions specific to particular 
components are called out in the specifications of each component part, 
such as the DualQ [I-D.ietf-tsvwg-aqm-dualq-coupled].*  *The IANA 
assignment of the L4S identifier is specified in Section 8. And Section 9 
covers security considerations specific to the L4S identifier. System 
security aspects, such as policing and privacy, are covered in the L4S 
architecture [I-D.ietf-tsvwg-l4s-arch].*
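
For concreteness, here is a rough sketch in C of the classification rule 
described above. It is purely illustrative - not proposed draft text - 
the function and flag names are made up, and the Section 5.3 heuristics 
are collapsed into a single flag:

    #include <stdbool.h>
    #include <stdint.h>

    /* 2-bit ECN codepoints from the IP header (RFC 3168). */
    #define ECN_NOT_ECT 0x0
    #define ECN_ECT1    0x1   /* the L4S identifier */
    #define ECN_ECT0    0x2   /* Classic ECN */
    #define ECN_CE      0x3   /* Congestion Experienced */

    /*
     * ECT(1) is always classified for L4S treatment; CE defaults to L4S
     * treatment unless optional heuristics (Section 5.3) judge the packet
     * to be Classic; Not-ECT and ECT(0) receive Classic treatment.
     */
    static bool classify_as_l4s(uint8_t ecn_field,
                                bool ce_heuristic_says_classic)
    {
        switch (ecn_field & 0x3) {
        case ECN_ECT1:
            return true;
        case ECN_CE:
            return !ce_heuristic_says_classic;
        default:
            return false;   /* Not-ECT or ECT(0) */
        }
    }

The sender side is correspondingly simple: a transport that wants L4S 
treatment sets ECT(1) (binary 01) in the ECN field of the packets it 
sends, as the text above says.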



> Section 1.1 refers to definitions in Section 1.2 so I'd suggest that that
> Section 1.2 might be come first.

[BB] The reason for the Problem Statement being the first subsection was 
because that's what motivates people to read on.

Your suggestion has been made by others in the past, and the solution 
was to informally explain new terms in the sections before the formal 
terminology section, as they arose.
The formal terminology section can be considered as the end of the 
Introductory material and the start of the formal body of the spec.

If there are phrases that are not clearly explained before the 
terminology section, pls do point them out.
We can reconsider moving the terminology section to 1.1 if there are a lot.
But we'd rather the reader could continue straight into the summary of 
the problem and that it is understandable stand-alone - without relying 
on formal definitions elsewhere.

>
> Section 1.3 provides basic information on Scope and the relationship of this
> document to other documents.  I was therefore expecting Section 7 to include
> questions on some of the related documents (e.g. how L4S might be tested along
> with RTP).

[BB] That isn't the role of this document, which would be too abstract 
(or too long) if it had to cover how to test each different type of 
congestion control and each type of AQM.
Quoting from §7:

    The specification of each scalable congestion control will need to
    include protocol-specific requirements for configuration and
    monitoring performance during experiments.  Appendix A of the
    guidelines in [RFC5706] provides a helpful checklist.


Over the last 3 months, everyone involved in interop testing has been 
defining all the test plans, which had their first test-drive last week. 
Indeed, the success of the planning and organization of the tests 
surprised us all - kudos to Greg White, who was largely responsible for 
coordinating it.
We may end up writing that all up as a separate draft. If many tests 
were documented centrally like this, each CC or AQM might only need to 
identify any special-case tests specific to itself.
That might even cover testing with live traffic over the Internet as 
well. But let's walk before we run.


> I wonder whether much of Section 2 could be combined with Appendix B, with the
> remainder moved into the Introduction, which might also refer to Appendix B.

[BB] What is the problem that you are trying to solve by breaking up 
this section?

If we split up this section, someone else will want parts moved back, or 
something else moved. Unless there's a major problem with this section, 
we'd rather it stayed in one piece. Its main purpose is to record the 
requirements and to say (paraphrasing), "The outcome is a compromise 
between requirements 'cos header space is limited. Other solutions were 
considered, but this one was the least worst."

Summary: no action here yet, pending motivating reasoning from your side.

>
> Section 4.2
>
>     RTP over UDP:  A prerequisite for scalable congestion control is for
>        both (all) ends of one media-level hop to signal ECN
>        support [RFC6679] and use the new generic RTCP feedback format of
>        [RFC8888].  The presence of ECT(1) implies that both (all) ends of
>        that media-level hop support ECN.  However, the converse does not
>        apply.  So each end of a media-level hop can independently choose
>        not to use a scalable congestion control, even if both ends
>        support ECN.
>
> [BA] The document earlier refers to an L4S modified version of SCreAM, but does
> not provide a reference.  Since RFC 8888 is not deployed today, this paragraph
> (and Section 7) leaves me somewhat unclear on the plan to evaluate L4S impact
> on RTP. Or is the focus on experimentation with RTP over QUIC (e.g.
> draft-ietf-avtcore-rtp-over-quic)?

[BB] Ingemar has given this reply:
[IJ] RFC8298 (SCReAM) in its current version does not describe support 
for L4S. However, the open source running code on GitHub does support 
L4S. An update of RFC8298 has lagged behind, but I hope to start with an 
RFC8298-bis after the vacation.
RFC8888 is implemented in the publicly available code for SCReAM 
(https://github.com/EricssonResearch/scream). This code has been 
extensively used in demos of 5G Radio Access Networks with L4S 
capability. The example demos have been cloud gaming and video streaming 
for remote-controlled cars.
The code includes gstreamer plugins as well as multi-camera code 
tailored for NVidia Jetson Nano/Xavier NX (that can be easily modified 
for other platforms).

[BB] As an interim reference, Ingemar's README is already cited as 
[SCReAM-L4S]. It is a brief but decent document about the L4S variant of 
SCReAM, which also gives further references (and the open source code is 
its own spec).

Summary: The RFC 8888 part of this question seems to be about plans for 
how the software for another RFC is expected to be installed or bundled.
Is this a question that you want this draft to answer?

>
>     For instance, for DCTCP [RFC8257], TCP Prague
>     [I-D.briscoe-iccrg-prague-congestion-control], [PragueLinux] and the
>     L4S variant of SCReAM [RFC8298], the average recovery time is always
>     half a round trip (or half a reference round trip), whatever the flow
>     rate.
>
> [BA] I'm not sure that an L4S variant of SCReAM could really be considered
> "scalable" where simulcast or scalable video coding was being sent. In these
> scenarios, adding a layer causes a multiplicative increase in bandwidth, so
> that "probing" (e.g. stuffing the channel with RTX probes or FEC) is often a
> necessary precursor to make it possible to determine whether adding layers is
> actually feasible.

[BB] Ingemar has given this reply:
[IJ] The experiments run so far with SCReAM have been with the NVENC 
encoder, which supports rate changes on a frame by frame basis, and 
Jetson Nano/Xavier NX/Xavier AGX that is a bit more slow in its rate 
control loop. So the actual probing is done by adjusting the target 
bitrate of the video encoder.

[BB] Since last week (in the first L4S interop), we now have 2 other 
implementations of real-time video with L4S support directly over UDP 
(from NVIDIA and Nokia); in addition to the original 2015 demo (also 
from Nokia). You'd have to ask Ermin Sakic <esakic@nvidia.com> about the 
NVIDIA coding, and similarly Koen De Schepper 
<koen.de_schepper@nokia.com> about the Nokia ones. I do know that both 
Nokia ones change rate packet-by-packet (and if channel conditions are 
poor, the new one can even reduce down to 500kb/s while still preserving 
the same low latency).

The message here is that, for low latency video, you can't just use any 
old encoding that was designed without latency in mind.
Again, is this a question that you want this draft to answer? It seems 
like something that would be discussed in the spec of each r-t CC technique.

>
>     As with all transport behaviours, a detailed specification (probably
>     an experimental RFC) is expected for each congestion control,
>     following the guidelines for specifying new congestion control
>     algorithms in [RFC5033].  In addition it is expected to document
>     these L4S-specific matters, specifically the timescale over which the
>     proportionality is averaged, and control of burstiness.  The recovery
>     time requirement above is worded as a 'SHOULD' rather than a 'MUST'
>     to allow reasonable flexibility for such implementations.
>
> [BA] Is the L4S variant of SCReaM one of the detailed specifications that is
> going to be needed? From the text I wasn't sure whether this was documented
> work-in-progress or a future work item.

[BB] We cannot force implementers to write open specifications of their 
algorithms. Implementers might have secrecy constraints, or just not 
choose to invest the time in spec writing. So there is no hit-list of 
specs that 'MUST' be written, except we consider it proper to document 
the reference implementation of the Prague CC.
Nonetheless, others also consider it proper to document their algorithm 
(e.g. BBRv2), and in the case of SCReAM, Ingemar has promised he will 
(as quoted above).

We don't (yet?) have a description of the latest two implementations 
that the draft can refer to (they only announced these on the first day 
of the interop last week).
We try to keep a living web page up to date that points to current 
implementations ( https://l4s.net/#code ). However, I don't think the 
RFC Editor would accept this as an archival reference.

> Section 4.3.1
>
>        To summarize, the coexistence problem is confined to cases of
>        imperfect flow isolation in an FQ, or in potential cases where a
>        Classic ECN AQM has been deployed in a shared queue (see the L4S
>        operational guidance [I-D.ietf-tsvwg-l4sops] for further details
>        including recent surveys attempting to quantify prevalence).
>        Further, if one of these cases does occur, the coexistence problem
>        does not arise unless sources of Classic and L4S flows are
>        simultaneously sharing the same bottleneck queue (e.g. different
>        applications in the same household) and flows of each type have to
>        be large enough to coincide for long enough for any throughput
>        imbalance to have developed.
>
> [BA] This seems to me to be one of the key questions that could limit the
> "incremental deployment benefit".  A reference to the discussion in Section 7
> might be appropriate here.

[BB] OK. At the end of the above para I've added:

Therefore, how often the coexistence
        problem arises in practice is listed in Section 7 as an open
        question that L4S experiments will need to answer.

> 5.4.1.1.1.  'Safe' Unresponsive Traffic
>
>     The above section requires unresponsive traffic to be 'safe' to mix
>     with L4S traffic.  Ideally this means that the sender never sends any
>     sequence of packets at a rate that exceeds the available capacity of
>     the bottleneck link.  However, typically an unresponsive transport
>     does not even know the bottleneck capacity of the path, let alone its
>     available capacity.  Nonetheless, an application can be considered
>     safe enough if it paces packets out (not necessarily completely
>     regularly) such that its maximum instantaneous rate from packet to
>     packet stays well below a typical broadband access rate.
>
> [BA] The problem with video traffic is that the encoder typically
> targets an "average bitrate" resulting in a keyframe with a
> bitrate that is above the bottleneck bandwidth and delta frames
> that are below it.  Since the "average rate" may not be
> resettable before sending another keyframe, video has limited
> ability to respond to congestion other than perhaps by dropping
> simulcast and SVC layers. Does this mean that a video is
> "Unsafe Unresponsive Traffic"?

[BB] This section on 'Safe' Unresponsive traffic is about traffic that 
is so low rate that it doesn't need to use ECN to respond to congestion 
at all (e.g. DNS, NTP). Video definitely does not fall into that category.
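
As a rough illustration of the kind of pacing the quoted text has in 
mind for such traffic: the sender only needs to keep its instantaneous 
packet-to-packet rate under a ceiling chosen well below a typical 
broadband access rate. A minimal sketch - the 1 Mb/s ceiling and the 
names are invented for the example, not taken from the draft:

    #include <stdint.h>

    /* Example ceiling, well below a typical broadband access rate. */
    #define RATE_CEILING_BPS  1000000ULL        /* 1 Mb/s - arbitrary */

    /* Minimum gap (in microseconds) to leave after sending pkt_bytes,
     * so that the instantaneous rate (pkt_bytes * 8 / gap) never
     * exceeds the ceiling. */
    static uint64_t min_gap_us(uint32_t pkt_bytes)
    {
        return ((uint64_t)pkt_bytes * 8 * 1000000ULL) / RATE_CEILING_BPS;
    }

    /* e.g. a 1200 B packet => 9600 bits => a gap of 9600 us at 1 Mb/s. */

DNS or NTP traffic is naturally far sparser than even that anyway.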

I think your question is really asking whether video even /with/ ECN 
support can be considered responsive enough to maintain low latency. For 
this you ought to try to see the demonstration that Nokia did last week 
(if a recording is put online) or the Ericsson demonstration, which is 
already online [EDT-5GLL]. Both ran over emulated 5G radio access networks 
with varying channel conditions, and both showed very fast 
interaction within the video with no perceivable lag to the human eye. 
With the Nokia one last week, using finger gestures sent over the radio 
network, you could control the viewport into a video from a 360° camera, 
which was calculated and generated at the remote end. No matter how fast 
you shook your finger around, the viewport stayed locked onto it.

Regarding keyframes, for low latency video, these are generally spread 
across the packets carrying the other frames.

[EDT-5GLL] Ericsson and DT demo 5G low latency feature: 
https://www.ericsson.com/en/news/2021/10/dt-and-ericsson-successfully-test-new-5g-low-latency-feature-for-time-critical-applications

I detect here that this also isn't a question about the draft - more a 
question of "I need to see it to believe it"?

>
> NITs
>
> Abstract
>
>     The L4S identifier defined in this document distinguishes L4S from
>     'Classic' (e.g. TCP-Reno-friendly) traffic.  It gives an incremental
>     migration path so that suitably modified network bottlenecks can
>     distinguish and isolate existing traffic that still follows the
>     Classic behaviour, to prevent it degrading the low queuing delay and
>     low loss of L4S traffic.  This specification defines the rules that
>
> [BA] Might be clear to say "This allows suitably modified network..."

[BB] I'm not sure what the problem is. But I'm assuming you're saying 
you tripped over the word 'gives'. How about simplifying:

    *Then,* network bottlenecks can *be incrementally modified to*
    distinguish and isolate existing traffic that still follows the
    Classic behaviour, to prevent it degrading the low queuing delay and
    low loss of L4S traffic.


> The words "incremental migration path" suggest that there deployment of
> L4S-capable network devices and endpoints provides incremental benefit.
> That is, once new network devices are put in place (e.g. by replacing
> a last-mile router), devices that are upgraded to support L4S will
> see benefits, even if other legacy devices are not ugpraded.
>
> If this is the point you are looking to make, you might want to clarify
> the language.

[BB] I hope the above diff helps. Is that enough for an abstract, which 
has to be kept very brief?
Especially as all the discussion about incremental deployment is in the 
L4S architecture doc, so it wouldn't be appropriate to make deployment a 
big thing in the abstract of this draft.
Nonetheless, we can flesh out the text where incremental deployment is 
already mentioned in the intro (see our suggested text for your later 
point about this, below).

Summary: We propose only the above diff on these points about 
"incremental migration" in the abstract.

>     L4S transports and network elements need to follow with the intention
>     that L4S flows neither harm each other's performance nor that of
>     Classic traffic.  Examples of new active queue management (AQM)
>     marking algorithms and examples of new transports (whether TCP-like
>     or real-time) are specified separately.
>
> [BA] Don't understand "need to follow with the intention". Is this
> stating a design principle, or is does it represent deployment
> guidance?

[BB] I think a missing comma is the culprit. Sorry for confusion. It 
should be:

    This specification defines the rules that
    L4S transports and network elements need to follow, with the intention
    that L4S flows neither harm each other's performance nor that of
    Classic traffic.


> The sentence "L4S flows neither harm each other's performance nor that
> of classic traffic" might be better placed after the first sentence
> in the second paragraph, since it relates in part to the "incremental
> deployment benefit" argument.

[BB] That wouldn't be appropriate, because:
* To prevent "Classic harms L4S" an L4S AQM needs the L4S identifier on 
packets to isolate them
* To prevent "L4S harms Classic" needs the L4S sender to detect that 
it's causing harm which is sender behaviour (rules), not identifier-based.
So the sentence has to come after the point about "the spec defines the 
rules".

Summary: we propose no action on this point.

> Section 1. Introduction
>
>     This specification defines the protocol to be used for a new network
>     service called low latency, low loss and scalable throughput (L4S).
>     L4S uses an Explicit Congestion Notification (ECN) scheme at the IP
>     layer with the same set of codepoint transitions as the original (or
>     'Classic') Explicit Congestion Notification (ECN [RFC3168]).
>     RFC 3168 required an ECN mark to be equivalent to a drop, both when
>     applied in the network and when responded to by a transport.  Unlike
>     Classic ECN marking, the network applies L4S marking more immediately
>     and more aggressively than drop, and the transport response to each
>
>     [BA] Not sure what "aggressively" means here. In general, marking
>     traffic seems like a less aggressive action than dropping it. Do
>     you mean "more frequently"?

[BB] OK; 'frequently' it is.

FWIW, I recall that the transport response used to be described as more 
aggressive (because it reduces less in response to each mark), and the 
idea was that using aggressive for both would segue nicely into the next 
sentence about the two counterbalancing. Someone asked for that to be 
changed, and now the last vestiges of that failed literary device are 
cast onto the cutting room floor. The moral of this tale: never try to 
write a literary masterpiece by committee ;)

>
>     Also, it's a bit of a run-on sentence, so I'd break it up:
>
>     "than drop.  The transport response to each"
>
>     mark is reduced and smoothed relative to that for drop.  The two
>     changes counterbalance each other so that the throughput of an L4S
>     flow will be roughly the same as a comparable non-L4S flow under the
>     same conditions.

[BB] Not sure about this - by the next sentence (about the two changes), 
the reader has lost track of them. How about using numbering to 
structure the long sentence:

    Unlike
    Classic ECN marking: i) the network applies L4S marking more immediately
    and more aggressively than drop; and ii) the transport response to each
    mark is reduced and smoothed relative to that for drop. The two
    changes counterbalance each other...

OK?
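
For what it's worth, the counterbalancing can be seen from the usual 
steady-state approximations - background from the DCTCP/PI2 analyses, 
not text for the draft:

    % Classic (Reno-friendly) window under drop/mark probability p_C:
    %   W_{Reno} \approx \sqrt{3/(2 p_C)}
    % Scalable (DCTCP-like) window under L4S marking probability p_L:
    %   W_{DCTCP} \approx 2/p_L
    % If a coupled AQM sets p_C = (p_L / k)^2, the two windows match:
    \[
      \sqrt{\frac{3}{2 p_C}} \;=\; \sqrt{\frac{3}{2}}\cdot\frac{k}{p_L}
      \;\approx\; \frac{2}{p_L}
      \quad \text{when } k = 2\sqrt{2/3} \approx 1.63 .
    \]

So the network marks L4S packets much more frequently (p_L is of the 
order of the square root of p_C, so far larger when both are small), the 
transport responds far more finely to each mark, and at the same RTT the 
two flow rates come out roughly equal - which is why coupling factors of 
about 2 appear in the DualQ approach.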

> Nonetheless, the much more frequent ECN control
>     signals and the finer responses to these signals result in very low
>     queuing delay without compromising link utilization, and this low
>     delay can be maintained during high load.  For instance, queuing
>     delay under heavy and highly varying load with the example DCTCP/
>     DualQ solution cited below on a DSL or Ethernet link is sub-
>     millisecond on average and roughly 1 to 2 milliseconds at the 99th
>     percentile without losing link utilization [DualPI2Linux], [DCttH19].
>
>     [BA] I'd delete "cited below" since you provide the citation at
>     the end of the sentence.

[BB] 'Cited below' referred to the DCTCP and DualQ citations in the 
subsequent para, because this is the first time either term has been 
mentioned.
     '*Described* below'
was what was really meant. I think that makes it clear enough (?).

>     Note that the inherent queuing delay while waiting to acquire a
>     discontinuous medium such as WiFi has to be minimized in its own
>     right, so it would be additional to the above (see section 6.3 of the
>     L4S architecture [I-D.ietf-tsvwg-l4s-arch]).
>
>     [BA] Not sure what "discontinuous medium" means. Do you mean
>     wireless?  Also "WiFi" is a colloquialism; the actual standard
>     is IEEE 802.11 (WiFi Alliance is an industry organization).
>     Might reword this as follows:
>
>     "Note that the changes proposed here do not lessen delays from
>      accessing the medium (such as is experienced in [IEEE-802.11]).
>      For discussion, see Section 6.3 of the L4S architecture
>      [I-D.ietf-tsvwg-l4s-arch]."

[BB] We've used 'shared' instead. Other examples of shared media are 
LTE, 5G, DOCSIS (cable), DVB (satellite), PON (passive optical network). 
So I've just said 'wireless' rather than give a gratuitous citation of 
802.11.

    Note that the queuing delay while waiting to acquire a *shared*
    medium such as *wireless* has to be *added* to the *above. It is a
    different issue that needs to be addressed, but separately* (see
    section 6.3 of the L4S architecture [I-D.ietf-tsvwg-l4s-arch]).


Then, because wireless is less specific, I've taken out 'inherent' 
because strictly medium acquisition delay is not inherent to a medium - 
it depends on the multiplexing scheme. For instance radio networks can 
use CDM (code division multiplexing), and they did in 3G.
'Inherent' was trying to get over the sense that this delay is not 
amenable to reduction by congestion control. Rather than try to cram all 
those concepts into one sentence, I've split it.

OK?

>
>     L4S is not only for elastic (TCP-like) traffic - there are scalable
>     congestion controls for real-time media, such as the L4S variant of
>     the SCReAM [RFC8298] real-time media congestion avoidance technique
>     (RMCAT).  The factor that distinguishes L4S from Classic traffic is
>
>     [BA] Is there a document that defines the L4S variant of SCReAM?

[BB] I've retagged Ingemar's readme as [SCReAM-L4S], and included it 
here to match the other two occurrences of SCReAM:

                                            such as the L4S variant
    *[SCReAM-L4S]*  of the SCReAM [RFC8298] real-time media congestion
    avoidance technique (RMCAT).


It sounds like Ingemar plans to update RFC8298 with a bis, so I guess 
eventually [RFC8298] should automatically become a reference to its own 
update.

>
>     its behaviour in response to congestion.  The transport wire
>     protocol, e.g. TCP, QUIC, SCTP, DCCP, RTP/RTCP, is orthogonal (and
>     therefore not suitable for distinguishing L4S from Classic packets).
>
>     The L4S identifier defined in this document is the key piece that
>     distinguishes L4S from 'Classic' (e.g. Reno-friendly) traffic.  It
>     gives an incremental migration path so that suitably modified network
>     bottlenecks can distinguish and isolate existing Classic traffic from
>     L4S traffic to prevent the former from degrading the very low delay
>     and loss of the new scalable transports, without harming Classic
>     performance at these bottlenecks.  Initial implementation of the
>     separate parts of the system has been motivated by the performance
>     benefits.
>
> [BA] I think you are making an "incremental benefit" argument here,
> but it might be made more explicit:
>
> "  The L4S identifier defined in this document distinguishes L4S from
>     'Classic' (e.g. Reno-friendly) traffic. This allows suitably
>     modified network bottlenecks to distinguish and isolate existing
>     Classic traffic from L4S traffic, preventing the former from
>     degrading the very low delay and loss of the new scalable
>     transports, without harming Classic performance. As a result,
>     deployment of L4S in network bottlenecks provides incremental
>     benefits to endpoints whose transports support L4S."

[BB] We don't really want to lose the point about the identifier being 
key. So I've kept that. And for the middle sentence, I've used the 
simpler construction developed above (for the similar wording in the 
abstract).

Regarding the last sentence, no, it meant more than that. It meant that, 
even though implementer's customers get no benefit until both parts are 
deployed, for some implementers the 'size of the potential prize' has 
already been great enough to warrant investment in implementing their 
part, without any guarantee that other parts will be implemented. 
However, we need to be careful not to stray into conjecture and 
predictions, particularly not commercial ones, which is why this 
sentence was written in the past tense. Pulling this all together, how 
about:

    The L4S identifier defined in this document is the key piece that
    distinguishes L4S from 'Classic' (e.g. Reno-friendly) traffic.  *Then,*
    network bottlenecks can *be incrementally modified to* distinguish and
    isolate existing Classic traffic from L4S traffic*,* to prevent the
    former from degrading the very low *queuing* delay and loss of the new
    scalable transports, without harming Classic performance at these
    bottlenecks.  *Although both sender and network deployment are required
    before any benefit,* initial implementations of the separate parts of
    the system *have* been motivated by the *potential* performance
    benefits.

I considered adding "have *already* been motivated..." or "*at the time 
of writing,* initial implementations..." but decided against both - they 
sounded a bit hyped up.
What do you think?


> Section 1.1 1.1.  Latency, Loss and Scaling Problems
>
>     Latency is becoming the critical performance factor for many (most?)
>     applications on the public Internet, e.g. interactive Web, Web
>     services, voice, conversational video, interactive video, interactive
>     remote presence, instant messaging, online gaming, remote desktop,
>     cloud-based applications, and video-assisted remote control of
>     machinery and industrial processes.  In the 'developed' world,
>     further increases in access network bit-rate offer diminishing
>     returns, whereas latency is still a multi-faceted problem.  In the
>     last decade or so, much has been done to reduce propagation time by
>     placing caches or servers closer to users.  However, queuing remains
>     a major intermittent component of latency.
>
> [BA] Since this paragraph provides context for the work, you might
> consider placing it earlier (in Section 1 as well as potentially in
> the Abstract).

[BB] The L4S architecture Intro already starts like you suggest.
     See 
https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-l4s-arch-19#section-1

The present doc starts out more as a technical spec might, with a 4-para 
intro focusing on what it says technically. Then it has a fairly long 
subsection to summarize the problem for those reading it stand-alone. 
That is intentional (so readers who have already read the architecture 
can easily jump).

Summary: We propose to leave the opening of the intro unchanged.

> Might modify this as follows:
>
> "
>     Latency is the critical performance factor for many Internet
>     applications, including web services, voice, realtime video,
>     remote presence, instant messaging, online gaming, remote
>     desktop, cloud services, and remote control of machinery and
>     industrial processes. In these applications, increases in access
>     network bitrate may offer diminishing returns. As a result,
>     much has been done to reduce delays by placing caches or
>     servers closer to users. However, queuing remains a major
>     contributor to latency."
We've picked up most, but not all, of your suggestions:

    Latency is becoming the critical performance factor for many (most?)
    *Internet applications,* e.g. interactive *web, web* services, voice,
    conversational video, interactive video, interactive remote presence,
    instant messaging, online gaming, remote desktop, cloud-based
    *applications & services,* and remote control of machinery and
    industrial processes.  In *many parts of* the world, further increases
    in access network *bit rate* offer diminishing returns *[Dukkipati06],*
    whereas latency is still a multi-faceted problem.  *As a result,* much
    has been done to reduce propagation time by placing caches or servers
    closer to users.  However, queuing remains a *major, albeit
    intermittent,* component of latency.


We've added [Dukkipati06], because we were asked to justify the similar 
'diminishing returns' claim in the L4S architecture, and Dukkipati06 
provides a plot supporting that in its intro:

    *[Dukkipati06] Dukkipati, N. and N. McKeown, "Why Flow-Completion Time 
is the Right Metric for Congestion Control", ACM CCR 36(1):59--62, 
January 2006, <https://dl.acm.org/doi/10.1145/1111322.1111336>.*


The distinctions between different applications of the same technology 
were deliberately intended to distinguish different degrees of latency 
sensitivity, so we left some of them in.
OK?

>     The Diffserv architecture provides Expedited Forwarding [RFC3246], so
>     that low latency traffic can jump the queue of other traffic.  If
>     growth in high-throughput latency-sensitive applications continues,
>     periods with solely latency-sensitive traffic will become
>     increasingly common on links where traffic aggregation is low.  For
>     instance, on the access links dedicated to individual sites (homes,
>     small enterprises or mobile devices).  These links also tend to
>     become the path bottleneck under load.  During these periods, if all
>     the traffic were marked for the same treatment, at these bottlenecks
>     Diffserv would make no difference.  Instead, it becomes imperative to
>     remove the underlying causes of any unnecessary delay.
>
> [BA] This paragraph is hard to follow. You might consider rewriting it as
> follows:
>
>     "The Diffserv architecture provides Expedited Forwarding [RFC3246], to
>     enable low latency traffic to jump the queue of other traffic. However,
>     the latency-sensitive applications are growing in number along
>     with the fraction of latency-sensitive traffic. On bottleneck links where
>     traffic aggregation is low (such as links to homes, small enterprises or
>     mobile devices), if all traffic is marked for the same treatment, Diffserv
>     will not make a difference. Instead, it is necessary to remove unnecessary
>     delay."

[BB] Your proposed replacement has the following problems:
* It relies on prediction (the previous text avoided prediction, instead 
saying "if growth ... continues");
* The proposed replacement loses the critical sense of "periods with 
solely latency sensitive traffic" (not all the time)
* It also loses the critical idea that the same links that are low stat 
mux tend to also be those where the bottleneck is.
How about:

    The Diffserv architecture provides Expedited Forwarding [RFC3246], so
    that low latency traffic can jump the queue of other traffic.  If
    growth in latency-sensitive applications continues, periods with
    solely latency-sensitive traffic will become increasingly common on
    links where traffic aggregation is low.  *During these periods, if all*
    the traffic were marked for the same treatment, Diffserv would make
    no difference.  *The links with low aggregation also tend to become
    the path bottleneck under load, for instance, the access links
    dedicated to individual sites (homes, small enterprises or mobile
    devices).  So, instead of differentiation,* it becomes imperative to
    remove the underlying causes of any unnecessary delay.


I tried to guess what you found hard to follow, but still to keep all 
the concepts. The main changes were:
*  to switch the sentence order so "periods with solely" and "these 
periods" were not a few sentences apart.
* to make it clear what 'instead' meant.
Better?


>    long enough for the queue to fill the buffer, making every packet in
>     other flows sharing the buffer sit through the queue.
>
>     [BA] "sit through" -> "share"

[BB] Nah, that's a tautology: "other flows sharing the buffer share the queue".
And it loses the sense of waiting. If "sit through" isn't 
understandable, how about

    "...causing every packet in other flows sharing the buffer to have to
    work its way through the queue."
?

>
>     Active queue management (AQM) was originally developed to solve this
>     problem (and others).  Unlike Diffserv, which gives low latency to
>     some traffic at the expense of others, AQM controls latency for _all_
>     traffic in a class.  In general, AQM methods introduce an increasing
>     level of discard from the buffer the longer the queue persists above
>     a shallow threshold.  This gives sufficient signals to capacity-
>     seeking (aka. greedy) flows to keep the buffer empty for its intended
>     purpose: absorbing bursts.  However, RED [RFC2309] and other
>     algorithms from the 1990s were sensitive to their configuration and
>     hard to set correctly.  So, this form of AQM was not widely deployed.
>
>     More recent state-of-the-art AQM methods, e.g. FQ-CoDel [RFC8290],
>     PIE [RFC8033], Adaptive RED [ARED01], are easier to configure,
>     because they define the queuing threshold in time not bytes, so it is
>     invariant for different link rates.  However, no matter how good the
>     AQM, the sawtoothing sending window of a Classic congestion control
>     will either cause queuing delay to vary or cause the link to be
>     underutilized.  Even with a perfectly tuned AQM, the additional
>     queuing delay will be of the same order as the underlying speed-of-
>     light delay across the network, thereby roughly doubling the total
>     round-trip time.
>
> [BA] Would suggest rewriting as follows:
>
> "  More recent state-of-the-art AQM methods such as FQ-CoDel [RFC8290],
>     PIE [RFC8033] and Adaptive RED [ARED01], are easier to configure,
>     because they define the queuing threshold in time not bytes, providing
>     link rate invariance.  However, AQM does not change the "sawtooth"
>     sending behavior of Classic congestion control algorithms, which
>     alternates between varying queuing delay and link underutilization.
>     Even with a perfectly tuned AQM, the additional queuing delay will
>     be of the same order as the underlying speed-of-light delay across
>     the network, thereby roughly doubling the total round-trip time."

[BB] We've taken most of these suggestions, but link rate invariance is 
rather a mouthful.
Also, more queue delay or more under-utilization wasn't meant to imply 
alternating between the two.
So how about:

    More recent state-of-the-art AQM methods, *such as* FQ-CoDel [RFC8290],
    PIE [RFC8033] *or* Adaptive RED [ARED01], are easier to configure,
    because they define the queuing threshold in time not bytes, so
    *configuration* is invariant *whatever the* link *rate.*  However, the
    sawtoothing window of a Classic congestion control *creates a dilemma
    for the operator: i)* either *configure a shallow AQM operating point,
    so the tips of the sawteeth* cause *minimal queue* delay *but the
    troughs underutilize the link,* or *ii) configure the operating point
    deeper into the buffer, so the troughs utilize* the link *better but
    then the tips cause more delay variation.*  Even...

OK?
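
A back-of-envelope way to see the dilemma, and the 'roughly doubling the 
total round-trip time' claim, is the standard buffer-sizing argument (my 
sketch, not text from the draft):

    % A single long-running Reno-like flow sawtooths between W/2 and W.
    % To keep a link of capacity C busy at the trough of the sawtooth,
    % the queue at the tip must hold about a bandwidth-delay product:
    \[
      Q_{max} \;\approx\; C \cdot RTT_{base}
      \quad \Rightarrow \quad
      d_{queue,max} \;=\; \frac{Q_{max}}{C} \;\approx\; RTT_{base} .
    \]

So even a perfectly tuned Classic AQM either tolerates extra queuing 
delay of the order of the base RTT at the tips of the sawteeth, or it 
shrinks its operating point and sacrifices utilization in the troughs.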


>     If a sender's own behaviour is introducing queuing delay variation,
>     no AQM in the network can 'un-vary' the delay without significantly
>     compromising link utilization.  Even flow-queuing (e.g. [RFC8290]),
>     which isolates one flow from another, cannot isolate a flow from the
>     delay variations it inflicts on itself.  Therefore those applications
>     that need to seek out high bandwidth but also need low latency will
>     have to migrate to scalable congestion control.
>
> [BA] I'd suggest you delete the last sentence, since the point is
> elaborated on in more detail in the next paragraph.

[BB] Actually, this point is not made in the next para (but you might 
have thought it was because it's not clear, so below I've tried to fix it).
Indeed, I've realized we need to /add/ to the last sentence, because we 
haven't yet said what a scalable control is...

        ...migrate to scalable congestion control*, which uses much smaller sawtooth variations*.


>     Altering host behaviour is not enough on its own though.  Even if
>     hosts adopt low latency behaviour (scalable congestion controls),
>     they need to be isolated from the behaviour of existing Classic
>     congestion controls that induce large queue variations.  L4S enables
>     that migration by providing latency isolation in the network and
>
> [BA] "enables that migration" -> "motivates incremental deployment"
>
>     distinguishing the two types of packets that need to be isolated: L4S
>     and Classic.  L4S isolation can be achieved with a queue per flow
>     (e.g. [RFC8290]) but a DualQ [I-D.ietf-tsvwg-aqm-dualq-coupled] is
>     sufficient, and actually gives better tail latency.  Both approaches
>     are addressed in this document.

[BB] The intended meaning here is 'enables' (technical feasibility), not 
motivates (human inclination).
But whatever, in the rewording below, I don't think either is needed. 
I'm also assuming that middle sentence didn't make sense for you, and I 
think I see why. So how about:

    Altering host behaviour is not enough on its own though.  Even if
    hosts adopt low latency behaviour (scalable congestion controls), they
    need to be isolated from the *large queue variations induced by*
    existing Classic congestion controls.  *L4S AQMs provide that* latency
    isolation in the network and *the L4S identifier enables the AQMs to
    distinguish* the two types of packets that need to be isolated: L4S
    and Classic.


How's that?

>     The DualQ solution was developed to make very low latency available
>     without requiring per-flow queues at every bottleneck.  This was
>
> [BA] "This was" -> "This was needed"

[BB] Not quite that strong. More like:
     "This was useful"
>
>     Latency is not the only concern addressed by L4S: It was known when
>
>     [BA] ":" -> "."

[BB] OK.

>
>     explanation is summarised without the maths in Section 4 of the L4S
>
>     [BA] "summarised without the maths" -> "summarized without the mathematics"

[BB] OK - that nicely side-steps stumbles from either side of the Atlantic.

>
> 1.2.  Terminology
>
> [BA] Since Section 1.1 refers to some of the Terminology defined in
> this section, I'd consider placing this section before that one.

[BB] See earlier for push-back on this.

>
>     Reno-friendly:  The subset of Classic traffic that is friendly to the
>        standard Reno congestion control defined for TCP in [RFC5681].
>        The TFRC spec. [RFC5348] indirectly implies that 'friendly' is
>
>        [BA] "spec." -> "specification"

[BB] I checked this after a previous review comment, and 'spec' is now 
considered to be a word in its own right. I should have removed the 
full-stop though, which I did for all other occurrences.
However, the RFC Editor might have a style preference on this point, in 
which case I will acquiesce.


>
>        defined as "generally within a factor of two of the sending rate
>        of a TCP flow under the same conditions".  Reno-friendly is used
>        here in place of 'TCP-friendly', given the latter has become
>        imprecise, because the TCP protocol is now used with so many
>        different congestion control behaviours, and Reno is used in non-
>
>        [BA] "Reno is used" -> "Reno can be used"

[BB] OK

>
> 4.  Transport Layer Behaviour (the 'Prague Requirements')
>
> [BA] This section is empty and there are no previous references to Prague. So I
> think you need to say a few words here to introduce the section.

[BB] OK. How about:

   *This section defines L4S behaviour at the transport layer, also known
    as the Prague L4S Requirements (see Appendix A for the origin of the
    name).*

Again, thank you very much for all the time and effort you've put into 
this review.

Regards



Bob



-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/