Re: [art] Artart last call review of draft-ietf-tsvwg-ecn-l4s-id-27
Bernard Aboba <bernard.aboba@gmail.com> Fri, 05 August 2022 01:00 UTC
References: <165913867042.43653.10267120686300599117@ietfa.amsl.com> <e25423b1-61b5-ea3f-06bb-0c10992a55fb@bobbriscoe.net>
In-Reply-To: <e25423b1-61b5-ea3f-06bb-0c10992a55fb@bobbriscoe.net>
From: Bernard Aboba <bernard.aboba@gmail.com>
Date: Thu, 04 Aug 2022 17:59:44 -0700
Message-ID: <CAOW+2duaq8M1gwZ5snmhhibnNMEywn-5B=C2awAiE6akcxYyBw@mail.gmail.com>
To: Bob Briscoe <ietf@bobbriscoe.net>
Cc: draft-ietf-tsvwg-ecn-l4s-id.all@ietf.org, last-call@ietf.org, tsvwg IETF list <tsvwg@ietf.org>, Ingemar Johansson S <ingemar.s.johansson@ericsson.com>, Applications and Real-Time Area Discussion <art@ietf.org>
Content-Type: multipart/alternative; boundary="00000000000002dc8605e573fca4"
Archived-At: <https://mailarchive.ietf.org/arch/msg/art/r2kc9pzG9JAcrIrrsiyQAdUepus>
Subject: Re: [art] Artart last call review of draft-ietf-tsvwg-ecn-l4s-id-27
Bob -- Thanks for the thoughtful and thorough response to my review. I have gone through the changes you suggest, and they look good to me. On Thu, Aug 4, 2022 at 4:57 PM Bob Briscoe <ietf@bobbriscoe.net> wrote: > Bernard, > > Thank you for taking the time to produce this extremely thorough review. > Pls see [BB] inline; > You will need an HTML email reader for the diffs in this email. > Alternatively, I've temporarily uploaded a side-by-side diff here: > > https://bobbriscoe.net/tmp/draft-ietf-tsvwg-ecn-l4s-id-28a-DIFF-27.html > > > On 30/07/2022 00:51, Bernard Aboba via Datatracker wrote: > > Reviewer: Bernard Aboba > Review result: On the Right Track > > Here are my review comments. I believe this is quite an important document, so > that making the reasoning as clear as possible is important. Unfortunately, > the writing and overall organization makes the document hard to follow. If the > authors are open to it, I'd be willing to invest more time to help get it into > shape. > > > [BB] Thank you. You have already obviously sunk considerable time into it. > Often I've found that your proposed alternative text didn't quite mean what > we intended. But I've taken this as a sign that we hadn't explained it well > and tried to guess what made you stumble. > > This draft is in the long tail of many statistics: number of years since > first draft, number of revisions, number of pages, etc. etc. > So I hope you will understand that this document has been knocked into all > sorts of different shapes already, during a huge amount of WG review and > consensus building, which I have tried not to upset, while also trying to > understand why you felt it needed further changes. > > Overall Comments > > Abstract > > Since this is an Experimental document, I was expecting the Abstract and > perhaps the Introduction to refer briefly to the considerations covered in > Section 7, (such as potential experiments and open issues). 
> > > [BB] Good point - I'm surprised no-one has brought this up before - > thanks. I'll add the following: > Abstract: > > ...to prevent it degrading the low queuing delay and > low loss of L4S traffic. This *experimental track* specification defines the rules that > L4S transports and network elements need to follow with the intention > that L4S flows neither harm each other's performance nor that of > Classic traffic. *It also suggests open questions to be investigated ** during experimentation.* Examples of new ... > > > Intro: > There wasn't really a relevant point to mention the Experiments section > (§7) until the document roadmap (which you ask for later). > So we added a brief summary of the "L4S Experiments" there (see later for > the actual text). The only change to the Intro was the first line: > > This *experimental track* specification... > > Organization and inter-relation between Sections > > The document has organizational issues which make it more difficult to read. > > I think that Section 1 should provide an overview of the specification, helping > the reader navigate it. > > > [BB] Section 3 already provides the basis of a roadmap to both this and > other documents. It points to §4 (Transports) & §5 (Network nodes). > It ought to have also referred to §6 (Tunnels and Encapsulations), which > was added to the draft fairly recently (but without updating this roadmap). > We can and should add that. > > We could even move §3 to be the last subsection of §1 (i.e. §1.4). Then it > could start the roadmap with §2, which gives the requirements for L4S > packet identification. > However, a number of other documents already refer to the Prague L4S > Requirements in §4, particularly §4.3. I mean not just I-Ds (which can > still be changed), but also papers that have already been published. So a > pragmatic compromise would be to just switch round sections 2 > (requirements) & 3 (roadmap). 
> > Then we could retitle §3 to "L4S Packet Identification: Document Roadmap" > and add brief mentions of the tail sections (§7 L4S Experiments, and the > usual IANA and Security Considerations). > The result is below, with manually added diff colouring (given we'd moved > the whole section as well, so it's not a totally precise diff). > > 2. L4S Packet Identification*: Document Roadmap* > > The L4S treatment is an experimental track alternative packet marking > treatment to the Classic ECN treatment in [RFC3168], which has been > updated by [RFC8311] to allow experiments such as the one defined in > the present specification. [RFC4774] discusses some of the issues > and evaluation criteria when defining alternative ECN semantics*, > which are further discussed in Section 4.3.1*. > * The L4S architecture [I-D.ietf-tsvwg-l4s-arch] describes the three > main components of L4S: the sending host behaviour, the marking > behaviour in the network and the L4S ECN protocol that identifies L4S > packets as they flow between the two. > ** The next section of the present document (Section 3) records the > requirements that informed the choice of L4S identifier. Then > subsequent sections specify the* L4S ECN *protocol, which i) *identifies > packets that have been sent from hosts that are expected to comply > with a broad type of sending behaviour; and ii) identifies the > marking treatment that network nodes are expected to apply to L4S > packets. > > For a packet to receive L4S treatment as it is forwarded, the sender > sets the ECN field in the IP header to the ECT(1) codepoint. See > Section 4 for full transport layer behaviour requirements, including > feedback and congestion response. > > A network node that implements the L4S service always classifies > arriving ECT(1) packets for L4S treatment and by default classifies > CE packets for L4S treatment unless the heuristics described in > Section 5.3 are employed. 
See Section 5 for full network element > behaviour requirements, including classification, ECN-marking and > interaction of the L4S identifier with other identifiers and per-hop > behaviours. > * L4S ECN works with ECN tunnelling and encapsulation behaviour as is, > except there is one known case where careful attention to > configuration is required, which is detailed in Section 6. > > ** L4S ECN is currently on the experimental track. So Section 7 > collects together the general questions and issues that remain open > for investigation during L4S experimentation. Open issues or > questions specific to particular components are called out in the > specifications of each component part, such as the DualQ > [I-D.ietf-tsvwg-aqm-dualq-coupled]. > > The IANA assignment of the L4S identifier is specified in > Section 8. And Section 9 covers security considerations specific to > the L4S identifier. System security aspects, such as policing and > privacy, are covered in the L4S architecture > [I-D.ietf-tsvwg-l4s-arch].* > > > > Section 1.1 refers to definitions in Section 1.2 so I'd suggest that > Section 1.2 might come first. > > > [BB] The reason for the Problem Statement being the first subsection was > because that's what motivates people to read on. > > Your suggestion has been made by others in the past, and the solution was > to informally explain new terms in the sections before the formal > terminology section, as they arose. > The formal terminology section can be considered as the end of the > Introductory material and the start of the formal body of the spec. > > If there are phrases that are not clearly explained before the terminology > section, pls do point them out. > We can reconsider moving the terminology section to 1.1 if there are a > lot. > But we'd rather the reader could continue straight into the summary of the > problem and that it is understandable stand-alone - without relying on > formal definitions elsewhere.
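The classification rule in the quoted roadmap text (ECT(1) always gets L4S treatment; CE gets L4S treatment by default unless the Section 5.3 heuristics say otherwise) can be illustrated with a minimal sketch. The codepoint values are those of the ECN field in RFC 3168. The function names and the boolean `ce_heuristic_says_classic` are hypothetical stand-ins for illustration only; the draft does not define them, and the actual Section 5.3 heuristics are not modelled here.

```python
from enum import IntEnum


class ECN(IntEnum):
    """ECN field codepoints (low two bits of the IP TOS / Traffic Class byte)."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


def ecn_codepoint(tos_byte: int) -> ECN:
    # The ECN field is the two least-significant bits of the byte.
    return ECN(tos_byte & 0b11)


def classify(tos_byte: int, ce_heuristic_says_classic: bool = False) -> str:
    """Return "L4S" or "Classic" for an arriving packet.

    ce_heuristic_says_classic is a hypothetical flag standing in for
    whatever Section 5.3 heuristic a node might employ for CE packets.
    """
    ecn = ecn_codepoint(tos_byte)
    if ecn is ECN.ECT_1:
        return "L4S"  # ECT(1) is always classified for L4S treatment
    if ecn is ECN.CE:
        # CE defaults to L4S unless a heuristic overrides it
        return "Classic" if ce_heuristic_says_classic else "L4S"
    return "Classic"  # Not-ECT and ECT(0)


if __name__ == "__main__":
    for tos in (0x00, 0x01, 0x02, 0x03):
        print(f"ECN={ecn_codepoint(tos).name:8s} -> {classify(tos)}")
```

The sketch only covers the identifier-based classification step; marking, feedback and the interaction with other identifiers are out of its scope.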
> > Section 1.3 provides basic information on Scope and the relationship of this > document to other documents. I was therefore expecting Section 7 to include > questions on some of the related documents (e.g. how L4S might be tested along > with RTP). > > > [BB] That isn't the role of this document, which would be too abstract (or > too long) if it had to cover how to test each different type of congestion > control and each type of AQM. > Quoting from §7: > > The specification of each scalable congestion control will need to > include protocol-specific requirements for configuration and > monitoring performance during experiments. Appendix A <https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-ecn-l4s-id-27#appendix-A> of the > guidelines in [RFC5706 <https://datatracker.ietf.org/doc/html/rfc5706>] provides a helpful checklist. > > > Over the last 3 months, everyone involved in interop testing has been > defining all the test plans, which had their first test-drive last week. > Indeed, the success of the planning and organization of the tests surprised > us all - kudos to Greg White, who was largely responsible for coordinating > it. > We may end up writing that all up as a separate draft. If many tests were > documented centrally like this, each CC or AQM might only need to identify > any special-case tests specific to itself. > That might even cover testing with live traffic over the Internet as well. > But let's walk before we run. > > > I wonder whether much of Section 2 could be combined with Appendix B, with the > remainder moved into the Introduction, which might also refer to Appendix B. > > > [BB] What is the problem that you are trying to solve by breaking up this > section? > > If we split up this section, someone else will want parts moved back, or > something else moved. Unless there's a major problem with this section, > we'd rather it stayed in one piece.
Its main purpose is to record the > requirements and to say (paraphrasing), "The outcome is a compromise > between requirements 'cos header space is limited. Other solutions were > considered, but this one was the least worst." > > Summary: no action here yet, pending motivating reasoning from your side. > > Section 4.2 > > RTP over UDP: A prerequisite for scalable congestion control is for > both (all) ends of one media-level hop to signal ECN > support [RFC6679] and use the new generic RTCP feedback format of > [RFC8888]. The presence of ECT(1) implies that both (all) ends of > that media-level hop support ECN. However, the converse does not > apply. So each end of a media-level hop can independently choose > not to use a scalable congestion control, even if both ends > support ECN. > > [BA] The document earlier refers to an L4S modified version of SCreAM, but does > not provide a reference. Since RFC 8888 is not deployed today, this paragraph > (and Section 7) leaves me somewhat unclear on the plan to evaluate L4S impact > on RTP. Or is the focus on experimentation with RTP over QUIC (e.g. > draft-ietf-avtcore-rtp-over-quic)? > > > [BB] Ingemar has given this reply: > [IJ] RFC8298 (SCReAM) in its current version does not describe support for > L4S. The open source running code on github does however support L4S. An > update of RFC8298 has lagged behind but I hope to start with an RFC8298-bis > after the vacation. > RFC8888 is implemented in the public available code for SCReAM ( > https://github.com/EricssonResearch/scream). This code has been > extensively used in demos of 5G Radio Access Networks with L4S capability. > The example demos have been cloud gaming and video streaming for remote > controlled cars. > The code includes gstreamer plugins as well as multi-camera code tailored > for NVidia Jetson Nano/Xavier NX (that can be easily modified for other > platforms). > > [BB] As an interim reference, Ingemar's README is already cited as > [SCReAM-L4S]. 
It is a brief but decent document about the L4S variant of > SCReAM, which also gives further references (and the open source code is > its own spec). > > Summary: The RFC 8888 part of this question seems to be about plans for > how the software for another RFC is expected to be installed or bundled. > Is this a question that you want this draft to answer? > > For instance, for DCTCP [RFC8257], TCP Prague > [I-D.briscoe-iccrg-prague-congestion-control], [PragueLinux] and the > L4S variant of SCReAM [RFC8298], the average recovery time is always > half a round trip (or half a reference round trip), whatever the flow > rate. > > [BA] I'm not sure that an L4S variant of SCReAM could really be considered > "scalable" where simulcast or scalable video coding was being sent. In these > scenarios, adding a layer causes a multiplicative increase in bandwidth, so > that "probing" (e.g. stuffing the channel with RTX probes or FEC) is often a > necessary precursor to make it possible to determine whether adding layers is > actually feasible. > > > [BB] Ingemar has given this reply: > [IJ] The experiments run so far with SCReAM have been with the NVENC > encoder, which supports rate changes on a frame by frame basis, and Jetson > Nano/Xavier NX/Xavier AGX that is a bit slower in its rate control loop. > So the actual probing is done by adjusting the target bitrate of the video > encoder. > > [BB] Since last week (in the first L4S interop), we now have 2 other > implementations of real-time video with L4S support directly over UDP (from > NVIDIA and Nokia); in addition to the original 2015 demo (also from Nokia). > You'd have to ask Ermin Sakic <esakic@nvidia.com> > about the NVIDIA coding, and similarly Koen De Schepper > <koen.de_schepper@nokia.com> about the Nokia > ones.
I do know that both Nokia ones change rate packet-by-packet (and if > channel conditions are poor, the new one can even reduce down to 500kb/s > while still preserving the same low latency). > > The message here is that, for low latency video, you can't just use any > old encoding that was designed without latency in mind. > Again, is this a question that you want this draft to answer? It seems > like something that would be discussed in the spec of each r-t CC technique. > > As with all transport behaviours, a detailed specification (probably > an experimental RFC) is expected for each congestion control, > following the guidelines for specifying new congestion control > algorithms in [RFC5033]. In addition it is expected to document > these L4S-specific matters, specifically the timescale over which the > proportionality is averaged, and control of burstiness. The recovery > time requirement above is worded as a 'SHOULD' rather than a 'MUST' > to allow reasonable flexibility for such implementations. > > [BA] Is the L4S variant of SCReAM one of the detailed specifications that is > going to be needed? From the text I wasn't sure whether this was documented > work-in-progress or a future work item. > > > [BB] We cannot force implementers to write open specifications of their > algorithms. Implementers might have secrecy constraints, or just not choose > to invest the time in spec writing. So there is no hit-list of specs that > 'MUST' be written, except we consider it proper to document the reference > implementation of the Prague CC. > Nonetheless, others also consider it proper to document their algorithm > (e.g. BBRv2), and in the case of SCReAM, Ingemar has promised he will (as > quoted above). > > We don't (yet?) have a description of the latest two implementations that > the draft can refer to (they only announced these on the first day of the > interop last week).
> We try to keep a living web page up to date that points to current > implementations ( https://l4s.net/#code ). However, I don't think the RFC > Editor would accept this as an archival reference. > > Section 4.3.1 > > To summarize, the coexistence problem is confined to cases of > imperfect flow isolation in an FQ, or in potential cases where a > Classic ECN AQM has been deployed in a shared queue (see the L4S > operational guidance [I-D.ietf-tsvwg-l4sops] for further details > including recent surveys attempting to quantify prevalence). > Further, if one of these cases does occur, the coexistence problem > does not arise unless sources of Classic and L4S flows are > simultaneously sharing the same bottleneck queue (e.g. different > applications in the same household) and flows of each type have to > be large enough to coincide for long enough for any throughput > imbalance to have developed. > > [BA] This seems to me to be one of the key questions that could limit the > "incremental deployment benefit". A reference to the discussion in Section 7 > might be appropriate here. > > > [BB] OK. At the end of the above para I've added: > > Therefore, how often the coexistence > problem arises in practice is listed in Section 7 as an open > question that L4S experiments will need to answer. > > 5.4.1.1.1. 'Safe' Unresponsive Traffic > > The above section requires unresponsive traffic to be 'safe' to mix > with L4S traffic. Ideally this means that the sender never sends any > sequence of packets at a rate that exceeds the available capacity of > the bottleneck link. However, typically an unresponsive transport > does not even know the bottleneck capacity of the path, let alone its > available capacity. Nonetheless, an application can be considered > safe enough if it paces packets out (not necessarily completely > regularly) such that its maximum instantaneous rate from packet to > packet stays well below a typical broadband access rate. 
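The 'safe enough' criterion quoted above (the maximum instantaneous rate from packet to packet stays well below a typical broadband access rate) can be made concrete with a small sketch. The draft text gives no numbers, so both `access_rate_bps` (50 Mb/s) and `margin` (1%, as a reading of "well below") are assumptions chosen purely for illustration, as are the function names.

```python
def max_instantaneous_rate_bps(packets):
    """Peak packet-to-packet rate in bit/s.

    packets: list of (send_time_seconds, size_bytes), sorted by send time.
    The rate for each packet is its size divided by the gap since the
    previous packet was sent.
    """
    peak = 0.0
    for (t_prev, _), (t_cur, size) in zip(packets, packets[1:]):
        gap = t_cur - t_prev
        if gap <= 0:
            return float("inf")  # back-to-back at the same instant: unbounded
        peak = max(peak, size * 8 / gap)
    return peak


def is_safe_unresponsive(packets, access_rate_bps=50e6, margin=0.01):
    """True if the peak rate stays below margin * access_rate_bps.

    Both defaults are hypothetical; the draft only says "well below a
    typical broadband access rate".
    """
    return max_instantaneous_rate_bps(packets) < access_rate_bps * margin


if __name__ == "__main__":
    paced = [(0.00, 100), (0.02, 100), (0.04, 100)]  # 100 B every 20 ms
    burst = [(0.0, 1500), (0.00001, 1500)]           # two MTUs 10 us apart
    print(is_safe_unresponsive(paced), is_safe_unresponsive(burst))
```

Note that the check looks at the gap between consecutive packets rather than a long-term average, which is why an encoder that hits a low average bitrate while bursting keyframes would not pass it.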
> > [BA] The problem with video traffic is that the encoder typically > targets an "average bitrate" resulting in a keyframe with a > bitrate that is above the bottleneck bandwidth and delta frames > that are below it. Since the "average rate" may not be > resettable before sending another keyframe, video has limited > ability to respond to congestion other than perhaps by dropping > simulcast and SVC layers. Does this mean that a video is > "Unsafe Unresponsive Traffic"? > > > [BB] This section on 'Safe' Unresponsive traffic is about traffic that is > so low rate that it doesn't need to use ECN to respond to congestion at all > (e.g. DNS, NTP). Video definitely does not fall into that category. > > I think your question is really asking whether video even /with/ ECN > support can be considered responsive enough to maintain low latency. For > this you ought to try to see the demonstration that Nokia did last week (if > a recording is put online) or the Ericsson demonstration which is already > online [EDT-5GLL]. Both over emulated 5G radio access networks with > variability of channel conditions, and both showed very fast interaction > within the video with no perceivable lag to the human eye. With the Nokia > one last week, using finger gestures sent over the radio network, you could > control the viewport into a video from a 360⁰ camera, which was calculated > and generated at the remote end. No matter how fast you shook your finger > around, the viewport stayed locked onto it. > > Regarding keyframes, for low latency video, these are generally spread > across the packets carrying the other frames. > > [EDT-5GLL] Ericsson and DT demo 5G low latency feature: > https://www.ericsson.com/en/news/2021/10/dt-and-ericsson-successfully-test-new-5g-low-latency-feature-for-time-critical-applications > > I detect here that this also isn't a question about the draft - more a > question of "I need to see it to believe it"? 
> > NITs > > Abstract > > The L4S identifier defined in this document distinguishes L4S from > 'Classic' (e.g. TCP-Reno-friendly) traffic. It gives an incremental > migration path so that suitably modified network bottlenecks can > distinguish and isolate existing traffic that still follows the > Classic behaviour, to prevent it degrading the low queuing delay and > low loss of L4S traffic. This specification defines the rules that > > [BA] Might be clearer to say "This allows suitably modified network..." > > > [BB] I'm not sure what the problem is. But I'm assuming you're saying you > tripped over the word 'gives'. How about simplifying: > > *It gives an incremental > migration path so that suitably modified Then,* network bottlenecks can *be incrementally modified to* > distinguish and isolate existing traffic that still follows the > Classic behaviour, to prevent it degrading the low queuing delay and > low loss of L4S traffic. > > > The words "incremental migration path" suggest that deployment of > L4S-capable network devices and endpoints provides incremental benefit. > That is, once new network devices are put in place (e.g. by replacing > a last-mile router), devices that are upgraded to support L4S will > see benefits, even if other legacy devices are not upgraded. > > If this is the point you are looking to make, you might want to clarify > the language. > > > [BB] I hope the above diff helps. Is that enough for an abstract, which > has to be kept very brief? > Especially as all the discussion about incremental deployment is in the > L4S architecture doc, so it wouldn't be appropriate to make deployment a > big thing in the abstract of this draft. > Nonetheless, we can flesh out the text where incremental deployment is > already mentioned in the intro (see our suggested text for your later point > about this, below). > > Summary: We propose only the above diff on these points about "incremental > migration" in the abstract.
> > L4S transports and network elements need to follow with the intention > that L4S flows neither harm each other's performance nor that of > Classic traffic. Examples of new active queue management (AQM) > marking algorithms and examples of new transports (whether TCP-like > or real-time) are specified separately. > > [BA] Don't understand "need to follow with the intention". Is this > stating a design principle, or does it represent deployment > guidance? > > > [BB] I think a missing comma is the culprit. Sorry for confusion. It > should be: > > This specification defines the rules that > L4S transports and network elements need to follow, with the intention > that L4S flows neither harm each other's performance nor that of > Classic traffic. > > > The sentence "L4S flows neither harm each other's performance nor that > of classic traffic" might be better placed after the first sentence > in the second paragraph, since it relates in part to the "incremental > deployment benefit" argument. > > > [BB] That wouldn't be appropriate, because: > * To prevent "Classic harms L4S" an L4S AQM needs the L4S identifier on > packets to isolate them > * To prevent "L4S harms Classic" needs the L4S sender to detect that it's > causing harm which is sender behaviour (rules), not identifier-based. > So the sentence has to come after the point about "the spec defines the > rules". > > Summary: we propose no action on this point. > > Section 1. Introduction > > This specification defines the protocol to be used for a new network > service called low latency, low loss and scalable throughput (L4S). > L4S uses an Explicit Congestion Notification (ECN) scheme at the IP > layer with the same set of codepoint transitions as the original (or > 'Classic') Explicit Congestion Notification (ECN [RFC3168]). > RFC 3168 required an ECN mark to be equivalent to a drop, both when > applied in the network and when responded to by a transport.
Unlike > Classic ECN marking, the network applies L4S marking more immediately > and more aggressively than drop, and the transport response to each > > [BA] Not sure what "aggressively" means here. In general, marking > traffic seems like a less aggressive action than dropping it. Do > you mean "more frequently"? > > > [BB] OK; 'frequently' it is. > > (FWIW, I recall that the transport response used to be described as more > aggressive (because it reduces less in response to each mark), and the idea > was that using aggressive for both would segue nicely into the next > sentence about the two counterbalancing. Someone asked for that to be > changed, and now the last vestiges of that failed literary device are cast > onto the cutting room floor. The moral of this tale: never try to write a > literary masterpiece by committee ;) > > Also, it's a bit of a run-on sentence, so I'd break it up: > > "than drop. The transport response to each" > > mark is reduced and smoothed relative to that for drop. The two > changes counterbalance each other so that the throughput of an L4S > flow will be roughly the same as a comparable non-L4S flow under the > same conditions. > > > [BB] Not sure about this - by the next sentence (about the two changes), > the reader has lost track of them. How about using numbering to structure > the long sentence: > > Unlike > Classic ECN marking: i) the network applies L4S marking more immediately > and more aggressively than drop; and ii) the transport response to each > mark is reduced and smoothed relative to that for drop. The two > changes counterbalance each other... > > OK? > > Nonetheless, the much more frequent ECN control > signals and the finer responses to these signals result in very low > queuing delay without compromising link utilization, and this low > delay can be maintained during high load. 
For instance, queuing > delay under heavy and highly varying load with the example DCTCP/ > DualQ solution cited below on a DSL or Ethernet link is sub- > millisecond on average and roughly 1 to 2 milliseconds at the 99th > percentile without losing link utilization [DualPI2Linux], [DCttH19]. > > [BA] I'd delete "cited below" since you provide the citation at > the end of the sentence. > > > [BB] 'Cited below' referred to the DCTCP and DualQ citations in the > subsequent para, because this is the first time either term has been > mentioned. > '*Described* below' > was what was really meant. I think that makes it clear enough (?). > > Note that the inherent queuing delay while waiting to acquire a > discontinuous medium such as WiFi has to be minimized in its own > right, so it would be additional to the above (see section 6.3 of the > L4S architecture [I-D.ietf-tsvwg-l4s-arch]). > > [BA] Not sure what "discontinuous medium" means. Do you mean > wireless? Also "WiFi" is a colloquialism; the actual standard > is IEEE 802.11 (WiFi Alliance is an industry organization). > Might reword this as follows: > > "Note that the changes proposed here do not lessen delays from > accessing the medium (such as is experienced in [IEEE-802.11]). > For discussion, see Section 6.3 of the L4S architecture > [I-D.ietf-tsvwg-l4s-arch]." > > > [BB] We've used 'shared' instead. Other examples of shared media are LTE, > 5G, DOCSIS (cable), DVB (satellite), PON (passive optical network). So I've > just said 'wireless' rather than give a gratuitous citation of 802.11. > > Note that the inherent queuing delay while waiting to acquire a > discontinuous > *shared* medium such as WiFi *wireless* has to be minimized in its own > right, so it would be additional *added* to the above *above. It is > a different issue that needs to be addressed, but separately* (see > section 6.3 of the L4S architecture [I-D.ietf-tsvwg-l4s-arch]). 
> > > Then, because wireless is less specific, I've taken out 'inherent' because > strictly medium acquisition delay is not inherent to a medium - it depends > on the multiplexing scheme. For instance radio networks can use CDM (code > division multiplexing), and they did in 3G. > 'Inherent' was trying to get over the sense that this delay is not > amenable to reduction by congestion control. Rather than try to cram all > those concepts into one sentence, I've split it. > > OK? > > L4S is not only for elastic (TCP-like) traffic - there are scalable > congestion controls for real-time media, such as the L4S variant of > the SCReAM [RFC8298] real-time media congestion avoidance technique > (RMCAT). The factor that distinguishes L4S from Classic traffic is > > [BA] Is there a document that defines the L4S variant of SCReAM? > > > [BB] I've retagged Ingemar's readme as [SCReAM-L4S], and included it here > to match the other two occurrences of SCReAM: > > such as the L4S variant > *[SCReAM-L4S]* of the SCReAM [RFC8298] real-time media congestion > avoidance technique (RMCAT). > > > It sounds like Ingemar plans to update RFC8298 with a bis, so I guess > eventually [RFC8298] should automatically become a reference to its own > update. > > its behaviour in response to congestion. The transport wire > protocol, e.g. TCP, QUIC, SCTP, DCCP, RTP/RTCP, is orthogonal (and > therefore not suitable for distinguishing L4S from Classic packets). > > The L4S identifier defined in this document is the key piece that > distinguishes L4S from 'Classic' (e.g. Reno-friendly) traffic. It > gives an incremental migration path so that suitably modified network > bottlenecks can distinguish and isolate existing Classic traffic from > L4S traffic to prevent the former from degrading the very low delay > and loss of the new scalable transports, without harming Classic > performance at these bottlenecks. 
> Initial implementation of the
> separate parts of the system has been motivated by the performance
> benefits.
>
> [BA] I think you are making an "incremental benefit" argument here,
> but it might be made more explicit:
>
> " The L4S identifier defined in this document distinguishes L4S from
> 'Classic' (e.g. Reno-friendly) traffic. This allows suitably
> modified network bottlenecks to distinguish and isolate existing
> Classic traffic from L4S traffic, preventing the former from
> degrading the very low delay and loss of the new scalable
> transports, without harming Classic performance. As a result,
> deployment of L4S in network bottlenecks provides incremental
> benefits to endpoints whose transports support L4S."
>
> [BB] We don't really want to lose the point about the identifier being
> key. So I've kept that. And for the middle sentence, I've used the simpler
> construction developed above (for the similar wording in the abstract).
>
> Regarding the last sentence, no, it meant more than that. It meant that,
> even though implementer's customers get no benefit until both parts are
> deployed, for some implementers the 'size of the potential prize' has
> already been great enough to warrant investment in implementing their part,
> without any guarantee that other parts will be implemented. However, we
> need to be careful not to stray into conjecture and predictions,
> particularly not commercial ones, which is why this sentence was written in
> the past tense. Pulling this all together, how about:
>
> The L4S identifier defined in this document is the key piece that
> distinguishes L4S from 'Classic' (e.g. Reno-friendly) traffic.
> It
> gives an incremental migration path so that suitably modified *Then,*
> network bottlenecks can *be incrementally modified to* distinguish and
> isolate existing Classic traffic from L4S traffic*,* to prevent the
> former from degrading the very low *queuing* delay and loss of the new
> scalable transports, without harming Classic performance at these
> bottlenecks. *Although both sender and network deployment are
> required before any benefit, i*nitial implementations of the separate
> parts of the system *have* been motivated by the *potential* performance
> benefits.
>
> I considered adding "have *already* been motivated..." or "*at the time
> of writing,* initial implementations..." but decided against both - they
> sounded a bit hyped up.
> What do you think?
>
> Section 1.1 1.1. Latency, Loss and Scaling Problems
>
> Latency is becoming the critical performance factor for many (most?)
> applications on the public Internet, e.g. interactive Web, Web
> services, voice, conversational video, interactive video, interactive
> remote presence, instant messaging, online gaming, remote desktop,
> cloud-based applications, and video-assisted remote control of
> machinery and industrial processes. In the 'developed' world,
> further increases in access network bit-rate offer diminishing
> returns, whereas latency is still a multi-faceted problem. In the
> last decade or so, much has been done to reduce propagation time by
> placing caches or servers closer to users. However, queuing remains
> a major intermittent component of latency.
>
> [BA] Since this paragraph provides context for the work, you might
> consider placing it earlier (in Section 1 as well as potentially in
> the Abstract).
>
> [BB] The L4S architecture Intro already starts like you suggest.
> See
> https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-l4s-arch-19#section-1
>
> The present doc starts out more as a technical spec might, with a 4-para
> intro focusing on what it says technically. Then it has a fairly long
> subsection to summarize the problem for those reading it stand-alone. That
> is intentional (so readers who have already read the architecture can
> easily jump).
>
> Summary: We propose to leave the opening of the intro unchanged.
>
> Might modify this as follows:
>
> "
> Latency is the critical performance factor for many Internet
> applications, including web services, voice, realtime video,
> remote presence, instant messaging, online gaming, remote
> desktop, cloud services, and remote control of machinery and
> industrial processes. In these applications, increases in access
> network bitrate may offer diminishing returns. As a result,
> much has been done to reduce delays by placing caches or
> servers closer to users. However, queuing remains a major
> contributor to latency."
>
> We've picked up most, but not all, of your suggestions:
>
> Latency is becoming the critical performance factor for many (most?)
> applications on the public Internet,
> *Internet applications,* e.g. interactive Web, Web *web, web* services, voice,
> conversational video, interactive video, interactive remote presence,
> instant messaging, online gaming, remote desktop, cloud-based applications,
> *applications & services,* and video-assisted remote control of machinery and
> industrial processes. In *many parts of* the 'developed' world, further increases
> in access network bit-rate *bit rate* offer diminishing
> returns *[Dukkipati06],* whereas latency is still a multi-faceted problem. In the
> last decade or so, *As a result,* much
> has been done to reduce propagation time by placing caches or servers
> closer to users. However, queuing remains a major intermittent *major, albeit
> intermittent,* component of latency.
>
> We've added [Dukkipati06], because we were asked to justify the similar
> 'diminishing returns' claim in the L4S architecture, and Dukkipati06
> provides a plot supporting that in its intro:
>
> *[Dukkipati06]
> Dukkipati, N. and N. McKeown, "Why Flow-Completion Time is
> the Right Metric for Congestion Control", ACM CCR
> 36(1):59--62, January 2006,
> <https://dl.acm.org/doi/10.1145/1111322.1111336>.*
>
> The distinctions between different applications of the same technology
> were deliberately intended to distinguish different degrees of latency
> sensitivity, so we left some of them in.
> OK?
>
> The Diffserv architecture provides Expedited Forwarding [RFC3246], so
> that low latency traffic can jump the queue of other traffic. If
> growth in high-throughput latency-sensitive applications continues,
> periods with solely latency-sensitive traffic will become
> increasingly common on links where traffic aggregation is low. For
> instance, on the access links dedicated to individual sites (homes,
> small enterprises or mobile devices). These links also tend to
> become the path bottleneck under load. During these periods, if all
> the traffic were marked for the same treatment, at these bottlenecks
> Diffserv would make no difference. Instead, it becomes imperative to
> remove the underlying causes of any unnecessary delay.
>
> [BA] This paragraph is hard to follow. You might consider rewriting it as
> follows:
>
> "The Diffserv architecture provides Expedited Forwarding [RFC3246], to
> enable low latency traffic to jump the queue of other traffic. However,
> the latency-sensitive applications are growing in number along
> with the fraction of latency-sensitive traffic. On bottleneck links where
> traffic aggregation is low (such as links to homes, small enterprises or
> mobile devices), if all traffic is marked for the same treatment, Diffserv
> will not make a difference.
> Instead, it is necessary to remove unnecessary
> delay."
>
> [BB] Your proposed replacement has the following problems:
> * It relies on prediction (the previous text avoided prediction, instead
> saying "if growth ... continues");
> * The proposed replacement loses the critical sense of "periods with
> solely latency sensitive traffic" (not all the time)
> * it also loses the critical idea that the same links that are low stat
> mux tend to also be those where the bottleneck is.
> How about:
>
> The Diffserv architecture provides Expedited Forwarding [RFC3246], so
> that low latency traffic can jump the queue of other traffic. If
> growth in high-throughput latency-sensitive applications continues, periods with
> solely latency-sensitive traffic will become increasingly common on
> links where traffic aggregation is low. For
> instance, on the access links dedicated to individual sites (homes,
> small enterprises or mobile devices). These links also tend to
> become the path bottleneck under load. During these periods, if all *During these periods, if all*
> the traffic were marked for the same treatment, at these bottlenecks Diffserv would make
> no difference. Instead, *The links with low aggregation also tend to become
> the path bottleneck under load, for instance, the access links
> dedicated to individual sites (homes, small enterprises or mobile
> devices). So, instead of differentiation,* it becomes imperative to
> remove the underlying causes of any unnecessary delay.
>
> I tried to guess what you found hard to follow, but still to keep all the
> concepts. The main changes were:
> * to switch the sentence order so "periods with solely" and "these
> periods" were not a few sentences apart.
> * to make it clear what 'instead' meant.
> Better?
>
> long enough for the queue to fill the buffer, making every packet in
> other flows sharing the buffer sit through the queue.
>
> [BA] "sit through" -> "share"
>
> [BB] Nah, that's tautology "other flows sharing the buffer share the
> queue".
> And it loses the sense of waiting. If "sit through" isn't understandable,
> how about
>
> "...causing every packet in other flows sharing the buffer to have to
> work its way through the queue."
> ?
>
> Active queue management (AQM) was originally developed to solve this
> problem (and others). Unlike Diffserv, which gives low latency to
> some traffic at the expense of others, AQM controls latency for _all_
> traffic in a class. In general, AQM methods introduce an increasing
> level of discard from the buffer the longer the queue persists above
> a shallow threshold. This gives sufficient signals to capacity-
> seeking (aka. greedy) flows to keep the buffer empty for its intended
> purpose: absorbing bursts. However, RED [RFC2309] and other
> algorithms from the 1990s were sensitive to their configuration and
> hard to set correctly. So, this form of AQM was not widely deployed.
>
> More recent state-of-the-art AQM methods, e.g. FQ-CoDel [RFC8290],
> PIE [RFC8033], Adaptive RED [ARED01], are easier to configure,
> because they define the queuing threshold in time not bytes, so it is
> invariant for different link rates. However, no matter how good the
> AQM, the sawtoothing sending window of a Classic congestion control
> will either cause queuing delay to vary or cause the link to be
> underutilized. Even with a perfectly tuned AQM, the additional
> queuing delay will be of the same order as the underlying speed-of-
> light delay across the network, thereby roughly doubling the total
> round-trip time.
>
> [BA] Would suggest rewriting as follows:
>
> " More recent state-of-the-art AQM methods such as FQ-CoDel [RFC8290],
> PIE [RFC8033] and Adaptive RED [ARED01], are easier to configure,
> because they define the queuing threshold in time not bytes, providing
> link rate invariance.
> However, AQM does not change the "sawtooth"
> sending behavior of Classic congestion control algorithms, which
> alternates between varying queuing delay and link underutilization.
> Even with a perfectly tuned AQM, the additional queuing delay will
> be of the same order as the underlying speed-of-light delay across
> the network, thereby roughly doubling the total round-trip time."
>
> [BB] We've taken most of these suggestions, but link rate invariance is
> rather a mouthful.
> Also more queue delay or more under-utilization wasn't meant to imply
> alternating between the two.
> So how about:
>
> More recent state-of-the-art AQM methods, e.g. *such as* FQ-CoDel [RFC8290],
> PIE [RFC8033] *or* Adaptive RED [ARED01], are easier to configure,
> because they define the queuing threshold in time not bytes, so it
> *configuration* is invariant for different *whatever the* link rates. *rate.* However, no matter how good the
> AQM, the
> sawtoothing sending window of a Classic congestion control *creates a dilemma
> for the operator: i)* either *configure a shallow AQM operating point,
> so the tips of the sawteeth* cause *minimal queue* delay *but the troughs
> underutilize the link,* or *ii) configure the operating point deeper
> into the buffer, so the troughs utilize* the link *better but then the
> tips cause more delay variation.* Even...
>
> OK?
>
> If a sender's own behaviour is introducing queuing delay variation,
> no AQM in the network can 'un-vary' the delay without significantly
> compromising link utilization. Even flow-queuing (e.g. [RFC8290]),
> which isolates one flow from another, cannot isolate a flow from the
> delay variations it inflicts on itself. Therefore those applications
> that need to seek out high bandwidth but also need low latency will
> have to migrate to scalable congestion control.
>
> [BA] I'd suggest you delete the last sentence, since the point is
> elaborated on in more detail in the next paragraph.
>
> [BB] Actually, this point is not made in the next para (but you might have
> thought it was because it's not clear, so below I've tried to fix it).
> Indeed, I've realized we need to /add/ to the last sentence, because we
> haven't yet said what a scalable control is...
>
> ...migrate to scalable congestion control*, which uses much smaller
> sawtooth variations*.
>
> Altering host behaviour is not enough on its own though. Even if
> hosts adopt low latency behaviour (scalable congestion controls),
> they need to be isolated from the behaviour of existing Classic
> congestion controls that induce large queue variations. L4S enables
> that migration by providing latency isolation in the network and
>
> [BA] "enables that migration" -> "motivates incremental deployment"
>
> distinguishing the two types of packets that need to be isolated: L4S
> and Classic. L4S isolation can be achieved with a queue per flow
> (e.g. [RFC8290]) but a DualQ [I-D.ietf-tsvwg-aqm-dualq-coupled] is
> sufficient, and actually gives better tail latency. Both approaches
> are addressed in this document.
>
> [BB] The intended meaning here is 'enables' (technical feasibility), not
> motivates (human inclination).
> But whatever, in the rewording below, I don't think either is needed. I'm
> also assuming that middle sentence didn't make sense for you, and I think I
> see why. So how about:
>
> Altering host behaviour is not enough on its own though. Even if
> hosts adopt low latency behaviour (scalable congestion controls), they need to be
> isolated from the behaviour of *large queue variations induced by* existing Classic
> congestion controls that induce large queue variations. L4S enables that migration
> by providing *L4S AQMs provide that* latency isolation in the network and
> distinguishing *the L4S identifier enables the AQMs to distinguish* the
> two types of packets that need to be isolated: L4S and Classic.
>
> How's that?
>
> The DualQ solution was developed to make very low latency available
> without requiring per-flow queues at every bottleneck. This was
>
> [BA] "This was" -> "This was needed"
>
> [BB] Not quite that strong. More like:
> "This was useful"
>
> Latency is not the only concern addressed by L4S: It was known when
>
> [BA] ":" -> "."
>
> [BB] OK.
>
> explanation is summarised without the maths in Section 4 of the L4S
>
> [BA] "summarised without the maths" -> "summarized without the mathematics"
>
> [BB] OK - that nicely side-steps stumbles from either side of the Atlantic.
>
> 1.2. Terminology
>
> [BA] Since Section 1.1 refers to some of the Terminology defined in
> this section, I'd consider placing this section before that one.
>
> [BB] See earlier for push-back on this.
>
> Reno-friendly: The subset of Classic traffic that is friendly to the
> standard Reno congestion control defined for TCP in [RFC5681].
> The TFRC spec. [RFC5348] indirectly implies that 'friendly' is
>
> [BA] "spec." -> "specification"
>
> [BB] I checked this after a previous review comment, and 'spec' is now
> considered to be a word in its own right. I should have removed the
> full-stop though, which I did for all other occurrences.
> However, the RFC Editor might have a style preference on this point, in
> which case I will acquiesce.
>
> defined as "generally within a factor of two of the sending rate
> of a TCP flow under the same conditions". Reno-friendly is used
> here in place of 'TCP-friendly', given the latter has become
> imprecise, because the TCP protocol is now used with so many
> different congestion control behaviours, and Reno is used in non-
>
> [BA] "Reno is used" -> "Reno can be used"
>
> [BB] OK
>
> 4. Transport Layer Behaviour (the 'Prague Requirements')
>
> [BA] This section is empty and there are no previous references to Prague. So I
> think you need to say a few words here to introduce the section.
>
> [BB] OK.
> How about:
>
> * This section defines L4S behaviour at the transport layer, also known
> as the Prague L4S Requirements (see Appendix A for the origin of the
> name). *
>
> Again, thank you very much for all the time and effort you've put into
> this review.
>
> Regards
>
> Bob
>
> --
> ________________________________________________________________
> Bob Briscoe http://bobbriscoe.net/