Re: [tsvwg] Artart last call review of draft-ietf-tsvwg-ecn-l4s-id-27

Bernard Aboba <bernard.aboba@gmail.com> Fri, 05 August 2022 01:00 UTC

From: Bernard Aboba <bernard.aboba@gmail.com>
Date: Thu, 04 Aug 2022 17:59:44 -0700
Message-ID: <CAOW+2duaq8M1gwZ5snmhhibnNMEywn-5B=C2awAiE6akcxYyBw@mail.gmail.com>
To: Bob Briscoe <ietf@bobbriscoe.net>
Cc: draft-ietf-tsvwg-ecn-l4s-id.all@ietf.org, last-call@ietf.org, tsvwg IETF list <tsvwg@ietf.org>, Ingemar Johansson S <ingemar.s.johansson@ericsson.com>, Applications and Real-Time Area Discussion <art@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/ml_qL89tJ4J7R-S7OYlNLOqw74I>
Subject: Re: [tsvwg] Artart last call review of draft-ietf-tsvwg-ecn-l4s-id-27

Bob --

Thanks for the thoughtful and thorough response to my review.  I have gone
through the changes you suggest, and they look good to me.

On Thu, Aug 4, 2022 at 4:57 PM Bob Briscoe <ietf@bobbriscoe.net> wrote:

> Bernard,
>
> Thank you for taking the time to produce this extremely thorough review.
> Pls see [BB] inline;
> You will need an HTML email reader for the diffs in this email.
> Alternatively, I've temporarily uploaded a side-by-side diff here:
>
> https://bobbriscoe.net/tmp/draft-ietf-tsvwg-ecn-l4s-id-28a-DIFF-27.html
>
>
> On 30/07/2022 00:51, Bernard Aboba via Datatracker wrote:
>
> Reviewer: Bernard Aboba
> Review result: On the Right Track
>
> Here are my review comments.  I believe this is quite an important document, so
> making the reasoning as clear as possible is important.  Unfortunately,
> the writing and overall organization make the document hard to follow. If the
> authors are open to it, I'd be willing to invest more time to help get it into
> shape.
>
>
> [BB] Thank you. You have already obviously sunk considerable time into it.
> Often I've found that your proposed alternative text didn't quite mean what
> we intended. But I've taken this as a sign that we hadn't explained it well
> and tried to guess what made you stumble.
>
> This draft is in the long tail of many statistics: number of years since
> first draft, number of revisions, number of pages, etc. etc.
> So I hope you will understand that this document has been knocked into all
> sorts of different shapes already, during a huge amount of WG review and
> consensus building, which I have tried not to upset, while also trying to
> understand why you felt it needed further changes.
>
> Overall Comments
>
> Abstract
>
> Since this is an Experimental document, I was expecting the Abstract and
> perhaps the Introduction to refer briefly to the considerations covered in
> Section 7, (such as potential experiments and open issues).
>
>
> [BB] Good point - I'm surprised no-one has brought this up before -
> thanks. I'll add the following:
> Abstract:
>
>                    ...to prevent it degrading the low queuing delay and
>    low loss of L4S traffic.  This *experimental track* specification defines the rules that
>    L4S transports and network elements need to follow with the intention
>    that L4S flows neither harm each other's performance nor that of
>    Classic traffic.  *It also suggests open questions to be investigated **   during experimentation.*  Examples of new ...
>
>
> Intro:
> There wasn't really a relevant point to mention the Experiments section
> (§7) until the document roadmap (which you ask for later).
> So we added a brief summary of the "L4S Experiments" there (see later for
> the actual text). The only change to the Intro was the first line:
>
>     This *experimental track* specification...
>
> Organization and inter-relation between Sections
>
> The document has organizational issues which make it more difficult to read.
>
> I think that Section 1 should provide an overview of the specification, helping
> the reader navigate it.
>
>
> [BB]  Section 3 already  provides the basis of a roadmap to both this and
> other documents. It points to §4 (Transports) & §5 (Network nodes).
> It ought to have also referred to §6 (Tunnels and Encapsulations), which
> was added to the draft fairly recently (but without updating this roadmap).
> We can and should add that.
>
> We could even move §3 to be the last subsection of §1 (i.e. §1.4). Then it
> could start the roadmap with §2, which gives the requirements for L4S
> packet identification.
> However, a number of other documents already refer to the Prague L4S
> Requirements in §4, particularly §4.3. I mean not just I-Ds (which can
> still be changed), but also papers that have already been published. So a
> pragmatic compromise would be to just switch round sections 2
> (requirements) & 3 (roadmap).
>
> Then we could retitle §3 to "L4S Packet Identification: Document Roadmap"
> and add brief mentions of the tail sections (§7 L4S Experiments, and the
> usual IANA and Security Considerations).
> The result is below, with manually added diff colouring (given we'd moved
> the whole section as well, so it's not a totally precise diff).
>
> 2.  L4S Packet Identification*: Document Roadmap*
>
>    The L4S treatment is an experimental track alternative packet marking
>    treatment to the Classic ECN treatment in [RFC3168], which has been
>    updated by [RFC8311] to allow experiments such as the one defined in
>    the present specification.  [RFC4774] discusses some of the issues
>    and evaluation criteria when defining alternative ECN semantics*,
>    which are further discussed in Section 4.3.1*.
> *   The L4S architecture [I-D.ietf-tsvwg-l4s-arch] describes the three
>    main components of L4S: the sending host behaviour, the marking
>    behaviour in the network and the L4S ECN protocol that identifies L4S
>    packets as they flow between the two.
> **   The next section of the present document (Section 3) records the
>    requirements that informed the choice of L4S identifier.  Then
>    subsequent sections specify the* L4S ECN *protocol, which i) *identifies
>    packets that have been sent from hosts that are expected to comply
>    with a broad type of sending behaviour; and ii) identifies the
>    marking treatment that network nodes are expected to apply to L4S
>    packets.
>
>    For a packet to receive L4S treatment as it is forwarded, the sender
>    sets the ECN field in the IP header to the ECT(1) codepoint.  See
>    Section 4 for full transport layer behaviour requirements, including
>    feedback and congestion response.
>
>    A network node that implements the L4S service always classifies
>    arriving ECT(1) packets for L4S treatment and by default classifies
>    CE packets for L4S treatment unless the heuristics described in
>    Section 5.3 are employed.  See Section 5 for full network element
>    behaviour requirements, including classification, ECN-marking and
>    interaction of the L4S identifier with other identifiers and per-hop
>    behaviours.
> *   L4S ECN works with ECN tunnelling and encapsulation behaviour as is,
>    except there is one known case where careful attention to
>    configuration is required, which is detailed in Section 6.
>
> **   L4S ECN is currently on the experimental track.  So Section 7
>    collects together the general questions and issues that remain open
>    for investigation during L4S experimentation.  Open issues or
>    questions specific to particular components are called out in the
>    specifications of each component part, such as the DualQ
>    [I-D.ietf-tsvwg-aqm-dualq-coupled].
>
>    The IANA assignment of the L4S identifier is specified in
>    Section 8.  And Section 9 covers security considerations specific to
>    the L4S identifier.  System security aspects, such as policing and
>    privacy, are covered in the L4S architecture
>    [I-D.ietf-tsvwg-l4s-arch].*
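>
> (Purely as an illustration of the classification rule described in the
> roadmap text above, not part of the proposed draft text: a minimal Python
> sketch. The codepoint values are those defined in RFC 3168; the heuristic
> flag stands in for the optional Section 5.3 behaviour and is otherwise
> invented for this example.)
>
>     # ECN codepoints in the 2-bit ECN field of the IP header (RFC 3168)
>     NOT_ECT = 0b00   # Not ECN-Capable Transport
>     ECT_1   = 0b01   # ECT(1): identifies L4S packets in this experiment
>     ECT_0   = 0b10   # ECT(0): Classic ECN-capable traffic
>     CE      = 0b11   # Congestion Experienced
>
>     def classify_l4s(ecn_bits, ce_heuristic_says_classic=False):
>         """Return True if a packet should receive the L4S treatment.
>
>         ECT(1) is always classified as L4S; CE is classified as L4S by
>         default, unless an optional heuristic (Section 5.3) judges the
>         mark to have come from a Classic ECN AQM upstream.
>         """
>         if ecn_bits == ECT_1:
>             return True
>         if ecn_bits == CE:
>             return not ce_heuristic_says_classic
>         return False  # Not-ECT and ECT(0) receive the Classic treatment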
>
>
>
> Section 1.1 refers to definitions in Section 1.2 so I'd suggest that that
> Section 1.2 might be come first.
>
>
> [BB] The reason for the Problem Statement being the first subsection was
> because that's what motivates people to read on.
>
> Your suggestion has been made by others in the past, and the solution was
> to informally explain new terms in the sections before the formal
> terminology section, as they arose.
> The formal terminology section can be considered as the end of the
> Introductory material and the start of the formal body of the spec.
>
> If there are phrases that are not clearly explained before the terminology
> section, pls do point them out.
> We can reconsider moving the terminology section to 1.1 if there are a
> lot.
> But we'd rather the reader could continue straight into the summary of the
> problem and that it is understandable stand-alone - without relying on
> formal definitions elsewhere.
>
> Section 1.3 provides basic information on Scope and the relationship of this
> document to other documents.  I was therefore expecting Section 7 to include
> questions on some of the related documents (e.g. how L4S might be tested along
> with RTP).
>
>
> [BB] That isn't the role of this document, which would be too abstract (or
> too long) if it had to cover how to test each different type of congestion
> control and each type of AQM.
> Quoting from §7:
>
>    The specification of each scalable congestion control will need to
>    include protocol-specific requirements for configuration and
>    monitoring performance during experiments.  Appendix A of the
>    guidelines in [RFC5706] provides a helpful checklist.
>
>
> Over the last 3 months, everyone involved in interop testing has been
> defining all the test plans, which had their first test-drive last week.
> Indeed, the success of the planning and organization of the tests surprised
> us all - kudos to Greg White, who was largely responsible for coordinating
> it.
> We may end up writing that all up as a separate draft. If many tests were
> documented centrally like this, each CC or AQM might only need to identify
> any special-case tests specific to itself.
> That might even cover testing with live traffic over the Internet as well.
> But let's walk before we run.
>
>
> I wonder whether much of Section 2 could be combined with Appendix B, with the
> remainder moved into the Introduction, which might also refer to Appendix B.
>
>
> [BB] What is the problem that you are trying to solve by breaking up this
> section?
>
> If we split up this section, someone else will want parts moved back, or
> something else moved. Unless there's a major problem with this section,
> we'd rather it stayed in one piece. Its main purpose is to record the
> requirements and to say (paraphrasing), "The outcome is a compromise
> between requirements 'cos header space is limited. Other solutions were
> considered, but this one was the least worst."
>
> Summary: no action here yet, pending motivating reasoning from your side.
>
> Section 4.2
>
>    RTP over UDP:  A prerequisite for scalable congestion control is for
>       both (all) ends of one media-level hop to signal ECN
>       support [RFC6679] and use the new generic RTCP feedback format of
>       [RFC8888].  The presence of ECT(1) implies that both (all) ends of
>       that media-level hop support ECN.  However, the converse does not
>       apply.  So each end of a media-level hop can independently choose
>       not to use a scalable congestion control, even if both ends
>       support ECN.
>
> [BA] The document earlier refers to an L4S modified version of SCreAM, but does
> not provide a reference.  Since RFC 8888 is not deployed today, this paragraph
> (and Section 7) leaves me somewhat unclear on the plan to evaluate L4S impact
> on RTP. Or is the focus on experimentation with RTP over QUIC (e.g.
> draft-ietf-avtcore-rtp-over-quic)?
>
>
> [BB] Ingemar has given this reply:
> [IJ] RFC8298 (SCReAM) in its current version does not describe support for
> L4S. The open source running code on github does however support L4S. An
> update of RFC8298 has lagged behind but I hope to start with an RFC8298-bis
> after the vacation.
> RFC8888 is implemented in the publicly available code for SCReAM (
> https://github.com/EricssonResearch/scream). This code has been
> extensively used in demos of 5G Radio Access Networks with L4S capability.
> The example demos have been cloud gaming and video streaming for remote
> controlled cars.
> The code includes gstreamer plugins as well as multi-camera code tailored
> for NVidia Jetson Nano/Xavier NX (that can be easily modified for other
> platforms).
>
> [BB] As an interim reference, Ingemar's README is already cited as
> [SCReAM-L4S]. It is a brief but decent document about the L4S variant of
> SCReAM, which also gives further references (and the open source code is
> its own spec).
>
> Summary: The RFC 8888 part of this question seems to be about plans for
> how the software for another RFC is expected to be installed or bundled.
> Is this a question that you want this draft to answer?
>
>    For instance, for DCTCP [RFC8257], TCP Prague
>    [I-D.briscoe-iccrg-prague-congestion-control], [PragueLinux] and the
>    L4S variant of SCReAM [RFC8298], the average recovery time is always
>    half a round trip (or half a reference round trip), whatever the flow
>    rate.
>
> [BA] I'm not sure that an L4S variant of SCReAM could really be considered
> "scalable" where simulcast or scalable video coding was being sent. In these
> scenarios, adding a layer causes a multiplicative increase in bandwidth, so
> that "probing" (e.g. stuffing the channel with RTX probes or FEC) is often a
> necessary precursor to make it possible to determine whether adding layers is
> actually feasible.
>
>
> [BB] Ingemar has given this reply:
> [IJ] The experiments run so far with SCReAM have been with the NVENC
> encoder, which supports rate changes on a frame-by-frame basis, and the Jetson
> Nano/Xavier NX/Xavier AGX, which is a bit slower in its rate control loop.
> So the actual probing is done by adjusting the target bitrate of the video
> encoder.
>
> [BB] Since last week (in the first L4S interop), we now have 2 other
> implementations of real-time video with L4S support directly over UDP (from
> NVIDIA and Nokia), in addition to the original 2015 demo (also from Nokia).
> You'd have to ask Ermin Sakic <esakic@nvidia.com>
> about the NVIDIA coding, and similarly Koen De Schepper
> <koen.de_schepper@nokia.com> about the Nokia
> ones. I do know that both Nokia ones change rate packet-by-packet (and if
> channel conditions are poor, the new one can even reduce down to 500kb/s
> while still preserving the same low latency).
>
> The message here is that, for low latency video, you can't just use any
> old encoding that was designed without latency in mind.
> Again, is this a question that you want this draft to answer? It seems
> like something that would be discussed in the spec of each r-t CC technique.
>
>    As with all transport behaviours, a detailed specification (probably
>    an experimental RFC) is expected for each congestion control,
>    following the guidelines for specifying new congestion control
>    algorithms in [RFC5033].  In addition it is expected to document
>    these L4S-specific matters, specifically the timescale over which the
>    proportionality is averaged, and control of burstiness.  The recovery
>    time requirement above is worded as a 'SHOULD' rather than a 'MUST'
>    to allow reasonable flexibility for such implementations.
>
> [BA] Is the L4S variant of SCReAM one of the detailed specifications that is
> going to be needed? From the text I wasn't sure whether this was documented
> work-in-progress or a future work item.
>
>
> [BB] We cannot force implementers to write open specifications of their
> algorithms. Implementers might have secrecy constraints, or just not choose
> to invest the time in spec writing. So there is no hit-list of specs that
> 'MUST' be written, except we consider it proper to document the reference
> implementation of the Prague CC.
> Nonetheless, others also consider it proper to document their algorithm
> (e.g. BBRv2), and in the case of SCReAM, Ingemar has promised he will (as
> quoted above).
>
> We don't (yet?) have a description of the latest two implementations that
> the draft can refer to (they only announced these on the first day of the
> interop last week).
> We try to keep a living web page up to date that points to current
> implementations ( https://l4s.net/#code ). However, I don't think the RFC
> Editor would accept this as an archival reference.
>
> Section 4.3.1
>
>       To summarize, the coexistence problem is confined to cases of
>       imperfect flow isolation in an FQ, or in potential cases where a
>       Classic ECN AQM has been deployed in a shared queue (see the L4S
>       operational guidance [I-D.ietf-tsvwg-l4sops] for further details
>       including recent surveys attempting to quantify prevalence).
>       Further, if one of these cases does occur, the coexistence problem
>       does not arise unless sources of Classic and L4S flows are
>       simultaneously sharing the same bottleneck queue (e.g. different
>       applications in the same household) and flows of each type have to
>       be large enough to coincide for long enough for any throughput
>       imbalance to have developed.
>
> [BA] This seems to me to be one of the key questions that could limit the
> "incremental deployment benefit".  A reference to the discussion in Section 7
> might be appropriate here.
>
>
> [BB] OK. At the end of the above para I've added:
>
>                                     Therefore, how often the coexistence
>        problem arises in practice is listed in Section 7 as an open
>        question that L4S experiments will need to answer.
>
> 5.4.1.1.1.  'Safe' Unresponsive Traffic
>
>    The above section requires unresponsive traffic to be 'safe' to mix
>    with L4S traffic.  Ideally this means that the sender never sends any
>    sequence of packets at a rate that exceeds the available capacity of
>    the bottleneck link.  However, typically an unresponsive transport
>    does not even know the bottleneck capacity of the path, let alone its
>    available capacity.  Nonetheless, an application can be considered
>    safe enough if it paces packets out (not necessarily completely
>    regularly) such that its maximum instantaneous rate from packet to
>    packet stays well below a typical broadband access rate.
>
> [BA] The problem with video traffic is that the encoder typically
> targets an "average bitrate" resulting in a keyframe with a
> bitrate that is above the bottleneck bandwidth and delta frames
> that are below it.  Since the "average rate" may not be
> resettable before sending another keyframe, video has limited
> ability to respond to congestion other than perhaps by dropping
> simulcast and SVC layers. Does this mean that a video is
> "Unsafe Unresponsive Traffic"?
>
>
> [BB] This section on 'Safe' Unresponsive traffic is about traffic that is
> so low rate that it doesn't need to use ECN to respond to congestion at all
> (e.g. DNS, NTP). Video definitely does not fall into that category.
>
> I think your question is really asking whether video even /with/ ECN
> support can be considered responsive enough to maintain low latency. For
> this you ought to try to see the demonstration that Nokia did last week (if
> a recording is put online) or the Ericsson demonstration which is already
> online [EDT-5GLL]. Both over emulated 5G radio access networks with
> variability of channel conditions, and both showed very fast interaction
> within the video with no perceivable lag to the human eye. With the Nokia
> one last week, using finger gestures sent over the radio network, you could
> control the viewport into a video from a 360⁰ camera, which was calculated
> and generated at the remote end. No matter how fast you shook your finger
> around, the viewport stayed locked onto it.
>
> Regarding keyframes, for low latency video, these are generally spread
> across the packets carrying the other frames.
>
> [EDT-5GLL] Ericsson and DT demo 5G low latency feature:
> https://www.ericsson.com/en/news/2021/10/dt-and-ericsson-successfully-test-new-5g-low-latency-feature-for-time-critical-applications
>
> I detect here that this also isn't a question about the draft - more a
> question of "I need to see it to believe it"?
>
> NITs
>
> Abstract
>
>    The L4S identifier defined in this document distinguishes L4S from
>    'Classic' (e.g. TCP-Reno-friendly) traffic.  It gives an incremental
>    migration path so that suitably modified network bottlenecks can
>    distinguish and isolate existing traffic that still follows the
>    Classic behaviour, to prevent it degrading the low queuing delay and
>    low loss of L4S traffic.  This specification defines the rules that
>
> [BA] Might be clear to say "This allows suitably modified network..."
>
>
> [BB] I'm not sure what the problem is. But I'm assuming you're saying you
> tripped over the word 'gives'. How about simplifying:
>
>              *It gives an incremental
>    migration path so that suitably modified  Then,* network bottlenecks can *be incrementally modified to*
>    distinguish and isolate existing traffic that still follows the
>    Classic behaviour, to prevent it degrading the low queuing delay and
>    low loss of L4S traffic.
>
>
> The words "incremental migration path" suggest that there deployment of
> L4S-capable network devices and endpoints provides incremental benefit.
> That is, once new network devices are put in place (e.g. by replacing
> a last-mile router), devices that are upgraded to support L4S will
> see benefits, even if other legacy devices are not upgraded.
>
> If this is the point you are looking to make, you might want to clarify
> the language.
>
>
> [BB] I hope the above diff helps. Is that enough for an abstract, which
> has to be kept very brief?
> Especially as all the discussion about incremental deployment is in the
> L4S architecture doc, it wouldn't be appropriate to make deployment a
> big thing in the abstract of this draft.
> Nonetheless, we can flesh out the text where incremental deployment is
> already mentioned in the intro (see our suggested text for your later point
> about this, below).
>
> Summary: We propose only the above diff on these points about "incremental
> migration" in the abstract.
>
>    L4S transports and network elements need to follow with the intention
>    that L4S flows neither harm each other's performance nor that of
>    Classic traffic.  Examples of new active queue management (AQM)
>    marking algorithms and examples of new transports (whether TCP-like
>    or real-time) are specified separately.
>
> [BA] Don't understand "need to follow with the intention". Is this
> stating a design principle, or does it represent deployment
> guidance?
>
>
> [BB] I think a missing comma is the culprit. Sorry for confusion. It
> should be:
>
>    This specification defines the rules that
>    L4S transports and network elements need to follow, with the intention
>    that L4S flows neither harm each other's performance nor that of
>    Classic traffic.
>
>
> The sentence "L4S flows neither harm each other's performance nor that
> of classic traffic" might be better placed after the first sentence
> in the second paragraph, since it relates in part to the "incremental
> deployment benefit" argument.
>
>
> [BB] That wouldn't be appropriate, because:
> * To prevent "Classic harms L4S" an L4S AQM needs the L4S identifier on
> packets to isolate them
> * To prevent "L4S harms Classic" needs the L4S sender to detect that it's
> causing harm which is sender behaviour (rules), not identifier-based.
> So the sentence has to come after the point about "the spec defines the
> rules".
>
> Summary: we propose no action on this point.
>
> Section 1. Introduction
>
>    This specification defines the protocol to be used for a new network
>    service called low latency, low loss and scalable throughput (L4S).
>    L4S uses an Explicit Congestion Notification (ECN) scheme at the IP
>    layer with the same set of codepoint transitions as the original (or
>    'Classic') Explicit Congestion Notification (ECN [RFC3168]).
>    RFC 3168 required an ECN mark to be equivalent to a drop, both when
>    applied in the network and when responded to by a transport.  Unlike
>    Classic ECN marking, the network applies L4S marking more immediately
>    and more aggressively than drop, and the transport response to each
>
>    [BA] Not sure what "aggressively" means here. In general, marking
>    traffic seems like a less aggressive action than dropping it. Do
>    you mean "more frequently"?
>
>
> [BB] OK; 'frequently' it is.
>
> (FWIW, I recall that the transport response used to be described as more
> aggressive (because it reduces less in response to each mark), and the idea
> was that using aggressive for both would segue nicely into the next
> sentence about the two counterbalancing. Someone asked for that to be
> changed, and now the last vestiges of that failed literary device are cast
> onto the cutting room floor. The moral of this tale: never try to write a
> literary masterpiece by committee ;)
>
>    Also, it's a bit of a run-on sentence, so I'd break it up:
>
>    "than drop.  The transport response to each"
>
>    mark is reduced and smoothed relative to that for drop.  The two
>    changes counterbalance each other so that the throughput of an L4S
>    flow will be roughly the same as a comparable non-L4S flow under the
>    same conditions.
>
>
> [BB] Not sure about this - by the next sentence (about the two changes),
> the reader has lost track of them. How about using numbering to structure
> the long sentence:
>
>    Unlike
>    Classic ECN marking: i) the network applies L4S marking more immediately
>    and more aggressively than drop; and ii) the transport response to each
>    mark is reduced and smoothed relative to that for drop. The two
>    changes counterbalance each other...
>
> OK?
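>
> (To make the counterbalance concrete, here is a minimal Python sketch of
> the kind of scalable per-RTT response described above, along the lines of
> DCTCP [RFC8257]. The gain g and the variable names are illustrative
> choices for this example, not text from the draft.)
>
>     def on_round_trip(cwnd, marked, acked, alpha, g=1.0 / 16):
>         """One round trip of a DCTCP-style scalable response (sketch).
>
>         alpha is a smoothed estimate of the fraction of ECN-marked
>         packets; the window is reduced in proportion to alpha rather
>         than halved, so the response to each mark is smaller and
>         smoother than the Classic response to a drop.
>         """
>         frac_marked = marked / max(acked, 1)
>         alpha = (1 - g) * alpha + g * frac_marked   # smoothed marking fraction
>         if marked > 0:
>             cwnd = cwnd * (1 - alpha / 2)           # proportionate reduction
>         else:
>             cwnd = cwnd + 1                         # additive increase per RTT
>         return cwnd, alpha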
>
> Nonetheless, the much more frequent ECN control
>    signals and the finer responses to these signals result in very low
>    queuing delay without compromising link utilization, and this low
>    delay can be maintained during high load.  For instance, queuing
>    delay under heavy and highly varying load with the example DCTCP/
>    DualQ solution cited below on a DSL or Ethernet link is sub-
>    millisecond on average and roughly 1 to 2 milliseconds at the 99th
>    percentile without losing link utilization [DualPI2Linux], [DCttH19].
>
>    [BA] I'd delete "cited below" since you provide the citation at
>    the end of the sentence.
>
>
> [BB] 'Cited below' referred to the DCTCP and DualQ citations in the
> subsequent para, because this is the first time either term has been
> mentioned.
>     '*Described* below'
> was what was really meant. I think that makes it clear enough (?).
>
>    Note that the inherent queuing delay while waiting to acquire a
>    discontinuous medium such as WiFi has to be minimized in its own
>    right, so it would be additional to the above (see section 6.3 of the
>    L4S architecture [I-D.ietf-tsvwg-l4s-arch]).
>
>    [BA] Not sure what "discontinuous medium" means. Do you mean
>    wireless?  Also "WiFi" is a colloquialism; the actual standard
>    is IEEE 802.11 (WiFi Alliance is an industry organization).
>    Might reword this as follows:
>
>    "Note that the changes proposed here do not lessen delays from
>     accessing the medium (such as is experienced in [IEEE-802.11]).
>     For discussion, see Section 6.3 of the L4S architecture
>     [I-D.ietf-tsvwg-l4s-arch]."
>
>
> [BB] We've used 'shared' instead. Other examples of shared media are LTE,
> 5G, DOCSIS (cable), DVB (satellite), PON (passive optical network). So I've
> just said 'wireless' rather than give a gratuitous citation of 802.11.
>
>    Note that the inherent queuing delay while waiting to acquire a
>    discontinuous
>    *shared* medium such as WiFi *wireless* has to be minimized in its own
>    right, so it would be additional *added* to the above *above.  It is
>    a different issue that needs to be addressed, but separately* (see
>    section 6.3 of the L4S architecture [I-D.ietf-tsvwg-l4s-arch]).
>
>
> Then, because wireless is less specific, I've taken out 'inherent' because,
> strictly speaking, medium acquisition delay is not inherent to a medium - it
> depends on the multiplexing scheme. For instance, radio networks can use CDM (code
> division multiplexing), and they did in 3G.
> 'Inherent' was trying to get over the sense that this delay is not
> amenable to reduction by congestion control. Rather than try to cram all
> those concepts into one sentence, I've split it.
>
> OK?
>
>    L4S is not only for elastic (TCP-like) traffic - there are scalable
>    congestion controls for real-time media, such as the L4S variant of
>    the SCReAM [RFC8298] real-time media congestion avoidance technique
>    (RMCAT).  The factor that distinguishes L4S from Classic traffic is
>
>    [BA] Is there a document that defines the L4S variant of SCReAM?
>
>
> [BB] I've retagged Ingemar's readme as [SCReAM-L4S], and included it here
> to match the other two occurrences of SCReAM:
>
>                                            such as the L4S variant
>    *[SCReAM-L4S]* of the SCReAM [RFC8298] real-time media congestion
>    avoidance technique (RMCAT).
>
>
> It sounds like Ingemar plans to update RFC8298 with a bis, so I guess
> eventually [RFC8298] should automatically become a reference to its own
> update.
>
>    its behaviour in response to congestion.  The transport wire
>    protocol, e.g. TCP, QUIC, SCTP, DCCP, RTP/RTCP, is orthogonal (and
>    therefore not suitable for distinguishing L4S from Classic packets).
>
>    The L4S identifier defined in this document is the key piece that
>    distinguishes L4S from 'Classic' (e.g. Reno-friendly) traffic.  It
>    gives an incremental migration path so that suitably modified network
>    bottlenecks can distinguish and isolate existing Classic traffic from
>    L4S traffic to prevent the former from degrading the very low delay
>    and loss of the new scalable transports, without harming Classic
>    performance at these bottlenecks.  Initial implementation of the
>    separate parts of the system has been motivated by the performance
>    benefits.
>
> [BA] I think you are making an "incremental benefit" argument here,
> but it might be made more explicit:
>
> "  The L4S identifier defined in this document distinguishes L4S from
>    'Classic' (e.g. Reno-friendly) traffic. This allows suitably
>    modified network bottlenecks to distinguish and isolate existing
>    Classic traffic from L4S traffic, preventing the former from
>    degrading the very low delay and loss of the new scalable
>    transports, without harming Classic performance. As a result,
>    deployment of L4S in network bottlenecks provides incremental
>    benefits to endpoints whose transports support L4S."
>
>
> [BB] We don't really want to lose the point about the identifier being
> key. So I've kept that. And for the middle sentence, I've used the simpler
> construction developed above (for the similar wording in the abstract).
>
> Regarding the last sentence, no, it meant more than that. It meant that,
> even though implementer's customers get no benefit until both parts are
> deployed, for some implementers the 'size of the potential prize' has
> already been great enough to warrant investment in implementing their part,
> without any guarantee that other parts will be implemented. However, we
> need to be careful not to stray into conjecture and predictions,
> particularly not commercial ones, which is why this sentence was written in
> the past tense. Pulling this all together, how about:
>
>    The L4S identifier defined in this document is the key piece that
>    distinguishes L4S from 'Classic' (e.g. Reno-friendly) traffic.  It
>    gives an incremental migration path so that suitably modified  *Then,*
>    network bottlenecks can *be incrementally modified to* distinguish and
>    isolate existing Classic traffic from L4S traffic*,* to prevent the
>    former from degrading the very low *queuing* delay and loss of the new
>    scalable transports, without harming Classic performance at these
>    bottlenecks.  *Although both sender and network deployment are
>    required before any benefit, i*nitial implementations of the separate
>    parts of the system *have* been motivated by the *potential* performance
>    benefits.
>
> I considered adding "have *already* been motivated..." or "*at the time
> of writing,* initial implementations..." but decided against both - they
> sounded a bit hyped up.
> What do you think?
>
>
> Section 1.1 1.1.  Latency, Loss and Scaling Problems
>
>    Latency is becoming the critical performance factor for many (most?)
>    applications on the public Internet, e.g. interactive Web, Web
>    services, voice, conversational video, interactive video, interactive
>    remote presence, instant messaging, online gaming, remote desktop,
>    cloud-based applications, and video-assisted remote control of
>    machinery and industrial processes.  In the 'developed' world,
>    further increases in access network bit-rate offer diminishing
>    returns, whereas latency is still a multi-faceted problem.  In the
>    last decade or so, much has been done to reduce propagation time by
>    placing caches or servers closer to users.  However, queuing remains
>    a major intermittent component of latency.
>
> [BA] Since this paragraph provides context for the work, you might
> consider placing it earlier (in Section 1 as well as potentially in
> the Abstract).
>
>
> [BB] The L4S architecture Intro already starts like you suggest.
>     See
> https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-l4s-arch-19#section-1
>
> The present doc starts out more as a technical spec might, with a 4-para
> intro focusing on what it says technically. Then it has a fairly long
> subsection to summarize the problem for those reading it stand-alone. That
> is intentional (so readers who have already read the architecture can
> easily jump).
>
> Summary: We propose to leave the opening of the intro unchanged.
>
> Might modify this as follows:
>
> "
>    Latency is the critical performance factor for many Internet
>    applications, including web services, voice, realtime video,
>    remote presence, instant messaging, online gaming, remote
>    desktop, cloud services, and remote control of machinery and
>    industrial processes. In these applications, increases in access
>    network bitrate may offer diminishing returns. As a result,
>    much has been done to reduce delays by placing caches or
>    servers closer to users. However, queuing remains a major
>    contributor to latency."
>
> We've picked up most, but not all, of your suggestions:
>
>    Latency is becoming the critical performance factor for many (most?)
>    applications on the public Internet,
>    *Internet applications,* e.g. interactive Web, Web *web, web* services, voice,
>    conversational video, interactive video, interactive remote presence,
>    instant messaging, online gaming, remote desktop, cloud-based applications,
>    *applications & services,* and video-assisted remote control of machinery and
>    industrial processes.  In *many parts of* the 'developed' world, further increases
>    in access network bit-rate *bit rate* offer diminishing
>    returns *[Dukkipati06],* whereas latency is still a multi-faceted problem.  In the
>    last decade or so,  *As a result,* much
>    has been done to reduce propagation time by placing caches or servers
>    closer to users.  However, queuing remains a major intermittent *major, albeit
>    intermittent,* component of latency.
>
>
> We've added [Dukkipati06], because we were asked to justify the similar
> 'diminishing returns' claim in the L4S architecture, and Dukkipati06
> provides a plot supporting that in its intro:
>
>    *[Dukkipati06]
>               Dukkipati, N. and N. McKeown, "Why Flow-Completion Time is
>               the Right Metric for Congestion Control", ACM CCR
>               36(1):59--62, January 2006,
>               <https://dl.acm.org/doi/10.1145/1111322.1111336>.*
>
>
> The distinctions between different applications of the same technology
> were deliberately intended to distinguish different degrees of latency
> sensitivity, so we left some of them in.
> OK?
>
>    The Diffserv architecture provides Expedited Forwarding [RFC3246], so
>    that low latency traffic can jump the queue of other traffic.  If
>    growth in high-throughput latency-sensitive applications continues,
>    periods with solely latency-sensitive traffic will become
>    increasingly common on links where traffic aggregation is low.  For
>    instance, on the access links dedicated to individual sites (homes,
>    small enterprises or mobile devices).  These links also tend to
>    become the path bottleneck under load.  During these periods, if all
>    the traffic were marked for the same treatment, at these bottlenecks
>    Diffserv would make no difference.  Instead, it becomes imperative to
>    remove the underlying causes of any unnecessary delay.
>
> [BA] This paragraph is hard to follow. You might consider rewriting it as
> follows:
>
>    "The Diffserv architecture provides Expedited Forwarding [RFC3246], to
>    enable low latency traffic to jump the queue of other traffic. However,
>    the latency-sensitive applications are growing in number along
>    with the fraction of latency-sensitive traffic. On bottleneck links where
>    traffic aggregation is low (such as links to homes, small enterprises or
>    mobile devices), if all traffic is marked for the same treatment, Diffserv
>    will not make a difference. Instead, it is necessary to remove unnecessary
>    delay."
>
>
> [BB] Your proposed replacement has the following problems:
> * It relies on prediction (the previous text avoided prediction, instead
> saying "if growth ... continues");
> * The proposed replacement loses the critical sense of "periods with
> solely latency-sensitive traffic" (not all the time);
> * It also loses the critical idea that the same links that have low statistical
> multiplexing also tend to be those where the bottleneck is.
> How about:
>
>    The Diffserv architecture provides Expedited Forwarding [RFC3246], so
>    that low latency traffic can jump the queue of other traffic.  If
>    growth in high-throughput latency-sensitive applications continues, periods with
>    solely latency-sensitive traffic will become increasingly common on
>    links where traffic aggregation is low.  For
>    instance, on the access links dedicated to individual sites (homes,
>    small enterprises or mobile devices).  These links also tend to
>    become the path bottleneck under load.  During these periods, if all  *During these periods, if all*
>    the traffic were marked for the same treatment, at these bottlenecks Diffserv would make
>    no difference.  Instead,  *The links with low aggregation also tend to become
>    the path bottleneck under load, for instance, the access links
>    dedicated to individual sites (homes, small enterprises or mobile
>    devices).  So, instead of differentiation,* it becomes imperative to
>    remove the underlying causes of any unnecessary delay.
>
>
> I tried to guess what you found hard to follow, while still keeping all the
> concepts. The main changes were:
> *  to switch the sentence order so "periods with solely" and "these
> periods" were not a few sentences apart.
> * to make it clear what 'instead' meant.
> Better?
>
>
>   long enough for the queue to fill the buffer, making every packet in
>    other flows sharing the buffer sit through the queue.
>
>    [BA] "sit through" -> "share"
>
>
> [BB] Nah, that's tautology "other flows sharing the buffer share the
> queue".
> And it loses the sense of waiting. If "sit through" isn't understandable,
> how about
>
>    "...causing every packet in other flows sharing the buffer to have to
>    work its way through the queue."
> ?
>
>    Active queue management (AQM) was originally developed to solve this
>    problem (and others).  Unlike Diffserv, which gives low latency to
>    some traffic at the expense of others, AQM controls latency for _all_
>    traffic in a class.  In general, AQM methods introduce an increasing
>    level of discard from the buffer the longer the queue persists above
>    a shallow threshold.  This gives sufficient signals to capacity-
>    seeking (aka. greedy) flows to keep the buffer empty for its intended
>    purpose: absorbing bursts.  However, RED [RFC2309] and other
>    algorithms from the 1990s were sensitive to their configuration and
>    hard to set correctly.  So, this form of AQM was not widely deployed.
>
>    More recent state-of-the-art AQM methods, e.g. FQ-CoDel [RFC8290],
>    PIE [RFC8033], Adaptive RED [ARED01], are easier to configure,
>    because they define the queuing threshold in time not bytes, so it is
>    invariant for different link rates.  However, no matter how good the
>    AQM, the sawtoothing sending window of a Classic congestion control
>    will either cause queuing delay to vary or cause the link to be
>    underutilized.  Even with a perfectly tuned AQM, the additional
>    queuing delay will be of the same order as the underlying speed-of-
>    light delay across the network, thereby roughly doubling the total
>    round-trip time.
>
> [BA] Would suggest rewriting as follows:
>
> "  More recent state-of-the-art AQM methods such as FQ-CoDel [RFC8290],
>    PIE [RFC8033] and Adaptive RED [ARED01], are easier to configure,
>    because they define the queuing threshold in time not bytes, providing
>    link rate invariance.  However, AQM does not change the "sawtooth"
>    sending behavior of Classic congestion control algorithms, which
>    alternates between varying queuing delay and link underutilization.
>    Even with a perfectly tuned AQM, the additional queuing delay will
>    be of the same order as the underlying speed-of-light delay across
>    the network, thereby roughly doubling the total round-trip time."
>
>
> [BB] We've taken most of these suggestions, but link rate invariance is
> rather a mouthful.
> Also, 'more queue delay or more under-utilization' wasn't meant to imply
> alternating between the two.
> So how about:
>
>    More recent state-of-the-art AQM methods, e.g. *such as* FQ-CoDel [RFC8290],
>    PIE [RFC8033] *or* Adaptive RED [ARED01], are easier to configure,
>    because they define the queuing threshold in time not bytes, so it
>    *configuration* is invariant for different *whatever the* link rates. *rate.*  However, no matter how good the
>    AQM, the
>    sawtoothing sending window of a Classic congestion control *creates a dilemma
>    for the operator: i)* either *configure a shallow AQM operating point,
>    so the tips of the sawteeth* cause *minimal queue* delay *but the troughs
>    underutilize the link,* or *ii) configure the operating point deeper
>    into the buffer, so the troughs utilize* the link *better but then the
>    tips cause more delay variation.*  Even...
>
> OK?
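>
> (For readers who find code clearer than prose, a minimal sketch of the
> 'threshold in time, not bytes' idea above: the marking probability ramps
> up with the time a packet has spent in the queue. The two threshold
> values are invented for illustration, not taken from any of the cited
> AQM specifications.)
>
>     MIN_DELAY = 0.005   # 5 ms: below this, no marking (illustrative value)
>     MAX_DELAY = 0.015   # 15 ms: above this, mark every packet (illustrative)
>
>     def mark_probability(sojourn_time):
>         """Ramp marking probability as a function of queuing delay (s).
>
>         Because the thresholds are expressed in time rather than bytes,
>         the same configuration holds whatever the link rate.
>         """
>         if sojourn_time <= MIN_DELAY:
>             return 0.0
>         if sojourn_time >= MAX_DELAY:
>             return 1.0
>         return (sojourn_time - MIN_DELAY) / (MAX_DELAY - MIN_DELAY)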
>
>
>    If a sender's own behaviour is introducing queuing delay variation,
>    no AQM in the network can 'un-vary' the delay without significantly
>    compromising link utilization.  Even flow-queuing (e.g. [RFC8290]),
>    which isolates one flow from another, cannot isolate a flow from the
>    delay variations it inflicts on itself.  Therefore those applications
>    that need to seek out high bandwidth but also need low latency will
>    have to migrate to scalable congestion control.
>
> [BA] I'd suggest you delete the last sentence, since the point is
> elaborated on in more detail in the next paragraph.
>
>
> [BB] Actually, this point is not made in the next para (but you might have
> thought it was because it's not clear, so below I've tried to fix it).
> Indeed, I've realized we need to /add/ to the last sentence, because we
> haven't yet said what a scalable control is...
>
>        ...migrate to scalable congestion control*, which uses much smaller
>    sawtooth variations*.
>
>
>    Altering host behaviour is not enough on its own though.  Even if
>    hosts adopt low latency behaviour (scalable congestion controls),
>    they need to be isolated from the behaviour of existing Classic
>    congestion controls that induce large queue variations.  L4S enables
>    that migration by providing latency isolation in the network and
>
> [BA] "enables that migration" -> "motivates incremental deployment"
>
>    distinguishing the two types of packets that need to be isolated: L4S
>    and Classic.  L4S isolation can be achieved with a queue per flow
>    (e.g. [RFC8290]) but a DualQ [I-D.ietf-tsvwg-aqm-dualq-coupled] is
>    sufficient, and actually gives better tail latency.  Both approaches
>    are addressed in this document.
>
>
> [BB] The intended meaning here is 'enables' (technical feasibility), not
> motivates (human inclination).
> But whatever, in the rewording below, I don't think either is needed. I'm
> also assuming that middle sentence didn't make sense for you, and I think I
> see why. So how about:
>
>    Altering host behaviour is not enough on its own though.  Even if
>    hosts adopt low latency behaviour (scalable congestion controls), they need to be
>    isolated from the behaviour of *large queue variations induced by* existing Classic
>    congestion controls that induce large queue variations.  L4S enables that migration
>    by providing *L4S AQMs provide that* latency isolation in the network and
>    distinguishing *the L4S identifier enables the AQMs to distinguish* the
>    two types of packets that need to be isolated: L4S and Classic.
>
>
> How's that?
>
>    The DualQ solution was developed to make very low latency available
>    without requiring per-flow queues at every bottleneck.  This was
>
> [BA] "This was" -> "This was needed"
>
>
> [BB] Not quite that strong. More like:
>     "This was useful"
>
>    Latency is not the only concern addressed by L4S: It was known when
>
>    [BA] ":" -> "."
>
>
> [BB] OK.
>
>    explanation is summarised without the maths in Section 4 of the L4S
>
>    [BA] "summarised without the maths" -> "summarized without the mathematics"
>
>
> [BB] OK - that nicely side-steps stumbles from either side of the Atlantic.
>
> 1.2.  Terminology
>
> [BA] Since Section 1.1 refers to some of the Terminology defined in
> this section, I'd consider placing this section before that one.
>
>
> [BB] See earlier for push-back on this.
>
>    Reno-friendly:  The subset of Classic traffic that is friendly to the
>       standard Reno congestion control defined for TCP in [RFC5681].
>       The TFRC spec. [RFC5348] indirectly implies that 'friendly' is
>
>       [BA] "spec." -> "specification"
>
>
> [BB] I checked this after a previous review comment, and 'spec' is now
> considered to be a word in its own right. I should have removed the
> full-stop though, which I did for all other occurrences.
> However, the RFC Editor might have a style preference on this point, in
> which case I will acquiesce.
>
>
>       defined as "generally within a factor of two of the sending rate
>       of a TCP flow under the same conditions".  Reno-friendly is used
>       here in place of 'TCP-friendly', given the latter has become
>       imprecise, because the TCP protocol is now used with so many
>       different congestion control behaviours, and Reno is used in non-
>
>       [BA] "Reno is used" -> "Reno can be used"
>
>
> [BB] OK
>
> 4.  Transport Layer Behaviour (the 'Prague Requirements')
>
> [BA] This section is empty and there are no previous references to Prague. So I
> think you need to say a few words here to introduce the section.
>
>
> [BB] OK. How about:
>
>
>
>
> *   This section defines L4S behaviour at the transport layer, also known
>    as the Prague L4S Requirements (see Appendix A for the origin of the
> name). *
>
> Again, thank you very much for all the time and effort you've put into
> this review.
>
> Regards
>
>
>
> Bob
>
>
>
> --
> ________________________________________________________________
> Bob Briscoe                               http://bobbriscoe.net/
>
>