Re: [tsvwg] Adoption call for draft-white-tsvwg-l4sops - to conclude 24th March 2021

"Holland, Jake" <jholland@akamai.com> Mon, 22 March 2021 21:40 UTC

From: "Holland, Jake" <jholland@akamai.com>
To: Sebastian Moeller <moeller0@gmx.de>, Gorry Fairhurst <gorry@erg.abdn.ac.uk>
CC: "tsvwg@ietf.org" <tsvwg@ietf.org>
Date: Mon, 22 Mar 2021 21:40:30 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/AtHgMjLZHC_lXFeKHn3O6-kTm14>

Hi Sebastian,

I can take a crack at answering some of your questions, speaking only
for myself.

On 3/21/21, 1:54 AM, "Sebastian Moeller" <moeller0@gmx.de> wrote:
> a) The current state of L4S design and implementation: data I cited shows catastrophic failures of L4S's two core components under realistic conditions. Why this is not considered a show-stopper and/or all-hands-on-deck situation for the WG and team L4S is hard to understand.
> 	Also, why are people in this WG willing to just ignore that fact instead of engaging in a discussion of how to meaningfully remedy it (so changing the actual L4S core technologies and not just watering down the drafts' text to "legally" permit those failure modes)? So what is the hurry of adding yet another L4S draft, which in an ideal world would probably need major changes once the other L4S drafts are actually fixed?

This has been under discussion for a long time, of course.  Just to
recap the way I remember it (slightly simplified to elide some side
points): the initial idea was floated during the "hallway discussion"
mentioned in the minutes at the tail end of the first tsvwg interim
last year [1] to make an "operational considerations" draft to explain
ways that operators could constrain the deployment of L4S to address
the safety issues, particularly from the seemingly intractable issue
#16 [2] about TCP Prague behavior in a shared classic queue.

The point was raised that there are many networks where it's known that
there are not in fact classic marking AQM queues (e.g. 5G networks when
sending traffic from embedded edge compute nodes or other path-aware
senders to in-network receivers, as well as some e.g. cable or fiber
networks that manage their users' wifi boxes), and that in such networks
for a constrained set of senders, issue #16 is not a hazard, and with the
associated knowledge about the bottlenecks, the latency benefits shown in
the early L4S lab experiments might be realizable.  (Likewise, they can
have pretty good expectations about whether there are dumb FIFOs at the
bottlenecks, if that's a concern.)

This was a fair point, and done properly could result in L4S+TCP Prague
deployments that do not cause damage to innocent bystanders.

Meanwhile, the L4S proponents raised a few well-founded objections to
SCE with ECT(1) as output, for instance that in a shared queue the
high fidelity congestion-responsive flows will necessarily be subjected
to latency spikes driven by classic flow competition.  This was borne
out in the SCE testing, e.g. in [3]--although it represents a substantial
overall improved latency profile relative to the classic flow without a
high-fidelity signal response, you can see the green spikes for the SCE
latency in the shared queue tests, and how this limits the potential to
stay within a tight latency budget, even under favorable conditions.

As I understand it, this was the key driver behind the well-founded
points raised by several participants in the 3rd interim last year [4],
when they mentioned issues like the need for a classifier and traffic
segregation in order to achieve better latency than classic AQMs via
use of a high fidelity congestion signal from the network.

There remain some fair questions to ask here, most notably IMO the
same one you've raised, which I'll phrase as "is it appropriate to
assign ECT(1) to this when the only known safe deployments would be
known-path scenarios that could almost equally well use a local DSCP
assignment for classification?"  I think several others have tried to
ask roughly the same question, but it has so far been dismissed by
the authors for the reasons given in Appendix B.4 of l4s-id [5].
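
To make the question concrete, here is a minimal sketch of the two
classifier choices being compared, assuming (as I understand l4s-id)
that the ECN-only classifier treats ECT(1) and CE as L4S.  The DSCP
value and the function names are made up for illustration and don't
come from any of the drafts.

```python
from typing import Tuple

# ECN codepoints occupy the low 2 bits of the IPv4 TOS / IPv6 Traffic Class
# byte; the DSCP occupies the high 6 bits.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11

# Hypothetical locally assigned DSCP, chosen only for this illustration.
LOCAL_L4S_DSCP = 0b101010


def split_tos(tos: int) -> Tuple[int, int]:
    """Return (dscp, ecn) from a TOS / Traffic Class byte."""
    return (tos >> 2) & 0x3F, tos & 0x03


def classify_ecn_only(tos: int) -> str:
    """Classify on the ECN field alone: ECT(1) and CE are treated as L4S."""
    _, ecn = split_tos(tos)
    return "l4s" if ecn in (ECT1, CE) else "classic"


def classify_dscp_guarded(tos: int, guard_dscp: int = LOCAL_L4S_DSCP) -> str:
    """Treat a packet as L4S only if it also carries the local guard DSCP."""
    dscp, ecn = split_tos(tos)
    if dscp == guard_dscp and ecn in (ECT1, CE):
        return "l4s"
    return "classic"
```

In this sketch an ECT(1) packet arriving with the default DSCP lands in
the L4S queue under the first classifier but stays in the classic queue
under the second, which is the containment property a local DSCP
assignment would provide on a known path.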

I'd expect this question and perhaps some others to see some more
discussion during tsvwg last call if not before then, and also
likely during IETF last call if it achieves rough consensus in
tsvwg, but I'm hoping the discussion will be sharpened somewhat if
we have a mature version of the operational considerations doc to
refer to, which is a big part of why I think it should be adopted.

To me the key conclusion at this stage is that the operational
considerations draft needs to share fate with the rest of the L4S
drafts.  This is true regardless of whether L4S ultimately gets
dropped, gets substantially changed, or achieves consensus and moves
forward as-is, since as it stands today, the operational considerations
draft is a key part of the safety story for avoiding damage to bystanders.
Given that the other L4S drafts are currently adopted, this means to
me that the operational considerations draft also needs to be in the
same state.

In the meantime, absent new data or new unconsidered proposals, there is
nothing new to say, so I don't see much point in going around in circles. 
(I want to note here however an exception for the comments raised by
Jonathan and Pete from re-analyzing the prior results in terms of harm
to bystanders and the consequences of our choices [6], which I think
does meaningfully contribute without going in circles, in spite of not
coming as a result of new observations or new proposed changes to the
protocols.  New methods of analysis with different or clearer insights
also can be helpful IMO, even without a new proposal or data.)

> b) The ops draft itself is also not ready. I gave detailed feedback (at multiple stages in its development) and have (unfortunately) read it multiple times. How about we first let this last feedback round play out before making adoption calls?

I'm glad you're contributing feedback, thanks for doing that.  As I
understand it, adopting the draft in the wg just formalizes the work
as in-scope and establishes consensus that it's worth working on as a
group. There's a decent explanation of the procedural considerations in
RFC 7221.

But IMO, given that there are proponents looking to move forward with some
form of deployment of the technology in hopes of getting the benefits
shown in the lab, it's important to document the precautions necessary to do
so safely in deployment, and as far as I can tell, this is currently the
intended role of the operational considerations draft.  And in that context,
as RFC 7221 puts it, the pragmatics to consider here are "Initial, not final",
and "Adoption, not approval", so while it's great that feedback is being
generated and hopefully considered carefully, it's not (IMO) helpful to put
off adopting a doc when we have a strong belief that other currently-adopted
docs can't be deployed safely at scale without considering the advice that's
in-scope for that doc.

(Note also that adoption is not a permanent commitment--see section 5.2 of
RFC 7221, which says that the WG can later decide that adopted docs are no
longer useful to keep as working group documents.  Presumably this will
eventually be the result if rough consensus cannot be reached and the
authors don't think it's worth addressing the objections raised to the
proposed final form of the docs, if enough people still hold substantive
objections at that time.)

>	All that said, my prediction is that we as a WG are going to accept the L4S drafts in autumn as is as experimental RFCs (as far as this WG can, I understand that we will not have the last word, but I expect no headwind in the later process either, as detailed evaluation and judgement seems the prerogative of the WGs, so higher ups will assume due diligence from the lower stages by default). With the fig leaf that the L4S-ops draft would fix up the safety story for the L4S design and implementation. And that is something I do not think to be correct (an ops draft cannot retroactively defang dualQ's mis-design, unless the recommendation were "never use dualQ").

This has not been my experience with the IESG.  It's reasonably common
that they'll send a draft back to the working group or ask for further
revisions to a document.  There are many examples in which they've raised
detailed and significant objections to wg documents, e.g.:
https://datatracker.ietf.org/doc/draft-ietf-babel-source-specific/ballot/
https://datatracker.ietf.org/doc/draft-ietf-cdni-uri-signing/ballot/

>	The sad part is, all of this is not rocket science. For example requiring a guarding DSCP for the duration of the experiment to ascertain that only knowing and willing parties participate in the experiment would have been relatively easy to specify and would have been something where an ops draft could shine, by giving the right pointers on how to implement passing of said DSCP from participating to domain (e.g. SLAs, TCAs, ...). Alas, the draft did not; it did, however, add a few gratuitous paragraphs of what I call L4S-fan-fiction. I guess a possibility squandered.

Since it's been stated several times by several people, the authors are
presumably aware that some number of people hold the opinion that more
discussion on using DSCP as a part of the classifier would be useful,
and if those people remain unconvinced they'll presumably state their
reservations during last call, and it will be considered during the
evaluation of rough consensus.

Again, here I'm hopeful that good progress on the operational
considerations draft will contribute to making that discussion more
productive than it would be otherwise, which is an important part of
why I support adopting it.

Not re-raising the same objections (even unaddressed objections)
when progress on another front happens is different from dropping the
objections.  I think there's reason to believe that there are still
several people with serious concerns about the safety (as they've
previously stated), and who have some reservations about the proposed
classifier solution.

Since there's a relatively confusing and painful email series across
many threads that got a lot of discussion in a lot of different
directions, I'm basically waiting for a set of docs that the authors
present as ready for another round of review (or for last call) to see
if the concerns have been addressed by then, since it's been so hard
to tell exactly what's on the table short of that.

I hope that's a helpful walkthrough of my reasons for supporting
adoption, and that it gives some insight toward answering your
questions.

Best regards,
Jake

[1] Interim 1 of last year:
    https://datatracker.ietf.org/doc/minutes-interim-2020-tsvwg-01-202002200900/
[2] Issue #16: https://trac.ietf.org/trac/tsvwg/ticket/16
[3] There are many great SCE lab measurements, but this one:
    http://sce.dnsmgr.net/results/ect1-2020-04-23-final/sce-s2-twoflow/sce-s2-twoflow-ns-cubic-vs-cubic-sce-twin_codel_af-20ms_tcp_delivery_with_rtt.svg
    is a good example, shown in the 20ms twin-codel-af from Scenario 2 of
    https://github.com/heistp/sce-l4s-ect1, demonstrating ~5+ ms regular
    additional latency spikes driven by the competing classic traffic.
[4] Interim 3 of last year:
    https://datatracker.ietf.org/doc/minutes-interim-2020-tsvwg-03-202004271000/
    Specifically in the section 4 "Discussion of slides" and the several
    people who said things about classifiers and the need for multiple queues.
[5] The documented objections to DSCP, from Appendix B.4 in l4s-id:
    https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-ecn-l4s-id-14#appendix-B.4
[6] Jonathan and Pete's slides from the end of tsvwg 110 day 1:
    https://datatracker.ietf.org/meeting/110/materials/slides-110-tsvwg-sessa-621-thoughts-on-the-ecn-deployment-observations-00#page=11
    This seems like a follow-up on Ware et al.'s paper on harm as a metric,
    presented at the IRTF open meeting during IETF 109:
    https://www.cs.cmu.edu/~rware/assets/pdf/ware-hotnets19.pdf