Re: [tsvwg] Reasons for WGLC/RFC asap

Sebastian Moeller <moeller0@gmx.de> Sat, 21 November 2020 08:26 UTC

Content-Type: text/plain; charset="utf-8"
Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.17\))
From: Sebastian Moeller <moeller0@gmx.de>
In-Reply-To: <36CD217D-1E40-4286-A202-5D1027792160@cablelabs.com>
Date: Sat, 21 Nov 2020 09:25:41 +0100
Cc: Ingemar Johansson S <ingemar.s.johansson=40ericsson.com@dmarc.ietf.org>, tsvwg IETF list <tsvwg@ietf.org>
Content-Transfer-Encoding: quoted-printable
Message-Id: <F38D3B68-FEDA-4A23-B8E0-263FC51C2E1B@gmx.de>
References: <AM8PR07MB747626CB7622CB89209018A8B9E10@AM8PR07MB7476.eurprd07.prod.outlook.com> <5dff4f73463c2a7e7cc8dc8255ae9825e78f4c11.camel@petri-meat.com> <4FF8800F-B618-4818-AF5E-1E997EA9FBF3@eggert.org> <HE1PR0701MB2876C4FDA655284D4E563042C2E00@HE1PR0701MB2876.eurprd07.prod.outlook.com> <3EAC47AC-5937-4DB2-8B3C-D8C8A4459FBA@eggert.org> <HE1PR0701MB28761E448ACA1DAE0B2DCF56C2E00@HE1PR0701MB2876.eurprd07.prod.outlook.com> <F73E8DBB-0887-462C-ACC6-FA9212E50DF7@eggert.org> <HE1PR0701MB2876C5AE506AFCCCD99C1F54C2E00@HE1PR0701MB2876.eurprd07.prod.outlook.com> <BDCF6C59-28FF-4923-99AA-9D10EC9C8AC3@gmx.de> <36CD217D-1E40-4286-A202-5D1027792160@cablelabs.com>
To: Greg White <g.white@CableLabs.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/OgFfBrcj9OTIbvMX0Q0qEX85Pgw>
Subject: Re: [tsvwg] Reasons for WGLC/RFC asap
Precedence: list

Hi Greg,

more below, prefixed [SM].


> On Nov 20, 2020, at 23:14, Greg White <g.white@CableLabs.com> wrote:
> 
>   [SM] Ingemar, I and other OpenWrt users have an rfc3168 AQM on our internet access links that by default uses ECN for the downlink. I do not have the view of the world that Apple has, but I can guarantee the number is greater than zero. And as stated before, people doing this today are exactly the latency-conscious users L4S should aim at winning over...
>  
> This is great news!  

	[SM] I appreciate your positive tone, but IMHO tone isn't the only or even the biggest issue L4S faces. The fact that team L4S continuously ignores/talks down my inputs, even though I have relevant in-the-field experience? Not sure I would call that "great".



> Those users will undoubtedly be asking for (or implementing and submitting PRs to enable) L4S support in OpenWrt (it’s a simple software change after all, no kernel updates or new hardware needed).  

	[SM] What are you talking about here? Qdiscs in Linux come as part of the kernel, so a kernel update would very much be needed to get an L4S-compatible AQM out to the users. But first such an AQM would need to be included in the upstream kernel; OpenWrt tries hard to stay close to the kernel and carry as little "non-mainline-able" code as possible.


> They will be some of the first to see the benefits of it!  

	[SM] Let's not get carried away here, please. The question in the WG still is whether L4S can actually demonstrate its promised performance (without introducing additional regressions) over a number of realistic conditions. And the answer to that challenge so far is a "meh" at best.


> I would hope, for the sake of the Internet and the IETF, that people like you will do what you can to ensure that the L4S experiment has the best chance of success.

	[SM] Well, to get me on board, L4S first needs to convince me. Interactions with Koen and Bob leave me under the impression that L4S is not exceptionally well engineered. Both offer quite, shall we say, "interesting" theories about how RTT-dependence happens, or how PIE and CoDel work, that are so far removed from reality that I have a hard time accepting any of their assessments at face value without first looking at supporting data. And that gets to the next problem: many of my requests for data have been responded to with "go measure yourself if you want to see that data". I have very little good-will left, and I have a considerably less rosy view of L4S's current chances to succeed in the real internet than you. And with the current state of affairs I am not going to hang my name and the little reputation I have on the L4S train, sorry.


>  
> I don’t think I’ve heard you say that you are opposed to high-fidelity congestion control, or that you believe that it is impossible or inherently dangerous.

	[SM] I consider HFCC an interesting idea in which I see promise. I now want to see hard-boiled evaluation and testing to see how well this idea stands up to the harsh reality of the existing internet, yes. And sure I HOPE it will prove itself, but let's get the data first and let's look at the eventual results accepting the possibility that HFCC might not actually work well in reality.


>  I seem to remember that you were (are?) of the belief that it would be a good thing in the context of the SCE codepoint definition, you just aren’t happy with the L4S codepoint decision.  

	[SM] Yes, I still consider the re-definition of CE a completely gratuitous change that causes dangerous safety issues for existing users. And I do not like the way team L4S treats the resulting safety issues. Merely trying to argue issues away solves nothing. Actively resisting calls to generate relevant data does not fill me with confidence. Neither does behaving as if year-old data measured with DCTCP would extrapolate to TCP Prague without even a shred of data showing that, in spite of the fact that TCP Prague introduced significant changes relative to its DCTCP ancestor and hence is expected to behave differently. Not sure I would call any of that great or even sufficiently diligent engineering.
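
For readers less familiar with the codepoints at issue, the ambiguity around CE can be sketched as follows. This is only an illustration under stated assumptions: the two-bit values are those defined in RFC 3168, the classifier follows the rule described in the L4S identifier draft (ECT(1) and CE select L4S treatment), and both function names are invented for this example.

```python
# ECN field values as defined in RFC 3168.
NOT_ECT = 0b00
ECT_1 = 0b01  # used as the L4S identifier in the L4S drafts
ECT_0 = 0b10  # used by classic RFC 3168 ECN senders
CE = 0b11


def rfc3168_mark(ecn: int) -> int:
    """Hypothetical helper: an RFC 3168 AQM signalling congestion.

    ECN-capable packets are re-marked to CE; Not-ECT packets are returned
    unchanged here (a real AQM would drop them instead).
    """
    return CE if ecn in (ECT_0, ECT_1) else ecn


def l4s_classify(ecn: int) -> str:
    """Hypothetical helper: classification by ECN field alone.

    Assumes the rule from the L4S identifier draft: ECT(1) and CE are
    treated as L4S, everything else as classic.
    """
    return "l4s" if ecn in (ECT_1, CE) else "classic"


# A classic ECT(0) packet is classified as classic before marking...
before = l4s_classify(ECT_0)
# ...but after an upstream RFC 3168 AQM marks it CE, the original
# ECT(0)/ECT(1) distinction is no longer visible in the ECN field.
after = l4s_classify(rfc3168_mark(ECT_0))
print(before, after)
```

The sketch only covers classification by the ECN field; it says nothing about how often such marking happens on real paths, which is the open measurement question in this thread.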


> Actually, I don’t remember hearing anyone claim that high-fidelity congestion signaling makes congestion control worse, or is inherently dangerous.

	[SM] This is pretty much the reason, IMHO, why the L4S drafts are still under consideration, but at one point L4S will need to deliver on its promises. Unfortunately it is not there yet (and after 7? years of really slow progress, I am not convinced that it ever will get there).

>  
> So, I think the discussion now centers on the following points?
>  
> 	• TCP Prague is unfinished, and some would like to see it finished before WGLC on the L4S drafts.

	[SM] Not to put too fine a point on it, if the project is to introduce HFCC to the internet, that HFCC should be one of the core products of the L4S project, but it is treated as a side show and preferably somebody else's problem. Under a conservative assumption that the signaling mechanism and the reaction to the signaling are not completely orthogonal issues, it seems obvious that they should be developed and evaluated in lock-step. But that is almost the opposite of the L4S approach. I happen to predict that as a consequence of that approach, the network bits are far from optimal or even good enough, as their interaction with the HFCC in the endpoints is understudied.


> 		• Current TCP Prague performance does not show the benefits of high-fidelity CC in a number of cases.  

	[SM] +1; unfortunately that includes quite a number of realistic and common conditions, like multiple flows at different RTTs, where L4S TCP Prague falls way behind the status quo.

> Many believe that this is not an indication that high-fidelity CC is impossible, but others still want to be convinced.

	[SM] Not sure anybody concluded from L4S/TCP Prague's failures that HFCC is impossible, but rather that the L4S approach is simply lackluster and not good enough.

> 	• The DualQ AQM does not provide as precise per-flow fairness as FQ (or possibly VDQ-CSAQM) with the current TCP Prague

	[SM] That is too mild a description; current DualQ will introduce a robust and reliable RATE advantage for TCP Prague at any RTT combination. That is not "not provide as precise per-flow fairness", that is actual predictable UNFAIRNESS. Let's call a spade a spade here, no need for euphemisms or window dressing, this is an engineering, not a marketing organisation.


> 	• Coexistence in light of single queue RFC3168 bottlenecks is not well understood (their prevalence, the frequency with which they are saturated with mixed-type long-running flows of moderate BDP, etc.) so we need to have safeguards in case it is a real issue

	[SM] +1; unfortunately the current approach seems to be tailored to pass a few conditions that were used in the tests demonstrating that issue in the first place, and constitutes no robust or general solution to the issue, as seen in how classification performance tanks when a bit more jitter is introduced than the algorithm designer expected.


> 	• Tunnels carrying mixed traffic can exhibit intra-tunnel flow unfairness in current FQ RFC3168 ECN bottlenecks, and non-RFC6040 compliant tunnels can still exhibit that issue in L4S-aware FQ bottlenecks 

	[SM] Yep, mostly a consequence of re-defining CE, and quite ironic given that the stated rationale for that re-definition was to better deal with tunnels... (Which IMHO is not a good argument, as initial deployment will be mostly at the ISP end-user edge, where tunneling is still possible but certainly not the majority use case, and the end-user might be able to adjust the tunnel behavior).

>  
> Also, there seems to be some difference of opinion in the community on the fundamental question of whether it is considered acceptable for a new congestion control to cause some level of harm (in the Ware sense*) to existing CCs,

	[SM] This is not a qualitative, but a quantitative question.

> even if it otherwise provides substantial benefits

	[SM] Yes, a realistic, honest ledger showing benefits and costs, without the usual marketing and hype, would be a good document to offer the WG to help in that decision.


> that would be desirable to its users and encourage its adoption (and thus contribute to the decline in deployment of the status quo CCs).

	[SM] This is gated on the open question of whether L4S as currently designed and implemented works well enough to begin with. I see a considerable mismatch between actual delivered performance and promised performance, and I would at the very least expect that these are brought in line, hopefully by toning down the overly excited promises in the drafts and by fixing the implementation to actually work under broader conditions and without an L4S-over-classic bias.


>  In my personal opinion, strict adherence to zero-harm is too constraining, since it doesn’t provide any way to assess the upside potential of a new algorithm.

	[SM] +1; but current L4S' issue is not that it is not given leeway to do a bit more harm, but that ATM its harm on rate in some conditions really seems bound only by TCP's unwillingness to reduce its congestion window below two segments...
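
To put a rough number on that bound: a window-limited sender cannot go slower than its minimum window per round trip. The following is a back-of-the-envelope sketch only; the function name, the 1448-byte MSS and the 20 ms RTT are assumptions chosen for illustration, and the two-segment floor is taken from the statement above.

```python
def cwnd_floor_rate_bps(mss_bytes: int, rtt_ms: float,
                        min_cwnd_segments: int = 2) -> float:
    """Lowest sending rate (bit/s) of a window-limited flow.

    Assumes the flow never shrinks its congestion window below
    min_cwnd_segments and sends exactly one window per round trip.
    """
    bits_per_window = min_cwnd_segments * mss_bytes * 8
    return bits_per_window * 1000 / rtt_ms


# Assumed example values: 1448-byte MSS, 20 ms round-trip time.
floor = cwnd_floor_rate_bps(1448, 20)
print(f"{floor / 1e6:.2f} Mbit/s")
```

With these assumed values the floor comes out at about 1.16 Mbit/s per flow; halving the RTT doubles it, since the same two segments are sent twice as often.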

>  
> * https://www.cs.cmu.edu/~rware/assets/pdf/ware-hotnets19.pdf

	[SM] Yes, great link from Wes, but I am sure we will immediately end up in a weird food fight, as team L4S will try to trade the harm done to the rates of non-L4S flows against the benefit to the latency under load for the L4S flows, ignoring the fact that in both cases the harm/benefit balance is pro L4S and contra non-L4S.

Best Regards
	Sebastian


>  
> -Greg
>  
>  
> From: tsvwg <tsvwg-bounces@ietf.org> on behalf of Sebastian Moeller <moeller0@gmx.de>
> Date: Friday, November 20, 2020 at 12:52 AM
> To: Ingemar Johansson S <ingemar.s.johansson=40ericsson.com@dmarc.ietf.org>
> Cc: tsvwg IETF list <tsvwg@ietf.org>
> Subject: Re: [tsvwg] Reasons for WGLC/RFC asap
>  
> Hi Ingemar,
>  
> a correction below.
>  
>  
> > On Nov 19, 2020, at 15:24, Ingemar Johansson S  wrote:
> > 
> > Hi
> > 
> > Please see inline [IJ]
> > 
> > /Ingemar
> > 
> >> -----Original Message-----
> >> From: Lars Eggert 
> >> Sent: den 19 november 2020 14:10
> >> To: Ingemar Johansson S 
> >> Cc: Steven Blake ; tsvwg IETF list
> > 
> >> Subject: Re: [tsvwg] Reasons for WGLC/RFC asap
> >> 
> >> Hi,
> >> 
> >> On 2020-11-19, at 14:36, Ingemar Johansson S
> >>  wrote:
> >>> The only way I see is that the L4S
> >>> drafts are moved to WGLC, then people will hopefully read the drafts
> >>> and come with requests for clarifications where needed. Until then you
> >>> can only expect more of the same long incomprehensible discussion
> >>> threads until March, when we will repeat the same process again.
> >> 
> >> to clarify: I don't have opinions about the L4S drafts. I haven't read
> > them in a
> >> while, I agree that I should.
> >> 
> >> One point I am trying to make is that since the set of documents we are
> >> discussing seems incomplete, in that it doesn't seem to contain a TCP
> > variant
> >> that intends to deliver benefits over L4S paths w/o regressions.
> >> 
> >> My main point though is that there seem to be questions raised about the
> >> performance and behavior of L4S with various TCP variants. This is not an
> >> issue with the content of the L4S drafts. It's a remaining uncertainty
> > related to
> >> the experimental evaluation and analysis that the L4S mechanisms have seen
> >> so far. Going forward with a LC is not going to bring further clarity
> > here.
> >> 
> >>> [IJ] The only pain point I see now is the RTT bias in cases where long
> >>> RTT Prague flows compete with short RTT ditto. This is being addressed
> >>> by the developers and it is not only an L4S problem. Besides this,
> >>> Prague will be presented at ICCRG tomorrow as I understand it.
> >> 
> >> That is one point. I think interactions with tunnels was another.
> > [IJ] My take is that this is something for L4S ops that I see as a product
> > of the L4S experiment. It is not something that needs to be answered on its
> > own as it is mainly the RFC3168 AQM issue in another shape, we don't know
> > how widely spread this problem is or to how large extent this equipment can
> > be updated, earlier discussions on fq-codel implementations in home gateways
> > was not conclusive I guess but it appears that many of these have automatic
> > upgrade features,
>  
>   [SM] This is simply untrue, most home routers will only see an update if the user actively monitors the provider's or manufacturer's website, and most manufacturers will only create and distribute updates around the time a device is still marketed/sold. There is pretty little home gear out there which gets automatic updates or even notifications of possible updates. I agree that that would be a great situation to be in, but realistically the home router update and security situation is only mildly above the update and security situation of IoT devices. Remember it is the "s" in IoT that stands for security/safety...
>   So, if your stance is, automatic updates are robust, reliable and wide-spread, please post some references supporting that hypothesis.
>  
>  
>  
>  
> > I guess the reason here is to be able to upgrade to combat
> > security threats but this is admittedly speculation.
>  
>   [SM] For someone putting their operational safety in that basket I would have expected more robust and reliable numbers for how many devices might be amenable to automated updates. Like hard numbers supporting the hypothesis that automatic updates are widespread enough to allow distribution of such functionality updates. Also think about who is supposed to make back-ports of the required Linux kernel changes to the often ancient kernel versions SoC-SDKs are based on. To be honest, that level of engineering by wishful thinking frightens me a bit.
>  
>  
> > 
> >> 
> >> I'm actually looking forward to the presentation on TCP Prague - is there
> > a
> >> draft?
> > [IJ] Not that I know of, code is found at https://protection.greathorn.com/services/v2/lookupUrl/e306a141-abad-430f-a4f2-02ad90bdc40e/327/a503b21fc948618a92ec662ee8b3671098594e6d?domain=github.com&path=/L4STeam/  
>  
>   [SM] Is it just me that sees a problem in that the reference protocol implementation just exists in a repository, without even an attempt at writing an internet draft describing that protocol? There are zero guarantees that tomorrow's TCP Prague will still be L4S compliant (today's is not, the rfc3168 detection and fall back are disabled by default, see https://protection.greathorn.com/services/v2/lookupUrl/81c6e38a-dede-4789-8d8c-692e8b1cf6f9/327/a503b21fc948618a92ec662ee8b3671098594e6d?domain=github.com&path=/L4STeam/linux/commit/b256daedc7672b2188f19e8b6f71ef5db7afc720).
>  
>  
> >> 
> >>> Besides this there is discussion around all sorts of cases with
> >>> RFC3168 style AQMs, additional discussion before a WGLC will
> >>> definitely not make us more wise.
> >> 
> >> Discussion won't, but experimentation would.
> > [IJ] Yes, but we need to get past the discussion on all possible things that
> > can happen. Recall from the discussion yesterday that Stuart Cheshire and
> > his team has not seen evidence of ECN marking AQMs. So I fear that we spend
> > a lot of time discussing problems that are very rare.
>  
>   [SM] Ingemar, I and other OpenWrt users have an rfc3168 AQM on our internet access links that by default uses ECN for the downlink. I do not have the view of the world that Apple has, but I can guarantee the number is greater than zero. And as stated before, people doing this today are exactly the latency-conscious users L4S should aim at winning over...
>  
> > 
> >> 
> >>> As regards to investment, already today there is investment in this,
> >>> examples that are disclosed in the open are Broadcom and Nokia. I can
> >>> imagine that there is some expectation that L4S will materialize in
> >>> RFCs
> >> 
> >> Sorry I was unclear. You are right that there is investment by vendors.
> > But I
> >> think the key question is if there will be an investment by operators, since
> > they
> >> need to eventually buy L4S kit and deploy it. And that investment will
> > only pay
> >> off if end systems actually have deployed a CC scheme that takes advantage
> > of
> >> L4S. So the ready availability of such a scheme is IMO a key requirement.
> > [IJ] I would say that we already have CC algos that can be used. Surely they
> > do not meet all the requirements today but I don't see why it will not be
> > possible.
>  
>   [SM] This tells more about your confidence in your vision than about what might be achievable in those CCs. IMHO the onus is on team L4S to demonstrate that possibility by demonstrating a CC that ticks all the check marks (it does not need to be perfect, but should show significantly better performance in some dimension, while doing at least no harm in others; currently TCP Prague fails on both counts).
>  
>  
> > It is definitely the case that we will be discussing what typical
> > RTTs are like forever but that is not something that should delay the L4S
> > drafts I think.  
>  
>   [SM] Sorry, this is getting absurd, I point out both Bob completely missing how PIE works (and what its target variable actually describes) and an inconsistency between two of the L4S drafts, only to be ridiculed here. Ingemar, publishing RFCs with inconsistent terms and definitions is not a sign of a professional authoring process. Reviews are supposed to help find such issues, and the response of the authors should not be something like your offense above, but a simple "thanks for pointing out the inconsistency, we are going to fix it, here is our proposed change". The ambiguity between the DualQ and the id drafts I pointed out is real, and "typical RTT" is not a well-known term of art that every IETF member or operator is going to understand intuitively.
>   Also, if my worst fear is correct, protocol and AQM "typical RTT" values need to be selected in lock-step to be effective, and such a requirement needs to be made explicit in an RFC. And if I am wrong and the two uses of the term have different definitions, that needs to be disambiguated in the internet drafts, no?
>  
>  
> > 
> >> 
> >>> [IJ] If this would only be a IETF matter, then you are right. We
> >>> however try to address this also in 3GPP standards to make the whole
> >>> thing work in products, and that is of course hard to do if L4S is not
> > even an
> >> RFC.
> >> 
> >> Is L4S currently a requirement for a future 3GPP release?
> > [IJ] L4S is not currently pushed for in 3GPP, there is however work on the
> > support for extended reality that requires low latency. L4S will definitely
> > fit there. I would expect that L4S support can be proposed ~ mid next year.
>  
>   [SM] So you are willing to push an under-tested solution into 3GPP, but you insist upon that under-tested solution having the IETF's blessing? That seems rather odd, I would have expected that the primary requirement should be "works robustly and reliably without (unacceptable) side-effects" and not only "has RFC status". I have a hard time believing 3GPP works like that...
>  
> Best Regards
>   Sebastian
>  
>  
> > 
> >> 
> >> Thanks,
> >> Lars
>
