Re: [tsvwg] L4S issue #22: CE Ambiguity and Reordering

Sebastian Moeller <moeller0@gmx.de> Mon, 24 February 2020 14:59 UTC

From: Sebastian Moeller <moeller0@gmx.de>
In-Reply-To: <74bd428d-950d-81c2-2771-611f802bed7f@bobbriscoe.net>
Date: Mon, 24 Feb 2020 15:58:21 +0100
Cc: tsvwg IETF list <tsvwg@ietf.org>, "Holland, Jake" <jholland@akamai.com>
Message-Id: <906619E6-213E-4433-9E84-396B6316C56C@gmx.de>
References: <MN2PR19MB4045424F1F0FBA9817A4B1E283420@MN2PR19MB4045.namprd19.prod.outlook.com> <74bd428d-950d-81c2-2771-611f802bed7f@bobbriscoe.net>
To: Bob Briscoe <ietf@bobbriscoe.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/LBWzDbCbzeP1pyCz2fZ_2ZIDzEs>

Dear Bob,

more below in-line.




> On Feb 21, 2020, at 09:51, Bob Briscoe <ietf@bobbriscoe.net> wrote:
> 
> Jake, Sebastian,
> 
> In yesterday's tsvwg L4S interim, issue #22 was discussed.
> RTT dependence has been moved to issue #28.

	[SM] I am not sure that #28 is a reasonable place to discuss RTT dependence. #28 is about dualQ's brokenness in light of the required degree of isolation. Temporarily wedging the under-explained ~15ms delay (of dualQ's non-LL queue) into TCP Prague's congestion response calculations might be a viable hack to confirm the hypothesis that the unequal sharing between classes is a consequence of the coupling method. But nobody really doubted that, and the real solution to the problem is IMNHO to put the isolating AQM on solid ground, rather than making it depend on end-point compliance and on all L4S-aware protocols adding the same ~15ms hack to their congestion response function calculation.
	Otherwise, please start by showing theoretically and empirically why 15ms is a magic value that will always be correct/ideal/optimal. I have now asked multiple times for the justification of the non-LL queue's reference delay value of 15ms and have not gotten a satisfactory response from Team L4S at all. I would be delighted if you could just refer me to a paper/white paper giving the rationale and data supporting 15ms as being superior to, say, 5ms.


> Classic ECN FIFO AQM concerns are in issue #16.
> That leaves issue #22 solely for the remaining issue about CE ambiguity mentioned by Jake under #22: reordering.
> 
> There was a stand-off in the discussion yesterday as to who is now more obliged to produce data about this: the complainant about L4S or the defendant of L4S.

	[SM] Yes, I was quite baffled by that. This is obviously not my call, but I want to mention that you want something from the IETF here, so whether you like it or not, the onus is on you to convince the IETF that your design is safe. I have been arguing for a long time that real hard data is a much better argument than theoretical musings, but I seem not to have gotten through to you. So let me repeat: the best way to invalidate my objections is to demonstrate with real data that the objections are unfounded. Is that tedious? You bet, but if you dislike it, just retract your drafts and the issue goes away ;)

> 
> As a defendant,

	[SM] IMHO that is a problematic mindset, but it explains a lot. By looking at yourself as the defendant, you will defend your existing proposal vigorously with all means at your disposal, but isn't the goal here to see whether the RFC drafts can be improved by considering and incorporating ideas brought up in the discussion?

> let me explain why the WG needs the complainant to give more data or more argumentation. In summary, this judgement call was made when the WG considered it back in the 2017 timeframe. If you want to resurrect the question, we need new data or new argumentation. 

	[SM] That argument seems backward, as by this rationale the IETF would be bound to all precedent, no matter how bad it was or what has been learned in the meantime. For a legal entity that might be a viable position, but for engineers? But humor me, and point to detailed minutes or a video showing that this point has truly been discussed and that consensus was reached, preferably by a non-generic hum.

> 
> 
> I have pointed to the justification for this not being an issue in the ecn-l4s-id draft. It is pasted below, and has been broadly unchanged since the very first draft (during the process of choosing an identifier, we obviously had this concern ourselves - until we took advice from the tcpm WG). Jake's posting in the issue tracker says this argument relies on RACK deployment. You will see that the argument does not rely on RACK deployment, it is merely even less likely to be an issue as RACK is deployed.
> 
> No-one can produce data for the lack of prevalence of a rare event, at least not without enormous cost. So, before we can proceed, if you think it might not be rare, we need to know:

	[SM] That is not the data that was requested, though. The data requested was to quantify the level of re-ordering that the systemic routing of all CE-marked packets into the LL-queue produces for flows in the non-LL queue. And that does not seem to involve enormous costs, given the L4S test-bed you have already implemented. Let me put it this way: to assess risk you ideally assess both the likelihood of an event and its cost. I am asking for a cost assessment, while you argue that assessing the likelihood is costly and hence reject the idea of assessing the "cost". What do I misunderstand here?
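
To illustrate what such a measurement would report, here is a minimal sketch of a toy model, not the DualQ implementation: every packet sees a fixed per-queue delay, CE-marked packets are classified into the LL queue, and everything else goes to the non-LL queue. The names (Packet, departure_order, overtaken_counts) and all delay values are assumptions chosen for illustration, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int           # position of the packet within its flow
    arrival_ms: float  # arrival time at the (hypothetical) dual-queue node
    ce: bool           # True if an upstream Classic ECN AQM already marked it CE


def departure_order(packets, classic_delay_ms, ll_delay_ms):
    """Return sequence numbers in departure order.

    Toy assumption: each queue adds a fixed delay; CE-marked packets are
    classified into the low-latency queue, all others into the classic queue.
    """
    def departure_time(p):
        delay = ll_delay_ms if p.ce else classic_delay_ms
        return p.arrival_ms + delay

    # sorted() is stable, so packets with equal departure times keep arrival order
    return [p.seq for p in sorted(packets, key=departure_time)]


def overtaken_counts(order):
    """Map each early packet to the number of lower-numbered packets it overtook."""
    counts = {}
    for i, seq in enumerate(order):
        overtaken = sum(1 for later in order[i + 1:] if later < seq)
        if overtaken:
            counts[seq] = overtaken
    return counts


# Hypothetical flow: 10 packets, 1 ms apart; packets 6 and 7 arrive already CE-marked.
flow = [Packet(seq=i, arrival_ms=float(i), ce=(i in (6, 7))) for i in range(10)]
order = departure_order(flow, classic_delay_ms=5.0, ll_delay_ms=0.5)
print(order)                    # [0, 1, 6, 2, 7, 3, 4, 5, 8, 9]
print(overtaken_counts(order))  # {6: 4, 7: 3}
```

A test-bed run would replace the fixed delays with the queue delays actually observed, which is exactly the data being asked for.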

This constant haggling about not having to demonstrate that your claims actually hold water in reality, instead of simply running experiments and arguing based on real data, makes me uneasy, not least because our discussions about CoDel-type AQMs demonstrated a rather unexpected lack of understanding of the details on your side. So arguing about real data seems a much better way forward.


> 	• which of that list of points you dispute, with some evidence or argumentation for why it is wrong.
> 	• Then the WG can discuss whether that threatens the whole argument or whether you're just clutching at straws (i.e. does the risk of an event with minor impact

	[SM] Except you have not demonstrated the "minor impact" assertion...

> become worrying when the probability of it happening is the combination of 2 middling probabilities and 3 small probabilities instead of 5 small ones?).

	[SM] Yes, that is going to be a judgement call the IETF needs to make, and it is well possible/probable that the final judgement is going to be to your liking, but that in no way invalidates the request for real hard data to base the judgement on. To stay in your legal framework, the IETF or the plaintiff should be allowed meaningful discovery.

> 	• Only then do we need to discuss whether more data is needed to resolve a specific dispute and who needs to produce that data.
> 
> /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
>    "Risk of reordering Classic CE packets" in Appendix B.1 discusses
>    the resulting ambiguity if packets originally marked ECT(0) are
>    marked CE by an upstream AQM before they arrive at a node that
>    classifies CE as L4S.  It argues that the risk of re-ordering is
>    vanishingly small and the consequence of such a low level of
>    re-ordering is minimal.

	[SM] "vanishingly small" indicates that some estimate of the likelihood has been made, but below you say you found only one study, which you yourself find suspect.

> 
> /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\ 
> 
> If you follow the ref to the appendix, you find the explanation below. 
> 
> /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
>    Risk of reordering Classic CE packets:  Classifying all CE packets
>       into the L4S queue risks any CE packets that were originally
>       ECT(0) being incorrectly classified as L4S.  If there were delay
>       in the Classic queue, these incorrectly classified CE packets
>       would arrive early, which is a form of reordering.  Reordering can
>       cause TCP senders (and senders of similar transports) to
>       retransmit spuriously.  However, the risk of spurious
>       retransmissions would be extremely low for the following reasons:
> 
>       1.  It is quite unusual to experience queuing at more than one
>           bottleneck on the same path (the available capacities have to
>           be identical).

	[SM] While I think it is obvious that this is a simplification, let me stipulate this, but also ask you to translate "unusual" into numerical probability, please.

> 
>       2.  In only a subset of these unusual cases would the first
>           bottleneck support Classic ECN marking while the second
>           supported L4S ECN marking, which would be the only scenario
>           where some ECT(0) packets could be CE marked by an AQM
>           supporting Classic ECN then the remainder experienced further
>           delay through the Classic side of a subsequent L4S DualQ AQM.

	[SM] Right now, the ratio of non-L4S AQMs to L4S AQMs on the internet is going to be massively biased against L4S, so much so that initially the assumption needs to be that the subset in 2) is close to the set in 1).


> 
>       3.  Even then, when a few packets are delivered early, it takes
>           very unusual conditions to cause a spurious retransmission, in
>           contrast to when some packets are delivered late.

	[SM] I believe we all agree that a few early reorderings are less severe than a few late ones. But since I see no proposal to make L4S cause the second kind of re-ordering, I would ask that we drop that discussion as it is a tangent. My request was/is that you run a few reasonably diverse tests to see whether you ever manage to create a 3 dupACK scenario. Combine an aggressive RFC 3168 AQM with a more lenient L4S AQM and test with a number of different overload scenarios; if you fail to cause 3 dupACKs I would consider my objection eviscerated. Less talk, more data, please.

>  The first
>           bottleneck has to apply CE-marks to at least N contiguous
>           packets and the second bottleneck has to inject an
>           uninterrupted sequence of at least N of these packets between
>           two packets earlier in the stream (where N is the reordering
>           window that the transport protocol allows before it considers
>           a packet is lost).
> 
>              For example consider N=3, and consider the sequence of
>              packets 100, 101, 102, 103,... and imagine that packets
>              150,151,152 from later in the flow are injected as follows:
>              100, 150, 151, 101, 152, 102, 103...  If this were late
>              reordering, even one packet arriving 50 out of sequence
>              would trigger a spurious retransmission, but there is no
>              spurious retransmission here, with early reordering,
>              because packet 101 moves the cumulative ACK counter forward
>              before 3 packets have arrived out of order.  Later, when
>              packets 148, 149, 153... arrive, even though there is a
>              3-packet hole, there will be no problem, because the
>              packets to fill the hole are already in the receive buffer.
> 
>       4.  Even with the current TCP recommendation of N=3 [RFC5681]
>           spurious retransmissions will be unlikely for all the above
>           reasons.  As RACK [I-D.ietf-tcpm-rack] is becoming widely
>           deployed, it tends to adapt its reordering window to a larger
>           value of N, which will make the chance of a contiguous
>           sequence of N early arrivals vanishingly small.
> 
>       5.  Even a run of 2 CE marks within a Classic ECN flow is
>           unlikely, given FQ-CoDel is the only known widely deployed AQM
>           that supports Classic ECN marking and it takes great care to
>           separate out flows and to space any markings evenly along each
>           flow.
> 
>       It is extremely unlikely that the above set of 5 eventualities
>       that are each unusual in themselves would all happen
>       simultaneously.  But, even if they did, the consequences would
>       hardly be dire: the odd spurious fast retransmission.  Admittedly
>       TCP (and similar transports) reduce their congestion window when
>       they deem there has been a loss, but even this can be recovered
>       once the sender detects that the retransmission was spurious.
> 
> /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\ 
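
The N=3 example in the quoted appendix can be checked mechanically. The following is a minimal sketch of a cumulatively acknowledging receiver plus a sender-side duplicate-ACK count. It assumes one immediate ACK per arriving packet and ignores SACK, DSACK, delayed ACKs and RACK, so it only illustrates the three-duplicate-ACK rule and does not replace a test-bed run. The function name max_dupacks is made up for this sketch.

```python
def max_dupacks(arrivals, first_expected):
    """Return the longest run of duplicate ACKs the sender would see.

    arrivals: packet numbers in the order they reach the receiver.
    first_expected: the packet number the receiver is waiting for at the start.
    Simplification: one cumulative ACK per arrival, no SACK/DSACK/RACK.
    """
    buffered = set()
    expected = first_expected
    last_ack = first_expected  # ACK value the sender has already seen
    run = 0
    longest = 0
    for seq in arrivals:
        buffered.add(seq)
        # Advance the cumulative ACK point over everything now in order.
        while expected in buffered:
            buffered.discard(expected)
            expected += 1
        if expected == last_ack:
            run += 1  # same ACK value again: a duplicate ACK
            longest = max(longest, run)
        else:
            run = 0
            last_ack = expected
    return longest


# Early reordering from the appendix: 150, 151, 152 injected among 100..103,
# followed by the rest of the flow up to 153.
appendix_case = [100, 150, 151, 101, 152] + list(range(102, 150)) + [153]
print(max_dupacks(appendix_case, 100))              # 2, below the N=3 threshold

# Late reordering: packet 101 arrives after 102, 103, 104.
print(max_dupacks([100, 102, 103, 104, 101], 100))  # 3, reaches the threshold

# Three contiguous early arrivals before packet 101.
print(max_dupacks([100, 150, 151, 152, 101], 100))  # 3, reaches the threshold
```

The third case is the one the requested tests would need to rule out or quantify: whether a real combination of AQMs ever delivers N contiguous early packets.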
> 
> Before you ask about bullet 1 (multibottleneck path), I described it as "quite unusual" because the measurement approach used in the only study I could find to support this assertion was rather suspect.

	[SM] So you base your probability assessment on a single study which you do not even believe? Did you try to sanity-check/reproduce those results with your own set-up? Because that is a rather strong argument for my request for real data. Or did I misunderstand your argument here?


> The whole argument only hangs on the likelihood of experiencing multiple bottlenecks on your path being "quite unusual", which is supported by:
> 	• most paths are from data centres to clients where only the client access link is the bottleneck, 

	[SM] If that is the only use-case for L4S this should be added to the draft text please. Otherwise I consider this to be irrelevant to this discussion about making L4S safe for the rest of the internet.

> 	• but still many paths do have 2 access bottlenecks, ...
> 	• there would be a chance of having the same available bandwidth in both access links in two cases:
> 		• the traffic is alone in both links and the capacities of both are the same
> 		• there is other traffic in either link that temporarily makes the available bandwidths the same

	[SM] I claim that is not how all of the internet works. With my former ISP, during peak hours traffic from directly peering content providers was usually fast, while traffic coming over "transit" links was often slower and had encountered congestion (as seen in increased delay and packet loss). So here I routinely saw flows that had experienced congestion upstream and were mixed with flows that were experiencing congestion through my own link's AQM. Now, I have no indication/data that the "slow" packets actually carried CE marks, but your argument is about the prevalence of the dual bottleneck scenario, and there I think you are simply overly optimistic.
	So there exist real scenarios in which an access link will be both the bottleneck and will also see flows that already experienced congestion, and there seems to be no fixed relation between such a congested flow's rate and the rate of other flows. That reality does not seem to match your assumptions. So what shall we do, adjust reality?


> I think it's fair to say this latter case will be transient, so I'll only proceed with the question of whether the link capacities might be the same...
> 	• taking residential or mobile access technologies first, the bandwidths of some tend to be sold in round numbers, however the bandwidth of most access technologies is asymmetric, so the upstream round number of one would have to match the downstream round number of the other
> 	• large multi-user access technologies (e.g. DC network access, campus network access, corporate network access) will nearly always be carrying other traffic, so they fall into the transient case
> Taking all this, I think it is reasonable to say multiple bottlenecks will happen, but they will be 'quite unusual'. Even if this is not the case, the other 4 bullets lead to a very small probability once all is considered. And the impact if the event does happen is itself minor anyway.

	[SM] I find it puzzling that you both argue the impact of re-ordering to be minor and at the same time reject any proposal that you empirically test the impact's potential realistic/worst-case magnitude. If risk = impact * probability, and assessing probability is hard and/or costly, assessing (worst-case) impact could well be cheap and easy, and it would still give you a handle on the risk.
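
To make the risk = impact * probability relation concrete, together with the earlier question of 2 middling plus 3 small probabilities versus 5 small ones, here is a small sketch. The values 0.01 for "small" and 0.3 for "middling" are purely hypothetical placeholders, not measured or claimed by anyone in this thread, and the five conditions are assumed to be independent.

```python
import math

SMALL = 0.01     # hypothetical placeholder for a "small" probability
MIDDLING = 0.3   # hypothetical placeholder for a "middling" probability


def joint_probability(probabilities):
    """Probability that all events occur together, assuming independence."""
    return math.prod(probabilities)


def risk(impact, probability):
    """Risk as the product of impact (any cost unit) and probability."""
    return impact * probability


five_small = joint_probability([SMALL] * 5)
two_middling_three_small = joint_probability([MIDDLING] * 2 + [SMALL] * 3)

# How much more likely the combined event becomes under the placeholder values.
ratio = two_middling_three_small / five_small
print(round(ratio))
```

With these placeholders the joint probability grows by a factor of (0.3 / 0.01)^2 = 900, while the impact term stays an unknown multiplier in both cases. Measuring the worst-case impact pins down that multiplier even if the probabilities remain disputed.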

> 
> Note also that bullet 2 makes this solely a transition phenomenon. It only becomes a permanent phenomenon if the L4S experiment half succeeds so that Classic ECN and L4S ECN permanently co-exist.

	[SM] If L4S reaches experimental status and sees some roll-out into the real world, it will be really hard to put the genie back into the bottle, even with strict contingency plans in place for experimental failure and revocation, and so far I have seen very little in that regard (except from CableLabs for the DOCSIS specs).

Best Regards
	Sebastian


> 
> Regards
> 
> 
> Bob
> 
> 
> -- 
> ________________________________________________________________
> Bob Briscoe                               
> http://bobbriscoe.net/